Re: [Tutor] pickle problems

2012-08-24 Thread Richard D. Moores
Case Van Horsen wrote the following to me about gmpy2.is_prime. I post
it with his permission.

Dick Moores

The summary: gmpy2.is_prime() just provides direct access to the
underlying library (GMP or MPIR) function. Depending on the library
and version, the behavior is subtly different.

With the current version, both libraries behave similarly - the
Miller-Rabin test is repeated 25 times. The sequence of bases checked
are "random" values. The bases depend on the value of the number being
tested. The bases really aren't random since same set of bases is used
for each test of a given number. (Testing with 5 repetitions followed
by another test with 10 repetitions actually repeats the tests with
the first 5 bases and then tries 5 more bases. It is not the same as
testing with 15 repetiions.)

The Miller-Rabin test actually detects composite numbers. It will fail
to detect a composite number with at most 1/4 of all possible bases.
So after 5 tests, composite numbers are detected with a guaranteed
error of 1 in 1024. On average, the error rate is less than that, but
for certain composites the worst-case error estimates can be reached.
The worst-case is reached when the number being checked is p*(2p-1)
where both p and 2p-1 are prime.

A few people have tried worst-case numbers with GMP and have found
many values that require 10 or more iterations. (One value that
requires 15 iterations has been found.) So the default of 25
iterations is reasonable but it could be set higher if you are truly
paranoid however this will make the test slower for actual primes.

An alternative primality test is known as the BPSW test. It is
available in gmpy2 as is_bpsw_prp(). IIRC, the implementation requires
about the same running time as 10 iterations of Miller-Rabin. But the
biggest advantage is that there are no known false results. The test
has not been proven to work 100% of the time. It is conjectured that
there are false results should occur but no one has ever found one.
The MPIR library has a faster version of the BPSW test but I don't
make it available at the moment. I will probably add it to gmpy2 soon.
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] pickle problems

2012-08-23 Thread Dave Angel
On 08/23/2012 04:43 AM, Steven D'Aprano wrote:
> On Wed, Aug 22, 2012 at 11:39:46PM -0400, Dave Angel wrote:
>> On 08/22/2012 07:32 PM, Richard D. Moores wrote:
>>> 
>>>
>>> My code uses gmpy2.is_prime()  (lines 79 and 89). is_prime() is VERY fast.
>> You do know that this gmpy2 function is only statistically correct ?  it
>> can false positive.  I don't know what the probs are, but it uses
>> Miller-Rabin, with a default factor of 25.
> What Dave means is:
>
> - If gmpy2.is_prime says a number is not prime, then that is 
>   certainly true;
>
> - but if it says that a number is prime, then that is only 
>   probably true.
>
> With 25 Miller-Rabin tests, the probability of a false positive is (if I 
> remember correctly) 1 time in 4**25 or about 1 time in a million 
> billion. (That's American billion, 10**9, not old-style English 
> billion.) If you tested a thousand prime numbers a second for thirty 
> thousand years, you would expect perhaps one false positive.
>
> The odds are much higher that a hardware fault or stray cosmic ray 
> hitting the CPU or memory chip will cause the computer to calculate the 
> wrong answer.
>

I knew of the Miller Rabin test, but I did not know the probabilities
involved. I'm a little surprised I didn't see any docs on the gmpy2 site
that gave such probabilities.  it defaults to 25, without explaining how
that number relates to anything in the wikipedia article.  
https://en.wikipedia.org/wiki/Miller%E2%80%93Rabin_primality_test

Does anyone know if the gmp2 code is testing the sequence described in
wikipedia, where a is  2, 3, 5, 7, 11, 13, and 17 ?   If that relates
directly, it would seem that 7 tests would give 100% confidence for n up
to 14+ digits.



-- 

DaveA

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] pickle problems

2012-08-23 Thread Richard D. Moores
On Wed, Aug 22, 2012 at 8:39 PM, Dave Angel  wrote:
> On 08/22/2012 07:32 PM, Richard D. Moores wrote:
>> 
>>
>> My code uses gmpy2.is_prime()  (lines 79 and 89). is_prime() is VERY fast.
>
> You do know that this gmpy2 function is only statistically correct ?

Yes. See Steven's reply for the probabilities.

Dick

>  it
> can false positive.  I don't know what the probs are, but it uses
> Miller-Rabin, with a default factor of 25.
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] pickle problems

2012-08-23 Thread Steven D'Aprano
On Wed, Aug 22, 2012 at 11:39:46PM -0400, Dave Angel wrote:
> On 08/22/2012 07:32 PM, Richard D. Moores wrote:
> > 
> >
> > My code uses gmpy2.is_prime()  (lines 79 and 89). is_prime() is VERY fast.
> 
> You do know that this gmpy2 function is only statistically correct ?  it
> can false positive.  I don't know what the probs are, but it uses
> Miller-Rabin, with a default factor of 25.

What Dave means is:

- If gmpy2.is_prime says a number is not prime, then that is 
  certainly true;

- but if it says that a number is prime, then that is only 
  probably true.

With 25 Miller-Rabin tests, the probability of a false positive is (if I 
remember correctly) 1 time in 4**25 or about 1 time in a million 
billion. (That's American billion, 10**9, not old-style English 
billion.) If you tested a thousand prime numbers a second for thirty 
thousand years, you would expect perhaps one false positive.

The odds are much higher that a hardware fault or stray cosmic ray 
hitting the CPU or memory chip will cause the computer to calculate the 
wrong answer.


For anyone interested, here is my pyprimes module:

http://pypi.python.org/pypi/pyprimes/

which is written in pure Python (no reliance on gmpy2), so you can 
actually see how the algorithms work, although it is correspondingly 
much slower.

Source code can be found here:

http://code.google.com/p/pyprimes/source/browse/src/pyprimes.py

It is *extensively* documented. The API is alpha-quality and subject to 
change.


-- 
Steven
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] pickle problems

2012-08-22 Thread Dave Angel
On 08/22/2012 07:32 PM, Richard D. Moores wrote:
> 
>
> My code uses gmpy2.is_prime()  (lines 79 and 89). is_prime() is VERY fast.

You do know that this gmpy2 function is only statistically correct ?  it
can false positive.  I don't know what the probs are, but it uses
Miller-Rabin, with a default factor of 25.




-- 

DaveA

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] pickle problems

2012-08-22 Thread Richard D. Moores
On Wed, Aug 22, 2012 at 11:54 AM, Steven D'Aprano  wrote:
> On 23/08/12 02:17, Richard D. Moores wrote:
>>
>> I've incorporated many of the suggestions I've received here, plus some 
>> important tips from Case Van Horsen, the gmpy2 maintainer, and I believe one 
>> of its developers.
>>
>> Here's a function, factor_integer(), for quickly factoring any integer
>> up to 1e17:.
>
>
> That relies on a pre-existing cache of prime numbers. If you don't use
> those cached prime numbers, do you know how long your code takes?

Much longer.

>> Larger integers will be factored eventually -- the wait can be long or
>> short. Probably long if they require the services of lines 82-91.
>>
>> Examples of extremes, both between 1e25 and 1e26:
>> 2,835,334,663,465,375,591,838,337  [3, 19, 37, 71, 947,
>> 19994908447741489]  1.172 secs
>> 9,349,337,574,247,205,482,125,105  [3, 5, 2027, 2311296661,
>> 133039358281] 402.5 secs
>
>
> I'm astounded by the times you quote there.

Excellent!

> The first example includes the prime 19994908447741489, which is *not*
> in the cache, since it is bigger than 1e17. And yet it only takes about
> a second to find all six primes. If true, that's astonishingly good for
> something written in pure Python.

My code uses gmpy2.is_prime()  (lines 79 and 89). is_prime() is VERY fast.

C:\Windows\System32>python -m timeit -v -r 5 -s "from gmpy2 import
is_prime" "is_prime(19994908447741489)"
10 loops -> 0.0008 secs
100 loops -> 0.0073 secs
1000 loops -> 0.0409 secs
1 loops -> 0.394 secs
raw times: 0.392 0.393 0.394 0.392 0.393
1 loops, best of 5: 39.2 usec per loop

and to show its speed for enormous prime ints:
>>> is_prime(987987987199949084477414897654765465435634564357034523045023453475389457937283)
True

C:\Windows\System32>python -m timeit -v -r 5 -s "from gmpy2 import
is_prime" 
"is_prime(987987987199949084477414897654765465435634564357034523045023453475389457937283)"
10 loops -> 0.0077 secs
100 loops -> 0.0765 secs
1000 loops -> 0.758 secs
raw times: 0.757 0.758 0.757 0.757 0.758
1000 loops, best of 5: 757 usec per loop

Also, of the prime factors of  2,835,334,663,465,375,591,838,337, all
but the last one, 19994908447741489, are small and quickly found.
Once factors 3, 19, 37, 71, 947 are found, 19994908447741489 is simply
the quotient of
 2835334663465375591838337 // (3*19*37*71*947), and is_prime confirms
its primacy.

> The second example includes the prime 133039358281, which is smaller
> than 1e17 and so should be in the cache and therefore found (almost)
> instantly.

No, the cache has only primes up to 100,000,000, the last one being 99,999,989.

> And yet you claim it takes nearly seven minutes to find it.
> If correct, that's astonishingly awful given that you have the prime
> numbers already cached.

The delay is caused by the 2nd largest factor being a big one,
2311296661 which is greater than 100 million, whereas in the first
example, the 2nd largest factor is a puny 947 -- but most important, I
think, is that it smaller than 100 million.

> Can you explain those timings? How did you measure them?

Instead of just line 99 at the end of the pasted script, I had

from time import clock as t

t0 = t()
print(factor_integer(2835334663465375591838337))
t1 = t()
print("time =", round((t1-t0), 3))
"""
OUTPUT:
2,835,334,663,465,375,591,838,337 [3, 19, 37, 71, 947, 19994908447741489]
1.121
"""

Essentially the same algorithm is employed in the other script I
pasted at , factor_random_integers().
I've run factor_random_integers a couple of more times, with
num_integers, min_length, max_length = 10, 25, 25. See
, "More output from
factor_random_integers".

Dick
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] pickle problems

2012-08-22 Thread Steven D'Aprano

On 23/08/12 02:17, Richard D. Moores wrote:

I've incorporated many of the suggestions I've received here.

Here's a function, factor_integer(), for quickly factoring any integer
up to 1e17:.


That relies on a pre-existing cache of prime numbers. If you don't use
those cached prime numbers, do you know how long your code takes?



Larger integers will be factored eventually -- the wait can be long or
short. Probably long if they require the services of lines 82-91.

Examples of extremes, both between 1e25 and 1e26:
2,835,334,663,465,375,591,838,337  [3, 19, 37, 71, 947,
19994908447741489]  1.172 secs
9,349,337,574,247,205,482,125,105  [3, 5, 2027, 2311296661,
133039358281] 402.5 secs


I'm astounded by the times you quote there.

The first example includes the prime 19994908447741489, which is *not*
in the cache, since it is bigger than 1e17. And yet it only takes about
a second to find all six primes. If true, that's astonishingly good for
something written in pure Python.

The second example includes the prime 133039358281, which is smaller
than 1e17 and so should be in the cache and therefore found (almost)
instantly. And yet you claim it takes nearly seven minutes to find it.
If correct, that's astonishingly awful given that you have the prime
numbers already cached.

Can you explain those timings? How did you measure them?



--
Steven
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] pickle problems

2012-08-22 Thread Richard D. Moores
I've incorporated many of the suggestions I've received here.

Here's a function, factor_integer(), for quickly factoring any integer
up to 1e17: .

Larger integers will be factored eventually -- the wait can be long or
short. Probably long if they require the services of lines 82-91.

Examples of extremes, both between 1e25 and 1e26:
2,835,334,663,465,375,591,838,337  [3, 19, 37, 71, 947,
19994908447741489]  1.172 secs
9,349,337,574,247,205,482,125,105  [3, 5, 2027, 2311296661,
133039358281] 402.5 secs

These were found with the use of a related script,
"factor_random_integers.py", pasted at 


Dick Moores
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] pickle problems

2012-08-12 Thread Richard D. Moores
On Sun, Aug 12, 2012 at 10:49 AM, eryksun  wrote:
> On Sun, Aug 12, 2012 at 10:58 AM, Dave Angel  wrote:


> Good catch in Kent Johnson's code. Maybe he'll search for his name and
> find this. It should be `r = r // factor`.

The fault was probably mine. Kent wrote this in 2004, in Python 2.x.
When I moved to 3.x I converted the function to 3.x, but missed that
one.
See Kent's Tutor post archived at


I've had an amazing amount of help today. I haven't had time to attend
to all of it yet, but thought I'd get this out. Since posting last
I've found that my script () that I
thought was perfect (except for lines 74-78), isn't. Obvious errors in
the dictionary/pickle, such as even prime factors! I'm hoping that
correcting Kent's function will stop the sort of errors I've found.
And I've found an obvious test: The product of an integers prime
factors must equal the integer itself. And of course the factors must
be tested for primacy(?)/prime numberhood(?), which is quickly done
with gmpy2's is_prime().

I just now checked one of the erroneous items that had an even prime factor:

n = 279918016070854658 a random 18-digit integer
the factors of 279,918,016,070,854,658 are [2, 7, 41, 487662048903928]

After correcting Kent's function, I correctly get

> 279918016070854658
the factors of 279,918,016,070,854,658 are [2, 7, 3011, 6640366657277]

>>>487662048903928*2*7*41
279918016070854672
>>>2*7*3011*6640366657277
279918016070854658

>>>is_prime(3011)
True
>>>is_prime(6640366657277)
True

I also just pasted the 3rd version of my script,
pickle_attempt_for_web3.py, at .

Dick
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] pickle problems

2012-08-12 Thread Peter Otten
Richard D. Moores wrote:

> f = open("factors.txt", 'rb')
> data = pickle.load(f)
> f.close

f.close looks up the close method but doesn't invoke it; you need f.close().
Alternatively use a with statement:

with open("factors.txt", "rb") as f:
data = pickle.load(f)

This will close the file even if an exception occurs when trying to unpickle 
the data.

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] pickle problems

2012-08-12 Thread eryksun
On Sun, Aug 12, 2012 at 10:58 AM, Dave Angel  wrote:
>
> 2) You wind up with a floating point number.  If you're getting floats,
> then you're limited to their precision, maybe 18 digits, and limited to
> their speed.  Perhaps you need to use the // divide rather than the /
> one.  And perhaps it'd help to use divmod instead of using % and /
> separately.

Good catch in Kent Johnson's code. Maybe he'll search for his name and
find this. It should be `r = r // factor`.

> 3) You're doing a % on all the odd numbers above 1, where you really
> only need to try the primes.  If you build an iterator for the primes,
> you don't need to precalculate them, just cache them for reuse.  Again,
> gmp2 is likely to have something useful.

@Richard, if you haven't learned generators, see the docs for yield
expressions:

http://docs.python.org/py3k/reference/expressions.html#yield-expressions
http://www.python.org/dev/peps/pep-0255

First yield the cached primes. Then compute the next prime, cache it,
and yield it. Use the cache to speed up finding the next prime. You
can pickle the cache of primes instead of the factorizations.

> I'm not sure what the point is of preserving the factorizations,
> especially since you don't store the number you're factoring for each
> case.

The numbers are the dictionary keys.
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] pickle problems

2012-08-12 Thread Dave Angel
On 08/12/2012 04:44 AM, Richard D. Moores wrote:

> 
>
> One caveat is that I have yet to fix the problem with lines 75-78.
>
> One thing I'd like to implement is a monitor of the time
> factorsOfInteger(n) takes to process some of the 18-digit ints (line
> 153). Most are processed within a second or two, but some can take
> several minutes. I'd like to limit the time to 15 or 20 seconds, but
> is there a way to do this? Just a wild guess, but is this where
> threading would be useful?
>
>

Now that you're asking about timing, I have to point out that your
algorithm isn't very fast.  If you could speed it up, perhaps you
wouldn't need to limit the time.

1) You use gmpy2,is_prime().  Seems likely that a library with an
is_prime function might also have some other functions to aid in factoring.

2) You wind up with a floating point number.  If you're getting floats,
then you're limited to their precision, maybe 18 digits, and limited to
their speed.  Perhaps you need to use the // divide rather than the /
one.  And perhaps it'd help to use divmod instead of using % and /
separately.

3) You're doing a % on all the odd numbers above 1, where you really
only need to try the primes.  If you build an iterator for the primes,
you don't need to precalculate them, just cache them for reuse.  Again,
gmp2 is likely to have something useful.

I'm not sure what the point is of preserving the factorizations,
especially since you don't store the number you're factoring for each
case.  Seems to me it'd be more useful to save a list of primes.  But
you've got your reasons i'm sure.

-- 

DaveA

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] pickle problems

2012-08-12 Thread eryksun
On Sun, Aug 12, 2012 at 8:46 AM, eryksun  wrote:
>
> t = threading.Thread(target=factorsOfInteger, args=(n, qin, qout)).start()

Sorry I need to proofread better. That should be the following:

t = threading.Thread(target=factorsOfInteger, args=(n, qin, qout))
t.start()
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] pickle problems

2012-08-12 Thread eryksun
On Sun, Aug 12, 2012 at 7:44 AM, Richard D. Moores  wrote:
>
> I just discovered
> os.path.getsize('factors.txt')
> and that factors.txt has a size of 2 bytes when "empty".
> (I changed the file extension to .txt so I could delete the contents.)

No, an empty file has no data; the size is 0. You must have saved a
text file in Windows that added a byte order mark (BOM). Windows adds
a BOM for UTF-8 and UTF-16 ('Unicode') files. It sounds silly to have
a BOM for UTF-8, which can't have a little endian or big endian byte
order, but it's there to distinguish UTF-8 from 8-bit ANSI (e.g.
Windows 1252).
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] pickle problems

2012-08-12 Thread eryksun
On Sun, Aug 12, 2012 at 4:44 AM, Richard D. Moores  wrote:
>
> OK, thanks for the code, which I will duly study. However, I just
> pasted my new version, pickle_attempt_for_web2.py at
> . I've tested it and tested it, and it
> does exactly what I wanted (thanks to you!). Yes, you're correct, I
> want to grow the pickle file, the dictionary. The new version puts
> every new item into it -- nothing gets lost.

I forgot to separate out the PickleError. It shouldn't append new data
to a file that gave an error, so rename the existing file. A new file
is created later when factors.dat is opened in append mode ('ab').

You asked about handling an empty file. This version (which grows the
existing file per session instead of writing a new file) always sees
EOFError since it calls pickle.load(f) until it reaches EOF. But the
same applies if you only call pickle.load(f) once on an empty file.
Just handle the exception with a simple "pass" statement.

I narrowed the IOError handling to only deal with ENOENT (file not
found). There are other reasons the file could fail to open for
reading (e.g. it's a directory or you don't have permission), none of
which you probably want to handle automatically, so it just exits with
an error.

I restructured the main loop to put everything in the try block and
handled the ValueError case as well as exiting with an error if the
pickle/save fails.

import sys, os, errno
import pickle
from pickle import PickleError

D = {}
session = {}
try:
with open("factors.dat", 'rb') as f:
while True:
D.update(pickle.load(f))
except EOFError:
pass#Empty file or finished loading
except IOError as e:
if e.errno != errno.ENOENT:
sys.exit("Can't open factors.dat for reading.")
except PickleError:
try:
#Rename so as not to append to a bad file
os.rename('factors.dat', 'factors.err')
except OSError:
sys.exit("Renaming damaged factors.dat failed.")

if len(D):
print("Loaded", len(D), "entries.")

while True:
try:
n = int(input("Enter a non-negative integer (0 to quit): "))
if n < 0:
raise ValueError

if n == 0:
print("D has", len(D), "entries")
if len(session):
with open('factors.dat', 'ab') as f:
   pickle.dump(session, f)
sys.exit(0)

factors = D[n]
print("the factors of", n, "are", factors)

except ValueError:
print("e.g. 0, 1, 2, 3")

except KeyError:
factors = factorsOfInteger(n)
print("the factors of", n, "are", factors)
D[n] = session[n] = factors

except (IOError, PickleError) as e:
sys.exit("Error saving data: {0}".format(e.args[-1]))


> One thing I'd like to implement is a monitor of the time
> factorsOfInteger(n) takes to process some of the 18-digit ints (line
> 153). Most are processed within a second or two, but some can take
> several minutes. I'd like to limit the time to 15 or 20 seconds, but
> is there a way to do this? Just a wild guess, but is this where
> threading would be useful?

I'd put the time check in the main loop of factorsOfInteger.

threading doesn't have an interface to stop a running thread. If you
want to use threads, use queues since they're a simple, thread-safe
way to communicate between threads. You can modify factorsOfInteger to
monitor a queue.Queue and break when it receives a command to halt.
Use a 2nd queue to get the result back from the worker thread.
Specifically, only call findFactor in factorsOfInteger if qin.empty()
is True. Otherwise, qout.put(False) and break. If the function
terminates normally, then qout.put(factors).

qin = queue.Queue()
qout = queue.Queue()

t = threading.Thread(target=factorsOfInteger, args=(n, qin, qout)).start()

try:
factors = qout.get(timeout=20)
except queue.Empty:
qin.put('halt')

t.join()  #wait for the thread to terminate
factors = qout.get()  #possibly False if factorsOfInteger quit early
if factors:
D[n] = session[n] = factors

See the docs:

http://docs.python.org/py3k/library/threading.html
http://docs.python.org/py3k/library/queue.html

You could also use a multiprocessing pool to speed things up if you
have multiple cores. You'd have to rewrite the factorization code a
bit. Partition the search range (i.e. up to the square root) among N
cores and run findFactor in parallel. If you have 4 cores, you'll get
up to 4 factors. Divide them out, repartition the new range, and
repeat. Or something like that. I'm sure you can find a good reference
online for parallel factorization.
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] pickle problems

2012-08-12 Thread Richard D. Moores
On Sun, Aug 12, 2012 at 2:13 AM, Richard D. Moores  wrote:
.
> But what
> about case where factors.dat is empty? Is there a test for that?

I just discovered
os.path.getsize('factors.txt')
and that factors.txt has a size of 2 bytes when "empty".
(I changed the file extension to .txt so I could delete the contents.)

So I now have
===
D = {}
if os.path.getsize('factors.txt') == 2:
#The file exists, but is empty: leave it alone
pass

elif os.path.exists('factors.txt'):
#The file exists and is not empty: use it
f = open("factors.txt", 'rb')
data = pickle.load(f)
f.close
D = data

else:
# create the file and close it
f = open("factors.txt", 'wb')
f.close
==
which seems to work just fine for all cases.
Those lines would replace lines 74-78 in 

Dick
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] pickle problems

2012-08-12 Thread Richard D. Moores
On Sun, Aug 12, 2012 at 1:00 AM, Alan Gauld  wrote:
> On 12/08/12 03:43, Richard D. Moores wrote:
>
>> ===
>> if "factors.dat":
>
> This is testing if the string is True, which it always is.
> I assume you intended something like
>
> if os.path.exists('factors.dat'):
>
>
>>  f = open("factors.dat", 'rb')
>>  data = pickle.load(f)
>>  f.close
>>  D = data
>> else:
>>  f = open("factors.dat", 'wb')
>
>
> Not sure there is any point in opening a new file at this point. you are
> trying to populate data, but if there's no file there is no data so instead
> of opening the file you want something like data = {}

Great! OK, I now have
=
D = {}
if os.path.exists('factors.dat'):
f = open("factors.dat", 'rb')
data = pickle.load(f)
f.close
D = data
else:
f = open("factors.dat", 'wb')
f.close
===

Which takes care of the case where factors.dat is missing. But what
about case where factors.dat is empty? Is there a test for that?

When factors.dat exists, but is empty, I get
==
Traceback (most recent call last):
  File "C:\P32Working\pickle_attempt_for_web3a.py", line 78, in 
data = pickle.load(f)
EOFError
Process terminated with an exit code of 1
==

Dick
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] pickle problems

2012-08-12 Thread Richard D. Moores
On Sat, Aug 11, 2012 at 9:51 PM, eryksun  wrote:
> On Sat, Aug 11, 2012 at 11:19 PM, Richard D. Moores  
> wrote:
>>
>>> To clarify, you can store multiple pickles in a file, but each needs
>>> its own load. So you'd have to maintain a session dictionary for the
>>> factors of new integers. Then append the pickled session to the file
>>> when the user quits. When the program starts you'd have to loop
>>> through the file to update D with each pickled session.
>>
>> Isn't that essentially what my script does?
>
> On line 69 your script (http://pastebin.com/SNwKRuSK) appends the
> current D to the end. So only the last pickle appended would be
> complete. My first response was for you to switch to 'wb' mode to
> truncate the file and only save the latest complete session.
>
> Then it occurred to me that you actually wanted to grow the pickle the
> file. I proposed the above solution to append the new factorizations
> per session. Then at the start load the pickled sessions in a loop,
> updating D with each loaded dictionary. For example:

OK, thanks for the code, which I will duly study. However, I just
pasted my new version, pickle_attempt_for_web2.py at
. I've tested it and tested it, and it
does exactly what I wanted (thanks to you!). Yes, you're correct, I
want to grow the pickle file, the dictionary. The new version puts
every new item into it -- nothing gets lost.

One caveat is that I have yet to fix the problem with lines 75-78.

One thing I'd like to implement is a monitor of the time
factorsOfInteger(n) takes to process some of the 18-digit ints (line
153). Most are processed within a second or two, but some can take
several minutes. I'd like to limit the time to 15 or 20 seconds, but
is there a way to do this? Just a wild guess, but is this where
threading would be useful?

Dick
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] pickle problems

2012-08-12 Thread Alan Gauld

On 12/08/12 03:43, Richard D. Moores wrote:


===
if "factors.dat":


This is testing if the string is True, which it always is.
I assume you intended something like

if os.path.exists('factors.dat'):


 f = open("factors.dat", 'rb')
 data = pickle.load(f)
 f.close
 D = data
else:
 f = open("factors.dat", 'wb')


Not sure there is any point in opening a new file at this point. you are 
trying to populate data, but if there's no file there is no data so 
instead of opening the file you want something like data = {}



--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] pickle problems

2012-08-11 Thread eryksun
On Sat, Aug 11, 2012 at 11:19 PM, Richard D. Moores  wrote:
>
>> To clarify, you can store multiple pickles in a file, but each needs
>> its own load. So you'd have to maintain a session dictionary for the
>> factors of new integers. Then append the pickled session to the file
>> when the user quits. When the program starts you'd have to loop
>> through the file to update D with each pickled session.
>
> Isn't that essentially what my script does?

On line 69 your script (http://pastebin.com/SNwKRuSK) appends the
current D to the end. So only the last pickle appended would be
complete. My first response was for you to switch to 'wb' mode to
truncate the file and only save the latest complete session.

Then it occurred to me that you actually wanted to grow the pickle the
file. I proposed the above solution to append the new factorizations
per session. Then at the start load the pickled sessions in a loop,
updating D with each loaded dictionary. For example:

D = {}
session = {}
try:
with open('factors.dat', 'rb') as f:
while True:
D.update(pickle.load(f))
except EOFError:
pass
except (IOError, pickle.PickleError):
D = {}

if len(D):
print("Loaded", len(D), "entries.")

while True:
n = int(input("integer: "))

if n == 0:
print("D has", len(D), "entries.")
with open('factors.dat', 'ab') as f:
pickle.dump(session, f)
break

try:
factors = D[n]
print("the factors of", n, "are", factors)

except KeyError:
factors = factorsOfInteger(n)
print("the factors of", n, "are", factors)
D[n] = factors
session[n] = factors
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] pickle problems

2012-08-11 Thread eryksun
On Sat, Aug 11, 2012 at 10:43 PM, Richard D. Moores  wrote:
>
> Changing to 'wb' mode and using the file extension .dat completely
> corrected the problems, it seems. Line 69 now reads,
> f = open("factors.dat", 'wb')

The extension doesn't affect anything about the file. It's just .txt
indicates a text file.

> But the reference I have says of 'wb': "Write to a binary file. If the
> file exists, its contents are overwritten. If the file doesn't exist,
> it's created. Why doesn't factors.dat get overwritten before the
> pickle.dump()? Is it because there isn't any new data to be written at
> that point?

Opening in mode 'wb' truncates the file. Then pickle.dump() writes the
pickle to it. Before you were appending to the end of the file, so you
had multiple pickled dictionaries stored in the file.

> And another question, if I might. If factors.dat doesn't exist, to use
> the program I need to manually create it and rem out lines 49-52 the
> first time I call the script. I thought I could replace lines 49-52
> with
>
> if "factors.dat":
> f = open("factors.dat", 'rb')
> data = pickle.load(f)
> f.close
> D = data
> else:
> f = open("factors.dat", 'wb')

You can catch the IOError if the file doesn't exist or
pickle.PickleError if it's corrupted, and just assign an empty dict.

try:
with open("factors.dat", 'rb') as f:
D = pickle.load(f)
except (IOError, pickle.PickleError):
D = {}

There's no reason to open the file for writing at this point. You do that later.
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] pickle problems

2012-08-11 Thread Richard D. Moores
On Sat, Aug 11, 2012 at 6:58 PM, eryksun  wrote:
> On Sat, Aug 11, 2012 at 9:18 PM, eryksun  wrote:
>>

> To clarify, you can store multiple pickles in a file, but each needs
> its own load. So you'd have to maintain a session dictionary for the
> factors of new integers. Then append the pickled session to the file
> when the user quits. When the program starts you'd have to loop
> through the file to update D with each pickled session.

Isn't that essentially what my script does?

> If you want an approach that's simpler and faster, use the shelve
> module. A shelf is a dictionary-like object that uses pickle to
> serialize objects stored to a database. The keys have to be strings,
> so your code would change to  `D[str(n)] = factors`.
>
> http://docs.python.org/py3k/library/shelve.html#module-shelve

Yes, shelve is next, now that I have mastered (ha!) pickle.

Thanks again for the terrific help.

Dick
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] pickle problems

2012-08-11 Thread Richard D. Moores
On Sat, Aug 11, 2012 at 6:18 PM, eryksun  wrote:
> On Sat, Aug 11, 2012 at 6:30 PM, Richard D. Moores  wrote:
>>
>> I wrote pickle_attempt.py as an exercise to try to learn to use the
>> pickle module. See the version I edited for Tutor,
>> pickle_attempt_for_web.py at
>> .
>> 
>> But after closing the program and restarting it, those items have
>> disappeared from D.
>
> On line 68 you open the file in 'ab' mode. A pickle stream ends with
> an ASCII period (\x2e). Anything appended after that is ignored. Use
> 'wb' mode. Also, Python 3.x defaults to protocol 3, which is binary,
> so you might want a file extension other than .txt, such as .pkl,
> .dat, .bin, etc.

Changing to 'wb' mode and using the file extension .dat completely
corrected the problems, it seems. Line 69 now reads,
f = open("factors.dat", 'wb')

But the reference I have says of 'wb': "Write to a binary file. If the
file exists, its contents are overwritten. If the file doesn't exist,
it's created. Why doesn't factors.dat get overwritten before the
pickle.dump()? Is it because there isn't any new data to be written at
that point?

And another question, if I might. If factors.dat doesn't exist, to use
the program I need to manually create it and rem out lines 49-52 the
first time I call the script. I thought I could replace lines 49-52
with

===
if "factors.dat":
f = open("factors.dat", 'rb')
data = pickle.load(f)
f.close
D = data
else:
f = open("factors.dat", 'wb')
==

Doesn't work. What would?

> If you're curious about the protocol, you can disassemble a pickle
> using pickletools.dis(pickle), where pickle is bytes or a file-like
> object.

I'll give that a try.

Thanks!

Dick Moores
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] pickle problems

2012-08-11 Thread eryksun
On Sat, Aug 11, 2012 at 9:18 PM, eryksun  wrote:
>
> On line 68 you open the file in 'ab' mode. A pickle stream ends with
> an ASCII period (\x2e). Anything appended after that is ignored. Use
> 'wb' mode. Also, Python 3.x defaults to protocol 3, which is binary,
> so you might want a file extension other than .txt, such as .pkl,
> .dat, .bin, etc.

To clarify, you can store multiple pickles in a file, but each needs
its own load. So you'd have to maintain a session dictionary for the
factors of new integers. Then append the pickled session to the file
when the user quits. When the program starts you'd have to loop
through the file to update D with each pickled session.

If you want an approach that's simpler and faster, use the shelve
module. A shelf is a dictionary-like object that uses pickle to
serialize objects stored to a database. The keys have to be strings,
so your code would change to  `D[str(n)] = factors`.

http://docs.python.org/py3k/library/shelve.html#module-shelve
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] pickle problems

2012-08-11 Thread eryksun
On Sat, Aug 11, 2012 at 6:30 PM, Richard D. Moores  wrote:
>
> I wrote pickle_attempt.py as an exercise to try to learn to use the
> pickle module. See the version I edited for Tutor,
> pickle_attempt_for_web.py at
> .
> 
> But after closing the program and restarting it, those items have
> disappeared from D.

On line 68 you open the file in 'ab' mode. A pickle stream ends with
an ASCII period (\x2e). Anything appended after that is ignored. Use
'wb' mode. Also, Python 3.x defaults to protocol 3, which is binary,
so you might want a file extension other than .txt, such as .pkl,
.dat, .bin, etc.

If you're curious about the protocol, you can disassemble a pickle
using pickletools.dis(pickle), where pickle is bytes or a file-like
object.
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor