Re: my computer is allergic to pickles

2011-03-11 Thread Bob Fnord
Miki Tebeka wrote:

> > From looking at the shelve info in the library reference, I get
> > the impression it's tricky to change the values in the dict for
> > existing keys and be sure they get changed on disk.
> You can use writeback=True or call sync at the right places.
> 
> 
> > How can you convert a tuple of strings to a string and back in a
> > reliable deterministic way? The original strings may have ' " ,
> > in them.
> You can use marshal, json or any other serializing library.

Thanks for the tips. I guess I'll use json if I ever need to be
able to read the file with something else besides python, but
marshal seems fine for now.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: my computer is allergic to pickles

2011-03-11 Thread Bob Fnord
Peter Otten <__pete...@web.de> wrote:

> Bob Fnord wrote:

> > I started by using cPickle to save the instance of the class that
> > contained this dict, but the pickling process started to write
> > the file but ate so much memory that my computer (4 GB RAM)
> > crashed so badly that I had to press the reset button. I've never
> > seen out-of-memory errors do this before. Is this normal?
snipped myself
> > Any comments, suggestions?
> 
> Have you seen that one?
> 
> http://mail.python.org/pipermail/python-list/2008-July/1139855.html

Not until now, but that's interesting. But I didn't even get a
backtrace, just a totally locked up computer!



Re: my computer is allergic to pickles

2011-03-09 Thread Miki Tebeka
> From looking at the shelve info in the library reference, I get
> the impression it's tricky to change the values in the dict for
> existing keys and be sure they get changed on disk.
You can use writeback=True or call sync at the right places.
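
A minimal sketch of both options, assuming a throwaway shelf in a temporary directory (the path and the key name are made up for illustration). Without writeback, appending to a stored list only changes a temporary copy, so the value has to be reassigned; with writeback=True the shelf caches the objects it hands out and sync() writes them back.

```python
import os
import shelve
import tempfile

# Hypothetical location, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "analysis_shelf")

# Default mode: in-place changes to a stored value are not saved.
db = shelve.open(path)
db["key1"] = ["a", 1]
db["key1"].append(2)          # mutates a temporary copy only
plain_result = db["key1"]     # still ["a", 1]
value = db["key1"]            # fetch, change, then store again
value.append(2)
db["key1"] = value
db.close()

# writeback=True: accessed entries are cached and written back on sync/close.
db = shelve.open(path, writeback=True)
db["key1"].append(3)
db.sync()
db.close()

db = shelve.open(path)
final = db["key1"]
db.close()
```

The trade-off is that writeback=True keeps every accessed entry in memory until sync() or close(), so for a very large dict it may be worth calling sync() periodically.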


> How can you convert a tuple of strings to a string and back in a
> reliable deterministic way? The original strings may have ' " ,
> in them.
You can use marshal, json or any other serializing library.
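
As a rough illustration of the json route (the helper names key_to_str and str_to_key are invented for this sketch), the tuple is encoded as a JSON array, which takes care of escaping quotes and commas inside the strings. JSON has no tuple type, so the decoder has to convert the list back.

```python
import json

def key_to_str(key):
    # A JSON array of strings; quotes and commas inside items are escaped.
    return json.dumps(list(key))

def str_to_key(text):
    # json.loads returns a list, so turn it back into a tuple.
    return tuple(json.loads(text))

key = ("it's", 'say "hi"', "a,b")
encoded = key_to_str(key)
decoded = str_to_key(encoded)
```

On Python 2 the decoded items come back as unicode objects rather than str, which may or may not matter for the dict lookups.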


Re: my computer is allergic to pickles

2011-03-09 Thread Peter Otten
Bob Fnord wrote:

> I'm using python to do some log file analysis and I need to store
> on disk a very large dict with tuples of strings as keys and
> lists of strings and numbers as values.
> 
> I started by using cPickle to save the instance of the class that
> contained this dict, but the pickling process started to write
> the file but ate so much memory that my computer (4 GB RAM)
> crashed so badly that I had to press the reset button. I've never
> seen out-of-memory errors do this before. Is this normal?
> 
> (I know from the output that got written before the crash that my
> program had finished building the dict and started the
> pickle. When I tried running the other program that reads the
> pickle and analyzes the data in it, it gave an error because the
> file was incomplete. So I know where in my code the crash
> happened.)
> 
> From searching the web, I get the impression that pickle uses a
> lot of memory because it checks for recursion and other things
> that could break other serialization methods. So I've switched to
> using marshal to save the dict itself (the only persistent thing
> in the class, which just has convenience methods for adding data
> to the dict and searching it for the second stage of analysis).
> 
> I found some references to h5 tables for getting around the
> pickling memory problem, but I got the impression they only work
> with fixed columns, not a somewhat complex data structure like
> mine.
> 
> Any comments, suggestions?

Have you seen that one?

http://mail.python.org/pipermail/python-list/2008-July/1139855.html


Re: my computer is allergic to pickles

2011-03-09 Thread Bob Fnord
Terry Reedy wrote:

> On 3/7/2011 4:50 AM, Bob Fnord wrote:
> 
> > I want a portable data file (can be moved around the filesystem
> > or copied to another machine and used),
> 
> Used only by Python or by other software?

just Python

> > Would a database in a file have any advantages over a file made
> > by marshal or shelve?
> 
> If you have read the initial paragraphs of the marshal doc and your 
> needs fit within its limitations, go ahead and use it. (Also note that 
> Python could switch to a new version in the future.)

OK, I think marshal is just what I need.

> Keyed databases have the advantage that you can change the data file. If 
> you do not need to do that (as opposed to read in, do whatever, and 
> write out in entirety) then that is no advantage to you.

OK, thanks



Re: my computer is allergic to pickles

2011-03-09 Thread Bob Fnord
"Martin P. Hellwig" wrote:

> On 05/03/2011 01:56, Bob Fnord wrote:
> 
> > Any comments, suggestions?
> >
> No but I have a bunch of pseudo-questions :-)
> 
> What version of python are you using? How about your OS and bitspace 
> (32/64)? Have you also tried using the non-c pickle module?  If the data 
> is very simple in structure, perhaps serializing to CSV might be an option?

python 2.6.6

ubuntu 64 bit

The library ref says cPickle is "optimized" and "up to 1000 times
faster than pickle" but of course doesn't mention memory.

The data to save (and load in another python program) is a dict
with keys = tuples of strings (including " ' , and other
troublesome characters) and values = lists of strings and
integers. As the 1st program runs, it adds new keys AND changes
the contents of the value lists. (The 2nd program only reads the
dict into memory, analyzes it, and prints to STDOUT.)
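
One thing that could be tried with a dict of this shape is to pickle the entries one at a time instead of the whole object in a single call. pickle memoizes every object it has written so that it can handle shared and recursive references, and each separate dump call starts with an empty memo. This is only a sketch: the thread never establishes that the memo caused the crash, and the names dump_items and load_items are made up here.

```python
import os
import pickle
import tempfile

def dump_items(mapping, path):
    # One small pickle per (key, value) pair, written back to back.
    with open(path, "wb") as f:
        for item in mapping.items():
            pickle.dump(item, f, pickle.HIGHEST_PROTOCOL)

def load_items(path):
    # Read pickles until the file is exhausted and rebuild the dict.
    result = {}
    with open(path, "rb") as f:
        while True:
            try:
                key, value = pickle.load(f)
            except EOFError:
                break
            result[key] = value
    return result

# Small stand-in for the real data: tuple-of-strings keys, list values.
sample = {("host", 'GET "/"'): ["ok", 3], ("it's", "a,b"): ["x", 1, 2]}
sample_path = os.path.join(tempfile.mkdtemp(), "analysis.pkl")
dump_items(sample, sample_path)
restored = load_items(sample_path)
```

On Python 2, iterating with iteritems() instead of items() would avoid building a full list of pairs first.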



Re: my computer is allergic to pickles

2011-03-09 Thread Bob Fnord
Miki Tebeka wrote:

> > Or, which situations does shelve suit better and which does
> > marshal suit better?
> shelve's ease of use and the fact that it uses the disk to store objects make it a 
> good choice if you have a lot of objects, each with a unique string key (and a 
> tuple of strings can be converted to and from a string).
> 
> db = shelve.open("/tmp/foo.db")
> db["key1"] = (1, 2, 3)
> ...
> 
> Marshal is faster and IIRC more geared toward network operations. But I 
> haven't used it that much ...

From looking at the shelve info in the library reference, I get
the impression it's tricky to change the values in the dict for
existing keys and be sure they get changed on disk. My dict has
lists of strings and integers as values, and the lists get changed
as the program analyzes the input files, then stored on disk in
their final form. I guess marshal is better for that.

How can you convert a tuple of strings to a string and back in a
reliable deterministic way? The original strings may have ' " ,
in them.



Re: my computer is allergic to pickles

2011-03-07 Thread Terry Reedy

On 3/7/2011 4:50 AM, Bob Fnord wrote:


> I want a portable data file (can be moved around the filesystem
> or copied to another machine and used),


Used only by Python or by other software?


> Would a database in a file have any advantages over a file made
> by marshal or shelve?


If you have read the initial paragraphs of the marshal doc and your 
needs fit within its limitations, go ahead and use it. (Also note that 
Python could switch to a new version in the future.)


Keyed databases have the advantage that you can change the data file. If 
you do not need to do that (as opposed to read in, do whatever, and 
write out in entirety) then that is no advantage to you.


Similar to marshal is json, which is more limited but more portable, 
because understood by other languages.


--
Terry Jan Reedy



Re: my computer is allergic to pickles

2011-03-07 Thread Martin P. Hellwig

On 05/03/2011 01:56, Bob Fnord wrote:


> Any comments, suggestions?


No but I have a bunch of pseudo-questions :-)

What version of python are you using? How about your OS and bitspace 
(32/64)? Have you also tried using the non-c pickle module?  If the data 
is very simple in structure, perhaps serializing to CSV might be an option?

--
mph



Re: my computer is allergic to pickles

2011-03-07 Thread Mel
Bob Fnord wrote:

> I want a portable data file (can be moved around the filesystem
> or copied to another machine and used), so I don't want to use
> mysql or postgres. I guess the "sqlite" approach would work, but
> I think it would be difficult to turn the tuples of strings and
> lists of strings and numbers into database table lines.

This is as hairy as it's ever got for me (untested):

def inserter(db, table_name, names, values):
    query = 'INSERT INTO %s (%s) VALUES (%s)' % (
        table_name, ','.join(names), ','.join(['?'] * len(names)))
    cur = db.cursor()
    cur.execute(query, values)
    cur.close()

#...
for v in all_value_triples:
    inserter(db, 'some_table', ['f1', 'f2', 'f3'], v)

(or even write a bulk_inserter that took all_value_triples as an argument 
and moved the `for v in ...` inside the function.)
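
A possible shape for that bulk_inserter, sketched with the standard sqlite3 module and an in-memory database. The table layout and the sample rows are assumptions for illustration only. The values go through "?" placeholders, so quotes and commas in the strings need no escaping; the table and column names are interpolated into the SQL text and so must come from trusted code.

```python
import sqlite3

def bulk_inserter(db, table_name, names, rows):
    # Same query as inserter, executed once per row by executemany.
    query = 'INSERT INTO %s (%s) VALUES (%s)' % (
        table_name, ','.join(names), ','.join(['?'] * len(names)))
    cur = db.cursor()
    cur.executemany(query, rows)
    cur.close()
    db.commit()

# Hypothetical table and data, just to exercise the function.
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE some_table (f1 TEXT, f2 TEXT, f3 INTEGER)')
all_value_triples = [('a', "it's", 1), ('b', 'x,"y"', 2)]
bulk_inserter(db, 'some_table', ['f1', 'f2', 'f3'], all_value_triples)
stored = db.execute(
    'SELECT f1, f2, f3 FROM some_table ORDER BY f3').fetchall()
```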

> Would a database in a file have any advantages over a file made
> by marshal or shelve?

Depends.  An sqlite3 database file is usable by programs not written in 
Python.

> I'm more worried about the fact that a python program in user
> space can bring down the computer!

Never been a problem in the programs I've written.

Mel.



Re: my computer is allergic to pickles

2011-03-07 Thread Bob Fnord
MRAB wrote:

> On 05/03/2011 01:56, Bob Fnord wrote:
> > I'm using python to do some log file analysis and I need to store
> > on disk a very large dict with tuples of strings as keys and
> > lists of strings and numbers as values.
> >
> > I started by using cPickle to save the instance of the class that
> > contained this dict, but the pickling process started to write
> > the file but ate so much memory that my computer (4 GB RAM)
> > crashed so badly that I had to press the reset button. I've never
> > seen out-of-memory errors do this before. Is this normal?
> >
> > (I know from the output that got written before the crash that my
> > program had finished building the dict and started the
> > pickle. When I tried running the other program that reads the
> > pickle and analyzes the data in it, it gave an error because the
> > file was incomplete. So I know where in my code the crash
> > happened.)
> >
> > From searching the web, I get the impression that pickle uses a
> > lot of memory because it checks for recursion and other things
> > that could break other serialization methods. So I've switched to
> > using marshal to save the dict itself (the only persistent thing
> > in the class, which just has convenience methods for adding data
> > to the dict and searching it for the second stage of analysis).
> >
> > I found some references to h5 tables for getting around the
> > pickling memory problem, but I got the impression they only work
> > with fixed columns, not a somewhat complex data structure like
> > mine.
> >
> > Any comments, suggestions?
> >
> Would a database work?

I want a portable data file (can be moved around the filesystem
or copied to another machine and used), so I don't want to use
mysql or postgres. I guess the "sqlite" approach would work, but
I think it would be difficult to turn the tuples of strings and
lists of strings and numbers into database table lines. 

Would a database in a file have any advantages over a file made
by marshal or shelve?

I'm more worried about the fact that a python program in user
space can bring down the computer!



Re: my computer is allergic to pickles

2011-03-06 Thread Bob Fnord
GSO wrote:

> On 5 March 2011 02:14, MRAB wrote:
> ...
> >> Any comments, suggestions?
> >>
> 
> You obviously can't feed your computer pickles then.
> 
> How about a tasty tidbit of XML?  Served up in a main dish of DOM, or
> serially if preferred?

Well, right now it takes three lines to save the dict object:

data_file = open(data_filename, 'wb')
marshal.dump(analysis, data_file, 2)
data_file.close()

and three to load it.  I doubt I could do it that easily in XML
_and_ the data file would be enormous. (XML always is, let's be
honest. The file doesn't need to be human readable or editable.)
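
For reference, the matching save and load could look roughly like this. The file location and the sample dict are placeholders, not the real data. The with blocks close the file even if marshal raises, and the file has to be opened in binary mode on both sides.

```python
import marshal
import os
import tempfile

# Placeholder path and data for this sketch.
data_filename = os.path.join(tempfile.mkdtemp(), "analysis.dat")
analysis = {("host", 'GET "/"'): ["ok", 3]}

# Save: format version 2, as in the three lines above.
with open(data_filename, 'wb') as data_file:
    marshal.dump(analysis, data_file, 2)

# Load: read the dict back from the same file.
with open(data_filename, 'rb') as data_file:
    loaded = marshal.load(data_file)
```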



Re: my computer is allergic to pickles

2011-03-06 Thread Miki Tebeka
> Or, which situations does shelve suit better and which does
> marshal suit better?
shelve's ease of use and the fact that it uses the disk to store objects make it a 
good choice if you have a lot of objects, each with a unique string key (and a 
tuple of strings can be converted to and from a string).

db = shelve.open("/tmp/foo.db")
db["key1"] = (1, 2, 3)
...

Marshal is faster and IIRC more geared toward network operations. But I haven't 
used it that much ...


Re: my computer is allergic to pickles

2011-03-06 Thread Bob Fnord
Miki Tebeka wrote:

> > I'm using python to do some log file analysis and I need to store
> > on disk a very large dict with tuples of strings as keys and
> > lists of strings and numbers as values.
> I recommend that you use the shelve module. It stores data on disk and is 
> more memory efficient than in-memory pickle objects.

OK, I got this to work with marshal. What makes shelve better?

Or, which situations does shelve suit better and which does
marshal suit better?



Re: my computer is allergic to pickles

2011-03-04 Thread Miki Tebeka
> I'm using python to do some log file analysis and I need to store
> on disk a very large dict with tuples of strings as keys and
> lists of strings and numbers as values.
I recommend that you use the shelve module. It stores data on disk and is 
more memory efficient than in-memory pickle objects.

HTH,
--
Miki


Re: my computer is allergic to pickles

2011-03-04 Thread GSO
On 5 March 2011 02:14, MRAB wrote:
...
>> Any comments, suggestions?
>>

You obviously can't feed your computer pickles then.

How about a tasty tidbit of XML?  Served up in a main dish of DOM, or
serially if preferred?


Re: my computer is allergic to pickles

2011-03-04 Thread MRAB

On 05/03/2011 01:56, Bob Fnord wrote:

> I'm using python to do some log file analysis and I need to store
> on disk a very large dict with tuples of strings as keys and
> lists of strings and numbers as values.
>
> I started by using cPickle to save the instance of the class that
> contained this dict, but the pickling process started to write
> the file but ate so much memory that my computer (4 GB RAM)
> crashed so badly that I had to press the reset button. I've never
> seen out-of-memory errors do this before. Is this normal?
>
> (I know from the output that got written before the crash that my
> program had finished building the dict and started the
> pickle. When I tried running the other program that reads the
> pickle and analyzes the data in it, it gave an error because the
> file was incomplete. So I know where in my code the crash
> happened.)
>
> From searching the web, I get the impression that pickle uses a
> lot of memory because it checks for recursion and other things
> that could break other serialization methods. So I've switched to
> using marshal to save the dict itself (the only persistent thing
> in the class, which just has convenience methods for adding data
> to the dict and searching it for the second stage of analysis).
>
> I found some references to h5 tables for getting around the
> pickling memory problem, but I got the impression they only work
> with fixed columns, not a somewhat complex data structure like
> mine.
>
> Any comments, suggestions?


Would a database work?


my computer is allergic to pickles

2011-03-04 Thread Bob Fnord
I'm using python to do some log file analysis and I need to store
on disk a very large dict with tuples of strings as keys and
lists of strings and numbers as values.

I started by using cPickle to save the instance of the class that
contained this dict, but the pickling process started to write
the file but ate so much memory that my computer (4 GB RAM)
crashed so badly that I had to press the reset button. I've never
seen out-of-memory errors do this before. Is this normal?

(I know from the output that got written before the crash that my
program had finished building the dict and started the
pickle. When I tried running the other program that reads the
pickle and analyzes the data in it, it gave an error because the
file was incomplete. So I know where in my code the crash
happened.)

From searching the web, I get the impression that pickle uses a
lot of memory because it checks for recursion and other things
that could break other serialization methods. So I've switched to
using marshal to save the dict itself (the only persistent thing
in the class, which just has convenience methods for adding data
to the dict and searching it for the second stage of analysis).

I found some references to h5 tables for getting around the
pickling memory problem, but I got the impression they only work
with fixed columns, not a somewhat complex data structure like
mine.

Any comments, suggestions?
