Re: my computer is allergic to pickles
Miki Tebeka wrote:

> > From looking at the shelve info in the library reference, I get
> > the impression it's tricky to change the values in the dict for
> > existing keys and be sure they get changed on disk.
> You can use writeback=True or call sync at the right places.
>
> > How can you convert a tuple of strings to a string and back in a
> > reliable deterministic way? The original strings may have ' " ,
> > in them.
> You can use marshal, json or any other serializing library.

Thanks for the tips. I guess I'll use json if I ever need to be able
to read the file with something else besides python, but marshal
seems fine for now.

--
http://mail.python.org/mailman/listinfo/python-list
Re: my computer is allergic to pickles
Peter Otten <__pete...@web.de> wrote:

> Bob Fnord wrote:
> > I started by using cPickle to save the instance of the class that
> > contained this dict, but the pickling process started to write
> > the file but ate so much memory that my computer (4 GB RAM)
> > crashed so badly that I had to press the reset button. I've never
> > seen out-of-memory errors do this before. Is this normal?

[snipped myself]

> > Any comments, suggestions?
>
> Have you seen that one?
>
> http://mail.python.org/pipermail/python-list/2008-July/1139855.html

Not until now, but that's interesting. But I didn't even get a
backtrace, just a totally locked up computer!
Re: my computer is allergic to pickles
> From looking at the shelve info in the library reference, I get
> the impression it's tricky to change the values in the dict for
> existing keys and be sure they get changed on disk.
You can use writeback=True or call sync at the right places.

> How can you convert a tuple of strings to a string and back in a
> reliable deterministic way? The original strings may have ' " ,
> in them.
You can use marshal, json or any other serializing library.
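A minimal sketch of the two suggestions above, assuming Python 3 (where shelve.open works as a context manager). The helper names encode_key and decode_key, the temporary path, and the sample key are made up for illustration. json is used for the key because shelve wants string keys and json takes care of quoting ' " and , inside the strings.

```python
import json
import os
import shelve
import tempfile


def encode_key(parts):
    """Turn a tuple of strings into one string; json handles the quoting."""
    return json.dumps(list(parts))


def decode_key(text):
    """Inverse of encode_key: get the tuple of strings back."""
    return tuple(json.loads(text))


# Hypothetical location and data, only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "analysis_shelf")
key = ("it's", 'say "hi"', "a,b")

# writeback=True keeps accessed entries in a cache and writes them back
# on sync() or close(), so in-place changes to a value are not lost.
with shelve.open(path, writeback=True) as db:
    db[encode_key(key)] = ["first", 1]
    db[encode_key(key)].append("second")  # mutate an existing value
    db.sync()

# Reopen without writeback and rebuild an ordinary dict with tuple keys.
with shelve.open(path) as db:
    stored = {decode_key(k): v for k, v in db.items()}

print(stored)
```

Without writeback=True, the append would only change a temporary copy; the usual alternative is to read the value, change it, and assign it back to the same key. Note that writeback caches every entry you touch, so on a very large dict it can use a lot of memory unless sync is called regularly.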
Re: my computer is allergic to pickles
Bob Fnord wrote:

> I'm using python to do some log file analysis and I need to store
> on disk a very large dict with tuples of strings as keys and
> lists of strings and numbers as values.
>
> I started by using cPickle to save the instance of the class that
> contained this dict, but the pickling process started to write
> the file but ate so much memory that my computer (4 GB RAM)
> crashed so badly that I had to press the reset button. I've never
> seen out-of-memory errors do this before. Is this normal?
>
> (I know from the output that got written before the crash that my
> program had finished building the dict and started the
> pickle. When I tried running the other program that reads the
> pickle and analyzes the data in it, it gave an error because the
> file was incomplete. So I know where in my code the crash
> happened.)
>
> From searching the web, I get the impression that pickle uses a
> lot of memory because it checks for recursion and other things
> that could break other serialization methods. So I've switched to
> using marshal to save the dict itself (the only persistent thing
> in the class, which just has convenience methods for adding data
> to the dict and searching it for the second stage of analysis).
>
> I found some references to h5 tables for getting around the
> pickling memory problem, but I got the impression they only work
> with fixed columns, not a somewhat complex data structure like
> mine.
>
> Any comments, suggestions?

Have you seen that one?

http://mail.python.org/pipermail/python-list/2008-July/1139855.html
Re: my computer is allergic to pickles
Terry Reedy wrote:

> On 3/7/2011 4:50 AM, Bob Fnord wrote:
>
> > I want a portable data file (can be moved around the filesystem
> > or copied to another machine and used),
>
> Used only by Python or by other software?

just Python

> > Would a database in a file have any advantages over a file made
> > by marshal or shelve?
>
> If you have read the initial paragraphs of the marshal doc and your
> needs fit within its limitations, go ahead and use it. (Also note that
> Python could switch to a new version in the future.)

OK, I think marshal is just what I need.

> Keyed databases have the advantage that you can change the data file. If
> you do not need to do that (as opposed to read in, do whatever, and
> write out in entirety) then that is no advantage to you.

OK, thanks
Re: my computer is allergic to pickles
"Martin P. Hellwig" wrote: > On 05/03/2011 01:56, Bob Fnord wrote: > > > Any comments, suggestions? > > > No but I have a bunch of pseudo-questions :-) > > What version of python are you using? How about your OS and bitspace > (32/64)? Have you also tried using the non-c pickle module? If the data > is very simple in structure, perhaps serializing to CSV might be an option? python 2.6.6 ubuntu 64 bit The library ref says cPickle is "optimized" and "up to 1000 times faster than pickle" but of course doesn't mention memory. The data to save (and load in another python program) is a dict with keys = tuples of strings (including " ' , and other troublesome characters) and values = lists of strings and integers. As the 1st program runs, it adds new keys AND changes the contents of the value lists. (The 2nd program only reads the dict into memory, analyzes it, and prints to STDOUT.) -- http://mail.python.org/mailman/listinfo/python-list
Re: my computer is allergic to pickles
Miki Tebeka wrote:

> > Or, which situations does shelve suit better and which does
> > marshal suit better?
> shelve's ease of use and the fact it uses the disk to store objects make it a
> good choice if you have a lot of objects, each with a unique string key (and a
> tuple of strings can be converted to and from a string).
>
>     db = shelve.open("/tmp/foo.db")
>     db["key1"] = (1, 2, 3)
>     ...
>
> Marshal is faster and IIRC more geared toward network operations. But I
> haven't used it that much ...

From looking at the shelve info in the library reference, I get the
impression it's tricky to change the values in the dict for existing
keys and be sure they get changed on disk. My dict has lists of
strings and integers as values and the lists get changed as the
program analyzes the input files, then stored on disk in their final
form. I guess marshal is better for that.

How can you convert a tuple of strings to a string and back in a
reliable deterministic way? The original strings may have ' " , in
them.
Re: my computer is allergic to pickles
On 3/7/2011 4:50 AM, Bob Fnord wrote:

> I want a portable data file (can be moved around the filesystem
> or copied to another machine and used),

Used only by Python or by other software?

> Would a database in a file have any advantages over a file made
> by marshal or shelve?

If you have read the initial paragraphs of the marshal doc and your
needs fit within its limitations, go ahead and use it. (Also note that
Python could switch to a new version in the future.)

Keyed databases have the advantage that you can change the data file. If
you do not need to do that (as opposed to read in, do whatever, and
write out in entirety) then that is no advantage to you.

Similar to marshal is json, which is more limited but more portable,
because it is understood by other languages.

--
Terry Jan Reedy
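One concrete way json is "more limited" for the data described in this thread: json object keys cannot be tuples. The sketch below, assuming Python 3 and a made-up sample dict, shows the failure and one possible workaround (storing the dict as a list of key/value pairs). It is an illustration, not something proposed in the thread.

```python
import json

# Hypothetical sample in the shape described in the thread:
# tuple-of-strings keys, list-of-strings-and-ints values.
analysis = {("host a", 'GET "/x"'): ["ok", 3]}

try:
    json.dumps(analysis)
    tuple_keys_ok = True
except TypeError:
    tuple_keys_ok = False  # json refuses dict keys that are tuples

# Workaround: store the dict as a list of [key, value] pairs.
text = json.dumps([[list(k), v] for k, v in analysis.items()])

# json gives lists back, so the keys are turned into tuples again.
restored = {tuple(k): v for k, v in json.loads(text)}

print(tuple_keys_ok, restored == analysis)
```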
Re: my computer is allergic to pickles
On 05/03/2011 01:56, Bob Fnord wrote:

> Any comments, suggestions?

No but I have a bunch of pseudo-questions :-)

What version of python are you using? How about your OS and bitspace
(32/64)? Have you also tried using the non-c pickle module? If the data
is very simple in structure, perhaps serializing to CSV might be an option?

--
mph
Re: my computer is allergic to pickles
Bob Fnord wrote:

> I want a portable data file (can be moved around the filesystem
> or copied to another machine and used), so I don't want to use
> mysql or postgres. I guess the "sqlite" approach would work, but
> I think it would be difficult to turn the tuples of strings and
> lists of strings and numbers into database table lines.

This is as hairy as it's ever got for me (untested):

    def inserter(db, table_name, names, values):
        query = 'INSERT INTO %s (%s) VALUES (%s)' % (
            table_name, ','.join(names), ','.join(['?'] * len(names)))
        cur = db.cursor()
        cur.execute(query, values)
        cur.close()

    #...
    for v in all_value_triples:
        inserter(db, 'some_table', ['f1', 'f2', 'f3'], v)

(or even write a bulk_inserter that took all_value_triples as an
argument and moved the `for v in ...` inside the function.)

> Would a database in a file have any advantages over a file made
> by marshal or shelve?

Depends. An sqlite3 database file is usable by programs not written in
Python.

> I'm more worried about the fact that a python program in user
> space can bring down the computer!

Never been a problem in the programs I've written.

Mel.
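The bulk_inserter idea mentioned above could look roughly like this. It is a sketch under assumptions, not Mel's code: the table name, the column names k and v, the sample data, and the choice to store both the tuple key and the list value as JSON text are all made up here, as one possible answer to the "difficult to turn into table lines" worry. It uses an in-memory database; passing a file name to sqlite3.connect would give a single portable file.

```python
import json
import sqlite3


def bulk_inserter(db, table_name, names, rows):
    """Insert many rows with a single executemany call.

    table_name and names are pasted into the SQL text, so they must come
    from the program itself, never from untrusted input.
    """
    query = "INSERT INTO %s (%s) VALUES (%s)" % (
        table_name,
        ",".join(names),
        ",".join(["?"] * len(names)),
    )
    with db:  # commits on success, rolls back on an exception
        db.executemany(query, rows)


# Hypothetical data in the shape described in the thread.
analysis = {
    ("host a", 'GET "/x"'): ["ok", 3],
    ("host b", "it's, odd"): ["fail", 1, "retry"],
}

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE analysis (k TEXT PRIMARY KEY, v TEXT)")

# Each dict entry becomes one row: JSON-encoded key, JSON-encoded value.
bulk_inserter(
    db,
    "analysis",
    ["k", "v"],
    ((json.dumps(list(k)), json.dumps(v)) for k, v in analysis.items()),
)

# Reading it back: decode both columns and rebuild the dict.
restored = {
    tuple(json.loads(k)): json.loads(v)
    for k, v in db.execute("SELECT k, v FROM analysis")
}
print(restored == analysis)
```

Because the rows are fed from a generator, the whole dict never has to be converted in memory at once, and a single key can later be updated on disk without rewriting the file.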
Re: my computer is allergic to pickles
MRAB wrote:

> On 05/03/2011 01:56, Bob Fnord wrote:
> > I'm using python to do some log file analysis and I need to store
> > on disk a very large dict with tuples of strings as keys and
> > lists of strings and numbers as values.
> >
> > I started by using cPickle to save the instance of the class that
> > contained this dict, but the pickling process started to write
> > the file but ate so much memory that my computer (4 GB RAM)
> > crashed so badly that I had to press the reset button. I've never
> > seen out-of-memory errors do this before. Is this normal?
> >
> > (I know from the output that got written before the crash that my
> > program had finished building the dict and started the
> > pickle. When I tried running the other program that reads the
> > pickle and analyzes the data in it, it gave an error because the
> > file was incomplete. So I know where in my code the crash
> > happened.)
> >
> > From searching the web, I get the impression that pickle uses a
> > lot of memory because it checks for recursion and other things
> > that could break other serialization methods. So I've switched to
> > using marshal to save the dict itself (the only persistent thing
> > in the class, which just has convenience methods for adding data
> > to the dict and searching it for the second stage of analysis).
> >
> > I found some references to h5 tables for getting around the
> > pickling memory problem, but I got the impression they only work
> > with fixed columns, not a somewhat complex data structure like
> > mine.
> >
> > Any comments, suggestions?
> >
> Would a database work?

I want a portable data file (can be moved around the filesystem
or copied to another machine and used), so I don't want to use
mysql or postgres. I guess the "sqlite" approach would work, but
I think it would be difficult to turn the tuples of strings and
lists of strings and numbers into database table lines.

Would a database in a file have any advantages over a file made
by marshal or shelve?

I'm more worried about the fact that a python program in user
space can bring down the computer!
Re: my computer is allergic to pickles
GSO wrote:

> On 5 March 2011 02:14, MRAB wrote:
> ...
> >> Any comments, suggestions?
> >>
> > You obviously can't feed your computer pickles then.
>
> How about a tasty tidbit of XML? Served up in a main dish of DOM, or
> serially if preferred?

Well, right now it takes three lines to save the dict object:

    data_file = open(data_filename, 'wb')
    marshal.dump(analysis, data_file, 2)
    data_file.close()

and three to load it. I doubt I could do it that easily in XML _and_
the data file would be enormous. (XML always is, let's be honest. The
file doesn't need to be human readable or editable.)
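For reference, the save and load steps described above can be written as a round trip like this. It is a minimal sketch assuming Python 3; the sample dict and the temporary file location are made up, while data_filename, analysis and the marshal format version 2 follow the message. The `with` blocks close the file even if dump or load raises.

```python
import marshal
import os
import tempfile

# Hypothetical sample in the shape described in the thread.
analysis = {("host a", 'GET "/x"'): ["ok", 3]}
data_filename = os.path.join(tempfile.mkdtemp(), "analysis.marshal")

# Save: the file must be opened in binary mode.
with open(data_filename, "wb") as data_file:
    marshal.dump(analysis, data_file, 2)

# Load: tuple keys and list values come back as the same types.
with open(data_filename, "rb") as data_file:
    loaded = marshal.load(data_file)

print(loaded == analysis)
```

The marshal documentation warns that the format is not guaranteed to stay the same between Python versions, so this fits best when the same Python writes and reads the file.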
Re: my computer is allergic to pickles
> Or, which situations does shelve suit better and which does
> marshal suit better?
shelve's ease of use and the fact it uses the disk to store objects make it a
good choice if you have a lot of objects, each with a unique string key (and a
tuple of strings can be converted to and from a string).

    db = shelve.open("/tmp/foo.db")
    db["key1"] = (1, 2, 3)
    ...

Marshal is faster and IIRC more geared toward network operations. But I
haven't used it that much ...
Re: my computer is allergic to pickles
Miki Tebeka wrote:

> > I'm using python to do some log file analysis and I need to store
> > on disk a very large dict with tuples of strings as keys and
> > lists of strings and numbers as values.
> I recommend that you use the shelve module. It stores data on disk and is
> more memory efficient than in-memory pickle objects.

OK, I got this to work with marshal. What makes shelve better?

Or, which situations does shelve suit better and which does
marshal suit better?
Re: my computer is allergic to pickles
> I'm using python to do some log file analysis and I need to store
> on disk a very large dict with tuples of strings as keys and
> lists of strings and numbers as values.
I recommend that you use the shelve module. It stores data on disk and is
more memory efficient than in-memory pickle objects.

HTH,
--
Miki
Re: my computer is allergic to pickles
On 5 March 2011 02:14, MRAB wrote:
...
>> Any comments, suggestions?
>>
> You obviously can't feed your computer pickles then.

How about a tasty tidbit of XML? Served up in a main dish of DOM, or
serially if preferred?
Re: my computer is allergic to pickles
On 05/03/2011 01:56, Bob Fnord wrote:

> I'm using python to do some log file analysis and I need to store
> on disk a very large dict with tuples of strings as keys and
> lists of strings and numbers as values.
>
> I started by using cPickle to save the instance of the class that
> contained this dict, but the pickling process started to write
> the file but ate so much memory that my computer (4 GB RAM)
> crashed so badly that I had to press the reset button. I've never
> seen out-of-memory errors do this before. Is this normal?
>
> (I know from the output that got written before the crash that my
> program had finished building the dict and started the
> pickle. When I tried running the other program that reads the
> pickle and analyzes the data in it, it gave an error because the
> file was incomplete. So I know where in my code the crash
> happened.)
>
> From searching the web, I get the impression that pickle uses a
> lot of memory because it checks for recursion and other things
> that could break other serialization methods. So I've switched to
> using marshal to save the dict itself (the only persistent thing
> in the class, which just has convenience methods for adding data
> to the dict and searching it for the second stage of analysis).
>
> I found some references to h5 tables for getting around the
> pickling memory problem, but I got the impression they only work
> with fixed columns, not a somewhat complex data structure like
> mine.
>
> Any comments, suggestions?
>
Would a database work?
my computer is allergic to pickles
I'm using python to do some log file analysis and I need to store
on disk a very large dict with tuples of strings as keys and
lists of strings and numbers as values.

I started by using cPickle to save the instance of the class that
contained this dict, but the pickling process started to write
the file but ate so much memory that my computer (4 GB RAM)
crashed so badly that I had to press the reset button. I've never
seen out-of-memory errors do this before. Is this normal?

(I know from the output that got written before the crash that my
program had finished building the dict and started the
pickle. When I tried running the other program that reads the
pickle and analyzes the data in it, it gave an error because the
file was incomplete. So I know where in my code the crash
happened.)

From searching the web, I get the impression that pickle uses a
lot of memory because it checks for recursion and other things
that could break other serialization methods. So I've switched to
using marshal to save the dict itself (the only persistent thing
in the class, which just has convenience methods for adding data
to the dict and searching it for the second stage of analysis).

I found some references to h5 tables for getting around the
pickling memory problem, but I got the impression they only work
with fixed columns, not a somewhat complex data structure like
mine.

Any comments, suggestions?
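On the memory point: pickle keeps a memo of the objects it has already written so that shared and recursive references can be handled, and for one huge dict that bookkeeping grows with the data. Whether that alone explains the crash described above is not established here. One common way to keep the memo small is to pickle each entry separately, sketched below under assumptions: Python 3's pickle module rather than the cPickle of Python 2.6, with made-up helper names dump_items and load_items and made-up sample data.

```python
import os
import pickle
import tempfile


def dump_items(mapping, path):
    """Write each (key, value) pair as its own pickle.

    Every pickle.dump call starts with an empty memo, so the bookkeeping
    never grows beyond what one entry needs.
    """
    with open(path, "wb") as out:
        for item in mapping.items():
            pickle.dump(item, out, pickle.HIGHEST_PROTOCOL)


def load_items(path):
    """Rebuild the dict by reading pickles until the file runs out."""
    result = {}
    with open(path, "rb") as src:
        while True:
            try:
                key, value = pickle.load(src)
            except EOFError:  # raised when no more pickles are left
                break
            result[key] = value
    return result


# Hypothetical sample in the shape described in the message.
analysis = {
    ("host a", 'GET "/x"'): ["ok", 3],
    ("host b", "it's, odd"): ["fail", 1, "retry"],
}
path = os.path.join(tempfile.mkdtemp(), "analysis.pickles")

dump_items(analysis, path)
loaded = load_items(path)
print(loaded == analysis)
```

The trade-off is that objects shared between different entries are no longer stored once; each entry gets its own copy, which is harmless for plain strings and numbers like the ones described here.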