New submission from Alex Roper <al...@ugcs.caltech.edu>:

Hi,

I wrote a simple script (attached) to do some preprocessing of MediaWiki XML dumps. When it has an 8 MB chunk ready to dump to disk, it forks; the child process writes the chunk out, (will) compress it, and then exits. The parent process continues as before. Note that the child never touches the shelve handle, nor executes any code that has it in scope.

The attached script, as written, works fine on dumps (I tested it on enwikisource-20081112-pages-articles.xml, available from http://download.wikimedia.org/enwikisource/20081112/). If you uncomment the fork on line 40 (and the exit() on line 46, of course) and run it, it dies after writing out about 450 megabytes, with the backtrace below. This appears to be deterministic: it failed at the same place 3 of the 3 times I ran it.

Apologies for the size and complexity of the test. I don't have time to reduce it further at the moment, and it looks like it may be fairly involved. I can try to work out a reduced case later and resubmit if no one wants to touch this as is ;)

# I ran the script with:
bzcat enwikisource-20081112-pages-articles.xml.bz2 | ./convert.py wikisource 8388608
# (after making a dir called wikisource)

Let me know if I can be of any assistance, and apologies if this is documented somewhere and I missed it.

Using Python 2.6.1 as released from python.org.

Alex

al...@autumn:~/projects/wikipedia$ cat enwikisource-20081112-pages-articles.xml | ./convert.py wikisource 8388608
Alexandria version 1, Copyright (C) 2008 Alex Roper
Alexandria comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to copy modify, and redistribute it under certain conditions; see the file COPYING for details.
..........................................................Traceback (most recent call last):
  File "./convert.py", line 100, in <module>
    sax.parse(sys.stdin, Parser(sys.argv[1], MIN_CHUNK_SIZE))
  File "/usr/lib/python2.6/xml/sax/__init__.py", line 33, in parse
    parser.parse(source)
  File "/usr/lib/python2.6/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.6/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.6/xml/sax/expatreader.py", line 207, in feed
    self._parser.Parse(data, isFinal)
  File "/usr/lib/python2.6/xml/sax/expatreader.py", line 304, in end_element
    self._cont_handler.endElement(name)
  File "./convert.py", line 61, in endElement
    s.pagehandler(s.title, s.text)
  File "./convert.py", line 68, in pagehandler
    s.index[title.encode("UTF8")] = (s.chunks, len(s.pages))
  File "/usr/lib/python2.6/shelve.py", line 133, in __setitem__
    self.dict[key] = f.getvalue()
  File "/usr/lib/python2.6/bsddb/__init__.py", line 276, in __setitem__
    _DeadlockWrap(wrapF)  # self.db[key] = value
  File "/usr/lib/python2.6/bsddb/dbutils.py", line 68, in DeadlockWrap
    return function(*_args, **_kwargs)
  File "/usr/lib/python2.6/bsddb/__init__.py", line 275, in wrapF
    self.db[key] = value
bsddb.db.DBRunRecoveryError: (-30975, 'DB_RUNRECOVERY: Fatal error, run database recovery -- PANIC: Invalid argument')
Exception bsddb.db.DBRunRecoveryError: DBRunRecoveryError(-30975, 'DB_RUNRECOVERY: Fatal error, run database recovery -- PANIC: fatal region error detected; run recovery') in <bound method Parser.__del__ of <__main__.Parser instance at 0x7f3492966d40>> ignored
Exception bsddb.db.DBRunRecoveryError: DBRunRecoveryError(-30975, 'DB_RUNRECOVERY: Fatal error, run database recovery -- PANIC: fatal region error detected; run recovery') in ignored
Exception bsddb.db.DBRunRecoveryError: DBRunRecoveryError(-30975, 'DB_RUNRECOVERY: Fatal error, run database recovery -- PANIC: fatal region error detected; run recovery') in ignored

----------
components: Extension Modules
files: convert.py
messages: 77942
nosy: calmofthestorm
severity: normal
status: open
title: Fork + shelve causes shelve corruption and backtrace
type: behavior
versions: Python 2.6

Added file: http://bugs.python.org/file12370/convert.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue4679>
_______________________________________
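
For anyone who wants to poke at this without the full dump, the fork-and-write pattern described in the report can be sketched roughly as below. This is not the attached convert.py: the names write_chunk_in_child and demo, the chunk file names, and the 1 KB chunk size are all made up for illustration. The child leaves through os._exit() rather than exit(), which skips interpreter cleanup in the child; that is a common precaution after fork and an assumption on my part, not a confirmed fix for this issue. Which dbm backend shelve picks depends on the installation, so this may not exercise bsddb at all.

```python
import os
import shelve
import shutil
import tempfile


def write_chunk_in_child(path, data):
    """Fork; the child writes `data` to `path` and exits without cleanup.

    Hypothetical helper, not taken from the attached script.
    """
    pid = os.fork()
    if pid == 0:
        # Child: do only the file write, then leave via os._exit() so that
        # no destructors or atexit handlers run in this process and the
        # database handle inherited from the parent is never finalized here.
        status = 1
        try:
            with open(path, "wb") as out:
                out.write(data)
            status = 0
        finally:
            os._exit(status)
    # Parent: return the child's pid so the caller can wait on it.
    return pid


def demo():
    workdir = tempfile.mkdtemp()
    try:
        index_path = os.path.join(workdir, "index")
        index = shelve.open(index_path)
        children = []
        chunk_paths = []
        for n in range(3):
            chunk_path = os.path.join(workdir, "chunk%d" % n)
            chunk_paths.append(chunk_path)
            children.append(write_chunk_in_child(chunk_path, b"x" * 1024))
            # Only the parent ever writes to the shelf.
            index["page%d" % n] = (n, 0)
        for pid in children:
            os.waitpid(pid, 0)
        index.close()

        # Reopen the shelf to check that it is still readable.
        index = shelve.open(index_path)
        keys = sorted(index.keys())
        index.close()
        sizes = [os.path.getsize(p) for p in chunk_paths]
        return sizes, keys
    finally:
        shutil.rmtree(workdir)


if __name__ == "__main__":
    print(demo())
```

Swapping os._exit(status) for a plain exit() in the child would be the closest analogue of what the report describes, since exit() lets the child run the same cleanup code as the parent.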