Thanks, that worked! Unfortunately, my script doesn't seem to gain anything from the second CPU. Python threads are real OS threads, but the interpreter's global lock (the GIL) lets only one of them run Python code at a time, which is presumably why. (I/O isn't much of a bottleneck, since the box has enough spare RAM to cache the entire filesystem.)
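One way around that limit is to split the work across processes instead of threads. The following is only a minimal sketch, assuming Python 3.7+ and the standard subprocess module rather than the popen2.Popen4 class used in the actual script. The function name index_in_processes, the slice names, and the inline child command are hypothetical placeholders for the real indexing job.

```python
import subprocess
import sys

# Hypothetical stand-in for the real per-slice indexing script: the child
# only reports which slice it was asked to handle.
CHILD_CODE = "import sys; print('indexed slice ' + sys.argv[1])"


def index_in_processes(slices):
    """Start one child interpreter per slice, then wait for all of them."""
    children = [
        subprocess.Popen(
            [sys.executable, "-c", CHILD_CODE, name],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout, as Popen4 did
            text=True,
        )
        for name in slices
    ]
    results = []
    for name, child in zip(slices, children):
        output, _ = child.communicate()  # blocks until this child exits
        results.append((name, child.returncode, output.strip()))
    return results


if __name__ == "__main__":
    for name, code, output in index_in_processes(["1", "2"]):
        print(name, code, output)
```

All children are started before any of them is waited on, so the slices run concurrently, and because each child is a separate interpreter the operating system is free to schedule them on different CPUs.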
I created another version using popen2's Popen4 class which worked (two processes indexed all my data in half the time that the previous single-process version needed), so I'll probably keep using that. (I tried to use the subprocess module, as per the recommendation in the Python Library Reference, but that module seems to be absent from my install, and my lame attempt at stealing a copy from a source package and dropping it in the library path only yielded syntax errors.)

-ofer

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of
> Norbert Wojtowicz
> Sent: Monday, March 12, 2007 8:31 PM
> To: [email protected]
> Subject: Re: [pylucene-dev] simple threading segfault demo
>
> Hello,
>
> I'm sure someone can give you more detailed advice, but the
> general rule with PyLucene and threading is you need to use
> PyLucene.PythonThread wherever you would normally use a
> python thread. It's a small wrapper for python's thread that
> fixes some issues with gcj and the garbage collector. I'm
> sure someone can explain that better, but I've learned this
> is the golden rule when working with PyLucene and threads.
>
> Cheers,
> Norbert
>
> On Mon, 2007-03-12 at 20:15 -0700, Ofer Nave wrote:
> > Hello.
> >
> > I wanted to try splitting my index up into two slices and indexing
> > each in separate threads to see if it would run faster on a dual-proc
> > box, but my script began segfaulting as soon as threading was added.
> > This is the first time I've ever used threads in Python, so I might be
> > doing something obviously stupid.
> >
> > Anyway, I pared down the script to a minimal test case that still
> > yields a segfault.
> > Here is the code:
> >
> > ---
> > #!/usr/bin/python
> > import os
> > import sys
> > import threading
> >
> > import PyLucene
> >
> > class Indexer(object):
> >     def __init__(self, index_dir):
> >         self.index_dir = index_dir
> >         if not os.path.exists(index_dir):
> >             os.mkdir(index_dir)
> >
> >     def run(self):
> >         worker1 = Worker(self.index_dir + '/1', 1)
> >         worker2 = Worker(self.index_dir + '/2', 2)
> >         worker1.start()
> >         worker2.start()
> >         while (worker1.isAlive() or worker2.isAlive()):
> >             pass
> >
> > class Worker(threading.Thread):
> >     def __init__(self, index_dir, worker_id):
> >         threading.Thread.__init__(self)
> >         self.index_dir = index_dir
> >         self.worker_id = worker_id
> >         if not os.path.exists(index_dir):
> >             os.mkdir(index_dir)
> >
> >     def run(self):
> >         print 'woo hoo: ' + self.index_dir
> >         self.store = PyLucene.FSDirectory.getDirectory(self.index_dir, True)
> >         self.store.close()
> >
> > if __name__ == '__main__':
> >     if len(sys.argv) < 2:
> >         print "Usage: python " + __file__ + " <index_dir>"
> >         sys.exit(1)
> >     print 'PyLucene', PyLucene.VERSION, 'Lucene', PyLucene.LUCENE_VERSION
> >     indexer = Indexer(sys.argv[1])
> >     indexer.run()
> > ---
> >
> > The output is as follows:
> >
> > [EMAIL PROTECTED] ~/bin]$ lucene_segfault_demo /tmp
> > PyLucene 2.1.0-1 Lucene 2.1.0-509013
> > woo hoo: /tmp/1
> > Segmentation fault
> >
> > Any ideas?
> >
> > -ofer
> >
> > _______________________________________________
> > pylucene-dev mailing list
> > [email protected]
> > http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
