On Tue, 8 Jan 2008, Brian Merrell wrote:

# java -version

java version "1.6.0_03"
Java(TM) SE Runtime Environment (build 1.6.0_03-b05)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_03-b05, mixed mode)

# which java
/usr/bin/java

It doesn't seem to crash when I remove the filter. However, this may be
misleading, as I don't have nearly as many tokens (particularly unique tokens)
without the filter. The problem may still exist, with the symptoms merely delayed.

This could indicate that there is indeed a leak in the code generated for the extension. I intend to take a closer look at what's being generated tomorrow or Thursday. This dictionary should not be growing unless your Python code keeps references to all these objects. Are the values in the returned dict (their refcounts) mostly the same? If so, what is that value?
In other words, what does myvm._dumpRefs().values() look like?
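The refcount question can be answered without reading thousands of entries by summarizing the dict as a histogram. This is a minimal sketch. It assumes, as described in this thread, that `_dumpRefs()` returns a dict whose values are integer refcounts. `refcount_histogram` and `summarize` are hypothetical helper names and are not part of JCC.

```python
from collections import Counter
from typing import Hashable, Mapping


def refcount_histogram(refs: Mapping[Hashable, int]) -> Counter:
    """Count how many tracked objects share each refcount value.

    Assumes `refs` is the dict returned by the VM's _dumpRefs(),
    with integer refcounts as its values (an assumption, per this thread).
    """
    return Counter(refs.values())


def summarize(refs: Mapping[Hashable, int], top: int = 5) -> str:
    """Return a short text report of the most common refcount values."""
    hist = refcount_histogram(refs)
    lines = [f"{len(refs)} tracked objects"]
    # most_common() lists refcount values ordered by how many objects have them
    for refcount, n in hist.most_common(top):
        lines.append(f"  refcount {refcount}: {n} objects")
    return "\n".join(lines)
```

With a running VM, this would be called as `print(summarize(myvm._dumpRefs()))`. The report shows how many tracked objects share each refcount, which is the distribution asked about here.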

After 3000 documents I get len(myvm._dumpRefs()) == 12270, and it
seems to be increasing by about 4000 for each 1000 documents.
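One way to track this more systematically is to sample `len(myvm._dumpRefs())` periodically during indexing and compute the growth rate between samples. This is a sketch. It assumes the indexing loop records `(documents_indexed, ref_count)` pairs itself, and `growth_per_1000_docs` is a hypothetical helper, not a PyLucene or JCC function.

```python
from typing import List, Sequence, Tuple


def growth_per_1000_docs(samples: Sequence[Tuple[int, int]]) -> List[float]:
    """Growth of the tracked-reference count between consecutive samples.

    Each sample is (documents_indexed, len(myvm._dumpRefs())), taken
    periodically while indexing. Returns one rate per consecutive pair,
    expressed as new references per 1000 documents.
    """
    rates = []
    for (d0, r0), (d1, r1) in zip(samples, samples[1:]):
        if d1 <= d0:
            raise ValueError("samples must be ordered by increasing document count")
        rates.append((r1 - r0) * 1000.0 / (d1 - d0))
    return rates
```

If the rate stays roughly constant as indexing proceeds, as with the roughly 4000 per 1000 documents reported above, references are accumulating rather than leveling off. A rate that falls toward zero would instead suggest the count is approaching a steady state.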

I didn't even realize C++ code was being generated. I doubt I can help
directly with this, but I would be happy to provide anything that would help
those more knowledgeable than I debug this.

JCC generates over 100,000 lines of C++ code to integrate Java Lucene and Python. I used to write this by hand, phew.

If you could send me one (or ten) document(s) and your indexing code (you already posted your filter code), that should help me reproduce this.

Andi..
_______________________________________________
pylucene-dev mailing list
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev