Hi!
First off, thanks for PyLucene -- it totally rocks!
I've been working on a Python script that uses Lucene to look up text in an index -- it works great, but it ran my machine out of memory in an all-night test. :-( After a little digging around, I've come up with this little Python program that duplicates the memory leak:
#!/usr/bin/env python

import PyLucene
from stringreader import StringReader

analyzer = PyLucene.StopAnalyzer()

while True:
    query = u"any old text here will cause a leak"
    for token in query.split(u' '):
        stream = analyzer.tokenStream("", StringReader(token))
        while stream.next():
            pass

I know that my use of the analyzer is a bit strange, but I want to examine which words get tossed as stop words, and I need to correlate tokenized Lucene queries with non-tokenized query strings.
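For context, here's roughly what that examination looks like in my real script (a sketch, not the actual code -- "stopped" is just an illustrative name):

# Feed each word through the analyzer one at a time; a word that
# produces no tokens at all was discarded as a stop word.
stopped = []
for word in query.split(u' '):
    stream = analyzer.tokenStream("", StringReader(word))
    if not stream.next():
        stopped.append(word)
print "stopped words:", stopped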
Is there something I need to do that I am not doing?
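For instance, do I need to close each TokenStream explicitly after draining it? Something like this is what I mean (just a guess on my part -- TokenStream does have a close() method, but I don't know whether skipping it is the problem):

stream = analyzer.tokenStream("", StringReader(token))
while stream.next():
    pass
stream.close()    # explicitly close the stream instead of just dropping it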
This happens on Linux:
kernel:   2.4.24 (eeek -- it's time to upgrade!)
gcj:      gcj (GCC) 3.4.4 20041218 (prerelease) (Debian 3.4.3-6)
python:   Python 2.3.4 (#2, Jan 5 2005, 08:24:51)
PyLucene: 0.9.6
Any tips at all would be appreciated!
The StringReader class used in the example above is mostly ripped off from one of the PyLucene unit tests:
#!/usr/bin/env python

class StringReader(object):

    def __init__(self, text):
        self.text = unicode(text)

    def read(self, length = -1):
        text = self.text
        if text is None:
            return ''
        if length == -1 or length >= len(text):
            self.text = None
            return text
        text = text[0:length]
        self.text = self.text[length:]
        return text

    def close(self):
        pass
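In case the class itself is the suspect, here's a quick sanity check of its read() semantics (plain Python, no PyLucene involved):

r = StringReader(u"hello world")
print repr(r.read(5))    # u'hello'
print repr(r.read())     # u' world' -- the rest of the text
print repr(r.read())     # '' -- an exhausted reader returns the empty string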
--
--ruaok Somewhere in Texas a village is *still* missing its idiot.
Robert Kaye -- [EMAIL PROTECTED] -- http://mayhem-chaos.net
