Thanks for your efforts; I will be waiting eagerly for the solution because I
don't want to code so many wrappers.
For the time being, this is what I am using to finalize tokenizers and keep a
single copy of the analyzer.
It keeps one tokenStreamWrapper per thread. It looks simple, but maybe you can
point out some obvious problem :)
Every custom analyzer must derive from CustomAnalyzer, and it returns a
StreamWrapper of type PythonTokenStream so that it can be passed around in
Java calls.
---------
from lucene import PythonAnalyzer, PythonTokenStream
import threading


class StreamWrapper(PythonTokenStream):
    """
    Wrapper over a stream which finalizes all custom streams once done.
    """
    def __init__(self, tokenStream, customFilters):
        super(StreamWrapper, self).__init__()
        self.tokenStream = tokenStream
        self.customFilters = customFilters

    def next(self):
        try:
            token = self.tokenStream.next()
        except StopIteration:
            # the stream is exhausted: finalize all custom filters so
            # their Java-side references can be released
            for customFilter in self.customFilters:
                customFilter.finalize()
            self.customFilters = []
            raise
        return token


class CustomAnalyzer(PythonAnalyzer):
    """
    All custom analyzers should derive from this to avoid memory leaks.
    It keeps track of custom tokenizers and removes them after the
    tokenStream is done.
    """
    def __init__(self):
        super(CustomAnalyzer, self).__init__()
        # one StreamWrapper per thread
        self.streamWrapperMap = {}

    def tokenStream(self, fieldName, reader):
        threadID = str(threading.currentThread())
        # finalize the wrapper left over from the previous call on
        # this thread, if any
        if threadID in self.streamWrapperMap:
            self.streamWrapperMap[threadID].finalize()
        customFilters = []
        # subclasses implement _tokenStream() and append every custom
        # filter they create to the customFilters list
        stream = self._tokenStream(customFilters, fieldName, reader)
        streamWrapper = StreamWrapper(stream, customFilters)
        self.streamWrapperMap[threadID] = streamWrapper
        return streamWrapper
---------
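For illustration, a concrete analyzer under this scheme might look like the
sketch below. LowerCaseTokenizer is a real Lucene class exposed by the lucene
module, but MyFilter and its pass-through behaviour are hypothetical; the
point is only that every custom filter gets appended to customFilters so the
StreamWrapper can finalize it:
---------
from lucene import PythonTokenStream, LowerCaseTokenizer

class MyFilter(PythonTokenStream):
    # hypothetical pass-through filter; a real one would transform tokens
    def __init__(self, stream):
        super(MyFilter, self).__init__()
        self.stream = stream

    def next(self):
        return self.stream.next()

class MyAnalyzer(CustomAnalyzer):
    def _tokenStream(self, customFilters, fieldName, reader):
        stream = LowerCaseTokenizer(reader)
        customFilter = MyFilter(stream)
        # register the custom filter so StreamWrapper.next() can
        # finalize it once the stream is exhausted
        customFilters.append(customFilter)
        return customFilter
---------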
----- Original Message ----
From: Andi Vajda <[EMAIL PROTECTED]>
To: [email protected]
Sent: Friday, 18 January, 2008 2:08:47 AM
Subject: [pylucene-dev] finalizing the deadly embrace
Thinking about this some more, I believe that Anurag's finalizer proxy idea
is on the right track. It provides the "trick" needed to break the deadly
embrace once the ref count of the Python object is down to 1, that is, once
the only remaining reference is the one from the Java parent wrapper.
When the finalizer proxy's refcount goes to zero, it is safe to assume that
only Java _may_ still be needing the object. This is enough, then, to replace
the strong global reference to the Java parent wrapper with a weak global
reference, thereby breaking the deadly embrace and letting Java garbage
collect it when its time has come. When that time comes, the finalize()
method on it is normally called by the Java garbage collector, the Python
ref count on the Python extension instance is brought to zero, and the object
is finally freed.
This assumes, of course, that when such an extension object is instantiated,
the finalizer proxy is actually returned.
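To make the mechanics concrete, here is a pure-Python sketch of the proxy
idea; the actual implementation would live in C/C++ at the JNI layer, and the
javaParent attribute and weakref used here merely stand in for the JNI strong
and weak global references:
---------
import weakref

class FinalizerProxy(object):
    """
    Illustrative stand-in for the C/C++ proxy described above. It
    delegates everything to the wrapped extension object; when the
    proxy itself is collected (refcount zero), it demotes the strong
    reference to the Java parent wrapper to a weak one, so Java may
    garbage collect the wrapper when its time has come.
    """
    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # forward attribute access to the wrapped extension object
        return getattr(self._target, name)

    def __del__(self):
        # stand-in for swapping a JNI strong global reference for a
        # weak global reference on the Java parent wrapper
        target = self._target
        if getattr(target, 'javaParent', None) is not None:
            target.javaParentRef = weakref.ref(target.javaParent)
            target.javaParent = None
---------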
I should be able to implement this in C/C++ so that the performance hit
is minimal and in a way that is transparent to PyLucene users.
More on this in the next few days....
Andi..
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev