Thanks for your efforts. I will be waiting eagerly for the solution because I 
don't want to write so many wrappers.

For the time being, this is what I am using to finalize tokenizers while 
keeping a single copy of the analyzer.
It keeps one tokenStreamWrapper per thread. It looks simple, but maybe you can 
point out some obvious problem :)

Every custom analyzer must derive from CustomAnalyzer, and its tokenStream() 
returns a StreamWrapper (a PythonTokenStream subclass) so that it can be passed 
around in Java calls. A small example subclass follows the code below.

---------
from lucene import PythonAnalyzer, PythonTokenStream
import threading
class StreamWrapper(PythonTokenStream):
    """
    Wrapper over a token stream which finalizes all custom streams once
    the wrapped stream is exhausted.
    """
    def __init__(self, tokenStream, customFilters):
        super(StreamWrapper, self).__init__()
        self.tokenStream = tokenStream
        self.customFilters = customFilters

    def next(self):
        try:
            token = self.tokenStream.next()
        except StopIteration:
            # stream exhausted: finalize all custom filters so they stop
            # holding references, then re-raise for the caller
            for customFilter in self.customFilters:
                customFilter.finalize()
            self.customFilters = []
            raise
        return token
    
class CustomAnalyzer(PythonAnalyzer):
    """
    All custom analyzers should derive from this to avoid memory leaks.
    It keeps track of custom tokenizers/filters and finalizes them once
    their token stream is done.
    """
    def __init__(self):
        super(CustomAnalyzer, self).__init__()
        self.streamWrapperMap = {}

    def tokenStream(self, fieldName, reader):
        threadID = str(threading.currentThread())
        if threadID in self.streamWrapperMap:
            # finalize the previous wrapper created by this thread
            self.streamWrapperMap[threadID].finalize()
        customFilters = []
        # subclasses build the real tokenizer/filter chain in _tokenStream()
        # and append any filters that need finalizing to customFilters
        stream = self._tokenStream(customFilters, fieldName, reader)
        streamWrapper = StreamWrapper(stream, customFilters)
        self.streamWrapperMap[threadID] = streamWrapper
        return streamWrapper
---------
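
For illustration only, here is a rough sketch of what a concrete subclass 
might look like. Everything in it is an assumption on my part: MyCustomFilter 
is a hypothetical Python-side filter with a finalize() method, and the 
StandardTokenizer/LowerCaseFilter chain is just one possible chain exposed by 
PyLucene; substitute whatever your analyzer actually uses.

---------
from lucene import PythonTokenStream, StandardTokenizer, LowerCaseFilter

class MyCustomFilter(PythonTokenStream):
    """
    Hypothetical Python-side filter: passes tokens through unchanged but
    holds a reference to the wrapped stream until finalize() is called.
    """
    def __init__(self, input):
        super(MyCustomFilter, self).__init__()
        self.input = input

    def next(self):
        # delegate to the wrapped stream; propagates StopIteration
        return self.input.next()

    def finalize(self):
        # drop the reference so nothing on the Python side keeps the
        # Java token stream chain alive
        self.input = None

class MyAnalyzer(CustomAnalyzer):
    def _tokenStream(self, customFilters, fieldName, reader):
        stream = LowerCaseFilter(StandardTokenizer(reader))
        custom = MyCustomFilter(stream)
        # anything appended here gets finalize()d by StreamWrapper once
        # the stream raises StopIteration
        customFilters.append(custom)
        return custom
---------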


----- Original Message ----
From: Andi Vajda <[EMAIL PROTECTED]>
To: [email protected]
Sent: Friday, 18 January, 2008 2:08:47 AM
Subject: [pylucene-dev] finalizing the deadly embrace


Thinking about this some more, I believe that Anurag's finalizer proxy idea 
is on the right track. It provides the "trick" needed to break the deadly 
embrace when the ref count of the Python object is down to 1, that is, down 
to when the only reference is the one from the Java parent wrapper.

When the finalizer proxy's refcount goes to zero, it is safe to assume that 
only Java _may_ still be needing the object. This is enough then to replace 
the strong global reference to the Java parent wrapper with a weak global 
reference thereby breaking the deadly embrace and letting Java garbage 
collect it when its time has come. When that time comes, the finalize() 
method on it is normally called by the Java garbage collector, the Python 
ref count of the extension instance drops to zero, and the object is 
finally freed.

This assumes, of course, that when such an extension object is instantiated, 
the finalizer proxy is actually returned.
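
To illustrate the idea in Python terms only (the real mechanism would live in 
the C/C++ wrapper layer; makeStrongGlobalRef/makeWeakGlobalRef below are 
purely hypothetical stand-ins for the JNI global/weak-global reference calls):

---------
class ExtensionStub(object):
    """Hypothetical stand-in for a PythonAnalyzer/PythonTokenStream instance."""
    def makeStrongGlobalRef(self):
        print "Java wrapper pins the Python object (deadly embrace)"
    def makeWeakGlobalRef(self):
        print "strong global ref downgraded to weak; Java may now collect"

class FinalizerProxy(object):
    """
    Conceptual sketch of the proxy: it is what user code actually holds,
    and it delegates everything to the real extension object.
    """
    def __init__(self, target):
        self._target = target
        target.makeStrongGlobalRef()

    def __getattr__(self, name):
        # only called for attributes not found on the proxy itself
        return getattr(self._target, name)

    def __del__(self):
        # the last Python reference to the proxy is gone: only Java may
        # still need the object, so downgrade to a weak global ref and
        # let the Java collector call finalize() when its time comes
        self._target.makeWeakGlobalRef()
---------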

I should be able to implement this in C/C++ so that the performance hit 
is minimal and in a way that is transparent to PyLucene users.

More on this in the next few days....

Andi..


