More news from the guy who won't shut up.

I just finished reading the README file very carefully, and figured out that
I was unaware of a lot of crucial background.  I read the example
demonstrating a custom Analyzer class, and the section detailing what
specific methods I need to implement to subclass Similarity.

I now have a working example:

---
class SimilaritySansTF(object):
    def __init__(self):
        self.super = PyLucene.DefaultSimilarity()

    def coord(self, overlap, maxOverlap):
        return self.super.coord(overlap, maxOverlap)

    def idf(self, term, searcher):
        return self.super.idf(term, searcher)

    def idf(self, terms, searcher):
        return self.super.idf(terms, searcher)

    def idf(self, docFreq, numDocs):
        return self.super.idf(docFreq, numDocs)

    def lengthNorm(self, fieldName, numTokens):
        return self.super.lengthNorm(fieldName, numTokens)

    def queryNorm(self, sumOfSquaredWeights):
        return self.super.queryNorm(sumOfSquaredWeights)

    def sloppyFreq(self, distance):
        return self.super.sloppyFreq(distance)

    def tf(self, freq):
        return 1
---

The first three print lines (shown in earlier emails) now output this:

---
[EMAIL PROTECTED]
<__main__.SimilaritySansTF object at 0x2a956536d0>
[EMAIL PROTECTED]
--- 

So, that's definitely a step forward.  However, I'm not very pleased with my
implementation, and I also have a host of new questions:

1) Did I implement the subclass wisely?  I only wanted to override tf(), but
needed to implement many other methods in order to conform.  I also didn't
want to recode the existing logic of the superclass, and I didn't
know how to properly call the superclass methods, so instead I
instantiated the superclass in __init__ and delegated to it to calculate
the answers for me.  This works for stateless classes, but would be
problematic if I were subclassing a class with state, where my subclass would
have to share state with the superclass.

2) Python doesn't have declared/static types, so how does Java know which
Python idf() method to call?
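
To make the state concern in question 1 concrete, here is a minimal
plain-Python sketch.  Scaler, ByInheritance and ByDelegation are hypothetical
names made up for illustration, not PyLucene classes, and the sketch says
nothing about whether PyLucene's wrapped Java classes can be subclassed this
way (the earlier messages in this thread suggest direct subclassing did not
take effect for me).

```python
class Scaler(object):
    """Hypothetical stateful base class: scale() depends on self.factor."""
    def __init__(self, factor=2):
        self.factor = factor

    def scale(self, value):
        return value * self.factor

    def describe(self, value):
        # Calls self.scale(), so a subclass override is picked up here.
        return 'scaled: %d' % self.scale(value)


class ByInheritance(Scaler):
    def scale(self, value):
        # Reuse the superclass logic explicitly, then adjust the result.
        return super(ByInheritance, self).scale(value) + 1


class ByDelegation(object):
    def __init__(self, factor=2):
        # A separate object that carries its own state.
        self.inner = Scaler(factor)

    def scale(self, value):
        return self.inner.scale(value) + 1

    def describe(self, value):
        # The forwarded call runs entirely inside self.inner, which never
        # sees the scale() override defined on this wrapper.
        return self.inner.describe(value)
```

With inheritance, describe() picks up the overridden scale(); with
delegation, the inner object keeps using its own scale() and its own state,
which is the problem I was worried about.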

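On question 2, the plain-Python side can be sketched as follows.  Python
keeps only one attribute per name in a class, so defining idf() three times
leaves just the last definition; a single method can branch on the runtime
type of its arguments instead.  Overloaded and Dispatching are hypothetical
classes, the type checks are only an assumption about how the three cases
might be told apart, and how PyLucene itself routes the Java overloads is
not answered by this sketch.

```python
class Overloaded(object):
    def idf(self, term, searcher):
        return 'term'

    def idf(self, terms, searcher):
        return 'terms'

    def idf(self, docFreq, numDocs):
        # This definition replaces the two above in the class namespace.
        return 'counts'


class Dispatching(object):
    def idf(self, first, second):
        # One method, branching on the runtime type of the first argument.
        if isinstance(first, int):
            return 'counts'
        if isinstance(first, (list, tuple)):
            return 'terms'
        return 'term'
```

Calling Overloaded().idf('foo', None) runs the third definition regardless
of the argument types, while Dispatching picks a branch per call.
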
-ofer

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Ofer Nave
> Sent: Thursday, March 15, 2007 4:09 PM
> To: [email protected]
> Subject: RE: [pylucene-dev] why isn't my custom Similarity
> object changing the behavior?
> 
> Sorry for the rapid-fire messages...
> 
> Final sanity check script:
> 
> ---
> class Foo(object):
>     pass
> 
> class Bar(Foo):
>     pass
> 
> foo = Foo()
> print foo
> bar = Bar()
> print bar
> ---
> 
> Output:
> ---
> <__main__.Foo object at 0x2a955e3210>
> <__main__.Bar object at 0x2a955e31d0>
> ---
> 
> Clearly, inheritance is working correctly there.  Yet I'm 
> doing the same thing in my PyLucene script, and not getting 
> the same result.
> 
> What's interesting is that the format of the stringified
> object reference is different between my PyLucene script
> ('[EMAIL PROTECTED]') and 
> my sanity check script ('<__main__.Foo object at 
> 0x2a955e3210>').  Does this have something to do with the 
> Java -> gcj -> Python transformation?
> 
> -ofer
> 
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Ofer Nave
> > Sent: Thursday, March 15, 2007 4:04 PM
> > To: [email protected]
> > Subject: RE: [pylucene-dev] why isn't my custom Similarity object
> > changing the behavior?
> > 
> > I updated the first section in main() with this:
> > 
> >     store = build_index()
> >     searcher = PyLucene.IndexSearcher(store)
> >     foo = searcher.getSimilarity()
> >     print foo
> >     bar = SimilaritySansTF()
> >     print bar
> >     searcher.setSimilarity(bar)
> >     foo = searcher.getSimilarity()
> >     print foo
> >     parser = PyLucene.QueryParser('_all_',
> > PyLucene.StandardAnalyzer())
> > 
> > It's a sanity check to ensure that I'm creating an object of type 
> > SimilaritySansTF, and that the setSimilarity() call worked.
> > 
> > The first three lines of output are:
> > 
> > ---
> > [EMAIL PROTECTED]
> > [EMAIL PROTECTED]
> > [EMAIL PROTECTED]
> > ---
> > 
> > Now I'm thoroughly baffled.  The bar variable is being directly set 
> > with the object created by the constructor call SimilaritySansTF(), 
> > and yet when I print bar, it still identifies itself as type 
> > DefaultSimilarity.
> > 
> > I'm new to Python.  Am I not understanding how inheritance works here?
> > 
> > -ofer
> > 
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On Behalf Of Ofer Nave
> > > Sent: Thursday, March 15, 2007 3:46 PM
> > > To: list: pylucene-dev
> > > Subject: [pylucene-dev] why isn't my custom Similarity object changing
> > > the behavior?
> > > 
> > > I'm just now starting to play with the scoring algorithm.
> > > The first change I want to make is to have the score ignore term
> > > frequency.  I created this test script to validate my understanding of
> > > the API, but my custom Similarity class doesn't seem to affect the tf
> > > values in the output, and I can't figure out why.  I've looked at the
> > > docs, the scoring page on the Lucene site, and various archived posts,
> > > and I don't see anything I've done wrong.
> > > 
> > > The print statement in tf() was to test if the overridden method is
> > > even getting called.  It's not.
> > > 
> > > ---
> > > import PyLucene
> > > 
> > > def main():
> > >     store = build_index()
> > >     searcher = PyLucene.IndexSearcher(store)
> > >     searcher.setSimilarity(SimilaritySansTF())
> > >     parser = PyLucene.QueryParser('_all_',
> > > PyLucene.StandardAnalyzer())
> > > 
> > >     query = parser.parse('foo')
> > >     hits = searcher.search(query)
> > > 
> > >     for i, doc in hits:
> > >         print '[%02d] %s (%0.2f)' % (i, doc.get('_all_'),
> > > hits.score(i))
> > >         print '\t%s' % (searcher.explain(query, hits.id(i)))
> > > 
> > > def build_index():
> > >     store = PyLucene.RAMDirectory()
> > >     writer = PyLucene.IndexWriter(store,
> > >                                   PyLucene.StandardAnalyzer(), True)
> > > 
> > >     doc = PyLucene.Document()
> > >     doc.add(PyLucene.Field('_all_', 'foo bar bar', 
> > > PyLucene.Field.Store.YES,
> > > PyLucene.Field.Index.TOKENIZED))
> > >     writer.addDocument(doc)
> > > 
> > >     doc = PyLucene.Document()
> > >     doc.add(PyLucene.Field('_all_', 'foo foo bar', 
> > > PyLucene.Field.Store.YES,
> > > PyLucene.Field.Index.TOKENIZED))
> > >     writer.addDocument(doc)
> > > 
> > >     doc = PyLucene.Document()
> > >     doc.add(PyLucene.Field('_all_', 'foo bar', 
> > > PyLucene.Field.Store.YES,
> > > PyLucene.Field.Index.TOKENIZED))
> > >     writer.addDocument(doc)
> > > 
> > >     writer.optimize()
> > >     writer.close()
> > > 
> > >     return store
> > > 
> > > class SimilaritySansTF(PyLucene.DefaultSimilarity):
> > >     def tf(freq):
> > >         print 'freak out!'
> > >         return 1
> > > 
> > > main()
> > > ---
> > > 
> > > -ofer
> > > 
> > > _______________________________________________
> > > pylucene-dev mailing list
> > > [email protected]
> > > http://lists.osafoundation.org/mailman/listinfo/pylucene-dev