More news from the guy who won't shut up.
I just finished reading the README file very carefully, and realized that
I was missing a lot of crucial background. I read the example
demonstrating a custom Analyzer class, and the section detailing which
specific methods I need to implement to subclass Similarity.
I now have a working example:
---
class SimilaritySansTF(object):
    # Wraps a DefaultSimilarity and forwards everything except tf() to it.
    def __init__(self):
        self.super = PyLucene.DefaultSimilarity()
    def coord(self, overlap, maxOverlap):
        return self.super.coord(overlap, maxOverlap)
    # Java's Similarity declares three idf() overloads; each is mirrored here.
    def idf(self, term, searcher):
        return self.super.idf(term, searcher)
    def idf(self, terms, searcher):
        return self.super.idf(terms, searcher)
    def idf(self, docFreq, numDocs):
        return self.super.idf(docFreq, numDocs)
    def lengthNorm(self, fieldName, numTokens):
        return self.super.lengthNorm(fieldName, numTokens)
    def queryNorm(self, sumOfSquaredWeights):
        return self.super.queryNorm(sumOfSquaredWeights)
    def sloppyFreq(self, distance):
        return self.super.sloppyFreq(distance)
    def tf(self, freq):
        # Ignore term frequency: every matching term counts once.
        return 1
---
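As an aside, the pass-through methods above might not need to be typed out one by one. Here is a minimal sketch in plain Python, using a made-up stand-in class (`FakeDefaultSimilarity` is hypothetical, not a PyLucene class) and Python's `__getattr__` hook to forward anything not defined on the wrapper. Whether PyLucene accepts methods supplied this way is an assumption I have not tested.

```python
class FakeDefaultSimilarity(object):
    """Hypothetical stand-in for PyLucene.DefaultSimilarity (assumption)."""

    def coord(self, overlap, maxOverlap):
        return float(overlap) / maxOverlap

    def tf(self, freq):
        return freq ** 0.5


class SimilaritySansTF(object):
    """Overrides tf() and forwards every other attribute to a delegate."""

    def __init__(self, delegate):
        self._delegate = delegate

    def tf(self, freq):
        # Ignore term frequency entirely.
        return 1

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal lookup fails, so tf()
        # above is found first and everything else goes to the delegate.
        return getattr(self._delegate, name)


sim = SimilaritySansTF(FakeDefaultSimilarity())
print(sim.tf(9))        # handled by the wrapper
print(sim.coord(1, 2))  # forwarded to the delegate
```

The wrapper and the delegate are still two separate objects, so this only shortens the code; it does not solve the shared-state concern.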
The first three print lines (shown in earlier emails) now output this:
---
[EMAIL PROTECTED]
<__main__.SimilaritySansTF object at 0x2a956536d0>
[EMAIL PROTECTED]
---
So, that's definitely a step forward. However, I'm not very pleased with my
implementation, and I also have a host of new questions:
1) Did I implement the subclass wisely? I only wanted to override tf(), but
needed to implement many other methods in order to conform. I also didn't
want to have to recode the existing logic of the superclass, and I didn't
know how to properly call the superclass methods, so instead I
instantiated the superclass in __init__ and delegated to it to calculate
the answers for me. This works for stateless classes, but would be
problematic if I were subclassing a class with state, where my subclass would
have to share state with the superclass.
2) Python doesn't have declared/static types, so how does Java know which
Python idf() method to call?
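
Both points can at least be tried out in plain Python, with no PyLucene involved. This is a minimal sketch with made-up classes (`Base`, `Child`, `ThreeIdfs` are all hypothetical). It shows how an ordinary Python subclass calls its superclass, and what Python itself does with three methods of the same name. It says nothing about how PyLucene connects a Python object to the Java side, which is still the open question.

```python
class Base(object):
    """Made-up stand-in for an ordinary Python superclass."""

    def tf(self, freq):
        return freq ** 0.5

    def coord(self, overlap, maxOverlap):
        return float(overlap) / maxOverlap


class Child(Base):
    """Overrides only tf(); coord() is inherited untouched."""

    def tf(self, freq):
        # super(Child, self) reaches the Base implementation when needed.
        original = super(Child, self).tf(freq)
        return 1 if original > 0 else 0


class ThreeIdfs(object):
    """Three defs with one name: each def rebinds the name 'idf'."""

    def idf(self, term, searcher):
        return 'term'

    def idf(self, terms, searcher):
        return 'terms'

    def idf(self, docFreq, numDocs):
        return 'docFreq'


child = Child()
print(child.tf(4))                   # Child.tf, which consults Base.tf
print(child.coord(1, 2))             # inherited from Base
print(ThreeIdfs().idf('foo', None))  # only the last definition survives
```

If the same holds for my class, only the last idf() is reachable from Python's point of view, so a single idf() that inspects its arguments may be what is really needed.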
-ofer
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Ofer Nave
> Sent: Thursday, March 15, 2007 4:09 PM
> To: [email protected]
> Subject: RE: [pylucene-dev] why isn't my custom Similarity
> object changing the behavior?
>
> Sorry for the rapid-fire messages...
>
> Final sanity check script:
>
> ---
> class Foo(object):
>     pass
>
> class Bar(Foo):
>     pass
>
> foo = Foo()
> print foo
> bar = Bar()
> print bar
> ---
>
> Output:
> ---
> <__main__.Foo object at 0x2a955e3210>
> <__main__.Bar object at 0x2a955e31d0>
> ---
>
> Clearly, inheritance is working correctly there. Yet I'm
> doing the same thing in my PyLucene script, and not getting
> the same result.
>
> What's interesting is that the format of the stringified
> object reference is different between my PyLucene script
> ('[EMAIL PROTECTED]') and
> my sanity check script ('<__main__.Foo object at
> 0x2a955e3210>'). Does this have something to do with the
> Java -> gcj -> Python transformation?
>
> -ofer
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Ofer Nave
> > Sent: Thursday, March 15, 2007 4:04 PM
> > To: [email protected]
> > Subject: RE: [pylucene-dev] why isn't my custom Similarity object
> > changing the behavior?
> >
> > I updated the first section in main() with this:
> >
> > store = build_index()
> > searcher = PyLucene.IndexSearcher(store)
> > foo = searcher.getSimilarity()
> > print foo
> > bar = SimilaritySansTF()
> > print bar
> > searcher.setSimilarity(bar)
> > foo = searcher.getSimilarity()
> > print foo
> > parser = PyLucene.QueryParser('_all_',
> > PyLucene.StandardAnalyzer())
> >
> > It's a sanity check to ensure that I'm creating an object of type
> > SimilaritySansTF, and that the setSimilarity() call worked.
> >
> > The first three lines of output are:
> >
> > ---
> > [EMAIL PROTECTED]
> > [EMAIL PROTECTED]
> > [EMAIL PROTECTED]
> > ---
> >
> > Now I'm thoroughly baffled. The bar variable is being directly set
> > with the object created by the constructor call SimilaritySansTF(),
> > and yet when I print bar, it still identifies itself as type
> > DefaultSimilarity.
> >
> > I'm new to Python. Am I not understanding how inheritance works here?
> >
> > -ofer
> >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On Behalf Of Ofer Nave
> > > Sent: Thursday, March 15, 2007 3:46 PM
> > > To: list: pylucene-dev
> > > Subject: [pylucene-dev] why isn't my custom Similarity object changing
> > > the behavior?
> > >
> > > I'm just now starting to play with the scoring algorithm. The first
> > > change I want to make is to have the score ignore term frequency. I
> > > created this test script to validate my understanding of the API, but
> > > my custom Similarity class doesn't seem to affect the tf values in the
> > > output, and I can't figure out why. I've looked at the docs, the
> > > scoring page on the Lucene site, and various archived posts, and I
> > > don't see anything I've done wrong.
> > >
> > > The print statement in tf() was to test if the overridden method is
> > > even getting called. It's not.
> > >
> > > ---
> > > import PyLucene
> > >
> > > def main():
> > >     store = build_index()
> > >     searcher = PyLucene.IndexSearcher(store)
> > >     searcher.setSimilarity(SimilaritySansTF())
> > >     parser = PyLucene.QueryParser('_all_',
> > >                                   PyLucene.StandardAnalyzer())
> > >
> > >     query = parser.parse('foo')
> > >     hits = searcher.search(query)
> > >
> > >     for i, doc in hits:
> > >         print '[%02d] %s (%0.2f)' % (i, doc.get('_all_'),
> > >                                      hits.score(i))
> > >         print '\t%s' % (searcher.explain(query, hits.id(i)))
> > >
> > > def build_index():
> > >     store = PyLucene.RAMDirectory()
> > >     writer = PyLucene.IndexWriter(store,
> > >                                   PyLucene.StandardAnalyzer(),
> > >                                   True)
> > >
> > >     doc = PyLucene.Document()
> > >     doc.add(PyLucene.Field('_all_', 'foo bar bar',
> > >                            PyLucene.Field.Store.YES,
> > >                            PyLucene.Field.Index.TOKENIZED))
> > >     writer.addDocument(doc)
> > >
> > >     doc = PyLucene.Document()
> > >     doc.add(PyLucene.Field('_all_', 'foo foo bar',
> > >                            PyLucene.Field.Store.YES,
> > >                            PyLucene.Field.Index.TOKENIZED))
> > >     writer.addDocument(doc)
> > >
> > >     doc = PyLucene.Document()
> > >     doc.add(PyLucene.Field('_all_', 'foo bar',
> > >                            PyLucene.Field.Store.YES,
> > >                            PyLucene.Field.Index.TOKENIZED))
> > >     writer.addDocument(doc)
> > >
> > >     writer.optimize()
> > >     writer.close()
> > >
> > >     return store
> > >
> > > class SimilaritySansTF(PyLucene.DefaultSimilarity):
> > >     def tf(self, freq):
> > >         print 'freak out!'
> > >         return 1
> > >
> > > main()
> > > ---
> > >
> > > -ofer
> > >
> > > _______________________________________________
> > > pylucene-dev mailing list
> > > [email protected]
> > > http://lists.osafoundation.org/mailman/listinfo/pylucene-dev