I updated the first section in main() with this:

    store = build_index()
    searcher = PyLucene.IndexSearcher(store)
    foo = searcher.getSimilarity()
    print foo
    bar = SimilaritySansTF()
    print bar
    searcher.setSimilarity(bar)
    foo = searcher.getSimilarity()
    print foo
    parser = PyLucene.QueryParser('_all_', PyLucene.StandardAnalyzer()) 

It's a sanity check to ensure that I'm creating an object of type
SimilaritySansTF, and that the setSimilarity() call worked.
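A printed object only shows whatever its string conversion returns, so a more direct check is to ask Python for the object's class. The sketch below uses hypothetical stand-in classes (BaseSimilarity, CustomSimilarity, and the describe() helper are invented here and are not part of PyLucene). It assumes, without confirmation for PyLucene itself, that the wrapped class produces its printed form independently of the Python subclass.

```python
class BaseSimilarity(object):
    """Hypothetical stand-in for a wrapped library class (not PyLucene)."""

    def __repr__(self):
        # Hard-coded name, mimicking a wrapper whose string form comes from
        # the underlying library object rather than from the Python class.
        return '<BaseSimilarity>'


class CustomSimilarity(BaseSimilarity):
    """Subclass that inherits __repr__ unchanged."""


def describe(obj):
    """Return (printed form, actual class name) for obj."""
    return repr(obj), type(obj).__name__


printed, actual = describe(CustomSimilarity())
print(printed)  # string form inherited from the base class
print(actual)   # name of the class that was actually instantiated
```

If the two values disagree, the printed form is not reliable evidence of which class was instantiated; type(obj) or isinstance() is the safer check.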

The first three lines of output are:

---
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
---

Now I'm thoroughly baffled.  The bar variable is assigned directly from the
constructor call SimilaritySansTF(), and yet when I print bar, it still
identifies itself as type DefaultSimilarity.

I'm new to Python.  Am I not understanding how inheritance works here?
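For reference, here is a minimal sketch of how method overriding behaves in plain Python. The classes Base, SansTF, and BrokenSansTF are hypothetical and do not use PyLucene. The sketch only shows Python-side dispatch: an override is picked up when the caller is Python code going through self, and every instance method needs an explicit self parameter. Whether a wrapped library calls back into a Python override at all is a separate question that this sketch does not answer.

```python
class Base(object):
    """Hypothetical base class with a tf() hook used by score()."""

    def tf(self, freq):
        return float(freq) ** 0.5

    def score(self, freq):
        # Dispatches through self, so a subclass override of tf() is used.
        return self.tf(freq) * 2.0


class SansTF(Base):
    def tf(self, freq):  # 'self' must be the first parameter
        return 1.0


class BrokenSansTF(Base):
    def tf(freq):  # missing 'self': a call with one argument raises TypeError
        return 1.0


print(Base().score(4))    # uses Base.tf
print(SansTF().score(4))  # uses the override
try:
    BrokenSansTF().score(4)
except TypeError as exc:
    print('TypeError: %s' % exc)
```

In plain Python the override takes effect as soon as the base class calls self.tf(). If the calling code lives outside Python, the override is only seen when the wrapper is built to forward calls back into Python.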

-ofer

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Ofer Nave
> Sent: Thursday, March 15, 2007 3:46 PM
> To: list: pylucene-dev
> Subject: [pylucene-dev] why isn't my custom Similarity object 
> changing the behavior?
> 
> I'm just now starting to play with the scoring algorithm.  
> The first change I want to make is to have the score ignore 
> term frequency.  I created this test script to validate my 
> understanding of the API, but my custom Similarity class 
> doesn't seem to affect the tf values in the output, and I 
> can't figure out why.  I've looked at the docs, the scoring 
> page on the lucene site, and various archived posts, and I 
> don't see anything I've done wrong.
> 
> The print statement in tf() was to test if the overridden 
> method is even getting called.  It's not.
> 
> ---
> import PyLucene
> 
> def main():
>     store = build_index()
>     searcher = PyLucene.IndexSearcher(store)
>     searcher.setSimilarity(SimilaritySansTF())
>     parser = PyLucene.QueryParser('_all_', PyLucene.StandardAnalyzer())
> 
>     query = parser.parse('foo')
>     hits = searcher.search(query)
> 
>     for i, doc in hits:
>         print '[%02d] %s (%0.2f)' % (i, doc.get('_all_'), hits.score(i))
>         print '\t%s' % (searcher.explain(query, hits.id(i)))
> 
> def build_index():
>     store = PyLucene.RAMDirectory()
>     writer = PyLucene.IndexWriter(store, PyLucene.StandardAnalyzer(), True)
> 
>     doc = PyLucene.Document()
>     doc.add(PyLucene.Field('_all_', 'foo bar bar',
>                            PyLucene.Field.Store.YES,
>                            PyLucene.Field.Index.TOKENIZED))
>     writer.addDocument(doc)
> 
>     doc = PyLucene.Document()
>     doc.add(PyLucene.Field('_all_', 'foo foo bar',
>                            PyLucene.Field.Store.YES,
>                            PyLucene.Field.Index.TOKENIZED))
>     writer.addDocument(doc)
> 
>     doc = PyLucene.Document()
>     doc.add(PyLucene.Field('_all_', 'foo bar',
>                            PyLucene.Field.Store.YES,
>                            PyLucene.Field.Index.TOKENIZED))
>     writer.addDocument(doc)
> 
>     writer.optimize()
>     writer.close()
> 
>     return store
> 
> class SimilaritySansTF(PyLucene.DefaultSimilarity):
>     def tf(self, freq):
>         # Debug marker to show whether this override is ever called.
>         print 'freak out!'
>         return 1
> 
> main()
> ---
> 
> -ofer
> 
> _______________________________________________
> pylucene-dev mailing list
> [email protected]
> http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
