On Fri, Sep 2, 2011 at 5:37 AM, David Nemeskey <nemeskey.da...@sztaki.hu> wrote:
> Hi,
>
>> >   http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking seems
>> > interesting. what's the status of this branch? will it be included in
>> > lucene4 release?
>>
>> Hi, it's very close. There are some nocommits still in the branch right
>> now; once these are fixed we will look at merging to trunk.
> I've checked the nocommits in the similarities package, and it seems to me
> that there is only one that is really no-worky (the phrase df). The rest are
> about modifications to a few DFR models that are suboptimal, but they work
> nevertheless.
>

That's true, but they also cause other unexpected things when the
"bounds" are exceeded: e.g. boosting a document up might lower its
score, keeping stopwords in your index is a disaster, etc.

This is because stopwords violate the assumption that F << N (the
term's total frequency in the collection is much smaller than the
number of documents).
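
To make this concrete, here is a rough sketch, not the branch's code. It assumes the textbook Poisson approximation of the binomial from the DFR literature, with lambda = F / N, and a made-up function name and made-up collection sizes. For a rare term the information content grows with the normalized term frequency (tfn); for a stopword with F > N it shrinks as tfn grows, which is how raising tfn (e.g. via a boost) can lower the score.

```python
import math


def poisson_information(tfn: float, F: float, N: float) -> float:
    """Hypothetical helper: -log2 of the Poisson probability (Stirling form).

    tfn: normalized within-document term frequency (must be > 0)
    F:   total occurrences of the term in the collection
    N:   number of documents in the collection
    This follows the published DFR formula; it is not Lucene's implementation.
    """
    lam = F / N  # expected occurrences per document
    return (
        tfn * math.log2(tfn / lam)
        + (lam + 1.0 / (12.0 * tfn) - tfn) * math.log2(math.e)
        + 0.5 * math.log2(2.0 * math.pi * tfn)
    )


if __name__ == "__main__":
    N = 1_000_000  # assumed collection size, for illustration only
    for label, F in (("rare term (F << N)", 100), ("stopword (F > N)", 5_000_000)):
        values = [poisson_information(tfn, F, N) for tfn in (1.0, 2.0, 4.0)]
        print(label, ["%.3f" % v for v in values])
```

With these assumed numbers the rare term's values increase from tfn = 1 to tfn = 4, while the stopword's values decrease over the same range.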

This is pretty annoying for practical reasons!  It also means some
of lucene's tests will actually fail if this sim is used... sure, we
can disable that particular model from being used in all tests, but
that's not great. I like the idea of rotating all the similarities in
all of lucene's tests: swapping the sims into the tests this way has
found a lot of little issues so far!

> Robert: I figured I'd take a week out for a much needed rest (not), what about
> getting back on this on Monday?
>

Enjoy your rest... very well deserved! I'll keep testing and looking
for things and see if I can't find a better solution to the binomial
model (P/D); it's the only DFR one left with issues.

I might not be able to help you on Monday; it's a holiday here and I
will be returning from the river... not sure what time I will make it
back to a computer that day. But please don't let that stop you from
taking a crack at it!

-- 
lucidimagination.com
