[jira] Resolved: (LUCENE-2274) Catch exceptions in Threads created by JUnit tasks

2010-02-21 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-2274.
---

Resolution: Fixed

Committed Revision: 912376

 Catch exceptions in Threads created by JUnit tasks
 --

 Key: LUCENE-2274
 URL: https://issues.apache.org/jira/browse/LUCENE-2274
 Project: Lucene - Java
  Issue Type: Test
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-2274.patch, LUCENE-2274.patch


 On Hudson we had several assertion failures in TestRAMDirectory that were 
 never caught by the error reporter in JUnit (as the test itself did not 
 fail). This patch adds a handler for uncaught exceptions to 
 LuceneTestCase(J4) that lets the test fail in tearDown().
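For illustration, here is a minimal sketch of the mechanism described above; the class and field names are hypothetical, and this is not the actual LuceneTestCase code. A default uncaught-exception handler records throwables from threads started by the test, and tearDown() fails the test if any were recorded:

{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (JUnit 3 style, matching junit.framework as used above).
public abstract class ThreadCheckingTestCase extends junit.framework.TestCase {

  private final List<Throwable> uncaught = new ArrayList<Throwable>();
  private Thread.UncaughtExceptionHandler savedHandler;

  @Override
  protected void setUp() throws Exception {
    super.setUp();
    savedHandler = Thread.getDefaultUncaughtExceptionHandler();
    // collect exceptions from threads started by the test instead of
    // letting them die silently on stderr
    Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
      public void uncaughtException(Thread t, Throwable e) {
        synchronized (uncaught) {
          uncaught.add(e);
        }
      }
    });
  }

  @Override
  protected void tearDown() throws Exception {
    Thread.setDefaultUncaughtExceptionHandler(savedHandler);
    synchronized (uncaught) {
      if (!uncaught.isEmpty()) {
        fail("Some threads threw uncaught exceptions: " + uncaught);
      }
    }
    super.tearDown();
  }
}
{code}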

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2190) CustomScoreQuery (function query) is broken (due to per-segment searching)

2010-02-21 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2190:
--

Attachment: LUCENE-2190-2-branch30.patch
LUCENE-2190-2-trunk.patch

Here are the patches for trunk (without deprecations) and the 3.0 branch. 2.9 will 
be merged later. Merging from trunk -> 3.0 is not possible, as the TestCase was 
heavily rewritten.

 CustomScoreQuery (function query) is broken (due to per-segment searching)
 --

 Key: LUCENE-2190
 URL: https://issues.apache.org/jira/browse/LUCENE-2190
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.9, 2.9.1, 3.0, 3.0.1, 3.1
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2190-2-branch30.patch, LUCENE-2190-2-trunk.patch, 
 LUCENE-2190-2.patch, LUCENE-2190-2.patch, LUCENE-2190.patch


 Spinoff from here:
   http://lucene.markmail.org/message/psw2m3adzibaixbq
 With the cutover to per-segment searching, CustomScoreQuery is not really 
 usable anymore, because the per-doc custom scoring method (customScore) 
 receives a per-segment docID, yet there is no way to figure out which segment 
 you are currently searching.
 I think to fix this we must also notify the subclass whenever a new segment 
 is switched to.  I think if we copy Collector.setNextReader, that would be 
 sufficient.  It would by default do nothing in CustomScoreQuery, but a 
 subclass could override.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2190) CustomScoreQuery (function query) is broken (due to per-segment searching)

2010-02-21 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2190:
--

Attachment: LUCENE-2190-2-branch30.patch
LUCENE-2190-2-trunk.patch

Updated patches without javadoc warnings / with fixed javadocs. In trunk the 
backwards branch needs to be patched, too (merge from the 3.0 branch).

 CustomScoreQuery (function query) is broken (due to per-segment searching)
 --

 Key: LUCENE-2190
 URL: https://issues.apache.org/jira/browse/LUCENE-2190
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.9, 2.9.1, 3.0, 3.0.1, 3.1
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2190-2-branch30.patch, 
 LUCENE-2190-2-branch30.patch, LUCENE-2190-2-trunk.patch, 
 LUCENE-2190-2-trunk.patch, LUCENE-2190-2.patch, LUCENE-2190-2.patch, 
 LUCENE-2190.patch


 Spinoff from here:
   http://lucene.markmail.org/message/psw2m3adzibaixbq
 With the cutover to per-segment searching, CustomScoreQuery is not really 
 usable anymore, because the per-doc custom scoring method (customScore) 
 receives a per-segment docID, yet there is no way to figure out which segment 
 you are currently searching.
 I think to fix this we must also notify the subclass whenever a new segment 
 is switched to.  I think if we copy Collector.setNextReader, that would be 
 sufficient.  It would by default do nothing in CustomScoreQuery, but a 
 subclass could override.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2190) CustomScoreQuery (function query) is broken (due to per-segment searching)

2010-02-21 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2190:
--

Attachment: LUCENE-2190-2-branch29.patch

Here is the patch for 2.9.

 CustomScoreQuery (function query) is broken (due to per-segment searching)
 --

 Key: LUCENE-2190
 URL: https://issues.apache.org/jira/browse/LUCENE-2190
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.9, 2.9.1, 3.0, 3.0.1, 3.1
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2190-2-branch29.patch, 
 LUCENE-2190-2-branch30.patch, LUCENE-2190-2-branch30.patch, 
 LUCENE-2190-2-trunk.patch, LUCENE-2190-2-trunk.patch, LUCENE-2190-2.patch, 
 LUCENE-2190-2.patch, LUCENE-2190.patch


 Spinoff from here:
   http://lucene.markmail.org/message/psw2m3adzibaixbq
 With the cutover to per-segment searching, CustomScoreQuery is not really 
 usable anymore, because the per-doc custom scoring method (customScore) 
 receives a per-segment docID, yet there is no way to figure out which segment 
 you are currently searching.
 I think to fix this we must also notify the subclass whenever a new segment 
 is switched to.  I think if we copy Collector.setNextReader, that would be 
 sufficient.  It would by default do nothing in CustomScoreQuery, but a 
 subclass could override.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2190) CustomScoreQuery (function query) is broken (due to per-segment searching)

2010-02-21 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-2190.
---

   Resolution: Fixed
 Assignee: Uwe Schindler  (was: Michael McCandless)
Lucene Fields: [New, Patch Available]  (was: [New])

Committed 3.0 branch revision: 912383, 912389
Committed trunk revision: 912386
Committed 2.9 branch revision: 912390

Thanks Mike for the help!

 CustomScoreQuery (function query) is broken (due to per-segment searching)
 --

 Key: LUCENE-2190
 URL: https://issues.apache.org/jira/browse/LUCENE-2190
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.9, 2.9.1, 3.0, 3.0.1, 3.1
Reporter: Michael McCandless
Assignee: Uwe Schindler
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2190-2-branch29.patch, 
 LUCENE-2190-2-branch30.patch, LUCENE-2190-2-branch30.patch, 
 LUCENE-2190-2-trunk.patch, LUCENE-2190-2-trunk.patch, LUCENE-2190-2.patch, 
 LUCENE-2190-2.patch, LUCENE-2190.patch


 Spinoff from here:
   http://lucene.markmail.org/message/psw2m3adzibaixbq
 With the cutover to per-segment searching, CustomScoreQuery is not really 
 usable anymore, because the per-doc custom scoring method (customScore) 
 receives a per-segment docID, yet there is no way to figure out which segment 
 you are currently searching.
 I think to fix this we must also notify the subclass whenever a new segment 
 is switched to.  I think if we copy Collector.setNextReader, that would be 
 sufficient.  It would by default do nothing in CustomScoreQuery, but a 
 subclass could override.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



RE: (LUCENE-1844) Speed up junit tests

2010-02-21 Thread Uwe Schindler
Another test bug that now shows up as a real test failure (and not only in stderr 
as before, thanks to LUCENE-2274). It happens quite often; I will check the logs on 
Hudson to see how often.

The test failure on my Solaris box occurred in the backwards branch of trunk.

 

[junit] Testsuite: org.apache.lucene.store.TestRAMDirectory

[junit] Tests run: 4, Failures: 1, Errors: 0, Time elapsed: 0.259 sec

[junit] 

[junit] - Standard Error -

[junit] The following exceptions were thrown by threads:

[junit] *** Thread: Thread-16978 ***

[junit] junit.framework.AssertionFailedError: expected:<84992> but 
was:<86016>

[junit] at junit.framework.Assert.fail(Assert.java:47)

[junit] at junit.framework.Assert.failNotEquals(Assert.java:277)

[junit] at junit.framework.Assert.assertEquals(Assert.java:64)

[junit] at junit.framework.Assert.assertEquals(Assert.java:130)

[junit] at junit.framework.Assert.assertEquals(Assert.java:136)

[junit] at 
org.apache.lucene.store.TestRAMDirectory$1.run(TestRAMDirectory.java:129)

[junit] -  ---

[junit] Testcase: 
testRAMDirectorySize(org.apache.lucene.store.TestRAMDirectory):   FAILED

[junit] Some threads throwed uncaught exceptions!

[junit] junit.framework.AssertionFailedError: Some threads throwed uncaught 
exceptions!

[junit] at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:142)

[junit] at 
org.apache.lucene.store.TestRAMDirectory.tearDown(TestRAMDirectory.java:160)

[junit] at 
org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:250)

[junit] 

[junit] 

[junit] TEST org.apache.lucene.store.TestRAMDirectory FAILED

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de/

eMail: u...@thetaphi.de

 

From: Robert Muir [mailto:rcm...@gmail.com] 
Sent: Sunday, February 21, 2010 10:53 AM
To: java-dev@lucene.apache.org
Subject: Re: (LUCENE-1844) Speed up junit tests

 

Here is what I was worried about; if we cannot fix it, I can revert back to 
forking. This is not reproducible all the time:

[junit] Testcase: testParallelMultiSort(org.apache.lucene.search.TestSort): 
Caused an ERROR
[junit] java.util.ConcurrentModificationException
[junit] java.lang.RuntimeException: 
java.util.ConcurrentModificationException
[junit] at 
org.apache.lucene.search.ParallelMultiSearcher.foreach(ParallelMultiSearcher.java:216)
[junit] at 
org.apache.lucene.search.ParallelMultiSearcher.search(ParallelMultiSearcher.java:121)
[junit] at org.apache.lucene.search.Searcher.search(Searcher.java:49)
[junit] at 
org.apache.lucene.search.TestSort.assertMatches(TestSort.java:965)
[junit] at 
org.apache.lucene.search.TestSort.runMultiSorts(TestSort.java:891)
[junit] at 
org.apache.lucene.search.TestSort.testParallelMultiSort(TestSort.java:629)
[junit] at 
org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:208)
[junit] Caused by: java.util.ConcurrentModificationException
[junit] at 
java.util.WeakHashMap$HashIterator.nextEntry(WeakHashMap.java:762)
[junit] at java.util.WeakHashMap$KeyIterator.next(WeakHashMap.java:795)
[junit] at 
org.apache.lucene.search.FieldCacheImpl.getCacheEntries(FieldCacheImpl.java:75)
[junit] at 
org.apache.lucene.util.FieldCacheSanityChecker.checkSanity(FieldCacheSanityChecker.java:72)
[junit] at 
org.apache.lucene.search.FieldCacheImpl$Cache.printNewInsanity(FieldCacheImpl.java:205)
[junit] at 
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:194)
[junit] at 
org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:357)
[junit] at 
org.apache.lucene.search.FieldCacheImpl$IntCache.createValue(FieldCacheImpl.java:373)
[junit] at 
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:183)
[junit] at 
org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:357)
[junit] at 
org.apache.lucene.search.FieldComparator$IntComparator.setNextReader(FieldComparator.java:438)
[junit] at 
org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:95)
[junit] at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:207)
[junit] at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:197)
[junit] at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:175)
[junit] at 
org.apache.lucene.search.MultiSearcher$MultiSearcherCallableWithSort.call(MultiSearcher.java:420)
[junit] at 
org.apache.lucene.search.MultiSearcher$MultiSearcherCallableWithSort.call(MultiSearcher.java:394)
[junit] at 
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303

RE: (LUCENE-1844) Speed up junit tests

2010-02-21 Thread Uwe Schindler
I fixed the backwards test and removed the assertion there; it had been forgotten 
when merging back. The reason why this test fails:

TestRAMDirectory creates 10 threads that add files to a RAMDirectory and append 
content to these files. Each RAMFile updates its own size and also updates the 
size of the enclosing RAMDirectory (using an AtomicLong). The problem is that all 
threads do this in parallel: another thread may have added content to its file and 
already updated the parent's AtomicLong, but not yet its own size. Because updating 
the local size and the RAMDirectory size is no longer atomic as seen from another 
thread, the assertion fails. In previous versions of RAMDirectory both updates were 
one atomic operation, synchronized on the directory. For speed reasons this was 
removed, so writing to RAMFiles no longer locks the parent directory.
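As an illustration of this race (names simplified and hypothetical, not the actual RAMFile/RAMDirectory code): the two size updates are separate steps, so a concurrent reader can observe the directory total already increased while the file's own size still holds the old value:

{code}
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the non-atomic two-step size update described above.
class DirSketch {
  final AtomicLong sizeInBytes = new AtomicLong();
}

class FileSketch {
  private final DirSketch dir;
  private long length;               // the file's own size; plain field

  FileSketch(DirSketch dir) { this.dir = dir; }

  void addChunk(int bytes) {
    dir.sizeInBytes.addAndGet(bytes); // step 1: parent total updated first
    // a thread summing file lengths here sees a smaller sum than
    // dir.sizeInBytes.get() -- exactly the failed assertion in the test
    length += bytes;                  // step 2: own size updated afterwards
  }

  long length() { return length; }
}
{code}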

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de/

eMail: u...@thetaphi.de

 

From: Robert Muir [mailto:rcm...@gmail.com] 
Sent: Sunday, February 21, 2010 8:44 PM
To: java-dev@lucene.apache.org
Subject: Re: (LUCENE-1844) Speed up junit tests

 

Mike removed this assertion in LUCENE-2095, so this only happens in the 
backwards tests.

On Sun, Feb 21, 2010 at 2:26 PM, Uwe Schindler u...@thetaphi.de wrote:

Another test bug that now shows up as a real test failure (and not only in stderr 
as before, thanks to LUCENE-2274). It happens quite often; I will check the logs on 
Hudson to see how often.

The test failure on my Solaris box occurred in the backwards branch of trunk.

 

[junit] Testsuite: org.apache.lucene.store.TestRAMDirectory

[junit] Tests run: 4, Failures: 1, Errors: 0, Time elapsed: 0.259 sec

[junit] 

[junit] - Standard Error -

[junit] The following exceptions were thrown by threads:

[junit] *** Thread: Thread-16978 ***

[junit] junit.framework.AssertionFailedError: expected:<84992> but 
was:<86016>

[junit] at junit.framework.Assert.fail(Assert.java:47)

[junit] at junit.framework.Assert.failNotEquals(Assert.java:277)

[junit] at junit.framework.Assert.assertEquals(Assert.java:64)

[junit] at junit.framework.Assert.assertEquals(Assert.java:130)

[junit] at junit.framework.Assert.assertEquals(Assert.java:136)

[junit] at 
org.apache.lucene.store.TestRAMDirectory$1.run(TestRAMDirectory.java:129)

[junit] -  ---

[junit] Testcase: 
testRAMDirectorySize(org.apache.lucene.store.TestRAMDirectory):   FAILED

[junit] Some threads throwed uncaught exceptions!

[junit] junit.framework.AssertionFailedError: Some threads throwed uncaught 
exceptions!

[junit] at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:142)

[junit] at 
org.apache.lucene.store.TestRAMDirectory.tearDown(TestRAMDirectory.java:160)

[junit] at 
org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:250)

[junit] 

[junit] 

[junit] TEST org.apache.lucene.store.TestRAMDirectory FAILED

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de/

eMail: u...@thetaphi.de

 

From: Robert Muir [mailto:rcm...@gmail.com] 
Sent: Sunday, February 21, 2010 10:53 AM


To: java-dev@lucene.apache.org
Subject: Re: (LUCENE-1844) Speed up junit tests

 

Here is what I was worried about; if we cannot fix it, I can revert back to 
forking. This is not reproducible all the time:

[junit] Testcase: testParallelMultiSort(org.apache.lucene.search.TestSort): 
Caused an ERROR
[junit] java.util.ConcurrentModificationException
[junit] java.lang.RuntimeException: 
java.util.ConcurrentModificationException
[junit] at 
org.apache.lucene.search.ParallelMultiSearcher.foreach(ParallelMultiSearcher.java:216)
[junit] at 
org.apache.lucene.search.ParallelMultiSearcher.search(ParallelMultiSearcher.java:121)
[junit] at org.apache.lucene.search.Searcher.search(Searcher.java:49)
[junit] at 
org.apache.lucene.search.TestSort.assertMatches(TestSort.java:965)
[junit] at 
org.apache.lucene.search.TestSort.runMultiSorts(TestSort.java:891)
[junit] at 
org.apache.lucene.search.TestSort.testParallelMultiSort(TestSort.java:629)
[junit] at 
org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:208)
[junit] Caused by: java.util.ConcurrentModificationException
[junit] at 
java.util.WeakHashMap$HashIterator.nextEntry(WeakHashMap.java:762)
[junit] at java.util.WeakHashMap$KeyIterator.next(WeakHashMap.java:795)
[junit] at 
org.apache.lucene.search.FieldCacheImpl.getCacheEntries(FieldCacheImpl.java:75)
[junit] at 
org.apache.lucene.util.FieldCacheSanityChecker.checkSanity(FieldCacheSanityChecker.java:72)
[junit] at 
org.apache.lucene.search.FieldCacheImpl$Cache.printNewInsanity(FieldCacheImpl.java:205)
[junit

[jira] Updated: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-21 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2271:
--

Attachment: LUCENE-2271-maybe-as-separate-collector.patch

After applying Mike's patch (with modified asserts to correctly detect NaN), I 
updated my patch for the delegating, -inf/NaN-aware TopScoreDocCollector.

Maybe we should add it as a separate collector for function queries in 3.1, 
perhaps with correct NaN ordering?
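As a rough illustration of what such a delegating collector could look like (a hypothetical sketch, not the attached patch; mapping invalid scores to -Float.MAX_VALUE is an arbitrary choice for the example):

{code}
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// Delegating collector that hides NaN/-Inf scores from the wrapped collector.
public class ScoreSanitizingCollector extends Collector {
  private final Collector delegate;

  public ScoreSanitizingCollector(Collector delegate) {
    this.delegate = delegate;
  }

  @Override
  public void setScorer(final Scorer scorer) throws IOException {
    delegate.setScorer(new Scorer(scorer.getSimilarity()) {
      @Override
      public float score() throws IOException {
        float s = scorer.score();
        // map invalid scores to a defined, very low value (sketch only)
        return (Float.isNaN(s) || s == Float.NEGATIVE_INFINITY) ? -Float.MAX_VALUE : s;
      }
      @Override public int docID() { return scorer.docID(); }
      @Override public int nextDoc() throws IOException { return scorer.nextDoc(); }
      @Override public int advance(int target) throws IOException { return scorer.advance(target); }
    });
  }

  @Override
  public void collect(int doc) throws IOException { delegate.collect(doc); }

  @Override
  public void setNextReader(IndexReader reader, int docBase) throws IOException {
    delegate.setNextReader(reader, docBase);
  }

  @Override
  public boolean acceptsDocsOutOfOrder() { return delegate.acceptsDocsOutOfOrder(); }
}
{code}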

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-2271-bench.patch, 
 LUCENE-2271-maybe-as-separate-collector.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, 
 TSDC.patch


 This is a followup to LUCENE-2270, where a part of this problem was fixed 
 (boost = 0 leading to NaN scores, which is also unintuitive), but in 
 general, function queries in Solr can create these invalid scores easily. In 
 previous versions of Lucene these scores ordered correctly (except NaN, which 
 mixes up results), but invalid document ids (like Integer.MAX_VALUE) were 
 never returned.
 The problem is: TopScoreDocCollector pre-fills the HitQueue with sentinel 
 ScoreDocs with a score of -inf and a doc id of Integer.MAX_VALUE. For the HQ 
 to work, this sentinel must be smaller than all possible values, which is not 
 the case:
 - -inf is equal, so the document is not inserted into the HQ as it is not 
 competitive; but the HQ is not yet full, so the sentinel values stay in the 
 HQ and the result contains the Integer.MAX_VALUE docs. This problem is 
 solvable (and only affects the Ordered collector) by changing the exit 
 condition to:
 {code}
 if (score <= pqTop.score && pqTop.doc != Integer.MAX_VALUE) {
   // Since docs are returned in-order (i.e., increasing doc Id), a document
   // with equal score to pqTop.score cannot compete since HitQueue favors
   // documents with lower doc Ids. Therefore reject those docs too.
   return;
 }
 {code}
 - The NaN case can be fixed in the same way, but then has another problem: 
 all comparisons with NaN result in false (none of these is true): x < NaN, 
 x > NaN, NaN == NaN. This leads to the fact that HQ's lessThan always returns 
 false, leading to unexpected ordering in the PQ, and sometimes the sentinel 
 values do not stay at the top of the queue. A later hit then overrides the 
 top of the queue but leaves the incorrect sentinels unchanged -> invalid 
 results. This can be fixed in two ways in HQ:
 Force all sentinels to the top:
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
   if (hitA.doc == Integer.MAX_VALUE)
     return true;
   if (hitB.doc == Integer.MAX_VALUE)
     return false;
   if (hitA.score == hitB.score)
     return hitA.doc > hitB.doc;
   else
     return hitA.score < hitB.score;
 }
 {code}
 or alternatively have a defined order for NaN (Float.compare sorts them after 
 +inf):
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
   if (hitA.score == hitB.score)
     return hitA.doc > hitB.doc;
   else
     return Float.compare(hitA.score, hitB.score) < 0;
 }
 {code}
 The problem with both solutions is that we now have more comparisons per hit, 
 and the use of sentinels is questionable. I would like to remove the 
 sentinels and use the old pre-2.9 code for comparing, using PQ.add() when 
 a competitive hit arrives. The order of NaN would be unspecified.
 To fix the order of NaN, it would be better to replace all score comparisons 
 with Float.compare() [also in FieldComparator].
 I would like to delay 2.9.2 and 3.0.1 until this problem is discussed and 
 solved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-21 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2271:
--

Fix Version/s: (was: 3.0.1)
   (was: 2.9.2)

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-2271-bench.patch, 
 LUCENE-2271-maybe-as-separate-collector.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, 
 TSDC.patch


 This is a followup to LUCENE-2270, where a part of this problem was fixed 
 (boost = 0 leading to NaN scores, which is also unintuitive), but in 
 general, function queries in Solr can create these invalid scores easily. In 
 previous versions of Lucene these scores ordered correctly (except NaN, which 
 mixes up results), but invalid document ids (like Integer.MAX_VALUE) were 
 never returned.
 The problem is: TopScoreDocCollector pre-fills the HitQueue with sentinel 
 ScoreDocs with a score of -inf and a doc id of Integer.MAX_VALUE. For the HQ 
 to work, this sentinel must be smaller than all possible values, which is not 
 the case:
 - -inf is equal, so the document is not inserted into the HQ as it is not 
 competitive; but the HQ is not yet full, so the sentinel values stay in the 
 HQ and the result contains the Integer.MAX_VALUE docs. This problem is 
 solvable (and only affects the Ordered collector) by changing the exit 
 condition to:
 {code}
 if (score <= pqTop.score && pqTop.doc != Integer.MAX_VALUE) {
   // Since docs are returned in-order (i.e., increasing doc Id), a document
   // with equal score to pqTop.score cannot compete since HitQueue favors
   // documents with lower doc Ids. Therefore reject those docs too.
   return;
 }
 {code}
 - The NaN case can be fixed in the same way, but then has another problem: 
 all comparisons with NaN result in false (none of these is true): x < NaN, 
 x > NaN, NaN == NaN. This leads to the fact that HQ's lessThan always returns 
 false, leading to unexpected ordering in the PQ, and sometimes the sentinel 
 values do not stay at the top of the queue. A later hit then overrides the 
 top of the queue but leaves the incorrect sentinels unchanged -> invalid 
 results. This can be fixed in two ways in HQ:
 Force all sentinels to the top:
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
   if (hitA.doc == Integer.MAX_VALUE)
     return true;
   if (hitB.doc == Integer.MAX_VALUE)
     return false;
   if (hitA.score == hitB.score)
     return hitA.doc > hitB.doc;
   else
     return hitA.score < hitB.score;
 }
 {code}
 or alternatively have a defined order for NaN (Float.compare sorts them after 
 +inf):
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
   if (hitA.score == hitB.score)
     return hitA.doc > hitB.doc;
   else
     return Float.compare(hitA.score, hitB.score) < 0;
 }
 {code}
 The problem with both solutions is that we now have more comparisons per hit, 
 and the use of sentinels is questionable. I would like to remove the 
 sentinels and use the old pre-2.9 code for comparing, using PQ.add() when 
 a competitive hit arrives. The order of NaN would be unspecified.
 To fix the order of NaN, it would be better to replace all score comparisons 
 with Float.compare() [also in FieldComparator].
 I would like to delay 2.9.2 and 3.0.1 until this problem is discussed and 
 solved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[VOTE] Lucene Java 2.9.2 and 3.0.1 release artifacts - Take #2

2010-02-21 Thread Uwe Schindler
Hello folks,

I have posted a new release candidate (take #2) for both Lucene Java 2.9.2 and 
3.0.1 (which have the same bug fix level, functionality and release 
announcement), built from revision 912433 of the corresponding branches. Thanks 
for all your help! Please test them and give your votes until *Thursday 
morning*, as the scheduled release date for both versions is Friday, Feb 26th, 
2010. Only votes from the Lucene PMC are binding, but everyone
is welcome to check the release candidate and voice their approval or 
disapproval. The vote passes if at least three binding +1 votes are cast.

We planned the parallel release with one announcement because of their parallel 
development / bug fix level, to emphasize that they are equal except for the 
deprecation removals and the move to Java 5 in major version 3.

Updates since take #1 can be followed in issues:
https://issues.apache.org/jira/browse/LUCENE-2190 (reopened, fixed)
https://issues.apache.org/jira/browse/LUCENE-2270 (fixed)
https://issues.apache.org/jira/browse/LUCENE-2271 (won't fix for 2.9.2/3.0.1)

You can find the artifacts here:
http://people.apache.org/~uschindler/staging-area/lucene-292-301-take2-rev912433/

Maven repo:
http://people.apache.org/~uschindler/staging-area/lucene-292-301-take2-rev912433/maven/

The changes are here:
http://people.apache.org/~uschindler/staging-area/lucene-292-301-take2-rev912433/changes-2.9.2/Changes.html
http://people.apache.org/~uschindler/staging-area/lucene-292-301-take2-rev912433/changes-2.9.2/Contrib-Changes.html

http://people.apache.org/~uschindler/staging-area/lucene-292-301-take2-rev912433/changes-3.0.1/Changes.html
http://people.apache.org/~uschindler/staging-area/lucene-292-301-take2-rev912433/changes-3.0.1/Contrib-Changes.html

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de




-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2190) CustomScoreQuery (function query) is broken (due to per-segment searching)

2010-02-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2190:
--

Attachment: LUCENE-2190-2.patch

Here is a better solution. It now works like Filter's getDocIdSet: to customize 
scores, you override the similar protected method 
getCustomScoreProvider(IndexReader) and return a subclass of 
CustomScoreProvider. The default implementation delegates to the 
backwards-compatibility layer.

The semantics are now identical to filters: each segment's IndexReader gets 
its own calculator, just like its own DocIdSet in filters.

This also fixes the following problems:
- rewrite() was incorrectly implemented (now works like BooleanQuery.rewrite())
- equals/hashCode ignored the strict flag
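For readers of this issue, a rough usage sketch of the approach just described (the subclass name and the scoring formula are hypothetical; the per-segment semantics are the point):

{code}
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.function.CustomScoreProvider;
import org.apache.lucene.search.function.CustomScoreQuery;

public class MyCustomScoreQuery extends CustomScoreQuery {

  public MyCustomScoreQuery(Query subQuery) {
    super(subQuery);
  }

  @Override
  protected CustomScoreProvider getCustomScoreProvider(final IndexReader reader)
      throws IOException {
    // one provider per segment reader, analogous to Filter.getDocIdSet(reader)
    return new CustomScoreProvider(reader) {
      @Override
      public float customScore(int doc, float subQueryScore, float valSrcScore) {
        // 'doc' is a per-segment docID and 'reader' is the matching segment
        // reader, so per-segment data (e.g. FieldCache arrays) can be used here
        return subQueryScore * valSrcScore;
      }
    };
  }
}
{code}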

 CustomScoreQuery (function query) is broken (due to per-segment searching)
 --

 Key: LUCENE-2190
 URL: https://issues.apache.org/jira/browse/LUCENE-2190
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.9, 2.9.1, 3.0, 3.0.1, 3.1
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2190-2.patch, LUCENE-2190.patch


 Spinoff from here:
   http://lucene.markmail.org/message/psw2m3adzibaixbq
 With the cutover to per-segment searching, CustomScoreQuery is not really 
 usable anymore, because the per-doc custom scoring method (customScore) 
 receives a per-segment docID, yet there is no way to figure out which segment 
 you are currently searching.
 I think to fix this we must also notify the subclass whenever a new segment 
 is switched to.  I think if we copy Collector.setNextReader, that would be 
 sufficient.  It would by default do nothing in CustomScoreQuery, but a 
 subclass could override.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2190) CustomScoreQuery (function query) is broken (due to per-segment searching)

2010-02-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2190:
--

Attachment: LUCENE-2190-2.patch

Updated patch (I forgot to add IOException to getCustomScoreProvider, and fixed 
the test).

Will backport after committing to 3.0 and 2.9 (argh).

 CustomScoreQuery (function query) is broken (due to per-segment searching)
 --

 Key: LUCENE-2190
 URL: https://issues.apache.org/jira/browse/LUCENE-2190
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.9, 2.9.1, 3.0, 3.0.1, 3.1
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2190-2.patch, LUCENE-2190-2.patch, 
 LUCENE-2190.patch


 Spinoff from here:
   http://lucene.markmail.org/message/psw2m3adzibaixbq
 With the cutover to per-segment searching, CustomScoreQuery is not really 
 usable anymore, because the per-doc custom scoring method (customScore) 
 receives a per-segment docID, yet there is no way to figure out which segment 
 you are currently searching.
 I think to fix this we must also notify the subclass whenever a new segment 
 is switched to.  I think if we copy Collector.setNextReader, that would be 
 sufficient.  It would by default do nothing in CustomScoreQuery, but a 
 subclass could override.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2267) Add solr's artifact signing scripts into lucene's build.xml/common-build.xml

2010-02-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-2267.
---

   Resolution: Fixed
Lucene Fields: [New, Patch Available]  (was: [New])

Committed revision: 912115

 Add solr's artifact signing scripts into lucene's build.xml/common-build.xml
 

 Key: LUCENE-2267
 URL: https://issues.apache.org/jira/browse/LUCENE-2267
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9.2, 3.0.1
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2267.patch, LUCENE-2267.patch


 Solr has nice artifact signing scripts in its common-build.xml and build.xml.
 For me as release manager of 3.0 it would have been good to have them also when 
 building Lucene artifacts. I will investigate how to add them to the src 
 and Maven artifacts.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-20 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836182#action_12836182
 ] 

Uwe Schindler commented on LUCENE-2271:
---

In my opinion we should fix it using the attached patch and in the future (3.1) 
do some refactoring:
- no sentinels
- define an order for NaN, as NaN breaks the total order of results (because the 
PQ cannot handle the case where both lessThan(a,b) and lessThan(b,a) return 
false, which happens when NaN is involved); see the demo below
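To make the NaN problem concrete, a small standalone demonstration (plain Java, nothing Lucene-specific): every primitive comparison involving NaN is false, so a heap ordered by '<' gets no usable order, while Float.compare() defines a total order with NaN sorted after +Infinity:

{code}
public class NaNOrderDemo {
  public static void main(String[] args) {
    float nan = Float.NaN;
    System.out.println(1.0f < nan);    // false
    System.out.println(1.0f > nan);    // false
    System.out.println(nan == nan);    // false
    // Float.compare gives a total order: NaN sorts after +Infinity
    System.out.println(Float.compare(1.0f, nan) < 0);                     // true
    System.out.println(Float.compare(Float.POSITIVE_INFINITY, nan) < 0);  // true
  }
}
{code}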

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, 
 TSDC.patch


 This is a followup to LUCENE-2270, where a part of this problem was fixed 
 (boost = 0 leading to NaN scores, which is also unintuitive), but in 
 general, function queries in Solr can create these invalid scores easily. In 
 previous versions of Lucene these scores ordered correctly (except NaN, which 
 mixes up results), but invalid document ids (like Integer.MAX_VALUE) were 
 never returned.
 The problem is: TopScoreDocCollector pre-fills the HitQueue with sentinel 
 ScoreDocs with a score of -inf and a doc id of Integer.MAX_VALUE. For the HQ 
 to work, this sentinel must be smaller than all possible values, which is not 
 the case:
 - -inf is equal, so the document is not inserted into the HQ as it is not 
 competitive; but the HQ is not yet full, so the sentinel values stay in the 
 HQ and the result contains the Integer.MAX_VALUE docs. This problem is 
 solvable (and only affects the Ordered collector) by changing the exit 
 condition to:
 {code}
 if (score <= pqTop.score && pqTop.doc != Integer.MAX_VALUE) {
   // Since docs are returned in-order (i.e., increasing doc Id), a document
   // with equal score to pqTop.score cannot compete since HitQueue favors
   // documents with lower doc Ids. Therefore reject those docs too.
   return;
 }
 {code}
 - The NaN case can be fixed in the same way, but then has another problem: 
 all comparisons with NaN result in false (none of these is true): x < NaN, 
 x > NaN, NaN == NaN. This leads to the fact that HQ's lessThan always returns 
 false, leading to unexpected ordering in the PQ, and sometimes the sentinel 
 values do not stay at the top of the queue. A later hit then overrides the 
 top of the queue but leaves the incorrect sentinels unchanged -> invalid 
 results. This can be fixed in two ways in HQ:
 Force all sentinels to the top:
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
   if (hitA.doc == Integer.MAX_VALUE)
     return true;
   if (hitB.doc == Integer.MAX_VALUE)
     return false;
   if (hitA.score == hitB.score)
     return hitA.doc > hitB.doc;
   else
     return hitA.score < hitB.score;
 }
 {code}
 or alternatively have a defined order for NaN (Float.compare sorts them after 
 +inf):
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
   if (hitA.score == hitB.score)
     return hitA.doc > hitB.doc;
   else
     return Float.compare(hitA.score, hitB.score) < 0;
 }
 {code}
 The problem with both solutions is that we now have more comparisons per hit, 
 and the use of sentinels is questionable. I would like to remove the 
 sentinels and use the old pre-2.9 code for comparing, using PQ.add() when 
 a competitive hit arrives. The order of NaN would be unspecified.
 To fix the order of NaN, it would be better to replace all score comparisons 
 with Float.compare() [also in FieldComparator].
 I would like to delay 2.9.2 and 3.0.1 until this problem is discussed and 
 solved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2271:
--

Attachment: LUCENE-2271.patch

Here is a simpler patch with the sentinels removed. Maybe a better if-check in 
the out-of-order collector is possible.

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, TSDC.patch


 This is a followup to LUCENE-2270, where a part of this problem was fixed 
 (boost = 0 leading to NaN scores, which is also unintuitive), but in 
 general, function queries in Solr can create these invalid scores easily. In 
 previous versions of Lucene these scores ordered correctly (except NaN, which 
 mixes up results), but invalid document ids (like Integer.MAX_VALUE) were 
 never returned.
 The problem is: TopScoreDocCollector pre-fills the HitQueue with sentinel 
 ScoreDocs with a score of -inf and a doc id of Integer.MAX_VALUE. For the HQ 
 to work, this sentinel must be smaller than all possible values, which is not 
 the case:
 - -inf is equal, so the document is not inserted into the HQ as it is not 
 competitive; but the HQ is not yet full, so the sentinel values stay in the 
 HQ and the result contains the Integer.MAX_VALUE docs. This problem is 
 solvable (and only affects the Ordered collector) by changing the exit 
 condition to:
 {code}
 if (score <= pqTop.score && pqTop.doc != Integer.MAX_VALUE) {
   // Since docs are returned in-order (i.e., increasing doc Id), a document
   // with equal score to pqTop.score cannot compete since HitQueue favors
   // documents with lower doc Ids. Therefore reject those docs too.
   return;
 }
 {code}
 - The NaN case can be fixed in the same way, but then has another problem: 
 all comparisons with NaN result in false (none of these is true): x < NaN, 
 x > NaN, NaN == NaN. This leads to the fact that HQ's lessThan always returns 
 false, leading to unexpected ordering in the PQ, and sometimes the sentinel 
 values do not stay at the top of the queue. A later hit then overrides the 
 top of the queue but leaves the incorrect sentinels unchanged -> invalid 
 results. This can be fixed in two ways in HQ:
 Force all sentinels to the top:
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
   if (hitA.doc == Integer.MAX_VALUE)
     return true;
   if (hitB.doc == Integer.MAX_VALUE)
     return false;
   if (hitA.score == hitB.score)
     return hitA.doc > hitB.doc;
   else
     return hitA.score < hitB.score;
 }
 {code}
 or alternatively have a defined order for NaN (Float.compare sorts them after 
 +inf):
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
   if (hitA.score == hitB.score)
     return hitA.doc > hitB.doc;
   else
     return Float.compare(hitA.score, hitB.score) < 0;
 }
 {code}
 The problem with both solutions is that we now have more comparisons per hit, 
 and the use of sentinels is questionable. I would like to remove the 
 sentinels and use the old pre-2.9 code for comparing, using PQ.add() when 
 a competitive hit arrives. The order of NaN would be unspecified.
 To fix the order of NaN, it would be better to replace all score comparisons 
 with Float.compare() [also in FieldComparator].
 I would like to delay 2.9.2 and 3.0.1 until this problem is discussed and 
 solved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2271:
--

Attachment: LUCENE-2271.patch

Sorry, insertWithOverflow is correct!

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, LUCENE-2271.patch, TSDC.patch


 This is a followup to LUCENE-2270, where a part of this problem was fixed 
 (boost = 0 leading to NaN scores, which is also unintuitive), but in 
 general, function queries in Solr can create these invalid scores easily. In 
 previous versions of Lucene these scores ordered correctly (except NaN, which 
 mixes up results), but invalid document ids (like Integer.MAX_VALUE) were 
 never returned.
 The problem is: TopScoreDocCollector pre-fills the HitQueue with sentinel 
 ScoreDocs with a score of -inf and a doc id of Integer.MAX_VALUE. For the HQ 
 to work, this sentinel must be smaller than all possible values, which is not 
 the case:
 - -inf is equal, so the document is not inserted into the HQ as it is not 
 competitive; but the HQ is not yet full, so the sentinel values stay in the 
 HQ and the result contains the Integer.MAX_VALUE docs. This problem is 
 solvable (and only affects the Ordered collector) by changing the exit 
 condition to:
 {code}
 if (score <= pqTop.score && pqTop.doc != Integer.MAX_VALUE) {
   // Since docs are returned in-order (i.e., increasing doc Id), a document
   // with equal score to pqTop.score cannot compete since HitQueue favors
   // documents with lower doc Ids. Therefore reject those docs too.
   return;
 }
 {code}
 - The NaN case can be fixed in the same way, but then has another problem: 
 all comparisons with NaN result in false (none of these is true): x < NaN, 
 x > NaN, NaN == NaN. This leads to the fact that HQ's lessThan always returns 
 false, leading to unexpected ordering in the PQ, and sometimes the sentinel 
 values do not stay at the top of the queue. A later hit then overrides the 
 top of the queue but leaves the incorrect sentinels unchanged -> invalid 
 results. This can be fixed in two ways in HQ:
 Force all sentinels to the top:
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
   if (hitA.doc == Integer.MAX_VALUE)
     return true;
   if (hitB.doc == Integer.MAX_VALUE)
     return false;
   if (hitA.score == hitB.score)
     return hitA.doc > hitB.doc;
   else
     return hitA.score < hitB.score;
 }
 {code}
 or alternatively have a defined order for NaN (Float.compare sorts them after 
 +inf):
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
   if (hitA.score == hitB.score)
     return hitA.doc > hitB.doc;
   else
     return Float.compare(hitA.score, hitB.score) < 0;
 }
 {code}
 The problem with both solutions is that we now have more comparisons per hit, 
 and the use of sentinels is questionable. I would like to remove the 
 sentinels and use the old pre-2.9 code for comparing, using PQ.add() when 
 a competitive hit arrives. The order of NaN would be unspecified.
 To fix the order of NaN, it would be better to replace all score comparisons 
 with Float.compare() [also in FieldComparator].
 I would like to delay 2.9.2 and 3.0.1 until this problem is discussed and 
 solved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1935) Generify PriorityQueue

2010-02-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1935:
--

Attachment: HitQueue.jad

Just for reference: here is the class generated by javac when overriding 
lessThan (using HitQueue as an example), decompiled from the resulting class 
file with JAD.
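For context, this is roughly what such an override looks like at the source level (an illustrative subclass, not the attached HitQueue.jad); javac additionally emits a synthetic bridge method lessThan(Object, Object) that delegates to this one, which is what the decompiled class shows:

{code}
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.util.PriorityQueue;

// Illustrative generified subclass overriding lessThan (sketch only).
final class SampleHitQueue extends PriorityQueue<ScoreDoc> {

  SampleHitQueue(int size) {
    initialize(size);
  }

  @Override
  protected boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
    if (hitA.score == hitB.score)
      return hitA.doc > hitB.doc;   // on equal score, prefer lower doc ids
    return hitA.score < hitB.score;
  }
}
{code}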

 Generify PriorityQueue
 --

 Key: LUCENE-1935
 URL: https://issues.apache.org/jira/browse/LUCENE-1935
 Project: Lucene - Java
  Issue Type: Task
  Components: Other
Affects Versions: 2.9
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.0

 Attachments: HitQueue.jad, LUCENE-1935.patch


 PriorityQueue should use generics like all other Java 5 Collection API 
 classes. This is very simple, but makes the code more readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2271:
--

Attachment: LUCENE-2271.patch

Here is a new impl that has exactly one additional check during the initial 
collection (while the PQ is not yet full). Once the PQ is full, the collector is 
replaced by the short-cutting one.

This impl should even be faster than before, if the additional method call does 
not cost anything and is removed by the JVM (which it should be, because it is 
clearly predictable).
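Conceptually (hypothetical classes, not the attached patch), the two-phase idea looks like this: the filling collector pays only the "is the queue full yet?" check per hit, and once full it hands off to a collector that can rely on a fully populated queue and take the cheap "not competitive" exit:

{code}
import java.util.PriorityQueue;

// Sketch of a two-phase top-N score collector; each collect() returns the
// collector to use for the next hit.
abstract class SketchCollector {
  abstract SketchCollector collect(float score);
}

final class FillingCollector extends SketchCollector {
  final PriorityQueue<Float> pq;   // min-heap of the best scores so far
  final int numHits;

  FillingCollector(int numHits) {
    this.numHits = numHits;
    this.pq = new PriorityQueue<Float>(numHits);
  }

  @Override
  SketchCollector collect(float score) {
    pq.add(score);                 // while filling, every hit is competitive
    return pq.size() == numHits ? new FullCollector(pq) : this;
  }
}

final class FullCollector extends SketchCollector {
  final PriorityQueue<Float> pq;

  FullCollector(PriorityQueue<Float> pq) { this.pq = pq; }

  @Override
  SketchCollector collect(float score) {
    if (score <= pq.peek()) {
      return this;                 // short cut: not competitive, no sentinels needed
    }
    pq.poll();                     // replace the current minimum
    pq.add(score);
    return this;
  }
}
{code}

The search loop would simply do collector = collector.collect(score) per hit, so after the switch no per-hit "queue full?" check remains.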

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, 
 TSDC.patch


 This is a followup to LUCENE-2270, where a part of this problem was fixed 
 (boost = 0 leading to NaN scores, which is also unintuitive), but in 
 general, function queries in Solr can create these invalid scores easily. In 
 previous versions of Lucene these scores ordered correctly (except NaN, which 
 mixes up results), but invalid document ids (like Integer.MAX_VALUE) were 
 never returned.
 The problem is: TopScoreDocCollector pre-fills the HitQueue with sentinel 
 ScoreDocs with a score of -inf and a doc id of Integer.MAX_VALUE. For the HQ 
 to work, this sentinel must be smaller than all possible values, which is not 
 the case:
 - -inf is equal, so the document is not inserted into the HQ as it is not 
 competitive; but the HQ is not yet full, so the sentinel values stay in the 
 HQ and the result contains the Integer.MAX_VALUE docs. This problem is 
 solvable (and only affects the Ordered collector) by changing the exit 
 condition to:
 {code}
 if (score <= pqTop.score && pqTop.doc != Integer.MAX_VALUE) {
   // Since docs are returned in-order (i.e., increasing doc Id), a document
   // with equal score to pqTop.score cannot compete since HitQueue favors
   // documents with lower doc Ids. Therefore reject those docs too.
   return;
 }
 {code}
 - The NaN case can be fixed in the same way, but then has another problem: 
 all comparisons with NaN result in false (none of these is true): x < NaN, 
 x > NaN, NaN == NaN. This leads to the fact that HQ's lessThan always returns 
 false, leading to unexpected ordering in the PQ, and sometimes the sentinel 
 values do not stay at the top of the queue. A later hit then overrides the 
 top of the queue but leaves the incorrect sentinels unchanged -> invalid 
 results. This can be fixed in two ways in HQ:
 Force all sentinels to the top:
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
   if (hitA.doc == Integer.MAX_VALUE)
     return true;
   if (hitB.doc == Integer.MAX_VALUE)
     return false;
   if (hitA.score == hitB.score)
     return hitA.doc > hitB.doc;
   else
     return hitA.score < hitB.score;
 }
 {code}
 or alternatively have a defined order for NaN (Float.compare sorts them after 
 +inf):
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
   if (hitA.score == hitB.score)
     return hitA.doc > hitB.doc;
   else
     return Float.compare(hitA.score, hitB.score) < 0;
 }
 {code}
 The problem with both solutions is that we now have more comparisons per hit, 
 and the use of sentinels is questionable. I would like to remove the 
 sentinels and use the old pre-2.9 code for comparing, using PQ.add() when 
 a competitive hit arrives. The order of NaN would be unspecified.
 To fix the order of NaN, it would be better to replace all score comparisons 
 with Float.compare() [also in FieldComparator].
 I would like to delay 2.9.2 and 3.0.1 until this problem is discussed and 
 solved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2271:
--

Attachment: (was: LUCENE-2271.patch)

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, 
 TSDC.patch


 This is a followup to LUCENE-2270, where a part of this problem was fixed 
 (boost = 0 leading to NaN scores, which is also unintuitive), but in 
 general, function queries in Solr can create these invalid scores easily. In 
 previous versions of Lucene these scores ordered correctly (except NaN, which 
 mixes up results), but invalid document ids (like Integer.MAX_VALUE) were 
 never returned.
 The problem is: TopScoreDocCollector pre-fills the HitQueue with sentinel 
 ScoreDocs with a score of -inf and a doc id of Integer.MAX_VALUE. For the HQ 
 to work, this sentinel must be smaller than all possible values, which is not 
 the case:
 - -inf is equal, so the document is not inserted into the HQ as it is not 
 competitive; but the HQ is not yet full, so the sentinel values stay in the 
 HQ and the result contains the Integer.MAX_VALUE docs. This problem is 
 solvable (and only affects the Ordered collector) by changing the exit 
 condition to:
 {code}
 if (score <= pqTop.score && pqTop.doc != Integer.MAX_VALUE) {
   // Since docs are returned in-order (i.e., increasing doc Id), a document
   // with equal score to pqTop.score cannot compete since HitQueue favors
   // documents with lower doc Ids. Therefore reject those docs too.
   return;
 }
 {code}
 - The NaN case can be fixed in the same way, but then has another problem: 
 all comparisons with NaN result in false (none of these is true): x < NaN, 
 x > NaN, NaN == NaN. This leads to the fact that HQ's lessThan always returns 
 false, leading to unexpected ordering in the PQ, and sometimes the sentinel 
 values do not stay at the top of the queue. A later hit then overrides the 
 top of the queue but leaves the incorrect sentinels unchanged -> invalid 
 results. This can be fixed in two ways in HQ:
 Force all sentinels to the top:
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
   if (hitA.doc == Integer.MAX_VALUE)
     return true;
   if (hitB.doc == Integer.MAX_VALUE)
     return false;
   if (hitA.score == hitB.score)
     return hitA.doc > hitB.doc;
   else
     return hitA.score < hitB.score;
 }
 {code}
 or alternatively have a defined order for NaN (Float.compare sorts them after 
 +inf):
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
   if (hitA.score == hitB.score)
     return hitA.doc > hitB.doc;
   else
     return Float.compare(hitA.score, hitB.score) < 0;
 }
 {code}
 The problem with both solutions is that we now have more comparisons per hit, 
 and the use of sentinels is questionable. I would like to remove the 
 sentinels and use the old pre-2.9 code for comparing, using PQ.add() when 
 a competitive hit arrives. The order of NaN would be unspecified.
 To fix the order of NaN, it would be better to replace all score comparisons 
 with Float.compare() [also in FieldComparator].
 I would like to delay 2.9.2 and 3.0.1 until this problem is discussed and 
 solved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2271:
--

Attachment: LUCENE-2271.patch

Further improved version.

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, 
 TSDC.patch


 This is a followup to LUCENE-2270, where a part of this problem was fixed 
 (boost = 0 leading to NaN scores, which is also unintuitive), but in 
 general, function queries in Solr can create these invalid scores easily. In 
 previous versions of Lucene these scores ordered correctly (except NaN, which 
 mixes up results), but invalid document ids (like Integer.MAX_VALUE) were 
 never returned.
 The problem is: TopScoreDocCollector pre-fills the HitQueue with sentinel 
 ScoreDocs with a score of -inf and a doc id of Integer.MAX_VALUE. For the HQ 
 to work, this sentinel must be smaller than all possible values, which is not 
 the case:
 - -inf is equal, so the document is not inserted into the HQ as it is not 
 competitive; but the HQ is not yet full, so the sentinel values stay in the 
 HQ and the result contains the Integer.MAX_VALUE docs. This problem is 
 solvable (and only affects the Ordered collector) by changing the exit 
 condition to:
 {code}
 if (score <= pqTop.score && pqTop.doc != Integer.MAX_VALUE) {
   // Since docs are returned in-order (i.e., increasing doc Id), a document
   // with equal score to pqTop.score cannot compete since HitQueue favors
   // documents with lower doc Ids. Therefore reject those docs too.
   return;
 }
 {code}
 - The NaN case can be fixed in the same way, but then has another problem: 
 all comparisons with NaN result in false (none of these is true): x < NaN, 
 x > NaN, NaN == NaN. This leads to the fact that HQ's lessThan always returns 
 false, leading to unexpected ordering in the PQ, and sometimes the sentinel 
 values do not stay at the top of the queue. A later hit then overrides the 
 top of the queue but leaves the incorrect sentinels unchanged -> invalid 
 results. This can be fixed in two ways in HQ:
 Force all sentinels to the top:
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
   if (hitA.doc == Integer.MAX_VALUE)
     return true;
   if (hitB.doc == Integer.MAX_VALUE)
     return false;
   if (hitA.score == hitB.score)
     return hitA.doc > hitB.doc;
   else
     return hitA.score < hitB.score;
 }
 {code}
 or alternatively have a defined order for NaN (Float.compare sorts them after 
 +inf):
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
   if (hitA.score == hitB.score)
     return hitA.doc > hitB.doc;
   else
     return Float.compare(hitA.score, hitB.score) < 0;
 }
 {code}
 The problem with both solutions is that we now have more comparisons per hit, 
 and the use of sentinels is questionable. I would like to remove the 
 sentinels and use the old pre-2.9 code for comparing, using PQ.add() when 
 a competitive hit arrives. The order of NaN would be unspecified.
 To fix the order of NaN, it would be better to replace all score comparisons 
 with Float.compare() [also in FieldComparator].
 I would like to delay 2.9.2 and 3.0.1 until this problem is discussed and 
 solved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2271:
--

Attachment: LUCENE-2271.patch

More optimized version with more local variables. This is the version used for 
the benchmark run.

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, TSDC.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2271:
--

Attachment: LUCENE-2271-bench.patch

Here is a benchmark task made by Grant. Run collector.alg and wait long enough.

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2271-bench.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, TSDC.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2271:
--

Attachment: LUCENE-2271.patch

More improved version, now on par with the pre-filled queue case, as the 
collector reuses overflowed ScoreDoc instances.

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2271-bench.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, 
 TSDC.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2271:
--

Attachment: (was: LUCENE-2271.patch)

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2271-bench.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, TSDC.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2271:
--

Attachment: (was: LUCENE-2271.patch)

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2271-bench.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, LUCENE-2271.patch, TSDC.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2271:
--

Attachment: (was: LUCENE-2271.patch)

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2271-bench.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, LUCENE-2271.patch, TSDC.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2271:
--

Attachment: LUCENE-2271.patch

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2271-bench.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, LUCENE-2271.patch, TSDC.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2271:
--

Attachment: (was: LUCENE-2271.patch)

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2271-bench.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, LUCENE-2271.patch, TSDC.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2271:
--

Attachment: (was: LUCENE-2271.patch)

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2271-bench.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, LUCENE-2271.patch, TSDC.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2271:
--

Attachment: (was: LUCENE-2271.patch)

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2271-bench.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, LUCENE-2271.patch, TSDC.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2271:
--

Attachment: (was: LUCENE-2271.patch)

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2271-bench.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, LUCENE-2271.patch, TSDC.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-20 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836265#action_12836265
 ] 

Uwe Schindler edited comment on LUCENE-2271 at 2/20/10 9:40 PM:


I did some benchmarks (Java 1.5.0_22, 64bit, Win7, Core2Duo P8700); I will do more 
tomorrow when I set up a large testing environment with 3 separate checkouts 
containing the three collector versions:
- The latest approach 
(https://issues.apache.org/jira/secure/attachment/12436458/LUCENE-2271.patch) 
with no sentinels, using delegation and exchanging the inner collector, was 
as fast as the original unpatched version.
- The approach with sentinels but fixed HitQueue ordering and extra checks 
(https://issues.apache.org/jira/secure/attachment/12436329/LUCENE-2271.patch) 
showed (as expected) a little overhead: the ordered collector was only as fast as 
the unpatched unordered collector (because of the one extra check), so I would not 
use this patch.

  was (Author: thetaphi):
I did some benchmarks (I will do more tomorrow when I set up a large testing 
environment with 3 separate checkouts containing the three collector versions):
- The latest approach 
(https://issues.apache.org/jira/secure/attachment/12436458/LUCENE-2271.patch) 
with no sentinels, using delegation and exchanging the inner collector, was 
as fast as the original unpatched version.
- The approach with sentinels but fixed HitQueue ordering and extra checks 
(https://issues.apache.org/jira/secure/attachment/12436329/LUCENE-2271.patch) 
showed (as expected) a little overhead: the ordered collector was only as fast as 
the unpatched unordered collector (because of the one extra check), so I would not 
use this patch.
  
 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2271-bench.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, LUCENE-2271.patch, TSDC.patch



[jira] Commented: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-20 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836265#action_12836265
 ] 

Uwe Schindler commented on LUCENE-2271:
---

I did some benchmarks (I will do more tomorrow when I set up a large testing 
environment with 3 separate checkouts containing the three collector versions):
- The latest approach 
(https://issues.apache.org/jira/secure/attachment/12436458/LUCENE-2271.patch) 
with no sentinels, using delegation and exchanging the inner collector, was 
as fast as the original unpatched version.
- The approach with sentinels but fixed HitQueue ordering and extra checks 
(https://issues.apache.org/jira/secure/attachment/12436329/LUCENE-2271.patch) 
showed (as expected) a little overhead: the ordered collector was only as fast as 
the unpatched unordered collector (because of the one extra check), so I would not 
use this patch.

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2271-bench.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, LUCENE-2271.patch, TSDC.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2271:
--

Attachment: LUCENE-2271.patch

Fix an issue when numDocs==0.

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2271-bench.patch, LUCENE-2271.patch, 
 LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, TSDC.patch


 This is a followup to LUCENE-2270, where a part of this problem was fixed 
 (boost = 0 leading to NaN scores, which is also un-intuitive), but in 
 general, function queries in Solr can create these invalid scores easily. In 
 previous versions of Lucene these scores ordered correctly (except NaN, which 
 mixes up results), but invalid document ids were never returned (like 
 Integer.MAX_VALUE).
 The problem is: TopScoreDocCollector pre-fills the HitQueue with sentinel 
 ScoreDocs with a score of -inf and a doc id of Integer.MAX_VALUE. For the HQ 
 to work, this sentinel must be smaller than all possible values, which is not 
 the case:
 - -inf is equal and the document is not inserted into the HQ, as not 
 competitive, but the HQ is not yet full, so the sentinel values stay in the 
 HQ and the result contains the Integer.MAX_VALUE docs. This problem is solvable 
 (and only affects the Ordered collector) by changing the exit condition to:
 {code}
 if (score <= pqTop.score && pqTop.doc != Integer.MAX_VALUE) {
 // Since docs are returned in-order (i.e., increasing doc Id), a document
 // with equal score to pqTop.score cannot compete since HitQueue favors
 // documents with lower doc Ids. Therefore reject those docs too.
 return;
 }
 {code}
 - The NaN case can be fixed in the same way, but then has another problem: 
 all comparisons with NaN result in false (none of these is true): x < NaN, 
 x > NaN, NaN == NaN. This leads to the fact that HQ's lessThan always returns 
 false, leading to unexpected ordering in the PQ, and sometimes the sentinel 
 values do not stay at the top of the queue. A later hit then overrides the 
 top of the queue but leaves the incorrect sentinels unchanged -> invalid 
 results. This can be fixed in two ways in HQ:
 Force all sentinels to the top:
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
 if (hitA.doc == Integer.MAX_VALUE)
   return true;
 if (hitB.doc == Integer.MAX_VALUE)
   return false;
 if (hitA.score == hitB.score)
   return hitA.doc > hitB.doc; 
 else
   return hitA.score < hitB.score;
 }
 {code}
 or alternatively have a defined order for NaN (Float.compare sorts them after 
 +inf):
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
 if (hitA.score == hitB.score)
   return hitA.doc > hitB.doc; 
 else
   return Float.compare(hitA.score, hitB.score) < 0;
 }
 {code}
 The problem with both solutions is that we now have more comparisons per hit 
 and the use of sentinels is questionable. I would like to remove the 
 sentinels and use the old pre-2.9 code that compares and calls PQ.add() when 
 a competitive hit arrives. The order of NaN would be unspecified.
 To fix the order of NaN, it would be better to replace all score comparisons 
 by Float.compare() [also in FieldComparator].
 I would like to delay 2.9.2 and 3.0.1 until this problem is discussed and 
 solved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-19 Thread Uwe Schindler (JIRA)
Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
results with TopScoreDocCollector
--

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
 Fix For: 2.9.2, 3.0.1, 3.1


This is a followup to LUCENE-2270, where a part of this problem was fixed 
(boost = 0 leading to NaN scores, which is also un-intuitive), but in general, 
function queries in Solr can create these invalid scores easily. In previous 
versions of Lucene these scores ordered correctly (except NaN, which mixes up 
results), but invalid document ids were never returned (like Integer.MAX_VALUE).

The problem is: TopScoreDocCollector pre-fills the HitQueue with sentinel 
ScoreDocs with a score of -inf and a doc id of Integer.MAX_VALUE. For the HQ to 
work, this sentinel must be smaller than all possible values, which is not the 
case:
- -inf is equal and the document is not inserted into the HQ, as not 
competitive, but the HQ is not yet full, so the sentinel values stay in the HQ 
and the result contains the Integer.MAX_VALUE docs. This problem is solvable (and only 
affects the Ordered collector) by changing the exit condition to:
{code}
if (score <= pqTop.score && pqTop.doc != Integer.MAX_VALUE) {
// Since docs are returned in-order (i.e., increasing doc Id), a document
// with equal score to pqTop.score cannot compete since HitQueue favors
// documents with lower doc Ids. Therefore reject those docs too.
return;
}
{code}

- The NaN case can be fixed in the same way, but then has another problem: all 
comparisons with NaN result in false (none of these is true): x < NaN, x > NaN, 
NaN == NaN. This leads to the fact that HQ's lessThan always returns false, 
leading to unexpected ordering in the PQ, and sometimes the sentinel values do 
not stay at the top of the queue. A later hit then overrides the top of the 
queue but leaves the incorrect sentinels unchanged -> invalid results. This 
can be fixed in two ways in HQ:
Force all sentinels to the top:
{code}
protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
if (hitA.doc == Integer.MAX_VALUE)
  return true;
if (hitB.doc == Integer.MAX_VALUE)
  return false;
if (hitA.score == hitB.score)
  return hitA.doc > hitB.doc; 
else
  return hitA.score < hitB.score;
}
{code}
or alternatively have a defined order for NaN (Float.compare sorts them after 
+inf):
{code}
protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
if (hitA.score == hitB.score)
  return hitA.doc > hitB.doc; 
else
  return Float.compare(hitA.score, hitB.score) < 0;
}
{code}

The problem with both solutions is that we now have more comparisons per hit 
and the use of sentinels is questionable. I would like to remove the sentinels 
and use the old pre-2.9 code that compares and calls PQ.add() when a 
competitive hit arrives. The order of NaN would be unspecified.

To fix the order of NaN, it would be better to replace all score comparisons by 
Float.compare() [also in FieldComparator].

I would like to delay 2.9.2 and 3.0.1 until this problem is discussed and 
solved.
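
For reference, here is a minimal standalone Java sketch (plain JDK only, no Lucene classes; the class and variable names are made up for illustration) of the float semantics described above: ordinary <, > and == comparisons never return true when NaN is involved, while Float.compare defines a total order in which -inf sorts first and NaN sorts after +inf.

{code}
import java.util.Arrays;

public class NaNOrderingDemo {
  public static void main(String[] args) {
    float nan = Float.NaN;
    // Ordinary comparisons: all false, so a lessThan() built on "<" can never
    // move a real hit above a NaN sentinel in the queue.
    System.out.println(1.0f < nan);   // false
    System.out.println(1.0f > nan);   // false
    System.out.println(nan == nan);   // false

    // Float.compare defines a total order: -Infinity < finite values < +Infinity < NaN
    Float[] scores = { nan, 1.0f, Float.NEGATIVE_INFINITY, Float.POSITIVE_INFINITY };
    Arrays.sort(scores);  // Float.compareTo uses the same ordering as Float.compare
    System.out.println(Arrays.toString(scores));  // [-Infinity, 1.0, Infinity, NaN]
  }
}
{code}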

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-19 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2271:
--

Attachment: LUCENE-2271.patch

This is a patch that supports NaN and -inf.

The cost of the additional checks in HitQueue.lessThan is negligible, as they 
only occur when a competitive hit is really inserted into the queue. The check 
forces all sentinels to the top of the queue, regardless of their score 
(because NaN != NaN always holds).

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2271.patch


 This is a followup to LUCENE-2270, where a part of this problem was fixed 
 (boost = 0 leading to NaN scores, which is also un-intuitive), but in 
 general, function queries in Solr can create these invalid scores easily. In 
 previous versions of Lucene these scores ordered correctly (except NaN, which 
 mixes up results), but invalid document ids were never returned (like 
 Integer.MAX_VALUE).
 The problem is: TopScoreDocCollector pre-fills the HitQueue with sentinel 
 ScoreDocs with a score of -inf and a doc id of Integer.MAX_VALUE. For the HQ 
 to work, this sentinel must be smaller than all possible values, which is not 
 the case:
 - -inf is equal and the document is not inserted into the HQ, as not 
 competitive, but the HQ is not yet full, so the sentinel values stay in the 
 HQ and the result contains the Integer.MAX_VALUE docs. This problem is solvable 
 (and only affects the Ordered collector) by changing the exit condition to:
 {code}
 if (score <= pqTop.score && pqTop.doc != Integer.MAX_VALUE) {
 // Since docs are returned in-order (i.e., increasing doc Id), a document
 // with equal score to pqTop.score cannot compete since HitQueue favors
 // documents with lower doc Ids. Therefore reject those docs too.
 return;
 }
 {code}
 - The NaN case can be fixed in the same way, but then has another problem: 
 all comparisons with NaN result in false (none of these is true): x < NaN, 
 x > NaN, NaN == NaN. This leads to the fact that HQ's lessThan always returns 
 false, leading to unexpected ordering in the PQ, and sometimes the sentinel 
 values do not stay at the top of the queue. A later hit then overrides the 
 top of the queue but leaves the incorrect sentinels unchanged -> invalid 
 results. This can be fixed in two ways in HQ:
 Force all sentinels to the top:
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
 if (hitA.doc == Integer.MAX_VALUE)
   return true;
 if (hitB.doc == Integer.MAX_VALUE)
   return false;
 if (hitA.score == hitB.score)
   return hitA.doc > hitB.doc; 
 else
   return hitA.score < hitB.score;
 }
 {code}
 or alternatively have a defined order for NaN (Float.compare sorts them after 
 +inf):
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
 if (hitA.score == hitB.score)
   return hitA.doc > hitB.doc; 
 else
   return Float.compare(hitA.score, hitB.score) < 0;
 }
 {code}
 The problem with both solutions is that we now have more comparisons per hit 
 and the use of sentinels is questionable. I would like to remove the 
 sentinels and use the old pre-2.9 code that compares and calls PQ.add() when 
 a competitive hit arrives. The order of NaN would be unspecified.
 To fix the order of NaN, it would be better to replace all score comparisons 
 by Float.compare() [also in FieldComparator].
 I would like to delay 2.9.2 and 3.0.1 until this problem is discussed and 
 solved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-19 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2271:
--

Attachment: LUCENE-2271.patch

Sorry, this reverts an accidental comment removal.

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2271.patch, TSDC.patch


 This is a followup to LUCENE-2270, where a part of this problem was fixed 
 (boost = 0 leading to NaN scores, which is also un-intuitive), but in 
 general, function queries in Solr can create these invalid scores easily. In 
 previous versions of Lucene these scores ordered correctly (except NaN, which 
 mixes up results), but invalid document ids were never returned (like 
 Integer.MAX_VALUE).
 The problem is: TopScoreDocCollector pre-fills the HitQueue with sentinel 
 ScoreDocs with a score of -inf and a doc id of Integer.MAX_VALUE. For the HQ 
 to work, this sentinel must be smaller than all possible values, which is not 
 the case:
 - -inf is equal and the document is not inserted into the HQ, as not 
 competitive, but the HQ is not yet full, so the sentinel values stay in the 
 HQ and the result contains the Integer.MAX_VALUE docs. This problem is solvable 
 (and only affects the Ordered collector) by changing the exit condition to:
 {code}
 if (score <= pqTop.score && pqTop.doc != Integer.MAX_VALUE) {
 // Since docs are returned in-order (i.e., increasing doc Id), a document
 // with equal score to pqTop.score cannot compete since HitQueue favors
 // documents with lower doc Ids. Therefore reject those docs too.
 return;
 }
 {code}
 - The NaN case can be fixed in the same way, but then has another problem: 
 all comparisons with NaN result in false (none of these is true): x < NaN, 
 x > NaN, NaN == NaN. This leads to the fact that HQ's lessThan always returns 
 false, leading to unexpected ordering in the PQ, and sometimes the sentinel 
 values do not stay at the top of the queue. A later hit then overrides the 
 top of the queue but leaves the incorrect sentinels unchanged -> invalid 
 results. This can be fixed in two ways in HQ:
 Force all sentinels to the top:
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
 if (hitA.doc == Integer.MAX_VALUE)
   return true;
 if (hitB.doc == Integer.MAX_VALUE)
   return false;
 if (hitA.score == hitB.score)
   return hitA.doc > hitB.doc; 
 else
   return hitA.score < hitB.score;
 }
 {code}
 or alternatively have a defined order for NaN (Float.compare sorts them after 
 +inf):
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
 if (hitA.score == hitB.score)
   return hitA.doc > hitB.doc; 
 else
   return Float.compare(hitA.score, hitB.score) < 0;
 }
 {code}
 The problem with both solutions is that we now have more comparisons per hit 
 and the use of sentinels is questionable. I would like to remove the 
 sentinels and use the old pre-2.9 code that compares and calls PQ.add() when 
 a competitive hit arrives. The order of NaN would be unspecified.
 To fix the order of NaN, it would be better to replace all score comparisons 
 by Float.compare() [also in FieldComparator].
 I would like to delay 2.9.2 and 3.0.1 until this problem is discussed and 
 solved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-19 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2271:
--

Attachment: (was: LUCENE-2271.patch)

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2271.patch, TSDC.patch


 This is a followup to LUCENE-2270, where a part of this problem was fixed 
 (boost = 0 leading to NaN scores, which is also un-intuitive), but in 
 general, function queries in Solr can create these invalid scores easily. In 
 previous versions of Lucene these scores ordered correctly (except NaN, which 
 mixes up results), but invalid document ids were never returned (like 
 Integer.MAX_VALUE).
 The problem is: TopScoreDocCollector pre-fills the HitQueue with sentinel 
 ScoreDocs with a score of -inf and a doc id of Integer.MAX_VALUE. For the HQ 
 to work, this sentinel must be smaller than all possible values, which is not 
 the case:
 - -inf is equal and the document is not inserted into the HQ, as not 
 competitive, but the HQ is not yet full, so the sentinel values stay in the 
 HQ and the result contains the Integer.MAX_VALUE docs. This problem is solvable 
 (and only affects the Ordered collector) by changing the exit condition to:
 {code}
 if (score <= pqTop.score && pqTop.doc != Integer.MAX_VALUE) {
 // Since docs are returned in-order (i.e., increasing doc Id), a document
 // with equal score to pqTop.score cannot compete since HitQueue favors
 // documents with lower doc Ids. Therefore reject those docs too.
 return;
 }
 {code}
 - The NaN case can be fixed in the same way, but then has another problem: 
 all comparisons with NaN result in false (none of these is true): x < NaN, 
 x > NaN, NaN == NaN. This leads to the fact that HQ's lessThan always returns 
 false, leading to unexpected ordering in the PQ, and sometimes the sentinel 
 values do not stay at the top of the queue. A later hit then overrides the 
 top of the queue but leaves the incorrect sentinels unchanged -> invalid 
 results. This can be fixed in two ways in HQ:
 Force all sentinels to the top:
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
 if (hitA.doc == Integer.MAX_VALUE)
   return true;
 if (hitB.doc == Integer.MAX_VALUE)
   return false;
 if (hitA.score == hitB.score)
   return hitA.doc > hitB.doc; 
 else
   return hitA.score < hitB.score;
 }
 {code}
 or alternatively have a defined order for NaN (Float.compare sorts them after 
 +inf):
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
 if (hitA.score == hitB.score)
   return hitA.doc > hitB.doc; 
 else
   return Float.compare(hitA.score, hitB.score) < 0;
 }
 {code}
 The problem with both solutions is that we now have more comparisons per hit 
 and the use of sentinels is questionable. I would like to remove the 
 sentinels and use the old pre-2.9 code that compares and calls PQ.add() when 
 a competitive hit arrives. The order of NaN would be unspecified.
 To fix the order of NaN, it would be better to replace all score comparisons 
 by Float.compare() [also in FieldComparator].
 I would like to delay 2.9.2 and 3.0.1 until this problem is discussed and 
 solved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-19 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2271:
--

Attachment: LUCENE-2271.patch

Patch with testcases for trunk, but should work on branches, too (after 
removing @Override). Without the fixes in HitQueue or TSDC the tests fail.

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2271.patch, LUCENE-2271.patch, TSDC.patch


 This is a followup to LUCENE-2270, where a part of this problem was fixed 
 (boost = 0 leading to NaN scores, which is also un-intuitive), but in 
 general, function queries in Solr can create these invalid scores easily. In 
 previous versions of Lucene these scores ordered correctly (except NaN, which 
 mixes up results), but invalid document ids were never returned (like 
 Integer.MAX_VALUE).
 The problem is: TopScoreDocCollector pre-fills the HitQueue with sentinel 
 ScoreDocs with a score of -inf and a doc id of Integer.MAX_VALUE. For the HQ 
 to work, this sentinel must be smaller than all possible values, which is not 
 the case:
 - -inf is equal and the document is not inserted into the HQ, as not 
 competitive, but the HQ is not yet full, so the sentinel values stay in the 
 HQ and the result contains the Integer.MAX_VALUE docs. This problem is solvable 
 (and only affects the Ordered collector) by changing the exit condition to:
 {code}
 if (score <= pqTop.score && pqTop.doc != Integer.MAX_VALUE) {
 // Since docs are returned in-order (i.e., increasing doc Id), a document
 // with equal score to pqTop.score cannot compete since HitQueue favors
 // documents with lower doc Ids. Therefore reject those docs too.
 return;
 }
 {code}
 - The NaN case can be fixed in the same way, but then has another problem: 
 all comparisons with NaN result in false (none of these is true): x < NaN, 
 x > NaN, NaN == NaN. This leads to the fact that HQ's lessThan always returns 
 false, leading to unexpected ordering in the PQ, and sometimes the sentinel 
 values do not stay at the top of the queue. A later hit then overrides the 
 top of the queue but leaves the incorrect sentinels unchanged -> invalid 
 results. This can be fixed in two ways in HQ:
 Force all sentinels to the top:
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
 if (hitA.doc == Integer.MAX_VALUE)
   return true;
 if (hitB.doc == Integer.MAX_VALUE)
   return false;
 if (hitA.score == hitB.score)
   return hitA.doc > hitB.doc; 
 else
   return hitA.score < hitB.score;
 }
 {code}
 or alternatively have a defined order for NaN (Float.compare sorts them after 
 +inf):
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
 if (hitA.score == hitB.score)
   return hitA.doc > hitB.doc; 
 else
   return Float.compare(hitA.score, hitB.score) < 0;
 }
 {code}
 The problem with both solutions is that we now have more comparisons per hit 
 and the use of sentinels is questionable. I would like to remove the 
 sentinels and use the old pre-2.9 code that compares and calls PQ.add() when 
 a competitive hit arrives. The order of NaN would be unspecified.
 To fix the order of NaN, it would be better to replace all score comparisons 
 by Float.compare() [also in FieldComparator].
 I would like to delay 2.9.2 and 3.0.1 until this problem is discussed and 
 solved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12835763#action_12835763
 ] 

Uwe Schindler commented on LUCENE-2271:
---

The cost to handle NaN is the modified lessThan() in HitQueue.

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2271.patch, LUCENE-2271.patch, TSDC.patch


 This is a followup to LUCENE-2270, where a part of this problem was fixed 
 (boost = 0 leading to NaN scores, which is also un-intuitive), but in 
 general, function queries in Solr can create these invalid scores easily. In 
 previous versions of Lucene these scores ordered correctly (except NaN, which 
 mixes up results), but invalid document ids were never returned (like 
 Integer.MAX_VALUE).
 The problem is: TopScoreDocCollector pre-fills the HitQueue with sentinel 
 ScoreDocs with a score of -inf and a doc id of Integer.MAX_VALUE. For the HQ 
 to work, this sentinel must be smaller than all possible values, which is not 
 the case:
 - -inf is equal and the document is not inserted into the HQ, as not 
 competitive, but the HQ is not yet full, so the sentinel values stay in the 
 HQ and the result contains the Integer.MAX_VALUE docs. This problem is solvable 
 (and only affects the Ordered collector) by changing the exit condition to:
 {code}
 if (score <= pqTop.score && pqTop.doc != Integer.MAX_VALUE) {
 // Since docs are returned in-order (i.e., increasing doc Id), a document
 // with equal score to pqTop.score cannot compete since HitQueue favors
 // documents with lower doc Ids. Therefore reject those docs too.
 return;
 }
 {code}
 - The NaN case can be fixed in the same way, but then has another problem: 
 all comparisons with NaN result in false (none of these is true): x < NaN, 
 x > NaN, NaN == NaN. This leads to the fact that HQ's lessThan always returns 
 false, leading to unexpected ordering in the PQ, and sometimes the sentinel 
 values do not stay at the top of the queue. A later hit then overrides the 
 top of the queue but leaves the incorrect sentinels unchanged -> invalid 
 results. This can be fixed in two ways in HQ:
 Force all sentinels to the top:
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
 if (hitA.doc == Integer.MAX_VALUE)
   return true;
 if (hitB.doc == Integer.MAX_VALUE)
   return false;
 if (hitA.score == hitB.score)
   return hitA.doc > hitB.doc; 
 else
   return hitA.score < hitB.score;
 }
 {code}
 or alternatively have a defined order for NaN (Float.compare sorts them after 
 +inf):
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
 if (hitA.score == hitB.score)
   return hitA.doc > hitB.doc; 
 else
   return Float.compare(hitA.score, hitB.score) < 0;
 }
 {code}
 The problem with both solutions is that we now have more comparisons per hit 
 and the use of sentinels is questionable. I would like to remove the 
 sentinels and use the old pre-2.9 code that compares and calls PQ.add() when 
 a competitive hit arrives. The order of NaN would be unspecified.
 To fix the order of NaN, it would be better to replace all score comparisons 
 by Float.compare() [also in FieldComparator].
 I would like to delay 2.9.2 and 3.0.1 until this problem is discussed and 
 solved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2271) Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect results with TopScoreDocCollector

2010-02-19 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2271:
--

Attachment: LUCENE-2271.patch

Improved test that also checks for increasing doc ids when scores are identical.
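
For illustration, a hedged sketch (not the attached patch; the class and method names are invented) of the invariant such a test asserts: the returned ScoreDoc[] must be sorted by descending score, with ascending doc ids on score ties, and must never contain the sentinel doc id.

{code}
import org.apache.lucene.search.ScoreDoc;

public class TopDocsOrderCheck {

  /** Throws AssertionError if the hits violate the expected ordering contract. */
  static void assertProperOrder(ScoreDoc[] hits) {
    for (ScoreDoc hit : hits) {
      // sentinel entries (doc == Integer.MAX_VALUE) must never leak to the caller
      if (hit.doc == Integer.MAX_VALUE) {
        throw new AssertionError("sentinel docID returned");
      }
    }
    for (int i = 1; i < hits.length; i++) {
      ScoreDoc prev = hits[i - 1], cur = hits[i];
      // Float.compare gives NaN a defined position, unlike raw "<" / ">"
      int cmp = Float.compare(prev.score, cur.score);
      if (cmp < 0 || (cmp == 0 && prev.doc >= cur.doc)) {
        throw new AssertionError("hits out of order at index " + i);
      }
    }
  }
}
{code}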

 Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
 results with TopScoreDocCollector
 --

 Key: LUCENE-2271
 URL: https://issues.apache.org/jira/browse/LUCENE-2271
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.9
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2271.patch, LUCENE-2271.patch, LUCENE-2271.patch, 
 TSDC.patch


 This is a followup to LUCENE-2270, where a part of this problem was fixed 
 (boost = 0 leading to NaN scores, which is also un-intuitive), but in 
 general, function queries in Solr can create these invalid scores easily. In 
 previous versions of Lucene these scores ordered correctly (except NaN, which 
 mixes up results), but invalid document ids were never returned (like 
 Integer.MAX_VALUE).
 The problem is: TopScoreDocCollector pre-fills the HitQueue with sentinel 
 ScoreDocs with a score of -inf and a doc id of Integer.MAX_VALUE. For the HQ 
 to work, this sentinel must be smaller than all possible values, which is not 
 the case:
 - -inf is equal and the document is not inserted into the HQ, as not 
 competitive, but the HQ is not yet full, so the sentinel values stay in the 
 HQ and the result contains the Integer.MAX_VALUE docs. This problem is solvable 
 (and only affects the Ordered collector) by changing the exit condition to:
 {code}
 if (score <= pqTop.score && pqTop.doc != Integer.MAX_VALUE) {
 // Since docs are returned in-order (i.e., increasing doc Id), a document
 // with equal score to pqTop.score cannot compete since HitQueue favors
 // documents with lower doc Ids. Therefore reject those docs too.
 return;
 }
 {code}
 - The NaN case can be fixed in the same way, but then has another problem: 
 all comparisons with NaN result in false (none of these is true): x < NaN, 
 x > NaN, NaN == NaN. This leads to the fact that HQ's lessThan always returns 
 false, leading to unexpected ordering in the PQ, and sometimes the sentinel 
 values do not stay at the top of the queue. A later hit then overrides the 
 top of the queue but leaves the incorrect sentinels unchanged -> invalid 
 results. This can be fixed in two ways in HQ:
 Force all sentinels to the top:
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
 if (hitA.doc == Integer.MAX_VALUE)
   return true;
 if (hitB.doc == Integer.MAX_VALUE)
   return false;
 if (hitA.score == hitB.score)
   return hitA.doc > hitB.doc; 
 else
   return hitA.score < hitB.score;
 }
 {code}
 or alternatively have a defined order for NaN (Float.compare sorts them after 
 +inf):
 {code}
 protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
 if (hitA.score == hitB.score)
   return hitA.doc > hitB.doc; 
 else
   return Float.compare(hitA.score, hitB.score) < 0;
 }
 {code}
 The problem with both solutions is that we now have more comparisons per hit 
 and the use of sentinels is questionable. I would like to remove the 
 sentinels and use the old pre-2.9 code that compares and calls PQ.add() when 
 a competitive hit arrives. The order of NaN would be unspecified.
 To fix the order of NaN, it would be better to replace all score comparisons 
 by Float.compare() [also in FieldComparator].
 I would like to delay 2.9.2 and 3.0.1 until this problem is discussed and 
 solved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Reopened: (LUCENE-2190) CustomScoreQuery (function query) is broken (due to per-segment searching)

2010-02-19 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reopened LUCENE-2190:
---


The fix is invalid:
Adding setNextReader to CustomScoreQuery makes the Query itself stateful. This 
breaks when it is used together with e.g. ParallelMultiSearcher.
As the package is experimental, I see no problem in changing the method 
signature of customScore to pass in the affected IndexReader (CustomScorer can 
do this).

 CustomScoreQuery (function query) is broken (due to per-segment searching)
 --

 Key: LUCENE-2190
 URL: https://issues.apache.org/jira/browse/LUCENE-2190
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.9, 2.9.1, 3.0, 3.0.1, 3.1
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2190.patch


 Spinoff from here:
   http://lucene.markmail.org/message/psw2m3adzibaixbq
 With the cutover to per-segment searching, CustomScoreQuery is not really 
 usable anymore, because the per-doc custom scoring method (customScore) 
 receives a per-segment docID, yet there is no way to figure out which segment 
 you are currently searching.
 I think to fix this we must also notify the subclass whenever a new segment 
 is switched to.  I think if we copy Collector.setNextReader, that would be 
 sufficient.  It would by default do nothing in CustomScoreQuery, but a 
 subclass could override.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2190) CustomScoreQuery (function query) is broken (due to per-segment searching)

2010-02-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12835837#action_12835837
 ] 

Uwe Schindler commented on LUCENE-2190:
---

We can preserve backwards compatibility if the default impl of the new reader-aware 
method only delegates to the deprecated old customScore function.

I will provide a patch tomorrow.
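
A rough sketch of that delegation idea (the signatures below are assumptions for illustration, not the committed API): the new reader-aware hook simply forwards to the deprecated reader-less one by default, so existing subclasses that only override the old method keep their behavior.

{code}
import org.apache.lucene.index.IndexReader;

public class PerReaderScoreHookSketch {

  /** @deprecated old hook without reader context (what subclasses override today) */
  @Deprecated
  public float customScore(int doc, float subQueryScore, float valSrcScore) {
    return subQueryScore * valSrcScore;
  }

  /** new hook that also receives the current segment's reader */
  public float customScore(IndexReader reader, int doc, float subQueryScore, float valSrcScore) {
    // default implementation ignores the reader and delegates -> backwards compatible
    return customScore(doc, subQueryScore, valSrcScore);
  }
}
{code}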

 CustomScoreQuery (function query) is broken (due to per-segment searching)
 --

 Key: LUCENE-2190
 URL: https://issues.apache.org/jira/browse/LUCENE-2190
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.9, 2.9.1, 3.0, 3.0.1, 3.1
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2190.patch


 Spinoff from here:
   http://lucene.markmail.org/message/psw2m3adzibaixbq
 With the cutover to per-segment searching, CustomScoreQuery is not really 
 usable anymore, because the per-doc custom scoring method (customScore) 
 receives a per-segment docID, yet there is no way to figure out which segment 
 you are currently searching.
 I think to fix this we must also notify the subclass whenever a new segment 
 is switched to.  I think if we copy Collector.setNextReader, that would be 
 sufficient.  It would by default do nothing in CustomScoreQuery, but a 
 subclass could override.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2190) CustomScoreQuery (function query) is broken (due to per-segment searching)

2010-02-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12835903#action_12835903
 ] 

Uwe Schindler commented on LUCENE-2190:
---

During refactoring I found out:

CustomScoreQuery is more broken: the rewrite() method is wrong. For now it's not 
really a problem, but when we change to per-segment rewrite (as Mike plans) it's 
broken. It's even broken if you rewrite against one IndexReader and want to 
reuse the query later on another IndexReader. It rewrites all its subqueries 
and returns itself, which is wrong: if one of the subqueries was rewritten, it 
must return a new clone instance (like BooleanQuery does). Also, hashCode and equals 
ignore strict.

Will provide patch later. Now everything seems to work correctly.
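
For illustration, a generic sketch of the clone-on-rewrite contract described above (not Lucene's actual classes; the type and field names are invented): if a child rewrites to a different instance, return a modified clone instead of mutating and returning this.

{code}
public class RewriteCloneSketch implements Cloneable {
  RewriteCloneSketch child;  // stands in for the wrapped subquery

  public RewriteCloneSketch rewrite() {
    RewriteCloneSketch newChild = (child == null) ? null : child.rewrite();
    if (newChild == child) {
      return this;  // nothing changed, safe to return the same instance
    }
    try {
      RewriteCloneSketch clone = (RewriteCloneSketch) super.clone();
      clone.child = newChild;  // only the clone carries the rewritten child
      return clone;
    } catch (CloneNotSupportedException e) {
      throw new AssertionError(e);  // cannot happen: we implement Cloneable
    }
  }
}
{code}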

 CustomScoreQuery (function query) is broken (due to per-segment searching)
 --

 Key: LUCENE-2190
 URL: https://issues.apache.org/jira/browse/LUCENE-2190
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.9, 2.9.1, 3.0, 3.0.1, 3.1
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2190.patch


 Spinoff from here:
   http://lucene.markmail.org/message/psw2m3adzibaixbq
 With the cutover to per-segment searching, CustomScoreQuery is not really 
 usable anymore, because the per-doc custom scoring method (customScore) 
 receives a per-segment docID, yet there is no way to figure out which segment 
 you are currently searching.
 I think to fix this we must also notify the subclass whenever a new segment 
 is switched to.  I think if we copy Collector.setNextReader, that would be 
 sufficient.  It would by default do nothing in CustomScoreQuery, but a 
 subclass could override.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2267) Add solr's artifact signing scripts into lucene's build.xml/common-build.xml

2010-02-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2267:
--

Attachment: LUCENE-2267.patch

Patch with a heavily improved version of Solr's macros. I changed:

- For security reasons, the password is not passed on the command line (you 
can see it with ps \-ef !!!). Also \-\-passphrase does not work with newer 
2.x versions of gpg. The correct way is the same as in Mike McCandless' Python 
script: pass \-\-passphrase-fd 0 (then it reads the passphrase from stdin), 
piping in the password using the inputstring attribute of the ant task.
- added the \-\-batch parameter to gpg. Without it, in GUI environments gpg ignores 
the passed-in password and uses gpg-agent
- no manual signing of every file; it uses the apply ant task, which starts a 
process for every file in a fileset and also supplies a source -> destfilename 
mapping (which appends .asc)
- added \-\-default-key with a default value of CODE SIGNING KEY; you can 
override it with \-Dgpg.key=YourHexKeyOrEmail

The only problem is that apply does not print the command lines or a 
filelist. You only get a message at the end that it applied 'gpg' to x files, 
which is fine.

Usage:
{code}
ant sign-artifacts -Dgpg.exe=/path/to/gpg -Dgpg.key=YourHexKeyOrEmail 
-Dgpg.passphrase=12345
{code}
All parameters are optional, defaults are:
{code}
gpg.exe = gpg
gpg.key = CODE SIGNING KEY
gpg.passphrase = none, if not given, you are asked to input
{code}

 Add solr's artifact signing scripts into lucene's build.xml/common-build.xml
 

 Key: LUCENE-2267
 URL: https://issues.apache.org/jira/browse/LUCENE-2267
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9.2, 3.0.1
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2267.patch


 Solr has nice artifact signing scripts in its common-build.xml and build.xml.
 For me as release manager of 3.0 it would have been good to have them also when 
 building lucene artifacts. I will investigate how to add them to src 
 artifacts and maven artifacts

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2267) Add solr's artifact signing scripts into lucene's build.xml/common-build.xml

2010-02-18 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12835217#action_12835217
 ] 

Uwe Schindler commented on LUCENE-2267:
---

I forgot: the target has no dependency on maven having run before, or on dist-src/bin. 
You have to run dist-src, dist-bin and generate-maven-artifacts before; else it 
would simply sign no files, or rather it would break, because the dist folder does 
not exist.

 Add solr's artifact signing scripts into lucene's build.xml/common-build.xml
 

 Key: LUCENE-2267
 URL: https://issues.apache.org/jira/browse/LUCENE-2267
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9.2, 3.0.1
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2267.patch


 Solr has nice artifact signing scripts in its common-build.xml and build.xml.
 For me as release manager of 3.0 it would have been good to have them also when 
 building lucene artifacts. I will investigate how to add them to src 
 artifacts and maven artifacts

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2267) Add solr's artifact signing scripts into lucene's build.xml/common-build.xml

2010-02-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2267:
--

Attachment: LUCENE-2267.patch

Updated patch that only requires trunk's minimum ANT version 1.7.0. 
Secure password input is only available if ant >= 1.7.1 and Java 6 is used.

 Add solr's artifact signing scripts into lucene's build.xml/common-build.xml
 

 Key: LUCENE-2267
 URL: https://issues.apache.org/jira/browse/LUCENE-2267
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9.2, 3.0.1
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2267.patch, LUCENE-2267.patch


 Solr has nice artifact signing scripts in its common-build.xml and build.xml.
 For me as release manager of 3.0 it would have been good to have them also when 
 building lucene artifacts. I will investigate how to add them to src 
 artifacts and maven artifacts

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2270) queries with zero boosts don't work

2010-02-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2270:
--

Fix Version/s: 3.1
   3.0.1
   2.9.2
 Assignee: Yonik Seeley

 queries with zero boosts don't work
 ---

 Key: LUCENE-2270
 URL: https://issues.apache.org/jira/browse/LUCENE-2270
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 2.9
Reporter: Yonik Seeley
Assignee: Yonik Seeley
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2270.patch


 Queries consisting of only zero boosts result in incorrect results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2010-02-17 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834695#action_12834695
 ] 

Uwe Schindler commented on LUCENE-2089:
---

Hi Robert,
I reviewed your latest patch and was a little bit irritated, but then everything 
became clear when also looking into AutomatonTermsEnum and understanding what 
happens. But there is still some code duplication (not really duplication, but 
functionality duplication):

- If a constant prefix is used, the generated automatons use a constant 
prefix + a Levenshtein automaton (using concat)
- If you run such an automaton against the TermIndex using the superclass, it 
will seek first to the prefix term (or some term starting with the prefix), 
that's ok. As soon as the prefix is no longer valid, the AutomatonTermsEnum 
stops processing (if running such an automaton using the standard 
AutomatonTermsEnum).
- The AutomatonFuzzyTermsEnum checks if the term starts with the prefix and, if 
not, it exits and ENDs (!) the automaton. The reason why this works is that 
nextString() in the superclass automatically returns a string starting with the 
prefix, but this was the main fact that irritated me.
- The question is now: is this extra prefix check really needed? Running the 
automaton against the current term in accept would simply return no match, and 
nextString() would stop further processing? Or is this because the accept 
method should not iterate through all distances once the prefix is not matched?

Maybe you should add some comments to the AutomatonFuzzyTermsEnum or some 
asserts to show what's happening.

 explore using automaton for fuzzyquery
 --

 Key: LUCENE-2089
 URL: https://issues.apache.org/jira/browse/LUCENE-2089
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: Flex Branch
Reporter: Robert Muir
Assignee: Mark Miller
Priority: Minor
 Fix For: Flex Branch

 Attachments: LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089.patch, 
 LUCENE-2089.patch, LUCENE-2089_concat.patch, Moman-0.2.1.tar.gz, 
 TestFuzzy.java


 Mark brought this up on LUCENE-1606 (i will assign this to him, I know he is 
 itching to write that nasty algorithm)
 we can optimize fuzzyquery by using AutomatonTermsEnum, here is my idea
 * up front, calculate the maximum required K edits needed to match the users 
 supplied float threshold.
 * for at least small common E up to some max K (1,2,3, etc) we should create 
 a DFA for each E. 
 if the required E is above our supported max, we use dumb mode at first (no 
 seeking, no DFA, just brute force like now).
 As the pq fills, we swap progressively lower DFAs into the enum, based upon 
 the lowest score in the pq.
 This should work well on avg, at high E, you will typically fill the pq very 
 quickly since you will match many terms. 
 This not only provides a mechanism to switch to more efficient DFAs during 
 enumeration, but also to switch from dumb mode to smart mode.
 i modified my wildcard benchmark to generate random fuzzy queries.
 * Pattern: 7N stands for NNN, etc.
 * AvgMS_DFA: this is the time spent creating the automaton (constructor)
 ||Pattern||Iter||AvgHits||AvgMS(old)||AvgMS (new,total)||AvgMS_DFA||
 |7N|10|64.0|4155.9|38.6|20.3|
 |14N|10|0.0|2511.6|46.0|37.9| 
 |28N|10|0.0|2506.3|93.0|86.6|
 |56N|10|0.0|2524.5|304.4|298.5|
 as you can see, this prototype is no good yet, because it creates the DFA in 
 a slow way. right now it creates an NFA, and all this wasted time is in 
  NFA->DFA conversion.
 So, for a very long string, it just gets worse and worse. This has nothing to 
 do with lucene, and here you can see, the TermEnum is fast (AvgMS - 
 AvgMS_DFA), there is no problem there.
 instead we should just build a DFA to begin with, maybe with this paper: 
 http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.652
 we can precompute the tables with that algorithm up to some reasonable K, and 
 then I think we are ok.
 the paper references using http://portal.acm.org/citation.cfm?id=135907 for 
 linear minimization, if someone wants to implement this they should not worry 
 about minimization.
 in fact, we need to at some point determine if AutomatonQuery should even 
 minimize FSM's at all, or if it is simply enough for them to be deterministic 
 with no transitions to dead states. (The only code that actually assumes 
 minimal DFA is the Dumb vs Smart heuristic and this can be rewritten as a 
 summation easily). we need to benchmark really complex DFAs (i.e. write a 
 regex benchmark) to figure out if minimization is even helping right now.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online

[jira] Created: (LUCENE-2267) Add solr's artifact signing scripts into lucene's build.xml/common-build.xml

2010-02-17 Thread Uwe Schindler (JIRA)
Add solr's artifact signing scripts into lucene's build.xml/common-build.xml


 Key: LUCENE-2267
 URL: https://issues.apache.org/jira/browse/LUCENE-2267
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9.2, 3.0.1
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1


Solr has nice artifact signing scripts in its common-build.xml and build.xml.

For me as release manager of 3.0 it would have been good to have them also when 
building lucene artifacts. I will investigate how to add them to src artifacts 
and maven artifacts

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2268) Add test to check maven artifacts and their poms

2010-02-17 Thread Uwe Schindler (JIRA)
Add test to check maven artifacts and their poms


 Key: LUCENE-2268
 URL: https://issues.apache.org/jira/browse/LUCENE-2268
 Project: Lucene - Java
  Issue Type: Test
Reporter: Uwe Schindler


As release manager it is hard to find out if the maven artifacts work correctly. 
It would be good to have an ant task that executes maven with a .pom file that 
requires all contrib/core artifacts (or one for each contrib) that downloads 
the artifacts from the local dist/maven folder and builds that test project. 
This would require maven to execute the build script. Also it should pass the 
${version} ANT property to this pom.xml

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



RE: [VOTE] Lucene Java 2.9.2 and 3.0.1 release artifacts

2010-02-17 Thread Uwe Schindler
Hi Grant, inline:

 Inline
 
 On Feb 14, 2010, at 6:45 PM, Uwe Schindler wrote:
 
  Hallo Folks,
 
  I have posted a release candidate for both Lucene Java 2.9.2 and
 3.0.1 (which both have the same bug fix level, functionality and
 release announcement), build from revision 910082 of the corresponding
 branches. Thanks for all your help! Please test them and give your
 votes until Thursday morning, as the scheduled release date for both
 versions is Friday, Feb 19th, 2010. Only votes from Lucene PMC are
 binding, but everyone
  is welcome to check the release candidate and voice their approval or
 disapproval. The vote passes if at least three binding +1 votes are
 cast.
 
  We planned the parallel release with one announcement because of
 their parallel development / bug fix level to emphasize that they are
 equal except deprecation removal and Java 5 since major version 3.
 
  Please also read the attached release announcement (Open Document)
 and send it corrected back if you miss anything or want to improve my
 bad English :-)
 
  You find the artifacts here:
  http://people.apache.org/~uschindler/staging-area/lucene-292-301-
 take1-rev910082/
 
 
 Still working through this, but:
 
 Why are there SHA1 signatures for the 3.0.1 releases but not 2.9.2.  I
 don't think SHA1 is required (in fact, isn't it cracked?) so it may be
 fine to just remove it.

Md5 is cracked, sha1 is not (yet). We have had the sha1 files since 3.0 (they are 
generated by 3.0's build.xml since the upgrade to ANT 1.7, because of fixed ant task 
definitions). Also, all maven artifacts require sha1, so it's only 2.9's 
zip/tgz that are missing them. So I could simply generate them manually for 2.9.2. The 
current 3.0.0 release on apache.org already has sha1, so why remove them now?

 
  === Proposed Release Announcement ===
 
  Hello Lucene users,
 
  On behalf of the Lucene development community I would like to
 announce the release of Lucene Java versions 3.0.1 and 2.9.2:
 
  Both releases fix bugs in the previous versions, where 2.9.2 is the
 last release working with Java 1.4, still providing all deprecated APIs
 of the Lucene Java 2.x series. 3.0.1 has the same bug fix level, but
 requires Java 5 and is no longer compatible with code using deprecated
 APIs. The API was cleaned up to make use of Java 5's generics, varargs,
 enums, and autoboxing. New users of Lucene are advised to use version
 3.0.1 for new developments, because it has a clean, type safe new API.
 Users upgrading from 2.9.x can now remove unnecessary casts and add
 generics to their code, too.
 
  Important improvements in these releases are an increased maximum
 number of unique terms in each index segment. They also add fixes in
 IndexWriter’s commit and lost document deletes in near real-time
 indexing.
  Also lots of bugs in Contrib’s Analyzers package were fixed.
 
  How about:  "Several bugs in Contrib's Analyzers package were fixed"
 Also, do these changes imply reindexing is needed?  If so, we should
 say so.

I have to go through this, but reindexing is not required, because the bugs 
were mostly missing clearAttributes() calls leading to StopFilter integer 
overflows (with Version.LUCENE_30) and thus crashes during indexing. Robert?

As always we preserve index compatibility, so we would not change behavior 
without adding a new Version enum constant.
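
As a small illustration of that Version gating (a sketch using the standard 2.9/3.0-era constructor; the behavioral detail in the comment is the StopFilter point mentioned above): callers opt into changed behavior by passing a newer Version constant.

{code}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

public class VersionGateExample {
  public static void main(String[] args) {
    // Requesting LUCENE_30 activates the version-gated behavior (e.g. StopFilter
    // position increment handling); passing an older constant preserves the
    // previous behavior so existing applications and indexes stay compatible.
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
    System.out.println(analyzer.getClass().getName());
  }
}
{code}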

I will change the wording; Robert already sent me some grammar changes and a 
better overview using bulleted lists.

Thanks for reviewing,
Uwe


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-124) Fuzzy Searches do not get a boost of 0.2 as stated in Query Syntax doc

2010-02-16 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834163#action_12834163
 ] 

Uwe Schindler commented on LUCENE-124:
--

bq. I will wait till after the code freeze and commit this in a few days if no 
one objects. 

The code freeze only affects branches. Trunk is only frozen for fixes that 
should also go into branches.

 Fuzzy Searches do not get a boost of 0.2 as stated in Query Syntax doc
 

 Key: LUCENE-124
 URL: https://issues.apache.org/jira/browse/LUCENE-124
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 1.2
 Environment: Operating System: All
 Platform: All
Reporter: Cormac Twomey
Assignee: Robert Muir
Priority: Minor
 Attachments: LUCENE-124.patch


 According to the website's Query Syntax page, fuzzy searches are given a
 boost of 0.2. I've found this not to be the case, and have seen situations 
 where
 exact matches have lower relevance scores than fuzzy matches.
 Rather than getting a boost of 0.2, it appears that all variations on the term
 are first found in the model, where dist* > 0.5.
 * dist = levenshteinDistance / length of min(termlength, variantlength)
 This then leads to a boolean OR search of all the variant terms, each of whose
 boost is set to (dist - 0.5)*2 for that variant.
 The upshot of all of this is that there are many cases where a fuzzy match 
 will
 get a higher relevance score than an exact match.
 See this email for a test case to reproduce this anomalous behaviour.
 http://www.mail-archive.com/lucene-...@jakarta.apache.org/msg02819.html
 Here is a candidate patch to address the issue -
 *** lucene-1.2\src\java\org\apache\lucene\search\FuzzyTermEnum.java   Sun Jun 
 09
 13:47:54 2002
 --- lucene-1.2-modified\src\java\org\apache\lucene\search\FuzzyTermEnum.java  
 Fri
 Mar 14 11:37:20 2003
 ***
 *** 99,105 
   }
   
   final protected float difference() {
 ! return (float)((distance - FUZZY_THRESHOLD) * SCALE_FACTOR);
   }
   
   final public boolean endEnum() {
 --- 99,109 
   }
   
   final protected float difference() {
 ! if (distance == 1.0) {
 ! return 1.0f;
 ! }
 ! else
 ! return (float)((distance - FUZZY_THRESHOLD) * 
 SCALE_FACTOR);
   }
   
   final public boolean endEnum() {
 ***
 *** 111,117 
**/
   
   public static final double FUZZY_THRESHOLD = 0.5;
 ! public static final double SCALE_FACTOR = 1.0f / (1.0f - 
 FUZZY_THRESHOLD);
   
   /**
Finds and returns the smallest of three integers 
 --- 115,121 
**/
   
   public static final double FUZZY_THRESHOLD = 0.5;
 ! public static final double SCALE_FACTOR = 0.2f * (1.0f / (1.0f -
 FUZZY_THRESHOLD));
   
   /**
Finds and returns the smallest of three integers

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2089) explore using automaton for fuzzyquery

2010-02-16 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834676#action_12834676
 ] 

Uwe Schindler commented on LUCENE-2089:
---

bq. this is the patch to improve BasicOperations.concatenate. If the left side 
is a singleton automaton, then it has only one final state with no outgoing 
transitions. applying epsilon transitions with the NFA concatenation algorithm 
when the right side is a DFA always produces a resulting DFA, so mark it as 
such. 

Strange that the automaton author did not add this himself. Have you reported it upstream?
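
Just to illustrate the observation for readers who don't have the patch open (a 
hedged sketch against the automaton API bundled on the flex branch; the package and 
method names are assumptions on my side):

import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.BasicAutomata;
import org.apache.lucene.util.automaton.BasicOperations;
import org.apache.lucene.util.automaton.RegExp;

public class ConcatSketch {
  public static void main(String[] args) {
    // A singleton automaton has exactly one final state with no outgoing
    // transitions, so the epsilon links added by NFA-style concatenation
    // cannot introduce nondeterminism when the right side is a DFA.
    Automaton prefix = BasicAutomata.makeString("foo");
    Automaton suffix = new RegExp("(b|c)*d").toAutomaton();
    Automaton concat = BasicOperations.concatenate(prefix, suffix);
    // With the patch, concatenate() can mark the result deterministic up
    // front instead of forcing a later NFA-to-DFA conversion.
    System.out.println("deterministic: " + concat.isDeterministic());
  }
}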

 explore using automaton for fuzzyquery
 --

 Key: LUCENE-2089
 URL: https://issues.apache.org/jira/browse/LUCENE-2089
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: Flex Branch
Reporter: Robert Muir
Assignee: Mark Miller
Priority: Minor
 Fix For: Flex Branch

 Attachments: LUCENE-2089.patch, LUCENE-2089.patch, LUCENE-2089.patch, 
 LUCENE-2089.patch, LUCENE-2089_concat.patch, Moman-0.2.1.tar.gz, 
 TestFuzzy.java


 Mark brought this up on LUCENE-1606 (i will assign this to him, I know he is 
 itching to write that nasty algorithm)
 we can optimize fuzzyquery by using AutomatonTermsEnum, here is my idea
 * up front, calculate the maximum required K edits needed to match the users 
 supplied float threshold.
 * for at least small common E up to some max K (1,2,3, etc) we should create 
 a DFA for each E. 
 if the required E is above our supported max, we use dumb mode at first (no 
 seeking, no DFA, just brute force like now).
 As the pq fills, we swap progressively lower DFAs into the enum, based upon 
 the lowest score in the pq.
 This should work well on avg, at high E, you will typically fill the pq very 
 quickly since you will match many terms. 
 This not only provides a mechanism to switch to more efficient DFAs during 
 enumeration, but also to switch from dumb mode to smart mode.
 i modified my wildcard benchmark to generate random fuzzy queries.
 * Pattern: 7N stands for NNN, etc.
 * AvgMS_DFA: this is the time spent creating the automaton (constructor)
 ||Pattern||Iter||AvgHits||AvgMS(old)||AvgMS (new,total)||AvgMS_DFA||
 |7N|10|64.0|4155.9|38.6|20.3|
 |14N|10|0.0|2511.6|46.0|37.9| 
 |28N|10|0.0|2506.3|93.0|86.6|
 |56N|10|0.0|2524.5|304.4|298.5|
 as you can see, this prototype is no good yet, because it creates the DFA in 
 a slow way. right now it creates an NFA, and all this wasted time is in 
 NFA-DFA conversion.
 So, for a very long string, it just gets worse and worse. This has nothing to 
 do with lucene, and here you can see, the TermEnum is fast (AvgMS - 
 AvgMS_DFA), there is no problem there.
 instead we should just build a DFA to begin with, maybe with this paper: 
 http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.652
 we can precompute the tables with that algorithm up to some reasonable K, and 
 then I think we are ok.
 the paper references using http://portal.acm.org/citation.cfm?id=135907 for 
 linear minimization, if someone wants to implement this they should not worry 
 about minimization.
 in fact, we need to at some point determine if AutomatonQuery should even 
 minimize FSM's at all, or if it is simply enough for them to be deterministic 
 with no transitions to dead states. (The only code that actually assumes 
 minimal DFA is the Dumb vs Smart heuristic and this can be rewritten as a 
 summation easily). we need to benchmark really complex DFAs (i.e. write a 
 regex benchmark) to figure out if minimization is even helping right now.
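
For illustration, the first bullet (deriving the maximum number of edits K from the 
user-supplied similarity threshold) amounts to roughly the following, using 
FuzzyQuery's usual similarity definition (a sketch only, not the committed code):

public class MaxEditsSketch {
  // FuzzyQuery-style similarity is roughly 1 - editDistance / termLength, so
  // the largest edit distance that can still reach minimumSimilarity is:
  static int maxEdits(float minimumSimilarity, int termLength) {
    // similarity >= minimumSimilarity  <=>  edits <= (1 - minimumSimilarity) * termLength
    return (int) ((1.0f - minimumSimilarity) * termLength);
  }

  public static void main(String[] args) {
    // e.g. the default threshold 0.5 on a 7-character term allows up to 3
    // edits, so precomputed DFAs for K = 1, 2, 3 would cover this query.
    System.out.println(maxEdits(0.5f, 7));
  }
}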

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2089) explore using automaton for fuzzyquery

2010-02-15 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2089:
--

Affects Version/s: Flex Branch
Fix Version/s: Flex Branch

 explore using automaton for fuzzyquery
 --

 Key: LUCENE-2089
 URL: https://issues.apache.org/jira/browse/LUCENE-2089
 Project: Lucene - Java
  Issue Type: Wish
  Components: Search
Affects Versions: Flex Branch
Reporter: Robert Muir
Assignee: Mark Miller
Priority: Minor
 Fix For: Flex Branch

 Attachments: LUCENE-2089.patch, Moman-0.2.1.tar.gz, TestFuzzy.java


 Mark brought this up on LUCENE-1606 (i will assign this to him, I know he is 
 itching to write that nasty algorithm)
 we can optimize fuzzyquery by using AutomatonTermsEnum, here is my idea
 * up front, calculate the maximum required K edits needed to match the users 
 supplied float threshold.
 * for at least small common E up to some max K (1,2,3, etc) we should create 
 a DFA for each E. 
 if the required E is above our supported max, we use dumb mode at first (no 
 seeking, no DFA, just brute force like now).
 As the pq fills, we swap progressively lower DFAs into the enum, based upon 
 the lowest score in the pq.
 This should work well on avg, at high E, you will typically fill the pq very 
 quickly since you will match many terms. 
 This not only provides a mechanism to switch to more efficient DFAs during 
 enumeration, but also to switch from dumb mode to smart mode.
 i modified my wildcard benchmark to generate random fuzzy queries.
 * Pattern: 7N stands for NNN, etc.
 * AvgMS_DFA: this is the time spent creating the automaton (constructor)
 ||Pattern||Iter||AvgHits||AvgMS(old)||AvgMS (new,total)||AvgMS_DFA||
 |7N|10|64.0|4155.9|38.6|20.3|
 |14N|10|0.0|2511.6|46.0|37.9| 
 |28N|10|0.0|2506.3|93.0|86.6|
 |56N|10|0.0|2524.5|304.4|298.5|
 as you can see, this prototype is no good yet, because it creates the DFA in 
 a slow way. right now it creates an NFA, and all this wasted time is in 
 NFA-DFA conversion.
 So, for a very long string, it just gets worse and worse. This has nothing to 
 do with lucene, and here you can see, the TermEnum is fast (AvgMS - 
 AvgMS_DFA), there is no problem there.
 instead we should just build a DFA to begin with, maybe with this paper: 
 http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.652
 we can precompute the tables with that algorithm up to some reasonable K, and 
 then I think we are ok.
 the paper references using http://portal.acm.org/citation.cfm?id=135907 for 
 linear minimization, if someone wants to implement this they should not worry 
 about minimization.
 in fact, we need to at some point determine if AutomatonQuery should even 
 minimize FSM's at all, or if it is simply enough for them to be deterministic 
 with no transitions to dead states. (The only code that actually assumes 
 minimal DFA is the Dumb vs Smart heuristic and this can be rewritten as a 
 summation easily). we need to benchmark really complex DFAs (i.e. write a 
 regex benchmark) to figure out if minimization is even helping right now.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Assigned: (LUCENE-329) Fuzzy query scoring issues

2010-02-15 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-329:


Assignee: (was: Lucene Developers)

 Fuzzy query scoring issues
 --

 Key: LUCENE-329
 URL: https://issues.apache.org/jira/browse/LUCENE-329
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 1.2rc5
 Environment: Operating System: All
 Platform: All
Reporter: Mark Harwood
Priority: Minor
 Attachments: LUCENE-329.patch, patch.txt


 Queries which automatically produce multiple terms (wildcard, range, prefix, 
 fuzzy etc.) currently suffer from two problems:
 1) Scores for matching documents are significantly smaller than term queries 
 because of the volume of terms introduced (A match on query Foo~ is 0.1 
 whereas a match on query Foo is 1).
 2) The rarer forms of expanded terms are favoured over those of more common 
 forms because of the IDF. When using Fuzzy queries for example, rare mis-
 spellings typically appear in results before the more common correct 
 spellings.
 I will attach a patch that corrects the issues identified above by 
 1) Overriding Similarity.coord to counteract the downplaying of scores 
 introduced by expanding terms.
 2) Taking the IDF factor of the most common form of expanded terms as the 
 basis of scoring all other expanded terms.
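
For readers who want to see what point 1 amounts to in code, overriding the 
coordination factor looks roughly like this (sketch only; the attached patch may do 
it differently):

import org.apache.lucene.search.DefaultSimilarity;

// Neutralize the coord() factor so queries that expand into many terms are not
// penalized merely for matching a small fraction of the generated terms.
public class NoCoordSimilarity extends DefaultSimilarity {
  @Override
  public float coord(int overlap, int maxOverlap) {
    return 1.0f; // the default implementation returns overlap / (float) maxOverlap
  }
}

// Usage sketch: searcher.setSimilarity(new NoCoordSimilarity());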

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Assigned: (LUCENE-124) Fuzzy Searches do not get a boost of 0.2 as stated in Query Syntax doc

2010-02-15 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-124:


Assignee: (was: Lucene Developers)

 Fuzzy Searches do not get a boost of 0.2 as stated in Query Syntax doc
 

 Key: LUCENE-124
 URL: https://issues.apache.org/jira/browse/LUCENE-124
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 1.2
 Environment: Operating System: All
 Platform: All
Reporter: Cormac Twomey
Priority: Minor

 According to the website's Query Syntax page, fuzzy searches are given a
 boost of 0.2. I've found this not to be the case, and have seen situations 
 where
 exact matches have lower relevance scores than fuzzy matches.
 Rather than getting a boost of 0.2, it appears that all variations on the term
 are first found in the model, where dist* > 0.5.
 * dist = levenshteinDistance / length of min(termlength, variantlength)
 This then leads to a boolean OR search of all the variant terms, each of whose
 boost is set to (dist - 0.5)*2 for that variant.
 The upshot of all of this is that there are many cases where a fuzzy match 
 will
 get a higher relevance score than an exact match.
 See this email for a test case to reproduce this anomalous behaviour.
 http://www.mail-archive.com/lucene-...@jakarta.apache.org/msg02819.html
 Here is a candidate patch to address the issue -
 *** lucene-1.2\src\java\org\apache\lucene\search\FuzzyTermEnum.java   Sun Jun 
 09
 13:47:54 2002
 --- lucene-1.2-modified\src\java\org\apache\lucene\search\FuzzyTermEnum.java  
 Fri
 Mar 14 11:37:20 2003
 ***
 *** 99,105 
   }
   
   final protected float difference() {
 ! return (float)((distance - FUZZY_THRESHOLD) * SCALE_FACTOR);
   }
   
   final public boolean endEnum() {
 --- 99,109 
   }
   
   final protected float difference() {
 ! if (distance == 1.0) {
 ! return 1.0f;
 ! }
 ! else
 ! return (float)((distance - FUZZY_THRESHOLD) * 
 SCALE_FACTOR);
   }
   
   final public boolean endEnum() {
 ***
 *** 111,117 
**/
   
   public static final double FUZZY_THRESHOLD = 0.5;
 ! public static final double SCALE_FACTOR = 1.0f / (1.0f - 
 FUZZY_THRESHOLD);
   
   /**
Finds and returns the smallest of three integers 
 --- 115,121 
**/
   
   public static final double FUZZY_THRESHOLD = 0.5;
 ! public static final double SCALE_FACTOR = 0.2f * (1.0f / (1.0f -
 FUZZY_THRESHOLD));
   
   /**
Finds and returns the smallest of three integers

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



RE: [VOTE] Lucene Java 2.9.2 and 3.0.1 release artifacts

2010-02-15 Thread Uwe Schindler
As people.apache.org is down, here is an alternate location with the same 
artifacts:

http://alpha.thetaphi.de/lucene-292-301-take1-rev910082/

Happy testing!

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 -Original Message-
 From: Uwe Schindler [mailto:u...@thetaphi.de]
 Sent: Monday, February 15, 2010 12:46 AM
 To: gene...@lucene.apache.org; java-dev@lucene.apache.org
 Subject: [VOTE] Lucene Java 2.9.2 and 3.0.1 release artifacts
 
 Hallo Folks,
 
 I have posted a release candidate for both Lucene Java 2.9.2 and 3.0.1
 (which both have the same bug fix level, functionality and release
 announcement), build from revision 910082 of the corresponding
 branches. Thanks for all your help! Please test them and give your
 votes until Thursday morning, as the scheduled release date for both
 versions is Friday, Feb 19th, 2010. Only votes from Lucene PMC are
 binding, but everyone
 is welcome to check the release candidate and voice their approval or
 disapproval. The vote passes if at least three binding +1 votes are
 cast.
 
 We planned the parallel release with one announcement because of their
 parallel development / bug fix level to emphasize that they are equal
 except deprecation removal and Java 5 since major version 3.
 
 Please also read the attached release announcement (Open Document) and
 send it corrected back if you miss anything or want to improve my bad
 English :-)
 
 You find the artifacts here:
 http://people.apache.org/~uschindler/staging-area/lucene-292-301-take1-
 rev910082/
 
 Maven repo:
 http://people.apache.org/~uschindler/staging-area/lucene-292-301-take1-
 rev910082/maven/
 
 The changes are here:
 http://people.apache.org/~uschindler/staging-area/lucene-292-301-take1-
 rev910082/changes-2.9.2/Changes.html
 http://people.apache.org/~uschindler/staging-area/lucene-292-301-take1-
 rev910082/changes-2.9.2/Contrib-Changes.html
 
 http://people.apache.org/~uschindler/staging-area/lucene-292-301-take1-
 rev910082/changes-3.0.1/Changes.html
 http://people.apache.org/~uschindler/staging-area/lucene-292-301-take1-
 rev910082/changes-3.0.1/Contrib-Changes.html
 
 Uwe
 
 === Proposed Release Announcement ===
 
 Hello Lucene users,
 
 On behalf of the Lucene development community I would like to announce
 the release of Lucene Java versions 3.0.1 and 2.9.2:
 
 Both releases fix bugs in the previous versions, where 2.9.2 is the
 last release working with Java 1.4, still providing all deprecated APIs
 of the Lucene Java 2.x series. 3.0.1 has the same bug fix level, but
 requires Java 5 and is no longer compatible with code using deprecated
 APIs. The API was cleaned up to make use of Java 5's generics, varargs,
 enums, and autoboxing. New users of Lucene are advised to use version
 3.0.1 for new developments, because it has a clean, type safe new API.
 Users upgrading from 2.9.x can now remove unnecessary casts and add
 generics to their code, too.
 
 Important improvements in these releases are a increased maximum number
 of unique terms in each index segment. They also add fixes in
 IndexWriter’s commit and lost document deletes in near real-time
 indexing. Also lots of bugs in Contrib’s Analyzers package were fixed.
 Additionally, the 3.0.1 release restored some public methods, that get
 lost during deprecation removal. If you are using Lucene in a web
 application environment, you will notice that the new Attribute-based
 TokenStream API now works correct with different class loaders.
 Both releases are fully compatible with the corresponding previous
 versions. We strongly recommend upgrading to 2.9.2 if you are using
 2.9.1 or 2.9.0; and to 3.0.1 if you are using 3.0.0.
 
 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de
 



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



RE: (LUCENE-1844) Speed up junit tests

2010-02-14 Thread Uwe Schindler
At least we should check that no core test sets any static defaults without a 
try...finally block to restore them. Is there any way to check this automatically 
inside Eclipse or other IDEs?
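
To be clear, this is the pattern I mean (illustrative sketch only):

import org.apache.lucene.search.BooleanQuery;

public class StaticDefaultExample {
  // A test that changes a static default must restore it in finally, otherwise
  // later tests running in the same JVM see the modified value.
  public void testWithManyClauses() throws Exception {
    int saved = BooleanQuery.getMaxClauseCount();
    try {
      BooleanQuery.setMaxClauseCount(4096);
      // ... run the actual test ...
    } finally {
      BooleanQuery.setMaxClauseCount(saved); // restore for the next test
    }
  }
}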

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 -Original Message-
 From: Michael McCandless [mailto:luc...@mikemccandless.com]
 Sent: Sunday, February 14, 2010 11:43 AM
 To: java-dev@lucene.apache.org
 Subject: Re: (LUCENE-1844) Speed up junit tests
 
 Wow -- this is MUCH faster!  I think we should switch...
 
 It seems like we use a batchtest for all core tests, then for all
 back-compat tests, then once per contrib package?  Ie, so ant
 test-core uses one jvm?
 
 I think we should simply fix any badly behaved tests (that don't
 restore statics).  It's impressive we already have no test failures
 when we do this... I guess our tests are already cleaning things up
 (though also probably not often changing global state, or, changing it
 in a way that'd lead other tests to fail).
 
 Mike
 
 On Sat, Feb 13, 2010 at 5:23 PM, Robert Muir rcm...@gmail.com wrote:
  On Fri, Nov 27, 2009 at 1:27 PM, Michael McCandless
  luc...@mikemccandless.com wrote:
 
  Also one thing I'd love to try is NOT forking the JVM for each test
  (fork=no in the junit task).  I wonder how much time that'd buy...
 
 
  it shaves off a good deal of time on my machine.
 
  'ant test-core': 4 minutes, 39 seconds - 3 minutes, 3 seconds
  'ant test':  11 minutes, 8 seconds - 7 minutes, 13 seconds
 
  however, it makes me a little nervous because I'm not sure all the tests
  clean up nicely if they change statics and stuff.
  anyway, here's the trivial patch (you don't want fork=no, because it turns
  off assertions)
 
  Index: common-build.xml
  ===================================================================
  --- common-build.xml    (revision 909395)
  +++ common-build.xml    (working copy)
  @@ -398,7 +398,7 @@
       </condition>
       <mkdir dir="@{junit.output.dir}"/>
       <junit printsummary="off" haltonfailure="no" maxmemory="512M"
  -            errorProperty="tests.failed" failureProperty="tests.failed">
  +            errorProperty="tests.failed" failureProperty="tests.failed" forkmode="perBatch">
         <classpath refid="@{junit.classpath}"/>
         <assertions>
           <enable package="org.apache.lucene"/>
 
  --
  Robert Muir
  rcm...@gmail.com
 
 
 -
 To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1941) MinPayloadFunction returns 0 when only one payload is present

2010-02-14 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833551#action_12833551
 ] 

Uwe Schindler commented on LUCENE-1941:
---

As there is no real test available (for the whole class except the ctor, as Mark 
Miller figured out yesterday), I think the attached fix is OK for the moment, and I 
would like to apply it to 2.9, 3.0 and trunk so that the pending 2.9.2 and 3.0.1 
releases can go out.

If nobody is against it (Erik?), I would like to apply this patch and publish the 
artifacts for the PMC vote this afternoon. I will also open a new issue requesting 
tests for these classes :-)

 MinPayloadFunction returns 0 when only one payload is present
 -

 Key: LUCENE-1941
 URL: https://issues.apache.org/jira/browse/LUCENE-1941
 Project: Lucene - Java
  Issue Type: Bug
  Components: Query/Scoring
Affects Versions: 2.9, 3.0
Reporter: Erik Hatcher
Assignee: Uwe Schindler
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-1941.patch, LUCENE-1941.patch


 In some experiments with payload scoring through PayloadTermQuery, I'm seeing 
 0 returned when using MinPayloadFunction.  I believe there is a bug there.  
 No time at the moment to flesh out a unit test, but wanted to report it for 
 tracking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2264) Add missing tests for PayloadXxxQuery

2010-02-14 Thread Uwe Schindler (JIRA)
Add missing tests for PayloadXxxQuery
-

 Key: LUCENE-2264
 URL: https://issues.apache.org/jira/browse/LUCENE-2264
 Project: Lucene - Java
  Issue Type: Test
  Components: Search
Reporter: Uwe Schindler
Priority: Minor
 Fix For: 3.1


This is a follow-up to LUCENE-1941 and the discussion on IRC. The Payload 
queries have no real working tests; in particular, tests are missing for the 
Min/Max/Avg functions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1941) MinPayloadFunction returns 0 when only one payload is present

2010-02-14 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833564#action_12833564
 ] 

Uwe Schindler commented on LUCENE-1941:
---

I created LUCENE-2264 for the tests.

I will no proceed with applying the patches and merging to 2.9/3.0.

 MinPayloadFunction returns 0 when only one payload is present
 -

 Key: LUCENE-1941
 URL: https://issues.apache.org/jira/browse/LUCENE-1941
 Project: Lucene - Java
  Issue Type: Bug
  Components: Query/Scoring
Affects Versions: 2.9, 3.0
Reporter: Erik Hatcher
Assignee: Uwe Schindler
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-1941.patch, LUCENE-1941.patch


 In some experiments with payload scoring through PayloadTermQuery, I'm seeing 
 0 returned when using MinPayloadFunction.  I believe there is a bug there.  
 No time at the moment to flesh out a unit test, but wanted to report it for 
 tracking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (LUCENE-1941) MinPayloadFunction returns 0 when only one payload is present

2010-02-14 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833564#action_12833564
 ] 

Uwe Schindler edited comment on LUCENE-1941 at 2/14/10 12:52 PM:
-

I created LUCENE-2264 for the tests.

I will now proceed with applying the patches and merging to 2.9/3.0.

  was (Author: thetaphi):
I created LUCENE-2264 for the tests.

I will no proceed with applying the patches and merging to 2.9/3.0.
  
 MinPayloadFunction returns 0 when only one payload is present
 -

 Key: LUCENE-1941
 URL: https://issues.apache.org/jira/browse/LUCENE-1941
 Project: Lucene - Java
  Issue Type: Bug
  Components: Query/Scoring
Affects Versions: 2.9, 3.0
Reporter: Erik Hatcher
Assignee: Uwe Schindler
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-1941.patch, LUCENE-1941.patch


 In some experiments with payload scoring through PayloadTermQuery, I'm seeing 
 0 returned when using MinPayloadFunction.  I believe there is a bug there.  
 No time at the moment to flesh out a unit test, but wanted to report it for 
 tracking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1941) MinPayloadFunction returns 0 when only one payload is present

2010-02-14 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1941:
--

Attachment: LUCENE-1941.patch

Patch with a CHANGES.txt entry in the new 3.0.1/2.9.2 section of the restructured 
trunk changes.

 MinPayloadFunction returns 0 when only one payload is present
 -

 Key: LUCENE-1941
 URL: https://issues.apache.org/jira/browse/LUCENE-1941
 Project: Lucene - Java
  Issue Type: Bug
  Components: Query/Scoring
Affects Versions: 2.9, 3.0
Reporter: Erik Hatcher
Assignee: Uwe Schindler
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-1941.patch, LUCENE-1941.patch, LUCENE-1941.patch


 In some experiments with payload scoring through PayloadTermQuery, I'm seeing 
 0 returned when using MinPayloadFunction.  I believe there is a bug there.  
 No time at the moment to flesh out a unit test, but wanted to report it for 
 tracking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



RE: (LUCENE-1844) Speed up junit tests

2010-02-14 Thread Uwe Schindler
That looks exciting! Too bad I don't have IntelliJ; maybe we can use it somehow!

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 -Original Message-
 From: Steven A Rowe [mailto:sar...@syr.edu]
 Sent: Sunday, February 14, 2010 4:52 PM
 To: java-dev@lucene.apache.org
 Subject: RE: (LUCENE-1844) Speed up junit tests
 
 Hi Uwe,
 
 On 02/14/2010 at 5:53 AM, Uwe Schindler wrote:
  At least we should check all core tests to not set any static
 defaults
  without try...finally! Are there any possibilities inside
  Eclipse/other-IDEs to check this?
 
 IntelliJ has something called structural search and replace (SSR) -
 it could probably do something like what you want (I've only used it
 once, so I'm afraid I can't be of much assistance figuring out an
 appropriate expression):
 
 http://www.jetbrains.com/idea/documentation/ssr.html
 
 Steve


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-1941) MinPayloadFunction returns 0 when only one payload is present

2010-02-14 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-1941.
---

Resolution: Fixed

Committed trunk revision: 910034
Committed 3.0 branch revision: 910037
Committed 2.9 branch revision: 910038

 MinPayloadFunction returns 0 when only one payload is present
 -

 Key: LUCENE-1941
 URL: https://issues.apache.org/jira/browse/LUCENE-1941
 Project: Lucene - Java
  Issue Type: Bug
  Components: Query/Scoring
Affects Versions: 2.9, 3.0
Reporter: Erik Hatcher
Assignee: Uwe Schindler
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-1941.patch, LUCENE-1941.patch, LUCENE-1941.patch


 In some experiments with payload scoring through PayloadTermQuery, I'm seeing 
 0 returned when using MinPayloadFunction.  I believe there is a bug there.  
 No time at the moment to flesh out a unit test, but wanted to report it for 
 tracking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2266) problem with edgengramtokenfilter and highlighter

2010-02-14 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833626#action_12833626
 ] 

Uwe Schindler commented on LUCENE-2266:
---

As this patch is really simple, I have no problem with quickly putting it into 
2.9.2. Robert, as we are in code freeze, I would like to apply it.

 problem with edgengramtokenfilter and highlighter
 -

 Key: LUCENE-2266
 URL: https://issues.apache.org/jira/browse/LUCENE-2266
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/analyzers
Affects Versions: 2.9.1
Reporter: Joe Calderon
Assignee: Robert Muir
Priority: Minor
 Attachments: LUCENE-2266.patch, LUCENE-2266.patch


 I ran into a problem while using EdgeNGramTokenFilter: it reports incorrect 
 offsets when generating tokens. More specifically, all the tokens have 0 as 
 start offset and the term length as end offset, which leads to goofy 
 highlighting behavior when creating edge grams for tokens beyond the first 
 one. I created a small patch that takes the start offset of the original 
 token into account and adds it to the reported start/end offsets.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2266) problem with edgengramtokenfilter and highlighter

2010-02-14 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2266:
--

Fix Version/s: 3.1
   3.0.1
   2.9.2

 problem with edgengramtokenfilter and highlighter
 -

 Key: LUCENE-2266
 URL: https://issues.apache.org/jira/browse/LUCENE-2266
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/analyzers
Affects Versions: 2.9.1
Reporter: Joe Calderon
Assignee: Robert Muir
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2266.patch, LUCENE-2266.patch


 I ran into a problem while using EdgeNGramTokenFilter: it reports incorrect 
 offsets when generating tokens. More specifically, all the tokens have 0 as 
 start offset and the term length as end offset, which leads to goofy 
 highlighting behavior when creating edge grams for tokens beyond the first 
 one. I created a small patch that takes the start offset of the original 
 token into account and adds it to the reported start/end offsets.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2266) problem with edgengramtokenfilter and highlighter

2010-02-14 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-2266.
---

Resolution: Fixed

Committed trunk revision: 910078
Committed 3.0 revision: 910080
Committed 2.9 revision: 910082

Thanks Joe  Robert. Now I can start the PMC votes of Lucene 2.9.2 and 3.0.1!

 problem with edgengramtokenfilter and highlighter
 -

 Key: LUCENE-2266
 URL: https://issues.apache.org/jira/browse/LUCENE-2266
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/analyzers
Affects Versions: 2.9.1
Reporter: Joe Calderon
Assignee: Robert Muir
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2266.patch, LUCENE-2266.patch


 I ran into a problem while using EdgeNGramTokenFilter: it reports incorrect 
 offsets when generating tokens. More specifically, all the tokens have 0 as 
 start offset and the term length as end offset, which leads to goofy 
 highlighting behavior when creating edge grams for tokens beyond the first 
 one. I created a small patch that takes the start offset of the original 
 token into account and adds it to the reported start/end offsets.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[VOTE] Lucene Java 2.9.2 and 3.0.1 release artifacts

2010-02-14 Thread Uwe Schindler
Hello folks,

I have posted a release candidate for both Lucene Java 2.9.2 and 3.0.1 (which 
both have the same bug fix level, functionality and release announcement), built 
from revision 910082 of the corresponding branches. Thanks for all your help! 
Please test them and give your votes until Thursday morning, as the scheduled 
release date for both versions is Friday, Feb 19th, 2010. Only votes from the 
Lucene PMC are binding, but everyone is welcome to check the release candidate 
and voice their approval or disapproval. The vote passes if at least three 
binding +1 votes are cast.

We planned the parallel release with a single announcement because of their 
parallel development and bug fix level, to emphasize that they are identical 
except for the deprecation removals and the Java 5 requirement introduced with 
major version 3.

Please also read the attached release announcement (Open Document) and send it 
back corrected if you miss anything or want to improve my bad English :-)

You find the artifacts here:
http://people.apache.org/~uschindler/staging-area/lucene-292-301-take1-rev910082/

Maven repo:
http://people.apache.org/~uschindler/staging-area/lucene-292-301-take1-rev910082/maven/
 

The changes are here:
http://people.apache.org/~uschindler/staging-area/lucene-292-301-take1-rev910082/changes-2.9.2/Changes.html
http://people.apache.org/~uschindler/staging-area/lucene-292-301-take1-rev910082/changes-2.9.2/Contrib-Changes.html

http://people.apache.org/~uschindler/staging-area/lucene-292-301-take1-rev910082/changes-3.0.1/Changes.html
http://people.apache.org/~uschindler/staging-area/lucene-292-301-take1-rev910082/changes-3.0.1/Contrib-Changes.html

Uwe

=== Proposed Release Announcement ===

Hello Lucene users,

On behalf of the Lucene development community I would like to announce the 
release of Lucene Java versions 3.0.1 and 2.9.2:

Both releases fix bugs in the previous versions: 2.9.2 is the last release 
working with Java 1.4 and still provides all deprecated APIs of the Lucene Java 
2.x series; 3.0.1 has the same bug fix level, but requires Java 5 and is no 
longer compatible with code using deprecated APIs. The API was cleaned up to 
make use of Java 5's generics, varargs, enums, and autoboxing. New users of 
Lucene are advised to use version 3.0.1 for new developments, because it has a 
clean, type-safe new API. Users upgrading from 2.9.x can now remove unnecessary 
casts and add generics to their code, too.

An important improvement in these releases is an increased maximum number of 
unique terms per index segment. They also fix problems with IndexWriter's commit 
and with lost document deletes in near-real-time indexing, and several bugs in 
Contrib's Analyzers package were fixed. Additionally, the 3.0.1 release restores 
some public methods that were lost during deprecation removal. If you are using 
Lucene in a web application environment, you will notice that the new 
Attribute-based TokenStream API now works correctly with different class loaders.
Both releases are fully compatible with the corresponding previous versions. We 
strongly recommend upgrading to 2.9.2 if you are using 2.9.1 or 2.9.0, and to 
3.0.1 if you are using 3.0.0.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de




release-note.odt
Description: application/vnd.oasis.opendocument.text

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Updated: (LUCENE-2144) InstantiatedIndexReader does not handle #termDocs(null) correct (AllTermDocs)

2010-02-13 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2144:
--

Fix Version/s: 2.9.2

merge back also to 2.9.2

 InstantiatedIndexReader does not handle #termDocs(null) correct (AllTermDocs)
 -

 Key: LUCENE-2144
 URL: https://issues.apache.org/jira/browse/LUCENE-2144
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/*
Affects Versions: 2.9, 2.9.1, 3.0
Reporter: Karl Wettin
Assignee: Michael McCandless
Priority: Critical
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2144-30.patch, LUCENE-2144.txt


 This patch contains core changes so someone else needs to commit it.
 Due to the incompatible #termDocs(null) behaviour at least MatchAllDocsQuery, 
 FieldCacheRangeFilter and ValueSourceQuery fails using II since 2.9.
 AllTermDocs now has a superclass, AbstractAllTermDocs that also 
 InstantiatedAllTermDocs extend.
 Also:
  * II-tests made less plausable to pass on future incompatible changes to 
 TermDocs and TermEnum
  * IITermDocs#skipTo and #next mimics the behaviour of document posisioning 
 from SegmentTermDocs#dito when returning false
  * II now uses BitVector rather than sets for deleted documents
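
For context, the termDocs(null) contract that callers like MatchAllDocsQuery and 
FieldCacheRangeFilter rely on is simply "enumerate all non-deleted documents"; a 
usage sketch (illustration only, not part of the patch):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermDocs;

public class AllDocsSketch {
  // Counts the live (non-deleted) documents by iterating termDocs(null),
  // which is the AllTermDocs-style enumeration the fix makes II honor.
  static int countLiveDocs(IndexReader reader) throws IOException {
    TermDocs td = reader.termDocs(null);
    int count = 0;
    try {
      while (td.next()) {
        count++; // td.doc() is a live document id
      }
    } finally {
      td.close();
    }
    return count;
  }
}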

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2165) SnowballAnalyzer lacks a constructor that takes a Set of Stop Words

2010-02-13 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2165:
--

Fix Version/s: 2.9.2

backport

 SnowballAnalyzer lacks a constructor that takes a Set of Stop Words
 ---

 Key: LUCENE-2165
 URL: https://issues.apache.org/jira/browse/LUCENE-2165
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/analyzers
Affects Versions: 2.9.1, 3.0
Reporter: Nick Burch
Assignee: Robert Muir
Priority: Minor
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2165.patch


 As discussed on the java-user list, the SnowballAnalyzer has been updated to 
 use a Set of stop words. However, there is no constructor which accepts a 
 Set, there's only the original String[] one
 This is an issue, because most of the common sources of stop words (eg 
 StopAnalyzer) have deprecated their String[] stop word lists, and moved over 
 to Sets (eg StopAnalyzer.ENGLISH_STOP_WORDS_SET). So, for now, you either 
 have to use a deprecated field on StopAnalyzer, or manually turn the Set into 
 an array so you can pass it to the SnowballAnalyzer
 I would suggest that a constructor is added to SnowballAnalyzer which accepts 
 a Set. Not sure if the old String[] one should be deprecated or not.
 A sample patch against 2.9.1 to add the constructor is:
 --- SnowballAnalyzer.java.orig  2009-12-15 11:14:08.0 +
 +++ SnowballAnalyzer.java   2009-12-14 12:58:37.0 +
 @@ -67,6 +67,12 @@
  stopSet = StopFilter.makeStopSet(stopWords);
}
  
 +  /** Builds the named analyzer with the given stop words. */
 +  public SnowballAnalyzer(Version matchVersion, String name, Set 
 stopWordsSet) {
 +this(matchVersion, name);
 +stopSet = stopWordsSet;
 +  }
 +
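
With the proposed constructor in place, usage would look like this (sketch; the 
three-argument constructor is the addition suggested above, not existing API):

import java.util.Set;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
import org.apache.lucene.util.Version;

public class SnowballUsageSketch {
  public static void main(String[] args) {
    // No need to copy the Set into a String[] anymore.
    Set<?> stopWords = StopAnalyzer.ENGLISH_STOP_WORDS_SET;
    SnowballAnalyzer analyzer =
        new SnowballAnalyzer(Version.LUCENE_30, "English", stopWords);
    System.out.println(analyzer);
  }
}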

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Assigned: (LUCENE-1941) MinPayloadFunction returns 0 when only one payload is present

2010-02-13 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-1941:
-

Assignee: Uwe Schindler

 MinPayloadFunction returns 0 when only one payload is present
 -

 Key: LUCENE-1941
 URL: https://issues.apache.org/jira/browse/LUCENE-1941
 Project: Lucene - Java
  Issue Type: Bug
  Components: Query/Scoring
Affects Versions: 2.9, 3.0
Reporter: Erik Hatcher
Assignee: Uwe Schindler
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-1941.patch, LUCENE-1941.patch


 In some experiments with payload scoring through PayloadTermQuery, I'm seeing 
 0 returned when using MinPayloadFunction.  I believe there is a bug there.  
 No time at the moment to flesh out a unit test, but wanted to report it for 
 tracking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Release of 2.9.2 and 3.0.1 in progress - commit freeze

2010-02-13 Thread Uwe Schindler
Hi all,

the release of 2.9.2 and 3.0.1 is in progress. I merged all CHANGES.txt 
entries, merged the remaining bug fixes and prepared the version numbers in both 
branches. The only missing fix is 
https://issues.apache.org/jira/browse/LUCENE-1941, which is in progress; I will 
backport and commit it when the tests are finished.

Please do not commit anything to the branches or to trunk if it is marked 
fix-for 2.9.x or 3.0.x. All other changes can be committed.

I will create the final artifacts for the vote tomorrow and plan to release on 
Friday.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



RE: Release of 2.9.2 and 3.0.1 in progress - commit freeze

2010-02-13 Thread Uwe Schindler
 Please do not commit anything to the branches and trunk, if it is fix-
 for 2.9.x or 3.0.x. All other changes can be committed.

Of course, changes may only be committed to *trunk* if they are not also 
fix-for 2.9 and 3.0. :-)


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1941) MinPayloadFunction returns 0 when only one payload is present

2010-02-12 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832928#action_12832928
 ] 

Uwe Schindler commented on LUCENE-1941:
---

Hi Erik,

I want to release 2.9.2 and 3.0.1 - is there any problem with that?
If so, I would change this to fix-for 3.1 only; otherwise it should be marked 
fix-for both 3.0.1 and 2.9.2.

 MinPayloadFunction returns 0 when only one payload is present
 -

 Key: LUCENE-1941
 URL: https://issues.apache.org/jira/browse/LUCENE-1941
 Project: Lucene - Java
  Issue Type: Bug
  Components: Query/Scoring
Affects Versions: 2.9
Reporter: Erik Hatcher
 Fix For: 3.0.1, 3.1


 In some experiments with payload scoring through PayloadTermQuery, I'm seeing 
 0 returned when using MinPayloadFunction.  I believe there is a bug there.  
 No time at the moment to flesh out a unit test, but wanted to report it for 
 tracking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1941) MinPayloadFunction returns 0 when only one payload is present

2010-02-12 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1941:
--

Affects Version/s: 3.0
Fix Version/s: 2.9.2

 MinPayloadFunction returns 0 when only one payload is present
 -

 Key: LUCENE-1941
 URL: https://issues.apache.org/jira/browse/LUCENE-1941
 Project: Lucene - Java
  Issue Type: Bug
  Components: Query/Scoring
Affects Versions: 2.9, 3.0
Reporter: Erik Hatcher
 Fix For: 2.9.2, 3.0.1, 3.1


 In some experiments with payload scoring through PayloadTermQuery, I'm seeing 
 0 returned when using MinPayloadFunction.  I believe there is a bug there.  
 No time at the moment to flesh out a unit test, but wanted to report it for 
 tracking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2080) Improve the documentation of Version

2010-02-12 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832949#action_12832949
 ] 

Uwe Schindler commented on LUCENE-2080:
---

We should add a note to CHANGES.txt in the 3.0 and 2.9 branches, as this is an 
API change.

Something like: "Deprecated the Version.LUCENE_CURRENT constant...", with the 
reasoning from the comments above.

 Improve the documentation of Version
 

 Key: LUCENE-2080
 URL: https://issues.apache.org/jira/browse/LUCENE-2080
 Project: Lucene - Java
  Issue Type: Task
  Components: Javadocs
Reporter: Robert Muir
Assignee: Robert Muir
Priority: Trivial
 Fix For: 2.9.2, 3.0, 3.1

 Attachments: LUCENE-2080.patch, LUCENE-2080.patch, LUCENE-2080.patch, 
 LUCENE-2080.patch


 In my opinion, we should elaborate more on the effects of changing the 
 Version parameter.
 Particularly, changing this value, even if you recompile your code, likely 
 involves reindexing your data.
 I do not think this is adequately clear from the current javadocs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2260) AttributeSource holds strong reference to class instances and prevents unloading e.g. in Solr if webapplication reload and custom attributes in separate classloaders ar

2010-02-12 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832951#action_12832951
 ] 

Uwe Schindler commented on LUCENE-2260:
---

I'll commit this soon!

 AttributeSource holds strong reference to class instances and prevents 
 unloading e.g. in Solr if webapplication reload and custom attributes in 
 separate classloaders are used (e.g. in the Solr plugins classloader)
 -

 Key: LUCENE-2260
 URL: https://issues.apache.org/jira/browse/LUCENE-2260
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 2.9.1, 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2260-lucene29.patch, LUCENE-2260.patch, 
 LUCENE-2260.patch


 When working on the dynamic proxy classes using CGLIB/Javassist I recognized 
 a problem in the caching code inside AttributeSource:
 - AttributeSource has a static (!) cache map that holds the implementation 
 classes for attributes, to be faster when creating new attributes (reflection 
 cost)
 - AttributeSource has a static (!) cache map that holds a list of all 
 interfaces implemented by a specific AttributeImpl
 Also:
 - VirtualMethod in 3.1 holds a map of implementation distances keyed by 
 subclasses of the deprecated API
 Both have the problem that this strong reference is inside Lucene's 
 classloader and so persists as long as Lucene lives. The referenced classes 
 can therefore never be unloaded, which would be fine if all lived in the same 
 classloader. As soon as the Attribute or implementation class or the subclass 
 of the deprecated API is loaded by a different classloader (e.g. Lucene lives 
 in the boot classpath of Tomcat, but the Lucene consumer with custom attributes 
 lives in a webapp), it can never be unloaded, because a reference exists.
 Libraries like CGLIB, Javassist or the JDK's reflect.Proxy have a similar cache 
 for generated class files. They also manage this with a WeakHashMap. The cache 
 will always work perfectly and no class will be evicted without reason, as 
 classes are only unloaded when the classloader goes away, and this only happens 
 on request (e.g. by Tomcat).
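
The fix boils down to a cache of this shape (simplified sketch, not the exact 
committed code; the <Interface>Impl lookup convention mirrors what AttributeSource 
does):

import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

public class WeakClassCacheSketch {
  // Key the cache by the attribute interface Class and hold the value weakly,
  // so neither side pins a foreign classloader in memory.
  private static final Map<Class<?>, WeakReference<Class<?>>> cache =
      Collections.synchronizedMap(new WeakHashMap<Class<?>, WeakReference<Class<?>>>());

  static Class<?> lookupImpl(Class<?> attInterface) throws ClassNotFoundException {
    WeakReference<Class<?>> ref = cache.get(attInterface);
    Class<?> impl = (ref == null) ? null : ref.get();
    if (impl == null) {
      // Load <Interface>Impl through the interface's own classloader, so the
      // cache entry dies together with that classloader.
      impl = Class.forName(attInterface.getName() + "Impl",
          true, attInterface.getClassLoader());
      cache.put(attInterface, new WeakReference<Class<?>>(impl));
    }
    return impl;
  }
}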

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2260) AttributeSource holds strong reference to class instances and prevents unloading e.g. in Solr if webapplication reload and custom attributes in separate classloaders are

2010-02-12 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-2260.
---

Resolution: Fixed

Committed trunk revision: 909360
Committed 2.9 revision: 909361
Committed 3.0 revision: 909365

 AttributeSource holds strong reference to class instances and prevents 
 unloading e.g. in Solr if webapplication reload and custom attributes in 
 separate classloaders are used (e.g. in the Solr plugins classloader)
 -

 Key: LUCENE-2260
 URL: https://issues.apache.org/jira/browse/LUCENE-2260
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 2.9.1, 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2260-lucene29.patch, LUCENE-2260.patch, 
 LUCENE-2260.patch


 When working on the dynamic proxy classes using CGLIB/Javassist I recognized 
 a problem in the caching code inside AttributeSource:
 - AttributeSource has a static (!) cache map that holds the implementation 
 classes for attributes, to be faster when creating new attributes (reflection 
 cost)
 - AttributeSource has a static (!) cache map that holds a list of all 
 interfaces implemented by a specific AttributeImpl
 Also:
 - VirtualMethod in 3.1 holds a map of implementation distances keyed by 
 subclasses of the deprecated API
 Both have the problem that this strong reference is inside Lucene's 
 classloader and so persists as long as Lucene lives. The referenced classes 
 can therefore never be unloaded, which would be fine if all lived in the same 
 classloader. As soon as the Attribute or implementation class or the subclass 
 of the deprecated API is loaded by a different classloader (e.g. Lucene lives 
 in the boot classpath of Tomcat, but the Lucene consumer with custom attributes 
 lives in a webapp), it can never be unloaded, because a reference exists.
 Libraries like CGLIB, Javassist or the JDK's reflect.Proxy have a similar cache 
 for generated class files. They also manage this with a WeakHashMap. The cache 
 will always work perfectly and no class will be evicted without reason, as 
 classes are only unloaded when the classloader goes away, and this only happens 
 on request (e.g. by Tomcat).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2154) Need a clean way for Dir/MultiReader to merge the AttributeSources of the sub-readers

2010-02-12 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2154:
--

Attachment: LUCENE-2154-javassist.patch
LUCENE-2154-cglib.patch

Here is the last CGLIB patch for reference.

Now the really cool class, created using JAVASSIST [http://www.javassist.org/]:
You have to place the latest javassist.jar (Mozilla/LGPL licensed) in the lib/ 
folder and apply the patch. What it does is the fastest proxy we can think of: 
it creates a subclass of ProxyAttributeImpl that implements all methods of the 
interface natively in bytecode, using JAVASSIST's bytecode generation tools 
(which compile a subset of the Java language spec).

The micro-benchmark shows no difference between the proxied and the native 
method, as HotSpot removes the extra method call.

With Javassist it would even be possible to create classes that implement our 
interfaces around simple fields that are set by getters/setters - just like 
Eclipse's "generate getters/setters" around a private field. That would be 
really cool. Or we could create combining attributes on the fly - Michael Busch 
would be excited. All *Impl classes we currently have would be almost obsolete 
(except TermAttributeImpl, which is rather complex). We could also create 
dynamic State classes for capturing state...

Nice, but a little bit hackish. Maybe we should put this into contrib first and 
supply a ConcatenatingTokenStream as a demo impl, plus the other Solr 
TokenStreams that are no longer easy to write with the Attributes without 
proxies (Robert listed some).
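
A minimal sketch of this kind of Javassist-generated delegation, for readers who 
have not used the library. The class name, the CharSequence target interface and 
the public 'delegate' field are illustrative assumptions only; the actual patch 
generates subclasses of ProxyAttributeImpl for Lucene's Attribute interfaces.

{noformat}
import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtField;
import javassist.CtNewMethod;

// Sketch only: generate a class whose methods delegate in plain bytecode, so no
// reflection happens at call time. All names below are invented for
// illustration; the patch generates subclasses of ProxyAttributeImpl instead.
public final class JavassistDelegationSketch {

  public static Class<?> generateDelegatingCharSequence() throws Exception {
    ClassPool pool = ClassPool.getDefault();
    CtClass cc = pool.makeClass("GeneratedCharSequenceProxy");
    cc.addInterface(pool.get("java.lang.CharSequence"));
    // the generated class holds the delegate in a plain field
    cc.addField(CtField.make("public java.lang.CharSequence delegate;", cc));
    // each interface method is compiled from Java source into real bytecode
    cc.addMethod(CtNewMethod.make(
        "public int length() { return delegate.length(); }", cc));
    cc.addMethod(CtNewMethod.make(
        "public char charAt(int i) { return delegate.charAt(i); }", cc));
    cc.addMethod(CtNewMethod.make(
        "public java.lang.CharSequence subSequence(int s, int e) "
        + "{ return delegate.subSequence(s, e); }", cc));
    return cc.toClass();
  }
}
{noformat}

Because the generated methods are ordinary bytecode, calls to them are plain 
virtual invocations, which is what the micro-benchmark above observes.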

 Need a clean way for Dir/MultiReader to merge the AttributeSources of the 
 sub-readers
 ---

 Key: LUCENE-2154
 URL: https://issues.apache.org/jira/browse/LUCENE-2154
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: Flex Branch
Reporter: Michael McCandless
 Fix For: Flex Branch

 Attachments: LUCENE-2154-cglib.patch, LUCENE-2154-javassist.patch, 
 LUCENE-2154.patch, LUCENE-2154.patch


 The flex API allows extensibility at the Fields/Terms/Docs/PositionsEnum 
 levels, for a codec to set custom attrs.
 But, it's currently broken for Dir/MultiReader, which must somehow share 
 attrs across all the sub-readers.  Somehow we must make a single attr source, 
 and tell each sub-reader's enum to use that instead of creating its own.  
 Hopefully Uwe can work some magic here :)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2154) Need a clean way for Dir/MultiReader to merge the AttributeSources of the sub-readers

2010-02-12 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2154:
--

Attachment: (was: LUCENE-2154-javassist.patch)

 Need a clean way for Dir/MultiReader to merge the AttributeSources of the 
 sub-readers
 ---

 Key: LUCENE-2154
 URL: https://issues.apache.org/jira/browse/LUCENE-2154
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: Flex Branch
Reporter: Michael McCandless
 Fix For: Flex Branch

 Attachments: LUCENE-2154-cglib.patch, LUCENE-2154-javassist.patch, 
 LUCENE-2154.patch, LUCENE-2154.patch


 The flex API allows extensibility at the Fields/Terms/Docs/PositionsEnum 
 levels, for a codec to set custom attrs.
 But, it's currently broken for Dir/MultiReader, which must somehow share 
 attrs across all the sub-readers.  Somehow we must make a single attr source, 
 and tell each sub-reader's enum to use that instead of creating its own.  
 Hopefully Uwe can work some magic here :)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2154) Need a clean way for Dir/MultiReader to merge the AttributeSources of the sub-readers

2010-02-12 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2154:
--

Attachment: LUCENE-2154-javassist.patch

Better patch without classloader problems.

 Need a clean way for Dir/MultiReader to merge the AttributeSources of the 
 sub-readers
 ---

 Key: LUCENE-2154
 URL: https://issues.apache.org/jira/browse/LUCENE-2154
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: Flex Branch
Reporter: Michael McCandless
 Fix For: Flex Branch

 Attachments: LUCENE-2154-cglib.patch, LUCENE-2154-javassist.patch, 
 LUCENE-2154.patch, LUCENE-2154.patch


 The flex API allows extensibility at the Fields/Terms/Docs/PositionsEnum 
 levels, for a codec to set custom attrs.
 But, it's currently broken for Dir/MultiReader, which must somehow share 
 attrs across all the sub-readers.  Somehow we must make a single attr source, 
 and tell each sub-reader's enum to use that instead of creating its own.  
 Hopefully Uwe can work some magic here :)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2261) configurable MultiTermQuery TopTermsScoringBooleanRewrite pq size

2010-02-12 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833181#action_12833181
 ] 

Uwe Schindler commented on LUCENE-2261:
---

Hi Robert, the patch looks good, all tests pass, nothing to complain about from 
the MTQ police :-)

There is only one thing, unrelated to this issue: it makes no sense to declare 
IllegalArgumentExceptions in the throws clause, as they are unchecked. I would 
remove them; the compiler does not need them anyway.

 configurable MultiTermQuery TopTermsScoringBooleanRewrite pq size
 -

 Key: LUCENE-2261
 URL: https://issues.apache.org/jira/browse/LUCENE-2261
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Robert Muir
Assignee: Robert Muir
Priority: Minor
 Fix For: Flex Branch, 3.1

 Attachments: LUCENE-2261.patch, LUCENE-2261.patch, LUCENE-2261.patch, 
 LUCENE-2261.patch


 MultiTermQuery has a TopTermsScoringBooleanRewrite, that uses a priority 
 queue to expand the query to the top-N terms.
 currently N is hardcoded at BooleanQuery.getMaxClauseCount(), but it would be 
 nice to be able to set this for top-N MultiTermQueries: e.g. expand a fuzzy 
 query to at most only the 50 closest terms.
 at a glance it seems one way would be to expose TopTermsScoringBooleanRewrite 
 (it is private right now) and add a ctor to it, so a MultiTermQuery can 
 instantiate one with its own limit.
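
As a rough sketch of the top-N expansion described above, the bounded min-heap 
pattern looks like this; ScoredTerm and topTerms are invented names, not the 
MultiTermQuery rewrite internals.

{noformat}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Sketch only: keep the N best-scoring terms from a candidate enumeration in a
// bounded min-heap. ScoredTerm and topTerms are invented names, not the actual
// MultiTermQuery rewrite code.
final class TopTermsSketch {

  static final class ScoredTerm {
    final String term;
    final float score;
    ScoredTerm(String term, float score) {
      this.term = term;
      this.score = score;
    }
  }

  static List<ScoredTerm> topTerms(Iterable<ScoredTerm> candidates, int n) {
    // min-heap ordered by score: the worst kept term is always at the head
    PriorityQueue<ScoredTerm> pq = new PriorityQueue<ScoredTerm>(
        Math.max(1, n), new Comparator<ScoredTerm>() {
          public int compare(ScoredTerm a, ScoredTerm b) {
            return Float.compare(a.score, b.score);
          }
        });
    for (ScoredTerm t : candidates) {
      pq.offer(t);
      if (pq.size() > n) {
        pq.poll(); // evict the currently worst term, keeping at most n
      }
    }
    return new ArrayList<ScoredTerm>(pq); // at most n terms, in heap order
  }
}
{noformat}

Whatever score drives the queue (e.g. fuzzy similarity), only the N best 
candidates survive; N is the limit this issue makes configurable.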

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2154) Need a clean way for Dir/MultiReader to merge the AttributeSources of the sub-readers

2010-02-12 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2154:
--

Attachment: LUCENE-2154-javassist.patch

Even cooler, fewer casts, more speed.

 Need a clean way for Dir/MultiReader to merge the AttributeSources of the 
 sub-readers
 ---

 Key: LUCENE-2154
 URL: https://issues.apache.org/jira/browse/LUCENE-2154
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: Flex Branch
Reporter: Michael McCandless
 Fix For: Flex Branch

 Attachments: LUCENE-2154-cglib.patch, LUCENE-2154-javassist.patch, 
 LUCENE-2154-javassist.patch, LUCENE-2154.patch, LUCENE-2154.patch


 The flex API allows extensibility at the Fields/Terms/Docs/PositionsEnum 
 levels, for a codec to set custom attrs.
 But, it's currently broken for Dir/MultiReader, which must somehow share 
 attrs across all the sub-readers.  Somehow we must make a single attr source, 
 and tell each sub-reader's enum to use that instead of creating its own.  
 Hopefully Uwe can work some magic here :)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



RE: Build failed in Hudson: Lucene-trunk #1091

2010-02-11 Thread Uwe Schindler
Really cool:
This time we hit the failure during the Clover run:

[junit] Testsuite: org.apache.lucene.index.TestIndexWriterMergePolicy
[junit] Tests run: 6, Failures: 1, Errors: 0, Time elapsed: 28.519 sec
[junit] 
[junit] Testcase: 
testMaxBufferedDocsChange(org.apache.lucene.index.TestIndexWriterMergePolicy):  
  FAILED
[junit] maxMergeDocs=2147483647; numSegments=11; upperBound=10; 
mergeFactor=10; segs=_65:c5950 _5t:c10-_32 _5u:c10-_32 _5v:c10-_32 
_5w:c10-_32 _5x:c10-_32 _5y:c10-_32 _5z:c10-_32 _60:c10-_32 _61:c10-_32 
_62:c9-_32 _64:c1-_62
[junit] junit.framework.AssertionFailedError: maxMergeDocs=2147483647; 
numSegments=11; upperBound=10; mergeFactor=10; segs=_65:c5950 _5t:c10-_32 
_5u:c10-_32 _5v:c10-_32 _5w:c10-_32 _5x:c10-_32 _5y:c10-_32 _5z:c10-_32 
_60:c10-_32 _61:c10-_32 _62:c9-_32 _64:c1-_62
[junit] at 
org.apache.lucene.index.TestIndexWriterMergePolicy.checkInvariants(TestIndexWriterMergePolicy.java:234)
[junit] at 
org.apache.lucene.index.TestIndexWriterMergePolicy.__CLR2_6_3zf7i0317qu(TestIndexWriterMergePolicy.java:164)
[junit] at 
org.apache.lucene.index.TestIndexWriterMergePolicy.testMaxBufferedDocsChange(TestIndexWriterMergePolicy.java:125)
[junit] at 
org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:214)
[junit] 
[junit] 
[junit] Test org.apache.lucene.index.TestIndexWriterMergePolicy FAILED


We also get the exact location of the failure:
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/1091/clover-report/org/apache/lucene/index/TestIndexWriterMergePolicy.html?line=125#src-125

And you can see which lines were called, and how often, before the failure occurred!

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Apache Hudson Server [mailto:hud...@hudson.zones.apache.org]
 Sent: Thursday, February 11, 2010 5:16 AM
 To: java-dev@lucene.apache.org
 Subject: Build failed in Hudson: Lucene-trunk #1091
 
 See http://hudson.zones.apache.org/hudson/job/Lucene-
 trunk/1091/changes
 
 Changes:
 
 [uschindler] LUCENE-2248: Change core tests to use a global Version
 constant
 
 [uschindler] LUCENE-2258: Remove unneeded synchronization in
 FuzzyTermEnum
 
 --
 [...truncated 23095 lines...]
   [javadoc] Loading source files for package
 org.apache.lucene.queryParser.standard...
   [javadoc] Loading source files for package
 org.apache.lucene.queryParser.standard.builders...
   [javadoc] Loading source files for package
 org.apache.lucene.queryParser.standard.config...
   [javadoc] Loading source files for package
 org.apache.lucene.queryParser.standard.nodes...
   [javadoc] Loading source files for package
 org.apache.lucene.queryParser.standard.parser...
   [javadoc] Loading source files for package
 org.apache.lucene.queryParser.standard.processors...
   [javadoc] Constructing Javadoc information...
   [javadoc] Standard Doclet version 1.5.0_22
   [javadoc] Building tree for all the packages and classes...
   [javadoc] Building index for all the packages and classes...
   [javadoc] Building index for all classes...
   [javadoc] Generating
 http://hudson.zones.apache.org/hudson/job/Lucene-
 trunk/ws/trunk/build/docs/api/contrib-queryparser/stylesheet.css...
   [javadoc] Note: Custom tags that were not seen:
 @lucene.experimental, @lucene.internal
   [jar] Building jar:
 http://hudson.zones.apache.org/hudson/job/Lucene-
 trunk/ws/trunk/build/contrib/queryparser/lucene-queryparser-2010-02-
 11_02-03-57-javadoc.jar
  [echo] Building regex...
 
 javadocs:
 [mkdir] Created dir:
 http://hudson.zones.apache.org/hudson/job/Lucene-
 trunk/ws/trunk/build/docs/api/contrib-regex
   [javadoc] Generating Javadoc
   [javadoc] Javadoc execution
   [javadoc] Loading source files for package
 org.apache.lucene.search.regex...
   [javadoc] Loading source files for package org.apache.regexp...
   [javadoc] Constructing Javadoc information...
   [javadoc] Standard Doclet version 1.5.0_22
   [javadoc] Building tree for all the packages and classes...
   [javadoc] Building index for all the packages and classes...
   [javadoc] Building index for all classes...
   [javadoc] Generating
 http://hudson.zones.apache.org/hudson/job/Lucene-
 trunk/ws/trunk/build/docs/api/contrib-regex/stylesheet.css...
   [javadoc] Note: Custom tags that were not seen:
 @lucene.experimental, @lucene.internal
   [jar] Building jar:
 http://hudson.zones.apache.org/hudson/job/Lucene-
 trunk/ws/trunk/build/contrib/regex/lucene-regex-2010-02-11_02-03-57-
 javadoc.jar
  [echo] Building remote...
 
 javadocs:
 [mkdir] Created dir:
 http://hudson.zones.apache.org/hudson/job/Lucene-
 trunk/ws/trunk/build/docs/api/contrib-remote
   [javadoc] Generating Javadoc
   [javadoc] Javadoc execution
   [javadoc] Loading source files for package
 org.apache.lucene.search...
   [javadoc] Constructing Javadoc information

[jira] Updated: (LUCENE-2154) Need a clean way for Dir/MultiReader to merge the AttributeSources of the sub-readers

2010-02-11 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2154:
--

Attachment: LUCENE-2154.patch

Here is a first patch about cglib-generated proxy attributes.

In IRC we found out yesterday that the proposed idea to share the attributes 
across all Multi*Enums would cause problems: a call to next() on any sub-enum 
would overwrite the contents of the attributes of the previous sub-enum, which 
would break TermsEnum (because e.g. TermsEnum looks ahead by calling next() on 
all sub-enums and choosing the lowest term to return - after calling each 
enum's next(), the attributes of the first enums cannot be restored without 
captureState & co, as they are overwritten by the next() call to the last 
enum).

This patch needs cglib-nodep-2.2.jar put into the lib-folder of the checkout 
[http://sourceforge.net/projects/cglib/files/cglib2/2.2/cglib-nodep-2.2.jar/download].

It contains a test that shows the usage. The central part is cglib's Enhancer, 
which creates a dynamic class extending ProxyAttributeImpl (which defines the 
general AttributeImpl methods, delegating to the delegate) and implementing the 
requested Attribute interface using a MethodInterceptor.

Please note: this uses no reflection (except during in-memory class file 
creation, which runs only once, when the proxy class is loaded). The proxy 
implements MethodInterceptor and uses the fast MethodProxy class (which is also 
generated by cglib for each proxied method) and can invoke the delegated method 
directly (without reflection) on the delegate.
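
A minimal, self-contained sketch of this Enhancer/MethodInterceptor setup, with 
generic names (it is not the patch itself). For simplicity it delegates via plain 
Method.invoke, whereas the patch calls the delegate through the generated 
MethodProxy so that no reflection is involved at call time.

{noformat}
import java.lang.reflect.Method;

import net.sf.cglib.proxy.Enhancer;
import net.sf.cglib.proxy.MethodInterceptor;
import net.sf.cglib.proxy.MethodProxy;

// Sketch only: a dynamic class implementing 'iface' whose calls are routed
// through a MethodInterceptor to a delegate. This simplified interceptor uses
// reflective Method.invoke; the patch described above calls the delegate
// through the cglib-generated MethodProxy instead, avoiding reflection.
public final class EnhancerDelegationSketch {

  @SuppressWarnings("unchecked")
  public static <T> T proxyFor(Class<T> iface, final T delegate) {
    Enhancer enhancer = new Enhancer();
    enhancer.setInterfaces(new Class[] { iface });
    enhancer.setCallback(new MethodInterceptor() {
      public Object intercept(Object obj, Method method, Object[] args,
          MethodProxy methodProxy) throws Throwable {
        return method.invoke(delegate, args); // simplified delegation
      }
    });
    return (T) enhancer.create();
  }
}
{noformat}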

The test verifies that everything works and also compares speed by using a 
TermAttribute natively and proxied. The proxied version is slower (not because 
of reflection, but because the MethodInterceptor creates an array of parameters 
and boxes/unboxes native parameters into the Object[]), but for the test case I 
have seen only about 50% more time needed.

The generated classes are cached and reused (like DEFAULT_ATTRIBUTE_FACTORY 
does).

To get maximum speed and no external libraries, the code generated by Enhancer 
can be rewritten natively, using the Apache Harmony java.lang.reflect.Proxy 
implementation source code as a basis. The hardest part in generating bytecode 
is the ConstantPool in class files. But as the proxy methods simply delegate 
and no magic like boxing/unboxing is needed, the generated bytecode is rather 
simple.

One other use case for these proxies is AppendingTokenStream, which has not 
been possible since 3.0 without captureState (in the old TS API it was 
possible, because you could reuse the same Token instance even across the 
appended streams). In the new TS API, the appending stream must have a view on 
the attributes of the sub-stream currently being consumed.

 Need a clean way for Dir/MultiReader to merge the AttributeSources of the 
 sub-readers
 ---

 Key: LUCENE-2154
 URL: https://issues.apache.org/jira/browse/LUCENE-2154
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: Flex Branch
Reporter: Michael McCandless
 Fix For: Flex Branch

 Attachments: LUCENE-2154.patch


 The flex API allows extensibility at the Fields/Terms/Docs/PositionsEnum 
 levels, for a codec to set custom attrs.
 But, it's currently broken for Dir/MultiReader, which must somehow share 
 attrs across all the sub-readers.  Somehow we must make a single attr source, 
 and tell each sub-reader's enum to use that instead of creating its own.  
 Hopefully Uwe can work some magic here :)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2154) Need a clean way for Dir/MultiReader to merge the AttributeSources of the sub-readers

2010-02-11 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2154:
--

Attachment: LUCENE-2154.patch

I had some more fun: I made ProxyAttributeSource non-final and added a class 
name policy so that the generated class name also contains the corresponding 
interface (to make stack traces on errors nicer).

Here is the example output:
{noformat}
[junit] DEBUG: Created class 
org.apache.lucene.util.ProxyAttributeSource$ProxyAttributeImpl$$TermAttribute$$EnhancerByCGLIB$$6100bdf9
 for attribute org.apache.lucene.analysis.tokenattributes.TermAttribute
[junit] DEBUG: Created class 
org.apache.lucene.util.ProxyAttributeSource$ProxyAttributeImpl$$TypeAttribute$$EnhancerByCGLIB$$6f89c3ff
 for attribute org.apache.lucene.analysis.tokenattributes.TypeAttribute
[junit] DEBUG: Created class 
org.apache.lucene.util.ProxyAttributeSource$ProxyAttributeImpl$$FlagsAttribute$$EnhancerByCGLIB$$4668733c
 for attribute org.apache.lucene.analysis.tokenattributes.FlagsAttribute
[junit] Time taken using 
org.apache.lucene.analysis.tokenattributes.TermAttributeImpl:
[junit]   1476.090658 ms for 1000 iterations
[junit] Time taken using 
org.apache.lucene.util.ProxyAttributeSource$ProxyAttributeImpl$$TermAttribute$$EnhancerByCGLIB$$6100bdf
9:
[junit]   1881.295734 ms for 1000 iterations
{noformat}

 Need a clean way for Dir/MultiReader to merge the AttributeSources of the 
 sub-readers
 ---

 Key: LUCENE-2154
 URL: https://issues.apache.org/jira/browse/LUCENE-2154
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: Flex Branch
Reporter: Michael McCandless
 Fix For: Flex Branch

 Attachments: LUCENE-2154.patch, LUCENE-2154.patch


 The flex API allows extensibility at the Fields/Terms/Docs/PositionsEnum 
 levels, for a codec to set custom attrs.
 But, it's currently broken for Dir/MultiReader, which must somehow share 
 attrs across all the sub-readers.  Somehow we must make a single attr source, 
 and tell each sub-reader's enum to use that instead of creating its own.  
 Hopefully Uwe can work some magic here :)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2260) AttributeSource holds strong reference to class instances and prevents unloading e.g. in Solr if webapplication reload and custom attributes in separate classloaders are

2010-02-11 Thread Uwe Schindler (JIRA)
AttributeSource holds strong reference to class instances and prevents 
unloading e.g. in Solr if webapplication reload and custom attributes in 
separate classloaders are used (e.g. in the Solr plugins classloader)
-

 Key: LUCENE-2260
 URL: https://issues.apache.org/jira/browse/LUCENE-2260
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0, 2.9.1
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 2.9.2, 3.0.1, 3.1


When working on the dynamic proxy classes using cglib/javassist I recognized a 
problem in the caching code inside AttributeSource:
- AttributeSource has a static (!) cache map that holds implementation classes 
for attributes, to be faster when creating new attributes (reflection cost)
- AttributeSource has a static (!) cache map that holds a list of all 
interfaces implemented by a specific AttributeImpl

Also:
- VirtualMethod in 3.1 holds a map of implementation distances keyed by 
subclasses of the deprecated API

Both have the problem that this strong reference is inside Lucene's classloader 
and so persists as long as Lucene lives. The referenced classes can therefore 
never be unloaded, which would be fine if all of them lived in the same 
classloader. As soon as the Attribute or implementation class or the subclass 
of the deprecated API is loaded by a different classloader (e.g. Lucene lives 
in the bootclasspath of Tomcat, but the Lucene consumer with custom attributes 
lives in a webapp), they can never be unloaded, because a reference exists.

Libs like CGLIB or Javassist or the JDK's reflect.Proxy have a similar cache 
for generated class files. They also manage this with a WeakHashMap. The cache 
will always work perfectly and no class will be evicted without reason, as 
classes are only unloaded when the classloader goes away, and this only happens 
on request (e.g. by Tomcat).
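
A minimal sketch of the weak caching scheme proposed here, with illustrative 
names (this is not the actual AttributeSource code): both the key and the cached 
implementation class are held weakly, so the cache never pins a foreign 
classloader.

{noformat}
import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Sketch only (not the real AttributeSource code): a class-to-class cache whose
// keys are held weakly by the WeakHashMap and whose values are wrapped in
// WeakReferences, so a cached entry can never keep a classloader alive.
final class WeakClassCacheSketch {

  private static final Map<Class<?>, WeakReference<Class<?>>> CACHE =
      Collections.synchronizedMap(
          new WeakHashMap<Class<?>, WeakReference<Class<?>>>());

  static Class<?> lookupImpl(Class<?> attInterface) {
    WeakReference<Class<?>> ref = CACHE.get(attInterface);
    return ref == null ? null : ref.get(); // null means not cached (anymore)
  }

  static void cacheImpl(Class<?> attInterface, Class<?> implClass) {
    // a strong value reference could keep the classloader (and with it the key
    // class) reachable and defeat the WeakHashMap, hence the WeakReference
    CACHE.put(attInterface, new WeakReference<Class<?>>(implClass));
  }
}
{noformat}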

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2260) AttributeSource holds strong reference to class instances and prevents unloading e.g. in Solr if webapplication reload and custom attributes in separate classloaders are

2010-02-11 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2260:
--

Attachment: LUCENE-2260.patch

Attached is the patch. I will commit this in a day and also merge it to 2.9 and 
3.0 (without VirtualMethod), as this is a resource leak. This problem is 
similar to LUCENE-2182.

 AttributeSource holds strong reference to class instances and prevents 
 unloading e.g. in Solr if webapplication reload and custom attributes in 
 separate classloaders are used (e.g. in the Solr plugins classloader)
 -

 Key: LUCENE-2260
 URL: https://issues.apache.org/jira/browse/LUCENE-2260
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 2.9.1, 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2260.patch


 When working on the dynamic proxy classes using cglib/javassist I recognized 
 a problem in the caching code inside AttributeSource:
 - AttributeSource has a static (!) cache map that holds implementation 
 classes for attributes, to be faster when creating new attributes (reflection 
 cost)
 - AttributeSource has a static (!) cache map that holds a list of all 
 interfaces implemented by a specific AttributeImpl
 Also:
 - VirtualMethod in 3.1 holds a map of implementation distances keyed by 
 subclasses of the deprecated API
 Both have the problem that this strong reference is inside Lucene's 
 classloader and so persists as long as Lucene lives. The referenced classes 
 can therefore never be unloaded, which would be fine if all of them lived in 
 the same classloader. As soon as the Attribute or implementation class or the 
 subclass of the deprecated API is loaded by a different classloader (e.g. 
 Lucene lives in the bootclasspath of Tomcat, but the Lucene consumer with 
 custom attributes lives in a webapp), they can never be unloaded, because a 
 reference exists.
 Libs like CGLIB or Javassist or the JDK's reflect.Proxy have a similar cache 
 for generated class files. They also manage this with a WeakHashMap. The cache 
 will always work perfectly and no class will be evicted without reason, as 
 classes are only unloaded when the classloader goes away, and this only 
 happens on request (e.g. by Tomcat).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2260) AttributeSource holds strong reference to class instances and prevents unloading e.g. in Solr if webapplication reload and custom attributes in separate classloaders are

2010-02-11 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2260:
--

Attachment: LUCENE-2260.patch

Improved patch: now all class references are weak. The assumption that the 
WeakReference inside addAttributeImpl is always != null holds, because the code 
keeps a strong reference to the implementing class.

 AttributeSource holds strong reference to class instances and prevents 
 unloading e.g. in Solr if webapplication reload and custom attributes in 
 separate classloaders are used (e.g. in the Solr plugins classloader)
 -

 Key: LUCENE-2260
 URL: https://issues.apache.org/jira/browse/LUCENE-2260
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 2.9.1, 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2260.patch, LUCENE-2260.patch


 When working on the dynamic proxy classes using cglib/javassist I recognized 
 a problem in the caching code inside AttributeSource:
 - AttributeSource has a static (!) cache map that holds implementation 
 classes for attributes, to be faster when creating new attributes (reflection 
 cost)
 - AttributeSource has a static (!) cache map that holds a list of all 
 interfaces implemented by a specific AttributeImpl
 Also:
 - VirtualMethod in 3.1 holds a map of implementation distances keyed by 
 subclasses of the deprecated API
 Both have the problem that this strong reference is inside Lucene's 
 classloader and so persists as long as Lucene lives. The referenced classes 
 can therefore never be unloaded, which would be fine if all of them lived in 
 the same classloader. As soon as the Attribute or implementation class or the 
 subclass of the deprecated API is loaded by a different classloader (e.g. 
 Lucene lives in the bootclasspath of Tomcat, but the Lucene consumer with 
 custom attributes lives in a webapp), they can never be unloaded, because a 
 reference exists.
 Libs like CGLIB or Javassist or the JDK's reflect.Proxy have a similar cache 
 for generated class files. They also manage this with a WeakHashMap. The cache 
 will always work perfectly and no class will be evicted without reason, as 
 classes are only unloaded when the classloader goes away, and this only 
 happens on request (e.g. by Tomcat).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2260) AttributeSource holds strong reference to class instances and prevents unloading e.g. in Solr if webapplication reload and custom attributes in separate classloaders are

2010-02-11 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2260:
--

Attachment: LUCENE-2260-lucene29.patch

Patch for 2.9 branch (without Java 5 generics)

 AttributeSource holds strong reference to class instances and prevents 
 unloading e.g. in Solr if webapplication reload and custom attributes in 
 separate classloaders are used (e.g. in the Solr plugins classloader)
 -

 Key: LUCENE-2260
 URL: https://issues.apache.org/jira/browse/LUCENE-2260
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 2.9.1, 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 2.9.2, 3.0.1, 3.1

 Attachments: LUCENE-2260-lucene29.patch, LUCENE-2260.patch, 
 LUCENE-2260.patch


 When working on the dynamic proxy classes using cglib/javassist I recognized 
 a problem in the caching code inside AttributeSource:
 - AttributeSource has a static (!) cache map that holds implementation 
 classes for attributes, to be faster when creating new attributes (reflection 
 cost)
 - AttributeSource has a static (!) cache map that holds a list of all 
 interfaces implemented by a specific AttributeImpl
 Also:
 - VirtualMethod in 3.1 holds a map of implementation distances keyed by 
 subclasses of the deprecated API
 Both have the problem that this strong reference is inside Lucene's 
 classloader and so persists as long as Lucene lives. The referenced classes 
 can therefore never be unloaded, which would be fine if all of them lived in 
 the same classloader. As soon as the Attribute or implementation class or the 
 subclass of the deprecated API is loaded by a different classloader (e.g. 
 Lucene lives in the bootclasspath of Tomcat, but the Lucene consumer with 
 custom attributes lives in a webapp), they can never be unloaded, because a 
 reference exists.
 Libs like CGLIB or Javassist or the JDK's reflect.Proxy have a similar cache 
 for generated class files. They also manage this with a WeakHashMap. The cache 
 will always work perfectly and no class will be evicted without reason, as 
 classes are only unloaded when the classloader goes away, and this only 
 happens on request (e.g. by Tomcat).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2261) configurable MultiTermQuery TopTermsScoringBooleanRewrite pq size

2010-02-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832607#action_12832607
 ] 

Uwe Schindler commented on LUCENE-2261:
---

The patch looks good; some things because of Serializable:
- The readResolve method must go to the singleton constant, which should also 
throw UOE when modified
- equals/hashCode is needed for the rewrite mode, else FuzzyQuery & Co would no 
longer compare equal

It could be solved by doing it like AutoRewrite and its unmodifiable constant. 
I know: Queries are a pain because of Serializable.

+1 on adding a param to FuzzyQuery ctor
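
The Serializable concerns above boil down to the standard readResolve-plus-equals 
pattern for singleton constants; below is a minimal sketch with invented names 
(it is not the MultiTermQuery code).

{noformat}
import java.io.ObjectStreamException;
import java.io.Serializable;

// Sketch only, invented names: a serializable singleton constant. readResolve
// restores singleton identity after deserialization, and equals/hashCode keep
// queries that carry a deserialized copy comparing equal.
public class RewriteMethodSketch implements Serializable {

  private static final long serialVersionUID = 1L;

  public static final RewriteMethodSketch CONSTANT = new RewriteMethodSketch();

  protected RewriteMethodSketch() {
  }

  // deserialization hands back the shared constant instead of a fresh copy
  protected Object readResolve() throws ObjectStreamException {
    return CONSTANT;
  }

  @Override
  public boolean equals(Object obj) {
    return obj != null && getClass() == obj.getClass();
  }

  @Override
  public int hashCode() {
    return getClass().hashCode();
  }
}
{noformat}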

 configurable MultiTermQuery TopTermsScoringBooleanRewrite pq size
 -

 Key: LUCENE-2261
 URL: https://issues.apache.org/jira/browse/LUCENE-2261
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Robert Muir
Priority: Minor
 Fix For: Flex Branch, 3.1

 Attachments: LUCENE-2261.patch


 MultiTermQuery has a TopTermsScoringBooleanRewrite, that uses a priority 
 queue to expand the query to the top-N terms.
 currently N is hardcoded at BooleanQuery.getMaxClauseCount(), but it would be 
 nice to be able to set this for top-N MultiTermQueries: e.g. expand a fuzzy 
 query to at most only the 50 closest terms.
 at a glance it seems one way would be to expose TopTermsScoringBooleanRewrite 
 (it is private right now) and add a ctor to it, so a MultiTermQuery can 
 instantiate one with its own limit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2261) configurable MultiTermQuery TopTermsScoringBooleanRewrite pq size

2010-02-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832683#action_12832683
 ] 

Uwe Schindler commented on LUCENE-2261:
---

Looks good at first glance. I have not tried the patch yet; I will do so soon.

 configurable MultiTermQuery TopTermsScoringBooleanRewrite pq size
 -

 Key: LUCENE-2261
 URL: https://issues.apache.org/jira/browse/LUCENE-2261
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Robert Muir
Priority: Minor
 Fix For: Flex Branch, 3.1

 Attachments: LUCENE-2261.patch, LUCENE-2261.patch, LUCENE-2261.patch, 
 LUCENE-2261.patch


 MultiTermQuery has a TopTermsScoringBooleanRewrite, that uses a priority 
 queue to expand the query to the top-N terms.
 currently N is hardcoded at BooleanQuery.getMaxClauseCount(), but it would be 
 nice to be able to set this for top-N MultiTermQueries: e.g. expand a fuzzy 
 query to at most only the 50 closest terms.
 at a glance it seems one way would be to expose TopTermsScoringBooleanRewrite 
 (it is private right now) and add a ctor to it, so a MultiTermQuery can 
 instantiate one with its own limit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2111) Wrapup flexible indexing

2010-02-10 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831901#action_12831901
 ] 

Uwe Schindler commented on LUCENE-2111:
---

Mike: I reviewed this EmptyTermsEnum in MTQ. I would leave it in, but simply 
make EmptyTermsEnum a singleton (which is perfectly fine, because it's 
stateless). Returning null here brings no performance gain in MTQs; it only 
makes the code in MTQ#rewrite and MTQWF#getDocIdSet ugly. The biggest problem 
with returning null is the backwards layer, which must then be fixed (because 
it checks whether getTermsEnum returns null and falls back to the 
FilteredTermEnum from trunk). If you really want null, getTermsEnum should by 
default (if not overridden) throw UOE, and the rewrite code should catch this 
UOE and only then delegate to the backwards layer.
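
The singleton suggestion is essentially the following pattern (illustrative 
stand-in class, not the flex TermsEnum API).

{noformat}
// Sketch only: a stateless "empty" enumerator shared as a single instance
// instead of returning null. Stand-in class, not the flex TermsEnum API.
public final class EmptyEnumSketch {

  public static final EmptyEnumSketch INSTANCE = new EmptyEnumSketch();

  private EmptyEnumSketch() {
    // no state at all, so one shared instance is safe everywhere
  }

  /** Always exhausted: there is never a next element. */
  public Object next() {
    return null;
  }
}
{noformat}

Callers then always receive a non-null, shared enum, and the rewrite code needs 
no null special-casing.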

 Wrapup flexible indexing
 

 Key: LUCENE-2111
 URL: https://issues.apache.org/jira/browse/LUCENE-2111
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Flex Branch
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.1

 Attachments: LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, 
 LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, 
 LUCENE-2111_fuzzy.patch


 Spinoff from LUCENE-1458.
 The flex branch is in fairly good shape -- all tests pass, initial search 
 performance testing looks good, it survived several visits from the Unicode 
 policeman ;)
 But it still has a number of nocommits, could use some more scrutiny 
 especially on the emulate old API on flex index and vice/versa code paths, 
 and still needs some more performance testing.  I'll do these under this 
 issue, and we should open separate issues for other self contained fixes.
 The end is in sight!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2258) Remove synchronized from FuzzyTermEnum#similarity(final String target)

2010-02-10 Thread Uwe Schindler (JIRA)
Remove synchronized from FuzzyTermEnum#similarity(final String target)
---

 Key: LUCENE-2258
 URL: https://issues.apache.org/jira/browse/LUCENE-2258
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Trivial
 Fix For: 2.9.2, Flex Branch, 3.0.1, 3.1


The similarity method in FuzzyTermEnum is synchronized, which is stupid because:
- TermEnums follow the iterator pattern and so are single-threaded by definition
- The method is private, so nobody could ever create a fake FuzzyTermEnum just 
to get at this method and use it multithreaded.
- The method is not static and has no static fields, so instances do not affect 
each other

The root of this comes from LUCENE-296, but it was never reviewed and simply 
committed. The argument for making it synchronized is wrong.
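
To make the thread-confinement argument concrete, here is a tiny illustrative 
class (not the actual FuzzyTermEnum code): the scratch state is per instance and 
each enumerator is used by a single thread, so the private helper needs no 
synchronization.

{noformat}
// Sketch only, not the actual FuzzyTermEnum: per-instance scratch state used by
// a private helper. Each enumerator instance is confined to a single thread, so
// there is no shared mutable state and nothing to synchronize.
final class SingleThreadedEnumSketch {

  private final StringBuilder scratch = new StringBuilder(); // per-instance only

  // no 'synchronized' needed: 'scratch' is never visible to another thread
  private float similarity(String target, String candidate) {
    scratch.setLength(0);
    scratch.append(target).append('|').append(candidate);
    // placeholder standing in for the real edit-distance computation
    return target.equals(candidate) ? 1.0f : 0.0f;
  }

  float score(String target, String candidate) {
    return similarity(target, candidate);
  }
}
{noformat}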

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2258) Remove synchronized from FuzzyTermEnum#similarity(final String target)

2010-02-10 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2258:
--

Attachment: LUCENE-2258.patch

Patch.

 Remove synchronized from FuzzyTermEnum#similarity(final String target)
 ---

 Key: LUCENE-2258
 URL: https://issues.apache.org/jira/browse/LUCENE-2258
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Trivial
 Fix For: 2.9.2, Flex Branch, 3.0.1, 3.1

 Attachments: LUCENE-2258.patch


 The similarity method in FuzzyTermEnum is synchronized, which is stupid 
 because:
 - TermEnums follow the iterator pattern and so are single-threaded by definition
 - The method is private, so nobody could ever create a fake FuzzyTermEnum 
 just to get at this method and use it multithreaded.
 - The method is not static and has no static fields, so instances do not 
 affect each other
 The root of this comes from LUCENE-296, but it was never reviewed and simply 
 committed. The argument for making it synchronized is wrong.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-02-10 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2111:
--

Attachment: LUCENE-2111-EmptyTermsEnum.patch

Here is the EmptyTermsEnum singleton patch (against flex trunk).

 Wrapup flexible indexing
 

 Key: LUCENE-2111
 URL: https://issues.apache.org/jira/browse/LUCENE-2111
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Flex Branch
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.1

 Attachments: LUCENE-2111-EmptyTermsEnum.patch, LUCENE-2111.patch, 
 LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, 
 LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111_fuzzy.patch


 Spinoff from LUCENE-1458.
 The flex branch is in fairly good shape -- all tests pass, initial search 
 performance testing looks good, it survived several visits from the Unicode 
 policeman ;)
 But it still has a number of nocommits, could use some more scrutiny 
 especially on the emulate old API on flex index and vice/versa code paths, 
 and still needs some more performance testing.  I'll do these under this 
 issue, and we should open separate issues for other self contained fixes.
 The end is in sight!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org


