from:"Doron Cohen \(JIRA\)"

[jira] [Resolved] (LUCENE-4590) WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file

2012-12-10 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-4590.
-

Resolution: Fixed

done.

> WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file
> ---
>
> Key: LUCENE-4590
> URL: https://issues.apache.org/jira/browse/LUCENE-4590
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Attachments: LUCENE-4590.patch
>
>
> It may be convenient to split Wikipedia's line file into two separate files: 
> category-pages and non-category ones. 
> It is possible to split the original line file with grep or such.
> It is more efficient to do it in advance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Reopened] (LUCENE-4590) WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file

2012-12-10 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen reopened LUCENE-4590:
-

Lucene Fields:   (was: New)

Reopening the issue to make the categories file name method, 
categoriesLineFile(), public, so that it can easily be modified in the future 
without breaking application logic.

> WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file
> ---
>
> Key: LUCENE-4590
> URL: https://issues.apache.org/jira/browse/LUCENE-4590
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Attachments: LUCENE-4590.patch
>
>
> It may be convenient to split Wikipedia's line file into two separate files: 
> category-pages and non-category ones. 
> It is possible to split the original line file with grep or such.
> It is more efficient to do it in advance.

[jira] [Resolved] (LUCENE-4590) WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file

2012-12-09 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-4590.
-

Resolution: Fixed

Done.

> WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file
> ---
>
> Key: LUCENE-4590
> URL: https://issues.apache.org/jira/browse/LUCENE-4590
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Attachments: LUCENE-4590.patch
>
>
> It may be convenient to split Wikipedia's line file into two separate files: 
> category-pages and non-category ones. 
> It is possible to split the original line file with grep or such.
> It is more efficient to do it in advance.

[jira] [Resolved] (LUCENE-4595) EnwikiContentSource thread safety problem (NPE) in 'forever' mode

2012-12-09 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-4595.
-

   Resolution: Fixed
Lucene Fields:   (was: New)

Fixed.

Seems the tag bot missed the trunk commit for this one,
so here they are both:

- trunk: [r1418281|http://svn.apache.org/viewvc?view=revision&revision=1418281]
- 4x: [r1418925|http://svn.apache.org/viewvc?view=revision&revision=1418925]

> EnwikiContentSource thread safety problem (NPE) in 'forever' mode
> -
>
> Key: LUCENE-4595
> URL: https://issues.apache.org/jira/browse/LUCENE-4595
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Attachments: LUCENE-4595.patch
>
>
> If close() is invoked around when an additional input stream reader is 
> recreated for the 'forever' behavior, an uncaught NPE might occur.
> This bug was probably always there, just exposed now with the 
> EnwikiContentSourceTest added in LUCENE-4588.

[jira] [Resolved] (LUCENE-4588) EnwikiContentSource silently swallows the last wiki doc

2012-12-09 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-4588.
-

   Resolution: Fixed
Lucene Fields:   (was: New)

Fixed.

As a side note, merging benchmark changes to 4x is so much easier than it used 
to be in 3x, now that trunk and branch are structured the same! Now if only 
'precommit' would run 60 times faster (that would be 12 seconds here)... 
wouldn't that be great? :) 

> EnwikiContentSource silently swallows the last wiki doc
> ---
>
> Key: LUCENE-4588
> URL: https://issues.apache.org/jira/browse/LUCENE-4588
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Attachments: LUCENE-4588.patch
>
>
> Last wiki doc is never returned

[jira] [Commented] (LUCENE-4588) EnwikiContentSource silently swallows the last wiki doc

2012-12-09 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527399#comment-13527399
 ] 

Doron Cohen commented on LUCENE-4588:
-

Two more commits to trunk (uncaught by bot due to incorrect message format):
- [r1417871|http://svn.apache.org/viewvc?rev=1417871&view=rev] -- LUCENE-4588 
(cont): (EnwikiContentSource fixes) avoid using the forbidden
StringBufferInputStream.
- [r1417921|http://svn.apache.org/viewvc?rev=1417921&view=rev] -- LUCENE-4588 
(cont): simplify test input stream creation. 

> EnwikiContentSource silently swallows the last wiki doc
> ---
>
> Key: LUCENE-4588
> URL: https://issues.apache.org/jira/browse/LUCENE-4588
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Attachments: LUCENE-4588.patch
>
>
> Last wiki doc is never returned

[jira] [Commented] (LUCENE-4595) EnwikiContentSource thread safety problem (NPE) in 'forever' mode

2012-12-07 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526326#comment-13526326
 ] 

Doron Cohen commented on LUCENE-4595:
-

Thanks for verifying, Robert.
Committed the fix; let's see if the build becomes stable again.
Issue remains open for porting to 4x.

> EnwikiContentSource thread safety problem (NPE) in 'forever' mode
> -
>
> Key: LUCENE-4595
> URL: https://issues.apache.org/jira/browse/LUCENE-4595
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Attachments: LUCENE-4595.patch
>
>
> If close() is invoked around when an additional input stream reader is 
> recreated for the 'forever' behavior, an uncaught NPE might occur.
> This bug was probably always there, just exposed now with the 
> EnwikiContentSourceTest added in LUCENE-4588.

[jira] [Updated] (LUCENE-4590) WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file

2012-12-06 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-4590:


Attachment: LUCENE-4590.patch

Patch with the new task and a test.

> WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file
> ---
>
> Key: LUCENE-4590
> URL: https://issues.apache.org/jira/browse/LUCENE-4590
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Attachments: LUCENE-4590.patch
>
>
> It may be convenient to split Wikipedia's line file into two separate files: 
> category-pages and non-category ones. 
> It is possible to split the original line file with grep or such.
> It is more efficient to do it in advance.

[jira] [Commented] (LUCENE-4590) WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file

2012-12-06 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13514649#comment-13514649
 ] 

Doron Cohen commented on LUCENE-4590:
-

Now I see what you mean. Spooky, it is as if you were looking into the patch I 
did not post here... How did you know I chose not to modify EnwikiContentSource...

I agree that if someone wishes to index just the non-category pages, the new 
WriteEnwikiLineDoc would create the category pages file for nothing. Also, if 
indexing is done straight away, not through a line file first, categories 
will be indexed. But then anyone could check the title and decide not to index 
those docs. So I see the advantage; I am just not tempted to add this at the 
moment, but it can be added.

> WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file
> ---
>
> Key: LUCENE-4590
> URL: https://issues.apache.org/jira/browse/LUCENE-4590
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
>
> It may be convenient to split Wikipedia's line file into two separate files: 
> category-pages and non-category ones. 
> It is possible to split the original line file with grep or such.
> It is more efficient to do it in advance.

[jira] [Commented] (LUCENE-4588) EnwikiContentSource silently swallows the last wiki doc

2012-12-06 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13514644#comment-13514644
 ] 

Doron Cohen commented on LUCENE-4588:
-

Thanks for the review Shai, changed as you suggested and committed (while jira 
was down...)

> EnwikiContentSource silently swallows the last wiki doc
> ---
>
> Key: LUCENE-4588
> URL: https://issues.apache.org/jira/browse/LUCENE-4588
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Attachments: LUCENE-4588.patch
>
>
> Last wiki doc is never returned

[jira] [Updated] (LUCENE-4595) EnwikiContentSource thread safety problem (NPE) in 'forever' mode

2012-12-06 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-4595:


Attachment: LUCENE-4595.patch

This patch is supposed to fix the problem.
However, I was not able to recreate the bug, so I couldn't actually test it.

> EnwikiContentSource thread safety problem (NPE) in 'forever' mode
> -
>
> Key: LUCENE-4595
> URL: https://issues.apache.org/jira/browse/LUCENE-4595
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Attachments: LUCENE-4595.patch
>
>
> If close() is invoked around when an additional input stream reader is 
> recreated for the 'forever' behavior, an uncaught NPE might occur.
> This bug was probably always there, just exposed now with the 
> EnwikiContentSourceTest added in LUCENE-4588.

[jira] [Commented] (LUCENE-4595) EnwikiContentSource thread safety problem (NPE) in 'forever' mode

2012-12-06 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13512113#comment-13512113
 ] 

Doron Cohen commented on LUCENE-4595:
-

Jenkins' reproduce params and error log: 
{noformat}
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/3093/
Java: 32bit/jdk1.6.0_37 -server -XX:+UseSerialGC

1 tests failed.
FAILED:  
org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSourceTest.testForever

Error Message:
Captured an uncaught exception in thread: Thread[id=140, name=Thread-2, 
state=RUNNABLE, group=TGRP-EnwikiContentSourceTest]

Stack Trace:
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught 
exception in thread: Thread[id=140, name=Thread-2, state=RUNNABLE, 
group=TGRP-EnwikiContentSourceTest]
at 
__randomizedtesting.SeedInfo.seed([EF7AF10441351C3B:AB004FFFCF2C6B8C]:0)
Caused by: java.lang.NullPointerException
at __randomizedtesting.SeedInfo.seed([EF7AF10441351C3B]:0)
at java.io.Reader.<init>(Reader.java:61)
at java.io.InputStreamReader.<init>(InputStreamReader.java:112)
at 
org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource$Parser.run(EnwikiContentSource.java:186)
at java.lang.Thread.run(Thread.java:662)

Build Log:
[...truncated 5173 lines...]
[junit4:junit4] Suite: 
org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSourceTest
[junit4:junit4]   2> 7 Δεκ 2012 6:39:53 πμ 
com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler
 uncaughtException
[junit4:junit4]   2> WARNING: Uncaught exception in thread: 
Thread[Thread-2,5,TGRP-EnwikiContentSourceTest]
[junit4:junit4]   2> java.lang.NullPointerException
[junit4:junit4]   2>at 
__randomizedtesting.SeedInfo.seed([EF7AF10441351C3B]:0)
[junit4:junit4]   2>at java.io.Reader.<init>(Reader.java:61)
[junit4:junit4]   2>at 
java.io.InputStreamReader.<init>(InputStreamReader.java:112)
[junit4:junit4]   2>at 
org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource$Parser.run(EnwikiContentSource.java:186)
[junit4:junit4]   2>at java.lang.Thread.run(Thread.java:662)
[junit4:junit4]   2> NOTE: reproduce with: ant test  
-Dtestcase=EnwikiContentSourceTest -Dtests.method=testForever 
-Dtests.seed=EF7AF10441351C3B -Dtests.multiplier=3 -Dtests.slow=true 
-Dtests.locale=el -Dtests.timezone=SST -Dtests.file.encoding=UTF-8
[junit4:junit4] ERROR   0.07s J1 | EnwikiContentSourceTest.testForever <<<
[junit4:junit4]> Throwable #1: 
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught 
exception in thread: Thread[id=140, name=Thread-2, state=RUNNABLE, 
group=TGRP-EnwikiContentSourceTest]
[junit4:junit4]>at 
__randomizedtesting.SeedInfo.seed([EF7AF10441351C3B:AB004FFFCF2C6B8C]:0)
[junit4:junit4]> Caused by: java.lang.NullPointerException
[junit4:junit4]>at 
__randomizedtesting.SeedInfo.seed([EF7AF10441351C3B]:0)
[junit4:junit4]>at java.io.Reader.<init>(Reader.java:61)
[junit4:junit4]>at 
java.io.InputStreamReader.<init>(InputStreamReader.java:112)
[junit4:junit4]>at 
org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource$Parser.run(EnwikiContentSource.java:186)
[junit4:junit4]>at java.lang.Thread.run(Thread.java:662)
[junit4:junit4]   2> NOTE: test params are: codec=Lucene41: {}, 
sim=DefaultSimilarity, locale=el, timezone=SST
[junit4:junit4]   2> NOTE: Linux 3.2.0-34-generic i386/Sun Microsystems Inc. 
1.6.0_37 (32-bit)/cpus=8,threads=1,free=47084536,total=64946176
[junit4:junit4]   2> NOTE: All tests run in this JVM: [TrecContentSourceTest, 
TestConfig, DocMakerTest, SearchWithSortTaskTest, StreamUtilsTest, 
WriteLineDocTaskTest, CreateIndexTaskTest, TestQualityRun, LineDocSourceTest, 
TestPerfTasksParse, AddIndexesTaskTest, PerfTaskTest, AltPackageTaskTest, 
EnwikiContentSourceTest]
[junit4:junit4] Completed on J1 in 0.30s, 3 tests, 1 error <<< FAILURES!
{noformat}
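
The trace above shows the NullPointerException coming from the InputStreamReader constructor, which is what happens when it is handed a null stream. One common way to avoid that kind of race is sketched below. This is only an illustration under assumptions: ReopeningSource and newReaderOrNull() are hypothetical names, and this is not necessarily what the attached patch does.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a content source; not the EnwikiContentSource code. */
final class ReopeningSource {
  private InputStream in; // guarded by "this"; null once close() was called

  ReopeningSource(InputStream in) {
    this.in = in;
  }

  /** Called by the owning thread; afterwards no new reader may be created. */
  synchronized void close() {
    in = null;
  }

  /**
   * Called by the parser thread when it wants to start another pass.
   * Returns null rather than passing a null stream to InputStreamReader,
   * whose constructor would throw NullPointerException.
   */
  synchronized Reader newReaderOrNull() {
    InputStream local = in;
    if (local == null) {
      return null; // closed concurrently: the caller should stop quietly
    }
    return new InputStreamReader(local, StandardCharsets.UTF_8);
  }
}
```

Because close() and newReaderOrNull() synchronize on the same object, the parser thread either sees the open stream or sees null; it never constructs a reader from a stream that was cleared halfway through.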

> EnwikiContentSource thread safety problem (NPE) in 'forever' mode
> -
>
> Key: LUCENE-4595
> URL: https://issues.apache.org/jira/browse/LUCENE-4595
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
>
> If close() is invoked around when an additional input stream reader is 
> recreated for the 'forever' behavior, an uncaught NPE might occur.
> This bug was probably always there, just exposed now with the 
> EnwikiContentSourceTest added in LUCENE-4588.

[jira] [Created] (LUCENE-4595) EnwikiContentSource thread safety problem (NPE) in 'forever' mode

2012-12-06 Thread Doron Cohen (JIRA)

Doron Cohen created LUCENE-4595:
---

 Summary: EnwikiContentSource thread safety problem (NPE) in 
'forever' mode
 Key: LUCENE-4595
 URL: https://issues.apache.org/jira/browse/LUCENE-4595
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor


If close() is invoked around when an additional input stream reader is 
recreated for the 'forever' behavior, an uncaught NPE might occur.
This bug was probably always there, just exposed now with the 
EnwikiContentSourceTest added in LUCENE-4588.

[jira] [Commented] (LUCENE-4590) WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file

2012-12-06 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511262#comment-13511262
 ] 

Doron Cohen commented on LUCENE-4590:
-

bq. Do you think perhaps that EnwikiContentSource should let the caller know 
whether the returned DocData represents a content page or category page?

That's what I planned at the start, but I decided to leave WriteLineDoc intact 
because it is general, that is, not aware of the unique structure of Wikipedia 
data, where some of the pages represent categories.

bq. So maybe, if someone wants to generate a line file from the pages only... 
flexibility that I think you are trying to achieve...

Actually I am after the two files... :) These category pages are (unique) 
taxonomy node names, but without the taxonomy structure, which can be deduced 
from the (parent) categories of the category pages. Having these category pages 
in a separate file can be useful for deducing that taxonomy.
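
The split this issue is about can be sketched roughly as follows. This is a minimal illustration, not the WriteEnwikiLineDoc task from the patch. It assumes (the thread does not confirm this) that each line-file record starts with the page title followed by a tab, and that category pages are those whose title starts with "Category:". LineSplitter, isCategoryLine and split are hypothetical names.

```java
import java.util.List;

/** Hypothetical helper; not part of the benchmark module. */
final class LineSplitter {
  // Assumption: category page titles start with this prefix.
  static final String CATEGORY_PREFIX = "Category:";

  /** True if the title field (the text before the first tab) is a category title. */
  static boolean isCategoryLine(String line) {
    int tab = line.indexOf('\t');
    String title = tab < 0 ? line : line.substring(0, tab);
    return title.startsWith(CATEGORY_PREFIX);
  }

  /** Routes every line to one of the two outputs in a single pass. */
  static void split(Iterable<String> lines, List<String> categories, List<String> others) {
    for (String line : lines) {
      if (isCategoryLine(line)) {
        categories.add(line);
      } else {
        others.add(line);
      }
    }
  }
}
```

Doing the routing while the records are written means the input is read once, which is the "in advance" option from the issue description, as opposed to running grep over a finished line file.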

> WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file
> ---
>
> Key: LUCENE-4590
> URL: https://issues.apache.org/jira/browse/LUCENE-4590
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
>
> It may be convenient to split Wikipedia's line file into two separate files: 
> category-pages and non-category ones. 
> It is possible to split the original line file with grep or such.
> It is more efficient to do it in advance.

[jira] [Updated] (LUCENE-4590) WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file

2012-12-06 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-4590:


Component/s: modules/benchmark

> WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file
> ---
>
> Key: LUCENE-4590
> URL: https://issues.apache.org/jira/browse/LUCENE-4590
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
>
> It may be convenient to split Wikipedia's line file into two separate files: 
> category-pages and non-category ones. 
> It is possible to split the original line file with grep or such.
> It is more efficient to do it in advance.

[jira] [Created] (LUCENE-4590) WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file

2012-12-06 Thread Doron Cohen (JIRA)

Doron Cohen created LUCENE-4590:
---

 Summary: WriteEnwikiLineDoc which writes Wikipedia category pages 
to a separate file
 Key: LUCENE-4590
 URL: https://issues.apache.org/jira/browse/LUCENE-4590
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor


It may be convenient to split Wikipedia's line file into two separate files: 
category-pages and non-category ones. 
It is possible to split the original line file with grep or such.
It is more efficient to do it in advance.

[jira] [Assigned] (LUCENE-4588) EnwikiContentSource silently swallows the last wiki doc

2012-12-06 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen reassigned LUCENE-4588:
---

Assignee: Doron Cohen

> EnwikiContentSource silently swallows the last wiki doc
> ---
>
> Key: LUCENE-4588
> URL: https://issues.apache.org/jira/browse/LUCENE-4588
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Attachments: LUCENE-4588.patch
>
>
> Last wiki doc is never returned

[jira] [Updated] (LUCENE-4588) EnwikiContentSource silently swallows the last wiki doc

2012-12-05 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-4588:


Attachment: LUCENE-4588.patch

Patch adds a test for enwiki-content-source and fixes both the last doc problem 
and the thread leak.

> EnwikiContentSource silently swallows the last wiki doc
> ---
>
> Key: LUCENE-4588
> URL: https://issues.apache.org/jira/browse/LUCENE-4588
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Priority: Minor
> Attachments: LUCENE-4588.patch
>
>
> Last wiki doc is never returned

[jira] [Commented] (LUCENE-4588) EnwikiContentSource silently swallows the last wiki doc

2012-12-05 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510774#comment-13510774
 ] 

Doron Cohen commented on LUCENE-4588:
-

In addition, there's a thread leak in 'forever' mode.

> EnwikiContentSource silently swallows the last wiki doc
> ---
>
> Key: LUCENE-4588
> URL: https://issues.apache.org/jira/browse/LUCENE-4588
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Priority: Minor
>
> Last wiki doc is never returned

[jira] [Created] (LUCENE-4588) EnwikiContentSource silently swallows the last wiki doc

2012-12-05 Thread Doron Cohen (JIRA)

Doron Cohen created LUCENE-4588:
---

 Summary: EnwikiContentSource silently swallows the last wiki doc
 Key: LUCENE-4588
 URL: https://issues.apache.org/jira/browse/LUCENE-4588
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Priority: Minor


Last wiki doc is never returned

[jira] [Commented] (LUCENE-3464) Rename IndexReader.reopen to make it clear that reopen may not happen

2011-09-26 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114656#comment-13114656
 ] 

Doron Cohen commented on LUCENE-3464:
-

I liked reopen()... (but also like returning null in case there's nothing 
newer...)

If the name is going to change, two additional names to consider:
* newest()
* newer()

For "newest()" I think current behavior of returning "this" makes sense when 
"this" is the newest.
For "newer()" returning null in that case seems right.

One problem I have with these names is that they both seem to hide the fact 
that things are going on down there, when it is required to open a new reader...
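
The difference between the two suggested names can be sketched as below. VersionedReader is a hypothetical type used only for illustration; it is not Lucene's IndexReader API, and the version numbers stand in for whatever "newer" means for a real index.

```java
/** Hypothetical reader type; illustrates the two return conventions only. */
final class VersionedReader {
  final long version;

  VersionedReader(long version) {
    this.version = version;
  }

  /** "newest()" style: always returns a usable reader, possibly this one. */
  VersionedReader newest(long latestVersion) {
    return latestVersion > version ? new VersionedReader(latestVersion) : this;
  }

  /** "newer()" style: returns null when there is nothing newer. */
  VersionedReader newer(long latestVersion) {
    return latestVersion > version ? new VersionedReader(latestVersion) : null;
  }
}
```

With the newest() convention the caller has to compare the result with the old reader before closing anything; with the newer() convention a null check is forced on the caller, which is the "proper usage" argument made in the issue description.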

> Rename IndexReader.reopen to make it clear that reopen may not happen
> -
>
> Key: LUCENE-3464
> URL: https://issues.apache.org/jira/browse/LUCENE-3464
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.5, 4.0
>
>
> Spinoff from LUCENE-3454 where Shai noted this inconsistency.
> IR.reopen sounds like an unconditional operation, which has trapped users in 
> the past into always closing the old reader instead of only closing it if the 
> returned reader is new.
> I think this hidden maybe-ness is trappy and we should rename it 
> (maybeReopen?  reopenIfNeeded?).
> In addition, instead of returning "this" when the reopen didn't happen, I 
> think we should return null to enforce proper usage of the maybe-ness of this 
> API.

[jira] [Commented] (LUCENE-3454) rename optimize to a less cool-sounding name

2011-09-26 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114617#comment-13114617
 ] 

Doron Cohen commented on LUCENE-3454:
-

To me merge(num) doing nothing "because there are already no more than n 
segments" is as fine as close() doing nothing "because of already being closed" 
so +1 for merge(num).


> rename optimize to a less cool-sounding name
> 
>
> Key: LUCENE-3454
> URL: https://issues.apache.org/jira/browse/LUCENE-3454
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 4.0
>Reporter: Robert Muir
>
> I think users see the name optimize and feel they must do this, because who 
> wants a suboptimal system? but this probably just results in wasted time and 
> resources.
> maybe rename to collapseSegments or something?

[jira] [Resolved] (LUCENE-3457) Upgrade to commons-compress 1.2

2011-09-25 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-3457.
-

Resolution: Fixed

Fixed:
- 1175475 - trunk
- 1175528 - 3x

> Upgrade to commons-compress 1.2
> ---
>
> Key: LUCENE-3457
> URL: https://issues.apache.org/jira/browse/LUCENE-3457
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3457.patch, test.out.gz
>
>
> Commons Compress bug COMPRESS-127 was fixed in 1.2, so the workaround in 
> benchmark's StreamUtils is no longer required. Compress is also used in solr. 
> Replace with new jar in both benchmark and solr and get rid of that 
> workaround.

[jira] [Commented] (LUCENE-3457) Upgrade to commons-compress 1.2

2011-09-25 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114302#comment-13114302
 ] 

Doron Cohen commented on LUCENE-3457:
-

OK great, thanks Robert, so this has nothing to do with the compress jar update.
I'll commit shortly.

> Upgrade to commons-compress 1.2
> ---
>
> Key: LUCENE-3457
> URL: https://issues.apache.org/jira/browse/LUCENE-3457
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3457.patch, test.out.gz
>
>
> Commons Compress bug COMPRESS-127 was fixed in 1.2, so the workaround in 
> benchmark's StreamUtils is no longer required. Compress is also used in solr. 
> Replace with new jar in both benchmark and solr and get rid of that 
> workaround.

[jira] [Updated] (LUCENE-3457) Upgrade to commons-compress 1.2

2011-09-25 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3457:


Attachment: test.out.gz

It still fails. This time, running 'clean test' from trunk, all lucene tests 
pass, but some of the solr tests failed:

- org.apache.solr.handler.TestReplicationHandler
[junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 43.703 sec
- org.apache.solr.handler.component.DebugComponentTest
[junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 1 sec
- org.apache.solr.handler.component.TermVectorComponentTest
[junit] Tests run: 4, Failures: 1, Errors: 0, Time elapsed: 1.375 sec
- org.apache.solr.request.JSONWriterTest
[junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 1.078 sec
- org.apache.solr.schema.BadIndexSchemaTest
[junit] Tests run: 5, Failures: 1, Errors: 0, Time elapsed: 1.266 sec
- org.apache.solr.schema.RequiredFieldsTest
[junit] Tests run: 3, Failures: 1, Errors: 0, Time elapsed: 1.422 sec
- org.apache.solr.search.QueryParsingTest
[junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 0.641 sec
- org.apache.solr.search.SpatialFilterTest
[junit] Tests run: 3, Failures: 1, Errors: 0, Time elapsed: 1.438 sec
- org.apache.solr.search.TestQueryTypes
[junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 0.953 sec
- org.apache.solr.servlet.CacheHeaderTest
[junit] Tests run: 5, Failures: 1, Errors: 0, Time elapsed: 0.984 sec
- org.apache.solr.spelling.SpellCheckCollatorTest
[junit] Tests run: 4, Failures: 1, Errors: 0, Time elapsed: 1.281 sec
- org.apache.solr.update.DocumentBuilderTest
[junit] Tests run: 4, Failures: 1, Errors: 0, Time elapsed: 0.734 sec
- org.apache.solr.util.SolrPluginUtilsTest
[junit] Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 0.766 sec

When run alone, TestReplicationHandler, for example, passes.
The same goes for DebugComponentTest.
I am not sure what is happening here.
Attaching the test output in case someone wants to take a look.

> Upgrade to commons-compress 1.2
> ---
>
> Key: LUCENE-3457
> URL: https://issues.apache.org/jira/browse/LUCENE-3457
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3457.patch, test.out.gz
>
>
> Commons Compress bug COMPRESS-127 was fixed in 1.2, so the workaround in 
> benchmark's StreamUtils is no longer required. Compress is also used in solr. 
> Replace with new jar in both benchmark and solr and get rid of that 
> workaround.

[jira] [Issue Comment Edited] (LUCENE-3457) Upgrade to commons-compress 1.2

2011-09-25 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114219#comment-13114219
 ] 

Doron Cohen edited comment on LUCENE-3457 at 9/25/11 11:44 AM:
---

Thanks Chris, almost sure I did a clean, will try again.

  was (Author: doronc):
Thanks Chriss, almost sure I did a clean, will try again.
  
> Upgrade to commons-compress 1.2
> ---
>
> Key: LUCENE-3457
> URL: https://issues.apache.org/jira/browse/LUCENE-3457
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3457.patch
>
>
> Commons Compress bug COMPRESS-127 was fixed in 1.2, so the workaround in 
> benchmark's StreamUtils is no longer required. Compress is also used in solr. 
> Replace with new jar in both benchmark and solr and get rid of that 
> workaround.

[jira] [Commented] (LUCENE-3457) Upgrade to commons-compress 1.2

2011-09-25 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114219#comment-13114219
 ] 

Doron Cohen commented on LUCENE-3457:
-

Thanks Chris, almost sure I did a clean, will try again.

> Upgrade to commons-compress 1.2
> ---
>
> Key: LUCENE-3457
> URL: https://issues.apache.org/jira/browse/LUCENE-3457
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3457.patch
>
>
> Commons Compress bug COMPRESS-127 was fixed in 1.2, so the workaround in 
> benchmark's StreamUtils is no longer required. Compress is also used in solr. 
> Replace with new jar in both benchmark and solr and get rid of that 
> workaround.

[jira] [Commented] (LUCENE-3457) Upgrade to commons-compress 1.2

2011-09-25 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114213#comment-13114213
 ] 

Doron Cohen commented on LUCENE-3457:
-

hmmm, this is strange.

These are the tests that failed with compress-1.2 for 'ant clean test' under 
solr:

- org.apache.solr.handler.TestReplicationHandler
[junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 39.968 sec
- org.apache.solr.handler.component.DebugComponentTest
[junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 1.219 sec
- org.apache.solr.handler.component.TermVectorComponentTest
[junit] Tests run: 4, Failures: 1, Errors: 0, Time elapsed: 1 sec
- org.apache.solr.request.JSONWriterTest
[junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 0.75 sec
- org.apache.solr.response.TestCSVResponseWriter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.719 sec
- org.apache.solr.schema.BadIndexSchemaTest
[junit] Tests run: 5, Failures: 1, Errors: 0, Time elapsed: 1.187 sec
- org.apache.solr.search.TestQueryUtils
[junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 1.14 sec
- org.apache.solr.search.similarities.TestBM25SimilarityFactory
[junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 0.187 sec
- org.apache.solr.servlet.DirectSolrConnectionTest
[junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 0.344 sec
- org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest
[junit] Tests run: 4, Failures: 1, Errors: 0, Time elapsed: 3.984 sec

I put back 1.1 and they all passed. 
However, I then switched to compress-1.2 again and now they all passed as well.

I now see that I am on r1174072. I'll update and try again.


> Upgrade to commons-compress 1.2
> ---
>
> Key: LUCENE-3457
> URL: https://issues.apache.org/jira/browse/LUCENE-3457
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3457.patch
>
>
> Commons Compress bug COMPRESS-127 was fixed in 1.2, so the workaround in 
> benchmark's StreamUtils is no longer required. Compress is also used in solr. 
> Replace with new jar in both benchmark and solr and get rid of that 
> workaround.

[jira] [Updated] (LUCENE-3457) Upgrade to commons-compress 1.2

2011-09-25 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3457:


Attachment: LUCENE-3457.patch

Attached a simple patch with the fix.
After applying the patch you also need to download commons-compress-1.2.jar and 
place it under module/benchmark/lib and under solr/contrib/extraction/lib. 

Currently several solr tests fail for me with this patch, probably not related 
to replacing the compress jar, since they pass when run alone (-Dtestcase).

> Upgrade to commons-compress 1.2
> ---
>
> Key: LUCENE-3457
> URL: https://issues.apache.org/jira/browse/LUCENE-3457
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3457.patch
>
>
> Commons Compress bug COMPRESS-127 was fixed in 1.2, so the workaround in 
> benchmark's StreamUtils is no longer required. Compress is also used in solr. 
> Replace with new jar in both benchmark and solr and get rid of that 
> workaround.

[jira] [Created] (LUCENE-3457) Upgrade to commons-compress 1.2

2011-09-25 Thread Doron Cohen (JIRA)

Upgrade to commons-compress 1.2
---

 Key: LUCENE-3457
 URL: https://issues.apache.org/jira/browse/LUCENE-3457
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.5, 4.0


Commons Compress bug COMPRESS-127 was fixed in 1.2, so the workaround in 
benchmark's StreamUtils is no longer required. Compress is also used in solr. 
Replace with new jar in both benchmark and solr and get rid of that workaround.

[jira] [Resolved] (LUCENE-3215) SloppyPhraseScorer sometimes computes Infinite freq

2011-09-22 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-3215.
-

   Resolution: Fixed
Fix Version/s: 4.0
   3.5

Fixed
- r1173961 - trunk
- r1174002 - 3x

Prior to committing I compared the performance of sloppy phrase queries with 
and without repeats, for large documents with many candidate matches, and did 
not see the anticipated speedup, though at least there was no degradation either.

> SloppyPhraseScorer sometimes computes Infinite freq
> ---
>
> Key: LUCENE-3215
> URL: https://issues.apache.org/jira/browse/LUCENE-3215
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Doron Cohen
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3215.patch, LUCENE-3215.patch, LUCENE-3215.patch, 
> LUCENE-3215.patch, LUCENE-3215_test.patch, LUCENE-3215_test.patch
>
>
> reported on user list:
> http://www.lucidimagination.com/search/document/400cbc528ed63db9/score_of_infinity_on_dismax_query

[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-21 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3390:


Attachment: LUCENE-3390-BitsInterface.patch

Attached patch with a test that fails before this fix (otherwise patch same as 
previous).

The test uses 4 collectors simultaneously, each with different missing values.

> Incorrect sort by Numeric values for documents missing the sorting field
> 
>
> Key: LUCENE-3390
> URL: https://issues.apache.org/jira/browse/LUCENE-3390
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.3
>Reporter: Gilad Barkai
>Assignee: Doron Cohen
>Priority: Minor
>  Labels: double, float, int, long, numeric, sort
> Fix For: 3.4
>
> Attachments: LUCENE-3390-BitsInterface.patch, 
> LUCENE-3390-BitsInterface.patch, LUCENE-3390-fix-like-trunk.patch, 
> LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, 
> LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a 
> value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested 
> against Double, Float, Int & Long numeric fields ascending and descending 
> order).
> This behavior is unexpected, as zero is "comparable" to the rest of the 
> values. A better solution would either be allowing the user to define such a 
> "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any 
> value.
> Searching with MatchAllDocsQuery, with sort over that field in descending 
> order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 
> 2nd result - which seems as a bug?

[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-21 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109454#comment-13109454
 ] 

Doron Cohen commented on LUCENE-3390:
-

I wrote a small test that should fail with the bug Uwe fixed here and pass with 
the fix. For some reason it is still failing even with that fix. I tried this 
with the previous patch and will now try with the latest one, though I think it 
should also pass with the previous one. I'll give it another try.

> Incorrect sort by Numeric values for documents missing the sorting field
> 
>
> Key: LUCENE-3390
> URL: https://issues.apache.org/jira/browse/LUCENE-3390
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.3
>Reporter: Gilad Barkai
>Assignee: Doron Cohen
>Priority: Minor
>  Labels: double, float, int, long, numeric, sort
> Fix For: 3.4
>
> Attachments: LUCENE-3390-BitsInterface.patch, 
> LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, 
> LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, 
> LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a 
> value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested 
> against Double, Float, Int & Long numeric fields ascending and descending 
> order).
> This behavior is unexpected, as zero is "comparable" to the rest of the 
> values. A better solution would either be allowing the user to define such a 
> "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any 
> value.
> Searching with MatchAllDocsQuery, with sort over that field in descending 
> order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 
> 2nd result - which seems as a bug?

[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-20 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108490#comment-13108490
 ] 

Doron Cohen commented on LUCENE-3390:
-

Hi Uwe, thanks for catching this. 
I agree that this is a bug, and needs to be fixed.
Just to make sure that we agree on what the problem is, let me describe it 
again: in the current 3x code, in setNextReader() we extract the values from 
the cache, e.g. by 
{code}FieldCache.DEFAULT.getDoubles(reader, field, parser);{code} 
and, if a missing value was set, we iterate over the unvalued docs and set them 
to that missing value. However, this is written into the same array that was 
just obtained from the cache, and so it is (1) inefficient, as it will happen 
again in the next sort on the same field, (2) incorrect, as two sorts on the 
*same* field with different missing values will collide, and (3) unsafe, as you 
indicated.
I was very happy with the reuse of the cache for caching the missing values so 
I would like to try to solve this within that "frame"... More later...
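
To make points (1) and (2) above concrete, here is a minimal sketch of the collision. It assumes a toy cache keyed by field name: the getDoubles method below is a hypothetical stand-in and is NOT Lucene's FieldCache API. It only illustrates what happens when two sorts write their own missing value into one shared cached array.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for a per-field value cache (not Lucene's FieldCache). */
class SharedCacheArrayDemo {
    private static final Map<String, double[]> CACHE = new HashMap<>();

    /** Returns the same array instance on every call for a given field. */
    public static double[] getDoubles(String field) {
        return CACHE.computeIfAbsent(field, f -> new double[] {3.5, -10.0, 0.0});
    }

    /** Problematic approach: writes the missing value into the shared array. */
    public static double[] fillInPlace(String field, boolean[] hasValue, double missing) {
        double[] values = getDoubles(field);
        for (int doc = 0; doc < values.length; doc++) {
            if (!hasValue[doc]) {
                values[doc] = missing;
            }
        }
        return values;
    }

    /** One possible alternative: leave the cache untouched, substitute on read. */
    public static double valueOf(String field, boolean[] hasValue, double missing, int doc) {
        return hasValue[doc] ? getDoubles(field)[doc] : missing;
    }

    public static void main(String[] args) {
        boolean[] hasValue = {true, true, false}; // doc 2 has no value
        double[] sortA = fillInPlace("price", hasValue, Double.MAX_VALUE);
        double[] sortB = fillInPlace("price", hasValue, -1.0);
        // Both sorts hold the same array, so sort A now sees sort B's missing value.
        System.out.println(sortA == sortB);
        System.out.println(sortA[2]);
        // Substituting on read keeps each sort's missing value independent.
        System.out.println(valueOf("price", hasValue, Double.MAX_VALUE, 2));
    }
}
```

Substituting on read is just one way out. It needs a per-document "has value" flag, which is an assumption of this sketch rather than something stated in the comment above.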

> Incorrect sort by Numeric values for documents missing the sorting field
> 
>
> Key: LUCENE-3390
> URL: https://issues.apache.org/jira/browse/LUCENE-3390
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.3
>Reporter: Gilad Barkai
>Assignee: Doron Cohen
>Priority: Minor
>  Labels: double, float, int, long, numeric, sort
> Fix For: 3.4
>
> Attachments: LUCENE-3390-fix-like-trunk.patch, 
> LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, 
> LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a 
> value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested 
> against Double, Float, Int & Long numeric fields ascending and descending 
> order).
> This behavior is unexpected, as zero is "comparable" to the rest of the 
> values. A better solution would either be allowing the user to define such a 
> "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any 
> value.
> Searching with MatchAllDocsQuery, with sort over that field in descending 
> order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 
> 2nd result - which seems as a bug?

[jira] [Updated] (LUCENE-3215) SloppyPhraseScorer sometimes computes Infinite freq

2011-09-20 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3215:


Attachment: LUCENE-3215.patch

The previous patch still produces NaNs and infinite scores for queries with 
holes. The updated patch fixes this by updating END (before computing the new 
match-length) also for pp (not only for its repeats).

I plan to commit this soon.

> SloppyPhraseScorer sometimes computes Infinite freq
> ---
>
> Key: LUCENE-3215
> URL: https://issues.apache.org/jira/browse/LUCENE-3215
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Doron Cohen
> Attachments: LUCENE-3215.patch, LUCENE-3215.patch, LUCENE-3215.patch, 
> LUCENE-3215.patch, LUCENE-3215_test.patch, LUCENE-3215_test.patch
>
>
> reported on user list:
> http://www.lucidimagination.com/search/document/400cbc528ed63db9/score_of_infinity_on_dismax_query

[jira] [Updated] (LUCENE-3215) SloppyPhraseScorer sometimes computes Infinite freq

2011-09-17 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3215:


Attachment: LUCENE-3215.patch

Updated patch for current trunk r1172055.

> SloppyPhraseScorer sometimes computes Infinite freq
> ---
>
> Key: LUCENE-3215
> URL: https://issues.apache.org/jira/browse/LUCENE-3215
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Doron Cohen
> Attachments: LUCENE-3215.patch, LUCENE-3215.patch, LUCENE-3215.patch, 
> LUCENE-3215_test.patch, LUCENE-3215_test.patch
>
>
> reported on user list:
> http://www.lucidimagination.com/search/document/400cbc528ed63db9/score_of_infinity_on_dismax_query

[jira] [Updated] (LUCENE-3215) SloppyPhraseScorer sometimes computes Infinite freq

2011-09-17 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3215:


Attachment: LUCENE-3215.patch

Attached patch is based on r1166541 - before recent changes to scorers. Will 
merge with recent changes tomorrow or so. All tests pass.
I believe that sloppy scoring performance should improve with this change but 
did not check this.

> SloppyPhraseScorer sometimes computes Infinite freq
> ---
>
> Key: LUCENE-3215
> URL: https://issues.apache.org/jira/browse/LUCENE-3215
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Doron Cohen
> Attachments: LUCENE-3215.patch, LUCENE-3215.patch, 
> LUCENE-3215_test.patch, LUCENE-3215_test.patch
>
>
> reported on user list:
> http://www.lucidimagination.com/search/document/400cbc528ed63db9/score_of_infinity_on_dismax_query

[jira] [Issue Comment Edited] (LUCENE-3215) SloppyPhraseScorer sometimes computes Infinite freq

2011-09-17 Thread Doron Cohen (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107209#comment-13107209
]

Doron Cohen edited comment on LUCENE-3215 at 9/17/11 6:56 PM:
--

OK I think I have a fix for this.

While looking at it, I realized that PhraseScorer (the one that used to be the
base of both the Exact and Sloppy phrase scorers but now is the base of only the
sloppy phrase scorer) is way too complicated and inefficient. All those sort
calls after each matching doc can be avoided.

So I am modifying PhraseScorer to not have a phrase-queue at all - just the
sorted linked list, which is always kept sorted by advancing last beyond first.
Last is renamed to 'min' and first is renamed to 'max'. Making the list cyclic
allows more efficient manipulation of it.

With this, SloppyPhraseScorer is modified to maintain its own phrase queue. The
queue size is set at the first candidate document. In order to handle
repetitions (Same term in different query offsets) it will contain only some of
the pps: those that either have no repetitions, or are the first (lower query
offset) in a repeating group. A linked list of repeating pps was added: so
PhrasePositions has a new member: nextRepeating.

Detection of repeating pps and creation of that list is done once per scorer:
at the first candidate doc.

For solving the bugs reported here, in addition to the initialization of 'end'
as explained in the previous comment, advanceRepeatingPPs now also updates two
values:
- end, in case one of the repeating pps is far ahead (larger)
- the position of the first pp in a repeating list (the one that is in the
queue), in case the repeating pp is far behind (smaller). This can happen when
there are holes in the query, as position = tpPos - offset. It fixes the
problem of false negative distances which caused this bug. It is tricky: it
relies on PhrasePositions.nextPosition() ignoring pp.position and just calling
positions.nextPosition(). But it is correct, as the modified position is used
to replace pp in the queue.

Last, I think that the test added with holes had one wrong assert: It added
four docs:
- drug drug
- drug druggy drug
- drug druggy druggy drug
- drug druggy drug druggy drug

defined this query (number is the offset):
- drug(1) drug(3)

and expected that with slop=1 the first doc would not be found.
I think it should be found, as the slop operates in both directions.
So modified the query to: drug(1) drug(3)

Patch to follow.

was (Author: doronc):
OK I think I have a fix for this.

Detection of repeating pps and creation of that list is done once per scorer:
at the first candidate doc.

Last, I think that the test added with holes had one wrong assert: It added
four docs:
- drug drug
- drug druggy drug
- drug druggy druggy drug
- drug druggy drug druggy drug
defined this query (number is the offset):
- drug(1) drug(3)
and expected that with slop=1 the first doc would not be found.
I think it should be found, as the slop operates in both directions.
So modified the query to: drug(1) drug(3)

Patch to follow.

> SloppyPhraseScorer sometimes computes Infinite freq
> ---
>
>

[jira] [Commented] (LUCENE-3215) SloppyPhraseScorer sometimes computes Infinite freq

2011-09-17 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107209#comment-13107209
 ] 

Doron Cohen commented on LUCENE-3215:
-

OK I think I have a fix for this.

While looking at it, I realized that PhraseScorer (the one that used to be the 
base of both the Exact and Sloppy phrase scorers but now is the base of only the 
sloppy phrase scorer) is way too complicated and inefficient. All those sort 
calls after each matching doc can be avoided. 

So I am modifying PhraseScorer to not have a phrase-queue at all - just the 
sorted linked list, which is always kept sorted by advancing last beyond first. 
Last is renamed to 'min' and first is renamed to 'max'. Making the list cyclic 
allows more efficient manipulation of it. 

With this, SloppyPhraseScorer is modified to maintain its own phrase queue. The 
queue size is set at the first candidate document. In order to handle 
repetitions (Same term in different query offsets) it will contain only some of 
the pps: those that either have no repetitions, or are the first (lower query 
offset) in a repeating group. A linked list of repeating pps was added: so 
PhrasePositions has a new member: nextRepeating.

Detection of repeating pps and creation of that list is done once per scorer: 
at the first candidate doc.

For solving the bugs reported here, in addition to the initialization of 'end' 
as explained in the previous comment, advanceRepeatingPPs now also updates two 
values:
- end, in case one of the repeating pps is far ahead (larger)
- the position of the first pp in a repeating list (the one that is in the 
queue), in case the repeating pp is far behind (smaller). This can happen when 
there are holes in the query, as position = tpPos - offset. It fixes the 
problem of false negative distances which caused this bug. It is tricky: it 
relies on PhrasePositions.nextPosition() ignoring pp.position and just calling 
positions.nextPosition(). But it is correct, as the modified position is used 
to replace pp in the queue.

Last, I think that the test added with holes had one wrong assert: It added 
four docs:
- drug drug
- drug druggy drug
- drug druggy druggy drug
- drug druggy drug druggy drug
defined this query (number is the offset):
- drug(1) drug(3)
and expected that with slop=1 the first doc would not be found.
I think it should be found, as the slop operates in both directions.
So modified the query to: drug(1) drug(3)

Patch to follow.

> SloppyPhraseScorer sometimes computes Infinite freq
> ---
>
> Key: LUCENE-3215
> URL: https://issues.apache.org/jira/browse/LUCENE-3215
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Doron Cohen
> Attachments: LUCENE-3215.patch, LUCENE-3215_test.patch, 
> LUCENE-3215_test.patch
>
>
> reported on user list:
> http://www.lucidimagination.com/search/document/400cbc528ed63db9/score_of_infinity_on_dismax_query

[jira] [Commented] (LUCENE-3215) SloppyPhraseScorer sometimes computes Infinite freq

2011-09-08 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100182#comment-13100182
 ] 

Doron Cohen commented on LUCENE-3215:
-

An update on this...

This is not related to LUCENE-3142 - the latter was fixed but this one still 
fails.

The fix in the patch, which takes the 'abs' of the distance, indeed avoids the 
infinite score problem, but I was not 100% comfortable with it - how can the 
distance be non-positive?

Digging into it shows a wrong assumption in SloppyPhraseScorer:

{code}
private int initPhrasePositions() throws IOException {
int end = 0;
{code}

The initial value of end assumes that all positions will be nonnegative.
But this is wrong, as PP position is computed as 

{code}
  position = postings.nextPosition() - offset
{code}

So, whenever the query term appears in the doc in a position smaller than its 
offset in the query, the computed position is negative. The correct 
initialization for end is therefore:

{code}
private int initPhrasePositions() throws IOException {
int end = Integer.MIN_VALUE;
{code}
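
To illustrate the point about the initial value of end, here is a minimal 
standalone sketch in plain Java. It is not the SloppyPhraseScorer code: the 
class EndInitSketch and the method maxPhrasePosition are made-up names, and the 
positions and offsets are invented inputs. It computes the maximum of 
docPosition - offset twice, once starting from 0 and once from 
Integer.MIN_VALUE:

```java
// Hypothetical illustration only; EndInitSketch is not a Lucene class.
class EndInitSketch {

    // Returns the largest (docPositions[i] - offsets[i]), starting from initialEnd.
    static int maxPhrasePosition(int[] docPositions, int[] offsets, int initialEnd) {
        int end = initialEnd;
        for (int i = 0; i < docPositions.length; i++) {
            // Negative whenever the term occurs earlier in the doc than its query offset.
            int position = docPositions[i] - offsets[i];
            if (position > end) {
                end = position;
            }
        }
        return end;
    }

    public static void main(String[] args) {
        int[] docPositions = {0, 1}; // where the two query terms occur in the doc
        int[] offsets = {1, 3};      // their offsets in the query
        // The computed positions are -1 and -2, so the true maximum is -1.
        System.out.println(maxPhrasePosition(docPositions, offsets, 0));
        System.out.println(maxPhrasePosition(docPositions, offsets, Integer.MIN_VALUE));
    }
}
```

Starting from 0, the result is 0, a value that none of the computed positions 
actually has; starting from Integer.MIN_VALUE, the result is the real maximum, 
-1.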

You would expect this bug to have surfaced sooner...

Anyhow, of the 3 tests that Robert added, this only resolves 
testInfiniteFreq1(); the other two tests still fail, investigating...

> SloppyPhraseScorer sometimes computes Infinite freq
> ---
>
> Key: LUCENE-3215
> URL: https://issues.apache.org/jira/browse/LUCENE-3215
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Doron Cohen
> Attachments: LUCENE-3215.patch, LUCENE-3215_test.patch, 
> LUCENE-3215_test.patch
>
>
> reported on user list:
> http://www.lucidimagination.com/search/document/400cbc528ed63db9/score_of_infinity_on_dismax_query


[jira] [Assigned] (LUCENE-3215) SloppyPhraseScorer sometimes computes Infinite freq

2011-09-08 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen reassigned LUCENE-3215:
---

Assignee: Doron Cohen

> SloppyPhraseScorer sometimes computes Infinite freq
> ---
>
> Key: LUCENE-3215
> URL: https://issues.apache.org/jira/browse/LUCENE-3215
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Doron Cohen
> Attachments: LUCENE-3215.patch, LUCENE-3215_test.patch, 
> LUCENE-3215_test.patch
>
>
> reported on user list:
> http://www.lucidimagination.com/search/document/400cbc528ed63db9/score_of_infinity_on_dismax_query


[jira] [Resolved] (LUCENE-3412) SloppyPhraseScorer returns non-deterministic results for queries with many repeats

2011-09-08 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-3412.
-

   Resolution: Fixed
Fix Version/s: 4.0
   3.5
Lucene Fields:   (was: [New])

Fix committed:
- r1166541 - trunk
- r1166563 - 3x

(fix not included in 3.4 RC, therefore marked as 3.5 above)

> SloppyPhraseScorer returns non-deterministic results for queries with many 
> repeats
> --
>
> Key: LUCENE-3412
> URL: https://issues.apache.org/jira/browse/LUCENE-3412
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.1, 3.2, 3.3, 4.0
>Reporter: Michael Ryan
>Assignee: Doron Cohen
> Fix For: 3.5, 4.0
>
> Attachments: LUCENE-3412.patch, LUCENE-3412.patch
>
>
> Proximity queries with many repeats (four or more, based on my testing) 
> return non-deterministic results. I run the same query multiple times with 
> the same data set and get different results.
> So far I've reproduced this with Solr 1.4.1, 3.1, 3.2, 3.3, and latest 4.0 
> trunk.
> Steps to reproduce (using the Solr example):
> 1) In solrconfig.xml, set queryResultCache size to 0.
> 2) Add some documents with text "dog dog dog" and "dog dog dog dog". 
> http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E1%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%3C/field%3E%3C/doc%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E2%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%20dog%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true
> 3) Do a "dog dog dog dog"~1 query. 
> http://localhost:8983/solr/select?q=%22dog%20dog%20dog%20dog%22~1
> 4) Repeat step 3 many times.
> Expected results: The document with id 2 should be returned.
> Actual results: The document with id 2 is always returned. The document with 
> id 1 is sometimes returned.
> Different proximity values show the same bug - "dog dog dog dog"~5, "dog dog 
> dog dog"~100, etc show the same behavior.
> So far I've traced it down to the "repeats" array in 
> SloppyPhraseScorer.initPhrasePositions() - depending on the order of the 
> elements in this array, the document may or may not match. I think the 
> HashSet may be to blame, but I'm not sure - that at least seems to be where 
> the non-determinism is coming from.


[jira] [Commented] (LUCENE-3412) SloppyPhraseScorer returns non-deterministic results for queries with many repeats

2011-09-07 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100098#comment-13100098
 ] 

Doron Cohen commented on LUCENE-3412:
-

Thanks Michael for verifying this, I'll go ahead and commit.

> SloppyPhraseScorer returns non-deterministic results for queries with many 
> repeats
> --
>
> Key: LUCENE-3412
> URL: https://issues.apache.org/jira/browse/LUCENE-3412
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.1, 3.2, 3.3, 4.0
>Reporter: Michael Ryan
>Assignee: Doron Cohen
> Attachments: LUCENE-3412.patch, LUCENE-3412.patch
>
>
> Proximity queries with many repeats (four or more, based on my testing) 
> return non-deterministic results. I run the same query multiple times with 
> the same data set and get different results.
> So far I've reproduced this with Solr 1.4.1, 3.1, 3.2, 3.3, and latest 4.0 
> trunk.
> Steps to reproduce (using the Solr example):
> 1) In solrconfig.xml, set queryResultCache size to 0.
> 2) Add some documents with text "dog dog dog" and "dog dog dog dog". 
> http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E1%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%3C/field%3E%3C/doc%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E2%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%20dog%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true
> 3) Do a "dog dog dog dog"~1 query. 
> http://localhost:8983/solr/select?q=%22dog%20dog%20dog%20dog%22~1
> 4) Repeat step 3 many times.
> Expected results: The document with id 2 should be returned.
> Actual results: The document with id 2 is always returned. The document with 
> id 1 is sometimes returned.
> Different proximity values show the same bug - "dog dog dog dog"~5, "dog dog 
> dog dog"~100, etc show the same behavior.
> So far I've traced it down to the "repeats" array in 
> SloppyPhraseScorer.initPhrasePositions() - depending on the order of the 
> elements in this array, the document may or may not match. I think the 
> HashSet may be to blame, but I'm not sure - that at least seems to be where 
> the non-determinism is coming from.


[jira] [Updated] (LUCENE-3412) SloppyPhraseScorer returns non-deterministic results for queries with many repeats

2011-09-07 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3412:


Attachment: LUCENE-3412.patch

Attached patch with fix to this bug.

The fix is rather simple: just process PPs in offset order. That is, when 
avoiding conflicts (a conflict means: more than a single query PP is landing on 
the same doc TP), make sure to handle PPs in a specific order: from first in 
query to last in query. 

This is crucial because the check for conflicts returns the PP with greater 
offset, and that one is advanced.
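
A rough sketch of the ordering idea, in plain Java rather than the patch 
itself. The PP class below is a stripped-down stand-in with only the two fields 
needed here, and sortByOffset and conflictToAdvance are hypothetical names. It 
assumes the repeating PPs start out in a HashSet, whose iteration order is not 
guaranteed, and shows that sorting by query offset makes the processing order 
deterministic, with the greater-offset PP of a conflicting pair picked for 
advancing:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; these are not the Lucene classes.
class OffsetOrderSketch {

    static final class PP {
        final int offset; // position of the term in the query
        int tpPos;        // current term position in the doc

        PP(int offset, int tpPos) {
            this.offset = offset;
            this.tpPos = tpPos;
        }
    }

    // Copies the repeating PPs into a list ordered from first in query to last.
    static List<PP> sortByOffset(Set<PP> repeats) {
        List<PP> sorted = new ArrayList<>(repeats);
        sorted.sort((PP a, PP b) -> Integer.compare(a.offset, b.offset));
        return sorted;
    }

    // Two PPs conflict when they land on the same doc term position.
    // Returns the greater-offset PP of the first conflicting pair, or null if none.
    static PP conflictToAdvance(List<PP> sortedByOffset) {
        for (int i = 0; i < sortedByOffset.size(); i++) {
            for (int j = i + 1; j < sortedByOffset.size(); j++) {
                PP earlier = sortedByOffset.get(i);
                PP later = sortedByOffset.get(j);
                if (earlier.tpPos == later.tpPos) {
                    return later; // the list is sorted, so 'later' has the greater offset
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<PP> repeats = new HashSet<>();
        repeats.add(new PP(2, 5));
        repeats.add(new PP(0, 5));
        repeats.add(new PP(1, 6));
        PP toAdvance = conflictToAdvance(sortByOffset(repeats));
        System.out.println("advance PP with offset " + toAdvance.offset);
    }
}
```

Whatever order the HashSet happens to iterate in, the sorted list is the same, 
so the same PP is chosen on every run.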

It was pretty quick to fix this, but took longer to justify the fix.

I added some explanations in the code so that next time justification would be 
faster :) and also renamed termPositionsDiffer() to termPositionsConflict() 
which more accurately describes the logic of that method.

Now I need to see if this fix is also related to LUCENE-3215.

> SloppyPhraseScorer returns non-deterministic results for queries with many 
> repeats
> --
>
> Key: LUCENE-3412
> URL: https://issues.apache.org/jira/browse/LUCENE-3412
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.1, 3.2, 3.3, 4.0
>Reporter: Michael Ryan
>Assignee: Doron Cohen
> Attachments: LUCENE-3412.patch, LUCENE-3412.patch
>
>
> Proximity queries with many repeats (four or more, based on my testing) 
> return non-deterministic results. I run the same query multiple times with 
> the same data set and get different results.
> So far I've reproduced this with Solr 1.4.1, 3.1, 3.2, 3.3, and latest 4.0 
> trunk.
> Steps to reproduce (using the Solr example):
> 1) In solrconfig.xml, set queryResultCache size to 0.
> 2) Add some documents with text "dog dog dog" and "dog dog dog dog". 
> http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E1%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%3C/field%3E%3C/doc%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E2%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%20dog%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true
> 3) Do a "dog dog dog dog"~1 query. 
> http://localhost:8983/solr/select?q=%22dog%20dog%20dog%20dog%22~1
> 4) Repeat step 3 many times.
> Expected results: The document with id 2 should be returned.
> Actual results: The document with id 2 is always returned. The document with 
> id 1 is sometimes returned.
> Different proximity values show the same bug - "dog dog dog dog"~5, "dog dog 
> dog dog"~100, etc show the same behavior.
> So far I've traced it down to the "repeats" array in 
> SloppyPhraseScorer.initPhrasePositions() - depending on the order of the 
> elements in this array, the document may or may not match. I think the 
> HashSet may be to blame, but I'm not sure - that at least seems to be where 
> the non-determinism is coming from.


[jira] [Updated] (LUCENE-3412) SloppyPhraseScorer returns non-deterministic results for queries with many repeats

2011-09-06 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3412:


Attachment: LUCENE-3412.patch

I am able to see this inconsistent behavior!

Attached patch contains a test that fails on this. The test currently prints 
the trial number, and the first loop always passes in all 30 trials (expected) 
while the second loop always fails (for me) but is inconsistent about when it 
fails. Sometimes it fails on the first iteration, other times on the 3rd, 9th, 
etc.

Quite peculiar... investigating...

> SloppyPhraseScorer returns non-deterministic results for queries with many 
> repeats
> --
>
> Key: LUCENE-3412
> URL: https://issues.apache.org/jira/browse/LUCENE-3412
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.1, 3.2, 3.3, 4.0
>Reporter: Michael Ryan
>Assignee: Doron Cohen
> Attachments: LUCENE-3412.patch
>
>
> Proximity queries with many repeats (four or more, based on my testing) 
> return non-deterministic results. I run the same query multiple times with 
> the same data set and get different results.
> So far I've reproduced this with Solr 1.4.1, 3.1, 3.2, 3.3, and latest 4.0 
> trunk.
> Steps to reproduce (using the Solr example):
> 1) In solrconfig.xml, set queryResultCache size to 0.
> 2) Add some documents with text "dog dog dog" and "dog dog dog dog". 
> http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E1%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%3C/field%3E%3C/doc%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E2%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%20dog%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true
> 3) Do a "dog dog dog dog"~1 query. 
> http://localhost:8983/solr/select?q=%22dog%20dog%20dog%20dog%22~1
> 4) Repeat step 3 many times.
> Expected results: The document with id 2 should be returned.
> Actual results: The document with id 2 is always returned. The document with 
> id 1 is sometimes returned.
> Different proximity values show the same bug - "dog dog dog dog"~5, "dog dog 
> dog dog"~100, etc show the same behavior.
> So far I've traced it down to the "repeats" array in 
> SloppyPhraseScorer.initPhrasePositions() - depending on the order of the 
> elements in this array, the document may or may not match. I think the 
> HashSet may be to blame, but I'm not sure - that at least seems to be where 
> the non-determinism is coming from.


[jira] [Assigned] (LUCENE-3412) SloppyPhraseScorer returns non-deterministic results for queries with many repeats

2011-09-06 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen reassigned LUCENE-3412:
---

Assignee: Doron Cohen

> SloppyPhraseScorer returns non-deterministic results for queries with many 
> repeats
> --
>
> Key: LUCENE-3412
> URL: https://issues.apache.org/jira/browse/LUCENE-3412
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.1, 3.2, 3.3, 4.0
>Reporter: Michael Ryan
>Assignee: Doron Cohen
>
> Proximity queries with many repeats (four or more, based on my testing) 
> return non-deterministic results. I run the same query multiple times with 
> the same data set and get different results.
> So far I've reproduced this with Solr 1.4.1, 3.1, 3.2, 3.3, and latest 4.0 
> trunk.
> Steps to reproduce (using the Solr example):
> 1) In solrconfig.xml, set queryResultCache size to 0.
> 2) Add some documents with text "dog dog dog" and "dog dog dog dog". 
> http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E1%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%3C/field%3E%3C/doc%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E2%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%20dog%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true
> 3) Do a "dog dog dog dog"~1 query. 
> http://localhost:8983/solr/select?q=%22dog%20dog%20dog%20dog%22~1
> 4) Repeat step 3 many times.
> Expected results: The document with id 2 should be returned.
> Actual results: The document with id 2 is always returned. The document with 
> id 1 is sometimes returned.
> Different proximity values show the same bug - "dog dog dog dog"~5, "dog dog 
> dog dog"~100, etc show the same behavior.
> So far I've traced it down to the "repeats" array in 
> SloppyPhraseScorer.initPhrasePositions() - depending on the order of the 
> elements in this array, the document may or may not match. I think the 
> HashSet may be to blame, but I'm not sure - that at least seems to be where 
> the non-determinism is coming from.


[jira] [Resolved] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-02 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-3390.
-

   Resolution: Fixed
Fix Version/s: 3.4
Lucene Fields: [Patch Available]  (was: [New])

Fixed in 3.x r1164794.
Thanks Gilad!

> Incorrect sort by Numeric values for documents missing the sorting field
> 
>
> Key: LUCENE-3390
> URL: https://issues.apache.org/jira/browse/LUCENE-3390
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.3
>Reporter: Gilad Barkai
>Assignee: Doron Cohen
>Priority: Minor
>  Labels: double, float, int, long, numeric, sort
> Fix For: 3.4
>
> Attachments: LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a 
> value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested 
> against Double, Float, Int & Long numeric fields ascending and descending 
> order).
> This behavior is unexpected, as zero is "comparable" to the rest of the 
> values. A better solution would either be allowing the user to define such a 
> "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any 
> value.
> Searching with MatchAllDocsQuery, with sort over that field in descending 
> order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 
> 2nd result - which seems as a bug?


[jira] [Assigned] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-01 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen reassigned LUCENE-3390:
---

Assignee: Doron Cohen

> Incorrect sort by Numeric values for documents missing the sorting field
> 
>
> Key: LUCENE-3390
> URL: https://issues.apache.org/jira/browse/LUCENE-3390
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.3
>Reporter: Gilad Barkai
>Assignee: Doron Cohen
>Priority: Minor
>  Labels: double, float, int, long, numeric, sort
> Attachments: LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a 
> value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested 
> against Double, Float, Int & Long numeric fields ascending and descending 
> order).
> This behavior is unexpected, as zero is "comparable" to the rest of the 
> values. A better solution would either be allowing the user to define such a 
> "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any 
> value.
> Searching with MatchAllDocsQuery, with sort over that field in descending 
> order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 
> 2nd result - which seems as a bug?


[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-01 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3390:


Attachment: LUCENE-3390.patch

Attached patch fixing this bug. 
TestSort was enhanced to test the new setMissingValue() method - actually 
merging the test from trunk r1002460 (LUCENE-2671).

All search tests passed (running the rest now...)

> Incorrect sort by Numeric values for documents missing the sorting field
> 
>
> Key: LUCENE-3390
> URL: https://issues.apache.org/jira/browse/LUCENE-3390
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.3
>Reporter: Gilad Barkai
>Priority: Minor
>  Labels: double, float, int, long, numeric, sort
> Attachments: LUCENE-3390.patch, SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a 
> value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested 
> against Double, Float, Int & Long numeric fields ascending and descending 
> order).
> This behavior is unexpected, as zero is "comparable" to the rest of the 
> values. A better solution would either be allowing the user to define such a 
> "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any 
> value.
> Searching with MatchAllDocsQuery, with sort over that field in descending 
> order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 
> 2nd result - which seems as a bug?


[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field

2011-09-01 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095409#comment-13095409
 ] 

Doron Cohen commented on LUCENE-3390:
-

I think it may be useful to solve this also in 3x - without the 
cached-array-creators of the trunk, but with a similar concept - i.e. an 
additional cache "type" will record which docs are missing a value for a given 
field, and will allow using the default value that apps assign by calling 
setMissingValue(), as in trunk. Gilad and I looked at this, will post a patch 
shortly for review...
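
The effect of a configurable missing value can be sketched without any Lucene 
classes. In the snippet below, MissingValueSketch and sortDescending are 
made-up names, and a null entry stands for a document that has no value for the 
field; the real fix goes through the field cache and setMissingValue(), which 
this does not attempt to reproduce. The input is the scenario from the issue 
description (3.5d, -10d, no value):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Lucene sorting code.
class MissingValueSketch {

    // Value used for sorting: the doc's own value, or missingValue if it has none.
    static double valueOf(Double[] values, int doc, double missingValue) {
        return values[doc] == null ? missingValue : values[doc];
    }

    // Returns doc ids ordered by value, descending.
    static List<Integer> sortDescending(Double[] values, double missingValue) {
        List<Integer> docIds = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            docIds.add(i);
        }
        docIds.sort((Integer a, Integer b) -> Double.compare(
                valueOf(values, b, missingValue),
                valueOf(values, a, missingValue)));
        return docIds;
    }

    public static void main(String[] args) {
        Double[] values = {3.5, -10.0, null}; // doc 2 has no value
        // Missing treated as 0: the doc without a value lands in the middle.
        System.out.println(sortDescending(values, 0.0));
        // Missing treated as the smallest possible value: it lands last.
        System.out.println(sortDescending(values, Double.NEGATIVE_INFINITY));
    }
}
```

With 0 as the implicit default the order is 0, 2, 1, as reported; with an 
app-chosen default of negative infinity the doc without a value sorts last in 
descending order.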

> Incorrect sort by Numeric values for documents missing the sorting field
> 
>
> Key: LUCENE-3390
> URL: https://issues.apache.org/jira/browse/LUCENE-3390
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.3
>Reporter: Gilad Barkai
>Priority: Minor
>  Labels: double, float, int, long, numeric, sort
> Attachments: SortByDouble.java
>
>
> While sorting results over a numeric field, documents which do not contain a 
> value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested 
> against Double, Float, Int & Long numeric fields ascending and descending 
> order).
> This behavior is unexpected, as zero is "comparable" to the rest of the 
> values. A better solution would either be allowing the user to define such a 
> "non-value" default, or always bring those document results as the last ones.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any 
> value.
> Searching with MatchAllDocsQuery, with sort over that field in descending 
> order yields the docid results of 0, 2, 1.
> Asking for the top 2 documents brings the document without any value as the 
> 2nd result - which seems as a bug?


[jira] [Resolved] (LUCENE-3142) benchmark/stats package is obsolete and unused - remove it

2011-06-30 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-3142.
-

Resolution: Fixed

r1141465: trunk
r1141468: 3x

> benchmark/stats package is obsolete and unused - remove it
> --
>
> Key: LUCENE-3142
> URL: https://issues.apache.org/jira/browse/LUCENE-3142
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
>
> This seems like a leftover from the original benchmark implementation and can 
> thus be removed.


[jira] [Commented] (LUCENE-3153) Adding field w/ norms should fail if same field was added w/o norms already

2011-05-31 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041461#comment-13041461
 ] 

Doron Cohen commented on LUCENE-3153:
-

I was not clear enough.

I meant that when deciding on consistency of requested NORMS state, if relying 
only on committed data, then the handling of add/update requests is in a best 
effort manner, while the handling at commit is complete.

So, for this example:

* Index does not contain field F
* doc1 is added with F set to NO NORMS
* doc2 is added with F set to WITH NORMS

I was not sure about the ability to tell that F in doc2 is inconsistent, 
because of relying on committed data, and, perhaps, especially with DWPT.

At commit, it is definitely possible to check this.

Similarly, this scenario has the same problem:

* Index contains (committed) field F WITH NORMS
* doc1 is added with F set to NO NORMS
* doc2 is added with F set to WITH NORMS

Again, F in doc2, while consistent with F as committed in the index, is 
inconsistent with previously added F in doc1.

In this situation, throwing the exception due to inconsistencies might have to 
be late in some scenarios (at commit) and hence unacceptable IMO. At the least, 
such a behavior should be specifically requested by application, e.g. by 
setting a STRICT_NORMS mode or something like that in iwcfg. 

I am not convinced going that far is justified.
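
To make the distinction between the add-time check and the commit-time check 
concrete, here is a small sketch in plain Java. Everything in it is 
hypothetical: NormsCheckSketch, addField and commit are illustration names, not 
IndexWriter code, and for simplicity it treats any norms mismatch as an 
inconsistency (ignoring that disabling norms later is a supported case). The 
add-time check only sees committed state, while the commit-time check also sees 
what was added in between:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not how IndexWriter tracks norms.
class NormsCheckSketch {

    // field -> "has norms", as of the last commit
    private final Map<String, Boolean> committed = new HashMap<>();
    // field -> "has norms", as first requested since the last commit
    private final Map<String, Boolean> pending = new HashMap<>();
    private boolean conflictSeen = false;

    // Best-effort add-time check: the return value relies on committed data only.
    boolean addField(String field, boolean withNorms) {
        Boolean c = committed.get(field);
        Boolean p = pending.get(field);
        boolean okVsCommitted = (c == null) || (c == withNorms);
        boolean okVsPending = (p == null) || (p == withNorms);
        if (p == null) {
            pending.put(field, withNorms);
        }
        if (!okVsCommitted || !okVsPending) {
            conflictSeen = true; // remembered for the commit-time check
        }
        return okVsCommitted;
    }

    // Complete check: reports any inconsistency seen since the last commit.
    boolean commit() {
        boolean ok = !conflictSeen;
        if (ok) {
            committed.putAll(pending);
        }
        pending.clear();
        conflictSeen = false;
        return ok;
    }

    public static void main(String[] args) {
        NormsCheckSketch index = new NormsCheckSketch(); // index does not contain F
        System.out.println(index.addField("F", false));  // doc1: F with NO NORMS
        System.out.println(index.addField("F", true));   // doc2: F WITH NORMS
        System.out.println(index.commit());              // inconsistency found only here
    }
}
```

In the first scenario above both adds pass the add-time check, and the 
inconsistency is reported only at commit, which is the late failure described 
in this comment.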

> Adding field w/ norms should fail if same field was added w/o norms already
> ---
>
> Key: LUCENE-3153
> URL: https://issues.apache.org/jira/browse/LUCENE-3153
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Reporter: Shai Erera
> Fix For: 4.0
>
>
> A spinoff from LUCENE-3146. Consider the following two scenarios, according 
> to how 4.0 currently works:
> * Field "a" is added w/ norms. Sometime later field "a" is added to a 
> document w/o norms -- norms are disabled for field "a", for all docs.
> * Field "a" is added w/o norms - norms are disabled for field "a". Sometime 
> later field "a" is added to a document w/ norms -- app thinks norms were 
> added, while in fact they are dropped.
> This is a bug and case #2 should fail on add/updateDocument - app should know 
> norms were not added. While case #1 isn't great either, it's the only way an 
> app can choose to disable norms for field "a", after instances of it already 
> contain norms, so we should support that scenario.
> In order to detect that early, we should track norms info in .fnx, as Mike 
> describes at LUCENE-3146. Since this changes the index format, we should also 
> update the "file format" page after we do it.
> Not sure what's the deal w/ 3.x indexes that are read by 4.0 code. Initially 
> they won't have .fnx file, so no central norms information exist to detect 
> the cases I've described above. Over time, as segments are merged, .fnx will 
> include information from more and more segments, but there's always a chance 
> few segments will still contain the norms for field "a". I'm not very 
> familiar w/ that part of the code, but I think that:
> * If .fnx says "no norms for field a", then we ignore any norms information 
> that may or may not exist in segments.
> * If .fnx says "norms for field a", then we need to make up some norms values 
> for (old) segments w/ no norms? We need to make up values during segment 
> merge and search?


[jira] [Commented] (LUCENE-3164) consolidate various CHANGES.txt into two files: lucene and solr

2011-05-30 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041439#comment-13041439
 ] 

Doron Cohen commented on LUCENE-3164:
-

Agreed, 3 for now, and then we'll see...

> consolidate various CHANGES.txt into two files: lucene and solr
> ---
>
> Key: LUCENE-3164
> URL: https://issues.apache.org/jira/browse/LUCENE-3164
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Robert Muir
>
> There are CHANGES.txt files everywhere: lucene/contrib has a CHANGES.txt, the 
> benchmark package has its own CHANGES.txt, in trunk all the modules have 
> their own CHANGES.txt, and each solr contrib has its own CHANGES.txt
> I propose we merge these files into a CHANGES.txt for each "product" we make. 
> so that means lucene/CHANGES.txt and solr/CHANGES.txt


[jira] [Commented] (LUCENE-3164) consolidate various CHANGES.txt into two files: lucene and solr

2011-05-30 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041425#comment-13041425
 ] 

Doron Cohen commented on LUCENE-3164:
-

I agree that with frequent releases this is less of an issue.

What are your thoughts about trunk in this regard - would you like there 3 
changes files, i.e. keep one for modules?

> consolidate various CHANGES.txt into two files: lucene and solr
> ---
>
> Key: LUCENE-3164
> URL: https://issues.apache.org/jira/browse/LUCENE-3164
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Robert Muir
>
> There are CHANGES.txt files everywhere: lucene/contrib has a CHANGES.txt, the 
> benchmark package has its own CHANGES.txt, in trunk all the modules have 
> their own CHANGES.txt, and each solr contrib has its own CHANGES.txt
> I propose we merge these files into a CHANGES.txt for each "product" we make. 
> so that means lucene/CHANGES.txt and solr/CHANGES.txt


[jira] [Commented] (LUCENE-3161) consider warnings from the source compilation

2011-05-30 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041418#comment-13041418
 ] 

Doron Cohen commented on LUCENE-3161:
-

bq. And, I don't think we should in general hide any warnings, even to users 
for the reasons i mentioned above.

+1 for not hiding!

> consider warnings from the source compilation
> -
>
> Key: LUCENE-3161
> URL: https://issues.apache.org/jira/browse/LUCENE-3161
> Project: Lucene - Java
>  Issue Type: Task
>  Components: general/build
>Reporter: Robert Muir
>  Labels: maybe32blocker
> Fix For: 3.3, 4.0
>
>
> as Doron mentioned in his review: At compiling there are various warning 
> printed, I think it would be more assuring for downloaders if the build runs 
> without warning. These warnings are not a stopper.
> we could conditionalize these warnings so that they don't "display" when 
> compiling from actual releases, but I have to wonder if we should hide 
> these... being open source I think we should display all our warts, maybe 
> some contributor sees these warnings and decides they want to submit a patch 
> to fix some of them.


[jira] [Commented] (LUCENE-3164) consolidate various CHANGES.txt into two files: lucene and solr

2011-05-30 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041409#comment-13041409
 ] 

Doron Cohen commented on LUCENE-3164:
-

Specifically, current files are:

lucene:
- CHANGES.txt
- contrib/benchmark/CHANGES.txt
- contrib/CHANGES.txt
- contrib/grouping/CHANGES.txt

solr
- CHANGES.txt
- client/ruby/flare/vendor/plugins/engines/CHANGELOG (?)
- client/ruby/solr-ruby/CHANGES.yml (?)
- contrib/analysis-extras/CHANGES.txt
- contrib/clustering/CHANGES.txt
- contrib/dataimporthandler/CHANGES.txt
- contrib/extraction/CHANGES.txt
- contrib/uima/CHANGES.txt

In favor of this, all changes would become more easily readable for users in 
the HTML format.

There is a risk that changes in contribs/modules would clutter the core 
changes. For example, today, even small changes in contrib/benchmark are listed 
in its changes file. But once this becomes part of the global changes file, I am 
not sure that every benchmark change would be appropriate to list there.

> consolidate various CHANGES.txt into two files: lucene and solr
> ---
>
> Key: LUCENE-3164
> URL: https://issues.apache.org/jira/browse/LUCENE-3164
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Robert Muir
>
> There are CHANGES.txt files everywhere: lucene/contrib has a CHANGES.txt, the 
> benchmark package has its own CHANGES.txt, in trunk all the modules have 
> their own CHANGES.txt, and each solr contrib has its own CHANGES.txt
> I propose we merge these files into a CHANGES.txt for each "product" we make. 
> so that means lucene/CHANGES.txt and solr/CHANGES.txt


[jira] [Commented] (LUCENE-3153) Adding field w/ norms should fail if same field was added w/o norms already

2011-05-30 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041403#comment-13041403
 ] 

Doron Cohen commented on LUCENE-3153:
-

Can this be checked before any commit (/flush)?

Assume 10 docs were added without norms to a fresh index, now, without a commit 
or even a flush, a document is added with norms. Is the info required for 
checking the "configuration" for that field available at that time?

If it is not, this is still just a best effort check.
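
To make the question concrete, here is a minimal sketch of what an in-memory 
check could look like, assuming the writer keeps per-field norms settings in 
memory as documents are added. This is not Lucene's implementation; the class 
FieldNormsTracker and its method are hypothetical names used only for 
illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: remembers, per field name,
// whether norms have been omitted by any document seen so far.
class FieldNormsTracker {
    private final Map<String, Boolean> omitNormsByField = new HashMap<>();

    // Called for every field instance as a document is added, before any flush.
    void onFieldAdded(String field, boolean omitNorms) {
        Boolean previous = omitNormsByField.get(field);
        boolean alreadyOmitted = previous != null && previous;
        if (alreadyOmitted && !omitNorms) {
            // Field was added w/o norms earlier, now arrives w/ norms: reject.
            throw new IllegalArgumentException(
                "field \"" + field + "\" was already added without norms");
        }
        // Once norms are omitted for a field they stay omitted.
        omitNormsByField.put(field, alreadyOmitted || omitNorms);
    }

    public static void main(String[] args) {
        FieldNormsTracker tracker = new FieldNormsTracker();
        tracker.onFieldAdded("a", true);       // added w/o norms
        try {
            tracker.onFieldAdded("a", false);  // later added w/ norms
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

If such state is only available per flushed segment rather than in memory, the 
check cannot see the 10 unflushed docs, which is the best-effort concern above.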

> Adding field w/ norms should fail if same field was added w/o norms already
> ---
>
> Key: LUCENE-3153
> URL: https://issues.apache.org/jira/browse/LUCENE-3153
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Reporter: Shai Erera
> Fix For: 4.0
>
>
> A spinoff from LUCENE-3146. Consider the following two scenarios, according 
> to how 4.0 currently works:
> * Field "a" is added w/ norms. Sometime later field "a" is added to a 
> document w/o norms -- norms are disabled for field "a", for all docs.
> * Field "a" is added w/o norms - norms are disabled for field "a". Sometime 
> later field "a" is added to a document w/ norms -- app thinks norms were 
> added, while in fact they are dropped.
> This is a bug and case #2 should fail on add/updateDocument - app should know 
> norms were not added. While case #1 isn't great either, it's the only way an 
> app can choose to disable norms for field "a", after instances of it already 
> contain norms, so we should support that scenario.
> In order to detect that early, we should track norms info in .fnx, as Mike 
> describes at LUCENE-3146. Since this changes the index format, we should also 
> update the "file format" page after we do it.
> Not sure what's the deal w/ 3.x indexes that are read by 4.0 code. Initially 
> they won't have .fnx file, so no central norms information exist to detect 
> the cases I've described above. Over time, as segments are merged, .fnx will 
> include information from more and more segments, but there's always a chance 
> few segments will still contain the norms for field "a". I'm not very 
> familiar w/ that part of the code, but I think that:
> * If .fnx says "no norms for field a", the we ignore any norms information 
> that may or may not exist in segments.
> * If .fnx says "norms for field a", then we need to make up some norms values 
> for (old) segments w/ no norms? We need to make up values during segment 
> merge and search?


[jira] [Commented] (LUCENE-3142) benchmark/stats package is obsolete and unused - remove it

2011-05-25 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039073#comment-13039073
 ] 

Doron Cohen commented on LUCENE-3142:
-

Just to make sure this is clear, the package in question is: 
o.a.l.benchmark.stats

> benchmark/stats package is obsolete and unused - remove it
> --
>
> Key: LUCENE-3142
> URL: https://issues.apache.org/jira/browse/LUCENE-3142
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
>
> This seems like a leftover from the original benchmark implementation and can 
> thus be removed.


[jira] [Commented] (LUCENE-3142) benchmark/stats package is obsolete and unused - remove it

2011-05-25 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039066#comment-13039066
 ] 

Doron Cohen commented on LUCENE-3142:
-

Does anyone see why this should remain? (I will wait ~2 days before actually 
removing it)

> benchmark/stats package is obsolete and unused - remove it
> --
>
> Key: LUCENE-3142
> URL: https://issues.apache.org/jira/browse/LUCENE-3142
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
>
> This seems like a leftover from the original benchmark implementation and can 
> thus be removed.


[jira] [Created] (LUCENE-3142) benchmark/stats package is obsolete and unused - remove it

2011-05-25 Thread Doron Cohen (JIRA)

benchmark/stats package is obsolete and unused - remove it
--

 Key: LUCENE-3142
 URL: https://issues.apache.org/jira/browse/LUCENE-3142
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/benchmark
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor


This seems like a leftover from the original benchmark implementation and can 
thus be removed.



[jira] [Resolved] (LUCENE-3137) Benchmark's ExtractReuters creates its temp dir wrongly if provided out-dir param ends by slash

2011-05-25 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-3137.
-

   Resolution: Fixed
Fix Version/s: 4.0
   3.2

Trunk: r1127436
3x: r1127466

> Benchmark's ExtractReuters creates its temp dir wrongly if provided out-dir 
> param ends by slash
> ---
>
> Key: LUCENE-3137
> URL: https://issues.apache.org/jira/browse/LUCENE-3137
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/benchmark
>Affects Versions: 3.2, 4.0
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3137.patch
>
>
> See LUCENE-929 for context.
> As result, it might fail to create the temp dir at all.


[jira] [Resolved] (LUCENE-929) contrib/benchmark build doesn't handle checking if content is properly extracted

2011-05-25 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-929.


Resolution: Fixed

bq. Doron, that's fine to open a new issue and close this one, but it was this 
issue's fix that introduced the bug.

Thanks for clarifying!
Okay, so I will fix this in LUCENE-3137 (it makes sense to me at this time 
since this one was resolved 4 months ago and fixed something else) and resolve 
this one.

> contrib/benchmark build doesn't handle checking if content is properly 
> extracted
> 
>
> Key: LUCENE-929
> URL: https://issues.apache.org/jira/browse/LUCENE-929
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 4.0, 3.1
>
>
> The contrib/benchmark build does not properly handle checking to see if the 
> content (such as Reuters coll.) is properly extracted.  It only checks to see 
> if the directory exists.  Thus, it is possible that the directory gets 
> created and the extraction fails.  Then, the next time it is run, it skips 
> the extraction part and tries to continue on running the benchmark.
> The workaround is to manually delete the extraction directory.


[jira] [Commented] (LUCENE-929) contrib/benchmark build doesn't handle checking if content is properly extracted

2011-05-24 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038502#comment-13038502
 ] 

Doron Cohen commented on LUCENE-929:


There's now a simple patch for this in LUCENE-3137. 
I think this one can be closed?

> contrib/benchmark build doesn't handle checking if content is properly 
> extracted
> 
>
> Key: LUCENE-929
> URL: https://issues.apache.org/jira/browse/LUCENE-929
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 3.1, 4.0
>
>
> The contrib/benchmark build does not properly handle checking to see if the 
> content (such as Reuters coll.) is properly extracted.  It only checks to see 
> if the directory exists.  Thus, it is possible that the directory gets 
> created and the extraction fails.  Then, the next time it is run, it skips 
> the extraction part and tries to continue on running the benchmark.
> The workaround is to manually delete the extraction directory.


[jira] [Commented] (LUCENE-929) contrib/benchmark build doesn't handle checking if content is properly extracted

2011-05-24 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038498#comment-13038498
 ] 

Doron Cohen commented on LUCENE-929:


bq. Note, this fix this doesn't work if the output dir has a trailing slash

I think this is a separate issue - I mean not handling a trailing slash. 
Created LUCENE-3137 for handling this.

> contrib/benchmark build doesn't handle checking if content is properly 
> extracted
> 
>
> Key: LUCENE-929
> URL: https://issues.apache.org/jira/browse/LUCENE-929
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/benchmark
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 3.1, 4.0
>
>
> The contrib/benchmark build does not properly handle checking to see if the 
> content (such as Reuters coll.) is properly extracted.  It only checks to see 
> if the directory exists.  Thus, it is possible that the directory gets 
> created and the extraction fails.  Then, the next time it is run, it skips 
> the extraction part and tries to continue on running the benchmark.
> The workaround is to manually delete the extraction directory.


[jira] [Updated] (LUCENE-3137) Benchmark's ExtractReuters creates its temp dir wrongly if provided out-dir param ends by slash

2011-05-24 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3137:


Attachment: LUCENE-3137.patch

Simple patch solving this slash problem.
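
For readers without the patch at hand, a minimal sketch of this kind of 
failure, assuming the temp dir name is built by appending a suffix to the 
out-dir string. The "-tmp" suffix, the class name and both method names are 
hypothetical and not taken from LUCENE-3137.patch.

```java
import java.io.File;

// Hypothetical illustration only; names are not from ExtractReuters.
class TempDirName {
    // Naive variant: append the suffix to the raw argument. With a trailing
    // slash the result names a child of the out-dir instead of a sibling.
    static String naiveTempDir(String outDir) {
        return outDir + "-tmp";
    }

    // Going through java.io.File first normalizes the path, which drops a
    // trailing separator before the suffix is appended.
    static String normalizedTempDir(String outDir) {
        return new File(outDir).getPath() + "-tmp";
    }

    public static void main(String[] args) {
        System.out.println(naiveTempDir("work/reuters-out/"));
        System.out.println(normalizedTempDir("work/reuters-out/"));
    }
}
```

With the naive variant the temp dir lands inside the (possibly not yet 
existing) out-dir, which is one way creating it can fail.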

> Benchmark's ExtractReuters creates its temp dir wrongly if provided out-dir 
> param ends by slash
> ---
>
> Key: LUCENE-3137
> URL: https://issues.apache.org/jira/browse/LUCENE-3137
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/benchmark
>Affects Versions: 3.2, 4.0
>Reporter: Doron Cohen
>Assignee: Doron Cohen
>Priority: Minor
> Attachments: LUCENE-3137.patch
>
>
> See LUCENE-929 for context.
> As result, it might fail to create the temp dir at all.


[jira] [Created] (LUCENE-3137) Benchmark's ExtractReuters creates its temp dir wrongly if provided out-dir param ends by slash

2011-05-24 Thread Doron Cohen (JIRA)

Benchmark's ExtractReuters creates its temp dir wrongly if provided out-dir 
param ends by slash
---

 Key: LUCENE-3137
 URL: https://issues.apache.org/jira/browse/LUCENE-3137
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/benchmark
Affects Versions: 3.2, 4.0
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor


See LUCENE-929 for context.
As result, it might fail to create the temp dir at all.


[jira] [Resolved] (SOLR-2500) TestSolrProperties sometimes fails with "no such core: core0"

2011-05-22 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved SOLR-2500.
---

   Resolution: Fixed
Fix Version/s: 4.0
   3.2

fixed in trunk: r1125932.
merged to 3x: r1125942.

> TestSolrProperties sometimes fails with "no such core: core0"
> -
>
> Key: SOLR-2500
> URL: https://issues.apache.org/jira/browse/SOLR-2500
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Robert Muir
>Assignee: Doron Cohen
> Fix For: 3.2, 4.0
>
> Attachments: SOLR-2500.patch, SOLR-2500.patch, SOLR-2500.patch, 
> solr-after-1st-run.xml, solr-clean.xml
>
>
> [junit] Testsuite: 
> org.apache.solr.client.solrj.embedded.TestSolrProperties
> [junit] Testcase: 
> testProperties(org.apache.solr.client.solrj.embedded.TestSolrProperties): 
> Caused an ERROR
> [junit] No such core: core0
> [junit] org.apache.solr.common.SolrException: No such core: core0
> [junit] at 
> org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118)
> [junit] at 
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> [junit] at 
> org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128)
> [junit] at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260)
> [junit] at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189)


[jira] [Assigned] (SOLR-2500) TestSolrProperties sometimes fails with "no such core: core0"

2011-05-22 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen reassigned SOLR-2500:
-

Assignee: Doron Cohen

> TestSolrProperties sometimes fails with "no such core: core0"
> -
>
> Key: SOLR-2500
> URL: https://issues.apache.org/jira/browse/SOLR-2500
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Robert Muir
>Assignee: Doron Cohen
> Attachments: SOLR-2500.patch, SOLR-2500.patch, SOLR-2500.patch, 
> solr-after-1st-run.xml, solr-clean.xml
>
>
> [junit] Testsuite: 
> org.apache.solr.client.solrj.embedded.TestSolrProperties
> [junit] Testcase: 
> testProperties(org.apache.solr.client.solrj.embedded.TestSolrProperties): 
> Caused an ERROR
> [junit] No such core: core0
> [junit] org.apache.solr.common.SolrException: No such core: core0
> [junit] at 
> org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118)
> [junit] at 
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> [junit] at 
> org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128)
> [junit] at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260)
> [junit] at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189)


[jira] [Updated] (LUCENE-3120) span query matches too many docs when two query terms are the same unless inOrder=true

2011-05-19 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3120:


Attachment: LUCENE-3120.patch

Updated patch with fixed test to not depend on analysis module.

> span query matches too many docs when two query terms are the same unless 
> inOrder=true
> --
>
> Key: LUCENE-3120
> URL: https://issues.apache.org/jira/browse/LUCENE-3120
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Reporter: Doron Cohen
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3120.patch, LUCENE-3120.patch
>
>
> spinoff of user list discussion - [SpanNearQuery - inOrder 
> parameter|http://markmail.org/message/i4cstlwgjmlcfwlc].
> With 3 documents:
> *  "a b x c d"
> *  "a b b d"
> *  "a b x b y d"
> Here are a few queries (the number in parenthesis indicates expected #hits):
> These ones work *as expected*:
> * (1)  in-order, slop=0, "b", "x", "b"
> * (1)  in-order, slop=0, "b", "b"
> * (2)  in-order, slop=1, "b", "b"
> These ones match *too many* hits:
> * (1)  any-order, slop=0, "b", "x", "b"
> * (1)  any-order, slop=1, "b", "x", "b"
> * (1)  any-order, slop=2, "b", "x", "b"
> * (1)  any-order, slop=3, "b", "x", "b"
> These ones match *too many* hits as well:
> * (1)  any-order, slop=0, "b", "b"
> * (2)  any-order, slop=1, "b", "b"
> Each of the above passes when using a phrase query (applying the slop, no 
> in-order indication in phrase query).
> This seems related to a known overlapping spans issue - [non-overlapping Span 
> queries|http://markmail.org/message/7jxn5eysjagjwlon] - as indicated by Hoss, 
> so we might decide to close this bug after all, but I would like to at least 
> have the junit that exposes the behavior in JIRA.


[jira] [Updated] (SOLR-2500) TestSolrCoreProperties sometimes fails with "no such core: core0"

2011-05-19 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated SOLR-2500:
--

Attachment: SOLR-2500.patch

Attached patch, test passes now in both IDE and cmd line:

* at setup() copies solr.xml to a private file. 

* use that private file as its solr.solr.home.

* erase that file at tearDown(), though not erasing it
  should not affect further (re)tests.

* fixes the deletion at tearDown() to look at 
  solr.solr.home rather than solr.home.
  (I think this was a bug on top of a bug in this test - it used the
  original file at s.s.h but for cleanup 
  attempted to remove files from just s.h.)

This debugging took place in pure darkness, better review...
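
The general pattern behind the steps above can be sketched as follows. This is 
a hedged illustration using only java.nio.file, not the code in 
SOLR-2500.patch; the class PrivateHome and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of "work on a private copy of the config".
class PrivateHome {
    // setUp step: copy the shared config into a fresh private directory
    // and return the path of the private copy.
    static Path createPrivateCopy(Path sharedConfig) {
        try {
            Path home = Files.createTempDirectory("private-home");
            return Files.copy(sharedConfig, home.resolve(sharedConfig.getFileName()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // tearDown step: remove the private copy and its directory, failing
    // loudly instead of ignoring a deletion that did not happen.
    static void removePrivateCopy(Path privateConfig) {
        try {
            Files.delete(privateConfig);
            Files.delete(privateConfig.getParent());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Simulates a test that persists changes, then checks the shared file.
    static boolean sharedConfigSurvivesRun() {
        try {
            Path shared = Files.createTempFile("solr", ".xml");
            Files.write(shared, "<solr/>".getBytes(StandardCharsets.UTF_8));
            Path copy = createPrivateCopy(shared);
            // The test writes only to its own copy.
            Files.write(copy, "<solr persisted='true'/>".getBytes(StandardCharsets.UTF_8));
            removePrivateCopy(copy);
            String after = new String(Files.readAllBytes(shared), StandardCharsets.UTF_8);
            boolean cleaned = Files.notExists(copy.getParent());
            Files.delete(shared);
            return after.equals("<solr/>") && cleaned;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sharedConfigSurvivesRun());
    }
}
```

Because each run starts from a clean copy, a second run does not see whatever 
the first run persisted, which is what the differing solr-clean.xml and 
solr-after-1st-run.xml attachments suggest was happening.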

> TestSolrCoreProperties sometimes fails with "no such core: core0"
> -
>
> Key: SOLR-2500
> URL: https://issues.apache.org/jira/browse/SOLR-2500
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Robert Muir
> Attachments: SOLR-2500.patch, SOLR-2500.patch, SOLR-2500.patch, 
> solr-after-1st-run.xml, solr-clean.xml
>
>
> [junit] Testsuite: 
> org.apache.solr.client.solrj.embedded.TestSolrProperties
> [junit] Testcase: 
> testProperties(org.apache.solr.client.solrj.embedded.TestSolrProperties): 
> Caused an ERROR
> [junit] No such core: core0
> [junit] org.apache.solr.common.SolrException: No such core: core0
> [junit] at 
> org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118)
> [junit] at 
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> [junit] at 
> org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128)
> [junit] at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260)
> [junit] at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189)


[jira] [Resolved] (LUCENE-3123) TestIndexWriter.testBackgroundOptimize fails with too many open files

2011-05-19 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-3123.
-

   Resolution: Fixed
Fix Version/s: 4.0
   3.2

Fixed by Mike, thanks Mike!

> TestIndexWriter.testBackgroundOptimize fails with too many open files
> -
>
> Key: LUCENE-3123
> URL: https://issues.apache.org/jira/browse/LUCENE-3123
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
> Environment: Linux 2.6.32-31-generic i386/Sun Microsystems Inc. 
> 1.6.0_20 (32-bit)/cpus=1,threads=2
>Reporter: Doron Cohen
> Fix For: 3.2, 4.0
>
>
> Recreate with this line:
> ant test -Dtestcase=TestIndexWriter -Dtestmethod=testBackgroundOptimize 
> -Dtests.seed=-3981504507637360146:51354004663342240
> Might be related to LUCENE-2873 ?


[jira] [Commented] (LUCENE-3123) TestIndexWriter.testBackgroundOptimize fails with too many open files

2011-05-19 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036331#comment-13036331
 ] 

Doron Cohen commented on LUCENE-3123:
-

In fact, in 3x this is not reproducible with the same seed (expected, as Robert 
once explained), and I was not able to reproduce it with no seed either. I 
tried with -Dtest.iter=100 as well (though I am not sure whether a new seed is 
created in each iteration; need to verify this...)
Anyhow, in 3x the test also passes after svn up with this fix.
So I think this can be resolved...

> TestIndexWriter.testBackgroundOptimize fails with too many open files
> -
>
> Key: LUCENE-3123
> URL: https://issues.apache.org/jira/browse/LUCENE-3123
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
> Environment: Linux 2.6.32-31-generic i386/Sun Microsystems Inc. 
> 1.6.0_20 (32-bit)/cpus=1,threads=2
>Reporter: Doron Cohen
>
> Recreate with this line:
> ant test -Dtestcase=TestIndexWriter -Dtestmethod=testBackgroundOptimize 
> -Dtests.seed=-3981504507637360146:51354004663342240
> Might be related to LUCENE-2873 ?


[jira] [Commented] (LUCENE-3123) TestIndexWriter.testBackgroundOptimize fails with too many open files

2011-05-19 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036322#comment-13036322
 ] 

Doron Cohen commented on LUCENE-3123:
-

Yes, thanks, now it passes (trunk) - with this seed, as well as quite a few 
times without specifying a seed. 
I'll now verify on 3x.

> TestIndexWriter.testBackgroundOptimize fails with too many open files
> -
>
> Key: LUCENE-3123
> URL: https://issues.apache.org/jira/browse/LUCENE-3123
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
> Environment: Linux 2.6.32-31-generic i386/Sun Microsystems Inc. 
> 1.6.0_20 (32-bit)/cpus=1,threads=2
>Reporter: Doron Cohen
>
> Recreate with this line:
> ant test -Dtestcase=TestIndexWriter -Dtestmethod=testBackgroundOptimize 
> -Dtests.seed=-3981504507637360146:51354004663342240
> Might be related to LUCENE-2873 ?


[jira] [Commented] (SOLR-2500) TestSolrCoreProperties sometimes fails with "no such core: core0"

2011-05-19 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036300#comment-13036300
 ] 

Doron Cohen commented on SOLR-2500:
---

Oops, just noticed that all this time I was testing TestSolrProperties and not 
TestSolrCoreProperties, and because the error message was the same as in the 
issue description, *"No such core: core0"*, I was sure that this was the same 
test... Now this is confusing...

Hmmm.. the original exception reported above is 
[junit] at 
org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128)

So perhaps I was working on the correct bug after all and just the JIRA issue 
title is inaccurate?
Or I need to call it a day... :)

Anyhow, TestSolrProperties consistently behaves as I described here, while 
TestSolrCoreProperties consistently passes (when run in standalone mode).

> TestSolrCoreProperties sometimes fails with "no such core: core0"
> -
>
> Key: SOLR-2500
> URL: https://issues.apache.org/jira/browse/SOLR-2500
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Robert Muir
> Attachments: SOLR-2500.patch, solr-after-1st-run.xml, solr-clean.xml
>
>
> [junit] Testsuite: 
> org.apache.solr.client.solrj.embedded.TestSolrProperties
> [junit] Testcase: 
> testProperties(org.apache.solr.client.solrj.embedded.TestSolrProperties): 
> Caused an ERROR
> [junit] No such core: core0
> [junit] org.apache.solr.common.SolrException: No such core: core0
> [junit] at 
> org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118)
> [junit] at 
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> [junit] at 
> org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128)
> [junit] at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260)
> [junit] at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189)


[jira] [Commented] (SOLR-2500) TestSolrCoreProperties sometimes fails with "no such core: core0"

2011-05-19 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036288#comment-13036288
 ] 

Doron Cohen commented on SOLR-2500:
---

FWIW, even the first clean run fails if the test's tearDown() is modified like 
this:

{noformat}
-persistedFile.delete();
+assertTrue("could not delete "+persistedFile, persistedFile.delete());
{noformat}

For some reason it fails to remove that file - on both Linux and Windows.
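
A minimal sketch of how the reason for the failed delete could be surfaced, assuming a JDK with java.nio.file (Java 7 or later) is available. File.delete() only returns false, whereas Files.delete() throws an exception that names the cause. DeleteDiagnostics and tryDelete are hypothetical names used for illustration; they are not part of the Solr test.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the Solr test code.
final class DeleteDiagnostics {
    /**
     * Tries to delete the file. Returns null on success, otherwise a short
     * description of the failure (exception type and message).
     */
    static String tryDelete(Path file) {
        try {
            Files.delete(file); // throws instead of returning false
            return null;
        } catch (IOException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }
}
```

In a tearDown() the returned description could be passed to assertNull(...), so that a failure message says why the file could not be removed rather than only that it could not.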

> TestSolrCoreProperties sometimes fails with "no such core: core0"
> -
>
> Key: SOLR-2500
> URL: https://issues.apache.org/jira/browse/SOLR-2500
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Robert Muir
> Attachments: SOLR-2500.patch, solr-after-1st-run.xml, solr-clean.xml
>
>
> [junit] Testsuite: 
> org.apache.solr.client.solrj.embedded.TestSolrProperties
> [junit] Testcase: 
> testProperties(org.apache.solr.client.solrj.embedded.TestSolrProperties): 
> Caused an ERROR
> [junit] No such core: core0
> [junit] org.apache.solr.common.SolrException: No such core: core0
> [junit] at 
> org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118)
> [junit] at 
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> [junit] at 
> org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128)
> [junit] at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260)
> [junit] at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189)


[jira] [Updated] (SOLR-2500) TestSolrCoreProperties sometimes fails with "no such core: core0"

2011-05-19 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated SOLR-2500:
--

Attachment: solr-after-1st-run.xml
solr-clean.xml

solr.xml files from trunk/bin/solr/shared:
- clean - with which the test passes.
- after-1st-run - with which it fails.

> TestSolrCoreProperties sometimes fails with "no such core: core0"
> -
>
> Key: SOLR-2500
> URL: https://issues.apache.org/jira/browse/SOLR-2500
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Robert Muir
> Attachments: solr-after-1st-run.xml, solr-clean.xml
>
>
> [junit] Testsuite: 
> org.apache.solr.client.solrj.embedded.TestSolrProperties
> [junit] Testcase: 
> testProperties(org.apache.solr.client.solrj.embedded.TestSolrProperties): 
> Caused an ERROR
> [junit] No such core: core0
> [junit] org.apache.solr.common.SolrException: No such core: core0
> [junit] at 
> org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118)
> [junit] at 
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> [junit] at 
> org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128)
> [junit] at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260)
> [junit] at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189)


[jira] [Commented] (SOLR-2500) TestSolrCoreProperties sometimes fails with "no such core: core0"

2011-05-19 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036242#comment-13036242
 ] 

Doron Cohen commented on SOLR-2500:
---

From Eclipse (XP), passed at 1st attempt, failed at the 2nd!

I am not familiar with this part of the code so it would be too much work to 
track it all the way myself, but I think I can now provide sufficient 
information for solving it.

In Eclipse, after cleaning the project the test passes, and then starts failing 
in all successive runs. 
So I assume that when you run it in isolation you also do a clean, which covers 
Eclipse's clean (and more). 

I tracked the content of the cleaned relevant dir before and after the test - 
it is (trunk/)bin/solr - there's only one file that differs between the runs - 
this is bin/solr/shared/solr.xml.

Not sure if this is a bug in the test not cleaning after itself or a bug in the 
code that reads the configuration...

I'll attach the two files here so that you can compare them.


> TestSolrCoreProperties sometimes fails with "no such core: core0"
> -
>
> Key: SOLR-2500
> URL: https://issues.apache.org/jira/browse/SOLR-2500
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Robert Muir
>
> [junit] Testsuite: 
> org.apache.solr.client.solrj.embedded.TestSolrProperties
> [junit] Testcase: 
> testProperties(org.apache.solr.client.solrj.embedded.TestSolrProperties): 
> Caused an ERROR
> [junit] No such core: core0
> [junit] org.apache.solr.common.SolrException: No such core: core0
> [junit] at 
> org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118)
> [junit] at 
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> [junit] at 
> org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128)
> [junit] at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260)
> [junit] at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189)


[jira] [Commented] (LUCENE-3123) TestIndexWriter.testBackgroundOptimize fails with too many open files

2011-05-19 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036163#comment-13036163
 ] 

Doron Cohen commented on LUCENE-3123:
-

This is on Ubuntu btw.

Run log:
{noformat}
NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter 
-Dtestmethod=testBackgroundOptimize 
-Dtests.seed=-3981504507637360146:51354004663342240
NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter 
-Dtestmethod=testBackgroundOptimize 
-Dtests.seed=-3981504507637360146:51354004663342240
The following exceptions were thrown by threads:
*** Thread: Lucene Merge Thread #0 ***
org.apache.lucene.index.MergePolicy$MergeException: 
java.io.FileNotFoundException: /tmp/test4907593285402510583tmp/_51_0.sd (Too 
many open files)
at 
org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:507)
at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:472)
Caused by: java.io.FileNotFoundException: 
/tmp/test4907593285402510583tmp/_51_0.sd (Too many open files)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
at 
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:69)
at 
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:90)
at 
org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:56)
at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:337)
at 
org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:402)
at 
org.apache.lucene.index.codecs.mockrandom.MockRandomCodec.fieldsProducer(MockRandomCodec.java:236)
at 
org.apache.lucene.index.PerFieldCodecWrapper$FieldsReader.<init>(PerFieldCodecWrapper.java:113)
at 
org.apache.lucene.index.PerFieldCodecWrapper.fieldsProducer(PerFieldCodecWrapper.java:210)
at 
org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:131)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:495)
at 
org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:635)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3260)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2930)
at 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:379)
at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:447)
NOTE: test params are: codec=RandomCodecProvider: {field=MockRandom}, 
locale=nl_NL, timezone=Turkey
NOTE: all tests run in this JVM:
[TestIndexWriter]
NOTE: Linux 2.6.32-31-generic i386/Sun Microsystems Inc. 1.6.0_20 
(32-bit)/cpus=1,threads=2,free=26480072,total=33468416
{noformat}

> TestIndexWriter.testBackgroundOptimize fails with too many open files
> -
>
> Key: LUCENE-3123
> URL: https://issues.apache.org/jira/browse/LUCENE-3123
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
> Environment: Linux 2.6.32-31-generic i386/Sun Microsystems Inc. 
> 1.6.0_20 (32-bit)/cpus=1,threads=2
>Reporter: Doron Cohen
>
> Recreate with this line:
> ant test -Dtestcase=TestIndexWriter -Dtestmethod=testBackgroundOptimize 
> -Dtests.seed=-3981504507637360146:51354004663342240
> Might be related to LUCENE-2873 ?


[jira] [Created] (LUCENE-3123) TestIndexWriter.testBackgroundOptimize fails with too many open files

2011-05-19 Thread Doron Cohen (JIRA)

TestIndexWriter.testBackgroundOptimize fails with too many open files
-

 Key: LUCENE-3123
 URL: https://issues.apache.org/jira/browse/LUCENE-3123
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
 Environment: Linux 2.6.32-31-generic i386/Sun Microsystems Inc. 
1.6.0_20 (32-bit)/cpus=1,threads=2
Reporter: Doron Cohen


Recreate with this line:

ant test -Dtestcase=TestIndexWriter -Dtestmethod=testBackgroundOptimize 
-Dtests.seed=-3981504507637360146:51354004663342240

Might be related to LUCENE-2873 ?


[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-19 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036111#comment-13036111
 ] 

Doron Cohen commented on LUCENE-3068:
-

bq. Note that if you go back to the root page, and click on a given day, it 
tells you the svn rev and also hg ref (of luceneutil)

Great, thanks!

So, this commit to trunk in r1124293 falls between these two:

- Tue 17/05/2011 Lucene/Solr trunk rev 1104671
- Wed 18/05/2011 Lucene/Solr trunk rev 1124524

... No measurable degradation, good!

> The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
> same position
> --
>
> Key: LUCENE-3068
> URL: https://issues.apache.org/jira/browse/LUCENE-3068
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.0.3, 3.1, 4.0
>Reporter: Michael McCandless
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch, 
> LUCENE-3068.patch
>
>
> In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
> matching docs that it shouldn't; but I think those changes caused it
> to fail to match docs that it should, specifically when the doc itself
> has tokens at the same position.


[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-19 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036107#comment-13036107
 ] 

Doron Cohen commented on LUCENE-3068:
-

Looking at http://people.apache.org/~mikemccand/lucenebench/SloppyPhrase.html 
(Mike this is a great tool!) I see no particular slowdown at the last runs.

A thought about these benchmarks, it would be helpful if the checked revision 
would be shown - perhaps as part of the hover text when hovering the mouse on a 
graph point...

> The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
> same position
> --
>
> Key: LUCENE-3068
> URL: https://issues.apache.org/jira/browse/LUCENE-3068
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.0.3, 3.1, 4.0
>Reporter: Michael McCandless
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch, 
> LUCENE-3068.patch
>
>
> In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
> matching docs that it shouldn't; but I think those changes caused it
> to fail to match docs that it should, specifically when the doc itself
> has tokens at the same position.


[jira] [Updated] (LUCENE-3120) span query matches too many docs when two query terms are the same unless inOrder=true

2011-05-19 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3120:


Attachment: LUCENE-3120.patch

Attached test case demonstrating the bug.

> span query matches too many docs when two query terms are the same unless 
> inOrder=true
> --
>
> Key: LUCENE-3120
> URL: https://issues.apache.org/jira/browse/LUCENE-3120
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Reporter: Doron Cohen
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3120.patch
>
>
> spinoff of user list discussion - [SpanNearQuery - inOrder 
> parameter|http://markmail.org/message/i4cstlwgjmlcfwlc].
> With 3 documents:
> *  "a b x c d"
> *  "a b b d"
> *  "a b x b y d"
> Here are a few queries (the number in parenthesis indicates expected #hits):
> These ones work *as expected*:
> * (1)  in-order, slop=0, "b", "x", "b"
> * (1)  in-order, slop=0, "b", "b"
> * (2)  in-order, slop=1, "b", "b"
> These ones match *too many* hits:
> * (1)  any-order, slop=0, "b", "x", "b"
> * (1)  any-order, slop=1, "b", "x", "b"
> * (1)  any-order, slop=2, "b", "x", "b"
> * (1)  any-order, slop=3, "b", "x", "b"
> These ones match *too many* hits as well:
> * (1)  any-order, slop=0, "b", "b"
> * (2)  any-order, slop=1, "b", "b"
> Each of the above passes when using a phrase query (applying the slop, no 
> in-order indication in phrase query).
> This seems related to a known overlapping spans issue - [non-overlapping Span 
> queries|http://markmail.org/message/7jxn5eysjagjwlon] - as indicated by Hoss, 
> so we might decide to close this bug after all, but I would like to at least 
> have the junit that exposes the behavior in JIRA.


[jira] [Created] (LUCENE-3120) span query matches too many docs when two query terms are the same unless inOrder=true

2011-05-19 Thread Doron Cohen (JIRA)

span query matches too many docs when two query terms are the same unless 
inOrder=true
--

 Key: LUCENE-3120
 URL: https://issues.apache.org/jira/browse/LUCENE-3120
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/search
Reporter: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0


spinoff of user list discussion - [SpanNearQuery - inOrder 
parameter|http://markmail.org/message/i4cstlwgjmlcfwlc].

With 3 documents:
*  "a b x c d"
*  "a b b d"
*  "a b x b y d"

Here are a few queries (the number in parenthesis indicates expected #hits):


These ones work *as expected*:
* (1)  in-order, slop=0, "b", "x", "b"
* (1)  in-order, slop=0, "b", "b"
* (2)  in-order, slop=1, "b", "b"

These ones match *too many* hits:
* (1)  any-order, slop=0, "b", "x", "b"
* (1)  any-order, slop=1, "b", "x", "b"
* (1)  any-order, slop=2, "b", "x", "b"
* (1)  any-order, slop=3, "b", "x", "b"

These ones match *too many* hits as well:
* (1)  any-order, slop=0, "b", "b"
* (2)  any-order, slop=1, "b", "b"

Each of the above passes when using a phrase query (applying the slop, no 
in-order indication in phrase query).

This seems related to a known overlapping spans issue - [non-overlapping Span 
queries|http://markmail.org/message/7jxn5eysjagjwlon] - as indicated by Hoss, 
so we might decide to close this bug after all, but I would like to at least 
have the junit that exposes the behavior in JIRA.
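
The expected hit counts listed above can be reproduced with a small brute-force model. This is only a sketch of the expected behavior, not Lucene code: it assumes that every query term must be assigned a distinct token position, and that slop is the number of unmatched positions inside the matched window. SpanNearModel and its methods are hypothetical names.

```java
// Toy reference model of the EXPECTED counts; it does not use Lucene.
final class SpanNearModel {

    /** True if every term can be placed on its own position within the slop. */
    static boolean matches(String doc, String[] terms, int slop, boolean inOrder) {
        String[] tokens = doc.split(" ");
        return assign(tokens, terms, 0, new int[terms.length], slop, inOrder);
    }

    // Tries every position for term i, given positions already chosen for terms 0..i-1.
    private static boolean assign(String[] tokens, String[] terms, int i,
                                  int[] pos, int slop, boolean inOrder) {
        if (i == terms.length) {
            int min = Integer.MAX_VALUE;
            int max = Integer.MIN_VALUE;
            for (int p : pos) {
                min = Math.min(min, p);
                max = Math.max(max, p);
            }
            // Assumed slop definition: window width minus number of terms.
            return (max - min + 1) - terms.length <= slop;
        }
        for (int p = 0; p < tokens.length; p++) {
            if (!tokens[p].equals(terms[i])) {
                continue;
            }
            boolean allowed = true;
            for (int j = 0; j < i; j++) {
                // A position may be used once; in-order also forbids going backwards.
                if (pos[j] == p || (inOrder && pos[j] > p)) {
                    allowed = false;
                }
            }
            if (allowed) {
                pos[i] = p;
                if (assign(tokens, terms, i + 1, pos, slop, inOrder)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Number of documents that match under the model. */
    static int countHits(String[] docs, String[] terms, int slop, boolean inOrder) {
        int hits = 0;
        for (String doc : docs) {
            if (matches(doc, terms, slop, inOrder)) {
                hits++;
            }
        }
        return hits;
    }
}
```

Under this model "a b x c d" does not match any-order "b", "x", "b", because the document has only one "b"; allowing the same position to be used twice is what would produce the extra hits described above.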


[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-18 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035643#comment-13035643
 ] 

Doron Cohen commented on LUCENE-3068:
-

I wonder if this should also be fixed in the 3.1 branch?
Probably only if we make a 3.1.1, but it is not needed if the next release is 3.2. 
What's the best practice then? Reopen until a decision is made?
Or rely on rescanning all 3.2 changes in case it's going to be 3.1.1?

> The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
> same position
> --
>
> Key: LUCENE-3068
> URL: https://issues.apache.org/jira/browse/LUCENE-3068
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.0.3, 3.1, 4.0
>Reporter: Michael McCandless
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch, 
> LUCENE-3068.patch
>
>
> In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
> matching docs that it shouldn't; but I think those changes caused it
> to fail to match docs that it should, specifically when the doc itself
> has tokens at the same position.


[jira] [Resolved] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-18 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-3068.
-

Resolution: Fixed

fix merged to 3x in r1124302.

> The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
> same position
> --
>
> Key: LUCENE-3068
> URL: https://issues.apache.org/jira/browse/LUCENE-3068
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.0.3, 3.1, 4.0
>Reporter: Michael McCandless
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch, 
> LUCENE-3068.patch
>
>
> In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
> matching docs that it shouldn't; but I think those changes caused it
> to fail to match docs that it should, specifically when the doc itself
> has tokens at the same position.


[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-18 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035422#comment-13035422
 ] 

Doron Cohen commented on LUCENE-3068:
-

fixed in trunk in r1124293.

> The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
> same position
> --
>
> Key: LUCENE-3068
> URL: https://issues.apache.org/jira/browse/LUCENE-3068
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.0.3, 3.1, 4.0
>Reporter: Michael McCandless
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch, 
> LUCENE-3068.patch
>
>
> In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
> matching docs that it shouldn't; but I think those changes caused it
> to fail to match docs that it should, specifically when the doc itself
> has tokens at the same position.


[jira] [Commented] (LUCENE-2736) Wrong implementation of DocIdSetIterator.advance

2011-05-17 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034618#comment-13034618
 ] 

Doron Cohen commented on LUCENE-2736:
-

Shai, with the modified text the NOTE on "implementations freedom to not 
advance beyond in some situations" becomes strange... I think that the original 
text stresses the fact that the "real intended" behavior is to advance beyond 
current, just that, for performance reasons, the decision whether to advance 
beyond in some situations is left to the implementation, and so, if the 
caller provides a target which is not greater than current, it should be aware 
of this possibility. 

So I think it is perhaps better to either not modify this at all, or at most, 
to add "(see NOTE below)" just after "beyond":

{noformat}
-   * Advances to the first beyond the current whose document number is greater
+   * Advances to the first beyond (see NOTE below) the current whose document 
number is greater
{noformat}

This would prevent the confusion I think?
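
The two readings of advance() discussed in this issue can be illustrated with plain java.util.BitSet. This is a sketch only: AdvanceSemantics and its methods are hypothetical names, and nothing here is the Lucene implementation. BitSet.nextSetBit(int) is the real JDK call, documented as returning the first set bit on or after the given index (or -1 if there is none).

```java
import java.util.BitSet;

// Hypothetical illustration of the two semantics; not Lucene code.
final class AdvanceSemantics {

    /** Doc ids 0, 5, 6, 10, as in the test from the issue description. */
    static BitSet sample() {
        BitSet bits = new BitSet();
        bits.set(0);
        bits.set(5);
        bits.set(6);
        bits.set(10);
        return bits;
    }

    /** "On or after target": may return the current doc again. */
    static int advanceOnOrAfter(BitSet bits, int target) {
        return bits.nextSetBit(target);
    }

    /** "Beyond current, and at least target": never returns the current doc. */
    static int advanceBeyond(BitSet bits, int current, int target) {
        return bits.nextSetBit(Math.max(target, current + 1));
    }
}
```

With the iterator positioned on doc 5 and a target of 5, the first variant stays on 5 while the second moves on to 6, which is exactly the case where target is not greater than current.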

> Wrong implementation of DocIdSetIterator.advance 
> -
>
> Key: LUCENE-2736
> URL: https://issues.apache.org/jira/browse/LUCENE-2736
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.2, 4.0
>Reporter: Hardy Ferentschik
>Assignee: Shai Erera
> Attachments: LUCENE-2736.patch
>
>
> Implementations of {{DocIdSetIterator}} behave differently when advanced is 
> called. Taking the following test for {{OpenBitSet}}, {{DocIdBitSet}} and 
> {{SortedVIntList}} only {{SortedVIntList}} passes the test:
> {code:title=org.apache.lucene.search.TestDocIdSet.java|borderStyle=solid}
> ...
>   public void testAdvanceWithOpenBitSet() throws IOException {
>   DocIdSet idSet = new OpenBitSet( new long[] { 1121 }, 1 );  // 
> bits 0, 5, 6, 10
>   assertAdvance( idSet );
>   }
>   public void testAdvanceDocIdBitSet() throws IOException {
>   BitSet bitSet = new BitSet();
>   bitSet.set( 0 );
>   bitSet.set( 5 );
>   bitSet.set( 6 );
>   bitSet.set( 10 );
>   DocIdSet idSet = new DocIdBitSet(bitSet);
>   assertAdvance( idSet );
>   }
>   public void testAdvanceWithSortedVIntList() throws IOException {
>   DocIdSet idSet = new SortedVIntList( 0, 5, 6, 10 );
>   assertAdvance( idSet );
>   }   
>   private void assertAdvance(DocIdSet idSet) throws IOException {
>   DocIdSetIterator iter = idSet.iterator();
>   int docId = iter.nextDoc();
>   assertEquals( "First doc id should be 0", 0, docId );
>   docId = iter.nextDoc();
>   assertEquals( "Second doc id should be 5", 5, docId );
>   docId = iter.advance( 5 );
>   assertEquals( "Advancing iterator should return the next doc 
> id", 6, docId );
>   }
> {code}
> The javadoc for {{advance}} says:
> {quote}
> Advances to the first *beyond* the current whose document number is greater 
> than or equal to _target_.
> {quote}
> This seems to indicate that {{SortedVIntList}} behaves correctly, whereas the 
> other two don't. 
> Just looking at the {{DocIdBitSet}} implementation advance is implemented as:
> {code}
> bitSet.nextSetBit(target);
> {code}
> where the docs of {{nextSetBit}} say:
> {quote}
> Returns the index of the first bit that is set to true that occurs *on or 
> after* the specified starting index
> {quote}


[jira] [Commented] (LUCENE-3034) If you vary a setting per round and that setting is a long string, the report padding/columns break down.

2011-05-12 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032499#comment-13032499
 ] 

Doron Cohen commented on LUCENE-3034:
-

bq. My original workaround was to simply pad the column name

Yeah that's what I meant, so ok, better formatting will help.

> If you vary a setting per round and that setting is a long string, the report 
> padding/columns break down.
> -
>
> Key: LUCENE-3034
> URL: https://issues.apache.org/jira/browse/LUCENE-3034
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/benchmark
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Trivial
> Fix For: 3.1.1, 4.0
>
>
> This is especially noticeable if you vary a setting where the value is a 
> fully specified class name - in this case, it would be nice if columns in 
> each row still lined up.


[jira] [Commented] (LUCENE-3034) If you vary a setting per round and that setting is a long string, the report padding/columns break down.

2011-05-12 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032453#comment-13032453
 ] 

Doron Cohen commented on LUCENE-3034:
-

Hi Mark, could you add an example algorithm with this behavior?

Also, this is from the package javadocs:

{code}
# multi val params are iterated by NewRound's, added to reports, start with 
column name.
merge.factor=mrg:10:20
max.buffered=buf:100:1000
{code}

Is it possible to work around the problem by specifying a sufficiently long 
column name as the first value, that is, replacing e.g. 'mrg' or 'buf' in the 
above?
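
To make the format concrete, here is a rough sketch of how such a multi-valued property could be split into a column name and per-round values, and how a report column could be padded to the widest value. RoundParam is a hypothetical class, not the benchmark package's code. Cycling through the values by round number and splitting on every ':' are assumptions of this sketch.

```java
import java.util.Arrays;

// Hypothetical model of a "name:val1:val2" property; not the benchmark's own parser.
final class RoundParam {
    final String columnName;
    final String[] values;

    private RoundParam(String columnName, String[] values) {
        this.columnName = columnName;
        this.values = values;
    }

    /** Splits e.g. "mrg:10:20" into column name "mrg" and values {"10", "20"}. */
    static RoundParam parse(String raw) {
        String[] parts = raw.split(":"); // assumes values contain no ':' themselves
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected name:value[:value...], got " + raw);
        }
        return new RoundParam(parts[0], Arrays.copyOfRange(parts, 1, parts.length));
    }

    /** Assumed behavior: values repeat cyclically as rounds advance. */
    String valueForRound(int round) {
        return values[round % values.length];
    }

    /** Width needed so that the name and every value line up in one column. */
    int columnWidth() {
        int width = columnName.length();
        for (String v : values) {
            width = Math.max(width, v.length());
        }
        return width;
    }

    /** Left-justifies s in a field of the given width. */
    static String pad(String s, int width) {
        return String.format("%-" + width + "s", s);
    }
}
```

Sizing the column from the longest value, rather than from the column name alone, is what would keep rows aligned when a value is a fully qualified class name.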

> If you vary a setting per round and that setting is a long string, the report 
> padding/columns break down.
> -
>
> Key: LUCENE-3034
> URL: https://issues.apache.org/jira/browse/LUCENE-3034
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/benchmark
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Trivial
> Fix For: 3.1.1, 4.0
>
>
> This is especially noticeable if you vary a setting where the value is a 
> fully specified class name - in this case, it would be nice if columns in 
> each row still lined up.


[jira] [Updated] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-05 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3068:


Attachment: LUCENE-3068.patch

Patch with more test cases - AND/OR logic for MPQ is combined, and test code 
made simpler.

> The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
> same position
> --
>
> Key: LUCENE-3068
> URL: https://issues.apache.org/jira/browse/LUCENE-3068
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 3.0.3, 3.1, 4.0
>Reporter: Michael McCandless
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch, 
> LUCENE-3068.patch
>
>
> In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
> matching docs that it shouldn't; but I think those changes caused it
> to fail to match docs that it should, specifically when the doc itself
> has tokens at the same position.


[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-05 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029274#comment-13029274
 ] 

Doron Cohen commented on LUCENE-3068:
-

Thanks for reviewing Shai!
I updated the patch with random newDirectory and newICFG - not the focus 
here, but it may improve coverage anyhow.
I added tests for the combined case - some AND some OR - that is, using MPQ, 
some add() with a single term (AND), some with an array longer than 1 (OR). 
Also refactored the tests a bit so that now there's a small test method for 
each test case.

> The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
> same position
> --
>
> Key: LUCENE-3068
> URL: https://issues.apache.org/jira/browse/LUCENE-3068
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 3.0.3, 3.1, 4.0
>Reporter: Michael McCandless
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch
>
>
> In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
> matching docs that it shouldn't; but I think those changes caused it
> to fail to match docs that it should, specifically when the doc itself
> has tokens at the same position.


[jira] [Updated] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-05 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3068:


Attachment: LUCENE-3068.patch

The attached patch fixes this bug by excluding from the repeats check those PPs 
that originate from the same offset in the query. 

This allows stricter phrase queries: strict on terms in the same position (AND 
logic) but still sloppy.

All tests pass, this is ready to go in (unless there are reservations).
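
The idea of skipping the repeats check for PPs that share a query offset can be pictured with a small stand-alone model. This is only a sketch under stated assumptions: the class RepeatsCheckSketch, the PP fields and the conflicts() method are hypothetical names made up for illustration, and they are not the code in the patch or in SloppyPhraseScorer.

```java
/** Hypothetical model; these names are not Lucene's actual fields or methods. */
class RepeatsCheckSketch {

    /** One phrase position (PP): a query term, its offset in the query, and the doc position it points at. */
    static final class PP {
        final String term;
        final int queryOffset;
        final int docPosition;

        PP(String term, int queryOffset, int docPosition) {
            this.term = term;
            this.queryOffset = queryOffset;
            this.docPosition = docPosition;
        }
    }

    /**
     * Returns true when two PPs should be treated as a repeat conflict,
     * i.e. they claim the same doc position although the query placed
     * them at different offsets.
     */
    static boolean conflicts(PP a, PP b) {
        if (a.queryOffset == b.queryOffset) {
            // Same query offset: sharing a doc position is what the query asks for.
            return false;
        }
        return a.docPosition == b.docPosition;
    }

    public static void main(String[] args) {
        PP a = new PP("a", 0, 5);
        PP b = new PP("b", 0, 5); // same query offset as a
        PP c = new PP("a", 1, 5); // repeated term at a different query offset
        System.out.println("a vs b conflict: " + conflicts(a, b));
        System.out.println("a vs c conflict: " + conflicts(a, c));
    }
}
```

In this model only the pair that comes from different query offsets is reported, which is the behavior the comment above describes; how the real scorer tracks offsets and positions may well differ.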

> The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
> same position
> --
>
> Key: LUCENE-3068
> URL: https://issues.apache.org/jira/browse/LUCENE-3068
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 3.0.3, 3.1, 4.0
>Reporter: Michael McCandless
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch
>
>
> In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
> matching docs that it shouldn't; but I think those changes caused it
> to fail to match docs that it should, specifically when the doc itself
> has tokens at the same position.


[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-04 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029150#comment-13029150
 ] 

Doron Cohen commented on LUCENE-3068:
-

This is more complex than I originally thought.

# QueryParser creates a MultiPhraseQuery (MPQ) when one of the (phrase) 
query positions is a multi-term.
# MPQ has an implicit OR behavior - it is used, e.g., for wildcarding a phrase 
query.
# PhraseQuery (PQ) sloppy scorer assumes each query position has a single term.
# A PQ with several terms in the same position cannot be created by parsing with 
a QP, only manually.
  Manually created, it would have AND semantics: only docs with ALL the 
terms in pos N should match.
  In other words, assume doc D terms and positions are: 
  a:0 b:1 c:1 d:2
  MPQ for (a,b):0 d:1 should match D, finding the phrase b:1 d:2 (OR semantics)
  PQ for (a,b):0 d:1 should not match D, because D does not contain 'a' and 
'b' in the same position (AND semantics).


Therefore, rewriting PQ into MPQ is not a valid fix, because it would replace the 
AND logic assumed by creating the PQ this way with the OR logic assumed in 
MPQ. 
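
The difference between the two readings can be checked on doc D with a toy model. This is a minimal sketch, not Lucene code: the class SamePositionSemantics and its doc() and matches() helpers are hypothetical, and the match is exact (slop 0) to keep the example short.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model only; it does not use or reproduce Lucene's scorers. */
class SamePositionSemantics {

    /** Builds a position -> terms map from "term:position" tokens, e.g. "b:1". */
    static Map<Integer, Set<String>> doc(String... tokens) {
        Map<Integer, Set<String>> byPos = new HashMap<>();
        for (String t : tokens) {
            String[] parts = t.split(":");
            byPos.computeIfAbsent(Integer.parseInt(parts[1]), k -> new HashSet<>()).add(parts[0]);
        }
        return byPos;
    }

    /**
     * Exact (slop 0) phrase match. query.get(i) holds the terms at query offset i.
     * requireAll=true models the PQ reading (AND); false models the MPQ reading (OR).
     */
    static boolean matches(Map<Integer, Set<String>> doc, List<Set<String>> query, boolean requireAll) {
        for (int start : doc.keySet()) {
            boolean ok = true;
            for (int i = 0; i < query.size() && ok; i++) {
                Set<String> atPos = doc.getOrDefault(start + i, Set.of());
                Set<String> wanted = query.get(i);
                ok = requireAll
                        ? atPos.containsAll(wanted)
                        : wanted.stream().anyMatch(atPos::contains);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<Integer, Set<String>> d = doc("a:0", "b:1", "c:1", "d:2");
        List<Set<String>> q = List.of(Set.of("a", "b"), Set.of("d"));
        System.out.println("OR  (MPQ-like): " + matches(d, q, false));
        System.out.println("AND (PQ-like):  " + matches(d, q, true));
    }
}
```

With the OR reading the query (a,b):0 d:1 matches D through b:1 d:2, while with the AND reading it does not, since no position of D holds both 'a' and 'b'.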

{code:title=TestPositionIncrement.testSetPosition has a test for this case exactly}
// phrase query should fail for non existing searched term 
// even if there exist another searched terms in the same searched position. 
q = new PhraseQuery();
q.add(new Term("field", "3"),0);
q.add(new Term("field", "9"),0);
hits = searcher.search(q, null, 1000).scoreDocs;
assertEquals(0, hits.length);
{code}

Although QP by default will not create this PQ, I think we need to support it, 
for applications needing to be strict with the search results, with slop. 

So fixing this would need to take place inside SloppyScorer, digging further...

> The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
> same position
> --
>
> Key: LUCENE-3068
> URL: https://issues.apache.org/jira/browse/LUCENE-3068
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 3.0.3, 3.1, 4.0
>Reporter: Michael McCandless
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3068.patch, LUCENE-3068.patch
>
>
> In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
> matching docs that it shouldn't; but I think those changes caused it
> to fail to match docs that it should, specifically when the doc itself
> has tokens at the same position.


[jira] [Updated] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-04 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3068:


Attachment: LUCENE-3068.patch

Attached a modified version of the test - one that invokes the query parser to 
create an MPQ. The test passes.

> The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
> same position
> --
>
> Key: LUCENE-3068
> URL: https://issues.apache.org/jira/browse/LUCENE-3068
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 3.0.3, 3.1, 4.0
>Reporter: Michael McCandless
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3068.patch, LUCENE-3068.patch
>
>
> In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
> matching docs that it shouldn't; but I think those changes caused it
> to fail to match docs that it should, specifically when the doc itself
> has tokens at the same position.


[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-04 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028895#comment-13028895
 ] 

Doron Cohen commented on LUCENE-3068:
-

bq. specifically when the doc itself has tokens at the same position.

I am not convinced yet that there is a bug here - I think the code does allow 
this? 

There is another assumption in the code, that any two different PPs are in 
different TPs - which underlies the assumption that originally each PP differs 
in position. This seems a valid assumption, because QP will create an MPQ if there 
are two terms in the (phrase) query with the same position. 

bq. maybe any time a *PhraseQuery has overlapping positions, we should rewrite 
to a MultiPhraseQuery and let it handle the same positions...? Is there any 
downside to that?

I think this is the correct behavior - in particular this will be the query 
that a QP will create. The only way to create a PQ (not MPQ) for PPs in the same 
positions is to create it manually. But why would anyone do that? And if they did, 
wouldn't such a rewrite be a surprise to them?

A patch will follow with a revised version of this test - one that uses the QP. 
In this patch the QP indeed creates an MPQ, and I have not yet been able to make it 
fail. Still trying.

> The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
> same position
> --
>
> Key: LUCENE-3068
> URL: https://issues.apache.org/jira/browse/LUCENE-3068
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 3.0.3, 3.1, 4.0
>Reporter: Michael McCandless
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3068.patch
>
>
> In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
> matching docs that it shouldn't; but I think those changes caused it
> to fail to match docs that it should, specifically when the doc itself
> has tokens at the same position.


[jira] [Assigned] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-04 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen reassigned LUCENE-3068:
---

Assignee: Doron Cohen

> The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
> same position
> --
>
> Key: LUCENE-3068
> URL: https://issues.apache.org/jira/browse/LUCENE-3068
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 3.0.3, 3.1, 4.0
>Reporter: Michael McCandless
>Assignee: Doron Cohen
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3068.patch
>
>
> In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
> matching docs that it shouldn't; but I think those changes caused it
> to fail to match docs that it should, specifically when the doc itself
> has tokens at the same position.


[jira] [Updated] (LUCENE-3010) Add the ability for the Lucene Benchmark code to read Solr configuration information for testing Analyzer/Filter Chains

2011-04-11 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3010:


Description: I would like to be able to use the Lucene Benchmark code in 
Lucene contrib with Solr to run some indexing tests.  It would be nice if 
Lucene Benchmark could read my Solr configuration rather than having to 
translate my filter chain and other parameters into Lucene java code.  This 
relates to LUCENE-2845,   (was: I would like to be able to use the Lucene 
Benchmark code in Lucene contrib with Solr to run some indexing tests.  It 
would be nice if Lucene Benchmark could read my Solr configuration rather than 
having to translate my filter chain and other parameters into Lucene java code. 
 This relates to Lucene 2845, )

> Add the ability for the  Lucene Benchmark code to read Solr configuration 
> information for testing Analyzer/Filter Chains
> 
>
> Key: LUCENE-3010
> URL: https://issues.apache.org/jira/browse/LUCENE-3010
> Project: Lucene - Java
>  Issue Type: Wish
>  Components: contrib/benchmark
>Reporter: Tom Burton-West
>Priority: Trivial
>
> I would like to be able to use the Lucene Benchmark code in Lucene contrib 
> with Solr to run some indexing tests.  It would be nice if Lucene Benchmark 
> could read my Solr configuration rather than having to translate my filter 
> chain and other parameters into Lucene java code.  This relates to 
> LUCENE-2845, 


[jira] [Commented] (LUCENE-2952) Make license checking/maintenance easier/automated

2011-03-24 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010850#comment-13010850
 ] 

Doron Cohen commented on LUCENE-2952:
-

Eclipse complains for top common-build.xml (trunk, 3x) that default target 
"validate" does not exist in the project:
{code}

> Make license checking/maintenance easier/automated
> --
>
> Key: LUCENE-2952
> URL: https://issues.apache.org/jira/browse/LUCENE-2952
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, 
> LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch
>
>
> Instead of waiting until release to check licenses are valid, we should make 
> it a part of our build process to ensure that all dependencies have proper 
> licenses, etc.


[jira] [Resolved] (LUCENE-2988) trunk 'ant test' hangs

2011-03-24 Thread Doron Cohen (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen resolved LUCENE-2988.
-

   Resolution: Cannot Reproduce
Lucene Fields:   (was: [New])

Could not reproduce after moving from ant 1.8.1 to 1.7.1.

> trunk 'ant test' hangs
> --
>
> Key: LUCENE-2988
> URL: https://issues.apache.org/jira/browse/LUCENE-2988
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Tests
> Environment: inspected so far on XP within Cygwin using IBM JDK 6
>Reporter: Doron Cohen
>Assignee: Doron Cohen
> Fix For: 4.0
>
> Attachments: 5-java-dumps.zip
>
>
> Running 'ant test' from trunk on XP in a Cygwin shell hangs.
> There was no progress in the console for a long time, so i stopped the 
> program.
> Before stopping it, created 5 consecutive thread dumps to see where the code 
> is.
> It is not clear what is going on - does not seem like a Lucene code I think 
> but not sure.
> Opening this issue to keep an eye on this - I will try with other JDKs to see 
> if this is persistent.
> Also, when first seeing this had local changes of two issue: LUCENE-2986 and 
> LUCENE-2977 - I think the changes in these issues are related but will repeat 
> the tests without these changes.


[jira] [Commented] (LUCENE-2988) trunk 'ant test' hangs

2011-03-24 Thread Doron Cohen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010754#comment-13010754
 ] 

Doron Cohen commented on LUCENE-2988:
-

Thanks Robert for looking into this!

bq. it seems they are from the 'ant' jvm, not the forked process that actually 
runs the tests

Indeed, now I notice that too... when 'ant test' hung I used  to 
get the thread dump and did not realize the ant JVM stands in between... That 
console is lost and I don't know the exact location in the tests when that 
happened - too bad. 

bq. It could be the case here that its just in some ultra-slow test such as 
TestIndexWriterOnDiskFull... this one frequently takes several minutes for me 
even on Sun JRE.

I was away for half an hour so it's something slower...

I was not able to reproduce this, and in the meantime also found out that I was 
using an incompatible ant version - 1.8.1. 

So I am thinking of closing this for now, and will reopen it if it reappears with 
ant 1.7.1.

> trunk 'ant test' hangs
> --
>
> Key: LUCENE-2988
> URL: https://issues.apache.org/jira/browse/LUCENE-2988
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Tests
> Environment: inspected so far on XP within Cygwin using IBM JDK 6
>Reporter: Doron Cohen
>Assignee: Doron Cohen
> Fix For: 4.0
>
> Attachments: 5-java-dumps.zip
>
>
> Running 'ant test' from trunk on XP in a Cygwin shell hangs.
> There was no progress in the console for a long time, so i stopped the 
> program.
> Before stopping it, created 5 consecutive thread dumps to see where the code 
> is.
> It is not clear what is going on - does not seem like a Lucene code I think 
> but not sure.
> Opening this issue to keep an eye on this - I will try with other JDKs to see 
> if this is persistent.
> Also, when first seeing this had local changes of two issue: LUCENE-2986 and 
> LUCENE-2977 - I think the changes in these issues are related but will repeat 
> the tests without these changes.

