[jira] [Resolved] (LUCENE-4590) WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file
[ https://issues.apache.org/jira/browse/LUCENE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-4590. - Resolution: Fixed done. > WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file > --- > > Key: LUCENE-4590 > URL: https://issues.apache.org/jira/browse/LUCENE-4590 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/benchmark >Reporter: Doron Cohen >Assignee: Doron Cohen >Priority: Minor > Attachments: LUCENE-4590.patch > > > It may be convenient to split Wikipedia's line file into two separate files: > category-pages and non-category ones. > It is possible to split the original line file with grep or such. > It is more efficient to do it in advance. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Reopened] (LUCENE-4590) WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file
[ https://issues.apache.org/jira/browse/LUCENE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen reopened LUCENE-4590: - Lucene Fields: (was: New) Reopening the issue to make the method that returns the categories file name, categoriesLineFile(), public, so that the naming can be changed in the future without breaking application logic.
[jira] [Resolved] (LUCENE-4590) WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file
[ https://issues.apache.org/jira/browse/LUCENE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-4590. - Resolution: Fixed Done.
[jira] [Resolved] (LUCENE-4595) EnwikiContentSource thread safety problem (NPE) in 'forever' mode
[ https://issues.apache.org/jira/browse/LUCENE-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-4595. - Resolution: Fixed Lucene Fields: (was: New) Fixed. It seems the tag bot missed the trunk commit for this one, so here are both:
- trunk: [r1418281|http://svn.apache.org/viewvc?view=revision&revision=1418281]
- 4x: [r1418925|http://svn.apache.org/viewvc?view=revision&revision=1418925]
> EnwikiContentSource thread safety problem (NPE) in 'forever' mode > - > > Key: LUCENE-4595 > URL: https://issues.apache.org/jira/browse/LUCENE-4595 > Project: Lucene - Core > Issue Type: Bug > Components: modules/benchmark >Reporter: Doron Cohen >Assignee: Doron Cohen >Priority: Minor > Attachments: LUCENE-4595.patch > > > If close() is invoked around when an additional input stream reader is > recreated for the 'forever' behavior, an uncaught NPE might occur. > This bug was probably always there, just exposed now with the > EnwikiContentSourceTest added in LUCENE-4588.
[jira] [Resolved] (LUCENE-4588) EnwikiContentSource silently swallows the last wiki doc
[ https://issues.apache.org/jira/browse/LUCENE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-4588. - Resolution: Fixed Lucene Fields: (was: New) Fixed. As a side note, merging benchmark changes to 4x is so much easier than it used to be in 3x, now that trunk and branch are structured the same! Now if only 'precommit' would run 60 times faster (that would be 12 seconds here)... wouldn't that be great? :) > EnwikiContentSource silently swallows the last wiki doc > --- > > Key: LUCENE-4588 > URL: https://issues.apache.org/jira/browse/LUCENE-4588 > Project: Lucene - Core > Issue Type: Bug > Components: modules/benchmark >Reporter: Doron Cohen >Assignee: Doron Cohen >Priority: Minor > Attachments: LUCENE-4588.patch > > > Last wiki doc is never returned
[jira] [Commented] (LUCENE-4588) EnwikiContentSource silently swallows the last wiki doc
[ https://issues.apache.org/jira/browse/LUCENE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527399#comment-13527399 ] Doron Cohen commented on LUCENE-4588: - Two more commits to trunk (not caught by the bot because of an incorrect message format):
- [r1417871|http://svn.apache.org/viewvc?rev=1417871&view=rev] -- LUCENE-4588 (cont): (EnwikiContentSource fixes) avoid using the forbidden StringBufferInputStream.
- [r1417921|http://svn.apache.org/viewvc?rev=1417921&view=rev] -- LUCENE-4588 (cont): simplify test input stream creation.
[jira] [Commented] (LUCENE-4595) EnwikiContentSource thread safety problem (NPE) in 'forever' mode
[ https://issues.apache.org/jira/browse/LUCENE-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526326#comment-13526326 ] Doron Cohen commented on LUCENE-4595: - Thanks for verifying Robert. Committed the fix, let's see if the build becomes stable again. Issue remains open for porting to 4x.
[jira] [Updated] (LUCENE-4590) WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file
[ https://issues.apache.org/jira/browse/LUCENE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-4590: Attachment: LUCENE-4590.patch Patch with the new task and a test.
[jira] [Commented] (LUCENE-4590) WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file
[ https://issues.apache.org/jira/browse/LUCENE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13514649#comment-13514649 ] Doron Cohen commented on LUCENE-4590: - Now I see what you mean. Spooky, it is as if you were looking into the patch I did not post here... How did you know I chose not to modify EnwikiContentSource? I agree that if someone wishes to index just the non-category pages, the new WriteEnwikiLineDoc would create the category pages file for no use. Also, if indexing is conducted straight away, not through a line file first, categories will be indexed. But then anyone could check the title and decide not to index those docs. So I see the advantage; I am just not tempted to add this at the moment, but it can be added.
[jira] [Commented] (LUCENE-4588) EnwikiContentSource silently swallows the last wiki doc
[ https://issues.apache.org/jira/browse/LUCENE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13514644#comment-13514644 ] Doron Cohen commented on LUCENE-4588: - Thanks for the review Shai, changed as you suggested and committed (while jira was down...)
[jira] [Updated] (LUCENE-4595) EnwikiContentSource thread safety problem (NPE) in 'forever' mode
[ https://issues.apache.org/jira/browse/LUCENE-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-4595: Attachment: LUCENE-4595.patch This patch is supposed to fix the problem, but I was not able to recreate the bug, so I couldn't actually test it.
[jira] [Commented] (LUCENE-4595) EnwikiContentSource thread safety problem (NPE) in 'forever' mode
[ https://issues.apache.org/jira/browse/LUCENE-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13512113#comment-13512113 ] Doron Cohen commented on LUCENE-4595: - Jenkins' reproduce params and error log:
{noformat}
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/3093/
Java: 32bit/jdk1.6.0_37 -server -XX:+UseSerialGC

1 tests failed.
FAILED: org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSourceTest.testForever

Error Message:
Captured an uncaught exception in thread: Thread[id=140, name=Thread-2, state=RUNNABLE, group=TGRP-EnwikiContentSourceTest]

Stack Trace:
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=140, name=Thread-2, state=RUNNABLE, group=TGRP-EnwikiContentSourceTest]
    at __randomizedtesting.SeedInfo.seed([EF7AF10441351C3B:AB004FFFCF2C6B8C]:0)
Caused by: java.lang.NullPointerException
    at __randomizedtesting.SeedInfo.seed([EF7AF10441351C3B]:0)
    at java.io.Reader.<init>(Reader.java:61)
    at java.io.InputStreamReader.<init>(InputStreamReader.java:112)
    at org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource$Parser.run(EnwikiContentSource.java:186)
    at java.lang.Thread.run(Thread.java:662)

Build Log:
[...truncated 5173 lines...]
[junit4:junit4] Suite: org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSourceTest
[junit4:junit4] 2> 7 Δεκ 2012 6:39:53 πμ com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
[junit4:junit4] 2> WARNING: Uncaught exception in thread: Thread[Thread-2,5,TGRP-EnwikiContentSourceTest]
[junit4:junit4] 2> java.lang.NullPointerException
[junit4:junit4] 2>    at __randomizedtesting.SeedInfo.seed([EF7AF10441351C3B]:0)
[junit4:junit4] 2>    at java.io.Reader.<init>(Reader.java:61)
[junit4:junit4] 2>    at java.io.InputStreamReader.<init>(InputStreamReader.java:112)
[junit4:junit4] 2>    at org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource$Parser.run(EnwikiContentSource.java:186)
[junit4:junit4] 2>    at java.lang.Thread.run(Thread.java:662)
[junit4:junit4] 2> NOTE: reproduce with: ant test -Dtestcase=EnwikiContentSourceTest -Dtests.method=testForever -Dtests.seed=EF7AF10441351C3B -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=el -Dtests.timezone=SST -Dtests.file.encoding=UTF-8
[junit4:junit4] ERROR 0.07s J1 | EnwikiContentSourceTest.testForever <<<
[junit4:junit4]> Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=140, name=Thread-2, state=RUNNABLE, group=TGRP-EnwikiContentSourceTest]
[junit4:junit4]>    at __randomizedtesting.SeedInfo.seed([EF7AF10441351C3B:AB004FFFCF2C6B8C]:0)
[junit4:junit4]> Caused by: java.lang.NullPointerException
[junit4:junit4]>    at __randomizedtesting.SeedInfo.seed([EF7AF10441351C3B]:0)
[junit4:junit4]>    at java.io.Reader.<init>(Reader.java:61)
[junit4:junit4]>    at java.io.InputStreamReader.<init>(InputStreamReader.java:112)
[junit4:junit4]>    at org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource$Parser.run(EnwikiContentSource.java:186)
[junit4:junit4]>    at java.lang.Thread.run(Thread.java:662)
[junit4:junit4] 2> NOTE: test params are: codec=Lucene41: {}, sim=DefaultSimilarity, locale=el, timezone=SST
[junit4:junit4] 2> NOTE: Linux 3.2.0-34-generic i386/Sun Microsystems Inc. 1.6.0_37 (32-bit)/cpus=8,threads=1,free=47084536,total=64946176
[junit4:junit4] 2> NOTE: All tests run in this JVM: [TrecContentSourceTest, TestConfig, DocMakerTest, SearchWithSortTaskTest, StreamUtilsTest, WriteLineDocTaskTest, CreateIndexTaskTest, TestQualityRun, LineDocSourceTest, TestPerfTasksParse, AddIndexesTaskTest, PerfTaskTest, AltPackageTaskTest, EnwikiContentSourceTest]
[junit4:junit4] Completed on J1 in 0.30s, 3 tests, 1 error <<< FAILURES!
{noformat}
[jira] [Created] (LUCENE-4595) EnwikiContentSource thread safety problem (NPE) in 'forever' mode
Doron Cohen created LUCENE-4595: --- Summary: EnwikiContentSource thread safety problem (NPE) in 'forever' mode Key: LUCENE-4595 URL: https://issues.apache.org/jira/browse/LUCENE-4595 Project: Lucene - Core Issue Type: Bug Components: modules/benchmark Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor If close() is invoked around the time an additional input stream reader is recreated for the 'forever' behavior, an uncaught NPE might occur. This bug was probably always there, and was only exposed now by the EnwikiContentSourceTest added in LUCENE-4588.
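For readers following the thread, here is a minimal sketch of the kind of race described above and one way to guard against it. It is not the code in LUCENE-4595.patch. The class and member names (ForeverSourceSketch, reopenReader, lock) are hypothetical, and the sketch assumes that close() nulls out the underlying stream while a worker thread may be about to wrap it in a new InputStreamReader, which would be consistent with the Jenkins trace in this thread showing the NPE raised inside the InputStreamReader constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: a source whose stream may be closed while a worker reopens a reader. */
final class ForeverSourceSketch {
  private final Object lock = new Object();
  private InputStream stream; // assumption: close() sets this to null

  ForeverSourceSketch(byte[] data) {
    this.stream = new ByteArrayInputStream(data);
  }

  /**
   * Called by the worker thread at the start of each 'forever' pass.
   * Returns null once the source has been closed, so the worker can stop
   * instead of handing a null stream to the InputStreamReader constructor.
   */
  Reader reopenReader() {
    synchronized (lock) {
      if (stream == null) {
        return null; // closed concurrently
      }
      return new InputStreamReader(stream, StandardCharsets.UTF_8);
    }
  }

  /** Called by the owning thread; takes the same lock as reopenReader(). */
  void close() {
    synchronized (lock) {
      stream = null;
    }
  }
}
```

The check and the constructor call sit under one lock, so close() cannot slip in between them; whether the real fix uses a lock, a local copy of the field, or something else is not stated in this thread.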
[jira] [Commented] (LUCENE-4590) WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file
[ https://issues.apache.org/jira/browse/LUCENE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511262#comment-13511262 ] Doron Cohen commented on LUCENE-4590: -
bq. Do you think perhaps that EnwikiContentSource should let the caller know whether the returned DocData represents a content page or category page?
That's what I planned at the start, but I decided to leave WriteLineDoc intact because it is general, that is, not aware of the unique structure of Wikipedia data, where some of the pages represent categories.
bq. So maybe, if someone wants to generate a line file from the pages only... flexibility that I think you are trying to achieve...
Actually I am after the two files... :) These category pages are (unique) taxonomy node names, but without the taxonomy structure, which can be deduced from the (parent) categories of the category pages. Having the category pages in a separate file can be useful for deducing that taxonomy.
[jira] [Updated] (LUCENE-4590) WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file
[ https://issues.apache.org/jira/browse/LUCENE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-4590: Component/s: modules/benchmark
[jira] [Created] (LUCENE-4590) WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file
Doron Cohen created LUCENE-4590: --- Summary: WriteEnwikiLineDoc which writes Wikipedia category pages to a separate file Key: LUCENE-4590 URL: https://issues.apache.org/jira/browse/LUCENE-4590 Project: Lucene - Core Issue Type: New Feature Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor It may be convenient to split Wikipedia's line file into two separate files: category-pages and non-category ones. It is possible to split the original line file with grep or such. It is more efficient to do it in advance.
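To illustrate the split described above, a minimal sketch follows. It is not the WriteEnwikiLineDoc task from the attached patch. It assumes a line file holds one document per line with tab-separated fields and the title first, and that category pages are the ones whose title starts with "Category:"; the names CategorySplitSketch, isCategoryLine and split are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;

/** Hypothetical sketch: route category pages and all other pages to separate line files. */
final class CategorySplitSketch {

  // Assumption: fields are tab-separated and the title is the first field.
  static final char FIELD_SEP = '\t';
  // Assumption: category pages are recognized by this title prefix.
  static final String CATEGORY_PREFIX = "Category:";

  static boolean isCategoryLine(String line) {
    int sep = line.indexOf(FIELD_SEP);
    String title = sep < 0 ? line : line.substring(0, sep);
    return title.startsWith(CATEGORY_PREFIX);
  }

  /**
   * Copies every input line to exactly one of the two writers and
   * returns the number of lines that went to the categories writer.
   */
  static int split(Reader in, Writer pages, Writer categories) {
    int categoryCount = 0;
    try {
      BufferedReader reader = new BufferedReader(in);
      String line;
      while ((line = reader.readLine()) != null) {
        Writer target = pages;
        if (isCategoryLine(line)) {
          target = categories;
          categoryCount++;
        }
        target.write(line);
        target.write('\n');
      }
      pages.flush();
      categories.flush();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return categoryCount;
  }
}
```

Doing the routing while the line file is written, as the issue proposes, avoids a second pass over the full file, which is what a grep-based split afterwards would need.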
[jira] [Assigned] (LUCENE-4588) EnwikiContentSource silently swallows the last wiki doc
[ https://issues.apache.org/jira/browse/LUCENE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen reassigned LUCENE-4588: --- Assignee: Doron Cohen
[jira] [Updated] (LUCENE-4588) EnwikiContentSource silently swallows the last wiki doc
[ https://issues.apache.org/jira/browse/LUCENE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-4588: Attachment: LUCENE-4588.patch Patch adds a test for enwiki-content-source and fixes both the last doc problem and the thread leak.
[jira] [Commented] (LUCENE-4588) EnwikiContentSource silently swallows the last wiki doc
[ https://issues.apache.org/jira/browse/LUCENE-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510774#comment-13510774 ] Doron Cohen commented on LUCENE-4588: - In addition, there's a thread leak in 'forever' mode.
[jira] [Created] (LUCENE-4588) EnwikiContentSource silently swallows the last wiki doc
Doron Cohen created LUCENE-4588: --- Summary: EnwikiContentSource silently swallows the last wiki doc Key: LUCENE-4588 URL: https://issues.apache.org/jira/browse/LUCENE-4588 Project: Lucene - Core Issue Type: Bug Components: modules/benchmark Reporter: Doron Cohen Priority: Minor Last wiki doc is never returned
[jira] [Commented] (LUCENE-3464) Rename IndexReader.reopen to make it clear that reopen may not happen
[ https://issues.apache.org/jira/browse/LUCENE-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114656#comment-13114656 ] Doron Cohen commented on LUCENE-3464: - I liked reopen()... (but also like returning null in case there's nothing newer...) If the name is going to change, two additional names to consider:
* newest()
* newer()
For "newest()" I think current behavior of returning "this" makes sense when "this" is the newest. For "newer()" returning null in that case seems right. One problem I have with these names is that they both seem to hide the fact that things are going on down there, when it is required to open a new reader... > Rename IndexReader.reopen to make it clear that reopen may not happen > - > > Key: LUCENE-3464 > URL: https://issues.apache.org/jira/browse/LUCENE-3464 > Project: Lucene - Java > Issue Type: Bug >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 3.5, 4.0 > > > Spinoff from LUCENE-3454 where Shai noted this inconsistency. > IR.reopen sounds like an unconditional operation, which has trapped users in > the past into always closing the old reader instead of only closing it if the > returned reader is new. > I think this hidden maybe-ness is trappy and we should rename it > (maybeReopen? reopenIfNeeded?). > In addition, instead of returning "this" when the reopen didn't happen, I > think we should return null to enforce proper usage of the maybe-ness of this > API.
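The two return conventions discussed above can be contrasted with a small sketch. ReaderSketch is a hypothetical stand-in and not Lucene's IndexReader; the version parameter is likewise an assumption used only to decide whether anything newer exists.

```java
/** Hypothetical stand-in for an index reader; not Lucene's IndexReader. */
final class ReaderSketch {
  final long version;
  boolean closed;

  ReaderSketch(long version) {
    this.version = version;
  }

  /** "newer()" convention: a new reader if the index moved on, otherwise null. */
  ReaderSketch newer(long currentIndexVersion) {
    return currentIndexVersion > version ? new ReaderSketch(currentIndexVersion) : null;
  }

  /** "newest()" convention: returns this when it is already the newest. */
  ReaderSketch newest(long currentIndexVersion) {
    return currentIndexVersion > version ? new ReaderSketch(currentIndexVersion) : this;
  }

  void close() {
    closed = true;
  }
}

final class ReopenUsageSketch {
  /** Caller pattern for the null-returning variant: close the old reader only if it was replaced. */
  static ReaderSketch refresh(ReaderSketch current, long currentIndexVersion) {
    ReaderSketch candidate = current.newer(currentIndexVersion);
    if (candidate == null) {
      return current; // nothing newer, keep using the same reader
    }
    current.close();
    return candidate;
  }
}
```

With the null-returning variant the caller has to branch before closing anything, which is the "maybe-ness" the issue wants to make explicit; with the this-returning variant, code that always closes the old reader would close the reader it is about to keep using.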
[jira] [Commented] (LUCENE-3454) rename optimize to a less cool-sounding name
[ https://issues.apache.org/jira/browse/LUCENE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114617#comment-13114617 ] Doron Cohen commented on LUCENE-3454: - To me merge(num) doing nothing "because there are already no more than n segments" is as fine as close() doing nothing "because of already being closed" so +1 for merge(num). > rename optimize to a less cool-sounding name > > > Key: LUCENE-3454 > URL: https://issues.apache.org/jira/browse/LUCENE-3454 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 4.0 >Reporter: Robert Muir > > I think users see the name optimize and feel they must do this, because who > wants a suboptimal system? but this probably just results in wasted time and > resources. > maybe rename to collapseSegments or something?
[jira] [Resolved] (LUCENE-3457) Upgrade to commons-compress 1.2
[ https://issues.apache.org/jira/browse/LUCENE-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-3457. - Resolution: Fixed Fixed:
- 1175475 - trunk
- 1175528 - 3x
> Upgrade to commons-compress 1.2 > --- > > Key: LUCENE-3457 > URL: https://issues.apache.org/jira/browse/LUCENE-3457 > Project: Lucene - Java > Issue Type: Bug > Components: modules/benchmark >Reporter: Doron Cohen >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.5, 4.0 > > Attachments: LUCENE-3457.patch, test.out.gz > > > Commons Compress bug COMPRESS-127 was fixed in 1.2, so the workaround in > benchmark's StreamUtils is no longer required. Compress is also used in solr. > Replace with new jar in both benchmark and solr and get rid of that > workaround.
[jira] [Commented] (LUCENE-3457) Upgrade to commons-compress 1.2
[ https://issues.apache.org/jira/browse/LUCENE-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114302#comment-13114302 ] Doron Cohen commented on LUCENE-3457: - OK, great, thanks Robert. So this has nothing to do with the compress jar update. I'll commit shortly.
[jira] [Updated] (LUCENE-3457) Upgrade to commons-compress 1.2
[ https://issues.apache.org/jira/browse/LUCENE-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-3457: Attachment: test.out.gz
It still fails. This time, running 'clean test' from trunk, all lucene tests pass but some of the solr tests failed:
- org.apache.solr.handler.TestReplicationHandler [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 43.703 sec
- org.apache.solr.handler.component.DebugComponentTest [junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 1 sec
- org.apache.solr.handler.component.TermVectorComponentTest [junit] Tests run: 4, Failures: 1, Errors: 0, Time elapsed: 1.375 sec
- org.apache.solr.request.JSONWriterTest [junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 1.078 sec
- org.apache.solr.schema.BadIndexSchemaTest [junit] Tests run: 5, Failures: 1, Errors: 0, Time elapsed: 1.266 sec
- org.apache.solr.schema.RequiredFieldsTest [junit] Tests run: 3, Failures: 1, Errors: 0, Time elapsed: 1.422 sec
- org.apache.solr.search.QueryParsingTest [junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 0.641 sec
- org.apache.solr.search.SpatialFilterTest [junit] Tests run: 3, Failures: 1, Errors: 0, Time elapsed: 1.438 sec
- org.apache.solr.search.TestQueryTypes [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 0.953 sec
- org.apache.solr.servlet.CacheHeaderTest [junit] Tests run: 5, Failures: 1, Errors: 0, Time elapsed: 0.984 sec
- org.apache.solr.spelling.SpellCheckCollatorTest [junit] Tests run: 4, Failures: 1, Errors: 0, Time elapsed: 1.281 sec
- org.apache.solr.update.DocumentBuilderTest [junit] Tests run: 4, Failures: 1, Errors: 0, Time elapsed: 0.734 sec
- org.apache.solr.util.SolrPluginUtilsTest [junit] Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 0.766 sec
When run alone, TestReplicationHandler, for example, passes. Same for DebugComponentTest. I am not sure what is happening here. Attaching the test output in case someone wants to take a look.
> Upgrade to commons-compress 1.2 > --- > > Key: LUCENE-3457 > URL: https://issues.apache.org/jira/browse/LUCENE-3457 > Project: Lucene - Java > Issue Type: Bug > Components: modules/benchmark >Reporter: Doron Cohen >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.5, 4.0 > > Attachments: LUCENE-3457.patch, test.out.gz > > > Commons Compress bug COMPRESS-127 was fixed in 1.2, so the workaround in > benchmark's StreamUtils is no longer required. Compress is also used in solr. > Replace with new jar in both benchmark and solr and get rid of that > workaround. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (LUCENE-3457) Upgrade to commons-compress 1.2
[ https://issues.apache.org/jira/browse/LUCENE-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114219#comment-13114219 ] Doron Cohen edited comment on LUCENE-3457 at 9/25/11 11:44 AM: --- Thanks Chris, almost sure I did a clean, will try again. was (Author: doronc): Thanks Chriss, almost sure I did a clean, will try again. > Upgrade to commons-compress 1.2 > --- > > Key: LUCENE-3457 > URL: https://issues.apache.org/jira/browse/LUCENE-3457 > Project: Lucene - Java > Issue Type: Bug > Components: modules/benchmark >Reporter: Doron Cohen >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.5, 4.0 > > Attachments: LUCENE-3457.patch > > > Commons Compress bug COMPRESS-127 was fixed in 1.2, so the workaround in > benchmark's StreamUtils is no longer required. Compress is also used in solr. > Replace with new jar in both benchmark and solr and get rid of that > workaround. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3457) Upgrade to commons-compress 1.2
[ https://issues.apache.org/jira/browse/LUCENE-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114219#comment-13114219 ] Doron Cohen commented on LUCENE-3457: - Thanks Chriss, almost sure I did a clean, will try again. > Upgrade to commons-compress 1.2 > --- > > Key: LUCENE-3457 > URL: https://issues.apache.org/jira/browse/LUCENE-3457 > Project: Lucene - Java > Issue Type: Bug > Components: modules/benchmark >Reporter: Doron Cohen >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.5, 4.0 > > Attachments: LUCENE-3457.patch > > > Commons Compress bug COMPRESS-127 was fixed in 1.2, so the workaround in > benchmark's StreamUtils is no longer required. Compress is also used in solr. > Replace with new jar in both benchmark and solr and get rid of that > workaround. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3457) Upgrade to commons-compress 1.2
[ https://issues.apache.org/jira/browse/LUCENE-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114213#comment-13114213 ] Doron Cohen commented on LUCENE-3457: - Hmmm, this is strange. These are the tests that failed with compress-1.2 for 'ant clean test' under solr:
- org.apache.solr.handler.TestReplicationHandler [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 39.968 sec
- org.apache.solr.handler.component.DebugComponentTest [junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 1.219 sec
- org.apache.solr.handler.component.TermVectorComponentTest [junit] Tests run: 4, Failures: 1, Errors: 0, Time elapsed: 1 sec
- org.apache.solr.request.JSONWriterTest [junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 0.75 sec
- org.apache.solr.response.TestCSVResponseWriter [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.719 sec
- org.apache.solr.schema.BadIndexSchemaTest [junit] Tests run: 5, Failures: 1, Errors: 0, Time elapsed: 1.187 sec
- org.apache.solr.search.TestQueryUtils [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 1.14 sec
- org.apache.solr.search.similarities.TestBM25SimilarityFactory [junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 0.187 sec
- org.apache.solr.servlet.DirectSolrConnectionTest [junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 0.344 sec
- org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest [junit] Tests run: 4, Failures: 1, Errors: 0, Time elapsed: 3.984 sec
I put compress-1.1 back and they all passed. However, I then switched to compress-1.2 again and they all passed as well.
I now see that I am on r1174072, I'll update and try again > Upgrade to commons-compress 1.2 > --- > > Key: LUCENE-3457 > URL: https://issues.apache.org/jira/browse/LUCENE-3457 > Project: Lucene - Java > Issue Type: Bug > Components: modules/benchmark >Reporter: Doron Cohen >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.5, 4.0 > > Attachments: LUCENE-3457.patch > > > Commons Compress bug COMPRESS-127 was fixed in 1.2, so the workaround in > benchmark's StreamUtils is no longer required. Compress is also used in solr. > Replace with new jar in both benchmark and solr and get rid of that > workaround. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3457) Upgrade to commons-compress 1.2
[ https://issues.apache.org/jira/browse/LUCENE-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-3457: Attachment: LUCENE-3457.patch Attached a simple patch with the fix. After applying the patch, you also need to download commons-compress-1.2.jar and place it under module/benchmark/lib and under solr/contrib/extraction/lib. Currently several solr tests fail for me with this patch, probably unrelated to replacing the compress jar, since they pass when run alone (-Dtestcase). > Upgrade to commons-compress 1.2 > --- > > Key: LUCENE-3457 > URL: https://issues.apache.org/jira/browse/LUCENE-3457 > Project: Lucene - Java > Issue Type: Bug > Components: modules/benchmark >Reporter: Doron Cohen >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.5, 4.0 > > Attachments: LUCENE-3457.patch > > > Commons Compress bug COMPRESS-127 was fixed in 1.2, so the workaround in > benchmark's StreamUtils is no longer required. Compress is also used in solr. > Replace with new jar in both benchmark and solr and get rid of that > workaround. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3457) Upgrade to commons-compress 1.2
Upgrade to commons-compress 1.2 --- Key: LUCENE-3457 URL: https://issues.apache.org/jira/browse/LUCENE-3457 Project: Lucene - Java Issue Type: Bug Components: modules/benchmark Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor Fix For: 3.5, 4.0 Commons Compress bug COMPRESS-127 was fixed in 1.2, so the workaround in benchmark's StreamUtils is no longer required. Compress is also used in solr. Replace with new jar in both benchmark and solr and get rid of that workaround. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3215) SloppyPhraseScorer sometimes computes Infinite freq
[ https://issues.apache.org/jira/browse/LUCENE-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-3215. - Resolution: Fixed Fix Version/s: 4.0 3.5 Fixed - r1173961 - trunk - r1174002 - 3x Prior to committing, I compared the performance of sloppy phrase queries with and without repeats for large documents with many candidate matches. I did not see the anticipated speedup, but at least there was no degradation either. > SloppyPhraseScorer sometimes computes Infinite freq > --- > > Key: LUCENE-3215 > URL: https://issues.apache.org/jira/browse/LUCENE-3215 > Project: Lucene - Java > Issue Type: Bug >Reporter: Robert Muir >Assignee: Doron Cohen > Fix For: 3.5, 4.0 > > Attachments: LUCENE-3215.patch, LUCENE-3215.patch, LUCENE-3215.patch, > LUCENE-3215.patch, LUCENE-3215_test.patch, LUCENE-3215_test.patch > > > reported on user list: > http://www.lucidimagination.com/search/document/400cbc528ed63db9/score_of_infinity_on_dismax_query -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-3390: Attachment: LUCENE-3390-BitsInterface.patch Attached patch with a test that fails before this fix (otherwise patch same as previous). The test uses 4 collectors simultaneously, each with different missing values. > Incorrect sort by Numeric values for documents missing the sorting field > > > Key: LUCENE-3390 > URL: https://issues.apache.org/jira/browse/LUCENE-3390 > Project: Lucene - Java > Issue Type: Bug > Components: core/search >Affects Versions: 3.3 >Reporter: Gilad Barkai >Assignee: Doron Cohen >Priority: Minor > Labels: double, float, int, long, numeric, sort > Fix For: 3.4 > > Attachments: LUCENE-3390-BitsInterface.patch, > LUCENE-3390-BitsInterface.patch, LUCENE-3390-fix-like-trunk.patch, > LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, > LUCENE-3390-fix-like-trunk.patch, LUCENE-3390.patch, SortByDouble.java > > > While sorting results over a numeric field, documents which do not contain a > value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested > against Double, Float, Int & Long numeric fields ascending and descending > order). > This behavior is unexpected, as zero is "comparable" to the rest of the > values. A better solution would either be allowing the user to define such a > "non-value" default, or always bring those document results as the last ones. > Example scenario: > Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any > value. > Searching with MatchAllDocsQuery, with sort over that field in descending > order yields the docid results of 0, 2, 1. > Asking for the top 2 documents brings the document without any value as the > 2nd result - which seems as a bug? -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
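The example scenario in the issue description above can be reproduced without Lucene. The following is a minimal plain-Java sketch, not Lucene's comparator code: the class `MissingValueSortSketch`, the method `sortDescending` and the `missingValue` parameter are hypothetical names used only for illustration. It sorts three documents (3.5, -10, no value) in descending order, first with the missing value treated as 0, then with a caller-chosen value that pushes the valueless document last.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MissingValueSortSketch {

    // Returns doc ids sorted by value in descending order.
    // A null entry means "this document has no value"; missingValue is used instead.
    static List<Integer> sortDescending(Double[] valuesByDocId, double missingValue) {
        List<Integer> docIds = new ArrayList<>();
        for (int i = 0; i < valuesByDocId.length; i++) {
            docIds.add(i);
        }
        Comparator<Integer> byValue = Comparator.comparingDouble(
                id -> valuesByDocId[id] == null ? missingValue : valuesByDocId[id]);
        docIds.sort(byValue.reversed());
        return docIds;
    }

    public static void main(String[] args) {
        // doc 0 = 3.5, doc 1 = -10, doc 2 has no value
        Double[] values = {3.5, -10.0, null};

        // Missing treated as 0: the valueless doc lands between the other two.
        System.out.println(sortDescending(values, 0.0)); // [0, 2, 1]

        // Missing treated as the smallest possible value: the valueless doc comes last.
        System.out.println(sortDescending(values, Double.NEGATIVE_INFINITY)); // [0, 1, 2]
    }
}
```

The first call reproduces the 0, 2, 1 order reported in the issue; the second shows what a user-defined "non-value" default could look like for a descending sort.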
[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109454#comment-13109454 ] Doron Cohen commented on LUCENE-3390: - I wrote a small test that should fail with the bug Uwe fixed here and pass with the fix. For some reason it is still failing even with that fix. I tried this with the previous patch and will now try with the latest one, though I think it should pass with the previous one too. I'll give it another try. > Incorrect sort by Numeric values for documents missing the sorting field > > > Key: LUCENE-3390 > URL: https://issues.apache.org/jira/browse/LUCENE-3390 > Project: Lucene - Java > Issue Type: Bug > Components: core/search >Affects Versions: 3.3 >Reporter: Gilad Barkai >Assignee: Doron Cohen >Priority: Minor > Labels: double, float, int, long, numeric, sort > Fix For: 3.4 > > Attachments: LUCENE-3390-BitsInterface.patch, > LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, > LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, > LUCENE-3390.patch, SortByDouble.java > > > While sorting results over a numeric field, documents which do not contain a > value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested > against Double, Float, Int & Long numeric fields ascending and descending > order). > This behavior is unexpected, as zero is "comparable" to the rest of the > values. A better solution would either be allowing the user to define such a > "non-value" default, or always bring those document results as the last ones. > Example scenario: > Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any > value. > Searching with MatchAllDocsQuery, with sort over that field in descending > order yields the docid results of 0, 2, 1. > Asking for the top 2 documents brings the document without any value as the > 2nd result - which seems as a bug? -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108490#comment-13108490 ] Doron Cohen commented on LUCENE-3390: - Hi Uwe, thanks for catching this. I agree that this is a bug and needs to be fixed. Just to make sure that we agree on what the problem is, let me describe it again: in the current 3x code, in setNextReader() we extract the values from the cache, e.g. by {code}FieldCache.DEFAULT.getDoubles(reader, field, parser);{code} and, if a missing value was set, we iterate over the unvalued docs and set them to that missing value. However, this setting is done in the same array that was just obtained from the cache, so it is (1) inefficient, as it will happen again in the next sort on the same field, (2) incorrect, since two sorts on the *same* field with different missing values will collide, and (3) unsafe, as you indicated. I was very happy with the reuse of the cache for caching the missing values, so I would like to try to solve this within that "frame"... More later... > Incorrect sort by Numeric values for documents missing the sorting field > > > Key: LUCENE-3390 > URL: https://issues.apache.org/jira/browse/LUCENE-3390 > Project: Lucene - Java > Issue Type: Bug > Components: core/search >Affects Versions: 3.3 >Reporter: Gilad Barkai >Assignee: Doron Cohen >Priority: Minor > Labels: double, float, int, long, numeric, sort > Fix For: 3.4 > > Attachments: LUCENE-3390-fix-like-trunk.patch, > LUCENE-3390-fix-like-trunk.patch, LUCENE-3390-fix-like-trunk.patch, > LUCENE-3390.patch, SortByDouble.java > > > While sorting results over a numeric field, documents which do not contain a > value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested > against Double, Float, Int & Long numeric fields ascending and descending > order). > This behavior is unexpected, as zero is "comparable" to the rest of the > values. 
A better solution would either be allowing the user to define such a > "non-value" default, or always bring those document results as the last ones. > Example scenario: > Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any > value. > Searching with MatchAllDocsQuery, with sort over that field in descending > order yields the docid results of 0, 2, 1. > Asking for the top 2 documents brings the document without any value as the > 2nd result - which seems as a bug? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
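The collision described in the comment above (two sorts on the same field writing different missing values into one cached array) can be shown with a small stand-in. This is a sketch under stated assumptions, not Lucene code: `SharedCacheArraySketch`, `CACHE`, `cachedValues`, `fillInPlace` and `fillCopy` are hypothetical names, the map only imitates a per-field value cache, and copying the array is just one illustrative way to avoid the collision, not necessarily what the attached patches do.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedCacheArraySketch {

    // Hypothetical stand-in for a per-field value cache; NOT Lucene's FieldCache.
    static final Map<String, double[]> CACHE = new HashMap<>();

    // Returns the one shared array for the field. Doc 2 has no value, so it holds 0.0.
    static double[] cachedValues(String field) {
        return CACHE.computeIfAbsent(field, f -> new double[] {3.5, -10.0, 0.0});
    }

    // Unsafe variant: writes the missing value into the shared cached array.
    static double[] fillInPlace(String field, int[] docsWithoutValue, double missingValue) {
        double[] values = cachedValues(field);
        for (int doc : docsWithoutValue) {
            values[doc] = missingValue;
        }
        return values;
    }

    // Safer variant: fills a private copy and leaves the cached array untouched.
    static double[] fillCopy(String field, int[] docsWithoutValue, double missingValue) {
        double[] values = cachedValues(field).clone();
        for (int doc : docsWithoutValue) {
            values[doc] = missingValue;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] unvalued = {2};

        double[] sortA = fillInPlace("price", unvalued, 100.0);
        double[] sortB = fillInPlace("price", unvalued, -100.0);
        // Both sorts share one array, so sort A now sees sort B's missing value.
        System.out.println(sortA == sortB); // true
        System.out.println(sortA[2]);       // -100.0

        double[] sortC = fillCopy("weight", unvalued, 100.0);
        double[] sortD = fillCopy("weight", unvalued, -100.0);
        // Each sort keeps its own missing value.
        System.out.println(sortC[2]);       // 100.0
        System.out.println(sortD[2]);       // -100.0
    }
}
```

The copy avoids points (2) and (3) but not point (1): the fill is still repeated on every sort, which is why the comment prefers a solution that keeps the missing-value information cacheable.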
[jira] [Updated] (LUCENE-3215) SloppyPhraseScorer sometimes computes Infinite freq
[ https://issues.apache.org/jira/browse/LUCENE-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-3215: Attachment: LUCENE-3215.patch The previous patch still produces NaNs and infinite scores for queries with holes. The updated patch fixes this by also updating 'end' (before computing the new match length) for pp itself, not only for its repeats. I plan to commit this soon. > SloppyPhraseScorer sometimes computes Infinite freq > --- > > Key: LUCENE-3215 > URL: https://issues.apache.org/jira/browse/LUCENE-3215 > Project: Lucene - Java > Issue Type: Bug >Reporter: Robert Muir >Assignee: Doron Cohen > Attachments: LUCENE-3215.patch, LUCENE-3215.patch, LUCENE-3215.patch, > LUCENE-3215.patch, LUCENE-3215_test.patch, LUCENE-3215_test.patch > > > reported on user list: > http://www.lucidimagination.com/search/document/400cbc528ed63db9/score_of_infinity_on_dismax_query -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3215) SloppyPhraseScorer sometimes computes Infinite freq
[ https://issues.apache.org/jira/browse/LUCENE-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-3215: Attachment: LUCENE-3215.patch Updated patch for current trunk r1172055. > SloppyPhraseScorer sometimes computes Infinite freq > --- > > Key: LUCENE-3215 > URL: https://issues.apache.org/jira/browse/LUCENE-3215 > Project: Lucene - Java > Issue Type: Bug >Reporter: Robert Muir >Assignee: Doron Cohen > Attachments: LUCENE-3215.patch, LUCENE-3215.patch, LUCENE-3215.patch, > LUCENE-3215_test.patch, LUCENE-3215_test.patch > > > reported on user list: > http://www.lucidimagination.com/search/document/400cbc528ed63db9/score_of_infinity_on_dismax_query -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3215) SloppyPhraseScorer sometimes computes Infinite freq
[ https://issues.apache.org/jira/browse/LUCENE-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-3215: Attachment: LUCENE-3215.patch Attached patch is based on r1166541 - before recent changes to scorers. Will merge with recent changes tomorrow or so. All tests pass. I believe that sloppy scoring performance should improve with this change but did not check this. > SloppyPhraseScorer sometimes computes Infinite freq > --- > > Key: LUCENE-3215 > URL: https://issues.apache.org/jira/browse/LUCENE-3215 > Project: Lucene - Java > Issue Type: Bug >Reporter: Robert Muir >Assignee: Doron Cohen > Attachments: LUCENE-3215.patch, LUCENE-3215.patch, > LUCENE-3215_test.patch, LUCENE-3215_test.patch > > > reported on user list: > http://www.lucidimagination.com/search/document/400cbc528ed63db9/score_of_infinity_on_dismax_query -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (LUCENE-3215) SloppyPhraseScorer sometimes computes Infinite freq
[ https://issues.apache.org/jira/browse/LUCENE-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107209#comment-13107209 ] Doron Cohen edited comment on LUCENE-3215 at 9/17/11 6:56 PM: -- OK, I think I have a fix for this. While looking at it, I realized that PhraseScorer (which used to be the base of both the exact and sloppy phrase scorers but is now the base of only the sloppy phrase scorer) is way too complicated and inefficient. All those sort calls after each matching doc can be avoided. So I am modifying PhraseScorer to not have a phrase queue at all, just the sorted linked list, which is always kept sorted by advancing last beyond first. Last is renamed to 'min' and first is renamed to 'max'. Making the list cyclic allows more efficient manipulation of it. With this, SloppyPhraseScorer is modified to maintain its own phrase queue. The queue size is set at the first candidate document. In order to handle repetitions (the same term at different query offsets) it will contain only some of the pps: those that either have no repetitions, or are the first (lower query offset) in a repeating group. A linked list of repeating pps was added, so PhrasePositions has a new member: nextRepeating. Detection of repeating pps and creation of that list is done once per scorer, at the first candidate doc. To solve the bugs reported here, in addition to the initialization of 'end' as explained in the previous comment, advanceRepeatingPPs now also updates two values: - end, in case one of the repeating pps is far ahead (larger) - the position of the first pp in a repeating list (the one that is in the queue), in case the repeating pp is far behind (smaller). This can happen when there are holes in the query, as position = tpPos - offset. It fixes the problem of wrongly negative distances, which caused this bug. 
It is tricky: it relies on the fact that PhrasePositions.nextPosition() ignores pp.position and just calls positions.nextPosition(). 
But it is correct, as the modified position is used to replace pp in the queue. Last, I think that the test added with holes had one wrong assert: It added four docs: - drug drug - drug druggy drug - drug druggy druggy drug - drug druggy drug druggy drug defined this query (number is the offset): - drug(1) drug(3) and expected that with slop=1 the first doc would not be found. I think it should be found, as the slop operates in both directions. So modified the query to: drug(1) drug(3) Patch to follow. was (Author: doronc): OK I think I have a fix for this. While looking at it, I realized that PhraseScorer (the one that used to base both Exact&Sloppy phrase scorers but now is the base of only sloppy phrase scorer) is way too complicated and inefficient. All those sort calls after each matching doc can be avoided. So I am modifying PhraseScorer to not have a phrase-queue at all - just the sorted linked list, which is always kept sorted by advancing last beyond first. Last is renamed to 'min' and first is renamed to 'max'. Making the list cyclic allows more efficient manipulation of it. With this, SloppyPhraseScorer is modified to maintain its own phrase queue. The queue size is set at the first candidate document. In order to handle repetitions (Same term in different query offsets) it will contain only some of the pps: those that either have no repetitions, or are the first (lower query offset) in a repeating group. A linked list of repeating pps was added: so PhrasePositions has a new member: nextRepeating. Detection of repeating pps and creation of that list is done once per scorer: at the first candidate doc. For solving the bugs reported here, in addition to the initiation of 'end' as explained in previous comment, advanceRepeatingPPs now also update two values: - end, in case one of the repeating pps is far ahead (larger) - position of the first pp in a repeating list (the one that is in the queue - in case the repeating pp is far behind (smaller). 
This can happen when there are holes in the query, as position = tpPOs - offset. It fixes the problem of false negative distances which caused this bug. It is tricky: relies on that PhrasePositions.nextPosition() ignores pp.position and just call positions.nextPosition(). But it is correct, as the modified position is used to replace pp in the queue. Last, I think that the test added with holes had one wrong assert: It added four docs: - drug drug - drug druggy drug - drug druggy druggy drug - drug druggy drug druggy drug defined this query (number is the offset): - drug(1) drug(3) and expected that with slop=1 the first doc would not be found. I think it should be found, as the slop operates in both directions. So modified the query to: drug(1) drug(3) Patch to follow. > SloppyPhraseScorer sometimes computes Infinite freq > --- > >
[jira] [Commented] (LUCENE-3215) SloppyPhraseScorer sometimes computes Infinite freq
[ https://issues.apache.org/jira/browse/LUCENE-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107209#comment-13107209 ] Doron Cohen commented on LUCENE-3215: - OK, I think I have a fix for this. While looking at it, I realized that PhraseScorer (which used to be the base of both the exact and sloppy phrase scorers but is now the base of only the sloppy phrase scorer) is way too complicated and inefficient. All those sort calls after each matching doc can be avoided. So I am modifying PhraseScorer to not have a phrase queue at all, just the sorted linked list, which is always kept sorted by advancing last beyond first. Last is renamed to 'min' and first is renamed to 'max'. Making the list cyclic allows more efficient manipulation of it. With this, SloppyPhraseScorer is modified to maintain its own phrase queue. The queue size is set at the first candidate document. In order to handle repetitions (the same term at different query offsets) it will contain only some of the pps: those that either have no repetitions, or are the first (lower query offset) in a repeating group. A linked list of repeating pps was added, so PhrasePositions has a new member: nextRepeating. Detection of repeating pps and creation of that list is done once per scorer, at the first candidate doc. To solve the bugs reported here, in addition to the initialization of 'end' as explained in the previous comment, advanceRepeatingPPs now also updates two values: - end, in case one of the repeating pps is far ahead (larger) - the position of the first pp in a repeating list (the one that is in the queue), in case the repeating pp is far behind (smaller). This can happen when there are holes in the query, as position = tpPos - offset. It fixes the problem of wrongly negative distances, which caused this bug. It is tricky: it relies on the fact that PhrasePositions.nextPosition() ignores pp.position and just calls positions.nextPosition(). 
But it is correct, as the modified position is used to replace pp in the queue. Last, I think that the test added with holes had one wrong assert: It added four docs: - drug drug - drug druggy drug - drug druggy druggy drug - drug druggy drug druggy drug defined this query (number is the offset): - drug(1) drug(3) and expected that with slop=1 the first doc would not be found. I think it should be found, as the slop operates in both directions. So modified the query to: drug(1) drug(3) Patch to follow. > SloppyPhraseScorer sometimes computes Infinite freq > --- > > Key: LUCENE-3215 > URL: https://issues.apache.org/jira/browse/LUCENE-3215 > Project: Lucene - Java > Issue Type: Bug >Reporter: Robert Muir >Assignee: Doron Cohen > Attachments: LUCENE-3215.patch, LUCENE-3215_test.patch, > LUCENE-3215_test.patch > > > reported on user list: > http://www.lucidimagination.com/search/document/400cbc528ed63db9/score_of_infinity_on_dismax_query -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3215) SloppyPhraseScorer sometimes computes Infinite freq
[ https://issues.apache.org/jira/browse/LUCENE-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100182#comment-13100182 ] Doron Cohen commented on LUCENE-3215: - An update on this... This is not related to LUCENE-3142 - the latter was fixed but this one still fails. The fix in the patch, which takes the 'abs' of the distance, indeed avoids the infinite score problem, but I was not 100% comfortable with it - how can the distance be non-positive? Digging into it shows a wrong assumption in SloppyPhraseScorer: {code} private int initPhrasePositions() throws IOException { int end = 0; {code} The initial value of end assumes that all positions will be nonnegative. But this is wrong, as the PP position is computed as {code} position = postings.nextPosition() - offset {code} So, whenever the query term appears in the doc at a position smaller than its offset in the query, the computed position is negative. The correct initialization for end is therefore: {code} private int initPhrasePositions() throws IOException { int end = Integer.MIN_VALUE; {code} You would expect this bug to have surfaced sooner... Anyhow, of the 3 tests that Robert added, this only resolves testInfiniteFreq1(); the other two tests still fail, investigating... > SloppyPhraseScorer sometimes computes Infinite freq > --- > > Key: LUCENE-3215 > URL: https://issues.apache.org/jira/browse/LUCENE-3215 > Project: Lucene - Java > Issue Type: Bug >Reporter: Robert Muir >Assignee: Doron Cohen > Attachments: LUCENE-3215.patch, LUCENE-3215_test.patch, > LUCENE-3215_test.patch > > > reported on user list: > http://www.lucidimagination.com/search/document/400cbc528ed63db9/score_of_infinity_on_dismax_query -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
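The effect of the initial value of 'end' described in the comment above can be seen in isolation. The following is a minimal sketch in plain Java, not the SloppyPhraseScorer code: `MaxPositionInitSketch` and `maxPosition` are hypothetical names, and the positions are made-up values computed as (position in doc - offset in query), as in the comment.

```java
public class MaxPositionInitSketch {

    // Returns the running maximum over positions when it starts at initialEnd.
    static int maxPosition(int[] positions, int initialEnd) {
        int end = initialEnd;
        for (int position : positions) {
            if (position > end) {
                end = position;
            }
        }
        return end;
    }

    public static void main(String[] args) {
        // Two hypothetical phrase positions, both negative:
        // term at doc position 0 with query offset 1, term at doc position 1 with query offset 3.
        int[] positions = {0 - 1, 1 - 3};

        // Starting at 0 yields a maximum that is not any of the actual positions.
        System.out.println(maxPosition(positions, 0)); // 0

        // Starting at Integer.MIN_VALUE yields the true maximum.
        System.out.println(maxPosition(positions, Integer.MIN_VALUE)); // -1
    }
}
```

With the start value 0, 'end' overstates the largest position, so any match length derived from it is wrong whenever all computed positions are negative.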
[jira] [Assigned] (LUCENE-3215) SloppyPhraseScorer sometimes computes Infinite freq
[ https://issues.apache.org/jira/browse/LUCENE-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen reassigned LUCENE-3215: --- Assignee: Doron Cohen > SloppyPhraseScorer sometimes computes Infinite freq > --- > > Key: LUCENE-3215 > URL: https://issues.apache.org/jira/browse/LUCENE-3215 > Project: Lucene - Java > Issue Type: Bug >Reporter: Robert Muir >Assignee: Doron Cohen > Attachments: LUCENE-3215.patch, LUCENE-3215_test.patch, > LUCENE-3215_test.patch > > > reported on user list: > http://www.lucidimagination.com/search/document/400cbc528ed63db9/score_of_infinity_on_dismax_query -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3412) SloppyPhraseScorer returns non-deterministic results for queries with many repeats
[ https://issues.apache.org/jira/browse/LUCENE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-3412. - Resolution: Fixed Fix Version/s: 4.0 3.5 Lucene Fields: (was: [New]) Fix committed: - r1166541 - trunk - r1166563 - 3x (fix not included in 3.4 RC, therefore marked as 3.5 above) > SloppyPhraseScorer returns non-deterministic results for queries with many > repeats > -- > > Key: LUCENE-3412 > URL: https://issues.apache.org/jira/browse/LUCENE-3412 > Project: Lucene - Java > Issue Type: Bug > Components: core/search >Affects Versions: 3.1, 3.2, 3.3, 4.0 >Reporter: Michael Ryan >Assignee: Doron Cohen > Fix For: 3.5, 4.0 > > Attachments: LUCENE-3412.patch, LUCENE-3412.patch > > > Proximity queries with many repeats (four or more, based on my testing) > return non-deterministic results. I run the same query multiple times with > the same data set and get different results. > So far I've reproduced this with Solr 1.4.1, 3.1, 3.2, 3.3, and latest 4.0 > trunk. > Steps to reproduce (using the Solr example): > 1) In solrconfig.xml, set queryResultCache size to 0. > 2) Add some documents with text "dog dog dog" and "dog dog dog dog". > http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E1%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%3C/field%3E%3C/doc%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E2%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%20dog%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true > 3) Do a "dog dog dog dog"~1 query. > http://localhost:8983/solr/select?q=%22dog%20dog%20dog%20dog%22~1 > 4) Repeat step 3 many times. > Expected results: The document with id 2 should be returned. > Actual results: The document with id 2 is always returned. The document with > id 1 is sometimes returned. > Different proximity values show the same bug - "dog dog dog dog"~5, "dog dog > dog dog"~100, etc show the same behavior. 
> So far I've traced it down to the "repeats" array in > SloppyPhraseScorer.initPhrasePositions() - depending on the order of the > elements in this array, the document may or may not match. I think the > HashSet may be to blame, but I'm not sure - that at least seems to be where > the non-determinism is coming from. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3412) SloppyPhraseScorer returns non-deterministic results for queries with many repeats
[ https://issues.apache.org/jira/browse/LUCENE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100098#comment-13100098 ] Doron Cohen commented on LUCENE-3412: - Thanks Michael for verifying this, I'll go ahead and commit. > SloppyPhraseScorer returns non-deterministic results for queries with many > repeats > -- > > Key: LUCENE-3412 > URL: https://issues.apache.org/jira/browse/LUCENE-3412 > Project: Lucene - Java > Issue Type: Bug > Components: core/search >Affects Versions: 3.1, 3.2, 3.3, 4.0 >Reporter: Michael Ryan >Assignee: Doron Cohen > Attachments: LUCENE-3412.patch, LUCENE-3412.patch > > > Proximity queries with many repeats (four or more, based on my testing) > return non-deterministic results. I run the same query multiple times with > the same data set and get different results. > So far I've reproduced this with Solr 1.4.1, 3.1, 3.2, 3.3, and latest 4.0 > trunk. > Steps to reproduce (using the Solr example): > 1) In solrconfig.xml, set queryResultCache size to 0. > 2) Add some documents with text "dog dog dog" and "dog dog dog dog". > http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E1%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%3C/field%3E%3C/doc%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E2%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%20dog%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true > 3) Do a "dog dog dog dog"~1 query. > http://localhost:8983/solr/select?q=%22dog%20dog%20dog%20dog%22~1 > 4) Repeat step 3 many times. > Expected results: The document with id 2 should be returned. > Actual results: The document with id 2 is always returned. The document with > id 1 is sometimes returned. > Different proximity values show the same bug - "dog dog dog dog"~5, "dog dog > dog dog"~100, etc show the same behavior. 
> So far I've traced it down to the "repeats" array in > SloppyPhraseScorer.initPhrasePositions() - depending on the order of the > elements in this array, the document may or may not match. I think the > HashSet may be to blame, but I'm not sure - that at least seems to be where > the non-determinism is coming from. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3412) SloppyPhraseScorer returns non-deterministic results for queries with many repeats
[ https://issues.apache.org/jira/browse/LUCENE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-3412: Attachment: LUCENE-3412.patch Attached a patch with a fix for this bug. The fix is rather simple: just process PPs in offset order. That is, when avoiding conflicts (a conflict means that more than one query PP lands on the same doc TP), make sure to handle PPs in a specific order: from first in query to last in query. This is crucial because the check for conflicts returns the PP with the greater offset, and that one is advanced. It was pretty quick to fix this, but it took longer to justify the fix. I added some explanations in the code so that next time the justification will be faster :) and also renamed termPositionsDiffer() to termPositionsConflict(), which more accurately describes the logic of that method. Now I need to see whether this fix is also related to LUCENE-3215. > SloppyPhraseScorer returns non-deterministic results for queries with many > repeats > -- > > Key: LUCENE-3412 > URL: https://issues.apache.org/jira/browse/LUCENE-3412 > Project: Lucene - Java > Issue Type: Bug > Components: core/search >Affects Versions: 3.1, 3.2, 3.3, 4.0 >Reporter: Michael Ryan >Assignee: Doron Cohen > Attachments: LUCENE-3412.patch, LUCENE-3412.patch > > > Proximity queries with many repeats (four or more, based on my testing) > return non-deterministic results. I run the same query multiple times with > the same data set and get different results. > So far I've reproduced this with Solr 1.4.1, 3.1, 3.2, 3.3, and latest 4.0 > trunk. > Steps to reproduce (using the Solr example): > 1) In solrconfig.xml, set queryResultCache size to 0. > 2) Add some documents with text "dog dog dog" and "dog dog dog dog". 
> http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E1%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%3C/field%3E%3C/doc%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E2%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%20dog%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true > 3) Do a "dog dog dog dog"~1 query. > http://localhost:8983/solr/select?q=%22dog%20dog%20dog%20dog%22~1 > 4) Repeat step 3 many times. > Expected results: The document with id 2 should be returned. > Actual results: The document with id 2 is always returned. The document with > id 1 is sometimes returned. > Different proximity values show the same bug - "dog dog dog dog"~5, "dog dog > dog dog"~100, etc show the same behavior. > So far I've traced it down to the "repeats" array in > SloppyPhraseScorer.initPhrasePositions() - depending on the order of the > elements in this array, the document may or may not match. I think the > HashSet may be to blame, but I'm not sure - that at least seems to be where > the non-determinism is coming from. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3412) SloppyPhraseScorer returns non-deterministic results for queries with many repeats
[ https://issues.apache.org/jira/browse/LUCENE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-3412: Attachment: LUCENE-3412.patch I am able to see this inconsistent behavior! Attached patch contains a test that fails on this. The test currently prints the trial number, and the first loop always passes in all 30 trials (expected), while the second loop always fails (for me) but is inconsistent about when it fails. Sometimes it fails on the first iteration; other times it fails on the 3rd, 9th, etc. Quite peculiar... investigating... > SloppyPhraseScorer returns non-deterministic results for queries with many > repeats > -- > > Key: LUCENE-3412 > URL: https://issues.apache.org/jira/browse/LUCENE-3412 > Project: Lucene - Java > Issue Type: Bug > Components: core/search >Affects Versions: 3.1, 3.2, 3.3, 4.0 >Reporter: Michael Ryan >Assignee: Doron Cohen > Attachments: LUCENE-3412.patch > > > Proximity queries with many repeats (four or more, based on my testing) > return non-deterministic results. I run the same query multiple times with > the same data set and get different results. > So far I've reproduced this with Solr 1.4.1, 3.1, 3.2, 3.3, and latest 4.0 > trunk. > Steps to reproduce (using the Solr example): > 1) In solrconfig.xml, set queryResultCache size to 0. > 2) Add some documents with text "dog dog dog" and "dog dog dog dog". > http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E1%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%3C/field%3E%3C/doc%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E2%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%20dog%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true > 3) Do a "dog dog dog dog"~1 query. > http://localhost:8983/solr/select?q=%22dog%20dog%20dog%20dog%22~1 > 4) Repeat step 3 many times. > Expected results: The document with id 2 should be returned. > Actual results: The document with id 2 is always returned. 
The document with > id 1 is sometimes returned. > Different proximity values show the same bug - "dog dog dog dog"~5, "dog dog > dog dog"~100, etc show the same behavior. > So far I've traced it down to the "repeats" array in > SloppyPhraseScorer.initPhrasePositions() - depending on the order of the > elements in this array, the document may or may not match. I think the > HashSet may be to blame, but I'm not sure - that at least seems to be where > the non-determinism is coming from. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
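The order dependence discussed in this issue (a HashSet feeding the "repeats" array, and a fix that processes PPs in offset order) can be sketched in plain Java. This is only an illustration under assumptions, not the committed patch: OffsetOrderSketch, its nested PP class and inOffsetOrder() are hypothetical names, and PP here carries nothing but a query offset.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class OffsetOrderSketch {
    // Hypothetical stand-in for a phrase position; only the query offset matters here.
    // It does not override hashCode(), so its place in a HashSet depends on the
    // identity hash and the set's iteration order is not guaranteed to be stable.
    static final class PP {
        final int offset;
        PP(int offset) { this.offset = offset; }
    }

    // Copy the set into a list sorted by query offset, so later processing
    // no longer depends on the set's iteration order.
    static List<PP> inOffsetOrder(Set<PP> repeats) {
        List<PP> ordered = new ArrayList<>(repeats);
        ordered.sort(Comparator.comparingInt((PP pp) -> pp.offset));
        return ordered;
    }

    public static void main(String[] args) {
        Set<PP> repeats = new HashSet<>();
        for (int offset : new int[] {3, 0, 2, 1}) {
            repeats.add(new PP(offset));
        }
        // Always visits offsets 0, 1, 2, 3, whatever order the set iterates in.
        for (PP pp : inOffsetOrder(repeats)) {
            System.out.println(pp.offset);
        }
    }
}
```

Sorting once up front keeps the set for membership checks while making every later pass over the repeats deterministic.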
[jira] [Assigned] (LUCENE-3412) SloppyPhraseScorer returns non-deterministic results for queries with many repeats
[ https://issues.apache.org/jira/browse/LUCENE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen reassigned LUCENE-3412: --- Assignee: Doron Cohen > SloppyPhraseScorer returns non-deterministic results for queries with many > repeats > -- > > Key: LUCENE-3412 > URL: https://issues.apache.org/jira/browse/LUCENE-3412 > Project: Lucene - Java > Issue Type: Bug > Components: core/search >Affects Versions: 3.1, 3.2, 3.3, 4.0 >Reporter: Michael Ryan >Assignee: Doron Cohen > > Proximity queries with many repeats (four or more, based on my testing) > return non-deterministic results. I run the same query multiple times with > the same data set and get different results. > So far I've reproduced this with Solr 1.4.1, 3.1, 3.2, 3.3, and latest 4.0 > trunk. > Steps to reproduce (using the Solr example): > 1) In solrconfig.xml, set queryResultCache size to 0. > 2) Add some documents with text "dog dog dog" and "dog dog dog dog". > http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E1%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%3C/field%3E%3C/doc%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E2%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%20dog%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true > 3) Do a "dog dog dog dog"~1 query. > http://localhost:8983/solr/select?q=%22dog%20dog%20dog%20dog%22~1 > 4) Repeat step 3 many times. > Expected results: The document with id 2 should be returned. > Actual results: The document with id 2 is always returned. The document with > id 1 is sometimes returned. > Different proximity values show the same bug - "dog dog dog dog"~5, "dog dog > dog dog"~100, etc show the same behavior. > So far I've traced it down to the "repeats" array in > SloppyPhraseScorer.initPhrasePositions() - depending on the order of the > elements in this array, the document may or may not match. 
I think the > HashSet may be to blame, but I'm not sure - that at least seems to be where > the non-determinism is coming from. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-3390. - Resolution: Fixed Fix Version/s: 3.4 Lucene Fields: [Patch Available] (was: [New]) Fixed in 3.x r1164794. Thanks Gilad! > Incorrect sort by Numeric values for documents missing the sorting field > > > Key: LUCENE-3390 > URL: https://issues.apache.org/jira/browse/LUCENE-3390 > Project: Lucene - Java > Issue Type: Bug > Components: core/search >Affects Versions: 3.3 >Reporter: Gilad Barkai >Assignee: Doron Cohen >Priority: Minor > Labels: double, float, int, long, numeric, sort > Fix For: 3.4 > > Attachments: LUCENE-3390.patch, SortByDouble.java > > > While sorting results over a numeric field, documents which do not contain a > value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested > against Double, Float, Int & Long numeric fields ascending and descending > order). > This behavior is unexpected, as zero is "comparable" to the rest of the > values. A better solution would either be allowing the user to define such a > "non-value" default, or always bring those document results as the last ones. > Example scenario: > Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any > value. > Searching with MatchAllDocsQuery, with sort over that field in descending > order yields the docid results of 0, 2, 1. > Asking for the top 2 documents brings the document without any value as the > 2nd result - which seems as a bug? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen reassigned LUCENE-3390: --- Assignee: Doron Cohen > Incorrect sort by Numeric values for documents missing the sorting field > > > Key: LUCENE-3390 > URL: https://issues.apache.org/jira/browse/LUCENE-3390 > Project: Lucene - Java > Issue Type: Bug > Components: core/search >Affects Versions: 3.3 >Reporter: Gilad Barkai >Assignee: Doron Cohen >Priority: Minor > Labels: double, float, int, long, numeric, sort > Attachments: LUCENE-3390.patch, SortByDouble.java > > > While sorting results over a numeric field, documents which do not contain a > value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested > against Double, Float, Int & Long numeric fields ascending and descending > order). > This behavior is unexpected, as zero is "comparable" to the rest of the > values. A better solution would either be allowing the user to define such a > "non-value" default, or always bring those document results as the last ones. > Example scenario: > Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any > value. > Searching with MatchAllDocsQuery, with sort over that field in descending > order yields the docid results of 0, 2, 1. > Asking for the top 2 documents brings the document without any value as the > 2nd result - which seems as a bug? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-3390: Attachment: LUCENE-3390.patch Attached a patch fixing this bug. TestSort was enhanced to test the new setMissingValue() method - actually merging the test from trunk r1002460 (LUCENE-2671). All search tests passed (running the rest now...) > Incorrect sort by Numeric values for documents missing the sorting field > > > Key: LUCENE-3390 > URL: https://issues.apache.org/jira/browse/LUCENE-3390 > Project: Lucene - Java > Issue Type: Bug > Components: core/search >Affects Versions: 3.3 >Reporter: Gilad Barkai >Priority: Minor > Labels: double, float, int, long, numeric, sort > Attachments: LUCENE-3390.patch, SortByDouble.java > > > While sorting results over a numeric field, documents which do not contain a > value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested > against Double, Float, Int & Long numeric fields ascending and descending > order). > This behavior is unexpected, as zero is "comparable" to the rest of the > values. A better solution would either be allowing the user to define such a > "non-value" default, or always bring those document results as the last ones. > Example scenario: > Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any > value. > Searching with MatchAllDocsQuery, with sort over that field in descending > order yields the docid results of 0, 2, 1. > Asking for the top 2 documents brings the document without any value as the > 2nd result - which seems as a bug? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
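The example scenario quoted in this issue (values 3.5, -10 and one document with no value, sorted in descending order) can be reproduced with a small plain-Java sketch. It does not use the Lucene API: MissingValueSortSketch and sortDescending() are hypothetical names, and the missingValue parameter only mimics the idea of an application-defined default for documents that lack the field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class MissingValueSortSketch {
    // Returns doc ids sorted by value, descending. A null entry means the
    // document has no value; it is compared as if it had missingValue.
    static List<Integer> sortDescending(Double[] values, double missingValue) {
        List<Integer> docIds = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            docIds.add(i);
        }
        Comparator<Integer> byValue = Comparator.comparingDouble(
                (Integer doc) -> values[doc] != null ? values[doc] : missingValue);
        docIds.sort(byValue.reversed());
        return docIds;
    }

    public static void main(String[] args) {
        Double[] values = {3.5, -10.0, null}; // doc 2 has no value
        // Missing treated as 0: the valueless doc lands in the middle.
        System.out.println(sortDescending(values, 0.0));                      // prints [0, 2, 1]
        // Missing treated as the smallest possible value: it sorts last.
        System.out.println(sortDescending(values, Double.NEGATIVE_INFINITY)); // prints [0, 1, 2]
    }
}
```

The first call reproduces the reported order 0, 2, 1; the second shows how a caller-chosen default pushes the valueless document to the end of a descending sort.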
[jira] [Commented] (LUCENE-3390) Incorrect sort by Numeric values for documents missing the sorting field
[ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095409#comment-13095409 ] Doron Cohen commented on LUCENE-3390: - I think it may be useful to solve this also in 3x - without the cached-array-creators of the trunk, but with a similar concept - i.e. an additional cache "type" will cache which docs are missing a value for a given field, and will allow using the default value that apps assign by calling setMissingValue(), as in trunk. Gilad and I looked at this, will post a patch shortly for review... > Incorrect sort by Numeric values for documents missing the sorting field > > > Key: LUCENE-3390 > URL: https://issues.apache.org/jira/browse/LUCENE-3390 > Project: Lucene - Java > Issue Type: Bug > Components: core/search >Affects Versions: 3.3 >Reporter: Gilad Barkai >Priority: Minor > Labels: double, float, int, long, numeric, sort > Attachments: SortByDouble.java > > > While sorting results over a numeric field, documents which do not contain a > value for the sorting field seem to get 0 (ZERO) value in the sort. (Tested > against Double, Float, Int & Long numeric fields ascending and descending > order). > This behavior is unexpected, as zero is "comparable" to the rest of the > values. A better solution would either be allowing the user to define such a > "non-value" default, or always bring those document results as the last ones. > Example scenario: > Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any > value. > Searching with MatchAllDocsQuery, with sort over that field in descending > order yields the docid results of 0, 2, 1. > Asking for the top 2 documents brings the document without any value as the > 2nd result - which seems as a bug? -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3142) benchmark/stats package is obsolete and unused - remove it
[ https://issues.apache.org/jira/browse/LUCENE-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-3142. - Resolution: Fixed r1141465: trunk r1141468: 3x > benchmark/stats package is obsolete and unused - remove it > -- > > Key: LUCENE-3142 > URL: https://issues.apache.org/jira/browse/LUCENE-3142 > Project: Lucene - Java > Issue Type: Bug > Components: modules/benchmark >Reporter: Doron Cohen >Assignee: Doron Cohen >Priority: Minor > > This seems like a leftover from the original benchmark implementation and can > thus be removed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3153) Adding field w/ norms should fail if same field was added w/o norms already
[ https://issues.apache.org/jira/browse/LUCENE-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041461#comment-13041461 ] Doron Cohen commented on LUCENE-3153: - I was not clear enough. I meant that, when checking the consistency of the requested NORMS state, relying only on committed data makes the handling of add/update requests best effort, while the handling at commit is complete. So, for this example: * Index does not contain field F * doc1 is added with F set to NO NORMS * doc2 is added with F set to WITH NORMS I was not sure about the ability to tell that F in doc2 is inconsistent, because of relying on committed data, and, perhaps, especially with DWPT. At commit, it is definitely possible to check this. Similarly, this scenario has the same problem: * Index contains (committed) field F WITH NORMS * doc1 is added with F set to NO NORMS * doc2 is added with F set to WITH NORMS Again, F in doc2, while consistent with F as committed in the index, is inconsistent with the previously added F in doc1. In this situation, throwing the exception due to inconsistencies might have to come late in some scenarios (at commit) and would hence be unacceptable IMO. At the least, such a behavior should be specifically requested by the application, e.g. by setting a STRICT_NORMS mode or something like that in iwcfg. I am not convinced going that far is justified. > Adding field w/ norms should fail if same field was added w/o norms already > --- > > Key: LUCENE-3153 > URL: https://issues.apache.org/jira/browse/LUCENE-3153 > Project: Lucene - Java > Issue Type: Bug > Components: core/index >Reporter: Shai Erera > Fix For: 4.0 > > > A spinoff from LUCENE-3146. Consider the following two scenarios, according > to how 4.0 currently works: > * Field "a" is added w/ norms. Sometime later field "a" is added to a > document w/o norms -- norms are disabled for field "a", for all docs. > * Field "a" is added w/o norms - norms are disabled for field "a". 
Sometime > later field "a" is added to a document w/ norms -- app thinks norms were > added, while in fact they are dropped. > This is a bug and case #2 should fail on add/updateDocument - app should know > norms were not added. While case #1 isn't great either, it's the only way an > app can choose to disable norms for field "a", after instances of it already > contain norms, so we should support that scenario. > In order to detect that early, we should track norms info in .fnx, as Mike > describes at LUCENE-3146. Since this changes the index format, we should also > update the "file format" page after we do it. > Not sure what's the deal w/ 3.x indexes that are read by 4.0 code. Initially > they won't have .fnx file, so no central norms information exist to detect > the cases I've described above. Over time, as segments are merged, .fnx will > include information from more and more segments, but there's always a chance > few segments will still contain the norms for field "a". I'm not very > familiar w/ that part of the code, but I think that: > * If .fnx says "no norms for field a", the we ignore any norms information > that may or may not exist in segments. > * If .fnx says "norms for field a", then we need to make up some norms values > for (old) segments w/ no norms? We need to make up values during segment > merge and search? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
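The two cases in the issue description, and the question in the comment of whether uncommitted adds can be checked, can be made concrete with a hypothetical in-memory tracker. This is a sketch under assumptions, not Lucene's IndexWriter or the .fnx format: NormsConsistencySketch, addField() and omitsNorms() are invented for illustration, the map stands in for whatever per-field state the writer would keep, and a writer with per-thread document writers would additionally need that state to be shared safely across threads.

```java
import java.util.HashMap;
import java.util.Map;

final class NormsConsistencySketch {
    // field name -> true once the field has been added without norms
    // (covers both committed and not-yet-committed documents in this sketch)
    private final Map<String, Boolean> omitNormsByField = new HashMap<>();

    // Records the norms setting of a field in a newly added document.
    // Case #2 (without norms first, then with norms) is rejected immediately;
    // case #1 (with norms first, then without) is allowed and disables norms.
    void addField(String field, boolean omitNorms) {
        Boolean alreadyOmitted = omitNormsByField.get(field);
        if (alreadyOmitted != null && alreadyOmitted && !omitNorms) {
            throw new IllegalStateException(
                    "field '" + field + "' was already added without norms");
        }
        omitNormsByField.put(field, (alreadyOmitted != null && alreadyOmitted) || omitNorms);
    }

    // True if norms are currently disabled for the field.
    boolean omitsNorms(String field) {
        return Boolean.TRUE.equals(omitNormsByField.get(field));
    }

    public static void main(String[] args) {
        NormsConsistencySketch tracker = new NormsConsistencySketch();
        tracker.addField("a", false); // with norms
        tracker.addField("a", true);  // case #1: norms get disabled, allowed
        System.out.println(tracker.omitsNorms("a")); // prints true
        try {
            tracker.addField("a", false); // case #2: rejected at add time
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the map is updated on every add, the check does not wait for a commit; whether real indexing code can afford such shared state is exactly the open question raised in the comment.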
[jira] [Commented] (LUCENE-3164) consolidate various CHANGES.txt into two files: lucene and solr
[ https://issues.apache.org/jira/browse/LUCENE-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041439#comment-13041439 ] Doron Cohen commented on LUCENE-3164: - Agreed, 3 for now, and then we'll see... > consolidate various CHANGES.txt into two files: lucene and solr > --- > > Key: LUCENE-3164 > URL: https://issues.apache.org/jira/browse/LUCENE-3164 > Project: Lucene - Java > Issue Type: Task >Reporter: Robert Muir > > There are CHANGES.txt files everywhere: lucene/contrib has a CHANGES.txt, the > benchmark package has its own CHANGES.txt, in trunk all the modules have > their own CHANGES.txt, and each solr contrib has its own CHANGES.txt > I propose we merge these files into a CHANGES.txt for each "product" we make. > so that means lucene/CHANGES.txt and solr/CHANGES.txt -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3164) consolidate various CHANGES.txt into two files: lucene and solr
[ https://issues.apache.org/jira/browse/LUCENE-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041425#comment-13041425 ] Doron Cohen commented on LUCENE-3164: - I agree that with frequent releases this is less of an issue. What are your thoughts about trunk in this regard - would you like there 3 changes files, i.e. keep one for modules? > consolidate various CHANGES.txt into two files: lucene and solr > --- > > Key: LUCENE-3164 > URL: https://issues.apache.org/jira/browse/LUCENE-3164 > Project: Lucene - Java > Issue Type: Task >Reporter: Robert Muir > > There are CHANGES.txt files everywhere: lucene/contrib has a CHANGES.txt, the > benchmark package has its own CHANGES.txt, in trunk all the modules have > their own CHANGES.txt, and each solr contrib has its own CHANGES.txt > I propose we merge these files into a CHANGES.txt for each "product" we make. > so that means lucene/CHANGES.txt and solr/CHANGES.txt -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3161) consider warnings from the source compilation
[ https://issues.apache.org/jira/browse/LUCENE-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041418#comment-13041418 ] Doron Cohen commented on LUCENE-3161: - bq. And, I don't think we should in general hide any warnings, even to users for the reasons i mentioned above. +1 for not hiding! > consider warnings from the source compilation > - > > Key: LUCENE-3161 > URL: https://issues.apache.org/jira/browse/LUCENE-3161 > Project: Lucene - Java > Issue Type: Task > Components: general/build >Reporter: Robert Muir > Labels: maybe32blocker > Fix For: 3.3, 4.0 > > > as Doron mentioned in his review: At compiling there are various warning > printed, I think it would be more assuring for downloaders if the build runs > without warning. These warnings are not a stopper. > we could conditionalize these warnings so that they don't "display" when > compiling from actual releases, but I have to wonder if we should hide > these... being open source I think we should display all our warts, maybe > some contributor sees these warnings and decides they want to submit a patch > to fix some of them. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3164) consolidate various CHANGES.txt into two files: lucene and solr
[ https://issues.apache.org/jira/browse/LUCENE-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041409#comment-13041409 ] Doron Cohen commented on LUCENE-3164: - Specifically, current files are:
lucene:
- CHANGES.txt
- contrib/benchmark/CHANGES.txt
- contrib/CHANGES.txt
- contrib/grouping/CHANGES.txt
solr:
- CHANGES.txt
- client/ruby/flare/vendor/plugins/engines/CHANGELOG (?)
- client/ruby/solr-ruby/CHANGES.yml (?)
- contrib/analysis-extras/CHANGES.txt
- contrib/clustering/CHANGES.txt
- contrib/dataimporthandler/CHANGES.txt
- solr/contrib/extraction/CHANGES.txt
- solr/contrib/uima/CHANGES.txt
In favor of this, all changes would become more easily readable for users in the HTML format. There is a risk that changes in contribs/modules would clutter the core changes. For example, today even small changes in contrib/benchmark are listed in its changes file. But once this becomes part of the global changes file, I am not sure all benchmark changes would merit being listed there.
> consolidate various CHANGES.txt into two files: lucene and solr > --- > > Key: LUCENE-3164 > URL: https://issues.apache.org/jira/browse/LUCENE-3164 > Project: Lucene - Java > Issue Type: Task >Reporter: Robert Muir > > There are CHANGES.txt files everywhere: lucene/contrib has a CHANGES.txt, the > benchmark package has its own CHANGES.txt, in trunk all the modules have > their own CHANGES.txt, and each solr contrib has its own CHANGES.txt > I propose we merge these files into a CHANGES.txt for each "product" we make. > so that means lucene/CHANGES.txt and solr/CHANGES.txt -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3153) Adding field w/ norms should fail if same field was added w/o norms already
[ https://issues.apache.org/jira/browse/LUCENE-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041403#comment-13041403 ] Doron Cohen commented on LUCENE-3153: - Can this be checked before any commit (/flush)? Assume 10 docs were added without norms to a fresh index, now, without a commit or even a flush, a document is added with norms. Is the info required for checking the "configuration" for that field available at that time? If it is not, this is still just a best effort check. > Adding field w/ norms should fail if same field was added w/o norms already > --- > > Key: LUCENE-3153 > URL: https://issues.apache.org/jira/browse/LUCENE-3153 > Project: Lucene - Java > Issue Type: Bug > Components: core/index >Reporter: Shai Erera > Fix For: 4.0 > > > A spinoff from LUCENE-3146. Consider the following two scenarios, according > to how 4.0 currently works: > * Field "a" is added w/ norms. Sometime later field "a" is added to a > document w/o norms -- norms are disabled for field "a", for all docs. > * Field "a" is added w/o norms - norms are disabled for field "a". Sometime > later field "a" is added to a document w/ norms -- app thinks norms were > added, while in fact they are dropped. > This is a bug and case #2 should fail on add/updateDocument - app should know > norms were not added. While case #1 isn't great either, it's the only way an > app can choose to disable norms for field "a", after instances of it already > contain norms, so we should support that scenario. > In order to detect that early, we should track norms info in .fnx, as Mike > describes at LUCENE-3146. Since this changes the index format, we should also > update the "file format" page after we do it. > Not sure what's the deal w/ 3.x indexes that are read by 4.0 code. Initially > they won't have .fnx file, so no central norms information exist to detect > the cases I've described above. 
Over time, as segments are merged, .fnx will > include information from more and more segments, but there's always a chance > a few segments will still contain the norms for field "a". I'm not very > familiar w/ that part of the code, but I think that: > * If .fnx says "no norms for field a", then we ignore any norms information > that may or may not exist in segments. > * If .fnx says "norms for field a", then we need to make up some norms values > for (old) segments w/ no norms? We need to make up values during segment > merge and search? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
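The pre-flush check asked about in the comment above can be pictured as an in-memory registry that remembers, per field name, whether norms were omitted, and rejects a later add that tries to enable them. This is only a sketch under assumptions: `FieldNormsRegistry`, `checkAndRecord` and `normsOmitted` are hypothetical names, not Lucene classes or methods, and nothing here reflects how .fnx is actually written or read.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory tracker of per-field norms settings; not part of Lucene. */
public class FieldNormsRegistry {
    // field name -> true if norms are omitted for that field
    private final Map<String, Boolean> omitNorms = new HashMap<>();

    /**
     * Records the norms setting for a field. Throws if the field was already
     * added without norms and is now added with norms (case #2 in the issue).
     * Going from "with norms" to "without norms" (case #1) is allowed and
     * disables norms for the field from then on.
     */
    public synchronized void checkAndRecord(String field, boolean omit) {
        Boolean previous = omitNorms.get(field);
        boolean previouslyOmitted = previous != null && previous;
        if (previouslyOmitted && !omit) {
            throw new IllegalArgumentException(
                "field \"" + field + "\" was already added without norms");
        }
        // Once omitted, norms stay omitted for this field.
        omitNorms.put(field, omit || previouslyOmitted);
    }

    /** Returns true if the field is known and its norms are omitted. */
    public synchronized boolean normsOmitted(String field) {
        return Boolean.TRUE.equals(omitNorms.get(field));
    }
}
```

Because such a registry would live in memory, it could answer the question for documents that have been added but not yet flushed; whether the indexing chain really has this information available at that point is exactly what the comment leaves open.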
[jira] [Commented] (LUCENE-3142) benchmark/stats package is obsolete and unused - remove it
[ https://issues.apache.org/jira/browse/LUCENE-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039073#comment-13039073 ] Doron Cohen commented on LUCENE-3142: - Just to make sure this is clear, the package in question is: o.a.l.benchmark.stats > benchmark/stats package is obsolete and unused - remove it > -- > > Key: LUCENE-3142 > URL: https://issues.apache.org/jira/browse/LUCENE-3142 > Project: Lucene - Java > Issue Type: Bug > Components: modules/benchmark >Reporter: Doron Cohen >Assignee: Doron Cohen >Priority: Minor > > This seems like a leftover from the original benchmark implementation and can > thus be removed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3142) benchmark/stats package is obsolete and unused - remove it
[ https://issues.apache.org/jira/browse/LUCENE-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039066#comment-13039066 ] Doron Cohen commented on LUCENE-3142: - Does anyone see why this should remain? (I will wait ~2 days before actually removing it) > benchmark/stats package is obsolete and unused - remove it > -- > > Key: LUCENE-3142 > URL: https://issues.apache.org/jira/browse/LUCENE-3142 > Project: Lucene - Java > Issue Type: Bug > Components: modules/benchmark >Reporter: Doron Cohen >Assignee: Doron Cohen >Priority: Minor > > This seems like a leftover from the original benchmark implementation and can > thus be removed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3142) benchmark/stats package is obsolete and unused - remove it
benchmark/stats package is obsolete and unused - remove it -- Key: LUCENE-3142 URL: https://issues.apache.org/jira/browse/LUCENE-3142 Project: Lucene - Java Issue Type: Bug Components: modules/benchmark Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor This seems like a leftover from the original benchmark implementation and can thus be removed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3137) Benchmark's ExtractReuters creates its temp dir wrongly if provided out-dir param ends by slash
[ https://issues.apache.org/jira/browse/LUCENE-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-3137. - Resolution: Fixed Fix Version/s: 4.0 3.2 Trunk: r1127436 3x: r1127466 > Benchmark's ExtractReuters creates its temp dir wrongly if provided out-dir > param ends by slash > --- > > Key: LUCENE-3137 > URL: https://issues.apache.org/jira/browse/LUCENE-3137 > Project: Lucene - Java > Issue Type: Bug > Components: modules/benchmark >Affects Versions: 3.2, 4.0 >Reporter: Doron Cohen >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: LUCENE-3137.patch > > > See LUCENE-929 for context. > As result, it might fail to create the temp dir at all. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-929) contrib/benchmark build doesn't handle checking if content is properly extracted
[ https://issues.apache.org/jira/browse/LUCENE-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-929. Resolution: Fixed bq. Doron, that's fine to open a new issue and close this one, but it was this issue's fix that introduced the bug. Thanks for clarifying! Okay, so I will fix this in LUCENE-3137 (it makes sense to me at this time since this one was resolved 4 months ago and fixed something else) and resolve this one. > contrib/benchmark build doesn't handle checking if content is properly > extracted > > > Key: LUCENE-929 > URL: https://issues.apache.org/jira/browse/LUCENE-929 > Project: Lucene - Java > Issue Type: Bug > Components: modules/benchmark >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll >Priority: Minor > Fix For: 4.0, 3.1 > > > The contrib/benchmark build does not properly handle checking to see if the > content (such as Reuters coll.) is properly extracted. It only checks to see > if the directory exists. Thus, it is possible that the directory gets > created and the extraction fails. Then, the next time it is run, it skips > the extraction part and tries to continue on running the benchmark. > The workaround is to manually delete the extraction directory. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-929) contrib/benchmark build doesn't handle checking if content is properly extracted
[ https://issues.apache.org/jira/browse/LUCENE-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038502#comment-13038502 ] Doron Cohen commented on LUCENE-929: There's now a simple patch for this in LUCENE-3137. I think this one can be closed? > contrib/benchmark build doesn't handle checking if content is properly > extracted > > > Key: LUCENE-929 > URL: https://issues.apache.org/jira/browse/LUCENE-929 > Project: Lucene - Java > Issue Type: Bug > Components: modules/benchmark >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll >Priority: Minor > Fix For: 3.1, 4.0 > > > The contrib/benchmark build does not properly handle checking to see if the > content (such as Reuters coll.) is properly extracted. It only checks to see > if the directory exists. Thus, it is possible that the directory gets > created and the extraction fails. Then, the next time it is run, it skips > the extraction part and tries to continue on running the benchmark. > The workaround is to manually delete the extraction directory. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-929) contrib/benchmark build doesn't handle checking if content is properly extracted
[ https://issues.apache.org/jira/browse/LUCENE-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038498#comment-13038498 ] Doron Cohen commented on LUCENE-929: bq. Note, this fix this doesn't work if the output dir has a trailing slash I think this is a separate issue - I mean not handling a trailing slash. Created LUCENE-3137 for handling this. > contrib/benchmark build doesn't handle checking if content is properly > extracted > > > Key: LUCENE-929 > URL: https://issues.apache.org/jira/browse/LUCENE-929 > Project: Lucene - Java > Issue Type: Bug > Components: modules/benchmark >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll >Priority: Minor > Fix For: 3.1, 4.0 > > > The contrib/benchmark build does not properly handle checking to see if the > content (such as Reuters coll.) is properly extracted. It only checks to see > if the directory exists. Thus, it is possible that the directory gets > created and the extraction fails. Then, the next time it is run, it skips > the extraction part and tries to continue on running the benchmark. > The workaround is to manually delete the extraction directory. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
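The problem in the issue description (a directory that exists although extraction failed) is commonly avoided by extracting into a temporary directory and renaming it to the final name only on success. The sketch below illustrates that pattern in plain Java; `SafeExtract`, `Extractor` and `extractOnce` are hypothetical names, and this is not the code of ExtractReuters or of the benchmark build.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SafeExtract {
    /** Hypothetical extraction step supplied by the caller. */
    public interface Extractor {
        void extractInto(Path dir) throws IOException;
    }

    /**
     * Extracts into a sibling "-tmp" directory and renames it to the final
     * name only after extraction succeeded, so an existing final directory
     * implies a completed extraction. Returns true if extraction ran,
     * false if it was skipped because the final directory already exists.
     */
    public static boolean extractOnce(Path finalDir, Extractor extractor) throws IOException {
        if (Files.isDirectory(finalDir)) {
            return false;
        }
        Path abs = finalDir.toAbsolutePath().normalize();
        Path tmp = abs.resolveSibling(abs.getFileName() + "-tmp");
        deleteRecursively(tmp); // discard leftovers of a failed earlier run
        Files.createDirectories(tmp);
        extractor.extractInto(tmp);
        Files.move(tmp, abs); // a rename within the same parent directory
        return true;
    }

    private static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order lists children before their parent directory.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

With this shape, the manual workaround from the description (deleting the extraction directory by hand) is only needed for the temporary directory, and the sketch does that itself at the start of each run.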
[jira] [Updated] (LUCENE-3137) Benchmark's ExtractReuters creates its temp dir wrongly if provided out-dir param ends by slash
[ https://issues.apache.org/jira/browse/LUCENE-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-3137: Attachment: LUCENE-3137.patch Simple patch solving this slash problem. > Benchmark's ExtractReuters creates its temp dir wrongly if provided out-dir > param ends by slash > --- > > Key: LUCENE-3137 > URL: https://issues.apache.org/jira/browse/LUCENE-3137 > Project: Lucene - Java > Issue Type: Bug > Components: modules/benchmark >Affects Versions: 3.2, 4.0 >Reporter: Doron Cohen >Assignee: Doron Cohen >Priority: Minor > Attachments: LUCENE-3137.patch > > > See LUCENE-929 for context. > As result, it might fail to create the temp dir at all. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3137) Benchmark's ExtractReuters creates its temp dir wrongly if provided out-dir param ends by slash
Benchmark's ExtractReuters creates its temp dir wrongly if provided out-dir param ends by slash --- Key: LUCENE-3137 URL: https://issues.apache.org/jira/browse/LUCENE-3137 Project: Lucene - Java Issue Type: Bug Components: modules/benchmark Affects Versions: 3.2, 4.0 Reporter: Doron Cohen Assignee: Doron Cohen Priority: Minor See LUCENE-929 for context. As result, it might fail to create the temp dir at all. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
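A failure of the kind named in the title can arise when a temp directory name is derived from the raw out-dir argument by string concatenation. The sketch below contrasts that with a version that strips trailing separators first. It is an illustration under assumptions: `TempDirName`, `naive` and `normalized` are hypothetical, and the thread does not show what ExtractReuters or the patch actually does.

```java
import java.io.File;

public class TempDirName {
    /** Naive: appends the suffix to the raw argument string. */
    public static String naive(String outDir) {
        return outDir + "-tmp";
    }

    /** Strips trailing separators first, so "out/" and "out" give the same name. */
    public static String normalized(String outDir) {
        String s = outDir;
        while (s.length() > 1 && (s.endsWith("/") || s.endsWith(File.separator))) {
            s = s.substring(0, s.length() - 1);
        }
        return s + "-tmp";
    }
}
```

For an argument such as `work/out/`, the naive form yields `work/out/-tmp`, a child named `-tmp` inside a directory that may not exist yet, while the normalized form yields the sibling `work/out-tmp`.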
[jira] [Resolved] (SOLR-2500) TestSolrProperties sometimes fails with "no such core: core0"
[ https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved SOLR-2500. --- Resolution: Fixed Fix Version/s: 4.0 3.2 fixed in trunk: r1125932. merged to 3x: r1125942. > TestSolrProperties sometimes fails with "no such core: core0" > - > > Key: SOLR-2500 > URL: https://issues.apache.org/jira/browse/SOLR-2500 > Project: Solr > Issue Type: Bug >Affects Versions: 4.0 >Reporter: Robert Muir >Assignee: Doron Cohen > Fix For: 3.2, 4.0 > > Attachments: SOLR-2500.patch, SOLR-2500.patch, SOLR-2500.patch, > solr-after-1st-run.xml, solr-clean.xml > > > [junit] Testsuite: > org.apache.solr.client.solrj.embedded.TestSolrProperties > [junit] Testcase: > testProperties(org.apache.solr.client.solrj.embedded.TestSolrProperties): > Caused an ERROR > [junit] No such core: core0 > [junit] org.apache.solr.common.SolrException: No such core: core0 > [junit] at > org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118) > [junit] at > org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105) > [junit] at > org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-2500) TestSolrProperties sometimes fails with "no such core: core0"
[ https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen reassigned SOLR-2500: - Assignee: Doron Cohen > TestSolrProperties sometimes fails with "no such core: core0" > - > > Key: SOLR-2500 > URL: https://issues.apache.org/jira/browse/SOLR-2500 > Project: Solr > Issue Type: Bug >Affects Versions: 4.0 >Reporter: Robert Muir >Assignee: Doron Cohen > Attachments: SOLR-2500.patch, SOLR-2500.patch, SOLR-2500.patch, > solr-after-1st-run.xml, solr-clean.xml > > > [junit] Testsuite: > org.apache.solr.client.solrj.embedded.TestSolrProperties > [junit] Testcase: > testProperties(org.apache.solr.client.solrj.embedded.TestSolrProperties): > Caused an ERROR > [junit] No such core: core0 > [junit] org.apache.solr.common.SolrException: No such core: core0 > [junit] at > org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118) > [junit] at > org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105) > [junit] at > org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3120) span query matches too many docs when two query terms are the same unless inOrder=true
[ https://issues.apache.org/jira/browse/LUCENE-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-3120: Attachment: LUCENE-3120.patch Updated patch with fixed test to not depend on analysis module. > span query matches too many docs when two query terms are the same unless > inOrder=true > -- > > Key: LUCENE-3120 > URL: https://issues.apache.org/jira/browse/LUCENE-3120 > Project: Lucene - Java > Issue Type: Bug > Components: core/search >Reporter: Doron Cohen >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: LUCENE-3120.patch, LUCENE-3120.patch > > > spinoff of user list discussion - [SpanNearQuery - inOrder > parameter|http://markmail.org/message/i4cstlwgjmlcfwlc]. > With 3 documents: > * "a b x c d" > * "a b b d" > * "a b x b y d" > Here are a few queries (the number in parenthesis indicates expected #hits): > These ones work *as expected*: > * (1) in-order, slop=0, "b", "x", "b" > * (1) in-order, slop=0, "b", "b" > * (2) in-order, slop=1, "b", "b" > These ones match *too many* hits: > * (1) any-order, slop=0, "b", "x", "b" > * (1) any-order, slop=1, "b", "x", "b" > * (1) any-order, slop=2, "b", "x", "b" > * (1) any-order, slop=3, "b", "x", "b" > These ones match *too many* hits as well: > * (1) any-order, slop=0, "b", "b" > * (2) any-order, slop=1, "b", "b" > Each of the above passes when using a phrase query (applying the slop, no > in-order indication in phrase query). > This seems related to a known overlapping spans issue - [non-overlapping Span > queries|http://markmail.org/message/7jxn5eysjagjwlon] - as indicated by Hoss, > so we might decide to close this bug after all, but I would like to at least > have the junit that exposes the behavior in JIRA. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
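The "b", "b" counts in the description can be reproduced with a toy model in which an any-order proximity check either allows or forbids both clauses sitting on the same token. This is not Lucene's span implementation, only a plain-Java sketch of why overlapping matches inflate the hit count; `UnorderedNearToy` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class UnorderedNearToy {
    /** Positions at which the term occurs in a space-separated document. */
    static List<Integer> positions(String doc, String term) {
        List<Integer> out = new ArrayList<>();
        String[] tokens = doc.split(" ");
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) {
                out.add(i);
            }
        }
        return out;
    }

    /**
     * Two-clause, any-order proximity test. When allowOverlap is true, both
     * clauses may match the same token, which is the over-matching case.
     */
    public static boolean matches(String doc, String t1, String t2, int slop, boolean allowOverlap) {
        for (int p1 : positions(doc, t1)) {
            for (int p2 : positions(doc, t2)) {
                if (p1 == p2 && !allowOverlap) {
                    continue;
                }
                int gap = Math.abs(p1 - p2) - 1; // tokens strictly between the two matches
                if (gap <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Number of documents that match. */
    public static int countHits(String[] docs, String t1, String t2, int slop, boolean allowOverlap) {
        int hits = 0;
        for (String doc : docs) {
            if (matches(doc, t1, t2, slop, allowOverlap)) {
                hits++;
            }
        }
        return hits;
    }
}
```

On the three documents from the description, forbidding overlap gives 1 hit at slop=0 and 2 hits at slop=1, the expected numbers, while allowing overlap makes all 3 documents match at slop=0 because a single "b" satisfies both clauses.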
[jira] [Updated] (SOLR-2500) TestSolrCoreProperties sometimes fails with "no such core: core0"
[ https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated SOLR-2500: -- Attachment: SOLR-2500.patch Attached patch; the test now passes both in the IDE and from the command line: * at setup(), copies solr.xml to a private file. * uses that private file as its solr.solr.home. * erases that file at tearDown(), though not erasing it should not affect further tests or re-runs. * fixes the deletion at tearDown() to look at solr.solr.home rather than solr.home. (I think this was a bug on top of a bug in this test - it used the original file at s.s.h but for cleanup attempted to remove files from just s.h.) This debugging took place in pure darkness, so better review... > TestSolrCoreProperties sometimes fails with "no such core: core0" > - > > Key: SOLR-2500 > URL: https://issues.apache.org/jira/browse/SOLR-2500 > Project: Solr > Issue Type: Bug >Affects Versions: 4.0 >Reporter: Robert Muir > Attachments: SOLR-2500.patch, SOLR-2500.patch, SOLR-2500.patch, > solr-after-1st-run.xml, solr-clean.xml > > > [junit] Testsuite: > org.apache.solr.client.solrj.embedded.TestSolrProperties > [junit] Testcase: > testProperties(org.apache.solr.client.solrj.embedded.TestSolrProperties): > Caused an ERROR > [junit] No such core: core0 > [junit] org.apache.solr.common.SolrException: No such core: core0 > [junit] at > org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118) > [junit] at > org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105) > [junit] at > org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189) -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
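The approach listed in the SOLR-2500 patch comment (work on a private copy of the configuration, then delete it in tearDown and notice when deletion fails) can be sketched without any test framework. `PrivateConfigFixture` is a hypothetical helper written for illustration; it is not the patched TestSolrProperties and makes no assumption about how Solr reads solr.solr.home.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

public class PrivateConfigFixture {
    private File privateCopy;

    /** Copies the shared config to a fresh private file and returns that file. */
    public File setUp(File sharedConfig) throws IOException {
        privateCopy = File.createTempFile("solr-private-", ".xml");
        Files.copy(sharedConfig.toPath(), privateCopy.toPath(),
                   StandardCopyOption.REPLACE_EXISTING);
        return privateCopy;
    }

    /** Deletes the private copy and fails loudly if the deletion did not happen. */
    public void tearDown() {
        if (privateCopy != null && !privateCopy.delete()) {
            throw new AssertionError("could not delete " + privateCopy);
        }
        privateCopy = null;
    }
}
```

Since the test only ever writes to the private copy, a persisted change can no longer leak into the shared file and break the next run, and a silent `delete()` failure is surfaced instead of ignored.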
[jira] [Resolved] (LUCENE-3123) TestIndexWriter.testBackgroundOptimize fails with too many open files
[ https://issues.apache.org/jira/browse/LUCENE-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-3123. - Resolution: Fixed Fix Version/s: 4.0 3.2 Fixed by Mike, thanks Mike! > TestIndexWriter.testBackgroundOptimize fails with too many open files > - > > Key: LUCENE-3123 > URL: https://issues.apache.org/jira/browse/LUCENE-3123 > Project: Lucene - Java > Issue Type: Bug > Components: core/index > Environment: Linux 2.6.32-31-generic i386/Sun Microsystems Inc. > 1.6.0_20 (32-bit)/cpus=1,threads=2 >Reporter: Doron Cohen > Fix For: 3.2, 4.0 > > > Recreate with this line: > ant test -Dtestcase=TestIndexWriter -Dtestmethod=testBackgroundOptimize > -Dtests.seed=-3981504507637360146:51354004663342240 > Might be related to LUCENE-2873 ? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3123) TestIndexWriter.testBackgroundOptimize fails with too many open files
[ https://issues.apache.org/jira/browse/LUCENE-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036331#comment-13036331 ] Doron Cohen commented on LUCENE-3123: - In fact, in 3x this is not reproducible with the same seed (expected, as Robert once explained), and I was not able to reproduce it without a seed either. I also tried with -Dtest.iter=100 (though I am not sure whether a new seed is created in each iteration; need to verify this...). Anyhow, in 3x the test also passes after svn up with this fix. So I think this can be resolved... > TestIndexWriter.testBackgroundOptimize fails with too many open files > - > > Key: LUCENE-3123 > URL: https://issues.apache.org/jira/browse/LUCENE-3123 > Project: Lucene - Java > Issue Type: Bug > Components: core/index > Environment: Linux 2.6.32-31-generic i386/Sun Microsystems Inc. > 1.6.0_20 (32-bit)/cpus=1,threads=2 >Reporter: Doron Cohen > > Recreate with this line: > ant test -Dtestcase=TestIndexWriter -Dtestmethod=testBackgroundOptimize > -Dtests.seed=-3981504507637360146:51354004663342240 > Might be related to LUCENE-2873 ? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3123) TestIndexWriter.testBackgroundOptimize fails with too many open files
[ https://issues.apache.org/jira/browse/LUCENE-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036322#comment-13036322 ] Doron Cohen commented on LUCENE-3123: - Yes, thanks, now it passes (trunk) - with this seed, as well as quite a few times without specifying a seed. I'll now verify on 3x. > TestIndexWriter.testBackgroundOptimize fails with too many open files > - > > Key: LUCENE-3123 > URL: https://issues.apache.org/jira/browse/LUCENE-3123 > Project: Lucene - Java > Issue Type: Bug > Components: core/index > Environment: Linux 2.6.32-31-generic i386/Sun Microsystems Inc. > 1.6.0_20 (32-bit)/cpus=1,threads=2 >Reporter: Doron Cohen > > Recreate with this line: > ant test -Dtestcase=TestIndexWriter -Dtestmethod=testBackgroundOptimize > -Dtests.seed=-3981504507637360146:51354004663342240 > Might be related to LUCENE-2873 ? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2500) TestSolrCoreProperties sometimes fails with "no such core: core0"
[ https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036300#comment-13036300 ] Doron Cohen commented on SOLR-2500: --- Oops just noticed I was testing all this time TestSolrProperties and not TestSolrCoreProperties, and, because the error message was the same as in the issue description *"No such core: core0"* I was sure that this is the same test... Now this is confusing... Hmmm.. the original exception reported above is [junit] at org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128) So perhaps I was working on the correct bug after all and just the JIRA issue title is inaccurate? Or I need to call it a day... :) Anyhow, TestSolrProperties consistently behaves as I described here, while TestSolrCoreProperties consistently passes (when ran in standalone mode). > TestSolrCoreProperties sometimes fails with "no such core: core0" > - > > Key: SOLR-2500 > URL: https://issues.apache.org/jira/browse/SOLR-2500 > Project: Solr > Issue Type: Bug >Affects Versions: 4.0 >Reporter: Robert Muir > Attachments: SOLR-2500.patch, solr-after-1st-run.xml, solr-clean.xml > > > [junit] Testsuite: > org.apache.solr.client.solrj.embedded.TestSolrProperties > [junit] Testcase: > testProperties(org.apache.solr.client.solrj.embedded.TestSolrProperties): > Caused an ERROR > [junit] No such core: core0 > [junit] org.apache.solr.common.SolrException: No such core: core0 > [junit] at > org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118) > [junit] at > org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105) > [junit] at > org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260) > [junit] at > 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2500) TestSolrCoreProperties sometimes fails with "no such core: core0"
[ https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036288#comment-13036288 ] Doron Cohen commented on SOLR-2500: --- FWIW, also the first clean run would fail if test's tearDown() is modified like this: {noformat} -persistedFile.delete(); +assertTrue("could not delete "+persistedFile, persistedFile.delete()); {noformat} For some reason it fails to remove that file - in both Linux and Windows. > TestSolrCoreProperties sometimes fails with "no such core: core0" > - > > Key: SOLR-2500 > URL: https://issues.apache.org/jira/browse/SOLR-2500 > Project: Solr > Issue Type: Bug >Affects Versions: 4.0 >Reporter: Robert Muir > Attachments: SOLR-2500.patch, solr-after-1st-run.xml, solr-clean.xml > > > [junit] Testsuite: > org.apache.solr.client.solrj.embedded.TestSolrProperties > [junit] Testcase: > testProperties(org.apache.solr.client.solrj.embedded.TestSolrProperties): > Caused an ERROR > [junit] No such core: core0 > [junit] org.apache.solr.common.SolrException: No such core: core0 > [junit] at > org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118) > [junit] at > org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105) > [junit] at > org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2500) TestSolrCoreProperties sometimes fails with "no such core: core0"
[ https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated SOLR-2500: -- Attachment: solr-after-1st-run.xml solr-clean.xml solr.xml files from trunk/bin/solr/shared: - clean - with which the test passes. - after-1st-run - with which it fails. > TestSolrCoreProperties sometimes fails with "no such core: core0" > - > > Key: SOLR-2500 > URL: https://issues.apache.org/jira/browse/SOLR-2500 > Project: Solr > Issue Type: Bug >Affects Versions: 4.0 >Reporter: Robert Muir > Attachments: solr-after-1st-run.xml, solr-clean.xml > > > [junit] Testsuite: > org.apache.solr.client.solrj.embedded.TestSolrProperties > [junit] Testcase: > testProperties(org.apache.solr.client.solrj.embedded.TestSolrProperties): > Caused an ERROR > [junit] No such core: core0 > [junit] org.apache.solr.common.SolrException: No such core: core0 > [junit] at > org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118) > [junit] at > org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105) > [junit] at > org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2500) TestSolrCoreProperties sometimes fails with "no such core: core0"
[ https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036242#comment-13036242 ] Doron Cohen commented on SOLR-2500: --- From Eclipse (XP), passed at 1st attempt, failed at the 2nd! I am not familiar with this part of the code so it would be too much work to track it all the way myself, but I think I can now provide sufficient information for solving it. In Eclipse, after cleaning the project the test passes, and then starts failing in all successive runs. So I assume when you run it isolated you also do clean, which covers Eclipse's clean (and more). I tracked the content of the cleaned relevant dir before and after the test - it is (trunk/)bin/solr - there's only one file that differs between the runs - this is bin/solr/shared/solr.xml. Not sure if this is a bug in the test not cleaning after itself or a bug in the code that reads the configuration... I'll attach here the two files so that you can compare them. 
> TestSolrCoreProperties sometimes fails with "no such core: core0" > - > > Key: SOLR-2500 > URL: https://issues.apache.org/jira/browse/SOLR-2500 > Project: Solr > Issue Type: Bug >Affects Versions: 4.0 >Reporter: Robert Muir > > [junit] Testsuite: > org.apache.solr.client.solrj.embedded.TestSolrProperties > [junit] Testcase: > testProperties(org.apache.solr.client.solrj.embedded.TestSolrProperties): > Caused an ERROR > [junit] No such core: core0 > [junit] org.apache.solr.common.SolrException: No such core: core0 > [junit] at > org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118) > [junit] at > org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105) > [junit] at > org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3123) TestIndexWriter.testBackgroundOptimize fails with too many open files
[ https://issues.apache.org/jira/browse/LUCENE-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036163#comment-13036163 ] Doron Cohen commented on LUCENE-3123: - This is on Ubuntu btw. Run log: {noformat} NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter -Dtestmethod=testBackgroundOptimize -Dtests.seed=-3981504507637360146:51354004663342240 NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter -Dtestmethod=testBackgroundOptimize -Dtests.seed=-3981504507637360146:51354004663342240 The following exceptions were thrown by threads: *** Thread: Lucene Merge Thread #0 *** org.apache.lucene.index.MergePolicy$MergeException: java.io.FileNotFoundException: /tmp/test4907593285402510583tmp/_51_0.sd (Too many open files) at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:507) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:472) Caused by: java.io.FileNotFoundException: /tmp/test4907593285402510583tmp/_51_0.sd (Too many open files) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:69) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:90) at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:56) at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:337) at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:402) at org.apache.lucene.index.codecs.mockrandom.MockRandomCodec.fieldsProducer(MockRandomCodec.java:236) at org.apache.lucene.index.PerFieldCodecWrapper$FieldsReader.<init>(PerFieldCodecWrapper.java:113) at org.apache.lucene.index.PerFieldCodecWrapper.fieldsProducer(PerFieldCodecWrapper.java:210) at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:131) 
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:495) at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:635) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3260) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2930) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:379) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:447) NOTE: test params are: codec=RandomCodecProvider: {field=MockRandom}, locale=nl_NL, timezone=Turkey NOTE: all tests run in this JVM: [TestIndexWriter] NOTE: Linux 2.6.32-31-generic i386/Sun Microsystems Inc. 1.6.0_20 (32-bit)/cpus=1,threads=2,free=26480072,total=33468416 {noformat} > TestIndexWriter.testBackgroundOptimize fails with too many open files > - > > Key: LUCENE-3123 > URL: https://issues.apache.org/jira/browse/LUCENE-3123 > Project: Lucene - Java > Issue Type: Bug > Components: core/index > Environment: Linux 2.6.32-31-generic i386/Sun Microsystems Inc. > 1.6.0_20 (32-bit)/cpus=1,threads=2 >Reporter: Doron Cohen > > Recreate with this line: > ant test -Dtestcase=TestIndexWriter -Dtestmethod=testBackgroundOptimize > -Dtests.seed=-3981504507637360146:51354004663342240 > Might be related to LUCENE-2873 ? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3123) TestIndexWriter.testBackgroundOptimize fails with too many open files
TestIndexWriter.testBackgroundOptimize fails with too many open files - Key: LUCENE-3123 URL: https://issues.apache.org/jira/browse/LUCENE-3123 Project: Lucene - Java Issue Type: Bug Components: core/index Environment: Linux 2.6.32-31-generic i386/Sun Microsystems Inc. 1.6.0_20 (32-bit)/cpus=1,threads=2 Reporter: Doron Cohen Recreate with this line: ant test -Dtestcase=TestIndexWriter -Dtestmethod=testBackgroundOptimize -Dtests.seed=-3981504507637360146:51354004663342240 Might be related to LUCENE-2873 ? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
[ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036111#comment-13036111 ] Doron Cohen commented on LUCENE-3068: - bq. Note that if you go back to the root page, and click on a given day, it tells you the svn rev and also hg ref (of luceneutil) Great, thanks! So, this commit to trunk in r1124293 falls between these two: - Tue 17/05/2011 Lucene/Solr trunk rev 1104671 - Wed 18/05/2011 Lucene/Solr trunk rev 1124524 ... No measurable degradation, good! > The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at > same position > -- > > Key: LUCENE-3068 > URL: https://issues.apache.org/jira/browse/LUCENE-3068 > Project: Lucene - Java > Issue Type: Bug > Components: core/search >Affects Versions: 3.0.3, 3.1, 4.0 >Reporter: Michael McCandless >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch, > LUCENE-3068.patch > > > In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was > matching docs that it shouldn't; but I think those changes caused it > to fail to match docs that it should, specifically when the doc itself > has tokens at the same position. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
[ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036107#comment-13036107 ] Doron Cohen commented on LUCENE-3068: - Looking at http://people.apache.org/~mikemccand/lucenebench/SloppyPhrase.html (Mike this is a great tool!) I see no particular slowdown at the last runs. A thought about these benchmarks, it would be helpful if the checked revision would be shown - perhaps as part of the hover text when hovering the mouse on a graph point... > The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at > same position > -- > > Key: LUCENE-3068 > URL: https://issues.apache.org/jira/browse/LUCENE-3068 > Project: Lucene - Java > Issue Type: Bug > Components: core/search >Affects Versions: 3.0.3, 3.1, 4.0 >Reporter: Michael McCandless >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch, > LUCENE-3068.patch > > > In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was > matching docs that it shouldn't; but I think those changes caused it > to fail to match docs that it should, specifically when the doc itself > has tokens at the same position. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3120) span query matches too many docs when two query terms are the same unless inOrder=true
[ https://issues.apache.org/jira/browse/LUCENE-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-3120: Attachment: LUCENE-3120.patch Attached test case demonstrating the bug. > span query matches too many docs when two query terms are the same unless > inOrder=true > -- > > Key: LUCENE-3120 > URL: https://issues.apache.org/jira/browse/LUCENE-3120 > Project: Lucene - Java > Issue Type: Bug > Components: core/search >Reporter: Doron Cohen >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: LUCENE-3120.patch > > > spinoff of user list discussion - [SpanNearQuery - inOrder > parameter|http://markmail.org/message/i4cstlwgjmlcfwlc]. > With 3 documents: > * "a b x c d" > * "a b b d" > * "a b x b y d" > Here are a few queries (the number in parenthesis indicates expected #hits): > These ones work *as expected*: > * (1) in-order, slop=0, "b", "x", "b" > * (1) in-order, slop=0, "b", "b" > * (2) in-order, slop=1, "b", "b" > These ones match *too many* hits: > * (1) any-order, slop=0, "b", "x", "b" > * (1) any-order, slop=1, "b", "x", "b" > * (1) any-order, slop=2, "b", "x", "b" > * (1) any-order, slop=3, "b", "x", "b" > These ones match *too many* hits as well: > * (1) any-order, slop=0, "b", "b" > * (2) any-order, slop=1, "b", "b" > Each of the above passes when using a phrase query (applying the slop, no > in-order indication in phrase query). > This seems related to a known overlapping spans issue - [non-overlapping Span > queries|http://markmail.org/message/7jxn5eysjagjwlon] - as indicated by Hoss, > so we might decide to close this bug after all, but I would like to at least > have the junit that exposes the behavior in JIRA. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3120) span query matches too many docs when two query terms are the same unless inOrder=true
span query matches too many docs when two query terms are the same unless inOrder=true -- Key: LUCENE-3120 URL: https://issues.apache.org/jira/browse/LUCENE-3120 Project: Lucene - Java Issue Type: Bug Components: core/search Reporter: Doron Cohen Priority: Minor Fix For: 3.2, 4.0 spinoff of user list discussion - [SpanNearQuery - inOrder parameter|http://markmail.org/message/i4cstlwgjmlcfwlc]. With 3 documents: * "a b x c d" * "a b b d" * "a b x b y d" Here are a few queries (the number in parenthesis indicates expected #hits): These ones work *as expected*: * (1) in-order, slop=0, "b", "x", "b" * (1) in-order, slop=0, "b", "b" * (2) in-order, slop=1, "b", "b" These ones match *too many* hits: * (1) any-order, slop=0, "b", "x", "b" * (1) any-order, slop=1, "b", "x", "b" * (1) any-order, slop=2, "b", "x", "b" * (1) any-order, slop=3, "b", "x", "b" These ones match *too many* hits as well: * (1) any-order, slop=0, "b", "b" * (2) any-order, slop=1, "b", "b" Each of the above passes when using a phrase query (applying the slop, no in-order indication in phrase query). This seems related to a known overlapping spans issue - [non-overlapping Span queries|http://markmail.org/message/7jxn5eysjagjwlon] - as indicated by Hoss, so we might decide to close this bug after all, but I would like to at least have the junit that exposes the behavior in JIRA. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
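The expected hit counts listed in this issue can be reproduced with a small toy model. This is only a sketch and not Lucene's SpanNearQuery implementation: the class and method names are hypothetical, and the `allowReuse` flag is an assumption about one way over-matching could arise (two identical query terms binding to the same document position). In this model, slop is taken as the matched window width minus the number of query terms.

```java
public class SpanNearToyModel {
    // The three documents from the issue description, one token per position.
    static final String[][] DOCS = {
        {"a", "b", "x", "c", "d"},
        {"a", "b", "b", "d"},
        {"a", "b", "x", "b", "y", "d"},
    };

    /** Counts the documents matched by the toy near-query. */
    static int countHits(String[] terms, int slop, boolean inOrder, boolean allowReuse) {
        int hits = 0;
        for (String[] doc : DOCS) {
            if (search(doc, terms, slop, inOrder, allowReuse, new int[terms.length], 0)) {
                hits++;
            }
        }
        return hits;
    }

    // Backtracking: bind query term k to some document position holding that term.
    private static boolean search(String[] doc, String[] terms, int slop,
                                  boolean inOrder, boolean allowReuse, int[] chosen, int k) {
        if (k == terms.length) {
            int min = Integer.MAX_VALUE;
            int max = Integer.MIN_VALUE;
            for (int p : chosen) {
                min = Math.min(min, p);
                max = Math.max(max, p);
            }
            // Window width minus number of terms must not exceed the slop.
            return (max - min + 1) - terms.length <= slop;
        }
        for (int p = 0; p < doc.length; p++) {
            if (!doc[p].equals(terms[k])) continue;
            // In-order: positions must strictly increase in query order.
            if (inOrder && k > 0 && p <= chosen[k - 1]) continue;
            // Without reuse, every query term needs its own document position.
            if (!allowReuse && alreadyUsed(chosen, k, p)) continue;
            chosen[k] = p;
            if (search(doc, terms, slop, inOrder, allowReuse, chosen, k + 1)) return true;
        }
        return false;
    }

    private static boolean alreadyUsed(int[] chosen, int k, int p) {
        for (int i = 0; i < k; i++) {
            if (chosen[i] == p) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String[] bb = {"b", "b"};
        System.out.println(countHits(bb, 0, false, false)); // distinct positions required
        System.out.println(countHits(bb, 0, false, true));  // same position may be reused
    }
}
```

In this model the in-order case cannot reuse a position, because positions must strictly increase, which is at least consistent with the problem showing up only when inOrder=false.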
[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
[ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035643#comment-13035643 ] Doron Cohen commented on LUCENE-3068: - I wonder if this should be fixed also in 3.1 branch? Probably so only if we make a 3.1.1, but not needed if it's gonna be a 3.2. What's the best practice then? Reopen until decision? Or rely on rescanning all 3.2 changes in case it's gonna be 3.1.1? > The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at > same position > -- > > Key: LUCENE-3068 > URL: https://issues.apache.org/jira/browse/LUCENE-3068 > Project: Lucene - Java > Issue Type: Bug > Components: core/search >Affects Versions: 3.0.3, 3.1, 4.0 >Reporter: Michael McCandless >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch, > LUCENE-3068.patch > > > In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was > matching docs that it shouldn't; but I think those changes caused it > to fail to match docs that it should, specifically when the doc itself > has tokens at the same position. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
[ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-3068. - Resolution: Fixed fix merged to 3x in r1124302. > The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at > same position > -- > > Key: LUCENE-3068 > URL: https://issues.apache.org/jira/browse/LUCENE-3068 > Project: Lucene - Java > Issue Type: Bug > Components: core/search >Affects Versions: 3.0.3, 3.1, 4.0 >Reporter: Michael McCandless >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch, > LUCENE-3068.patch > > > In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was > matching docs that it shouldn't; but I think those changes caused it > to fail to match docs that it should, specifically when the doc itself > has tokens at the same position. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
[ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035422#comment-13035422 ] Doron Cohen commented on LUCENE-3068: - fixed in trunk in r1124293. > The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at > same position > -- > > Key: LUCENE-3068 > URL: https://issues.apache.org/jira/browse/LUCENE-3068 > Project: Lucene - Java > Issue Type: Bug > Components: core/search >Affects Versions: 3.0.3, 3.1, 4.0 >Reporter: Michael McCandless >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch, > LUCENE-3068.patch > > > In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was > matching docs that it shouldn't; but I think those changes caused it > to fail to match docs that it should, specifically when the doc itself > has tokens at the same position. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2736) Wrong implementation of DocIdSetIterator.advance
[ https://issues.apache.org/jira/browse/LUCENE-2736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034618#comment-13034618 ] Doron Cohen commented on LUCENE-2736: - Shai, with the modified text the NOTE on "implementations freedom to not advance beyond in some situations" becomes strange... I think that the original text stresses the fact that the "real intended" behavior is to advance beyond current, just that for performance reasons the decision whether to advance beyond in some situations is left to the implementation, and so, if the caller provides a target which is not greater than current, it should be aware of this possibility. So I think it is perhaps better to either not modify this at all, or at most, to add "(see NOTE below)" just after "beyond": {noformat} - * Advances to the first beyond the current whose document number is greater + * Advances to the first beyond (see NOTE below) the current whose document number is greater {noformat} This would prevent the confusion I think? > Wrong implementation of DocIdSetIterator.advance > - > > Key: LUCENE-2736 > URL: https://issues.apache.org/jira/browse/LUCENE-2736 > Project: Lucene - Java > Issue Type: Bug > Components: core/search >Affects Versions: 3.2, 4.0 >Reporter: Hardy Ferentschik >Assignee: Shai Erera > Attachments: LUCENE-2736.patch > > > Implementations of {{DocIdSetIterator}} behave differently when advanced is > called. Taking the following test for {{OpenBitSet}}, {{DocIdBitSet}} and > {{SortedVIntList}} only {{SortedVIntList}} passes the test: > {code:title=org.apache.lucene.search.TestDocIdSet.java|borderStyle=solid} > ... 
> public void testAdvanceWithOpenBitSet() throws IOException { > DocIdSet idSet = new OpenBitSet( new long[] { 1121 }, 1 ); // > bits 0, 5, 6, 10 > assertAdvance( idSet ); > } > public void testAdvanceDocIdBitSet() throws IOException { > BitSet bitSet = new BitSet(); > bitSet.set( 0 ); > bitSet.set( 5 ); > bitSet.set( 6 ); > bitSet.set( 10 ); > DocIdSet idSet = new DocIdBitSet(bitSet); > assertAdvance( idSet ); > } > public void testAdvanceWithSortedVIntList() throws IOException { > DocIdSet idSet = new SortedVIntList( 0, 5, 6, 10 ); > assertAdvance( idSet ); > } > private void assertAdvance(DocIdSet idSet) throws IOException { > DocIdSetIterator iter = idSet.iterator(); > int docId = iter.nextDoc(); > assertEquals( "First doc id should be 0", 0, docId ); > docId = iter.nextDoc(); > assertEquals( "Second doc id should be 5", 5, docId ); > docId = iter.advance( 5 ); > assertEquals( "Advancing iterator should return the next doc > id", 6, docId ); > } > {code} > The javadoc for {{advance}} says: > {quote} > Advances to the first *beyond* the current whose document number is greater > than or equal to _target_. > {quote} > This seems to indicate that {{SortedVIntList}} behaves correctly, whereas the > other two don't. > Just looking at the {{DocIdBitSet}} implementation advance is implemented as: > {code} > bitSet.nextSetBit(target); > {code} > where the docs of {{nextSetBit}} say: > {quote} > Returns the index of the first bit that is set to true that occurs *on or > after* the specified starting index > {quote} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
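The difference between the two readings of advance() discussed here can be shown with plain java.util.BitSet, without any Lucene classes. This is just a sketch: the class and the two helper methods are hypothetical names, not part of DocIdSetIterator. The first helper follows the "on or after target" behavior of BitSet.nextSetBit, the second never returns a doc at or before the current one.

```java
import java.util.BitSet;

public class AdvanceSemanticsSketch {
    /** Same doc ids as in the test above: 0, 5, 6, 10. */
    static BitSet sampleDocs() {
        BitSet bits = new BitSet();
        bits.set(0);
        bits.set(5);
        bits.set(6);
        bits.set(10);
        return bits;
    }

    /** "On or after" reading: delegates directly to BitSet.nextSetBit(target). */
    static int advanceOnOrAfter(BitSet bits, int target) {
        return bits.nextSetBit(target); // -1 when no such bit exists
    }

    /** "Beyond current" reading: never returns a doc at or before current. */
    static int advanceBeyondCurrent(BitSet bits, int current, int target) {
        return bits.nextSetBit(Math.max(target, current + 1));
    }

    public static void main(String[] args) {
        BitSet bits = sampleDocs();
        // The iterator is positioned on doc 5 and the caller asks for advance(5).
        System.out.println(advanceOnOrAfter(bits, 5));        // prints 5
        System.out.println(advanceBeyondCurrent(bits, 5, 5)); // prints 6
    }
}
```

With the iterator already on doc 5, the first reading stays on 5 while the second moves to 6, which is the discrepancy the test in the issue description exposes.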
[jira] [Commented] (LUCENE-3034) If you vary a setting per round and that setting is a long string, the report padding/columns break down.
[ https://issues.apache.org/jira/browse/LUCENE-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032499#comment-13032499 ] Doron Cohen commented on LUCENE-3034: - bq. My original workaround was to simply pad the column name Yeah that's what I meant, so ok, better formatting will help. > If you vary a setting per round and that setting is a long string, the report > padding/columns break down. > - > > Key: LUCENE-3034 > URL: https://issues.apache.org/jira/browse/LUCENE-3034 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/benchmark >Reporter: Mark Miller >Assignee: Mark Miller >Priority: Trivial > Fix For: 3.1.1, 4.0 > > > This is especially noticeable if you vary a setting where the value is a > fully specified class name - in this case, it would be nice if columns in > each row still lined up. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3034) If you vary a setting per round and that setting is a long string, the report padding/columns break down.
[ https://issues.apache.org/jira/browse/LUCENE-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032453#comment-13032453 ] Doron Cohen commented on LUCENE-3034: - Hi Mark, could you add an example algorithm with this behavior? Also, this is from the package javadocs: {code} # multi val params are iterated by NewRound's, added to reports, start with column name. merge.factor=mrg:10:20 max.buffered=buf:100:1000 {code} Is it possible to workaround the problem by specifying a sufficiently long column name as the first value, that is, replacing e.g. 'mrg' or 'buf' in the above? > If you vary a setting per round and that setting is a long string, the report > padding/columns break down. > - > > Key: LUCENE-3034 > URL: https://issues.apache.org/jira/browse/LUCENE-3034 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/benchmark >Reporter: Mark Miller >Assignee: Mark Miller >Priority: Trivial > Fix For: 3.1.1, 4.0 > > > This is especially noticeable if you vary a setting where the value is a > fully specified class name - in this case, it would be nice if columns in > each row still lined up. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
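As an illustration of what "better formatting" could mean here, a sketch (not the benchmark module's report code; the class, method names and sample values are all hypothetical) that derives each column width from the longest cell in that column, header included, so a long value no longer pushes the following columns out of line:

```java
import java.util.Arrays;
import java.util.List;

public class ReportColumnsSketch {
    /** Width of each column = longest cell in that column (at least 1). */
    static int[] columnWidths(List<String[]> rows) {
        int[] widths = new int[rows.get(0).length];
        Arrays.fill(widths, 1);
        for (String[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                widths[i] = Math.max(widths[i], row[i].length());
            }
        }
        return widths;
    }

    /** Left-aligns every cell to its column width, two spaces between columns. */
    static String formatRow(String[] cells, int[] widths) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < cells.length; i++) {
            if (i > 0) sb.append("  ");
            sb.append(String.format("%-" + widths[i] + "s", cells[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical report: header row, then one row per round.
        List<String[]> rows = Arrays.asList(
            new String[] {"mrg", "dir", "recs"},
            new String[] {"10", "org.example.SomeVeryLongDirectoryClass", "1000"},
            new String[] {"20", "org.example.Short", "2000"});
        int[] widths = columnWidths(rows);
        for (String[] row : rows) {
            System.out.println(formatRow(row, widths));
        }
    }
}
```

Padding the column name by hand, as suggested above, amounts to fixing the width up front; computing it from the values avoids having to guess the longest one.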
[jira] [Updated] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
[ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-3068: Attachment: LUCENE-3068.patch Patch with more test cases - AND/OR logic for MPQ is combined, and test code made simpler. > The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at > same position > -- > > Key: LUCENE-3068 > URL: https://issues.apache.org/jira/browse/LUCENE-3068 > Project: Lucene - Java > Issue Type: Bug > Components: Search >Affects Versions: 3.0.3, 3.1, 4.0 >Reporter: Michael McCandless >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch, > LUCENE-3068.patch > > > In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was > matching docs that it shouldn't; but I think those changes caused it > to fail to match docs that it should, specifically when the doc itself > has tokens at the same position. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
[ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029274#comment-13029274 ] Doron Cohen commented on LUCENE-3068: - Thanks for reviewing Shai! I updated the patch with random newDirectory and newICFG - not the focus here, but may improve coverage anyhow. I also added tests for the combined case - some AND some OR - that is, using MPQ, some add() with a single term (AND), some with an array longer than 1 (OR). Also refactored the tests a bit so that now there's a small test method for each test case. > The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at > same position > -- > > Key: LUCENE-3068 > URL: https://issues.apache.org/jira/browse/LUCENE-3068 > Project: Lucene - Java > Issue Type: Bug > Components: Search >Affects Versions: 3.0.3, 3.1, 4.0 >Reporter: Michael McCandless >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch > > > In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was > matching docs that it shouldn't; but I think those changes caused it > to fail to match docs that it should, specifically when the doc itself > has tokens at the same position. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
[ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-3068: Attachment: LUCENE-3068.patch Attached patch fixes this bug by excluding from the repeats check those PPs that originated from the same offset in the query. This allows more strict phrase queries: strict on terms in same position (AND logic) but still sloppy. All tests pass, this is ready to go in (unless there are reservations). > The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at > same position > -- > > Key: LUCENE-3068 > URL: https://issues.apache.org/jira/browse/LUCENE-3068 > Project: Lucene - Java > Issue Type: Bug > Components: Search >Affects Versions: 3.0.3, 3.1, 4.0 >Reporter: Michael McCandless >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: LUCENE-3068.patch, LUCENE-3068.patch, LUCENE-3068.patch > > > In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was > matching docs that it shouldn't; but I think those changes caused it > to fail to match docs that it should, specifically when the doc itself > has tokens at the same position. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
[ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029150#comment-13029150 ] Doron Cohen commented on LUCENE-3068: - This is more complex than I originally thought. # QueryParser creates a MultiPhraseQuery (MPQ) when one of the (phrase) query positions is a multi-term. # MPQ has an implicit OR behavior - it is used for e.g. wildcarding a phrase query. # PhraseQuery (PQ) sloppy scorer assumes each query position has a single term. # PQ with several terms in same position cannot be created by parsing it with a QP, only manually. Manually created, it would have AND semantics: only docs with ALL the terms in pos N should match. In other words, assume doc D terms and positions are: a:0 b:1 c:1 d:2 MPQ for (a,b):0 d:1 should match D, finding the phrase b:1 d:2 (OR semantics) PQ for (a,b):0 d:1 should not match D, because it does not contain 'a' and 'b' in the same position (AND semantics). Therefore, rewriting PQ into MPQ is not a valid fix, because it would replace the AND logic assumed by creating the PQ this way with the OR logic assumed in MPQ. {code:title=TestPositionIncrement.testSetPosition has a test for this case exactly} // phrase query should fail for non existing searched term // even if there exist another searched terms in the same searched position. q = new PhraseQuery(); q.add(new Term("field", "3"),0); q.add(new Term("field", "9"),0); hits = searcher.search(q, null, 1000).scoreDocs; assertEquals(0, hits.length); {code} Although QP by default will not create this PQ, I think we need to support it, for applications needing to be strict with the search results, with slop. So fixing this would need to take place inside SloppyScorer, digging further... 
> The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at > same position > -- > > Key: LUCENE-3068 > URL: https://issues.apache.org/jira/browse/LUCENE-3068 > Project: Lucene - Java > Issue Type: Bug > Components: Search >Affects Versions: 3.0.3, 3.1, 4.0 >Reporter: Michael McCandless >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: LUCENE-3068.patch, LUCENE-3068.patch > > > In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was > matching docs that it shouldn't; but I think those changes caused it > to fail to match docs that it should, specifically when the doc itself > has tokens at the same position. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
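To make the AND vs. OR distinction above concrete, here is a toy model of the example document D (a:0 b:1 c:1 d:2) and the query (a,b):0 d:1. It is a sketch only: it does exact (slop 0) matching over an in-memory map, it is not how PhraseQuery or MultiPhraseQuery are implemented, and all names in it are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SamePositionSemanticsSketch {
    /** Doc D from the comment above: a:0 b:1 c:1 d:2 (position -> terms). */
    static Map<Integer, Set<String>> docD() {
        Map<Integer, Set<String>> doc = new HashMap<>();
        doc.put(0, new HashSet<>(Arrays.asList("a")));
        doc.put(1, new HashSet<>(Arrays.asList("b", "c")));
        doc.put(2, new HashSet<>(Arrays.asList("d")));
        return doc;
    }

    /**
     * Exact phrase match. query.get(i) holds the terms at query position i.
     * requireAll = true models AND semantics (PQ with terms at same position),
     * requireAll = false models OR semantics (MPQ).
     */
    static boolean matches(Map<Integer, Set<String>> doc, List<List<String>> query,
                           boolean requireAll) {
        for (int start : doc.keySet()) {
            boolean ok = true;
            for (int i = 0; i < query.size() && ok; i++) {
                Set<String> atPos = doc.getOrDefault(start + i, Collections.<String>emptySet());
                List<String> terms = query.get(i);
                ok = requireAll
                    ? atPos.containsAll(terms)
                    : terms.stream().anyMatch(atPos::contains);
            }
            if (ok) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<List<String>> query = Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("d"));
        System.out.println(matches(docD(), query, false)); // OR: finds b:1 d:2
        System.out.println(matches(docD(), query, true));  // AND: no position has both a and b
    }
}
```

Under the OR reading the query matches D through b:1 d:2; under the AND reading it does not, since no position holds both 'a' and 'b'. A query for (b,c):0 d:1 would match under both readings.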
[jira] [Updated] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
[ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-3068: Attachment: LUCENE-3068.patch Attached modified version of the test - one that invokes the query parser to create an MPQ. The test passes. > The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at > same position > -- > > Key: LUCENE-3068 > URL: https://issues.apache.org/jira/browse/LUCENE-3068 > Project: Lucene - Java > Issue Type: Bug > Components: Search >Affects Versions: 3.0.3, 3.1, 4.0 >Reporter: Michael McCandless >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: LUCENE-3068.patch, LUCENE-3068.patch > > > In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was > matching docs that it shouldn't; but I think those changes caused it > to fail to match docs that it should, specifically when the doc itself > has tokens at the same position. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
[ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028895#comment-13028895 ] Doron Cohen commented on LUCENE-3068: - bq. specifically when the doc itself has tokens at the same position. I am not yet convinced that there is a bug here; I think the code does allow this. There is another assumption in the code, that any two different PPs are in different TPs, which underlies the assumption that each PP originally differs in position. This seems a valid assumption, because the QP will create an MPQ if two terms in the (phrase) query have the same position. bq. maybe any time a *PhraseQuery has overlapping positions, we should rewrite to a MultiPhraseQuery and let it handle the same positions...? Is there any downside to that? I think this is the correct behavior; in particular, this is the query that a QP will create. The only way to create a PQ (not MPQ) for PPs in the same positions is to create it manually. But why would anyone do that? And if they did, wouldn't such a rewrite be a surprise to them? A patch will follow with a revised version of this test, one that uses the QP. In this patch the QP indeed creates an MPQ, and I am as yet unable to make it fail. Still trying. > The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at > same position > -- > > Key: LUCENE-3068 > URL: https://issues.apache.org/jira/browse/LUCENE-3068 > Project: Lucene - Java > Issue Type: Bug > Components: Search >Affects Versions: 3.0.3, 3.1, 4.0 >Reporter: Michael McCandless >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: LUCENE-3068.patch > > > In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was > matching docs that it shouldn't; but I think those changes caused it > to fail to match docs that it should, specifically when the doc itself > has tokens at the same position.
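The idea discussed in the comment above, that query terms sharing a position are treated as alternatives (as a MultiPhraseQuery does), can be sketched with a small toy model. This is only an illustration under stated assumptions: it does not use Lucene, it is not how SloppyPhraseScorer is implemented, it handles exact (slop 0) matching only, and the class and method names (`SamePositionPhraseSketch`, `groupByPosition`, `matches`) are hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy model only; none of these names come from Lucene. */
public class SamePositionPhraseSketch {

    /** Groups (term, position) pairs so that terms sharing a position become alternatives. */
    public static TreeMap<Integer, Set<String>> groupByPosition(String[] terms, int[] positions) {
        TreeMap<Integer, Set<String>> grouped = new TreeMap<>();
        for (int i = 0; i < terms.length; i++) {
            grouped.computeIfAbsent(positions[i], k -> new HashSet<>()).add(terms[i]);
        }
        return grouped;
    }

    /**
     * Exact (slop 0) match: returns true if, for some start position in the doc,
     * every query position finds at least one of its alternative terms at the
     * corresponding doc position.
     */
    public static boolean matches(TreeMap<Integer, Set<String>> doc,
                                  TreeMap<Integer, Set<String>> query) {
        if (query.isEmpty()) {
            return false;
        }
        int firstQueryPos = query.firstKey();
        for (int start : doc.keySet()) {
            boolean allFound = true;
            for (Map.Entry<Integer, Set<String>> e : query.entrySet()) {
                Set<String> tokens = doc.get(start + e.getKey() - firstQueryPos);
                if (tokens == null || Collections.disjoint(tokens, e.getValue())) {
                    allFound = false;
                    break;
                }
            }
            if (allFound) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Doc with two tokens at position 0 (for example a synonym) and one at position 1.
        TreeMap<Integer, Set<String>> doc =
            groupByPosition(new String[] {"quick", "fast", "fox"}, new int[] {0, 0, 1});

        // Query whose first two terms share position 0, so they are alternatives.
        TreeMap<Integer, Set<String>> overlapping =
            groupByPosition(new String[] {"quick", "fast", "fox"}, new int[] {0, 0, 1});
        // Query with the terms in the wrong order.
        TreeMap<Integer, Set<String>> reversed =
            groupByPosition(new String[] {"fox", "quick"}, new int[] {0, 1});

        System.out.println(matches(doc, overlapping));
        System.out.println(matches(doc, reversed));
    }
}
```

Grouping by position up front keeps the matching loop free of special cases for repeated positions, which is the appeal of rewriting an overlapping phrase query into a multi-phrase form.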
[jira] [Assigned] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
[ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen reassigned LUCENE-3068: --- Assignee: Doron Cohen > The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at > same position > -- > > Key: LUCENE-3068 > URL: https://issues.apache.org/jira/browse/LUCENE-3068 > Project: Lucene - Java > Issue Type: Bug > Components: Search >Affects Versions: 3.0.3, 3.1, 4.0 >Reporter: Michael McCandless >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: LUCENE-3068.patch > > > In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was > matching docs that it shouldn't; but I think those changes caused it > to fail to match docs that it should, specifically when the doc itself > has tokens at the same position.
[jira] [Updated] (LUCENE-3010) Add the ability for the Lucene Benchmark code to read Solr configuration information for testing Analyzer/Filter Chains
[ https://issues.apache.org/jira/browse/LUCENE-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-3010: Description: I would like to be able to use the Lucene Benchmark code in Lucene contrib with Solr to run some indexing tests. It would be nice if Lucene Benchmark could read my Solr configuration rather than having to translate my filter chain and other parameters into Lucene java code. This relates to LUCENE-2845, (was: I would like to be able to use the Lucene Benchmark code in Lucene contrib with Solr to run some indexing tests. It would be nice if Lucene Benchmark could read my Solr configuration rather than having to translate my filter chain and other parameters into Lucene java code. This relates to Lucene 2845, ) > Add the ability for the Lucene Benchmark code to read Solr configuration > information for testing Analyzer/Filter Chains > > > Key: LUCENE-3010 > URL: https://issues.apache.org/jira/browse/LUCENE-3010 > Project: Lucene - Java > Issue Type: Wish > Components: contrib/benchmark >Reporter: Tom Burton-West >Priority: Trivial > > I would like to be able to use the Lucene Benchmark code in Lucene contrib > with Solr to run some indexing tests. It would be nice if Lucene Benchmark > could read my Solr configuration rather than having to translate my filter > chain and other parameters into Lucene java code. This relates to > LUCENE-2845,
[jira] [Commented] (LUCENE-2952) Make license checking/maintenance easier/automated
[ https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010850#comment-13010850 ] Doron Cohen commented on LUCENE-2952: - Eclipse complains about the top-level common-build.xml (trunk, 3x) that the default target "validate" does not exist in the project: {code} > Make license checking/maintenance easier/automated > -- > > Key: LUCENE-2952 > URL: https://issues.apache.org/jira/browse/LUCENE-2952 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, > LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch > > > Instead of waiting until release to check licenses are valid, we should make > it a part of our build process to ensure that all dependencies have proper > licenses, etc.
[jira] [Resolved] (LUCENE-2988) trunk 'ant test' hangs
[ https://issues.apache.org/jira/browse/LUCENE-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-2988. - Resolution: Cannot Reproduce Lucene Fields: (was: [New]) Could not reproduce after moving from ant 1.8.1 to 1.7.1. > trunk 'ant test' hangs > -- > > Key: LUCENE-2988 > URL: https://issues.apache.org/jira/browse/LUCENE-2988 > Project: Lucene - Java > Issue Type: Bug > Components: Tests > Environment: inspected so far on XP within Cygwin using IBM JDK 6 >Reporter: Doron Cohen >Assignee: Doron Cohen > Fix For: 4.0 > > Attachments: 5-java-dumps.zip > > > Running 'ant test' from trunk on XP in a Cygwin shell hangs. > There was no progress in the console for a long time, so I stopped the > program. > Before stopping it, I created 5 consecutive thread dumps to see where the code > is. > It is not clear what is going on - it does not seem to be Lucene code, I think, > but I am not sure. > Opening this issue to keep an eye on this - I will try with other JDKs to see > if this is persistent. > Also, when first seeing this I had local changes from two issues: LUCENE-2986 and > LUCENE-2977 - I think the changes in these issues are related but will repeat > the tests without these changes.
[jira] [Commented] (LUCENE-2988) trunk 'ant test' hangs
[ https://issues.apache.org/jira/browse/LUCENE-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010754#comment-13010754 ] Doron Cohen commented on LUCENE-2988: - Thanks Robert for looking into this! bq. it seems they are from the 'ant' jvm, not the forked process that actually runs the tests Indeed, now I notice that too... when 'ant test' hung I took the thread dumps and did not realize that the ant JVM stands in between... That console is lost and I don't know the exact location in the tests when that happened - too bad. bq. It could be the case here that it's just in some ultra-slow test such as TestIndexWriterOnDiskFull... this one frequently takes several minutes for me even on Sun JRE. I was away for half an hour so it's something slower... I was not able to reproduce this, and in the meantime I also found out that I was using an incompatible ant version - 1.8.1. So I am thinking of closing this for now, and will reopen it if it reappears with ant 1.7.1. > trunk 'ant test' hangs > -- > > Key: LUCENE-2988 > URL: https://issues.apache.org/jira/browse/LUCENE-2988 > Project: Lucene - Java > Issue Type: Bug > Components: Tests > Environment: inspected so far on XP within Cygwin using IBM JDK 6 >Reporter: Doron Cohen >Assignee: Doron Cohen > Fix For: 4.0 > > Attachments: 5-java-dumps.zip > > > Running 'ant test' from trunk on XP in a Cygwin shell hangs. > There was no progress in the console for a long time, so I stopped the > program. > Before stopping it, I created 5 consecutive thread dumps to see where the code > is. > It is not clear what is going on - it does not seem to be Lucene code, I think, > but I am not sure. > Opening this issue to keep an eye on this - I will try with other JDKs to see > if this is persistent. > Also, when first seeing this I had local changes from two issues: LUCENE-2986 and > LUCENE-2977 - I think the changes in these issues are related but will repeat > the tests without these changes.
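The point in the comment above, that the thread dumps came from the ant JVM rather than from the forked JVM that runs the tests, suggests capturing the dump from inside the JVM of interest. The following is a minimal sketch of that idea using the standard `Thread.getAllStackTraces()` call; the class name `ForkedJvmThreadDump` and the method `dumpAllThreads` are hypothetical and are not part of Lucene or Ant, and how such a helper would be wired into a test run is left open.

```java
import java.util.Map;

/** Hypothetical helper, not part of Lucene or Ant. */
public class ForkedJvmThreadDump {

    /** Renders the stack of every live thread in the JVM that calls this method. */
    public static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            // One header line per thread, followed by its stack frames.
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Because the dump is produced by the calling process itself, it describes the JVM that is actually executing the code, whichever process launched it.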