Re: [ANNOUNCE] Apache PyLucene 3.1.0
Thanks! BTW, originally I went this way: http://lucene.apache.org/pylucene/jcc/documentation/install.html , but it is not up to date and trunk doesn't seem to be compilable (problems with the ant XML files, plus no doc directory, etc.). Then I used the tar and it worked like a charm.

best regards
--
Valery A. Khamenya

On Fri, Apr 8, 2011 at 5:16 AM, dar...@ontrenet.com wrote:
> Congrats Andi. A truly awesome project.

On Thu, 7 Apr 2011 20:02:22 -0700 (PDT), Andi Vajda va...@apache.org wrote:
> I am pleased to announce the availability of Apache PyLucene 3.1.0.
>
> Apache PyLucene, a subproject of Apache Lucene, is a Python extension for accessing Apache Lucene Core. Its goal is to allow you to use Lucene's text indexing and searching capabilities from Python. It is API compatible with the latest version of Lucene Core, 3.1.0.
>
> This release contains a number of bug fixes and improvements. Details can be found in the changes files:
>
> http://svn.apache.org/repos/asf/lucene/pylucene/tags/pylucene_3_1_0/CHANGES
> http://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc/CHANGES
>
> Apache PyLucene is available from the following download page:
>
> http://www.apache.org/dyn/closer.cgi/lucene/pylucene/pylucene-3.1.0-1-src.tar.gz
>
> When downloading from a mirror site, please remember to verify the downloads using signatures found on the Apache site:
>
> http://www.apache.org/dist/lucene/pylucene/KEYS
>
> For more information on Apache PyLucene, visit the project home page:
>
> http://lucene.apache.org/pylucene
>
> Andi..
Re: pylucene in twistd app running as daemon
On Thu, 14 Apr 2011, Marcus wrote:
> In a previous post, a user expressed problems running pylucene in a twistd app (running as a daemon). I have similar issues; I didn't see an answer in the postings. Everything works fine when twistd runs in foreground mode (twistd --nodaemon). The trick would seem to be knowing where is the right place to run initVM.

initVM() _must_ be run from the main thread before calling anything involving the embedded JVM.

env.attachCurrentThread() _must_ be run from any other thread before calling anything involving the embedded JVM.

Andi..
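The two rules above can be sketched in Python. initVM(), getVMEnv() and attachCurrentThread() are the real PyLucene names, but the FakeVMEnv stand-in below is hypothetical so the pattern runs without a JVM; in a real twistd app you would call lucene.initVM() once at startup in the main thread, and env.attachCurrentThread() at the top of every threadpool callback that touches Lucene.

```python
import threading

# Hypothetical stand-in for the object returned by lucene.initVM() /
# lucene.getVMEnv(); it only records which threads attached themselves.
class FakeVMEnv:
    def __init__(self):
        self.attached = set()

    def attachCurrentThread(self):
        self.attached.add(threading.current_thread().name)

def worker(env, results):
    # Rule 2: any non-main thread must attach itself to the JVM
    # before making any Lucene call.
    env.attachCurrentThread()
    results.append("searched from " + threading.current_thread().name)

# Rule 1: initialize once, from the main thread.
env = FakeVMEnv()  # real code: env = lucene.initVM()

results = []
threads = [threading.Thread(target=worker, args=(env, results),
                            name="twistd-pool-%d" % i)
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In a Twisted deferToThread callback the attachCurrentThread() call belongs at the top of the function, since the reactor may run it on any pool thread.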
[jira] [Commented] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019713#comment-13019713 ]

Tommaso Teofili commented on SOLR-2436:
---

Thanks Uwe for the useful clarification regarding XML resources loading. I agree having such information on the wiki would be good.

bq. If it is going to commit, it breaks back-compat. I think we need a note for users in CHANGES.txt.

Yes, and we need to align the README and wiki too.

move uimaConfig to under the uima's update processor in solrconfig.xml
--
Key: SOLR-2436
URL: https://issues.apache.org/jira/browse/SOLR-2436
Project: Solr
Issue Type: Improvement
Affects Versions: 3.1
Reporter: Koji Sekiguchi
Priority: Minor
Attachments: SOLR-2436-3.patch, SOLR-2436.patch, SOLR-2436.patch, SOLR-2436.patch, SOLR-2436_2.patch

Solr contrib UIMA has its config just beneath config. I think it should move to uima's update processor tag.

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019715#comment-13019715 ]

Uwe Schindler commented on SOLR-2436:
-

bq. When I did it, I missed something. Thank you for the alarm.

The code in DataImporter.java and DataImportHandler.java is a very bad example because it also supports loading from StringReaders and such stuff solely for testing purposes. Because of this the code is very complicated and parts of it are in both files. The examples in Solr core are much better.
[jira] [Created] (LUCENE-3027) TestOmitTf.testMixedMerge random seed failure
TestOmitTf.testMixedMerge random seed failure
-
Key: LUCENE-3027
URL: https://issues.apache.org/jira/browse/LUCENE-3027
Project: Lucene - Java
Issue Type: Bug
Reporter: selckin

Version: trunk r1091638

ant test -Dtests.seed=-6595054217575280191:5576532348905930588

[junit] - Standard Error -
[junit] WARNING: test method: 'testDeMorgan' left thread running: Thread[NRT search threads-1691-thread-2,5,main]
[junit] RESOURCE LEAK: test method: 'testDeMorgan' left 1 thread(s) running
[junit] NOTE: reproduce with: ant test -Dtestcase=TestBooleanQuery -Dtestmethod=testDeMorgan -Dtests.seed=-6595054217575280191:5576532348905930588
[junit] - ---
[junit] Testsuite: org.apache.lucene.index.TestNorms
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 5.064 sec
[junit]
[junit] Testsuite: org.apache.lucene.index.TestOmitTf
[junit] Testcase: testMixedMerge(org.apache.lucene.index.TestOmitTf): Caused an ERROR
[junit] CheckIndex failed
[junit] java.lang.RuntimeException: CheckIndex failed
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:152)
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:138)
[junit] at org.apache.lucene.index.TestOmitTf.testMixedMerge(TestOmitTf.java:155)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)
[junit]
[junit]
[junit] Tests run: 5, Failures: 0, Errors: 1, Time elapsed: 0.851 sec
[junit]
[junit] - Standard Output ---
[junit] CheckIndex failed
[junit] Segments file=segments_1 numSegments=1 version=FORMAT_4_0 [Lucene 4.0]
[junit] 1 of 1: name=_12 docCount=60
[junit] codec=SegmentCodecs [codecs=[MockRandom, MockVariableIntBlock(baseBlockSize=112)], provider=RandomCodecProvider: {f1=MockRandom, f2=MockVariableIntBlock(baseBlockSize=112)}]
[junit] compound=false
[junit] hasProx=false
[junit] numFiles=16
[junit] size (MB)=0,01
[junit] diagnostics = {optimize=true, mergeFactor=2, os.version=2.6.37-gentoo, os=Linux, lucene.version=4.0-SNAPSHOT, source=merge, os.arch=amd64, java.version=1.6.0_24, java.vendor=Sun Microsystems Inc.}
[junit] no deletions
[junit] test: open reader.OK
[junit] test: fields..OK [2 fields]
[junit] test: field norms.OK [2 fields]
[junit] test: terms, freq, prox...ERROR: java.io.IOException: Read past EOF
[junit] java.io.IOException: Read past EOF
[junit] at org.apache.lucene.store.RAMInputStream.switchCurrentBuffer(RAMInputStream.java:90)
[junit] at org.apache.lucene.store.RAMInputStream.readByte(RAMInputStream.java:63)
[junit] at org.apache.lucene.store.MockIndexInputWrapper.readByte(MockIndexInputWrapper.java:105)
[junit] at org.apache.lucene.store.DataInput.readVInt(DataInput.java:94)
[junit] at org.apache.lucene.index.codecs.sep.SepSkipListReader.readSkipData(SepSkipListReader.java:188)
[junit] at org.apache.lucene.index.codecs.MultiLevelSkipListReader.loadNextSkip(MultiLevelSkipListReader.java:142)
[junit] at org.apache.lucene.index.codecs.MultiLevelSkipListReader.skipTo(MultiLevelSkipListReader.java:112)
[junit] at org.apache.lucene.index.codecs.sep.SepPostingsReaderImpl$SepDocsEnum.advance(SepPostingsReaderImpl.java:454)
[junit] at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:782)
[junit] at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:495)
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:148)
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:138)
[junit] at org.apache.lucene.index.TestOmitTf.testMixedMerge(TestOmitTf.java:155)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
[junit] at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
[junit] at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
[junit] at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
[junit] at
[jira] [Updated] (LUCENE-3027) TestOmitTf.testMixedMerge random seed failure
[ https://issues.apache.org/jira/browse/LUCENE-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

selckin updated LUCENE-3027:
Attachment: output.txt

ant output
[HUDSON] Lucene-Solr-tests-only-3.x - Build # 7088 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7088/

1 tests failed.

REGRESSION: org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest.testCommitWithin

Error Message:
expected:<0> but was:<1>

Stack Trace:
junit.framework.AssertionFailedError: expected:<0> but was:<1>
at org.apache.solr.client.solrj.SolrExampleTests.testCommitWithin(SolrExampleTests.java:327)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1082)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1010)

Build Log (for compile errors):
[...truncated 10717 lines...]
[jira] [Commented] (LUCENE-3027) TestOmitTf.testMixedMerge random seed failure
[ https://issues.apache.org/jira/browse/LUCENE-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019726#comment-13019726 ]

Robert Muir commented on LUCENE-3027:
-

Thanks for reporting this, I can reproduce on windows also, looks serious. Might it be triggered by the fact that we recently started randomizing skipInterval?

Note, the test will NOT fail if you try the repro line!!! You have to do 'ant test -Dtests.seed=-6595054217575280191:5576532348905930588'. I don't know if this causes a timing issue or what, but it works for me too:

{noformat}
[junit] Testsuite: org.apache.lucene.index.TestOmitTf
[junit] Testcase: testMixedMerge(org.apache.lucene.index.TestOmitTf): Caused an ERROR
[junit] CheckIndex failed
[junit] java.lang.RuntimeException: CheckIndex failed
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:152)
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:138)
[junit] at org.apache.lucene.index.TestOmitTf.testMixedMerge(TestOmitTf.java:155)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)
[junit]
[junit]
[junit] Tests run: 5, Failures: 0, Errors: 1, Time elapsed: 0.284 sec
[junit]
[junit] - Standard Output ---
[junit] CheckIndex failed
[junit] Segments file=segments_1 numSegments=1 version=FORMAT_4_0 [Lucene 4.0]
[junit] 1 of 1: name=_12 docCount=60
[junit] codec=SegmentCodecs [codecs=[MockRandom, MockVariableIntBlock(baseBlockSize=112)], provider=RandomCodecProvider: {f1=MockRandom, f2=MockVariableIntBlock(baseBlockSize=112)}]
[junit] compound=false
[junit] hasProx=false
[junit] numFiles=16
[junit] size (MB)=0,01
[junit] diagnostics = {optimize=true, mergeFactor=2, os.version=6.0, os=Windows Vista, lucene.version=4.0-SNAPSHOT, source=merge, os.arch=x86, java.version=1.6.0_23, java.vendor=Sun Microsystems Inc.}
[junit] no deletions
[junit] test: open reader.OK
[junit] test: fields..OK [2 fields]
[junit] test: field norms.OK [2 fields]
[junit] test: terms, freq, prox...ERROR: java.io.IOException: Read past EOF
[junit] java.io.IOException: Read past EOF
[junit] at org.apache.lucene.store.RAMInputStream.switchCurrentBuffer(RAMInputStream.java:90)
[junit] at org.apache.lucene.store.RAMInputStream.readByte(RAMInputStream.java:63)
[junit] at org.apache.lucene.store.MockIndexInputWrapper.readByte(MockIndexInputWrapper.java:105)
[junit] at org.apache.lucene.store.DataInput.readVInt(DataInput.java:94)
[junit] at org.apache.lucene.index.codecs.sep.SepSkipListReader.readSkipData(SepSkipListReader.java:188)
[junit] at org.apache.lucene.index.codecs.MultiLevelSkipListReader.loadNextSkip(MultiLevelSkipListReader.java:142)
[junit] at org.apache.lucene.index.codecs.MultiLevelSkipListReader.skipTo(MultiLevelSkipListReader.java:112)
[junit] at org.apache.lucene.index.codecs.sep.SepPostingsReaderImpl$SepDocsEnum.advance(SepPostingsReaderImpl.java:454)
[junit] at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:782)
{noformat}
[jira] [Commented] (LUCENE-3027) TestOmitTf.testMixedMerge random seed failure
[ https://issues.apache.org/jira/browse/LUCENE-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019730#comment-13019730 ]

Michael McCandless commented on LUCENE-3027:

Nice find!! This seems to fail for me consistently:

{noformat}
ant test-core -Dtestcase=TestOmitTf -Dtestmethod=testMixedMerge -Dtests.seed=-6440890546631805798:9110494168610462642
{noformat}

I'll hunt...
[jira] [Commented] (LUCENE-3027) TestOmitTf.testMixedMerge random seed failure
[ https://issues.apache.org/jira/browse/LUCENE-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019748#comment-13019748 ]

Michael McCandless commented on LUCENE-3027:

I found the issue: a FieldInfo got into a bad state where omitTF was true but storesPayloads was also true, and the sep codec's skip data reader was tricked by this bad state. StandardCodec is not affected, and 3.x is not affected. Still, for 3.x I'll backport making sure FieldInfo always clears storesPayloads if omitTF is true.
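The fix described above boils down to an invariant: a field that omits term frequencies/positions has no positions stream, so it cannot store payloads. A minimal illustrative sketch of that invariant (the attribute names mirror Lucene's FieldInfo, but this update logic is a hypothetical stand-in, not Lucene's actual merge code):

```python
# Hypothetical sketch of the FieldInfo invariant behind LUCENE-3027:
# payloads live in the positions stream, so omitTF == True must force
# storePayloads == False. Names mirror the Java FieldInfo; the flag
# merging below is illustrative only.

class FieldInfo:
    def __init__(self, name, omit_tf=False, store_payloads=False):
        self.name = name
        self.omit_tf = omit_tf
        # Enforce the invariant at construction time.
        self.store_payloads = store_payloads and not omit_tf

    def update(self, omit_tf, store_payloads):
        # When segments are merged/flushed the flags are OR-ed together;
        # if omitTF becomes true, storePayloads must be cleared.
        self.omit_tf = self.omit_tf or omit_tf
        self.store_payloads = ((self.store_payloads or store_payloads)
                               and not self.omit_tf)

# One segment stored payloads for f1; another omitted TF for f1.
# After combining the flags, payloads must be dropped, because the
# merged field no longer has a positions stream to hold them.
fi = FieldInfo("f1", omit_tf=False, store_payloads=True)
fi.update(omit_tf=True, store_payloads=False)
```

Without the `and not self.omit_tf` clearing step, the merged FieldInfo would claim both flags at once, which is the bad state that tricked the sep codec's skip-list reader.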
[jira] [Updated] (LUCENE-3022) DictionaryCompoundWordTokenFilter Flag onlyLongestMatch has no affect
[ https://issues.apache.org/jira/browse/LUCENE-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johann Höchtl updated LUCENE-3022:
--
Attachment: LUCENE-3022.patch

Patch fixing this issue, including a JUnit test.

DictionaryCompoundWordTokenFilter flag onlyLongestMatch has no effect
-
Key: LUCENE-3022
URL: https://issues.apache.org/jira/browse/LUCENE-3022
Project: Lucene - Java
Issue Type: Bug
Components: contrib/analyzers
Affects Versions: 2.9.4, 3.1
Reporter: Johann Höchtl
Priority: Minor
Attachments: LUCENE-3022.patch
Original Estimate: 5m
Remaining Estimate: 5m

When using the DictionaryCompoundWordTokenFilter with a German dictionary, I got a strange behaviour: the German word streifenbluse (blouse with stripes) was decompounded to streifen (stripe), reifen (tire), which makes no sense at all. I thought the flag onlyLongestMatch would fix this, because streifen is longer than reifen, but it had no effect. So I reviewed the source code and found the problem:

[code]
protected void decomposeInternal(final Token token) {
  // Only words longer than minWordSize get processed
  if (token.length() < this.minWordSize) {
    return;
  }
  char[] lowerCaseTermBuffer = makeLowerCaseCopy(token.buffer());
  for (int i = 0; i < token.length() - this.minSubwordSize; ++i) {
    Token longestMatchToken = null;
    for (int j = this.minSubwordSize - 1; j < this.maxSubwordSize; ++j) {
      if (i + j > token.length()) {
        break;
      }
      if (dictionary.contains(lowerCaseTermBuffer, i, j)) {
        if (this.onlyLongestMatch) {
          if (longestMatchToken != null) {
            if (longestMatchToken.length() < j) {
              longestMatchToken = createToken(i, j, token);
            }
          } else {
            longestMatchToken = createToken(i, j, token);
          }
        } else {
          tokens.add(createToken(i, j, token));
        }
      }
    }
    if (this.onlyLongestMatch && longestMatchToken != null) {
      tokens.add(longestMatchToken);
    }
  }
}
[/code]

should be changed to

[code]
protected void decomposeInternal(final Token token) {
  // Only words longer than minWordSize get processed
  if (token.termLength() < this.minWordSize) {
    return;
  }
  char[] lowerCaseTermBuffer = makeLowerCaseCopy(token.termBuffer());
  Token longestMatchToken = null;
  for (int i = 0; i < token.termLength() - this.minSubwordSize; ++i) {
    for (int j = this.minSubwordSize - 1; j < this.maxSubwordSize; ++j) {
      if (i + j > token.termLength()) {
        break;
      }
      if (dictionary.contains(lowerCaseTermBuffer, i, j)) {
        if (this.onlyLongestMatch) {
          if (longestMatchToken != null) {
            if (longestMatchToken.termLength() < j) {
              longestMatchToken = createToken(i, j, token);
            }
          } else {
            longestMatchToken = createToken(i, j, token);
          }
        } else {
          tokens.add(createToken(i, j, token));
        }
      }
    }
  }
  if (this.onlyLongestMatch && longestMatchToken != null) {
    tokens.add(longestMatchToken);
  }
}
[/code]

So that only the longest token is really indexed and the onlyLongestMatch flag makes sense.
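The intent of the fixed version can be illustrated with a toy Python port of the loop (a hypothetical sketch: the real filter is Lucene's Java DictionaryCompoundWordTokenFilter; the decompose() helper, the dictionary set, and the size limits below are made up for illustration). Tracking a single longestMatchToken across all start positions means only one subword, the longest, is emitted when only_longest is set:

```python
# Toy sketch of the patched onlyLongestMatch behavior: with only_longest=True,
# track the single longest dictionary match across the whole word and emit
# only that, instead of one match per start position.

def decompose(word, dictionary, min_subword=4, max_subword=15,
              only_longest=True):
    word = word.lower()
    matches = []
    longest = None  # patched version: one longest match for the whole word
    for i in range(len(word) - min_subword + 1):
        for j in range(min_subword, max_subword + 1):
            if i + j > len(word):
                break
            sub = word[i:i + j]
            if sub in dictionary:
                if only_longest:
                    if longest is None or len(longest) < len(sub):
                        longest = sub
                else:
                    matches.append(sub)
    if only_longest and longest is not None:
        matches.append(longest)
    return matches

dic = {"streifen", "reifen", "bluse"}
with_flag = decompose("streifenbluse", dic)                      # longest only
without_flag = decompose("streifenbluse", dic, only_longest=False)
```

With the flag, "streifenbluse" yields just "streifen" (the nonsensical "reifen" is suppressed); without it, every dictionary subword is emitted.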
[jira] [Updated] (LUCENE-3027) TestOmitTf.testMixedMerge random seed failure
[ https://issues.apache.org/jira/browse/LUCENE-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-3027:
---------------------------------------

Attachment: LUCENE-3027.patch

Patch. I added asserts that trip when FieldInfo illegally has omitTFAP true and storePayloads also true. I fixed three places where this was able to occur: merging, flushing, and loading a preflex index. I'll commit shortly.

TestOmitTf.testMixedMerge random seed failure
---------------------------------------------

Key: LUCENE-3027
URL: https://issues.apache.org/jira/browse/LUCENE-3027
Project: Lucene - Java
Issue Type: Bug
Reporter: selckin
Attachments: LUCENE-3027.patch, output.txt

Version: trunk r1091638

ant test -Dtests.seed=-6595054217575280191:5576532348905930588

[junit] - Standard Error -
[junit] WARNING: test method: 'testDeMorgan' left thread running: Thread[NRT search threads-1691-thread-2,5,main]
[junit] RESOURCE LEAK: test method: 'testDeMorgan' left 1 thread(s) running
[junit] NOTE: reproduce with: ant test -Dtestcase=TestBooleanQuery -Dtestmethod=testDeMorgan -Dtests.seed=-6595054217575280191:5576532348905930588
[junit] - ---
[junit] Testsuite: org.apache.lucene.index.TestNorms
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 5.064 sec
[junit]
[junit] Testsuite: org.apache.lucene.index.TestOmitTf
[junit] Testcase: testMixedMerge(org.apache.lucene.index.TestOmitTf): Caused an ERROR
[junit] CheckIndex failed
[junit] java.lang.RuntimeException: CheckIndex failed
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:152)
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:138)
[junit] at org.apache.lucene.index.TestOmitTf.testMixedMerge(TestOmitTf.java:155)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)
[junit]
[junit]
[junit] Tests run: 5, Failures: 0, Errors: 1, Time elapsed: 0.851 sec
[junit]
[junit] - Standard Output ---
[junit] CheckIndex failed
[junit] Segments file=segments_1 numSegments=1 version=FORMAT_4_0 [Lucene 4.0]
[junit] 1 of 1: name=_12 docCount=60
[junit] codec=SegmentCodecs [codecs=[MockRandom, MockVariableIntBlock(baseBlockSize=112)], provider=RandomCodecProvider: {f1=MockRandom, f2=MockVariableIntBlock(baseBlockSize=112)}]
[junit] compound=false
[junit] hasProx=false
[junit] numFiles=16
[junit] size (MB)=0,01
[junit] diagnostics = {optimize=true, mergeFactor=2, os.version=2.6.37-gentoo, os=Linux, lucene.version=4.0-SNAPSHOT, source=merge, os.arch=amd64, java.version=1.6.0_24, java.vendor=Sun Microsystems Inc.}
[junit] no deletions
[junit] test: open reader.OK
[junit] test: fields..OK [2 fields]
[junit] test: field norms.OK [2 fields]
[junit] test: terms, freq, prox...ERROR: java.io.IOException: Read past EOF
[junit] java.io.IOException: Read past EOF
[junit] at org.apache.lucene.store.RAMInputStream.switchCurrentBuffer(RAMInputStream.java:90)
[junit] at org.apache.lucene.store.RAMInputStream.readByte(RAMInputStream.java:63)
[junit] at org.apache.lucene.store.MockIndexInputWrapper.readByte(MockIndexInputWrapper.java:105)
[junit] at org.apache.lucene.store.DataInput.readVInt(DataInput.java:94)
[junit] at org.apache.lucene.index.codecs.sep.SepSkipListReader.readSkipData(SepSkipListReader.java:188)
[junit] at org.apache.lucene.index.codecs.MultiLevelSkipListReader.loadNextSkip(MultiLevelSkipListReader.java:142)
[junit] at org.apache.lucene.index.codecs.MultiLevelSkipListReader.skipTo(MultiLevelSkipListReader.java:112)
[junit] at org.apache.lucene.index.codecs.sep.SepPostingsReaderImpl$SepDocsEnum.advance(SepPostingsReaderImpl.java:454)
[junit] at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:782)
[junit] at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:495)
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:148)
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:138)
[junit] at org.apache.lucene.index.TestOmitTf.testMixedMerge(TestOmitTf.java:155)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at
[jira] [Commented] (LUCENE-3027) TestOmitTf.testMixedMerge random seed failure
[ https://issues.apache.org/jira/browse/LUCENE-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019765#comment-13019765 ]

Simon Willnauer commented on LUCENE-3027:
-----------------------------------------

patch looks good mike!

simon

TestOmitTf.testMixedMerge random seed failure
---------------------------------------------

Key: LUCENE-3027
URL: https://issues.apache.org/jira/browse/LUCENE-3027
Project: Lucene - Java
Issue Type: Bug
Reporter: selckin
Attachments: LUCENE-3027.patch, output.txt

Version: trunk r1091638
[jira] [Updated] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Thacker updated LUCENE-3018:
----------------------------------

Attachment: LUCENE-3018.patch

I have modified the build.xml. There is one problem with this build file: the linking to the JNI header files is still giving errors. What am I doing wrong? This is how I am running the ant task:

{code:title=Command Line|borderStyle=solid}
ant -lib lucene/dev/trunk/lucene/cpptasks.jar build-native
{code}

Lucene Native Directory implementation need automated build
-----------------------------------------------------------

Key: LUCENE-3018
URL: https://issues.apache.org/jira/browse/LUCENE-3018
Project: Lucene - Java
Issue Type: Wish
Components: Build
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Varun Thacker
Priority: Minor
Fix For: 4.0
Attachments: LUCENE-3018.patch

Currently the native directory impl in contrib/misc requires manual action to compile the C code, (partially) documented in https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html. It would be nice if we had an ant task, plus documentation for all platforms on how to compile it and set up the prerequisites.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2956) Support updateDocument() with DWPTs
[ https://issues.apache.org/jira/browse/LUCENE-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019768#comment-13019768 ]

Simon Willnauer commented on LUCENE-2956:
-----------------------------------------

bq. Shall we start again on LUCENE-2312? I think we still need/want to use sequence ids there. The RT DWPTs shouldn't have so many documents that using a long[] for the sequence ids is too RAM consuming?

Jason, I think nothing prevents you from starting to work on this again. Yet, I think we should freeze the branch now and only allow merging, bug fixes, tests and documentation fixes until we land on trunk. Once we are there we can freely push stuff into the branch again and make it work with seq. ids. Thoughts?

Support updateDocument() with DWPTs
-----------------------------------

Key: LUCENE-2956
URL: https://issues.apache.org/jira/browse/LUCENE-2956
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: Realtime Branch
Reporter: Michael Busch
Assignee: Simon Willnauer
Priority: Minor
Fix For: Realtime Branch
Attachments: LUCENE-2956.patch, LUCENE-2956.patch

With separate DocumentsWriterPerThreads (DWPT) it can currently happen that the delete part of an updateDocument() is flushed and committed separately from the corresponding new document. We need to make sure that updateDocument() is always an atomic operation from an IW.commit() and IW.getReader() perspective. See LUCENE-2324 for more details.
[jira] [Resolved] (LUCENE-2956) Support updateDocument() with DWPTs
[ https://issues.apache.org/jira/browse/LUCENE-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-2956.
-------------------------------------

Resolution: Fixed

committed to branch

Support updateDocument() with DWPTs
-----------------------------------

Key: LUCENE-2956
URL: https://issues.apache.org/jira/browse/LUCENE-2956
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: Realtime Branch
Reporter: Michael Busch
Assignee: Simon Willnauer
Priority: Minor
Fix For: Realtime Branch
Attachments: LUCENE-2956.patch, LUCENE-2956.patch
[jira] [Created] (LUCENE-3028) IW.getReader() returns inconsistent reader on RT Branch
IW.getReader() returns inconsistent reader on RT Branch
-------------------------------------------------------

Key: LUCENE-3028
URL: https://issues.apache.org/jira/browse/LUCENE-3028
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: Realtime Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Fix For: Realtime Branch

I extended the testcase TestRollingUpdates#testUpdateSameDoc to pull an NRT reader after each update and asserted that it always sees only one document. Yet, this fails with the current branch since there is a problem in how we flush in the getReader() case. What happens here is that we flush all threads and then release the lock (letting other flushes, which came in after we entered the flushAllThread context, continue), so we can concurrently get a new segment that transports global deletes without the corresponding add. They sneak in while we continue to open the NRT reader, which in turn sees inconsistent results. I will upload a patch soon.
[jira] [Updated] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.
[ https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated SOLR-2378:
------------------------------

Attachment: SOLR-2378.patch

Adding Solr tests; removed the big queries file so that it doesn't bloat the patch (will commit it in directly).

FST-based Lookup (suggestions) for prefix matches.
--------------------------------------------------

Key: SOLR-2378
URL: https://issues.apache.org/jira/browse/SOLR-2378
Project: Solr
Issue Type: New Feature
Components: spellchecker
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Labels: lookup, prefix
Fix For: 4.0
Attachments: SOLR-2378.patch, SOLR-2378.patch

Implement a subclass of Lookup based on finite state automata/transducers (Lucene FST package). This issue is for implementing a relatively basic prefix matcher; we will handle infixes and other types of input matches gradually.

Impl. phases:
- -write a DFA based suggester effectively identical to ternary tree based solution right now,-
- -baseline benchmark against tern. tree (memory consumption, rebuilding speed, indexing speed; reuse Andrzej's benchmark code)-
- -modify DFA to encode term weights directly in the automaton (optimize for onlyMostPopular case)-
- -benchmark again-
- add infix suggestion support with prefix matches boosted higher (?)
- benchmark again
- modify the tutorial on the wiki [http://wiki.apache.org/solr/Suggester]
[jira] [Updated] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.
[ https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated SOLR-2378:
------------------------------

Attachment: (was: SOLR-2378.patch)

FST-based Lookup (suggestions) for prefix matches.
--------------------------------------------------

Key: SOLR-2378
URL: https://issues.apache.org/jira/browse/SOLR-2378
Project: Solr
Issue Type: New Feature
Components: spellchecker
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Labels: lookup, prefix
Fix For: 4.0
Attachments: SOLR-2378.patch
[jira] [Resolved] (LUCENE-3027) TestOmitTf.testMixedMerge random seed failure
[ https://issues.apache.org/jira/browse/LUCENE-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-3027.
----------------------------------------

Resolution: Fixed

Thanks selckin!

TestOmitTf.testMixedMerge random seed failure
---------------------------------------------

Key: LUCENE-3027
URL: https://issues.apache.org/jira/browse/LUCENE-3027
Project: Lucene - Java
Issue Type: Bug
Reporter: selckin
Attachments: LUCENE-3027.patch, output.txt

Version: trunk r1091638
[jira] [Updated] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.
[ https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated SOLR-2378:
------------------------------

Description:

Implement a subclass of Lookup based on finite state automata/transducers (Lucene FST package). This issue is for implementing a relatively basic prefix matcher, we will handle infixes and other types of input matches gradually.

Impl. phases:
- -write a DFA based suggester effectively identical to ternary tree based solution right now,-
- -baseline benchmark against tern. tree (memory consumption, rebuilding speed, indexing speed; reuse Andrzej's benchmark code)-
- -modify DFA to encode term weights directly in the automaton (optimize for onlyMostPopular case)-
- -benchmark again-
- -benchmark again-
- -modify the tutorial on the wiki [http://wiki.apache.org/solr/Suggester]-

was:

Implement a subclass of Lookup based on finite state automata/transducers (Lucene FST package). This issue is for implementing a relatively basic prefix matcher, we will handle infixes and other types of input matches gradually.

Impl. phases:
- -write a DFA based suggester effectively identical to ternary tree based solution right now,-
- -baseline benchmark against tern. tree (memory consumption, rebuilding speed, indexing speed; reuse Andrzej's benchmark code)-
- -modify DFA to encode term weights directly in the automaton (optimize for onlyMostPopular case)-
- -benchmark again-
- add infix suggestion support with prefix matches boosted higher (?)
- benchmark again
- modify the tutorial on the wiki [http://wiki.apache.org/solr/Suggester]

FST-based Lookup (suggestions) for prefix matches.
--------------------------------------------------

Key: SOLR-2378
URL: https://issues.apache.org/jira/browse/SOLR-2378
Project: Solr
Issue Type: New Feature
Components: spellchecker
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Labels: lookup, prefix
Fix For: 4.0
Attachments: SOLR-2378.patch
[jira] [Updated] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.
[ https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated SOLR-2378:
------------------------------

Description:

Implement a subclass of Lookup based on finite state automata/transducers (Lucene FST package). This issue is for implementing a relatively basic prefix matcher, we will handle infixes and other types of input matches gradually.

Impl. phases:
- -write a DFA based suggester effectively identical to ternary tree based solution right now,-
- -baseline benchmark against tern. tree (memory consumption, rebuilding speed, indexing speed; reuse Andrzej's benchmark code)-
- -modify DFA to encode term weights directly in the automaton (optimize for onlyMostPopular case)-
- -benchmark again-
- -benchmark again-
- -modify the tutorial on the wiki- [http://wiki.apache.org/solr/Suggester]

was:

Implement a subclass of Lookup based on finite state automata/transducers (Lucene FST package). This issue is for implementing a relatively basic prefix matcher, we will handle infixes and other types of input matches gradually.

Impl. phases:
- -write a DFA based suggester effectively identical to ternary tree based solution right now,-
- -baseline benchmark against tern. tree (memory consumption, rebuilding speed, indexing speed; reuse Andrzej's benchmark code)-
- -modify DFA to encode term weights directly in the automaton (optimize for onlyMostPopular case)-
- -benchmark again-
- -benchmark again-
- -modify the tutorial on the wiki [http://wiki.apache.org/solr/Suggester]-

FST-based Lookup (suggestions) for prefix matches.
--------------------------------------------------

Key: SOLR-2378
URL: https://issues.apache.org/jira/browse/SOLR-2378
Project: Solr
Issue Type: New Feature
Components: spellchecker
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Labels: lookup, prefix
Fix For: 4.0
Attachments: SOLR-2378.patch
[jira] [Resolved] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.
[ https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved SOLR-2378.
-------------------------------

Resolution: Fixed

In trunk.

FST-based Lookup (suggestions) for prefix matches.
--------------------------------------------------

Key: SOLR-2378
URL: https://issues.apache.org/jira/browse/SOLR-2378
Project: Solr
Issue Type: New Feature
Components: spellchecker
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Labels: lookup, prefix
Fix For: 4.0
Attachments: SOLR-2378.patch
IndexWriter.ramSizeInBytes
Hi

I'm indexing w/ IW, flush-by-RAM=off and flush-by-doc=MAX_INT. Whenever iw.ramSizeInBytes() >= threshold, I commit the changes, serialize the Directory somewhere, and start with a new Directory and IW instance. The threshold is currently 32MB. I noticed though that the size of the serialized Directory is nearly half of that (16 MB). Is that expected? Will I see that behavior every time (e.g. w/ large stored fields), or is it data dependent? I assume that the data can affect the compression, but I never expected a 50% factor from RAM to disk.

Shai
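For reference, the flush-by-RAM pattern described above can be sketched as follows. This is only an illustration: a stub class stands in for org.apache.lucene.index.IndexWriter so the logic is self-contained; ramSizeInBytes() and the 32 MB threshold come from the post, everything else (class and method names) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class FlushByRamSketch {
    static final long THRESHOLD = 32L << 20; // 32 MB, as in the post

    /** Minimal stand-in for IndexWriter: only tracks an approximate RAM size. */
    static class StubWriter {
        long ram = 0;
        void addDocument(long approxBytes) { ram += approxBytes; }
        long ramSizeInBytes() { return ram; }
    }

    /** Indexes nDocs documents of docBytes each; returns how many writers (segments) were produced. */
    static int indexAll(int nDocs, long docBytes) {
        List<StubWriter> segments = new ArrayList<>();
        StubWriter w = new StubWriter();
        for (int i = 0; i < nDocs; i++) {
            w.addDocument(docBytes);
            if (w.ramSizeInBytes() >= THRESHOLD) {
                // Real code would: w.commit(); serialize the Directory;
                // then open a fresh Directory + IndexWriter.
                segments.add(w);
                w = new StubWriter();
            }
        }
        segments.add(w); // final, possibly partial, segment
        return segments.size();
    }

    public static void main(String[] args) {
        // 100 docs of ~1 MB each: rolls over after every 32 docs -> 3 full + 1 partial
        System.out.println(indexAll(100, 1L << 20));
    }
}
```

As for the 50% gap: the in-RAM size includes buffered postings structures that are rewritten into a more compact form at flush, so a smaller on-disk footprint is plausible, though the exact ratio is data dependent.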
[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019786#comment-13019786 ]

Simon Willnauer commented on LUCENE-3018:
-----------------------------------------

It seems like you need to use includepath rather than libset here, something like:

{code}
<includepath>
  <pathelement location="${java.home}/include/"/>
  <pathelement location="${java.home}/include/linux/"/>
</includepath>
{code}

simon

Lucene Native Directory implementation need automated build
-----------------------------------------------------------

Key: LUCENE-3018
URL: https://issues.apache.org/jira/browse/LUCENE-3018
Project: Lucene - Java
Issue Type: Wish
Components: Build
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Varun Thacker
Priority: Minor
Fix For: 4.0
Attachments: LUCENE-3018.patch
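Put together, a build-native target using cpptasks might look roughly like the following sketch. The taskdef resource name is standard cpptasks usage, but the target name, source path, output path, and property names are assumptions for illustration, not taken from the thread:

{code:title=build.xml sketch|borderStyle=solid}
<!-- Hypothetical sketch: compiles the native directory C++ sources into a shared library. -->
<target name="build-native">
  <taskdef resource="cpptasks.tasks" classpath="${cpptasks.jar}"/>
  <cc outtype="shared" outfile="${build.dir}/NativePosixUtil" subsystem="console">
    <fileset dir="src/java/org/apache/lucene/store" includes="*.cpp"/>
    <includepath>
      <pathelement location="${java.home}/include"/>
      <pathelement location="${java.home}/include/linux"/>
    </includepath>
  </cc>
</target>
{code}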
[jira] [Updated] (LUCENE-3028) IW.getReader() returns inconsistent reader on RT Branch
[ https://issues.apache.org/jira/browse/LUCENE-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3028:
------------------------------------

Comment: was deleted (was: here is a first patch)

IW.getReader() returns inconsistent reader on RT Branch
-------------------------------------------------------

Key: LUCENE-3028
URL: https://issues.apache.org/jira/browse/LUCENE-3028
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: Realtime Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Fix For: Realtime Branch
Attachments: LUCENE-3028.patch
[jira] [Updated] (LUCENE-3028) IW.getReader() returns inconsistent reader on RT Branch
[ https://issues.apache.org/jira/browse/LUCENE-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3028:
------------------------------------

Attachment: (was: LUCENE-3028.patch)

IW.getReader() returns inconsistent reader on RT Branch
-------------------------------------------------------

Key: LUCENE-3028
URL: https://issues.apache.org/jira/browse/LUCENE-3028
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: Realtime Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Fix For: Realtime Branch
Attachments: LUCENE-3028.patch
[jira] [Updated] (LUCENE-3028) IW.getReader() returns inconsistent reader on RT Branch
[ https://issues.apache.org/jira/browse/LUCENE-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3028: Attachment: LUCENE-3028.patch here is a first patch IW.getReader() returns inconsistent reader on RT Branch --- Key: LUCENE-3028 URL: https://issues.apache.org/jira/browse/LUCENE-3028 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: Realtime Branch Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: Realtime Branch Attachments: LUCENE-3028.patch I extended the testcase TestRollingUpdates#testUpdateSameDoc to pull an NRT reader after each update and asserted that it always sees only one document. Yet, this fails with the current branch since there is a problem in how we flush in the getReader() case. What happens here is that we flush all threads and then release the lock (letting other flushes which came in after we entered the flushAllThread context continue), so that we could concurrently get a new segment that transports global deletes without the corresponding add. They sneak in while we continue to open the NRT reader, which in turn sees inconsistent results. I will upload a patch soon -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019789#comment-13019789 ] Simon Willnauer commented on LUCENE-3018: - regarding your attached patch: you should make sure that you check the {noformat} Grant license to ASF for inclusion in ASF works (as per the Apache License §5) {noformat} checkbox in the attach dialog when uploading patches. Can you also provide a quick guide on how to install the cpptasks for ant, and maybe upload the jars you have added to make this task work? simon Lucene Native Directory implementation need automated build --- Key: LUCENE-3018 URL: https://issues.apache.org/jira/browse/LUCENE-3018 Project: Lucene - Java Issue Type: Wish Components: Build Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Varun Thacker Priority: Minor Fix For: 4.0 Attachments: LUCENE-3018.patch Currently the native directory impl in contrib/misc requires manual action to compile the c code, (partially) documented in https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html yet it would be nice if we had an ant task and documentation for all platforms on how to compile them and set up the prerequisites. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3028) IW.getReader() returns inconsistent reader on RT Branch
[ https://issues.apache.org/jira/browse/LUCENE-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3028: Attachment: LUCENE-3028.patch next iteration, edited some asserts in DW IW.getReader() returns inconsistent reader on RT Branch --- Key: LUCENE-3028 URL: https://issues.apache.org/jira/browse/LUCENE-3028 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: Realtime Branch Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: Realtime Branch Attachments: LUCENE-3028.patch, LUCENE-3028.patch I extended the testcase TestRollingUpdates#testUpdateSameDoc to pull an NRT reader after each update and asserted that it always sees only one document. Yet, this fails with the current branch since there is a problem in how we flush in the getReader() case. What happens here is that we flush all threads and then release the lock (letting other flushes which came in after we entered the flushAllThread context continue), so that we could concurrently get a new segment that transports global deletes without the corresponding add. They sneak in while we continue to open the NRT reader, which in turn sees inconsistent results. I will upload a patch soon -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3023) Land DWPT on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019801#comment-13019801 ] Simon Willnauer commented on LUCENE-3023: - I added a [jenkins build|https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-realtime_search-branch/] that runs every 4 hours to give the RT branch some exercise. I added my email address and buschmi to the recipients if the build fails; if you wanna be added let me know. From now on we should only commit bugfixes, documentation and merges with trunk to this branch. From my point of view there is only one blocker left here (LUCENE-3028), so the remaining work is mainly reviewing the current state and polishing the javadocs. I will go over the IW, IR and DW javadocs as a start. Land DWPT on trunk -- Key: LUCENE-3023 URL: https://issues.apache.org/jira/browse/LUCENE-3023 Project: Lucene - Java Issue Type: Task Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324, so we can proceed landing the DWPT development on trunk soon. I think one of the bigger issues here is to make sure that all JavaDocs for IW etc. are still correct though. I will start going through that first. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-2573. - Resolution: Fixed this is committed to the branch; reviews should go through LUCENE-3023 Tiered flushing of DWPTs by RAM with low/high water marks - Key: LUCENE-2573 URL: https://issues.apache.org/jira/browse/LUCENE-2573 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Simon Willnauer Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch Now that we have DocumentsWriterPerThreads we need to track total consumed RAM across all DWPTs. A flushing strategy idea that was discussed in LUCENE-2324 was to use a tiered approach: - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM) - Flush all DWPTs at a high water mark (e.g. at 110%) - Use linear steps in between the high and low watermark: E.g. when 5 DWPTs are used, flush at 90%, 95%, 100%, 105% and 110%. Should we allow the user to configure the low and high water mark values explicitly using total values (e.g. low water mark at 120MB, high water mark at 140MB)? Or shall we keep for simplicity the single setRAMBufferSizeMB() config method and use something like 90% and 110% for the water marks? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
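The linear-step scheme described in the issue can be sketched in plain Java. This is a hypothetical helper, not part of Lucene; the method name is made up, and the 90%/110% defaults come from the discussion above:

```java
// Sketch of the tiered flushing thresholds discussed above: with n active
// DWPTs, flush triggers are spaced linearly between the low water mark
// (e.g. 90% of the RAM budget) and the high water mark (e.g. 110%).
public class TieredFlushSketch {

    // Returns one threshold (in MB) per DWPT, lowest first.
    static double[] flushThresholds(int numDwpts, double ramBudgetMb,
                                    double lowPct, double highPct) {
        double[] thresholds = new double[numDwpts];
        double step = numDwpts > 1 ? (highPct - lowPct) / (numDwpts - 1) : 0.0;
        for (int i = 0; i < numDwpts; i++) {
            thresholds[i] = ramBudgetMb * (lowPct + i * step);
        }
        return thresholds;
    }

    public static void main(String[] args) {
        // 5 DWPTs and a 100 MB budget -> flush at 90, 95, 100, 105 and 110 MB,
        // matching the example in the issue description.
        for (double t : flushThresholds(5, 100.0, 0.90, 1.10)) {
            System.out.printf("%.1f MB%n", t);
        }
    }
}
```

The first DWPT to cross its threshold flushes at the low water mark; once RAM pressure pushes past the high water mark, every DWPT is flushing.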
[jira] [Commented] (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019803#comment-13019803 ] Simon Willnauer commented on LUCENE-2324: - guys, I opened LUCENE-3023 to land on trunk! can I close this and we iterate on LUCENE-3023 from now on? simon Per thread DocumentsWriters that write their own private segments - Key: LUCENE-2324 URL: https://issues.apache.org/jira/browse/LUCENE-2324 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324.patch, LUCENE-2324.patch, LUCENE-2324.patch, LUCENE-2324.patch, lucene-2324.patch, lucene-2324.patch, test.out, test.out, test.out, test.out See LUCENE-2293 for motivation and more details. I'm copying here Mike's summary he posted on 2293: Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores and normal segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO & CPU. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: IndexWriter.ramSizeInBytes
This is actually [sadly] expected. This is showing that your RAM efficiency is ~50% (well, less, if the segment also has stored fields / term vectors). This is because the in-RAM data structures cannot be 100% efficient as they must leave room to grow the individual postings. But once written to disk the format is obviously compacted vs what's in RAM. Mike http://blog.mikemccandless.com On Thu, Apr 14, 2011 at 7:21 AM, Shai Erera ser...@gmail.com wrote: Hi I'm indexing w/ IW, flush-by-RAM=off and flush-by-doc=MAX_INT. Whenever iw.ramSizeInBytes() >= threshold, I commit the changes, serialize the Directory somewhere and start with a new Directory and IW instance. The threshold is currently 32MB. I noticed though that the size of the serialized Directory is nearly half (16 MB). Is that expected? Will I see that behavior every time (e.g. w/ large stored fields), or is it data dependent? I assume that the data can affect the compression, but I never thought it would be by a 50% factor, from RAM to disk. Shai - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
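Mike's ~50% figure is simply the ratio of on-disk size to in-RAM size. A trivial back-of-the-envelope check (plain Java, not a Lucene API; the 32 MB / 16 MB numbers are the ones from Shai's mail):

```java
// "RAM efficiency" as described above:
// efficiency = bytes written to disk / bytes reported by IW.ramSizeInBytes().
public class RamEfficiencySketch {

    static double ramEfficiency(long ramBytes, long diskBytes) {
        return (double) diskBytes / ramBytes;
    }

    public static void main(String[] args) {
        long ramAtFlush = 32L << 20;    // iw.ramSizeInBytes() at commit time
        long serializedDir = 16L << 20; // size of the serialized Directory
        System.out.printf("RAM efficiency: %.0f%%%n",
                          100 * ramEfficiency(ramAtFlush, serializedDir));
    }
}
```

A lower number means more per-posting growth headroom was sitting unused in RAM when the flush happened.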
Re: TestIndexWriterDelete#testUpdatesOnDiskFull can false fail
just committed to trunk simon On Wed, Apr 13, 2011 at 5:06 PM, Michael McCandless luc...@mikemccandless.com wrote: +1 Mike http://blog.mikemccandless.com On Wed, Apr 13, 2011 at 5:58 AM, Simon Willnauer simon.willna...@googlemail.com wrote: In TestIndexWriterDelete#testUpdatesOnDiskFull, especially between lines 538 and 553, we could get a random exception from the MockDirectoryWrapper which makes the test fail since we are not catching / expecting those exceptions. I can't make this fail on trunk even in 1000 runs, but on realtime it fails quickly after I merged this morning. I think we should just disable the random exceptions for this part and reenable them after we are done; see patch below. - Thoughts?
Index: lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java
===
--- lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java (revision 1091721)
+++ lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java (working copy)
@@ -536,7 +536,9 @@
         fail(testName + " hit IOException after disk space was freed up");
       }
     }
-
+    // prevent throwing a random exception here!!
+    final double randomIOExceptionRate = dir.getRandomIOExceptionRate();
+    dir.setRandomIOExceptionRate(0.0);
     if (!success) {
       // Must force the close else the writer can have
       // open files which cause exc in MockRAMDir.close
@@ -549,6 +551,7 @@
       _TestUtil.checkIndex(dir);
       TestIndexWriter.assertNoUnreferencedFiles(dir, "after writer.close");
     }
+    dir.setRandomIOExceptionRate(randomIOExceptionRate);
     // Finally, verify index is not corrupt, and, if
     // we succeeded, we see all docs changed, and if
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
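The patch saves the random-IOException rate, zeroes it, and restores it by hand. As an aside, the same save/restore idea can be expressed with a try/finally so the rate is restored even if the guarded section throws. The sketch below uses a hypothetical FailureInjector stand-in, not the real MockDirectoryWrapper:

```java
// Hedged sketch (not the actual Lucene test code): save/restore of a
// randomized failure-injection rate around a critical section, with
// try/finally guaranteeing the restore.
public class SaveRestoreSketch {

    // Hypothetical stand-in for MockDirectoryWrapper's rate accessors.
    static class FailureInjector {
        private double rate;
        double getRandomIOExceptionRate() { return rate; }
        void setRandomIOExceptionRate(double r) { rate = r; }
    }

    static void runWithoutRandomFailures(FailureInjector dir, Runnable body) {
        final double saved = dir.getRandomIOExceptionRate();
        dir.setRandomIOExceptionRate(0.0); // no injected IOExceptions here
        try {
            body.run();
        } finally {
            dir.setRandomIOExceptionRate(saved); // always restore
        }
    }

    public static void main(String[] args) {
        FailureInjector dir = new FailureInjector();
        dir.setRandomIOExceptionRate(0.1);
        runWithoutRandomFailures(dir, () -> { /* close writer, check index */ });
        System.out.println(dir.getRandomIOExceptionRate());
    }
}
```

In the patch the straight-line save/restore is fine because the surrounding test already handles the exceptions; the try/finally form just makes the restore unconditional.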
[jira] [Commented] (LUCENE-3028) IW.getReader() returns inconsistent reader on RT Branch
[ https://issues.apache.org/jira/browse/LUCENE-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019806#comment-13019806 ] Simon Willnauer commented on LUCENE-3028: - I will commit this latest patch to the branch. We can still iterate, but since we have jenkins running builds I want to let that sink in a bit too. simon IW.getReader() returns inconsistent reader on RT Branch --- Key: LUCENE-3028 URL: https://issues.apache.org/jira/browse/LUCENE-3028 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: Realtime Branch Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: Realtime Branch Attachments: LUCENE-3028.patch, LUCENE-3028.patch I extended the testcase TestRollingUpdates#testUpdateSameDoc to pull an NRT reader after each update and asserted that it always sees only one document. Yet, this fails with the current branch since there is a problem in how we flush in the getReader() case. What happens here is that we flush all threads and then release the lock (letting other flushes which came in after we entered the flushAllThread context continue), so that we could concurrently get a new segment that transports global deletes without the corresponding add. They sneak in while we continue to open the NRT reader, which in turn sees inconsistent results. I will upload a patch soon -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3023) Land DWPT on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019814#comment-13019814 ] Michael McCandless commented on LUCENE-3023: Why not just email dev@ when it fails? Since it will soon land I think all should feel pain when it fails ;) Land DWPT on trunk -- Key: LUCENE-3023 URL: https://issues.apache.org/jira/browse/LUCENE-3023 Project: Lucene - Java Issue Type: Task Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324, so we can proceed landing the DWPT development on trunk soon. I think one of the bigger issues here is to make sure that all JavaDocs for IW etc. are still correct though. I will start going through that first. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated LUCENE-3018: -- Attachment: cpptasks.jar LUCENE-3018.patch The build.xml now includes a task to compile NativePosixUtil.cpp into NativePosixUtil.so. The task is called build-native. Command to run the ant task: {code:|borderStyle=solid} ant -lib lucene/lib/cpptasks.jar build-native {code} This requires cpptasks to be installed. I have uploaded cpptasks.jar, which needs to be placed in the lucene/lib folder. Lucene Native Directory implementation need automated build --- Key: LUCENE-3018 URL: https://issues.apache.org/jira/browse/LUCENE-3018 Project: Lucene - Java Issue Type: Wish Components: Build Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Varun Thacker Priority: Minor Fix For: 4.0 Attachments: LUCENE-3018.patch, LUCENE-3018.patch, cpptasks.jar Currently the native directory impl in contrib/misc requires manual action to compile the c code, (partially) documented in https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html yet it would be nice if we had an ant task and documentation for all platforms on how to compile them and set up the prerequisites. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2956) Support updateDocument() with DWPTs
[ https://issues.apache.org/jira/browse/LUCENE-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019823#comment-13019823 ] Jason Rutherglen commented on LUCENE-2956: -- {quote}Jason I think nothing prevents you from starting to work on this again. Yet, I think we should freeze the branch now and only allow merging, bug fixes, tests and documentation fixes until we land on trunk. Once we are there we can freely push stuff into the branch again and make it work with seq. ids. {quote} OK, great. I remember now that our main concern was the memory usage of using a short[] (for the seq ids) if the total number of documents is numerous (eg, 10s of millions). Also at some point we'd have double the memory usage when we roll over to the next set, until the previous readers are closed. bq. I think we should freeze the branch now and only allow merging, bug fixes, tests and documentation fixes until we land on trunk Maybe once LUCENE-2312 sequence ids work for deletes, we can look at creating a separate branch that implements seq id deletes for all segments, and compare with the BV approach. Eg, performance, memory usage, and simplicity. Support updateDocument() with DWPTs --- Key: LUCENE-2956 URL: https://issues.apache.org/jira/browse/LUCENE-2956 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: Realtime Branch Reporter: Michael Busch Assignee: Simon Willnauer Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2956.patch, LUCENE-2956.patch With separate DocumentsWriterPerThreads (DWPT) it can currently happen that the delete part of an updateDocument() is flushed and committed separately from the corresponding new document. We need to make sure that updateDocument() is always an atomic operation from an IW.commit() and IW.getReader() perspective. See LUCENE-2324 for more details. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2956) Support updateDocument() with DWPTs
[ https://issues.apache.org/jira/browse/LUCENE-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019827#comment-13019827 ] Simon Willnauer commented on LUCENE-2956: - bq. Maybe once LUCENE-2312 sequence ids work for deletes, we can look at creating a separate branch that implements seq id deletes for all segments, and compare with the BV approach. Eg, performance, memory usage, and simplicity. I don't think we need to create a different branch; until then DWPT will be on trunk and we can simply compare to trunk, no? Support updateDocument() with DWPTs --- Key: LUCENE-2956 URL: https://issues.apache.org/jira/browse/LUCENE-2956 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: Realtime Branch Reporter: Michael Busch Assignee: Simon Willnauer Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2956.patch, LUCENE-2956.patch With separate DocumentsWriterPerThreads (DWPT) it can currently happen that the delete part of an updateDocument() is flushed and committed separately from the corresponding new document. We need to make sure that updateDocument() is always an atomic operation from an IW.commit() and IW.getReader() perspective. See LUCENE-2324 for more details. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Drozdov updated SOLR-2242: - Attachment: SOLR.2242.solr3.1.patch Thanks for the patch! It also works for version 3.1, just the line numbers differ - attaching the adapted patch for 3.1 just in case. Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Priority: Minor Fix For: 4.0 Attachments: SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch When returning facet.field=name of field you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example: http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=manu&facet.mincount=1&facet.limit=-1&f.manu.facet.namedistinct=0&facet.field=price&f.price.facet.namedistinct=1 Here is an example on field hgid (without namedistinct): {code}
<lst name="facet_fields">
  <lst name="hgid">
    <int name="HGPY045FD36D4000A">1</int>
    <int name="HGPY0FBC6690453A9">1</int>
    <int name="HGPY1E44ED6C4FB3B">1</int>
    <int name="HGPY1FA631034A1B8">1</int>
    <int name="HGPY3317ABAC43B48">1</int>
    <int name="HGPY3A17B2294CB5A">5</int>
    <int name="HGPY3ADD2B3D48C39">1</int>
  </lst>
</lst>
{code} With namedistinct (HGPY045FD36D4000A, HGPY0FBC6690453A9, HGPY1E44ED6C4FB3B, HGPY1FA631034A1B8, HGPY3317ABAC43B48, HGPY3A17B2294CB5A, HGPY3ADD2B3D48C39), this returns the number of rows (7), not the number of values (11). {code}
<lst name="facet_fields">
  <lst name="hgid">
    <int name="_count_">7</int>
  </lst>
</lst>
{code} This actually works really well to get the total number of fields for a group.field=hgid. Enjoy! -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019833#comment-13019833 ] Steven Rowe commented on LUCENE-3018: - When {{cpptasks.jar}} is committed to the Lucene source tree, it should have a version number included in its name. E.g., if the jar was built from the 1.0b5 sources, the committed jar should be named {{cpptasks-1.0b5.jar}}. Varun, where did you get the {{cpptasks.jar}} from? If you built it yourself, please use a Java 1.5 JDK, to ensure it will be compatible with 1.5 JVMs. Lucene Native Directory implementation need automated build --- Key: LUCENE-3018 URL: https://issues.apache.org/jira/browse/LUCENE-3018 Project: Lucene - Java Issue Type: Wish Components: Build Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Varun Thacker Priority: Minor Fix For: 4.0 Attachments: LUCENE-3018.patch, LUCENE-3018.patch, cpptasks.jar Currently the native directory impl in contrib/misc requires manual action to compile the c code, (partially) documented in https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html yet it would be nice if we had an ant task and documentation for all platforms on how to compile them and set up the prerequisites. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019839#comment-13019839 ] Varun Thacker commented on LUCENE-3018: --- Sorry for not being clear about it. I should have named it cpptasks-1.0b4.jar. I did not build it myself but used the one provided on the ant-contrib development page. Link to cpptasks-1.0b4: http://sourceforge.net/projects/ant-contrib/files/ant-contrib/cpptasks-1.0-beta4/ Should I upload the LICENSE file which came with it? Lucene Native Directory implementation need automated build --- Key: LUCENE-3018 URL: https://issues.apache.org/jira/browse/LUCENE-3018 Project: Lucene - Java Issue Type: Wish Components: Build Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Varun Thacker Priority: Minor Fix For: 4.0 Attachments: LUCENE-3018.patch, LUCENE-3018.patch, cpptasks.jar Currently the native directory impl in contrib/misc requires manual action to compile the c code, (partially) documented in https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html yet it would be nice if we had an ant task and documentation for all platforms on how to compile them and set up the prerequisites. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 7104 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/7104/

4 tests failed.

REGRESSION:  org.apache.solr.cloud.BasicDistributedZkTest.testDistribSearch

Error Message:
KeeperErrorCode = ConnectionLoss for /collections/collection1/shards

Stack Trace:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /collections/collection1/shards
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:347)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:308)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:290)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:260)
    at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:80)
    at org.apache.solr.cloud.AbstractDistributedZkTestCase.setUp(AbstractDistributedZkTestCase.java:47)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)

FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.BasicZkTest

Error Message:
KeeperErrorCode = ConnectionLoss for /solr

Stack Trace:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /solr
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:347)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:308)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:290)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:255)
    at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:72)
    at org.apache.solr.cloud.AbstractZkTestCase.azt_beforeClass(AbstractZkTestCase.java:62)

REGRESSION:  org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration

Error Message:
null

Stack Trace:
org.apache.solr.common.cloud.ZooKeeperException:
    at org.apache.solr.core.CoreContainer.register(CoreContainer.java:517)
    at org.apache.solr.core.CoreContainer.register(CoreContainer.java:545)
    at org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration(CloudStateUpdateTest.java:156)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /collections/testcore/shards/lucene.zones.apache.org:1661_solr_testcore
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:347)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:308)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:370)
    at org.apache.solr.cloud.ZkController.addZkShardsNode(ZkController.java:155)
    at org.apache.solr.cloud.ZkController.register(ZkController.java:481)
    at org.apache.solr.core.CoreContainer.register(CoreContainer.java:508)

REGRESSION:  org.apache.solr.cloud.ZkSolrClientTest.testMakeRootNode

Error Message:
KeeperErrorCode = ConnectionLoss for /solr

Stack Trace:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /solr
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:347)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:308)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:290)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:255)
    at org.apache.solr.cloud.AbstractZkTestCase.makeSolrZkNode(AbstractZkTestCase.java:128)
    at org.apache.solr.cloud.ZkSolrClientTest.testMakeRootNode(ZkSolrClientTest.java:57)
[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019848#comment-13019848 ]

Simon Willnauer commented on LUCENE-3018:
-----------------------------------------

hey varun, here are some comments

do we need
{code}
<property environment="env"/>
<property name="jni1" location="${env.JAVA_HOME}/include"/>
<property name="jni2" location="${env.JAVA_HOME}/include/linux"/>
{code}
or can we simply use
{code}
<includepath>
  <pathelement location="${java.home}/include/"/>
  <pathelement location="${java.home}/include/linux/"/>
</includepath>
{code}

instead of using {code}<fileset file="src/java/org/apache/lucene/store/NativePosixUtil.cpp"/>{code} we should rather use {code}<fileset file="${src.dir}/org/apache/lucene/store/NativePosixUtil.cpp"/>{code}

I wonder if we really want to put the built .so file into src/java/org/apache/lucene/store/NativePosixUtil (outfile), or if this should rather be built into ${common.build.dir}; that way it would be cleaned up too. Something like this:
{code}
<mkdir dir="${common.build.dir}/native/"/>
<cpptasks:cc outtype="shared" subsystem="console" outfile="${common.build.dir}/native/NativePosixUtil">
{code}

Do we need to specify gcc as the compiler? afaik it's the default, so we might just let it choose the default.

I also wonder what happens if java.home points to a $JAVA_HOME/jre directory and not to $JAVA_HOME directly; in such a case we need to include ${java.home}/../include etc. Maybe we need to specify the path based on a condition?

It would be great if we had a way to test that the native lib works, so maybe we want to check that too with a small testcase?

simon

Lucene Native Directory implementation need automated build
-----------------------------------------------------------

Key: LUCENE-3018
URL: https://issues.apache.org/jira/browse/LUCENE-3018
Project: Lucene - Java
Issue Type: Wish
Components: Build
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Varun Thacker
Priority: Minor
Fix For: 4.0
Attachments: LUCENE-3018.patch, LUCENE-3018.patch, cpptasks.jar

Currently the native directory impl in contrib/misc requires manual action to compile the C code, (partially) documented in https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html, yet it would be nice if we had an ant task and documentation for all platforms how to compile them and set up the prerequisites.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
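Simon's question about java.home pointing at $JAVA_HOME/jre, and his wish for a smoke test of the native lib, can be illustrated together. The following is a minimal sketch only: the class name is made up for illustration, and it assumes nothing about the patch's actual ant targets — it just shows how the JNI header directory can be resolved whether java.home is the JDK root or its jre/ subdirectory.

```java
import java.io.File;

// Sketch (hypothetical helper, not part of the patch): resolve the JNI
// include directory for both common layouts of java.home.
public class JniIncludeResolver {
    static File jniInclude() {
        File home = new File(System.getProperty("java.home"));
        // JDK layout: ${java.home}/include exists directly.
        File candidate = new File(home, "include");
        if (!candidate.isDirectory()) {
            // JRE layout: the headers live one level up, next to jre/.
            candidate = new File(home.getParentFile(), "include");
        }
        return candidate;
    }

    public static void main(String[] args) {
        System.out.println(jniInclude());
    }
}
```

The same either/or logic is what an ant `<condition>` on the build side would express, as Simon suggests.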
[jira] [Issue Comment Edited] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019848#comment-13019848 ]

Simon Willnauer edited comment on LUCENE-3018 at 4/14/11 2:33 PM:
------------------------------------------------------------------

hey varun, here are some comments

do we need
{code}
<property environment="env"/>
<property name="jni1" location="${env.JAVA_HOME}/include"/>
<property name="jni2" location="${env.JAVA_HOME}/include/linux"/>
{code}
or can we simply use
{code}
<includepath>
  <pathelement location="${java.home}/include/"/>
  <pathelement location="${java.home}/include/linux/"/>
</includepath>
{code}

instead of using {code}<fileset file="src/java/org/apache/lucene/store/NativePosixUtil.cpp"/>{code} we should rather use {code}<fileset file="${src.dir}/org/apache/lucene/store/NativePosixUtil.cpp"/>{code}

I wonder if we really want to put the built .so file into src/java/org/apache/lucene/store/NativePosixUtil (outfile), or if this should rather be built into ${common.build.dir}; that way it would be cleaned up too. Something like this:
{code}
<mkdir dir="${common.build.dir}/native/"/>
<cpptasks:cc outtype="shared" subsystem="console" outfile="${common.build.dir}/native/NativePosixUtil">
{code}

Do we need to specify gcc as the compiler? afaik it's the default, so we might just let it choose the default.

I also wonder what happens if java.home points to a $JAVA_HOME/jre directory and not to $JAVA_HOME directly; in such a case we need to include ${java.home}/../include etc. Maybe we need to specify the path based on a condition?

It would be great if we had a way to test that the native lib works, so maybe we want to check that too with a small testcase?

simon
[jira] [Commented] (LUCENE-3023) Land DWPT on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019852#comment-13019852 ]

Simon Willnauer commented on LUCENE-3023:
-----------------------------------------

bq. Why not just email dev@ when it fails? Since it will soon land I think all should feel pain when it fails

true, done!

Land DWPT on trunk
------------------

Key: LUCENE-3023
URL: https://issues.apache.org/jira/browse/LUCENE-3023
Project: Lucene - Java
Issue Type: Task
Affects Versions: CSF branch, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Fix For: 4.0

With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324 so we can proceed landing the DWPT development on trunk soon. I think one of the bigger issues here is to make sure that all JavaDocs for IW etc. are still correct though. I will start going through that first.
[HUDSON] Lucene-Solr-tests-only-realtime_search-branch - Build # 2 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-realtime_search-branch/2/

1 tests failed.

REGRESSION:  org.apache.lucene.index.TestIndexWriterDelete.testUpdatesOnDiskFull

Error Message:
fake disk full at 13517 bytes when writing _0_1.del (file length=0; wrote 10 of 20 bytes)

Stack Trace:
java.io.IOException: fake disk full at 13517 bytes when writing _0_1.del (file length=0; wrote 10 of 20 bytes)
    at org.apache.lucene.store.MockIndexOutputWrapper.writeBytes(MockIndexOutputWrapper.java:111)
    at org.apache.lucene.store.DataOutput.writeBytes(DataOutput.java:43)
    at org.apache.lucene.util.BitVector.writeBits(BitVector.java:182)
    at org.apache.lucene.util.BitVector.write(BitVector.java:171)
    at org.apache.lucene.index.SegmentReader.commitChanges(SegmentReader.java:718)
    at org.apache.lucene.index.SegmentReader.doCommit(SegmentReader.java:696)
    at org.apache.lucene.index.IndexWriter$ReaderPool.commit(IndexWriter.java:572)
    at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:3597)
    at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2466)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2537)
    at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1067)
    at org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:1923)
    at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:1848)
    at org.apache.lucene.index.TestIndexWriterDelete.doTestOperationsOnDiskFull(TestIndexWriterDelete.java:545)
    at org.apache.lucene.index.TestIndexWriterDelete.testUpdatesOnDiskFull(TestIndexWriterDelete.java:409)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1226)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1154)

Build Log (for compile errors):
[...truncated 3190 lines...]
[jira] [Updated] (LUCENE-2571) Indexing performance tests with realtime branch
[ https://issues.apache.org/jira/browse/LUCENE-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2571: Attachment: wikimedium.trunk.Standard.nd10M_dps_addDocuments.png wikimedium.trunk.Standard.nd10M_dps.png wikimedium.realtime.Standard.nd10M_dps_addDocuments_flush.png wikimedium.realtime.Standard.nd10M_dps_addDocuments.png wikimedium.realtime.Standard.nd10M_dps.png benchmarks charts attached Indexing performance tests with realtime branch --- Key: LUCENE-2571 URL: https://issues.apache.org/jira/browse/LUCENE-2571 Project: Lucene - Java Issue Type: Task Components: Index Reporter: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: wikimedium.realtime.Standard.nd10M_dps.png, wikimedium.realtime.Standard.nd10M_dps_addDocuments.png, wikimedium.realtime.Standard.nd10M_dps_addDocuments_flush.png, wikimedium.trunk.Standard.nd10M_dps.png, wikimedium.trunk.Standard.nd10M_dps_addDocuments.png We should run indexing performance tests with the DWPT changes and compare to trunk. We need to test both single-threaded and multi-threaded performance. NOTE: flush by RAM isn't implemented just yet, so either we wait with the tests or flush by doc count. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Thacker updated LUCENE-3018:
----------------------------------

Attachment: LUCENE-3018.patch

I have made the changes which were mentioned by Simon.

I have changed the way JNI header files are included:
{code:title=JNI header includes|borderStyle=solid}
<includepath>
  <pathelement location="${java.home}/../include/"/>
  <pathelement location="${java.home}/../include/linux/"/>
</includepath>
{code}
The reason being that when I echoed java.home its path was:
{code:title=path|borderStyle=solid}
/usr/lib/jvm/java-6-sun-1.6.0.24/jvm
{code}

Changed the path convention to:
{code:borderStyle=solid}
<fileset file="${src.dir}/org/apache/lucene/store/NativePosixUtil.cpp"/>
{code}

The directory for the shared library is now:
{code:title=Shared File Directory|borderStyle=solid}
lucene/build/native/
{code}

I have explicitly specified GCC as the compiler since it will be needed in future when Windows is also incorporated. I will write a small test case to see whether the .so file being built is working fine.
[jira] [Issue Comment Edited] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019861#comment-13019861 ]

Varun Thacker edited comment on LUCENE-3018 at 4/14/11 3:13 PM:
----------------------------------------------------------------

I have made the changes which were mentioned by Simon.

I have changed the way JNI header files are included:
{code:title=JNI header includes|borderStyle=solid}
<includepath>
  <pathelement location="${java.home}/../include/"/>
  <pathelement location="${java.home}/../include/linux/"/>
</includepath>
{code}
The reason being that when I echoed java.home its path was:
{code:title=path|borderStyle=solid}
/usr/lib/jvm/java-6-sun-1.6.0.24/jvm
{code}

Changed the path convention to:
{code:borderStyle=solid}
<fileset file="${src.dir}/org/apache/lucene/store/NativePosixUtil.cpp"/>
{code}

The directory for the shared library is now:
{code:title=Shared File Directory|borderStyle=solid}
lucene/build/native/
{code}

I have explicitly specified GCC as the compiler since it will be needed in future when Windows is also incorporated. I will write a small test case to see whether the .so file being built is working fine.
[jira] [Assigned] (LUCENE-3026) smartcn analyzer throw NullPointer exception when the length of analysed text over 32767
[ https://issues.apache.org/jira/browse/LUCENE-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir reassigned LUCENE-3026:
-----------------------------------

Assignee: Robert Muir

smartcn analyzer throw NullPointer exception when the length of analysed text over 32767
----------------------------------------------------------------------------------------

Key: LUCENE-3026
URL: https://issues.apache.org/jira/browse/LUCENE-3026
Project: Lucene - Java
Issue Type: Bug
Components: contrib/analyzers
Affects Versions: 3.1, 4.0
Reporter: wangzhenghang
Assignee: Robert Muir
Attachments: LUCENE-3026.patch

That's all because of org.apache.lucene.analysis.cn.smart.hhmm.SegGraph's makeIndex() method:

public List<SegToken> makeIndex() {
    List<SegToken> result = new ArrayList<SegToken>();
    int s = -1, count = 0, size = tokenListTable.size();
    List<SegToken> tokenList;
    short index = 0;
    while (count < size) {
      if (isStartExist(s)) {
        tokenList = tokenListTable.get(s);
        for (SegToken st : tokenList) {
          st.index = index;
          result.add(st);
          index++;
        }
        count++;
      }
      s++;
    }
    return result;
}

here 'short index = 0;' should be 'int index = 0;'. And that's reported here http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=2 and http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=11; the author XiaoPingGao has already fixed this bug: http://code.google.com/p/imdict-chinese-analyzer/source/browse/trunk/src/org/apache/lucene/analysis/cn/smart/hhmm/SegGraph.java
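The root cause is easy to reproduce in isolation: a Java `short` silently wraps to negative at 32767, so token indices go bad once a text yields more than 32767 tokens. A minimal standalone demonstration (no Lucene dependency):

```java
// Demonstrates why SegGraph's 'short index' breaks past 32767 tokens:
// short arithmetic wraps to negative on overflow instead of failing.
public class ShortOverflowDemo {
    public static void main(String[] args) {
        short index = Short.MAX_VALUE; // 32767, the reported limit
        index++;                       // wraps silently
        System.out.println(index);     // prints -32768

        int wide = Short.MAX_VALUE;
        wide++;                        // an int keeps counting
        System.out.println(wide);      // prints 32768
    }
}
```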
[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019863#comment-13019863 ]

Simon Willnauer commented on LUCENE-3018:
-----------------------------------------

bq. I will write a small test case to see whether the .so file being built is working fine.

awesome! :)
Re: [HUDSON] Lucene-Solr-tests-only-realtime_search-branch - Build # 2 - Still Failing
I just committed a fix for this

On Thu, Apr 14, 2011 at 4:47 PM, Apache Hudson Server hud...@hudson.apache.org wrote:
> Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-realtime_search-branch/2/
>
> 1 tests failed.
>
> REGRESSION:  org.apache.lucene.index.TestIndexWriterDelete.testUpdatesOnDiskFull
>
> Error Message:
> fake disk full at 13517 bytes when writing _0_1.del (file length=0; wrote 10 of 20 bytes)
>
> [...]
>
> Build Log (for compile errors):
> [...truncated 3190 lines...]
[jira] [Updated] (LUCENE-3026) smartcn analyzer throw NullPointer exception when the length of analysed text over 32767
[ https://issues.apache.org/jira/browse/LUCENE-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3026:
--------------------------------

Fix Version/s: 4.0
               3.2

smartcn analyzer throw NullPointer exception when the length of analysed text over 32767
----------------------------------------------------------------------------------------

Key: LUCENE-3026
URL: https://issues.apache.org/jira/browse/LUCENE-3026
Project: Lucene - Java
Issue Type: Bug
Components: contrib/analyzers
Affects Versions: 3.1, 4.0
Reporter: wangzhenghang
Assignee: Robert Muir
Fix For: 3.2, 4.0
Attachments: LUCENE-3026.patch
[jira] [Resolved] (LUCENE-2952) Make license checking/maintenance easier/automated
[ https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved LUCENE-2952. - Resolution: Fixed Make license checking/maintenance easier/automated -- Key: LUCENE-2952 URL: https://issues.apache.org/jira/browse/LUCENE-2952 Project: Lucene - Java Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch Instead of waiting until release to check licenses are valid, we should make it a part of our build process to ensure that all dependencies have proper licenses, etc. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Setting the max number of merge threads across IndexWriters
Today the ConcurrentMergeScheduler allows setting the max thread count and is bound to a single IndexWriter. However in the [common] case of multiple IndexWriters running in the same process, this disallows one from managing the aggregate number of merge threads executing at any given time. I think this can be fixed, shall I open an issue? - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3026) smartcn analyzer throw NullPointer exception when the length of analysed text over 32767
[ https://issues.apache.org/jira/browse/LUCENE-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-3026.
---------------------------------

Resolution: Fixed

Committed revision 1092328, 1092338 (branch_3x). Thank you for the patch!

smartcn analyzer throw NullPointer exception when the length of analysed text over 32767
----------------------------------------------------------------------------------------

Key: LUCENE-3026
URL: https://issues.apache.org/jira/browse/LUCENE-3026
Project: Lucene - Java
Issue Type: Bug
Components: contrib/analyzers
Affects Versions: 3.1, 4.0
Reporter: wangzhenghang
Assignee: Robert Muir
Fix For: 3.2, 4.0
Attachments: LUCENE-3026.patch
Re: Setting the max number of merge threads across IndexWriters
I think the proposal involved using a ThreadPoolExecutor, which seemed to not quite work as well as what we have. I think it'll be easier to simply pass a global context that keeps a counter of the actively running threads, and pass that into each IW's CMS? On Thu, Apr 14, 2011 at 8:25 AM, Simon Willnauer simon.willna...@googlemail.com wrote: On Thu, Apr 14, 2011 at 5:20 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Today the ConcurrentMergeScheduler allows setting the max thread count and is bound to a single IndexWriter. However in the [common] case of multiple IndexWriters running in the same process, this disallows one from managing the aggregate number of merge threads executing at any given time. I think this can be fixed, shall I open an issue? go ahead! I think I have seen this suggestion somewhere maybe you need to see if there is one already simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
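Mike's "global context that keeps a counter of the actively running threads" could be sketched with a shared permit pool handed to every writer's merge scheduler. The class and method names below are hypothetical, not an existing Lucene API — it only illustrates the shape of the idea:

```java
import java.util.concurrent.Semaphore;

// Hypothetical shared limiter for merge threads across several IndexWriters.
// Each writer's ConcurrentMergeScheduler would acquire a permit before a merge
// thread does work and release it afterwards, capping process-wide concurrency.
public class GlobalMergeContext {
    private final Semaphore permits;

    public GlobalMergeContext(int maxMergeThreads) {
        this.permits = new Semaphore(maxMergeThreads);
    }

    public void beforeMerge() throws InterruptedException {
        permits.acquire(); // blocks once the global cap is reached
    }

    public void afterMerge() {
        permits.release();
    }

    public int available() {
        return permits.availablePermits();
    }
}
```

Whether a scheduler should block on the cap or back off and retry is exactly the kind of detail the proposed issue would need to settle.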
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #89: POMs out of sync
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-Maven-trunk/89/ No tests ran. Build Log (for compile errors): [...truncated 50 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-3022) DictionaryCompoundWordTokenFilter Flag onlyLongestMatch has no affect
[ https://issues.apache.org/jira/browse/LUCENE-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir reassigned LUCENE-3022:
-----------------------------------

Assignee: Robert Muir

DictionaryCompoundWordTokenFilter Flag onlyLongestMatch has no affect
---------------------------------------------------------------------

Key: LUCENE-3022
URL: https://issues.apache.org/jira/browse/LUCENE-3022
Project: Lucene - Java
Issue Type: Bug
Components: contrib/analyzers
Affects Versions: 2.9.4, 3.1
Reporter: Johann Höchtl
Assignee: Robert Muir
Priority: Minor
Attachments: LUCENE-3022.patch
Original Estimate: 5m
Remaining Estimate: 5m

When using the DictionaryCompoundWordTokenFilter with a german dictionary, I got a strange behaviour: The german word streifenbluse (blouse with stripes) was decompounded to streifen (stripe), reifen (tire), which makes no sense at all. I thought the flag onlyLongestMatch would fix this, because streifen is longer than reifen, but it had no effect. So I reviewed the sourcecode and found the problem:

[code]
protected void decomposeInternal(final Token token) {
    // Only words longer than minWordSize get processed
    if (token.length() < this.minWordSize) {
      return;
    }
    char[] lowerCaseTermBuffer = makeLowerCaseCopy(token.buffer());
    for (int i = 0; i < token.length() - this.minSubwordSize; ++i) {
      Token longestMatchToken = null;
      for (int j = this.minSubwordSize - 1; j < this.maxSubwordSize; ++j) {
        if (i + j > token.length()) {
          break;
        }
        if (dictionary.contains(lowerCaseTermBuffer, i, j)) {
          if (this.onlyLongestMatch) {
            if (longestMatchToken != null) {
              if (longestMatchToken.length() < j) {
                longestMatchToken = createToken(i, j, token);
              }
            } else {
              longestMatchToken = createToken(i, j, token);
            }
          } else {
            tokens.add(createToken(i, j, token));
          }
        }
      }
      if (this.onlyLongestMatch && longestMatchToken != null) {
        tokens.add(longestMatchToken);
      }
    }
}
[/code]

should be changed to

[code]
protected void decomposeInternal(final Token token) {
    // Only words longer than minWordSize get processed
    if (token.termLength() < this.minWordSize) {
      return;
    }
    char[] lowerCaseTermBuffer = makeLowerCaseCopy(token.termBuffer());
    Token longestMatchToken = null;
    for (int i = 0; i < token.termLength() - this.minSubwordSize; ++i) {
      for (int j = this.minSubwordSize - 1; j < this.maxSubwordSize; ++j) {
        if (i + j > token.termLength()) {
          break;
        }
        if (dictionary.contains(lowerCaseTermBuffer, i, j)) {
          if (this.onlyLongestMatch) {
            if (longestMatchToken != null) {
              if (longestMatchToken.termLength() < j) {
                longestMatchToken = createToken(i, j, token);
              }
            } else {
              longestMatchToken = createToken(i, j, token);
            }
          } else {
            tokens.add(createToken(i, j, token));
          }
        }
      }
    }
    if (this.onlyLongestMatch && longestMatchToken != null) {
      tokens.add(longestMatchToken);
    }
}
[/code]

So that only the longest token is really indexed and the onlyLongestMatch flag makes sense.
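The behavioural difference between the two versions can be seen without Lucene at all: with the tracker scoped inside the outer loop you get one "longest match" per start offset (so both streifen and reifen survive), while hoisting it out keeps only the single longest subword overall. A toy reproduction using plain substring matching — the dictionary, word, and class name are illustrative, not the filter's real API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy version of the decompounding loops from the report: collect dictionary
// subwords of a compound, keeping either the longest match per start offset
// (the reported behaviour) or one longest match overall (the proposed fix).
public class LongestMatchDemo {
    static List<String> decompose(String word, List<String> dict, boolean perOffset) {
        List<String> tokens = new ArrayList<>();
        String globalBest = null;
        for (int i = 0; i < word.length(); i++) {
            String offsetBest = null;
            for (int j = i + 1; j <= word.length(); j++) {
                String sub = word.substring(i, j);
                if (dict.contains(sub)) {
                    if (offsetBest == null || sub.length() > offsetBest.length()) offsetBest = sub;
                    if (globalBest == null || sub.length() > globalBest.length()) globalBest = sub;
                }
            }
            if (perOffset && offsetBest != null) tokens.add(offsetBest); // tracker reset per offset
        }
        if (!perOffset && globalBest != null) tokens.add(globalBest);    // tracker hoisted out
        return tokens;
    }

    public static void main(String[] args) {
        List<String> dict = Arrays.asList("streifen", "reifen", "bluse");
        System.out.println(decompose("streifenbluse", dict, true));  // [streifen, reifen, bluse]
        System.out.println(decompose("streifenbluse", dict, false)); // [streifen]
    }
}
```

Note that 'reifen' sneaks in at offset 2 of "streifenbluse" in the per-offset variant, exactly as the reporter describes.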
[jira] [Updated] (LUCENE-3022) DictionaryCompoundWordTokenFilter Flag onlyLongestMatch has no affect
[ https://issues.apache.org/jira/browse/LUCENE-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3022: Fix Version/s: 4.0 3.2

DictionaryCompoundWordTokenFilter Flag onlyLongestMatch has no affect
-
Key: LUCENE-3022
URL: https://issues.apache.org/jira/browse/LUCENE-3022
Project: Lucene - Java
Issue Type: Bug
Components: contrib/analyzers
Affects Versions: 2.9.4, 3.1
Reporter: Johann Höchtl
Assignee: Robert Muir
Priority: Minor
Fix For: 3.2, 4.0
Attachments: LUCENE-3022.patch
Original Estimate: 5m
Remaining Estimate: 5m

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Setting the max number of merge threads across IndexWriters
I proposed to decouple MergeScheduler from IW (stop keeping a reference to it). Then you can create a single CMS and pass it to all your IWs. On Thu, Apr 14, 2011 at 19:40, Jason Rutherglen jason.rutherg...@gmail.com wrote: I think the proposal involved using a ThreadPoolExecutor, which seemed to not quite work as well as what we have. I think it'll be easier to simply pass a global context that keeps a counter of the actively running threads, and pass that into each IW's CMS? On Thu, Apr 14, 2011 at 8:25 AM, Simon Willnauer simon.willna...@googlemail.com wrote: On Thu, Apr 14, 2011 at 5:20 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Today the ConcurrentMergeScheduler allows setting the max thread count and is bound to a single IndexWriter. However in the [common] case of multiple IndexWriters running in the same process, this disallows one from managing the aggregate number of merge threads executing at any given time. I think this can be fixed, shall I open an issue? go ahead! I think I have seen this suggestion somewhere maybe you need to see if there is one already simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Kirill Zakharenko/Кирилл Захаренко E-Mail/Jabber: ear...@gmail.com Phone: +7 (495) 683-567-4 ICQ: 104465785 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Setting the max number of merge threads across IndexWriters
On Thu, Apr 14, 2011 at 5:52 PM, Earwin Burrfoot ear...@gmail.com wrote: I proposed to decouple MergeScheduler from IW (stop keeping a reference to it). Then you can create a single CMS and pass it to all your IWs. Yep that was it... is there an issue for this? simon On Thu, Apr 14, 2011 at 19:40, Jason Rutherglen jason.rutherg...@gmail.com wrote: I think the proposal involved using a ThreadPoolExecutor, which seemed to not quite work as well as what we have. I think it'll be easier to simply pass a global context that keeps a counter of the actively running threads, and pass that into each IW's CMS? On Thu, Apr 14, 2011 at 8:25 AM, Simon Willnauer simon.willna...@googlemail.com wrote: On Thu, Apr 14, 2011 at 5:20 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Today the ConcurrentMergeScheduler allows setting the max thread count and is bound to a single IndexWriter. However in the [common] case of multiple IndexWriters running in the same process, this disallows one from managing the aggregate number of merge threads executing at any given time. I think this can be fixed, shall I open an issue? go ahead! I think I have seen this suggestion somewhere maybe you need to see if there is one already simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Kirill Zakharenko/Кирилл Захаренко E-Mail/Jabber: ear...@gmail.com Phone: +7 (495) 683-567-4 ICQ: 104465785 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3029) MultiPhraseQuery assigns different scores to identical docs when using 0 pos-incr
MultiPhraseQuery assigns different scores to identical docs when using 0 pos-incr
-
Key: LUCENE-3029
URL: https://issues.apache.org/jira/browse/LUCENE-3029
Project: Lucene - Java
Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
Fix For: 3.0.4, 3.2, 4.0

If you have two identical docs with tokens a b c all zero pos-incr (ie they occur on the same position), and you run a MultiPhraseQuery with [a, b] and [c] (all pos incr 0)... then the two docs will get different scores despite being identical. Admittedly it's a strange query... but I think the scorer ought to count the phrase as having tf=1 for each doc. The problem is that we are missing a tie-breaker for the PhraseQuery used by ExactPhraseScorer, and so the PQ ends up flip/flopping such that every other document gets the same score. Ie, even docIDs all get one score and odd docIDs all get another score. Once I added the hard tie-breaker (ord) the scores are the same. However... there's a separate bug, that can over-count the tf, such that if I create the MPQ like this:

{noformat}
mpq.add(new Term[] {new Term("field", "a")}, 0);
mpq.add(new Term[] {new Term("field", "b"), new Term("field", "c")}, 0);
{noformat}

I get tf=2 per doc, but if I create it like this:

{noformat}
mpq.add(new Term[] {new Term("field", "b"), new Term("field", "c")}, 0);
mpq.add(new Term[] {new Term("field", "a")}, 0);
{noformat}

I get tf=1 (which I think is correct?). This happens because MultipleTermPositions freely returns the same position more than once: it just unions the positions of the two streams, so when both have their term at pos=0, you'll get pos=0 twice, which is not good and leads to over-counting tf. Unfortunately, I don't see a performant way to fix that... and I'm not sure that it really matters that much in practice.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
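The over-count mechanism described above can be shown in isolation. The toy merge below is not the MultipleTermPositions code; names and structure are illustrative. It unions two sorted position streams the way the description says: without deduplication, a position shared by both terms (pos=0 for "b" and "c") comes out twice, which is what inflates tf:

```java
import java.util.*;

// Toy illustration of the tf over-count: a MultipleTermPositions-style union
// of two sorted position lists keeps duplicates, so one real occurrence at a
// shared position is counted twice. (Hypothetical helper, not Lucene code.)
public class PositionUnion {
    // Plain union, as in merging the two postings streams: keeps duplicates.
    static List<Integer> union(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            out.add(a[i] <= b[j] ? a[i++] : b[j++]);
        }
        while (i < a.length) out.add(a[i++]);
        while (j < b.length) out.add(b[j++]);
        return out;
    }

    // Deduplicated union: what would avoid counting the same position twice.
    static List<Integer> unionDedup(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        for (int p : union(a, b)) {
            if (out.isEmpty() || out.get(out.size() - 1) != p) out.add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        // Both terms at position 0: the raw union reports it twice.
        System.out.println(union(new int[]{0}, new int[]{0}));       // [0, 0]
        System.out.println(unionDedup(new int[]{0}, new int[]{0}));  // [0]
    }
}
```

As the issue notes, deduplicating inside the real streams would cost a lookahead/compare on a hot loop, which is presumably why no performant fix was obvious.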
[jira] [Updated] (LUCENE-3029) MultiPhraseQuery assigns different scores to identical docs when using 0 pos-incr
[ https://issues.apache.org/jira/browse/LUCENE-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3029: --- Attachment: LUCENE-3029.patch

Patch.

MultiPhraseQuery assigns different scores to identical docs when using 0 pos-incr
-
Key: LUCENE-3029
URL: https://issues.apache.org/jira/browse/LUCENE-3029
Project: Lucene - Java
Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
Fix For: 3.0.4, 3.2, 4.0
Attachments: LUCENE-3029.patch

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3022) DictionaryCompoundWordTokenFilter Flag onlyLongestMatch has no affect
[ https://issues.apache.org/jira/browse/LUCENE-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3022: Attachment: LUCENE-3022.patch

Hi Johann, in my opinion your patch is completely correct, thanks for fixing this. I noticed, though, that a Solr test failed because its factory defaults to this value being on (and the previous behavior was broken!!!) Because of this, I propose we default this behavior to off in the Solr factory and add an upgrading note. Previously decompounding in Solr defaulted to buggy behavior, but I think by default we should index all compound components (since that seems to be what the intended behavior was, which mostly worked, only because of the bug!) I'll leave the issue open for a few days to see if anyone objects to this plan.

DictionaryCompoundWordTokenFilter Flag onlyLongestMatch has no affect
-
Key: LUCENE-3022
URL: https://issues.apache.org/jira/browse/LUCENE-3022
Project: Lucene - Java
Issue Type: Bug
Components: contrib/analyzers
Affects Versions: 2.9.4, 3.1
Reporter: Johann Höchtl
Assignee: Robert Muir
Priority: Minor
Fix For: 3.2, 4.0
Attachments: LUCENE-3022.patch, LUCENE-3022.patch
Original Estimate: 5m
Remaining Estimate: 5m

-- This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [ANNOUNCE] Apache PyLucene 3.1.0
On Apr 14, 2011, at 2:05, Valery Khamenya khame...@gmail.com wrote: Thanks! btw, originally I went this way http://lucene.apache.org/pylucene/jcc/documentation/install.html , but it is not up-to-date and trunk doesn't seem to be compilable (problems with ant xml files, plus no doc directory etc). Trunk needs work. Trunk is based off Lucene's trunk, which is moving and changing rapidly, and is anything but stable. In particular, the PyLucene unit tests and samples need to be more or less redone because of all the API changes that occurred on Lucene's trunk. The stable branch is actually branch_3x, which should build, run, and pass all its tests. The 3.1 release that just occurred comes from branch_3x. Andi.. Then I used the tar and it worked like a charm. best regards -- Valery A.Khamenya On Fri, Apr 8, 2011 at 5:16 AM, dar...@ontrenet.com wrote: Congrats Andi. A truly awesome project. On Thu, 7 Apr 2011 20:02:22 -0700 (PDT), Andi Vajda va...@apache.org wrote: I am pleased to announce the availability of Apache PyLucene 3.1.0. Apache PyLucene, a subproject of Apache Lucene, is a Python extension for accessing Apache Lucene Core. Its goal is to allow you to use Lucene's text indexing and searching capabilities from Python. It is API compatible with the latest version of Lucene Core, 3.1.0. This release contains a number of bug fixes and improvements. Details can be found in the changes files: http://svn.apache.org/repos/asf/lucene/pylucene/tags/pylucene_3_1_0/CHANGES http://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc/CHANGES Apache PyLucene is available from the following download page: http://www.apache.org/dyn/closer.cgi/lucene/pylucene/pylucene-3.1.0-1-src.tar.gz When downloading from a mirror site, please remember to verify the downloads using signatures found on the Apache site: http://www.apache.org/dist/lucene/pylucene/KEYS For more information on Apache PyLucene, visit the project home page: http://lucene.apache.org/pylucene Andi..
Re: failure of some PyLucene tests on windows OS
On Apr 14, 2011, at 2:22, Thomas Koch k...@orbiteam.de wrote: Well, sure, not running the code that breaks solves the problem. But can you then run the tests multiple times? [Thomas Koch] Note that previously closeStore() was not called, but now when calling it, test_PyLucene runs OK. And yes, I can run the tests several times: on PyLucene 2.9 the index dirs testrepo and testpyrepo are cleaned up (i.e. removed) after the tests succeed. With PyLucene 3.1 the testpyrepo is left behind (because test_PythonDirectory.py fails to clean up). So this looks like a problem in the test code rather than in Windows/Python/PyLucene: some store not being closed results in a file lock. Not sure if the bug is there. Me neither :-( Just for test_PyLucene the 'test-code fix' fixes the issue. Another problem is the dependency between test_PythonDirectory and test_PyLucene: test_PythonDirectory uses tests defined in test_PyLucene, which makes it a bit difficult to figure out where the problem is... So has anyone seen this problem before? I'd expect anyone running on Windows to see these test failures. Andi.. Regards, Thomas
[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019889#comment-13019889 ] Simon Willnauer commented on LUCENE-3018: - One more comment about the cpptasks-1.0b4.jar: I think we should put it into lucene/contrib/misc/lib instead of lucene/lib, since we only need it there. While we are at it, you might need to update the README.TXT and the overview.html accordingly, since we now have an ant build for it. Lucene Native Directory implementation need automated build --- Key: LUCENE-3018 URL: https://issues.apache.org/jira/browse/LUCENE-3018 Project: Lucene - Java Issue Type: Wish Components: Build Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Varun Thacker Priority: Minor Fix For: 4.0 Attachments: LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, cpptasks.jar Currently the native directory impl in contrib/misc require manual action to compile the c code (partially) documented in https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html yet it would be nice if we had an ant task and documentation for all platforms how to compile them and set up the prerequisites. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2571) Indexing performance tests with realtime branch
[ https://issues.apache.org/jira/browse/LUCENE-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2571: Attachment: (was: wikimedium.realtime.Standard.nd10M_dps.png) Indexing performance tests with realtime branch --- Key: LUCENE-2571 URL: https://issues.apache.org/jira/browse/LUCENE-2571 Project: Lucene - Java Issue Type: Task Components: Index Reporter: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: wikimedium.realtime.Standard.nd10M_dps.png, wikimedium.realtime.Standard.nd10M_dps_addDocuments.png, wikimedium.realtime.Standard.nd10M_dps_addDocuments_flush.png, wikimedium.trunk.Standard.nd10M_dps.png, wikimedium.trunk.Standard.nd10M_dps_addDocuments.png We should run indexing performance tests with the DWPT changes and compare to trunk. We need to test both single-threaded and multi-threaded performance. NOTE: flush by RAM isn't implemented just yet, so either we wait with the tests or flush by doc count. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2571) Indexing performance tests with realtime branch
[ https://issues.apache.org/jira/browse/LUCENE-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019890#comment-13019890 ] Simon Willnauer commented on LUCENE-2571: - I ran batch indexing benchmarks, trunk vs. realtime branch, with addDocument and with updateDocument. For addDocument I indexed 10M Wikipedia docs onto a spinning disk, reading from a separate SSD. Here is the realtime graph: !wikimedium.realtime.Standard.nd10M_dps_addDocuments.png! vs. trunk: !wikimedium.trunk.Standard.nd10M_dps_addDocuments.png! This graph shows how DWPT is flushing to disk over time: !wikimedium.realtime.Standard.nd10M_dps_addDocuments_flush.png! For updateDocument I built a 10M-doc wiki index and indexed the exact same documents with updateDocument; here are the results: Realtime Branch: !wikimedium.realtime.Standard.nd10M_dps.png! trunk: !wikimedium.trunk.Standard.nd10M_dps.png! Indexing performance tests with realtime branch --- Key: LUCENE-2571 URL: https://issues.apache.org/jira/browse/LUCENE-2571 Project: Lucene - Java Issue Type: Task Components: Index Reporter: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: wikimedium.realtime.Standard.nd10M_dps.png, wikimedium.realtime.Standard.nd10M_dps_addDocuments.png, wikimedium.realtime.Standard.nd10M_dps_addDocuments_flush.png, wikimedium.trunk.Standard.nd10M_dps.png, wikimedium.trunk.Standard.nd10M_dps_addDocuments.png We should run indexing performance tests with the DWPT changes and compare to trunk. We need to test both single-threaded and multi-threaded performance. NOTE: flush by RAM isn't implemented just yet, so either we wait with the tests or flush by doc count. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2571) Indexing performance tests with realtime branch
[ https://issues.apache.org/jira/browse/LUCENE-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2571: Attachment: wikimedium.trunk.Standard.nd10M_dps.png wikimedium.realtime.Standard.nd10M_dps.png updated attachements Indexing performance tests with realtime branch --- Key: LUCENE-2571 URL: https://issues.apache.org/jira/browse/LUCENE-2571 Project: Lucene - Java Issue Type: Task Components: Index Reporter: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: wikimedium.realtime.Standard.nd10M_dps.png, wikimedium.realtime.Standard.nd10M_dps_addDocuments.png, wikimedium.realtime.Standard.nd10M_dps_addDocuments_flush.png, wikimedium.trunk.Standard.nd10M_dps.png, wikimedium.trunk.Standard.nd10M_dps_addDocuments.png We should run indexing performance tests with the DWPT changes and compare to trunk. We need to test both single-threaded and multi-threaded performance. NOTE: flush by RAM isn't implemented just yet, so either we wait with the tests or flush by doc count. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 7105 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/7105/ No tests ran. Build Log (for compile errors): [...truncated 47 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[HUDSON] Lucene-Solr-tests-only-3.x - Build # 7097 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7097/ No tests ran. Build Log (for compile errors): [...truncated 54 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[HUDSON] Lucene-Solr-tests-only-realtime_search-branch - Build # 3 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-realtime_search-branch/3/ No tests ran. Build Log (for compile errors): [...truncated 53 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2468) TestFunctionQuery fails always on windows
TestFunctionQuery fails always on windows - Key: SOLR-2468 URL: https://issues.apache.org/jira/browse/SOLR-2468 Project: Solr Issue Type: Bug Reporter: Robert Muir NOTE: reproduce with: ant test -Dtestcase=TestFunctionQuery -Dtestmethod=testExternalFieldValueSourceParser -Dtests.seed=1172323467847461017:3327452514993896990 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2468) TestFunctionQuery fails always on windows
[ https://issues.apache.org/jira/browse/SOLR-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019906#comment-13019906 ] Robert Muir commented on SOLR-2468: ---

{noformat}
[junit] - Standard Error -
[junit] NOTE: reproduce with: ant test -Dtestcase=TestFunctionQuery -Dtestmethod=testExternalFieldValueSourceParser -Dtests.seed=1172323467847461017:3327452514993896990
[junit] NOTE: test params are: codec=PreFlex, locale=hr, timezone=America/Argentina/La_Rioja
[junit] NOTE: all tests run in this JVM:
[junit] [TestFunctionQuery]
[junit] NOTE: Windows Vista 6.0 x86/Sun Microsystems Inc. 1.6.0_23 (32-bit)/cpus=4,threads=1,free=10225608,total=16252928
[junit] - ---
[junit] Testcase: testExternalFieldValueSourceParser(org.apache.solr.search.function.TestFunctionQuery): Caused an ERROR
[junit] java.io.FileNotFoundException: C:\Users\rmuir\workspace\lucene-trunk\solr\build\test-results\temp\1\solrtest-TestFunctionQuery-1302799686658\external_CoMpleX "fieldName" _extf.1302799686550 (The filename, directory name, or volume label syntax is incorrect)
[junit] java.lang.RuntimeException: java.io.FileNotFoundException: C:\Users\rmuir\workspace\lucene-trunk\solr\build\test-results\temp\1\solrtest-TestFunctionQuery-1302799686658\external_CoMpleX "fieldName" _extf.1302799686550 (The filename, directory name, or volume label syntax is incorrect)
[junit] at org.apache.solr.search.function.TestFunctionQuery.makeExternalFile(TestFunctionQuery.java:56)
[junit] at org.apache.solr.search.function.TestFunctionQuery.testExternalFieldValueSourceParser(TestFunctionQuery.java:536)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)
[junit] Caused by: java.io.FileNotFoundException: C:\Users\rmuir\workspace\lucene-trunk\solr\build\test-results\temp\1\solrtest-TestFunctionQuery-1302799686658\external_CoMpleX "fieldName" _extf.1302799686550 (The filename, directory name, or volume label syntax is incorrect)
[junit] at java.io.FileOutputStream.open(Native Method)
[junit] at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
[junit] at java.io.FileOutputStream.<init>(FileOutputStream.java:70)
[junit] at org.apache.solr.search.function.TestFunctionQuery.makeExternalFile(TestFunctionQuery.java:52)
[junit] {noformat}

TestFunctionQuery fails always on windows
-
Key: SOLR-2468
URL: https://issues.apache.org/jira/browse/SOLR-2468
Project: Solr
Issue Type: Bug
Reporter: Robert Muir

NOTE: reproduce with: ant test -Dtestcase=TestFunctionQuery -Dtestmethod=testExternalFieldValueSourceParser -Dtests.seed=1172323467847461017:3327452514993896990

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2468) TestFunctionQuery fails always on windows
[ https://issues.apache.org/jira/browse/SOLR-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019913#comment-13019913 ] Robert Muir commented on SOLR-2468: --- Just at a glance, it appears the test tries to create a file with a double quote in it. On some platforms, such as Windows, you cannot use certain characters in a filename... I think this is the problem? TestFunctionQuery fails always on windows - Key: SOLR-2468 URL: https://issues.apache.org/jira/browse/SOLR-2468 Project: Solr Issue Type: Bug Reporter: Robert Muir NOTE: reproduce with: ant test -Dtestcase=TestFunctionQuery -Dtestmethod=testExternalFieldValueSourceParser -Dtests.seed=1172323467847461017:3327452514993896990 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
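The character set Windows rejects in file names is fixed (backslash, slash, colon, asterisk, question mark, double quote, angle brackets, pipe), which is why the generated name above fails only there. A sanitizer along these lines would make such a test portable; this helper is a sketch for illustration, not code from the Solr test in question:

```java
// Hypothetical helper: replace the characters Windows forbids in file names
// (\ / : * ? " < > |) with underscores. Note Windows additionally reserves
// device names like CON and NUL, which this sketch does not handle.
public class FileNameSanitizer {
    static String sanitize(String name) {
        return name.replaceAll("[\\\\/:*?\"<>|]", "_");
    }

    public static void main(String[] args) {
        // The double quotes that broke the test are replaced.
        System.out.println(sanitize("external_CoMpleX \"fieldName\"_extf"));
    }
}
```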
[HUDSON] Lucene-Solr-tests-only-3.x - Build # 7098 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7098/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[HUDSON] Lucene-Solr-tests-only-3.x - Build # 7099 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7099/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3016) Analyzer for Latvian
[ https://issues.apache.org/jira/browse/LUCENE-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3016. - Resolution: Fixed Committed revision 1092396, 1092398 (branch_3x) Analyzer for Latvian Key: LUCENE-3016 URL: https://issues.apache.org/jira/browse/LUCENE-3016 Project: Lucene - Java Issue Type: New Feature Reporter: Robert Muir Assignee: Robert Muir Fix For: 3.2, 4.0 Attachments: LUCENE-3016.patch Less aggressive form of Kreslins' phd thesis: A stemming algorithm for Latvian. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Setting the max number of merge threads across IndexWriters
Can't remember. Probably no. I started an experimental MS api rewrite (incorporating ability to share MSs between IWs) some time ago, but never had the time to finish it. On Thu, Apr 14, 2011 at 19:56, Simon Willnauer simon.willna...@googlemail.com wrote: On Thu, Apr 14, 2011 at 5:52 PM, Earwin Burrfoot ear...@gmail.com wrote: I proposed to decouple MergeScheduler from IW (stop keeping a reference to it). Then you can create a single CMS and pass it to all your IWs. Yep that was it... is there an issue for this? simon On Thu, Apr 14, 2011 at 19:40, Jason Rutherglen jason.rutherg...@gmail.com wrote: I think the proposal involved using a ThreadPoolExecutor, which seemed to not quite work as well as what we have. I think it'll be easier to simply pass a global context that keeps a counter of the actively running threads, and pass that into each IW's CMS? On Thu, Apr 14, 2011 at 8:25 AM, Simon Willnauer simon.willna...@googlemail.com wrote: On Thu, Apr 14, 2011 at 5:20 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Today the ConcurrentMergeScheduler allows setting the max thread count and is bound to a single IndexWriter. However in the [common] case of multiple IndexWriters running in the same process, this disallows one from managing the aggregate number of merge threads executing at any given time. I think this can be fixed, shall I open an issue? go ahead! 
I think I have seen this suggestion somewhere; maybe you need to check whether there is an issue for it already. simon -- Kirill Zakharenko/Кирилл Захаренко E-Mail/Jabber: ear...@gmail.com Phone: +7 (495) 683-567-4 ICQ: 104465785
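Jason's suggestion of a global counter of actively running merge threads can be sketched with plain JDK concurrency primitives. Everything below, class and method names included, is a hypothetical illustration of the idea, not Lucene API:

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch: a process-wide cap on concurrently running merge
// threads, shared by several IndexWriters' merge schedulers. A Semaphore
// plays the role of the shared counter.
class GlobalMergeThrottle {
    private final Semaphore slots;

    GlobalMergeThrottle(int maxMergeThreads) {
        this.slots = new Semaphore(maxMergeThreads);
    }

    // A scheduler would call this before spawning a merge thread; if no
    // slot is free, the merge is deferred or run in the calling thread.
    boolean tryStartMerge() {
        return slots.tryAcquire();
    }

    // Called from the merge thread's finally block when the merge ends.
    void finishMerge() {
        slots.release();
    }
}
```

Each IndexWriter's merge scheduler would consult the same throttle instance before launching a merge thread, giving an aggregate cap across writers without coupling MergeScheduler to a single IW.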
[jira] [Commented] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.
[ https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019932#comment-13019932 ] Robert Muir commented on SOLR-2378: --- Just an idea: should we default to this implementation in trunk? It seems to be a significant reduction in RAM. FST-based Lookup (suggestions) for prefix matches. -- Key: SOLR-2378 URL: https://issues.apache.org/jira/browse/SOLR-2378 Project: Solr Issue Type: New Feature Components: spellchecker Reporter: Dawid Weiss Assignee: Dawid Weiss Labels: lookup, prefix Fix For: 4.0 Attachments: SOLR-2378.patch Implement a subclass of Lookup based on finite state automata/ transducers (Lucene FST package). This issue is for implementing a relatively basic prefix matcher, we will handle infixes and other types of input matches gradually. Impl. phases: - -write a DFA based suggester effectively identical to ternary tree based solution right now,- - -baseline benchmark against tern. tree (memory consumption, rebuilding speed, indexing speed; reuse Andrzej's benchmark code)- - -modify DFA to encode term weights directly in the automaton (optimize for onlyMostPopular case)- - -benchmark again- - -benchmark again- - -modify the tutorial on the wiki- [http://wiki.apache.org/solr/Suggester]
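For intuition on where the RAM reduction comes from: a ternary tree spends several object references per node, while an FST packs the whole dictionary into one compact, pointer-free structure. The sketch below is not Lucene's FST (which also shares suffixes and can encode weights); it is an illustrative stand-in showing prefix lookup over compact sorted data, with all names invented here:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Illustrative only: prefix lookup over a plain sorted array. Like an FST,
// this layout has no per-node objects; unlike an FST it does not share
// prefixes or suffixes, so the real structure is far more compact still.
class SortedPrefixLookup {
    private final String[] terms; // kept sorted for binary search

    SortedPrefixLookup(Collection<String> input) {
        terms = input.toArray(new String[0]);
        Arrays.sort(terms);
    }

    List<String> lookup(String prefix, int max) {
        // Find the insertion point of the prefix, then scan forward while
        // terms still start with it.
        int i = Arrays.binarySearch(terms, prefix);
        if (i < 0) i = -i - 1;
        List<String> out = new ArrayList<>();
        while (i < terms.length && out.size() < max && terms[i].startsWith(prefix)) {
            out.add(terms[i++]);
        }
        return out;
    }
}
```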
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 7107 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/7107/ No tests ran. Build Log (for compile errors): [...truncated 31 lines...]
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 7108 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/7108/ No tests ran. Build Log (for compile errors): [...truncated 7478 lines...]
[jira] [Commented] (SOLR-2193) Re-architect Update Handler
[ https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019953#comment-13019953 ] Jayson Minard commented on SOLR-2193: - Some of this was already solved in: https://issues.apache.org/jira/browse/SOLR-1155 (locking and re-opening index writer were fixed) Re-architect Update Handler --- Key: SOLR-2193 URL: https://issues.apache.org/jira/browse/SOLR-2193 Project: Solr Issue Type: Improvement Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.0 Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch The update handler needs an overhaul. A few goals I think we might want to look at: 1. Cleanup - drop DirectUpdateHandler(2) line - move to something like UpdateHandler, DefaultUpdateHandler 2. Expose the SolrIndexWriter in the api or add the proper abstractions to get done what we now do with special casing: if (directupdatehandler2) success else failish 3. Stop closing the IndexWriter and start using commit (still lazy IW init though). 4. Drop iwAccess, iwCommit locks and sync mostly at the Lucene level. 5. Keep NRT support in mind. 6. Keep microsharding in mind (maintain logical index as multiple physical indexes) 7. Address the current issues we face because multiple original/'reloaded' cores can have a different IndexWriter on the same index.
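Goal 3 (keep the IndexWriter open and publish changes with commit, while still initializing it lazily) can be sketched with a stand-in writer. FakeWriter below is only a placeholder for Lucene's IndexWriter, and every class and method name here is hypothetical, not Solr's actual update-handler API:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: one lazily created, long-lived writer per core;
// commit() makes pending changes durable without tearing the writer down,
// so adds never wait on writer re-creation.
class LongLivedWriterHandler {
    // Stand-in for Lucene's IndexWriter; counts documents instead of indexing.
    static class FakeWriter {
        final AtomicLong pending = new AtomicLong();
        final AtomicLong committed = new AtomicLong();
        void addDocument() { pending.incrementAndGet(); }
        void commit() { committed.addAndGet(pending.getAndSet(0)); }
    }

    private volatile FakeWriter writer; // lazy init; never closed per-commit

    private FakeWriter writer() {
        FakeWriter w = writer;
        if (w == null) {
            synchronized (this) { // double-checked lazy initialization
                if (writer == null) writer = new FakeWriter();
                w = writer;
            }
        }
        return w;
    }

    void add() { writer().addDocument(); }
    void commit() { writer().commit(); } // no close(), no re-open
    long committedDocs() { return writer().committed.get(); }
}
```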
[jira] [Commented] (SOLR-1155) Change DirectUpdateHandler2 to allow concurrent adds during an autocommit
[ https://issues.apache.org/jira/browse/SOLR-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019954#comment-13019954 ] Jayson Minard commented on SOLR-1155: - Thanks Yonik, I'll take a look at this to see if there was anything I learned that applies. SOLR-1155 has been used under heavy production load and is very stable against 1.4, so maybe Mark will take a peek; I posted a note on the other issue as well. Change DirectUpdateHandler2 to allow concurrent adds during an autocommit - Key: SOLR-1155 URL: https://issues.apache.org/jira/browse/SOLR-1155 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.3, 1.4 Reporter: Jayson Minard Fix For: Next Attachments: SOLR-1155-release1.4-rev834789.patch, SOLR-1155-trunk-rev834706.patch, Solr-1155.patch, Solr-1155.patch Currently DirectUpdateHandler2 will block adds during a commit, and it seems to be possible with recent changes to Lucene to allow them to run concurrently. See: http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--td23435224.html
[jira] [Commented] (SOLR-1155) Change DirectUpdateHandler2 to allow concurrent adds during an autocommit
[ https://issues.apache.org/jira/browse/SOLR-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019955#comment-13019955 ] Jayson Minard commented on SOLR-1155: - I'll look at updating this for 3.1 for those that need it on that release, and Mark's looks good for 4.x and beyond. Change DirectUpdateHandler2 to allow concurrent adds during an autocommit - Key: SOLR-1155 URL: https://issues.apache.org/jira/browse/SOLR-1155 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.3, 1.4 Reporter: Jayson Minard Fix For: Next Attachments: SOLR-1155-release1.4-rev834789.patch, SOLR-1155-trunk-rev834706.patch, Solr-1155.patch, Solr-1155.patch Currently DirectUpdateHandler2 will block adds during a commit, and it seems to be possible with recent changes to Lucene to allow them to run concurrently. See: http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--td23435224.html
[jira] [Commented] (SOLR-2193) Re-architect Update Handler
[ https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019961#comment-13019961 ] Jayson Minard commented on SOLR-2193: - SOLR-1155 is probably an easier change for Solr 3.1 due to its ancestry, so to get the same benefits I'll work to update it for that version, assuming this patch of yours is for 4.x onwards. Re-architect Update Handler --- Key: SOLR-2193 URL: https://issues.apache.org/jira/browse/SOLR-2193 Project: Solr Issue Type: Improvement Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.0 Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch The update handler needs an overhaul. A few goals I think we might want to look at: 1. Cleanup - drop DirectUpdateHandler(2) line - move to something like UpdateHandler, DefaultUpdateHandler 2. Expose the SolrIndexWriter in the api or add the proper abstractions to get done what we now do with special casing: if (directupdatehandler2) success else failish 3. Stop closing the IndexWriter and start using commit (still lazy IW init though). 4. Drop iwAccess, iwCommit locks and sync mostly at the Lucene level. 5. Keep NRT support in mind. 6. Keep microsharding in mind (maintain logical index as multiple physical indexes) 7. Address the current issues we face because multiple original/'reloaded' cores can have a different IndexWriter on the same index.
[jira] [Commented] (SOLR-2193) Re-architect Update Handler
[ https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019983#comment-13019983 ] Mark Miller commented on SOLR-2193: --- Yes - my plan for this was 4.x. Re-architect Update Handler --- Key: SOLR-2193 URL: https://issues.apache.org/jira/browse/SOLR-2193 Project: Solr Issue Type: Improvement Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.0 Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch The update handler needs an overhaul. A few goals I think we might want to look at: 1. Cleanup - drop DirectUpdateHandler(2) line - move to something like UpdateHandler, DefaultUpdateHandler 2. Expose the SolrIndexWriter in the api or add the proper abstractions to get done what we now do with special casing: if (directupdatehandler2) success else failish 3. Stop closing the IndexWriter and start using commit (still lazy IW init though). 4. Drop iwAccess, iwCommit locks and sync mostly at the Lucene level. 5. Keep NRT support in mind. 6. Keep microsharding in mind (maintain logical index as multiple physical indexes) 7. Address the current issues we face because multiple original/'reloaded' cores can have a different IndexWriter on the same index.
[jira] [Updated] (SOLR-2468) TestFunctionQuery fails always on windows
[ https://issues.apache.org/jira/browse/SOLR-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-2468: --- Attachment: SOLR-2468.patch This is a test I recently added for SOLR-2335; didn't realize some OSes would complain about quotes in filenames. I pulled the test apart to test the two different aspects independently, so now the esoteric file name testing just relies on being able to support spaces in filenames. TestFunctionQuery fails always on windows - Key: SOLR-2468 URL: https://issues.apache.org/jira/browse/SOLR-2468 Project: Solr Issue Type: Bug Reporter: Robert Muir Attachments: SOLR-2468.patch NOTE: reproduce with: ant test -Dtestcase=TestFunctionQuery -Dtestmethod=testExternalFieldValueSourceParser -Dtests.seed=1172323467847461017:3327452514993896990
Re: Lucene Merge failing on Open Files
On Wed, Apr 6, 2011 at 8:44 PM, Grant Ingersoll gsing...@apache.org wrote: Begin forwarded message: From: Michael McCandless luc...@mikemccandless.com Date: April 5, 2011 5:46:13 AM EDT To: simon.willna...@gmail.com Cc: Simon Willnauer simon.willna...@googlemail.com, java-u...@lucene.apache.org, paul_t...@fastmail.fm Subject: Re: Lucene Merge failing on Open Files Reply-To: java-u...@lucene.apache.org Yeah, that mergeFactor is way too high and will cause too-many-open-files (if the index has enough segments). This is one of the things that has always bothered me about Merge Factor. We state what the lower bound is, but we don't doc the upper bound. Should we even allow higher values? Of course, how does one pick the cutoff? I've seen up to about 100 be effective. But 3000 is a bit high (although, who knows what the future will hold) grant, we can at least add some documentation no? simon
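A back-of-the-envelope calculation shows why a very large mergeFactor collides with OS file-descriptor limits. The constants below (files per non-compound segment, segment counts per level) are illustrative assumptions for the sketch, not measured Lucene behavior:

```java
// Rough estimate (not Lucene API): with a log-merge policy each "level" can
// accumulate up to mergeFactor segments, a non-compound segment is roughly
// 8-12 files, and a running merge additionally holds all its inputs open.
class OpenFilesEstimate {
    static int worstCaseOpenFiles(int mergeFactor, int levels, int filesPerSegment) {
        int segments = mergeFactor * levels;             // segments that may coexist
        int duringMerge = mergeFactor * filesPerSegment; // inputs of one merge
        return segments * filesPerSegment + duringMerge;
    }

    public static void main(String[] args) {
        // mergeFactor=10 stays in the hundreds; mergeFactor=3000 lands in the
        // tens of thousands, far past a typical ulimit of 1024-4096.
        System.out.println(worstCaseOpenFiles(10, 3, 10));
        System.out.println(worstCaseOpenFiles(3000, 1, 10));
    }
}
```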
[jira] [Commented] (SOLR-2468) TestFunctionQuery fails always on windows
[ https://issues.apache.org/jira/browse/SOLR-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019993#comment-13019993 ] Robert Muir commented on SOLR-2468: --- +1 TestFunctionQuery fails always on windows - Key: SOLR-2468 URL: https://issues.apache.org/jira/browse/SOLR-2468 Project: Solr Issue Type: Bug Reporter: Robert Muir Attachments: SOLR-2468.patch NOTE: reproduce with: ant test -Dtestcase=TestFunctionQuery -Dtestmethod=testExternalFieldValueSourceParser -Dtests.seed=1172323467847461017:3327452514993896990
[jira] [Commented] (LUCENE-3028) IW.getReader() returns inconsistent reader on RT Branch
[ https://issues.apache.org/jira/browse/LUCENE-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019996#comment-13019996 ] selckin commented on LUCENE-3028: - Seems to fail once every 6-8 runs quite consistently (at least I think this is the issue) branches/realtime_search r1092329 {{ [junit] Testsuite: org.apache.lucene.index.TestRollingUpdates [junit] Testcase: testUpdateSameDoc(org.apache.lucene.index.TestRollingUpdates):Caused an ERROR [junit] MockDirectoryWrapper: cannot close: there are still open files: {_ho.fdt=1, _ho.prx=1, _ho.fdx=1, _ho.nrm=1, _j0.fdt=1, _ho.tis=1, _j0.fdx=1, _j0.tis=1, _j0.prx=1, _ho.frq=1, _ho.tvx=1, _ho.tvd=1, _j0.nrm=1, _ho.tvf=1, _j0.frq=1, _j0.tvf=1, _j0.tvd=1, _j0.tvx=1} [junit] java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are still open files: {_ho.fdt=1, _ho.prx=1, _ho.fdx=1, _ho.nrm=1, _j0.fdt=1, _ho.tis=1, _j0.fdx=1, _j0.tis=1, _j0.prx=1, _ho.frq=1, _ho.tvx=1, _ho.tvd=1, _j0.nrm=1, _ho.tvf=1, _j0.frq=1, _j0.tvf=1, _j0.tvd=1, _j0.tvx=1} [junit] at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:414) [junit] at org.apache.lucene.index.TestRollingUpdates.testUpdateSameDoc(TestRollingUpdates.java:104) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1226) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1154) [junit] Caused by: java.lang.RuntimeException: unclosed IndexInput [junit] at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:369) [junit] at org.apache.lucene.store.Directory.openInput(Directory.java:122) [junit] at org.apache.lucene.index.TermVectorsReader.init(TermVectorsReader.java:86) [junit] at org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:236) [junit] at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:495) [junit] at
org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:629) [junit] at org.apache.lucene.index.IndexWriter$ReaderPool.getReadOnlyClone(IndexWriter.java:587) [junit] at org.apache.lucene.index.DirectoryReader.init(DirectoryReader.java:172) [junit] at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:377) [junit] at org.apache.lucene.index.DirectoryReader.doReopenFromWriter(DirectoryReader.java:419) [junit] at org.apache.lucene.index.DirectoryReader.doReopen(DirectoryReader.java:432) [junit] at org.apache.lucene.index.DirectoryReader.reopen(DirectoryReader.java:392) [junit] at org.apache.lucene.index.TestRollingUpdates$IndexingThread.run(TestRollingUpdates.java:129) [junit] [junit] [junit] Testcase: testUpdateSameDoc(org.apache.lucene.index.TestRollingUpdates):FAILED [junit] Some threads threw uncaught exceptions! [junit] junit.framework.AssertionFailedError: Some threads threw uncaught exceptions! [junit] at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:521) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1226) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1154) [junit] [junit] [junit] Tests run: 2, Failures: 1, Errors: 1, Time elapsed: 6.649 sec [junit] [junit] - Standard Error - [junit] NOTE: reproduce with: ant test -Dtestcase=TestRollingUpdates -Dtestmethod=testUpdateSameDoc -Dtests.seed=-4094951767438954769:-1203905293622856057 [junit] NOTE: reproduce with: ant test -Dtestcase=TestRollingUpdates -Dtestmethod=testUpdateSameDoc -Dtests.seed=-4094951767438954769:-1203905293622856057 [junit] The following exceptions were thrown by threads: [junit] *** Thread: Thread-103 *** [junit] java.lang.AssertionError: expected: org.apache.lucene.index.DocumentsWriterDeleteQueue@18635827but was: org.apache.lucene.index.DocumentsWriterDeleteQueue@223074f3 false [junit] at 
org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:359) [junit] at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:346) [junit] at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1367) [junit] at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1339) [junit] at org.apache.lucene.index.TestRollingUpdates$IndexingThread.run(TestRollingUpdates.java:125) [junit] *** Thread: Thread-106 *** [junit] java.lang.AssertionError: expected: org.apache.lucene.index.DocumentsWriterDeleteQueue@18635827but was: org.apache.lucene.index.DocumentsWriterDeleteQueue@223074f3 false [junit]
[jira] [Resolved] (SOLR-2468) TestFunctionQuery fails always on windows
[ https://issues.apache.org/jira/browse/SOLR-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-2468. Resolution: Fixed Assignee: Hoss Man Committed revision 1092451. TestFunctionQuery fails always on windows - Key: SOLR-2468 URL: https://issues.apache.org/jira/browse/SOLR-2468 Project: Solr Issue Type: Bug Reporter: Robert Muir Assignee: Hoss Man Attachments: SOLR-2468.patch NOTE: reproduce with: ant test -Dtestcase=TestFunctionQuery -Dtestmethod=testExternalFieldValueSourceParser -Dtests.seed=1172323467847461017:3327452514993896990
[jira] [Commented] (LUCENE-3028) IW.getReader() returns inconsistent reader on RT Branch
[ https://issues.apache.org/jira/browse/LUCENE-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020002#comment-13020002 ] Simon Willnauer commented on LUCENE-3028: - hmm I can't reproduce it even after 1k runs :( IW.getReader() returns inconsistent reader on RT Branch --- Key: LUCENE-3028 URL: https://issues.apache.org/jira/browse/LUCENE-3028 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: Realtime Branch Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: Realtime Branch Attachments: LUCENE-3028.patch, LUCENE-3028.patch I extended the testcase TestRollingUpdates#testUpdateSameDoc to pull an NRT reader after each update and asserted that it always sees only one document. Yet, this fails with the current branch since there is a problem in how we flush in the getReader() case. What happens here is that we flush all threads and then release the lock (letting other flushes which came in after we entered the flushAllThread context, continue) so that we could concurrently get a new segment that transports global deletes without the corresponding add. They sneak in while we continue to open the NRT reader which in turn sees inconsistent results. I will upload a patch soon
[jira] [Updated] (LUCENE-3028) IW.getReader() returns inconsistent reader on RT Branch
[ https://issues.apache.org/jira/browse/LUCENE-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] selckin updated LUCENE-3028: Attachment: realtime-1.txt IW.getReader() returns inconsistent reader on RT Branch --- Key: LUCENE-3028 URL: https://issues.apache.org/jira/browse/LUCENE-3028 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: Realtime Branch Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: Realtime Branch Attachments: LUCENE-3028.patch, LUCENE-3028.patch, realtime-1.txt I extended the testcase TestRollingUpdates#testUpdateSameDoc to pull an NRT reader after each update and asserted that it always sees only one document. Yet, this fails with the current branch since there is a problem in how we flush in the getReader() case. What happens here is that we flush all threads and then release the lock (letting other flushes which came in after we entered the flushAllThread context, continue) so that we could concurrently get a new segment that transports global deletes without the corresponding add. They sneak in while we continue to open the NRT reader which in turn sees inconsistent results. I will upload a patch soon
[jira] [Commented] (LUCENE-3028) IW.getReader() returns inconsistent reader on RT Branch
[ https://issues.apache.org/jira/browse/LUCENE-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020005#comment-13020005 ] Simon Willnauer commented on LUCENE-3028: - I just committed a fix for this - seems like the assert which resets the current flushing queue was at the wrong position. IW.getReader() returns inconsistent reader on RT Branch --- Key: LUCENE-3028 URL: https://issues.apache.org/jira/browse/LUCENE-3028 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: Realtime Branch Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: Realtime Branch Attachments: LUCENE-3028.patch, LUCENE-3028.patch, realtime-1.txt I extended the testcase TestRollingUpdates#testUpdateSameDoc to pull an NRT reader after each update and asserted that it always sees only one document. Yet, this fails with the current branch since there is a problem in how we flush in the getReader() case. What happens here is that we flush all threads and then release the lock (letting other flushes which came in after we entered the flushAllThread context, continue) so that we could concurrently get a new segment that transports global deletes without the corresponding add. They sneak in while we continue to open the NRT reader which in turn sees inconsistent results. I will upload a patch soon