Re: [ANNOUNCE] Apache PyLucene 3.1.0
Thanks! BTW, originally I went this way: http://lucene.apache.org/pylucene/jcc/documentation/install.html , but it is not up to date and trunk doesn't seem to be compilable (problems with the ant XML files, plus no doc directory, etc.). Then I used the tar and it worked like a charm.

best regards
--
Valery A. Khamenya

On Fri, Apr 8, 2011 at 5:16 AM, dar...@ontrenet.com wrote:
> Congrats Andi. A truly awesome project.

On Thu, 7 Apr 2011 20:02:22 -0700 (PDT), Andi Vajda va...@apache.org wrote:
> I am pleased to announce the availability of Apache PyLucene 3.1.0.
>
> Apache PyLucene, a subproject of Apache Lucene, is a Python extension for accessing Apache Lucene Core. Its goal is to allow you to use Lucene's text indexing and searching capabilities from Python. It is API compatible with the latest version of Lucene Core, 3.1.0.
>
> This release contains a number of bug fixes and improvements. Details can be found in the changes files:
>
> http://svn.apache.org/repos/asf/lucene/pylucene/tags/pylucene_3_1_0/CHANGES
> http://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc/CHANGES
>
> Apache PyLucene is available from the following download page:
>
> http://www.apache.org/dyn/closer.cgi/lucene/pylucene/pylucene-3.1.0-1-src.tar.gz
>
> When downloading from a mirror site, please remember to verify the downloads using signatures found on the Apache site:
>
> http://www.apache.org/dist/lucene/pylucene/KEYS
>
> For more information on Apache PyLucene, visit the project home page:
>
> http://lucene.apache.org/pylucene
>
> Andi..
Re: pylucene in twistd app running as daemon
On Thu, 14 Apr 2011, Marcus wrote:
> In a previous post, a user expressed problems running pylucene in a twistd app (running as a daemon). I have similar issues; I didn't see an answer in the postings. Everything works fine when twistd runs in foreground mode (twistd --nodaemon). The trick would seem to be knowing where is the right place to run initVM.

initVM() _must_ be run from the main thread before calling anything involving the embedded JVM.

env.attachCurrentThread() _must_ be run from any other thread before calling anything involving the embedded JVM.

Andi..
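The two rules above can be sketched in Python. initVM(), getVMEnv() and attachCurrentThread() are the real PyLucene names, but the FakeVMEnv stand-in below is hypothetical so the pattern runs without a JVM; in a real twistd app you would call lucene.initVM() once at startup in the main thread, and env.attachCurrentThread() at the top of every threadpool callback that touches Lucene.

```python
import threading

# Hypothetical stand-in for the object returned by lucene.initVM() /
# lucene.getVMEnv(); it only records which threads attached themselves.
class FakeVMEnv:
    def __init__(self):
        self.attached = set()

    def attachCurrentThread(self):
        self.attached.add(threading.current_thread().name)

def worker(env, results):
    # Rule 2: any non-main thread must attach itself to the JVM
    # before making any Lucene call.
    env.attachCurrentThread()
    results.append("searched from " + threading.current_thread().name)

# Rule 1: initialize once, from the main thread.
env = FakeVMEnv()  # real code: env = lucene.initVM()

results = []
threads = [threading.Thread(target=worker, args=(env, results),
                            name="twistd-pool-%d" % i)
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In a Twisted deferToThread callback the attachCurrentThread() call belongs at the top of the function, since the reactor may run it on any pool thread.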
[jira] [Commented] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019713#comment-13019713 ]

Tommaso Teofili commented on SOLR-2436:
---

Thanks Uwe for the useful clarification regarding XML resources loading. I agree having such information on the wiki would be good.

bq. If it is going to commit, it breaks back-compat. I think we need a note for users in CHANGES.txt.

Yes, and we need to align the README and wiki too.

move uimaConfig to under the uima's update processor in solrconfig.xml
--
Key: SOLR-2436
URL: https://issues.apache.org/jira/browse/SOLR-2436
Project: Solr
Issue Type: Improvement
Affects Versions: 3.1
Reporter: Koji Sekiguchi
Priority: Minor
Attachments: SOLR-2436-3.patch, SOLR-2436.patch, SOLR-2436.patch, SOLR-2436.patch, SOLR-2436_2.patch

Solr contrib UIMA has its config just beneath config. I think it should move to uima's update processor tag.

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019715#comment-13019715 ]

Uwe Schindler commented on SOLR-2436:
-

bq. When I did it, I missed something. Thank you for the alarm.

The code in DataImporter.java and DataImportHandler.java is a very bad example because it also supports loading from StringReaders and such stuff solely for testing purposes. Because of this the code is very complicated and parts of it are in both files. The examples in Solr core are much better.
[jira] [Created] (LUCENE-3027) TestOmitTf.testMixedMerge random seed failure
TestOmitTf.testMixedMerge random seed failure
-
Key: LUCENE-3027
URL: https://issues.apache.org/jira/browse/LUCENE-3027
Project: Lucene - Java
Issue Type: Bug
Reporter: selckin

Version: trunk r1091638

ant test -Dtests.seed=-6595054217575280191:5576532348905930588

[junit] - Standard Error -
[junit] WARNING: test method: 'testDeMorgan' left thread running: Thread[NRT search threads-1691-thread-2,5,main]
[junit] RESOURCE LEAK: test method: 'testDeMorgan' left 1 thread(s) running
[junit] NOTE: reproduce with: ant test -Dtestcase=TestBooleanQuery -Dtestmethod=testDeMorgan -Dtests.seed=-6595054217575280191:5576532348905930588
[junit] - ---
[junit] Testsuite: org.apache.lucene.index.TestNorms
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 5.064 sec
[junit]
[junit] Testsuite: org.apache.lucene.index.TestOmitTf
[junit] Testcase: testMixedMerge(org.apache.lucene.index.TestOmitTf): Caused an ERROR
[junit] CheckIndex failed
[junit] java.lang.RuntimeException: CheckIndex failed
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:152)
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:138)
[junit] at org.apache.lucene.index.TestOmitTf.testMixedMerge(TestOmitTf.java:155)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)
[junit]
[junit]
[junit] Tests run: 5, Failures: 0, Errors: 1, Time elapsed: 0.851 sec
[junit]
[junit] - Standard Output ---
[junit] CheckIndex failed
[junit] Segments file=segments_1 numSegments=1 version=FORMAT_4_0 [Lucene 4.0]
[junit] 1 of 1: name=_12 docCount=60
[junit] codec=SegmentCodecs [codecs=[MockRandom, MockVariableIntBlock(baseBlockSize=112)], provider=RandomCodecProvider: {f1=MockRandom, f2=MockVariableIntBlock(baseBlockSize=112)}]
[junit] compound=false
[junit] hasProx=false
[junit] numFiles=16
[junit] size (MB)=0,01
[junit] diagnostics = {optimize=true, mergeFactor=2, os.version=2.6.37-gentoo, os=Linux, lucene.version=4.0-SNAPSHOT, source=merge, os.arch=amd64, java.version=1.6.0_24, java.vendor=Sun Microsystems Inc.}
[junit] no deletions
[junit] test: open reader.OK
[junit] test: fields..OK [2 fields]
[junit] test: field norms.OK [2 fields]
[junit] test: terms, freq, prox...ERROR: java.io.IOException: Read past EOF
[junit] java.io.IOException: Read past EOF
[junit] at org.apache.lucene.store.RAMInputStream.switchCurrentBuffer(RAMInputStream.java:90)
[junit] at org.apache.lucene.store.RAMInputStream.readByte(RAMInputStream.java:63)
[junit] at org.apache.lucene.store.MockIndexInputWrapper.readByte(MockIndexInputWrapper.java:105)
[junit] at org.apache.lucene.store.DataInput.readVInt(DataInput.java:94)
[junit] at org.apache.lucene.index.codecs.sep.SepSkipListReader.readSkipData(SepSkipListReader.java:188)
[junit] at org.apache.lucene.index.codecs.MultiLevelSkipListReader.loadNextSkip(MultiLevelSkipListReader.java:142)
[junit] at org.apache.lucene.index.codecs.MultiLevelSkipListReader.skipTo(MultiLevelSkipListReader.java:112)
[junit] at org.apache.lucene.index.codecs.sep.SepPostingsReaderImpl$SepDocsEnum.advance(SepPostingsReaderImpl.java:454)
[junit] at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:782)
[junit] at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:495)
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:148)
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:138)
[junit] at org.apache.lucene.index.TestOmitTf.testMixedMerge(TestOmitTf.java:155)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
[junit] at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
[junit] at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
[junit] at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
[junit] at
[jira] [Updated] (LUCENE-3027) TestOmitTf.testMixedMerge random seed failure
[ https://issues.apache.org/jira/browse/LUCENE-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

selckin updated LUCENE-3027:
Attachment: output.txt

ant output
[HUDSON] Lucene-Solr-tests-only-3.x - Build # 7088 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7088/

1 tests failed.

REGRESSION: org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest.testCommitWithin

Error Message:
expected:<0> but was:<1>

Stack Trace:
junit.framework.AssertionFailedError: expected:<0> but was:<1>
at org.apache.solr.client.solrj.SolrExampleTests.testCommitWithin(SolrExampleTests.java:327)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1082)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1010)

Build Log (for compile errors):
[...truncated 10717 lines...]
[jira] [Commented] (LUCENE-3027) TestOmitTf.testMixedMerge random seed failure
[ https://issues.apache.org/jira/browse/LUCENE-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019726#comment-13019726 ]

Robert Muir commented on LUCENE-3027:
-

Thanks for reporting this, I can reproduce on windows also, looks serious. Might it be triggered by the fact that we recently started randomizing skipInterval?

Note, the test will NOT fail if you try the repro line!!! You have to do 'ant test -Dtests.seed=-6595054217575280191:5576532348905930588'. I don't know if this causes a timing issue or what, but it works for me too:

{noformat}
[junit] Testsuite: org.apache.lucene.index.TestOmitTf
[junit] Testcase: testMixedMerge(org.apache.lucene.index.TestOmitTf): Caused an ERROR
[junit] CheckIndex failed
[junit] java.lang.RuntimeException: CheckIndex failed
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:152)
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:138)
[junit] at org.apache.lucene.index.TestOmitTf.testMixedMerge(TestOmitTf.java:155)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)
[junit]
[junit]
[junit] Tests run: 5, Failures: 0, Errors: 1, Time elapsed: 0.284 sec
[junit]
[junit] - Standard Output ---
[junit] CheckIndex failed
[junit] Segments file=segments_1 numSegments=1 version=FORMAT_4_0 [Lucene 4.0]
[junit] 1 of 1: name=_12 docCount=60
[junit] codec=SegmentCodecs [codecs=[MockRandom, MockVariableIntBlock(baseBlockSize=112)], provider=RandomCodecProvider: {f1=MockRandom, f2=MockVariableIntBlock(baseBlockSize=112)}]
[junit] compound=false
[junit] hasProx=false
[junit] numFiles=16
[junit] size (MB)=0,01
[junit] diagnostics = {optimize=true, mergeFactor=2, os.version=6.0, os=Windows Vista, lucene.version=4.0-SNAPSHOT, source=merge, os.arch=x86, java.version=1.6.0_23, java.vendor=Sun Microsystems Inc.}
[junit] no deletions
[junit] test: open reader.OK
[junit] test: fields..OK [2 fields]
[junit] test: field norms.OK [2 fields]
[junit] test: terms, freq, prox...ERROR: java.io.IOException: Read past EOF
[junit] java.io.IOException: Read past EOF
[junit] at org.apache.lucene.store.RAMInputStream.switchCurrentBuffer(RAMInputStream.java:90)
[junit] at org.apache.lucene.store.RAMInputStream.readByte(RAMInputStream.java:63)
[junit] at org.apache.lucene.store.MockIndexInputWrapper.readByte(MockIndexInputWrapper.java:105)
[junit] at org.apache.lucene.store.DataInput.readVInt(DataInput.java:94)
[junit] at org.apache.lucene.index.codecs.sep.SepSkipListReader.readSkipData(SepSkipListReader.java:188)
[junit] at org.apache.lucene.index.codecs.MultiLevelSkipListReader.loadNextSkip(MultiLevelSkipListReader.java:142)
[junit] at org.apache.lucene.index.codecs.MultiLevelSkipListReader.skipTo(MultiLevelSkipListReader.java:112)
[junit] at org.apache.lucene.index.codecs.sep.SepPostingsReaderImpl$SepDocsEnum.advance(SepPostingsReaderImpl.java:454)
[junit] at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:782)
{noformat}
[jira] [Commented] (LUCENE-3027) TestOmitTf.testMixedMerge random seed failure
[ https://issues.apache.org/jira/browse/LUCENE-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019730#comment-13019730 ]

Michael McCandless commented on LUCENE-3027:

Nice find!! This seems to fail for me consistently:

{noformat}
ant test-core -Dtestcase=TestOmitTf -Dtestmethod=testMixedMerge -Dtests.seed=-6440890546631805798:9110494168610462642
{noformat}

I'll hunt...
[jira] [Commented] (LUCENE-3027) TestOmitTf.testMixedMerge random seed failure
[ https://issues.apache.org/jira/browse/LUCENE-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019748#comment-13019748 ]

Michael McCandless commented on LUCENE-3027:

I found the issue: a FieldInfo got into a bad state where omitTF was true but storesPayloads was also true, and the sep codec's skip data reader was tricked by this bad state. StandardCodec is not affected, and 3.x is not affected. Still, for 3.x I'll backport making sure FieldInfo always clears storesPayloads if omitTF is true.
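The fix described above boils down to an invariant: a field that omits term frequencies/positions has no positions stream, so it cannot store payloads. A minimal illustrative sketch of that invariant (the attribute names mirror Lucene's FieldInfo, but this update logic is a hypothetical stand-in, not Lucene's actual merge code):

```python
# Hypothetical sketch of the FieldInfo invariant behind LUCENE-3027:
# payloads live in the positions stream, so omitTF == True must force
# storePayloads == False. Names mirror the Java FieldInfo; the flag
# merging below is illustrative only.

class FieldInfo:
    def __init__(self, name, omit_tf=False, store_payloads=False):
        self.name = name
        self.omit_tf = omit_tf
        # Enforce the invariant at construction time.
        self.store_payloads = store_payloads and not omit_tf

    def update(self, omit_tf, store_payloads):
        # When segments are merged/flushed the flags are OR-ed together;
        # if omitTF becomes true, storePayloads must be cleared.
        self.omit_tf = self.omit_tf or omit_tf
        self.store_payloads = ((self.store_payloads or store_payloads)
                               and not self.omit_tf)

# One segment stored payloads for f1; another omitted TF for f1.
# After combining the flags, payloads must be dropped, because the
# merged field no longer has a positions stream to hold them.
fi = FieldInfo("f1", omit_tf=False, store_payloads=True)
fi.update(omit_tf=True, store_payloads=False)
```

Without the `and not self.omit_tf` clearing step, the merged FieldInfo would claim both flags at once, which is the bad state that tricked the sep codec's skip-list reader.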
[jira] [Updated] (LUCENE-3022) DictionaryCompoundWordTokenFilter Flag onlyLongestMatch has no affect
[ https://issues.apache.org/jira/browse/LUCENE-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johann Höchtl updated LUCENE-3022:
--
Attachment: LUCENE-3022.patch

Patch fixing this issue, including a JUnit test.

DictionaryCompoundWordTokenFilter flag onlyLongestMatch has no effect
-
Key: LUCENE-3022
URL: https://issues.apache.org/jira/browse/LUCENE-3022
Project: Lucene - Java
Issue Type: Bug
Components: contrib/analyzers
Affects Versions: 2.9.4, 3.1
Reporter: Johann Höchtl
Priority: Minor
Attachments: LUCENE-3022.patch
Original Estimate: 5m
Remaining Estimate: 5m

When using the DictionaryCompoundWordTokenFilter with a German dictionary, I got a strange behaviour: the German word streifenbluse (blouse with stripes) was decompounded to streifen (stripe), reifen (tire), which makes no sense at all. I thought the flag onlyLongestMatch would fix this, because streifen is longer than reifen, but it had no effect. So I reviewed the source code and found the problem:

[code]
protected void decomposeInternal(final Token token) {
  // Only words longer than minWordSize get processed
  if (token.length() < this.minWordSize) {
    return;
  }
  char[] lowerCaseTermBuffer = makeLowerCaseCopy(token.buffer());
  for (int i = 0; i < token.length() - this.minSubwordSize; ++i) {
    Token longestMatchToken = null;
    for (int j = this.minSubwordSize - 1; j < this.maxSubwordSize; ++j) {
      if (i + j > token.length()) {
        break;
      }
      if (dictionary.contains(lowerCaseTermBuffer, i, j)) {
        if (this.onlyLongestMatch) {
          if (longestMatchToken != null) {
            if (longestMatchToken.length() < j) {
              longestMatchToken = createToken(i, j, token);
            }
          } else {
            longestMatchToken = createToken(i, j, token);
          }
        } else {
          tokens.add(createToken(i, j, token));
        }
      }
    }
    if (this.onlyLongestMatch && longestMatchToken != null) {
      tokens.add(longestMatchToken);
    }
  }
}
[/code]

should be changed to

[code]
protected void decomposeInternal(final Token token) {
  // Only words longer than minWordSize get processed
  if (token.termLength() < this.minWordSize) {
    return;
  }
  char[] lowerCaseTermBuffer = makeLowerCaseCopy(token.termBuffer());
  Token longestMatchToken = null;
  for (int i = 0; i < token.termLength() - this.minSubwordSize; ++i) {
    for (int j = this.minSubwordSize - 1; j < this.maxSubwordSize; ++j) {
      if (i + j > token.termLength()) {
        break;
      }
      if (dictionary.contains(lowerCaseTermBuffer, i, j)) {
        if (this.onlyLongestMatch) {
          if (longestMatchToken != null) {
            if (longestMatchToken.termLength() < j) {
              longestMatchToken = createToken(i, j, token);
            }
          } else {
            longestMatchToken = createToken(i, j, token);
          }
        } else {
          tokens.add(createToken(i, j, token));
        }
      }
    }
  }
  if (this.onlyLongestMatch && longestMatchToken != null) {
    tokens.add(longestMatchToken);
  }
}
[/code]

So that only the longest token is really indexed and the onlyLongestMatch flag makes sense.
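The intent of the fixed version can be illustrated with a toy Python port of the loop (a hypothetical sketch: the real filter is Lucene's Java DictionaryCompoundWordTokenFilter; the decompose() helper, the dictionary set, and the size limits below are made up for illustration). Tracking a single longestMatchToken across all start positions means only one subword, the longest, is emitted when only_longest is set:

```python
# Toy sketch of the patched onlyLongestMatch behavior: with only_longest=True,
# track the single longest dictionary match across the whole word and emit
# only that, instead of one match per start position.

def decompose(word, dictionary, min_subword=4, max_subword=15,
              only_longest=True):
    word = word.lower()
    matches = []
    longest = None  # patched version: one longest match for the whole word
    for i in range(len(word) - min_subword + 1):
        for j in range(min_subword, max_subword + 1):
            if i + j > len(word):
                break
            sub = word[i:i + j]
            if sub in dictionary:
                if only_longest:
                    if longest is None or len(longest) < len(sub):
                        longest = sub
                else:
                    matches.append(sub)
    if only_longest and longest is not None:
        matches.append(longest)
    return matches

dic = {"streifen", "reifen", "bluse"}
with_flag = decompose("streifenbluse", dic)                      # longest only
without_flag = decompose("streifenbluse", dic, only_longest=False)
```

With the flag, "streifenbluse" yields just "streifen" (the nonsensical "reifen" is suppressed); without it, every dictionary subword is emitted.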
[jira] [Updated] (LUCENE-3027) TestOmitTf.testMixedMerge random seed failure
[ https://issues.apache.org/jira/browse/LUCENE-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-3027:
---------------------------------------

Attachment: LUCENE-3027.patch

Patch. I added asserts that trip when FieldInfo illegally has omitTFAP true and storePayloads also true. I fixed three places where this was able to occur: merging, flushing, and loading a preflex index. I'll commit shortly.

TestOmitTf.testMixedMerge random seed failure
---------------------------------------------

Key: LUCENE-3027
URL: https://issues.apache.org/jira/browse/LUCENE-3027
Project: Lucene - Java
Issue Type: Bug
Reporter: selckin
Attachments: LUCENE-3027.patch, output.txt

Version: trunk r1091638

ant test -Dtests.seed=-6595054217575280191:5576532348905930588

[junit] - Standard Error -
[junit] WARNING: test method: 'testDeMorgan' left thread running: Thread[NRT search threads-1691-thread-2,5,main]
[junit] RESOURCE LEAK: test method: 'testDeMorgan' left 1 thread(s) running
[junit] NOTE: reproduce with: ant test -Dtestcase=TestBooleanQuery -Dtestmethod=testDeMorgan -Dtests.seed=-6595054217575280191:5576532348905930588
[junit] - ---
[junit] Testsuite: org.apache.lucene.index.TestNorms
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 5.064 sec
[junit]
[junit] Testsuite: org.apache.lucene.index.TestOmitTf
[junit] Testcase: testMixedMerge(org.apache.lucene.index.TestOmitTf): Caused an ERROR
[junit] CheckIndex failed
[junit] java.lang.RuntimeException: CheckIndex failed
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:152)
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:138)
[junit] at org.apache.lucene.index.TestOmitTf.testMixedMerge(TestOmitTf.java:155)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)
[junit]
[junit]
[junit] Tests run: 5, Failures: 0, Errors: 1, Time elapsed: 0.851 sec
[junit]
[junit] - Standard Output ---
[junit] CheckIndex failed
[junit] Segments file=segments_1 numSegments=1 version=FORMAT_4_0 [Lucene 4.0]
[junit] 1 of 1: name=_12 docCount=60
[junit] codec=SegmentCodecs [codecs=[MockRandom, MockVariableIntBlock(baseBlockSize=112)], provider=RandomCodecProvider: {f1=MockRandom, f2=MockVariableIntBlock(baseBlockSize=112)}]
[junit] compound=false
[junit] hasProx=false
[junit] numFiles=16
[junit] size (MB)=0,01
[junit] diagnostics = {optimize=true, mergeFactor=2, os.version=2.6.37-gentoo, os=Linux, lucene.version=4.0-SNAPSHOT, source=merge, os.arch=amd64, java.version=1.6.0_24, java.vendor=Sun Microsystems Inc.}
[junit] no deletions
[junit] test: open reader.OK
[junit] test: fields..OK [2 fields]
[junit] test: field norms.OK [2 fields]
[junit] test: terms, freq, prox...ERROR: java.io.IOException: Read past EOF
[junit] java.io.IOException: Read past EOF
[junit] at org.apache.lucene.store.RAMInputStream.switchCurrentBuffer(RAMInputStream.java:90)
[junit] at org.apache.lucene.store.RAMInputStream.readByte(RAMInputStream.java:63)
[junit] at org.apache.lucene.store.MockIndexInputWrapper.readByte(MockIndexInputWrapper.java:105)
[junit] at org.apache.lucene.store.DataInput.readVInt(DataInput.java:94)
[junit] at org.apache.lucene.index.codecs.sep.SepSkipListReader.readSkipData(SepSkipListReader.java:188)
[junit] at org.apache.lucene.index.codecs.MultiLevelSkipListReader.loadNextSkip(MultiLevelSkipListReader.java:142)
[junit] at org.apache.lucene.index.codecs.MultiLevelSkipListReader.skipTo(MultiLevelSkipListReader.java:112)
[junit] at org.apache.lucene.index.codecs.sep.SepPostingsReaderImpl$SepDocsEnum.advance(SepPostingsReaderImpl.java:454)
[junit] at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:782)
[junit] at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:495)
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:148)
[junit] at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:138)
[junit] at org.apache.lucene.index.TestOmitTf.testMixedMerge(TestOmitTf.java:155)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at
[jira] [Commented] (LUCENE-3027) TestOmitTf.testMixedMerge random seed failure
[ https://issues.apache.org/jira/browse/LUCENE-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019765#comment-13019765 ]

Simon Willnauer commented on LUCENE-3027:
-----------------------------------------

patch looks good mike!

simon

TestOmitTf.testMixedMerge random seed failure
---------------------------------------------

Key: LUCENE-3027
URL: https://issues.apache.org/jira/browse/LUCENE-3027
Project: Lucene - Java
Issue Type: Bug
Reporter: selckin
Attachments: LUCENE-3027.patch, output.txt

Version: trunk r1091638
[jira] [Updated] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Thacker updated LUCENE-3018:
----------------------------------

Attachment: LUCENE-3018.patch

I have modified the build.xml. There is one problem with this build file: the linking to the JNI header files is still giving errors. What am I doing wrong? This is how I am running the ant task:

{code:title=Command Line|borderStyle=solid}
ant -lib lucene/dev/trunk/lucene/cpptasks.jar build-native
{code}

Lucene Native Directory implementation need automated build
-----------------------------------------------------------

Key: LUCENE-3018
URL: https://issues.apache.org/jira/browse/LUCENE-3018
Project: Lucene - Java
Issue Type: Wish
Components: Build
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Varun Thacker
Priority: Minor
Fix For: 4.0
Attachments: LUCENE-3018.patch

Currently the native directory impl in contrib/misc requires manual action to compile the C code, (partially) documented in https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html. It would be nice if we had an ant task, plus documentation for all platforms on how to compile it and set up the prerequisites.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2956) Support updateDocument() with DWPTs
[ https://issues.apache.org/jira/browse/LUCENE-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019768#comment-13019768 ]

Simon Willnauer commented on LUCENE-2956:
-----------------------------------------

bq. Shall we start again on LUCENE-2312? I think we still need/want to use sequence ids there. The RT DWPTs shouldn't have so many documents that using a long[] for the sequence ids is too RAM consuming?

Jason, I think nothing prevents you from starting to work on this again. Yet, I think we should freeze the branch now and only allow merging, bug fixes, tests and documentation fixes until we land on trunk. Once we are there we can freely push stuff into the branch again and make it work with seq. ids. Thoughts?

Support updateDocument() with DWPTs
-----------------------------------

Key: LUCENE-2956
URL: https://issues.apache.org/jira/browse/LUCENE-2956
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: Realtime Branch
Reporter: Michael Busch
Assignee: Simon Willnauer
Priority: Minor
Fix For: Realtime Branch
Attachments: LUCENE-2956.patch, LUCENE-2956.patch

With separate DocumentsWriterPerThreads (DWPT) it can currently happen that the delete part of an updateDocument() is flushed and committed separately from the corresponding new document. We need to make sure that updateDocument() is always an atomic operation from an IW.commit() and IW.getReader() perspective. See LUCENE-2324 for more details.
[jira] [Resolved] (LUCENE-2956) Support updateDocument() with DWPTs
[ https://issues.apache.org/jira/browse/LUCENE-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-2956.
-------------------------------------

Resolution: Fixed

committed to branch

Support updateDocument() with DWPTs
-----------------------------------

Key: LUCENE-2956
URL: https://issues.apache.org/jira/browse/LUCENE-2956
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: Realtime Branch
Reporter: Michael Busch
Assignee: Simon Willnauer
Priority: Minor
Fix For: Realtime Branch
Attachments: LUCENE-2956.patch, LUCENE-2956.patch
[jira] [Created] (LUCENE-3028) IW.getReader() returns inconsistent reader on RT Branch
IW.getReader() returns inconsistent reader on RT Branch
-------------------------------------------------------

Key: LUCENE-3028
URL: https://issues.apache.org/jira/browse/LUCENE-3028
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: Realtime Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Fix For: Realtime Branch

I extended the testcase TestRollingUpdates#testUpdateSameDoc to pull an NRT reader after each update and asserted that it always sees only one document. Yet, this fails with the current branch since there is a problem in how we flush in the getReader() case. What happens here is that we flush all threads and then release the lock (letting other flushes, which came in after we entered the flushAllThread context, continue), so we can concurrently get a new segment that transports global deletes without the corresponding add. They sneak in while we continue to open the NRT reader, which in turn sees inconsistent results. I will upload a patch soon.
[jira] [Updated] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.
[ https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated SOLR-2378:
------------------------------

Attachment: SOLR-2378.patch

Adding Solr tests; removed the big queries file so that it doesn't bloat the patch (will commit it in directly).

FST-based Lookup (suggestions) for prefix matches.
--------------------------------------------------

Key: SOLR-2378
URL: https://issues.apache.org/jira/browse/SOLR-2378
Project: Solr
Issue Type: New Feature
Components: spellchecker
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Labels: lookup, prefix
Fix For: 4.0
Attachments: SOLR-2378.patch, SOLR-2378.patch

Implement a subclass of Lookup based on finite state automata/transducers (Lucene FST package). This issue is for implementing a relatively basic prefix matcher; we will handle infixes and other types of input matches gradually.

Impl. phases:
- -write a DFA based suggester effectively identical to ternary tree based solution right now,-
- -baseline benchmark against tern. tree (memory consumption, rebuilding speed, indexing speed; reuse Andrzej's benchmark code)-
- -modify DFA to encode term weights directly in the automaton (optimize for onlyMostPopular case)-
- -benchmark again-
- add infix suggestion support with prefix matches boosted higher (?)
- benchmark again
- modify the tutorial on the wiki [http://wiki.apache.org/solr/Suggester]
[jira] [Updated] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.
[ https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated SOLR-2378:
------------------------------

Attachment: (was: SOLR-2378.patch)

FST-based Lookup (suggestions) for prefix matches.
--------------------------------------------------

Key: SOLR-2378
URL: https://issues.apache.org/jira/browse/SOLR-2378
Project: Solr
Issue Type: New Feature
Components: spellchecker
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Labels: lookup, prefix
Fix For: 4.0
Attachments: SOLR-2378.patch
[jira] [Resolved] (LUCENE-3027) TestOmitTf.testMixedMerge random seed failure
[ https://issues.apache.org/jira/browse/LUCENE-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-3027.
----------------------------------------

Resolution: Fixed

Thanks selckin!

TestOmitTf.testMixedMerge random seed failure
---------------------------------------------

Key: LUCENE-3027
URL: https://issues.apache.org/jira/browse/LUCENE-3027
Project: Lucene - Java
Issue Type: Bug
Reporter: selckin
Attachments: LUCENE-3027.patch, output.txt

Version: trunk r1091638
[jira] [Updated] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.
[ https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated SOLR-2378:
------------------------------

Description:

Implement a subclass of Lookup based on finite state automata/transducers (Lucene FST package). This issue is for implementing a relatively basic prefix matcher, we will handle infixes and other types of input matches gradually.

Impl. phases:
- -write a DFA based suggester effectively identical to ternary tree based solution right now,-
- -baseline benchmark against tern. tree (memory consumption, rebuilding speed, indexing speed; reuse Andrzej's benchmark code)-
- -modify DFA to encode term weights directly in the automaton (optimize for onlyMostPopular case)-
- -benchmark again-
- -benchmark again-
- -modify the tutorial on the wiki [http://wiki.apache.org/solr/Suggester]-

was:

Implement a subclass of Lookup based on finite state automata/transducers (Lucene FST package). This issue is for implementing a relatively basic prefix matcher, we will handle infixes and other types of input matches gradually.

Impl. phases:
- -write a DFA based suggester effectively identical to ternary tree based solution right now,-
- -baseline benchmark against tern. tree (memory consumption, rebuilding speed, indexing speed; reuse Andrzej's benchmark code)-
- -modify DFA to encode term weights directly in the automaton (optimize for onlyMostPopular case)-
- -benchmark again-
- add infix suggestion support with prefix matches boosted higher (?)
- benchmark again
- modify the tutorial on the wiki [http://wiki.apache.org/solr/Suggester]

FST-based Lookup (suggestions) for prefix matches.
--------------------------------------------------

Key: SOLR-2378
URL: https://issues.apache.org/jira/browse/SOLR-2378
Project: Solr
Issue Type: New Feature
Components: spellchecker
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Labels: lookup, prefix
Fix For: 4.0
Attachments: SOLR-2378.patch
[jira] [Updated] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.
[ https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated SOLR-2378:
------------------------------

Description:

Implement a subclass of Lookup based on finite state automata/transducers (Lucene FST package). This issue is for implementing a relatively basic prefix matcher, we will handle infixes and other types of input matches gradually.

Impl. phases:
- -write a DFA based suggester effectively identical to ternary tree based solution right now,-
- -baseline benchmark against tern. tree (memory consumption, rebuilding speed, indexing speed; reuse Andrzej's benchmark code)-
- -modify DFA to encode term weights directly in the automaton (optimize for onlyMostPopular case)-
- -benchmark again-
- -benchmark again-
- -modify the tutorial on the wiki- [http://wiki.apache.org/solr/Suggester]

was:

Implement a subclass of Lookup based on finite state automata/transducers (Lucene FST package). This issue is for implementing a relatively basic prefix matcher, we will handle infixes and other types of input matches gradually.

Impl. phases:
- -write a DFA based suggester effectively identical to ternary tree based solution right now,-
- -baseline benchmark against tern. tree (memory consumption, rebuilding speed, indexing speed; reuse Andrzej's benchmark code)-
- -modify DFA to encode term weights directly in the automaton (optimize for onlyMostPopular case)-
- -benchmark again-
- -benchmark again-
- -modify the tutorial on the wiki [http://wiki.apache.org/solr/Suggester]-

FST-based Lookup (suggestions) for prefix matches.
--------------------------------------------------

Key: SOLR-2378
URL: https://issues.apache.org/jira/browse/SOLR-2378
Project: Solr
Issue Type: New Feature
Components: spellchecker
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Labels: lookup, prefix
Fix For: 4.0
Attachments: SOLR-2378.patch
[jira] [Resolved] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.
[ https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved SOLR-2378.
-------------------------------

Resolution: Fixed

In trunk.

FST-based Lookup (suggestions) for prefix matches.
--------------------------------------------------

Key: SOLR-2378
URL: https://issues.apache.org/jira/browse/SOLR-2378
Project: Solr
Issue Type: New Feature
Components: spellchecker
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Labels: lookup, prefix
Fix For: 4.0
Attachments: SOLR-2378.patch
IndexWriter.ramSizeInBytes
Hi

I'm indexing w/ IW, flush-by-RAM=off and flush-by-doc=MAX_INT. Whenever iw.ramSizeInBytes() >= threshold, I commit the changes, serialize the Directory somewhere, and start with a new Directory and IW instance. The threshold is currently 32MB. I noticed though that the size of the serialized Directory is nearly half of that (16 MB). Is that expected? Will I see that behavior every time (e.g. w/ large stored fields), or is it data dependent? I assume that the data can affect the compression, but I never expected a 50% factor from RAM to disk.

Shai
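For reference, the flush-by-RAM pattern described above can be sketched as follows. This is only an illustration: a stub class stands in for org.apache.lucene.index.IndexWriter so the logic is self-contained; ramSizeInBytes() and the 32 MB threshold come from the post, everything else (class and method names) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class FlushByRamSketch {
    static final long THRESHOLD = 32L << 20; // 32 MB, as in the post

    /** Minimal stand-in for IndexWriter: only tracks an approximate RAM size. */
    static class StubWriter {
        long ram = 0;
        void addDocument(long approxBytes) { ram += approxBytes; }
        long ramSizeInBytes() { return ram; }
    }

    /** Indexes nDocs documents of docBytes each; returns how many writers (segments) were produced. */
    static int indexAll(int nDocs, long docBytes) {
        List<StubWriter> segments = new ArrayList<>();
        StubWriter w = new StubWriter();
        for (int i = 0; i < nDocs; i++) {
            w.addDocument(docBytes);
            if (w.ramSizeInBytes() >= THRESHOLD) {
                // Real code would: w.commit(); serialize the Directory;
                // then open a fresh Directory + IndexWriter.
                segments.add(w);
                w = new StubWriter();
            }
        }
        segments.add(w); // final, possibly partial, segment
        return segments.size();
    }

    public static void main(String[] args) {
        // 100 docs of ~1 MB each: rolls over after every 32 docs -> 3 full + 1 partial
        System.out.println(indexAll(100, 1L << 20));
    }
}
```

As for the 50% gap: the in-RAM size includes buffered postings structures that are rewritten into a more compact form at flush, so a smaller on-disk footprint is plausible, though the exact ratio is data dependent.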
[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019786#comment-13019786 ]

Simon Willnauer commented on LUCENE-3018:
-----------------------------------------

It seems like you need to use includepath rather than libset here, something like:

{code}
<includepath>
  <pathelement location="${java.home}/include/"/>
  <pathelement location="${java.home}/include/linux/"/>
</includepath>
{code}

simon

Lucene Native Directory implementation need automated build
-----------------------------------------------------------

Key: LUCENE-3018
URL: https://issues.apache.org/jira/browse/LUCENE-3018
Project: Lucene - Java
Issue Type: Wish
Components: Build
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Varun Thacker
Priority: Minor
Fix For: 4.0
Attachments: LUCENE-3018.patch
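Put together, a build-native target using cpptasks might look roughly like the following sketch. The taskdef resource name is standard cpptasks usage, but the target name, source path, output path, and property names are assumptions for illustration, not taken from the thread:

{code:title=build.xml sketch|borderStyle=solid}
<!-- Hypothetical sketch: compiles the native directory C++ sources into a shared library. -->
<target name="build-native">
  <taskdef resource="cpptasks.tasks" classpath="${cpptasks.jar}"/>
  <cc outtype="shared" outfile="${build.dir}/NativePosixUtil" subsystem="console">
    <fileset dir="src/java/org/apache/lucene/store" includes="*.cpp"/>
    <includepath>
      <pathelement location="${java.home}/include"/>
      <pathelement location="${java.home}/include/linux"/>
    </includepath>
  </cc>
</target>
{code}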
[jira] [Updated] (LUCENE-3028) IW.getReader() returns inconsistent reader on RT Branch
[ https://issues.apache.org/jira/browse/LUCENE-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3028:
------------------------------------

Comment: was deleted (was: here is a first patch)

IW.getReader() returns inconsistent reader on RT Branch
-------------------------------------------------------

Key: LUCENE-3028
URL: https://issues.apache.org/jira/browse/LUCENE-3028
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: Realtime Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Fix For: Realtime Branch
Attachments: LUCENE-3028.patch
[jira] [Updated] (LUCENE-3028) IW.getReader() returns inconsistent reader on RT Branch
[ https://issues.apache.org/jira/browse/LUCENE-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3028:
------------------------------------

Attachment: (was: LUCENE-3028.patch)

IW.getReader() returns inconsistent reader on RT Branch
-------------------------------------------------------

Key: LUCENE-3028
URL: https://issues.apache.org/jira/browse/LUCENE-3028
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: Realtime Branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Fix For: Realtime Branch
Attachments: LUCENE-3028.patch
[jira] [Updated] (LUCENE-3028) IW.getReader() returns inconsistent reader on RT Branch
[ https://issues.apache.org/jira/browse/LUCENE-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3028: Attachment: LUCENE-3028.patch here is a first patch IW.getReader() returns inconsistent reader on RT Branch --- Key: LUCENE-3028 URL: https://issues.apache.org/jira/browse/LUCENE-3028 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: Realtime Branch Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: Realtime Branch Attachments: LUCENE-3028.patch I extended the testcase TestRollingUpdates#testUpdateSameDoc to pull an NRT reader after each update and asserted that it always sees only one document. Yet, this fails with the current branch since there is a problem in how we flush in the getReader() case. What happens here is that we flush all threads and then release the lock (letting other flushes which came in after we entered the flushAllThread context continue), so that we could concurrently get a new segment that transports global deletes without the corresponding add. They sneak in while we continue to open the NRT reader, which in turn sees inconsistent results. I will upload a patch soon -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019789#comment-13019789 ] Simon Willnauer commented on LUCENE-3018: - regarding your attached patch: you should make sure that you check the {noformat} Grant license to ASF for inclusion in ASF works (as per the Apache License §5) {noformat} checkbox in the attach dialog when uploading patches. Can you also provide a quick guide on how to install the cpptasks for ant, and maybe upload the jars you have added to make this task work? simon Lucene Native Directory implementation need automated build --- Key: LUCENE-3018 URL: https://issues.apache.org/jira/browse/LUCENE-3018 Project: Lucene - Java Issue Type: Wish Components: Build Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Varun Thacker Priority: Minor Fix For: 4.0 Attachments: LUCENE-3018.patch Currently the native directory impl in contrib/misc requires manual action to compile the c code, (partially) documented in https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html yet it would be nice if we had an ant task and documentation for all platforms on how to compile them and set up the prerequisites. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3028) IW.getReader() returns inconsistent reader on RT Branch
[ https://issues.apache.org/jira/browse/LUCENE-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3028: Attachment: LUCENE-3028.patch next iteration, edited some asserts in DW IW.getReader() returns inconsistent reader on RT Branch --- Key: LUCENE-3028 URL: https://issues.apache.org/jira/browse/LUCENE-3028 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: Realtime Branch Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: Realtime Branch Attachments: LUCENE-3028.patch, LUCENE-3028.patch I extended the testcase TestRollingUpdates#testUpdateSameDoc to pull an NRT reader after each update and asserted that it always sees only one document. Yet, this fails with the current branch since there is a problem in how we flush in the getReader() case. What happens here is that we flush all threads and then release the lock (letting other flushes which came in after we entered the flushAllThread context continue), so that we could concurrently get a new segment that transports global deletes without the corresponding add. They sneak in while we continue to open the NRT reader, which in turn sees inconsistent results. I will upload a patch soon -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3023) Land DWPT on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019801#comment-13019801 ] Simon Willnauer commented on LUCENE-3023: - I added a [jenkins build|https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-realtime_search-branch/] that runs every 4 hours to give the RT branch some exercise. I added my email address and buschmi to the recipients if the build fails; if you wanna be added let me know. From now on we should only commit bugfixes, documentation and merges with trunk to this branch. From my point of view there is only one blocker left here (LUCENE-3028), so the remaining work is mainly reviewing the current state and polishing the javadocs. I will go over the IW, IR and DW javadocs as a start. Land DWPT on trunk -- Key: LUCENE-3023 URL: https://issues.apache.org/jira/browse/LUCENE-3023 Project: Lucene - Java Issue Type: Task Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324, so we can proceed landing the DWPT development on trunk soon. I think one of the bigger issues here is to make sure that all JavaDocs for IW etc. are still correct though. I will start going through that first. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-2573. - Resolution: Fixed this is committed to the branch; reviews should go through LUCENE-3023 Tiered flushing of DWPTs by RAM with low/high water marks - Key: LUCENE-2573 URL: https://issues.apache.org/jira/browse/LUCENE-2573 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Simon Willnauer Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch Now that we have DocumentsWriterPerThreads we need to track total consumed RAM across all DWPTs. A flushing strategy idea that was discussed in LUCENE-2324 was to use a tiered approach: - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM) - Flush all DWPTs at a high water mark (e.g. at 110%) - Use linear steps in between the high and low watermark: E.g. when 5 DWPTs are used, flush at 90%, 95%, 100%, 105% and 110%. Should we allow the user to configure the low and high water mark values explicitly using total values (e.g. low water mark at 120MB, high water mark at 140MB)? Or shall we keep for simplicity the single setRAMBufferSizeMB() config method and use something like 90% and 110% for the water marks? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
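The linear-step scheme described in the issue can be sketched in plain Java. This is a hypothetical helper, not part of Lucene; the method name is made up, and the 90%/110% defaults come from the discussion above:

```java
// Sketch of the tiered flushing thresholds discussed above: with n active
// DWPTs, flush triggers are spaced linearly between the low water mark
// (e.g. 90% of the RAM budget) and the high water mark (e.g. 110%).
public class TieredFlushSketch {

    // Returns one threshold (in MB) per DWPT, lowest first.
    static double[] flushThresholds(int numDwpts, double ramBudgetMb,
                                    double lowPct, double highPct) {
        double[] thresholds = new double[numDwpts];
        double step = numDwpts > 1 ? (highPct - lowPct) / (numDwpts - 1) : 0.0;
        for (int i = 0; i < numDwpts; i++) {
            thresholds[i] = ramBudgetMb * (lowPct + i * step);
        }
        return thresholds;
    }

    public static void main(String[] args) {
        // 5 DWPTs and a 100 MB budget -> flush at 90, 95, 100, 105 and 110 MB,
        // matching the example in the issue description.
        for (double t : flushThresholds(5, 100.0, 0.90, 1.10)) {
            System.out.printf("%.1f MB%n", t);
        }
    }
}
```

The first DWPT to cross its threshold flushes at the low water mark; once RAM pressure pushes past the high water mark, every DWPT is flushing.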
[jira] [Commented] (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019803#comment-13019803 ] Simon Willnauer commented on LUCENE-2324: - guys, I opened LUCENE-3023 to land on trunk! can I close this and we iterate on LUCENE-3023 from now on? simon Per thread DocumentsWriters that write their own private segments - Key: LUCENE-2324 URL: https://issues.apache.org/jira/browse/LUCENE-2324 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324.patch, LUCENE-2324.patch, LUCENE-2324.patch, LUCENE-2324.patch, lucene-2324.patch, lucene-2324.patch, test.out, test.out, test.out, test.out See LUCENE-2293 for motivation and more details. I'm copying here Mike's summary he posted on 2293: Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores and normal segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO & CPU. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: IndexWriter.ramSizeInBytes
This is actually [sadly] expected. This is showing that your RAM efficiency is ~50% (well, less, if the segment also has stored fields / term vectors). This is because the in-RAM data structures cannot be 100% efficient as they must leave room to grow the individual postings. But once written to disk the format is obviously compacted vs what's in RAM. Mike http://blog.mikemccandless.com On Thu, Apr 14, 2011 at 7:21 AM, Shai Erera ser...@gmail.com wrote: Hi I'm indexing w/ IW, flush-by-RAM=off and flush-by-doc=MAX_INT. Whenever iw.ramSizeInBytes() >= threshold, I commit the changes, serialize the Directory somewhere and start with a new Directory and IW instance. The threshold is currently 32MB. I noticed though that the size of the serialized Directory is nearly half (16 MB). Is that expected? Will I see that behavior every time (e.g. w/ large stored fields), or is it data dependent? I assume that the data can affect the compression, but I never thought it would be by a 50% factor, from RAM to disk. Shai - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
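Mike's ~50% figure is simply the ratio of on-disk size to in-RAM size. A trivial back-of-the-envelope check (plain Java, not a Lucene API; the 32 MB / 16 MB numbers are the ones from Shai's mail):

```java
// "RAM efficiency" as described above:
// efficiency = bytes written to disk / bytes reported by IW.ramSizeInBytes().
public class RamEfficiencySketch {

    static double ramEfficiency(long ramBytes, long diskBytes) {
        return (double) diskBytes / ramBytes;
    }

    public static void main(String[] args) {
        long ramAtFlush = 32L << 20;    // iw.ramSizeInBytes() at commit time
        long serializedDir = 16L << 20; // size of the serialized Directory
        System.out.printf("RAM efficiency: %.0f%%%n",
                          100 * ramEfficiency(ramAtFlush, serializedDir));
    }
}
```

A lower number means more per-posting growth headroom was sitting unused in RAM when the flush happened.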
Re: TestIndexWriterDelete#testUpdatesOnDiskFull can false fail
just committed to trunk simon On Wed, Apr 13, 2011 at 5:06 PM, Michael McCandless luc...@mikemccandless.com wrote: +1 Mike http://blog.mikemccandless.com On Wed, Apr 13, 2011 at 5:58 AM, Simon Willnauer simon.willna...@googlemail.com wrote: In TestIndexWriterDelete#testUpdatesOnDiskFull, especially between lines 538 and 553, we could get a random exception from the MockDirectoryWrapper which makes the test fail since we are not catching / expecting those exceptions. I can't make this fail on trunk even in 1000 runs, but on realtime it fails quickly after I merged this morning. I think we should just disable the random exceptions for this part and reenable them after we are done; see patch below. - Thoughts?
Index: lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java
===
--- lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java (revision 1091721)
+++ lucene/src/test/org/apache/lucene/index/TestIndexWriterDelete.java (working copy)
@@ -536,7 +536,9 @@
         fail(testName + " hit IOException after disk space was freed up");
       }
     }
-
+    // prevent throwing a random exception here!!
+    final double randomIOExceptionRate = dir.getRandomIOExceptionRate();
+    dir.setRandomIOExceptionRate(0.0);
     if (!success) {
       // Must force the close else the writer can have
       // open files which cause exc in MockRAMDir.close
@@ -549,6 +551,7 @@
       _TestUtil.checkIndex(dir);
       TestIndexWriter.assertNoUnreferencedFiles(dir, "after writer.close");
     }
+    dir.setRandomIOExceptionRate(randomIOExceptionRate);
     // Finally, verify index is not corrupt, and, if
     // we succeeded, we see all docs changed, and if
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
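The patch saves the random-IOException rate, zeroes it, and restores it by hand. As an aside, the same save/restore idea can be expressed with a try/finally so the rate is restored even if the guarded section throws. The sketch below uses a hypothetical FailureInjector stand-in, not the real MockDirectoryWrapper:

```java
// Hedged sketch (not the actual Lucene test code): save/restore of a
// randomized failure-injection rate around a critical section, with
// try/finally guaranteeing the restore.
public class SaveRestoreSketch {

    // Hypothetical stand-in for MockDirectoryWrapper's rate accessors.
    static class FailureInjector {
        private double rate;
        double getRandomIOExceptionRate() { return rate; }
        void setRandomIOExceptionRate(double r) { rate = r; }
    }

    static void runWithoutRandomFailures(FailureInjector dir, Runnable body) {
        final double saved = dir.getRandomIOExceptionRate();
        dir.setRandomIOExceptionRate(0.0); // no injected IOExceptions here
        try {
            body.run();
        } finally {
            dir.setRandomIOExceptionRate(saved); // always restore
        }
    }

    public static void main(String[] args) {
        FailureInjector dir = new FailureInjector();
        dir.setRandomIOExceptionRate(0.1);
        runWithoutRandomFailures(dir, () -> { /* close writer, check index */ });
        System.out.println(dir.getRandomIOExceptionRate());
    }
}
```

In the patch the straight-line save/restore is fine because the surrounding test already handles the exceptions; the try/finally form just makes the restore unconditional.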
[jira] [Commented] (LUCENE-3028) IW.getReader() returns inconsistent reader on RT Branch
[ https://issues.apache.org/jira/browse/LUCENE-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019806#comment-13019806 ] Simon Willnauer commented on LUCENE-3028: - I will commit this latest patch to the branch. We can still iterate, but since we have jenkins running builds I want to let that sink in a bit too. simon IW.getReader() returns inconsistent reader on RT Branch --- Key: LUCENE-3028 URL: https://issues.apache.org/jira/browse/LUCENE-3028 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: Realtime Branch Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: Realtime Branch Attachments: LUCENE-3028.patch, LUCENE-3028.patch I extended the testcase TestRollingUpdates#testUpdateSameDoc to pull an NRT reader after each update and asserted that it always sees only one document. Yet, this fails with the current branch since there is a problem in how we flush in the getReader() case. What happens here is that we flush all threads and then release the lock (letting other flushes which came in after we entered the flushAllThread context continue), so that we could concurrently get a new segment that transports global deletes without the corresponding add. They sneak in while we continue to open the NRT reader, which in turn sees inconsistent results. I will upload a patch soon -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3023) Land DWPT on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019814#comment-13019814 ] Michael McCandless commented on LUCENE-3023: Why not just email dev@ when it fails? Since it will soon land I think all should feel pain when it fails ;) Land DWPT on trunk -- Key: LUCENE-3023 URL: https://issues.apache.org/jira/browse/LUCENE-3023 Project: Lucene - Java Issue Type: Task Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324, so we can proceed landing the DWPT development on trunk soon. I think one of the bigger issues here is to make sure that all JavaDocs for IW etc. are still correct though. I will start going through that first. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated LUCENE-3018: -- Attachment: cpptasks.jar LUCENE-3018.patch The build.xml now includes a task to compile NativePosixUtil.cpp into NativePosixUtil.so. The task is called build-native. Command to run the ant task: {code:|borderStyle=solid} ant -lib lucene/lib/cpptasks.jar build-native {code} This requires cpptasks to be installed. I have uploaded cpptasks.jar, which needs to be placed in the lucene/lib folder. Lucene Native Directory implementation need automated build --- Key: LUCENE-3018 URL: https://issues.apache.org/jira/browse/LUCENE-3018 Project: Lucene - Java Issue Type: Wish Components: Build Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Varun Thacker Priority: Minor Fix For: 4.0 Attachments: LUCENE-3018.patch, LUCENE-3018.patch, cpptasks.jar Currently the native directory impl in contrib/misc requires manual action to compile the c code, (partially) documented in https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html yet it would be nice if we had an ant task and documentation for all platforms on how to compile them and set up the prerequisites. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2956) Support updateDocument() with DWPTs
[ https://issues.apache.org/jira/browse/LUCENE-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019823#comment-13019823 ] Jason Rutherglen commented on LUCENE-2956: -- {quote}Jason I think nothing prevents you from starting to work on this again. Yet, I think we should freeze the branch now and only allow merging, bug fixes, tests and documentation fixes until we land on trunk. Once we are there we can freely push stuff into the branch again and make it work with seq. ids. {quote} OK, great. I remember now that our main concern was the memory usage of using a short[] (for the seq ids) if the total number of documents is numerous (eg, 10s of millions). Also at some point we'd have double the memory usage when we roll over to the next set, until the previous readers are closed. bq. I think we should freeze the branch now and only allow merging, bug fixes, tests and documentation fixes until we land on trunk Maybe once LUCENE-2312 sequence ids work for deletes, we can look at creating a separate branch that implements seq id deletes for all segments, and compare with the BV approach. Eg, performance, memory usage, and simplicity. Support updateDocument() with DWPTs --- Key: LUCENE-2956 URL: https://issues.apache.org/jira/browse/LUCENE-2956 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: Realtime Branch Reporter: Michael Busch Assignee: Simon Willnauer Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2956.patch, LUCENE-2956.patch With separate DocumentsWriterPerThreads (DWPT) it can currently happen that the delete part of an updateDocument() is flushed and committed separately from the corresponding new document. We need to make sure that updateDocument() is always an atomic operation from an IW.commit() and IW.getReader() perspective. See LUCENE-2324 for more details. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2956) Support updateDocument() with DWPTs
[ https://issues.apache.org/jira/browse/LUCENE-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019827#comment-13019827 ] Simon Willnauer commented on LUCENE-2956: - bq. Maybe once LUCENE-2312 sequence ids work for deletes, we can look at creating a separate branch that implements seq id deletes for all segments, and compare with the BV approach. Eg, performance, memory usage, and simplicity. I don't think we need to create a different branch; until then DWPT will be on trunk and we can simply compare to trunk, no? Support updateDocument() with DWPTs --- Key: LUCENE-2956 URL: https://issues.apache.org/jira/browse/LUCENE-2956 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: Realtime Branch Reporter: Michael Busch Assignee: Simon Willnauer Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2956.patch, LUCENE-2956.patch With separate DocumentsWriterPerThreads (DWPT) it can currently happen that the delete part of an updateDocument() is flushed and committed separately from the corresponding new document. We need to make sure that updateDocument() is always an atomic operation from an IW.commit() and IW.getReader() perspective. See LUCENE-2324 for more details. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Drozdov updated SOLR-2242: - Attachment: SOLR.2242.solr3.1.patch Thanks for the patch! It also works for version 3.1, just the line numbers differ - attaching the adapted patch for 3.1 just in case. Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Priority: Minor Fix For: 4.0 Attachments: SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch When returning facet.field=name of field you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example: http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=manu&facet.mincount=1&facet.limit=-1&f.manu.facet.namedistinct=0&facet.field=price&f.price.facet.namedistinct=1 Here is an example on field hgid (without namedistinct): {code}
<lst name="facet_fields">
  <lst name="hgid">
    <int name="HGPY045FD36D4000A">1</int>
    <int name="HGPY0FBC6690453A9">1</int>
    <int name="HGPY1E44ED6C4FB3B">1</int>
    <int name="HGPY1FA631034A1B8">1</int>
    <int name="HGPY3317ABAC43B48">1</int>
    <int name="HGPY3A17B2294CB5A">5</int>
    <int name="HGPY3ADD2B3D48C39">1</int>
  </lst>
</lst>
{code} With namedistinct (HGPY045FD36D4000A, HGPY0FBC6690453A9, HGPY1E44ED6C4FB3B, HGPY1FA631034A1B8, HGPY3317ABAC43B48, HGPY3A17B2294CB5A, HGPY3ADD2B3D48C39), this returns the number of rows (7), not the number of values (11). {code}
<lst name="facet_fields">
  <lst name="hgid">
    <int name="_count_">7</int>
  </lst>
</lst>
{code} This actually works really well to get the total number of fields for a group.field=hgid. Enjoy! -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019833#comment-13019833 ] Steven Rowe commented on LUCENE-3018: - When {{cpptasks.jar}} is committed to the Lucene source tree, it should have a version number included in its name. E.g., if the jar was built from the 1.0b5 sources, the committed jar should be named {{cpptasks-1.0b5.jar}}. Varun, where did you get the {{cpptasks.jar}} from? If you built it yourself, please use a Java 1.5 JDK, to ensure it will be compatible with 1.5 JVMs. Lucene Native Directory implementation need automated build --- Key: LUCENE-3018 URL: https://issues.apache.org/jira/browse/LUCENE-3018 Project: Lucene - Java Issue Type: Wish Components: Build Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Varun Thacker Priority: Minor Fix For: 4.0 Attachments: LUCENE-3018.patch, LUCENE-3018.patch, cpptasks.jar Currently the native directory impl in contrib/misc requires manual action to compile the c code, (partially) documented in https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html yet it would be nice if we had an ant task and documentation for all platforms on how to compile them and set up the prerequisites. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019839#comment-13019839 ] Varun Thacker commented on LUCENE-3018: --- Sorry for not being clear about it. I should have named it cpptasks-1.0b4.jar. I did not build it myself but used the one provided on the ant-contrib development page. Link to cpptasks-1.0b4: http://sourceforge.net/projects/ant-contrib/files/ant-contrib/cpptasks-1.0-beta4/ Should I upload the LICENSE file which came with it? Lucene Native Directory implementation need automated build --- Key: LUCENE-3018 URL: https://issues.apache.org/jira/browse/LUCENE-3018 Project: Lucene - Java Issue Type: Wish Components: Build Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Varun Thacker Priority: Minor Fix For: 4.0 Attachments: LUCENE-3018.patch, LUCENE-3018.patch, cpptasks.jar Currently the native directory impl in contrib/misc requires manual action to compile the c code, (partially) documented in https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html yet it would be nice if we had an ant task and documentation for all platforms on how to compile them and set up the prerequisites. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 7104 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/7104/

4 tests failed.

REGRESSION:  org.apache.solr.cloud.BasicDistributedZkTest.testDistribSearch

Error Message:
KeeperErrorCode = ConnectionLoss for /collections/collection1/shards

Stack Trace:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /collections/collection1/shards
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:347)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:308)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:290)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:260)
    at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:80)
    at org.apache.solr.cloud.AbstractDistributedZkTestCase.setUp(AbstractDistributedZkTestCase.java:47)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)

FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.BasicZkTest

Error Message:
KeeperErrorCode = ConnectionLoss for /solr

Stack Trace:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /solr
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:347)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:308)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:290)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:255)
    at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:72)
    at org.apache.solr.cloud.AbstractZkTestCase.azt_beforeClass(AbstractZkTestCase.java:62)

REGRESSION:  org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration

Error Message:
null

Stack Trace:
org.apache.solr.common.cloud.ZooKeeperException:
    at org.apache.solr.core.CoreContainer.register(CoreContainer.java:517)
    at org.apache.solr.core.CoreContainer.register(CoreContainer.java:545)
    at org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration(CloudStateUpdateTest.java:156)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /collections/testcore/shards/lucene.zones.apache.org:1661_solr_testcore
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:347)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:308)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:370)
    at org.apache.solr.cloud.ZkController.addZkShardsNode(ZkController.java:155)
    at org.apache.solr.cloud.ZkController.register(ZkController.java:481)
    at org.apache.solr.core.CoreContainer.register(CoreContainer.java:508)

REGRESSION:  org.apache.solr.cloud.ZkSolrClientTest.testMakeRootNode

Error Message:
KeeperErrorCode = ConnectionLoss for /solr

Stack Trace:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /solr
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:347)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:308)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:290)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:255)
    at org.apache.solr.cloud.AbstractZkTestCase.makeSolrZkNode(AbstractZkTestCase.java:128)
    at org.apache.solr.cloud.ZkSolrClientTest.testMakeRootNode(ZkSolrClientTest.java:57)
[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019848#comment-13019848 ]

Simon Willnauer commented on LUCENE-3018:
-----------------------------------------

hey varun, here are some comments

do we need
{code}
<property environment="env"/>
<property name="jni1" location="${env.JAVA_HOME}/include"/>
<property name="jni2" location="${env.JAVA_HOME}/include/linux"/>
{code}
or can we simply use
{code}
<includepath>
  <pathelement location="${java.home}/include/"/>
  <pathelement location="${java.home}/include/linux/"/>
</includepath>
{code}

instead of using {code}<fileset file="src/java/org/apache/lucene/store/NativePosixUtil.cpp"/>{code} we should rather use {code}<fileset file="${src.dir}/org/apache/lucene/store/NativePosixUtil.cpp"/>{code}

I wonder if we really want to put the built .so file into src/java/org/apache/lucene/store/NativePosixUtil (outfile), or if this should rather be built into ${common.build.dir}; that way it would be cleaned up too. Something like this:
{code}
<mkdir dir="${common.build.dir}/native/"/>
<cpptasks:cc outtype="shared" subsystem="console" outfile="${common.build.dir}/native/NativePosixUtil">
{code}

Do we need to specify gcc as the compiler? afaik it's the default, so we might just let it choose the default.

I also wonder what happens if java.home points to a $JAVA_HOME/jre directory and not to $JAVA_HOME directly; in such a case we need to include ${java.home}/../include etc. Maybe we need to specify the path based on a condition?

It would be great if we had a way to test that the native lib works, so maybe we want to check that too with a small testcase?

simon

Lucene Native Directory implementation need automated build
-----------------------------------------------------------

Key: LUCENE-3018
URL: https://issues.apache.org/jira/browse/LUCENE-3018
Project: Lucene - Java
Issue Type: Wish
Components: Build
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Varun Thacker
Priority: Minor
Fix For: 4.0
Attachments: LUCENE-3018.patch, LUCENE-3018.patch, cpptasks.jar

Currently the native directory impl in contrib/misc requires manual action to compile the C code, (partially) documented in https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html, yet it would be nice if we had an ant task and documentation for all platforms how to compile them and set up the prerequisites.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
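Simon's question about java.home pointing at $JAVA_HOME/jre, and his wish for a smoke test of the native lib, can be illustrated together. The following is a minimal sketch only: the class name is made up for illustration, and it assumes nothing about the patch's actual ant targets — it just shows how the JNI header directory can be resolved whether java.home is the JDK root or its jre/ subdirectory.

```java
import java.io.File;

// Sketch (hypothetical helper, not part of the patch): resolve the JNI
// include directory for both common layouts of java.home.
public class JniIncludeResolver {
    static File jniInclude() {
        File home = new File(System.getProperty("java.home"));
        // JDK layout: ${java.home}/include exists directly.
        File candidate = new File(home, "include");
        if (!candidate.isDirectory()) {
            // JRE layout: the headers live one level up, next to jre/.
            candidate = new File(home.getParentFile(), "include");
        }
        return candidate;
    }

    public static void main(String[] args) {
        System.out.println(jniInclude());
    }
}
```

The same either/or logic is what an ant `<condition>` on the build side would express, as Simon suggests.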
[jira] [Issue Comment Edited] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019848#comment-13019848 ]

Simon Willnauer edited comment on LUCENE-3018 at 4/14/11 2:33 PM:
------------------------------------------------------------------

hey varun, here are some comments

do we need
{code}
<property environment="env"/>
<property name="jni1" location="${env.JAVA_HOME}/include"/>
<property name="jni2" location="${env.JAVA_HOME}/include/linux"/>
{code}
or can we simply use
{code}
<includepath>
  <pathelement location="${java.home}/include/"/>
  <pathelement location="${java.home}/include/linux/"/>
</includepath>
{code}

instead of using {code}<fileset file="src/java/org/apache/lucene/store/NativePosixUtil.cpp"/>{code} we should rather use {code}<fileset file="${src.dir}/org/apache/lucene/store/NativePosixUtil.cpp"/>{code}

I wonder if we really want to put the built .so file into src/java/org/apache/lucene/store/NativePosixUtil (outfile), or if this should rather be built into ${common.build.dir}; that way it would be cleaned up too. Something like this:
{code}
<mkdir dir="${common.build.dir}/native/"/>
<cpptasks:cc outtype="shared" subsystem="console" outfile="${common.build.dir}/native/NativePosixUtil">
{code}

Do we need to specify gcc as the compiler? afaik it's the default, so we might just let it choose the default.

I also wonder what happens if java.home points to a $JAVA_HOME/jre directory and not to $JAVA_HOME directly; in such a case we need to include ${java.home}/../include etc. Maybe we need to specify the path based on a condition?

It would be great if we had a way to test that the native lib works, so maybe we want to check that too with a small testcase?

simon
[jira] [Commented] (LUCENE-3023) Land DWPT on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019852#comment-13019852 ]

Simon Willnauer commented on LUCENE-3023:
-----------------------------------------

bq. Why not just email dev@ when it fails? Since it will soon land I think all should feel pain when it fails

true, done!

Land DWPT on trunk
------------------

Key: LUCENE-3023
URL: https://issues.apache.org/jira/browse/LUCENE-3023
Project: Lucene - Java
Issue Type: Task
Affects Versions: CSF branch, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Fix For: 4.0

With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324 so we can proceed landing the DWPT development on trunk soon. I think one of the bigger issues here is to make sure that all JavaDocs for IW etc. are still correct though. I will start going through that first.
[HUDSON] Lucene-Solr-tests-only-realtime_search-branch - Build # 2 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-realtime_search-branch/2/

1 tests failed.

REGRESSION:  org.apache.lucene.index.TestIndexWriterDelete.testUpdatesOnDiskFull

Error Message:
fake disk full at 13517 bytes when writing _0_1.del (file length=0; wrote 10 of 20 bytes)

Stack Trace:
java.io.IOException: fake disk full at 13517 bytes when writing _0_1.del (file length=0; wrote 10 of 20 bytes)
    at org.apache.lucene.store.MockIndexOutputWrapper.writeBytes(MockIndexOutputWrapper.java:111)
    at org.apache.lucene.store.DataOutput.writeBytes(DataOutput.java:43)
    at org.apache.lucene.util.BitVector.writeBits(BitVector.java:182)
    at org.apache.lucene.util.BitVector.write(BitVector.java:171)
    at org.apache.lucene.index.SegmentReader.commitChanges(SegmentReader.java:718)
    at org.apache.lucene.index.SegmentReader.doCommit(SegmentReader.java:696)
    at org.apache.lucene.index.IndexWriter$ReaderPool.commit(IndexWriter.java:572)
    at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:3597)
    at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2466)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2537)
    at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1067)
    at org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:1923)
    at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:1848)
    at org.apache.lucene.index.TestIndexWriterDelete.doTestOperationsOnDiskFull(TestIndexWriterDelete.java:545)
    at org.apache.lucene.index.TestIndexWriterDelete.testUpdatesOnDiskFull(TestIndexWriterDelete.java:409)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1226)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1154)

Build Log (for compile errors):
[...truncated 3190 lines...]
[jira] [Updated] (LUCENE-2571) Indexing performance tests with realtime branch
[ https://issues.apache.org/jira/browse/LUCENE-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2571: Attachment: wikimedium.trunk.Standard.nd10M_dps_addDocuments.png wikimedium.trunk.Standard.nd10M_dps.png wikimedium.realtime.Standard.nd10M_dps_addDocuments_flush.png wikimedium.realtime.Standard.nd10M_dps_addDocuments.png wikimedium.realtime.Standard.nd10M_dps.png benchmarks charts attached Indexing performance tests with realtime branch --- Key: LUCENE-2571 URL: https://issues.apache.org/jira/browse/LUCENE-2571 Project: Lucene - Java Issue Type: Task Components: Index Reporter: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: wikimedium.realtime.Standard.nd10M_dps.png, wikimedium.realtime.Standard.nd10M_dps_addDocuments.png, wikimedium.realtime.Standard.nd10M_dps_addDocuments_flush.png, wikimedium.trunk.Standard.nd10M_dps.png, wikimedium.trunk.Standard.nd10M_dps_addDocuments.png We should run indexing performance tests with the DWPT changes and compare to trunk. We need to test both single-threaded and multi-threaded performance. NOTE: flush by RAM isn't implemented just yet, so either we wait with the tests or flush by doc count. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Thacker updated LUCENE-3018:
----------------------------------

Attachment: LUCENE-3018.patch

I have made the changes which were mentioned by Simon.

I have changed the way JNI header files are included:
{code:title=JNI header includes|borderStyle=solid}
<includepath>
  <pathelement location="${java.home}/../include/"/>
  <pathelement location="${java.home}/../include/linux/"/>
</includepath>
{code}
The reason being that when I echoed java.home its path was:
{code:title=path|borderStyle=solid}
/usr/lib/jvm/java-6-sun-1.6.0.24/jvm
{code}

Changed the path convention to:
{code:borderStyle=solid}
<fileset file="${src.dir}/org/apache/lucene/store/NativePosixUtil.cpp"/>
{code}

The directory for the shared library is now:
{code:title=Shared File Directory|borderStyle=solid}
lucene/build/native/
{code}

I have explicitly specified GCC as the compiler since it will be needed in future when Windows is also incorporated. I will write a small test case to see whether the .so file being built is working fine.
[jira] [Issue Comment Edited] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019861#comment-13019861 ]

Varun Thacker edited comment on LUCENE-3018 at 4/14/11 3:13 PM:
----------------------------------------------------------------

I have made the changes which were mentioned by Simon.

I have changed the way JNI header files are included:
{code:title=JNI header includes|borderStyle=solid}
<includepath>
  <pathelement location="${java.home}/../include/"/>
  <pathelement location="${java.home}/../include/linux/"/>
</includepath>
{code}
The reason being that when I echoed java.home its path was:
{code:title=path|borderStyle=solid}
/usr/lib/jvm/java-6-sun-1.6.0.24/jvm
{code}

Changed the path convention to:
{code:borderStyle=solid}
<fileset file="${src.dir}/org/apache/lucene/store/NativePosixUtil.cpp"/>
{code}

The directory for the shared library is now:
{code:title=Shared File Directory|borderStyle=solid}
lucene/build/native/
{code}

I have explicitly specified GCC as the compiler since it will be needed in future when Windows is also incorporated. I will write a small test case to see whether the .so file being built is working fine.
[jira] [Assigned] (LUCENE-3026) smartcn analyzer throw NullPointer exception when the length of analysed text over 32767
[ https://issues.apache.org/jira/browse/LUCENE-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir reassigned LUCENE-3026:
-----------------------------------

Assignee: Robert Muir

smartcn analyzer throw NullPointer exception when the length of analysed text over 32767
----------------------------------------------------------------------------------------

Key: LUCENE-3026
URL: https://issues.apache.org/jira/browse/LUCENE-3026
Project: Lucene - Java
Issue Type: Bug
Components: contrib/analyzers
Affects Versions: 3.1, 4.0
Reporter: wangzhenghang
Assignee: Robert Muir
Attachments: LUCENE-3026.patch

That's all because of org.apache.lucene.analysis.cn.smart.hhmm.SegGraph's makeIndex() method:

public List<SegToken> makeIndex() {
    List<SegToken> result = new ArrayList<SegToken>();
    int s = -1, count = 0, size = tokenListTable.size();
    List<SegToken> tokenList;
    short index = 0;
    while (count < size) {
      if (isStartExist(s)) {
        tokenList = tokenListTable.get(s);
        for (SegToken st : tokenList) {
          st.index = index;
          result.add(st);
          index++;
        }
        count++;
      }
      s++;
    }
    return result;
}

here 'short index = 0;' should be 'int index = 0;'. And that's reported here http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=2 and http://code.google.com/p/imdict-chinese-analyzer/issues/detail?id=11; the author XiaoPingGao has already fixed this bug: http://code.google.com/p/imdict-chinese-analyzer/source/browse/trunk/src/org/apache/lucene/analysis/cn/smart/hhmm/SegGraph.java
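The root cause is easy to reproduce in isolation: a Java `short` silently wraps to negative at 32767, so token indices go bad once a text yields more than 32767 tokens. A minimal standalone demonstration (no Lucene dependency):

```java
// Demonstrates why SegGraph's 'short index' breaks past 32767 tokens:
// short arithmetic wraps to negative on overflow instead of failing.
public class ShortOverflowDemo {
    public static void main(String[] args) {
        short index = Short.MAX_VALUE; // 32767, the reported limit
        index++;                       // wraps silently
        System.out.println(index);     // prints -32768

        int wide = Short.MAX_VALUE;
        wide++;                        // an int keeps counting
        System.out.println(wide);      // prints 32768
    }
}
```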
[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019863#comment-13019863 ]

Simon Willnauer commented on LUCENE-3018:
-----------------------------------------

bq. I will write a small test case to see whether the .so file being built is working fine.

awesome! :)
Re: [HUDSON] Lucene-Solr-tests-only-realtime_search-branch - Build # 2 - Still Failing
I just committed a fix for this

On Thu, Apr 14, 2011 at 4:47 PM, Apache Hudson Server hud...@hudson.apache.org wrote:
> Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-realtime_search-branch/2/
>
> 1 tests failed.
>
> REGRESSION:  org.apache.lucene.index.TestIndexWriterDelete.testUpdatesOnDiskFull
>
> Error Message:
> fake disk full at 13517 bytes when writing _0_1.del (file length=0; wrote 10 of 20 bytes)
>
> [...]
>
> Build Log (for compile errors):
> [...truncated 3190 lines...]
[jira] [Updated] (LUCENE-3026) smartcn analyzer throw NullPointer exception when the length of analysed text over 32767
[ https://issues.apache.org/jira/browse/LUCENE-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3026:
--------------------------------

Fix Version/s: 4.0
               3.2

smartcn analyzer throw NullPointer exception when the length of analysed text over 32767
----------------------------------------------------------------------------------------

Key: LUCENE-3026
URL: https://issues.apache.org/jira/browse/LUCENE-3026
Project: Lucene - Java
Issue Type: Bug
Components: contrib/analyzers
Affects Versions: 3.1, 4.0
Reporter: wangzhenghang
Assignee: Robert Muir
Fix For: 3.2, 4.0
Attachments: LUCENE-3026.patch
[jira] [Resolved] (LUCENE-2952) Make license checking/maintenance easier/automated
[ https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved LUCENE-2952. - Resolution: Fixed Make license checking/maintenance easier/automated -- Key: LUCENE-2952 URL: https://issues.apache.org/jira/browse/LUCENE-2952 Project: Lucene - Java Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch Instead of waiting until release to check licenses are valid, we should make it a part of our build process to ensure that all dependencies have proper licenses, etc. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Setting the max number of merge threads across IndexWriters
Today the ConcurrentMergeScheduler allows setting the max thread count and is bound to a single IndexWriter. However in the [common] case of multiple IndexWriters running in the same process, this disallows one from managing the aggregate number of merge threads executing at any given time. I think this can be fixed, shall I open an issue? - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3026) smartcn analyzer throw NullPointer exception when the length of analysed text over 32767
[ https://issues.apache.org/jira/browse/LUCENE-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-3026.
---------------------------------

Resolution: Fixed

Committed revision 1092328, 1092338 (branch_3x). Thank you for the patch!

smartcn analyzer throw NullPointer exception when the length of analysed text over 32767
----------------------------------------------------------------------------------------

Key: LUCENE-3026
URL: https://issues.apache.org/jira/browse/LUCENE-3026
Project: Lucene - Java
Issue Type: Bug
Components: contrib/analyzers
Affects Versions: 3.1, 4.0
Reporter: wangzhenghang
Assignee: Robert Muir
Fix For: 3.2, 4.0
Attachments: LUCENE-3026.patch
Re: Setting the max number of merge threads across IndexWriters
I think the proposal involved using a ThreadPoolExecutor, which seemed to not quite work as well as what we have. I think it'll be easier to simply pass a global context that keeps a counter of the actively running threads, and pass that into each IW's CMS? On Thu, Apr 14, 2011 at 8:25 AM, Simon Willnauer simon.willna...@googlemail.com wrote: On Thu, Apr 14, 2011 at 5:20 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Today the ConcurrentMergeScheduler allows setting the max thread count and is bound to a single IndexWriter. However in the [common] case of multiple IndexWriters running in the same process, this disallows one from managing the aggregate number of merge threads executing at any given time. I think this can be fixed, shall I open an issue? go ahead! I think I have seen this suggestion somewhere maybe you need to see if there is one already simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
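Mike's "global context that keeps a counter of the actively running threads" could be sketched with a shared permit pool handed to every writer's merge scheduler. The class and method names below are hypothetical, not an existing Lucene API — it only illustrates the shape of the idea:

```java
import java.util.concurrent.Semaphore;

// Hypothetical shared limiter for merge threads across several IndexWriters.
// Each writer's ConcurrentMergeScheduler would acquire a permit before a merge
// thread does work and release it afterwards, capping process-wide concurrency.
public class GlobalMergeContext {
    private final Semaphore permits;

    public GlobalMergeContext(int maxMergeThreads) {
        this.permits = new Semaphore(maxMergeThreads);
    }

    public void beforeMerge() throws InterruptedException {
        permits.acquire(); // blocks once the global cap is reached
    }

    public void afterMerge() {
        permits.release();
    }

    public int available() {
        return permits.availablePermits();
    }
}
```

Whether a scheduler should block on the cap or back off and retry is exactly the kind of detail the proposed issue would need to settle.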
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #89: POMs out of sync
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-Maven-trunk/89/ No tests ran. Build Log (for compile errors): [...truncated 50 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-3022) DictionaryCompoundWordTokenFilter Flag onlyLongestMatch has no affect
[ https://issues.apache.org/jira/browse/LUCENE-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir reassigned LUCENE-3022:
-----------------------------------

Assignee: Robert Muir

DictionaryCompoundWordTokenFilter Flag onlyLongestMatch has no affect
---------------------------------------------------------------------

Key: LUCENE-3022
URL: https://issues.apache.org/jira/browse/LUCENE-3022
Project: Lucene - Java
Issue Type: Bug
Components: contrib/analyzers
Affects Versions: 2.9.4, 3.1
Reporter: Johann Höchtl
Assignee: Robert Muir
Priority: Minor
Attachments: LUCENE-3022.patch
Original Estimate: 5m
Remaining Estimate: 5m

When using the DictionaryCompoundWordTokenFilter with a german dictionary, I got a strange behaviour: The german word streifenbluse (blouse with stripes) was decompounded to streifen (stripe), reifen (tire), which makes no sense at all. I thought the flag onlyLongestMatch would fix this, because streifen is longer than reifen, but it had no effect. So I reviewed the sourcecode and found the problem:

[code]
protected void decomposeInternal(final Token token) {
    // Only words longer than minWordSize get processed
    if (token.length() < this.minWordSize) {
      return;
    }
    char[] lowerCaseTermBuffer = makeLowerCaseCopy(token.buffer());
    for (int i = 0; i < token.length() - this.minSubwordSize; ++i) {
      Token longestMatchToken = null;
      for (int j = this.minSubwordSize - 1; j < this.maxSubwordSize; ++j) {
        if (i + j > token.length()) {
          break;
        }
        if (dictionary.contains(lowerCaseTermBuffer, i, j)) {
          if (this.onlyLongestMatch) {
            if (longestMatchToken != null) {
              if (longestMatchToken.length() < j) {
                longestMatchToken = createToken(i, j, token);
              }
            } else {
              longestMatchToken = createToken(i, j, token);
            }
          } else {
            tokens.add(createToken(i, j, token));
          }
        }
      }
      if (this.onlyLongestMatch && longestMatchToken != null) {
        tokens.add(longestMatchToken);
      }
    }
}
[/code]

should be changed to

[code]
protected void decomposeInternal(final Token token) {
    // Only words longer than minWordSize get processed
    if (token.termLength() < this.minWordSize) {
      return;
    }
    char[] lowerCaseTermBuffer = makeLowerCaseCopy(token.termBuffer());
    Token longestMatchToken = null;
    for (int i = 0; i < token.termLength() - this.minSubwordSize; ++i) {
      for (int j = this.minSubwordSize - 1; j < this.maxSubwordSize; ++j) {
        if (i + j > token.termLength()) {
          break;
        }
        if (dictionary.contains(lowerCaseTermBuffer, i, j)) {
          if (this.onlyLongestMatch) {
            if (longestMatchToken != null) {
              if (longestMatchToken.termLength() < j) {
                longestMatchToken = createToken(i, j, token);
              }
            } else {
              longestMatchToken = createToken(i, j, token);
            }
          } else {
            tokens.add(createToken(i, j, token));
          }
        }
      }
    }
    if (this.onlyLongestMatch && longestMatchToken != null) {
      tokens.add(longestMatchToken);
    }
}
[/code]

So that only the longest token is really indexed and the onlyLongestMatch flag makes sense.
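The behavioural difference between the two versions can be seen without Lucene at all: with the tracker scoped inside the outer loop you get one "longest match" per start offset (so both streifen and reifen survive), while hoisting it out keeps only the single longest subword overall. A toy reproduction using plain substring matching — the dictionary, word, and class name are illustrative, not the filter's real API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy version of the decompounding loops from the report: collect dictionary
// subwords of a compound, keeping either the longest match per start offset
// (the reported behaviour) or one longest match overall (the proposed fix).
public class LongestMatchDemo {
    static List<String> decompose(String word, List<String> dict, boolean perOffset) {
        List<String> tokens = new ArrayList<>();
        String globalBest = null;
        for (int i = 0; i < word.length(); i++) {
            String offsetBest = null;
            for (int j = i + 1; j <= word.length(); j++) {
                String sub = word.substring(i, j);
                if (dict.contains(sub)) {
                    if (offsetBest == null || sub.length() > offsetBest.length()) offsetBest = sub;
                    if (globalBest == null || sub.length() > globalBest.length()) globalBest = sub;
                }
            }
            if (perOffset && offsetBest != null) tokens.add(offsetBest); // tracker reset per offset
        }
        if (!perOffset && globalBest != null) tokens.add(globalBest);    // tracker hoisted out
        return tokens;
    }

    public static void main(String[] args) {
        List<String> dict = Arrays.asList("streifen", "reifen", "bluse");
        System.out.println(decompose("streifenbluse", dict, true));  // [streifen, reifen, bluse]
        System.out.println(decompose("streifenbluse", dict, false)); // [streifen]
    }
}
```

Note that 'reifen' sneaks in at offset 2 of "streifenbluse" in the per-offset variant, exactly as the reporter describes.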
[jira] [Updated] (LUCENE-3022) DictionaryCompoundWordTokenFilter Flag onlyLongestMatch has no affect
[ https://issues.apache.org/jira/browse/LUCENE-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3022: Fix Version/s: 4.0 3.2

DictionaryCompoundWordTokenFilter Flag onlyLongestMatch has no affect
-
Key: LUCENE-3022
URL: https://issues.apache.org/jira/browse/LUCENE-3022
Project: Lucene - Java
Issue Type: Bug
Components: contrib/analyzers
Affects Versions: 2.9.4, 3.1
Reporter: Johann Höchtl
Assignee: Robert Muir
Priority: Minor
Fix For: 3.2, 4.0
Attachments: LUCENE-3022.patch
Original Estimate: 5m
Remaining Estimate: 5m

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Setting the max number of merge threads across IndexWriters
I proposed to decouple MergeScheduler from IW (stop keeping a reference to it). Then you can create a single CMS and pass it to all your IWs. On Thu, Apr 14, 2011 at 19:40, Jason Rutherglen jason.rutherg...@gmail.com wrote: I think the proposal involved using a ThreadPoolExecutor, which seemed to not quite work as well as what we have. I think it'll be easier to simply pass a global context that keeps a counter of the actively running threads, and pass that into each IW's CMS? On Thu, Apr 14, 2011 at 8:25 AM, Simon Willnauer simon.willna...@googlemail.com wrote: On Thu, Apr 14, 2011 at 5:20 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Today the ConcurrentMergeScheduler allows setting the max thread count and is bound to a single IndexWriter. However in the [common] case of multiple IndexWriters running in the same process, this disallows one from managing the aggregate number of merge threads executing at any given time. I think this can be fixed, shall I open an issue? go ahead! I think I have seen this suggestion somewhere maybe you need to see if there is one already simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Kirill Zakharenko/Кирилл Захаренко E-Mail/Jabber: ear...@gmail.com Phone: +7 (495) 683-567-4 ICQ: 104465785 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Setting the max number of merge threads across IndexWriters
On Thu, Apr 14, 2011 at 5:52 PM, Earwin Burrfoot ear...@gmail.com wrote: I proposed to decouple MergeScheduler from IW (stop keeping a reference to it). Then you can create a single CMS and pass it to all your IWs. Yep that was it... is there an issue for this? simon On Thu, Apr 14, 2011 at 19:40, Jason Rutherglen jason.rutherg...@gmail.com wrote: I think the proposal involved using a ThreadPoolExecutor, which seemed to not quite work as well as what we have. I think it'll be easier to simply pass a global context that keeps a counter of the actively running threads, and pass that into each IW's CMS? On Thu, Apr 14, 2011 at 8:25 AM, Simon Willnauer simon.willna...@googlemail.com wrote: On Thu, Apr 14, 2011 at 5:20 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Today the ConcurrentMergeScheduler allows setting the max thread count and is bound to a single IndexWriter. However in the [common] case of multiple IndexWriters running in the same process, this disallows one from managing the aggregate number of merge threads executing at any given time. I think this can be fixed, shall I open an issue? go ahead! I think I have seen this suggestion somewhere maybe you need to see if there is one already simon - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Kirill Zakharenko/Кирилл Захаренко E-Mail/Jabber: ear...@gmail.com Phone: +7 (495) 683-567-4 ICQ: 104465785 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3029) MultiPhraseQuery assigns different scores to identical docs when using 0 pos-incr
MultiPhraseQuery assigns different scores to identical docs when using 0 pos-incr
-
Key: LUCENE-3029
URL: https://issues.apache.org/jira/browse/LUCENE-3029
Project: Lucene - Java
Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
Fix For: 3.0.4, 3.2, 4.0

If you have two identical docs with tokens a b c all zero pos-incr (ie they occur on the same position), and you run a MultiPhraseQuery with [a, b] and [c] (all pos incr 0)... then the two docs will get different scores despite being identical. Admittedly it's a strange query... but I think the scorer ought to count the phrase as having tf=1 for each doc. The problem is that we are missing a tie-breaker for the PhraseQuery used by ExactPhraseScorer, and so the PQ ends up flip/flopping such that every other document gets the same score. Ie, even docIDs all get one score and odd docIDs all get another score. Once I added the hard tie-breaker (ord) the scores are the same. However... there's a separate bug, that can over-count the tf, such that if I create the MPQ like this:

{noformat}
mpq.add(new Term[] {new Term("field", "a")}, 0);
mpq.add(new Term[] {new Term("field", "b"), new Term("field", "c")}, 0);
{noformat}

I get tf=2 per doc, but if I create it like this:

{noformat}
mpq.add(new Term[] {new Term("field", "b"), new Term("field", "c")}, 0);
mpq.add(new Term[] {new Term("field", "a")}, 0);
{noformat}

I get tf=1 (which I think is correct?). This happens because MultipleTermPositions freely returns the same position more than once: it just unions the positions of the two streams, so when both have their term at pos=0, you'll get pos=0 twice, which is not good and leads to over-counting tf. Unfortunately, I don't see a performant way to fix that... and I'm not sure that it really matters that much in practice.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
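The over-count mechanism described above can be shown in isolation. The toy merge below is not the MultipleTermPositions code; names and structure are illustrative. It unions two sorted position streams the way the description says: without deduplication, a position shared by both terms (pos=0 for "b" and "c") comes out twice, which is what inflates tf:

```java
import java.util.*;

// Toy illustration of the tf over-count: a MultipleTermPositions-style union
// of two sorted position lists keeps duplicates, so one real occurrence at a
// shared position is counted twice. (Hypothetical helper, not Lucene code.)
public class PositionUnion {
    // Plain union, as in merging the two postings streams: keeps duplicates.
    static List<Integer> union(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            out.add(a[i] <= b[j] ? a[i++] : b[j++]);
        }
        while (i < a.length) out.add(a[i++]);
        while (j < b.length) out.add(b[j++]);
        return out;
    }

    // Deduplicated union: what would avoid counting the same position twice.
    static List<Integer> unionDedup(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        for (int p : union(a, b)) {
            if (out.isEmpty() || out.get(out.size() - 1) != p) out.add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        // Both terms at position 0: the raw union reports it twice.
        System.out.println(union(new int[]{0}, new int[]{0}));       // [0, 0]
        System.out.println(unionDedup(new int[]{0}, new int[]{0}));  // [0]
    }
}
```

As the issue notes, deduplicating inside the real streams would cost a lookahead/compare on a hot loop, which is presumably why no performant fix was obvious.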
[jira] [Updated] (LUCENE-3029) MultiPhraseQuery assigns different scores to identical docs when using 0 pos-incr
[ https://issues.apache.org/jira/browse/LUCENE-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3029: --- Attachment: LUCENE-3029.patch

Patch.

MultiPhraseQuery assigns different scores to identical docs when using 0 pos-incr
-
Key: LUCENE-3029
URL: https://issues.apache.org/jira/browse/LUCENE-3029
Project: Lucene - Java
Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
Fix For: 3.0.4, 3.2, 4.0
Attachments: LUCENE-3029.patch

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3022) DictionaryCompoundWordTokenFilter Flag onlyLongestMatch has no affect
[ https://issues.apache.org/jira/browse/LUCENE-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3022: Attachment: LUCENE-3022.patch

Hi Johann, in my opinion your patch is completely correct, thanks for fixing this. I noticed, though, that a Solr test failed because its factory defaults to this value being on (and the previous behavior was broken!!!) Because of this, I propose we default this behavior to off in the Solr factory and add an upgrading note. Previously decompounding in Solr defaulted to buggy behavior, but I think by default we should index all compound components (since that seems to be what the intended behavior was, which mostly worked, only because of the bug!) I'll leave the issue open for a few days to see if anyone objects to this plan.

DictionaryCompoundWordTokenFilter Flag onlyLongestMatch has no affect
-
Key: LUCENE-3022
URL: https://issues.apache.org/jira/browse/LUCENE-3022
Project: Lucene - Java
Issue Type: Bug
Components: contrib/analyzers
Affects Versions: 2.9.4, 3.1
Reporter: Johann Höchtl
Assignee: Robert Muir
Priority: Minor
Fix For: 3.2, 4.0
Attachments: LUCENE-3022.patch, LUCENE-3022.patch
Original Estimate: 5m
Remaining Estimate: 5m

-- This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [ANNOUNCE] Apache PyLucene 3.1.0
On Apr 14, 2011, at 2:05, Valery Khamenya khame...@gmail.com wrote: Thanks! btw, originally I went this way http://lucene.apache.org/pylucene/jcc/documentation/install.html , but it is not up-to-date and trunk doesn't seem to be compilable (problems with ant xml files, plus no doc directory etc). Trunk needs work. Trunk is based off Lucene's trunk, which is moving and changing rapidly, and is anything but stable. In particular, the PyLucene unit tests and samples need to be more or less redone because of all the API changes that occurred on Lucene's trunk. The stable branch is actually branch_3x, which should build, run, and pass all its tests. The 3.1 release that just occurred comes from branch_3x. Andi.. Then I used the tar and it worked like a charm. best regards -- Valery A.Khamenya On Fri, Apr 8, 2011 at 5:16 AM, dar...@ontrenet.com wrote: Congrats Andi. A truly awesome project. On Thu, 7 Apr 2011 20:02:22 -0700 (PDT), Andi Vajda va...@apache.org wrote: I am pleased to announce the availability of Apache PyLucene 3.1.0. Apache PyLucene, a subproject of Apache Lucene, is a Python extension for accessing Apache Lucene Core. Its goal is to allow you to use Lucene's text indexing and searching capabilities from Python. It is API compatible with the latest version of Lucene Core, 3.1.0. This release contains a number of bug fixes and improvements. Details can be found in the changes files: http://svn.apache.org/repos/asf/lucene/pylucene/tags/pylucene_3_1_0/CHANGES http://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc/CHANGES Apache PyLucene is available from the following download page: http://www.apache.org/dyn/closer.cgi/lucene/pylucene/pylucene-3.1.0-1-src.tar.gz When downloading from a mirror site, please remember to verify the downloads using signatures found on the Apache site: http://www.apache.org/dist/lucene/pylucene/KEYS For more information on Apache PyLucene, visit the project home page: http://lucene.apache.org/pylucene Andi..
Re: failure of some PyLucene tests on windows OS
On Apr 14, 2011, at 2:22, Thomas Koch k...@orbiteam.de wrote: Well, sure, not running the code that breaks solves the problem. But can you then run the tests multiple times? [Thomas Koch] Note that previously closeStore() was not called, but now when calling it, test_PyLucene runs OK. And yes, I can run the tests several times: on PyLucene 2.9 the index dirs testrepo and testpyrepo are cleaned up (i.e. removed) after the tests succeed. With PyLucene 3.1 the testpyrepo is left behind (because test_PythonDirectory.py fails to clean up). So this looks like a problem in the test code rather than in Windows/Python/PyLucene: some store not being closed results in a file lock. Not sure if the bug is there. Me neither :-( Just for test_PyLucene the 'test-code fix' fixes the issue. Another problem is the dependency between test_PythonDirectory and test_PyLucene: test_PythonDirectory uses tests defined in test_PyLucene, which makes it a bit difficult to figure out where the problem is... So has anyone seen this problem before? I'd expect anyone running on Windows to see these test failures. Andi.. Regards, Thomas
[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019889#comment-13019889 ] Simon Willnauer commented on LUCENE-3018: - One more comment about the cpptasks-1.0b4.jar: I think we should put it into lucene/contrib/misc/lib instead of lucene/lib, since we only need it there. While we are at it, you might need to update the README.TXT and the overview.html accordingly, since we now have an ant build for it. Lucene Native Directory implementation need automated build --- Key: LUCENE-3018 URL: https://issues.apache.org/jira/browse/LUCENE-3018 Project: Lucene - Java Issue Type: Wish Components: Build Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Varun Thacker Priority: Minor Fix For: 4.0 Attachments: LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, cpptasks.jar Currently the native directory impl in contrib/misc require manual action to compile the c code (partially) documented in https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html yet it would be nice if we had an ant task and documentation for all platforms how to compile them and set up the prerequisites. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2571) Indexing performance tests with realtime branch
[ https://issues.apache.org/jira/browse/LUCENE-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2571: Attachment: (was: wikimedium.realtime.Standard.nd10M_dps.png) Indexing performance tests with realtime branch --- Key: LUCENE-2571 URL: https://issues.apache.org/jira/browse/LUCENE-2571 Project: Lucene - Java Issue Type: Task Components: Index Reporter: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: wikimedium.realtime.Standard.nd10M_dps.png, wikimedium.realtime.Standard.nd10M_dps_addDocuments.png, wikimedium.realtime.Standard.nd10M_dps_addDocuments_flush.png, wikimedium.trunk.Standard.nd10M_dps.png, wikimedium.trunk.Standard.nd10M_dps_addDocuments.png We should run indexing performance tests with the DWPT changes and compare to trunk. We need to test both single-threaded and multi-threaded performance. NOTE: flush by RAM isn't implemented just yet, so either we wait with the tests or flush by doc count. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2571) Indexing performance tests with realtime branch
[ https://issues.apache.org/jira/browse/LUCENE-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019890#comment-13019890 ] Simon Willnauer commented on LUCENE-2571: - I ran batch indexing benchmarks, trunk vs. realtime branch, with addDocument and with updateDocument. For addDocument I indexed 10M Wikipedia docs onto a spinning disk, reading from a separate SSD. Here is the realtime graph: !wikimedium.realtime.Standard.nd10M_dps_addDocuments.png! vs. trunk: !wikimedium.trunk.Standard.nd10M_dps_addDocuments.png! This graph shows how DWPT is flushing to disk over time: !wikimedium.realtime.Standard.nd10M_dps_addDocuments_flush.png! For updateDocument I built a 10M-doc wiki index and indexed the exact same documents with updateDocument; here are the results: Realtime Branch: !wikimedium.realtime.Standard.nd10M_dps.png! trunk: !wikimedium.trunk.Standard.nd10M_dps.png! Indexing performance tests with realtime branch --- Key: LUCENE-2571 URL: https://issues.apache.org/jira/browse/LUCENE-2571 Project: Lucene - Java Issue Type: Task Components: Index Reporter: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: wikimedium.realtime.Standard.nd10M_dps.png, wikimedium.realtime.Standard.nd10M_dps_addDocuments.png, wikimedium.realtime.Standard.nd10M_dps_addDocuments_flush.png, wikimedium.trunk.Standard.nd10M_dps.png, wikimedium.trunk.Standard.nd10M_dps_addDocuments.png We should run indexing performance tests with the DWPT changes and compare to trunk. We need to test both single-threaded and multi-threaded performance. NOTE: flush by RAM isn't implemented just yet, so either we wait with the tests or flush by doc count. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2571) Indexing performance tests with realtime branch
[ https://issues.apache.org/jira/browse/LUCENE-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2571: Attachment: wikimedium.trunk.Standard.nd10M_dps.png wikimedium.realtime.Standard.nd10M_dps.png updated attachements Indexing performance tests with realtime branch --- Key: LUCENE-2571 URL: https://issues.apache.org/jira/browse/LUCENE-2571 Project: Lucene - Java Issue Type: Task Components: Index Reporter: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: wikimedium.realtime.Standard.nd10M_dps.png, wikimedium.realtime.Standard.nd10M_dps_addDocuments.png, wikimedium.realtime.Standard.nd10M_dps_addDocuments_flush.png, wikimedium.trunk.Standard.nd10M_dps.png, wikimedium.trunk.Standard.nd10M_dps_addDocuments.png We should run indexing performance tests with the DWPT changes and compare to trunk. We need to test both single-threaded and multi-threaded performance. NOTE: flush by RAM isn't implemented just yet, so either we wait with the tests or flush by doc count. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 7105 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/7105/ No tests ran. Build Log (for compile errors): [...truncated 47 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[HUDSON] Lucene-Solr-tests-only-3.x - Build # 7097 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7097/ No tests ran. Build Log (for compile errors): [...truncated 54 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[HUDSON] Lucene-Solr-tests-only-realtime_search-branch - Build # 3 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-realtime_search-branch/3/ No tests ran. Build Log (for compile errors): [...truncated 53 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2468) TestFunctionQuery fails always on windows
TestFunctionQuery fails always on windows - Key: SOLR-2468 URL: https://issues.apache.org/jira/browse/SOLR-2468 Project: Solr Issue Type: Bug Reporter: Robert Muir NOTE: reproduce with: ant test -Dtestcase=TestFunctionQuery -Dtestmethod=testExternalFieldValueSourceParser -Dtests.seed=1172323467847461017:3327452514993896990 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2468) TestFunctionQuery fails always on windows
[ https://issues.apache.org/jira/browse/SOLR-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019906#comment-13019906 ] Robert Muir commented on SOLR-2468: ---

{noformat}
[junit] - Standard Error -
[junit] NOTE: reproduce with: ant test -Dtestcase=TestFunctionQuery -Dtestmethod=testExternalFieldValueSourceParser -Dtests.seed=1172323467847461017:3327452514993896990
[junit] NOTE: test params are: codec=PreFlex, locale=hr, timezone=America/Argentina/La_Rioja
[junit] NOTE: all tests run in this JVM:
[junit] [TestFunctionQuery]
[junit] NOTE: Windows Vista 6.0 x86/Sun Microsystems Inc. 1.6.0_23 (32-bit)/cpus=4,threads=1,free=10225608,total=16252928
[junit] - ---
[junit] Testcase: testExternalFieldValueSourceParser(org.apache.solr.search.function.TestFunctionQuery): Caused an ERROR
[junit] java.io.FileNotFoundException: C:\Users\rmuir\workspace\lucene-trunk\solr\build\test-results\temp\1\solrtest-TestFunctionQuery-1302799686658\external_CoMpleX "fieldName" _extf.1302799686550 (The filename, directory name, or volume label syntax is incorrect)
[junit] java.lang.RuntimeException: java.io.FileNotFoundException: C:\Users\rmuir\workspace\lucene-trunk\solr\build\test-results\temp\1\solrtest-TestFunctionQuery-1302799686658\external_CoMpleX "fieldName" _extf.1302799686550 (The filename, directory name, or volume label syntax is incorrect)
[junit] at org.apache.solr.search.function.TestFunctionQuery.makeExternalFile(TestFunctionQuery.java:56)
[junit] at org.apache.solr.search.function.TestFunctionQuery.testExternalFieldValueSourceParser(TestFunctionQuery.java:536)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)
[junit] Caused by: java.io.FileNotFoundException: C:\Users\rmuir\workspace\lucene-trunk\solr\build\test-results\temp\1\solrtest-TestFunctionQuery-1302799686658\external_CoMpleX "fieldName" _extf.1302799686550 (The filename, directory name, or volume label syntax is incorrect)
[junit] at java.io.FileOutputStream.open(Native Method)
[junit] at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
[junit] at java.io.FileOutputStream.<init>(FileOutputStream.java:70)
[junit] at org.apache.solr.search.function.TestFunctionQuery.makeExternalFile(TestFunctionQuery.java:52)
[junit] {noformat}

TestFunctionQuery fails always on windows
-
Key: SOLR-2468
URL: https://issues.apache.org/jira/browse/SOLR-2468
Project: Solr
Issue Type: Bug
Reporter: Robert Muir

NOTE: reproduce with: ant test -Dtestcase=TestFunctionQuery -Dtestmethod=testExternalFieldValueSourceParser -Dtests.seed=1172323467847461017:3327452514993896990

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2468) TestFunctionQuery fails always on windows
[ https://issues.apache.org/jira/browse/SOLR-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019913#comment-13019913 ] Robert Muir commented on SOLR-2468: --- Just at a glance, it appears the test tries to create a file with a double quote in it. On some platforms, such as Windows, you cannot use certain characters in a filename... I think this is the problem? TestFunctionQuery fails always on windows - Key: SOLR-2468 URL: https://issues.apache.org/jira/browse/SOLR-2468 Project: Solr Issue Type: Bug Reporter: Robert Muir NOTE: reproduce with: ant test -Dtestcase=TestFunctionQuery -Dtestmethod=testExternalFieldValueSourceParser -Dtests.seed=1172323467847461017:3327452514993896990 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
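The character set Windows rejects in file names is fixed (backslash, slash, colon, asterisk, question mark, double quote, angle brackets, pipe), which is why the generated name above fails only there. A sanitizer along these lines would make such a test portable; this helper is a sketch for illustration, not code from the Solr test in question:

```java
// Hypothetical helper: replace the characters Windows forbids in file names
// (\ / : * ? " < > |) with underscores. Note Windows additionally reserves
// device names like CON and NUL, which this sketch does not handle.
public class FileNameSanitizer {
    static String sanitize(String name) {
        return name.replaceAll("[\\\\/:*?\"<>|]", "_");
    }

    public static void main(String[] args) {
        // The double quotes that broke the test are replaced.
        System.out.println(sanitize("external_CoMpleX \"fieldName\"_extf"));
    }
}
```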
[HUDSON] Lucene-Solr-tests-only-3.x - Build # 7098 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7098/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[HUDSON] Lucene-Solr-tests-only-3.x - Build # 7099 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7099/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3016) Analyzer for Latvian
[ https://issues.apache.org/jira/browse/LUCENE-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3016. - Resolution: Fixed Committed revision 1092396, 1092398 (branch_3x) Analyzer for Latvian Key: LUCENE-3016 URL: https://issues.apache.org/jira/browse/LUCENE-3016 Project: Lucene - Java Issue Type: New Feature Reporter: Robert Muir Assignee: Robert Muir Fix For: 3.2, 4.0 Attachments: LUCENE-3016.patch Less aggressive form of Kreslins' phd thesis: A stemming algorithm for Latvian. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Setting the max number of merge threads across IndexWriters
Can't remember. Probably no. I started an experimental MS api rewrite (incorporating ability to share MSs between IWs) some time ago, but never had the time to finish it. On Thu, Apr 14, 2011 at 19:56, Simon Willnauer simon.willna...@googlemail.com wrote: On Thu, Apr 14, 2011 at 5:52 PM, Earwin Burrfoot ear...@gmail.com wrote: I proposed to decouple MergeScheduler from IW (stop keeping a reference to it). Then you can create a single CMS and pass it to all your IWs. Yep that was it... is there an issue for this? simon On Thu, Apr 14, 2011 at 19:40, Jason Rutherglen jason.rutherg...@gmail.com wrote: I think the proposal involved using a ThreadPoolExecutor, which seemed to not quite work as well as what we have. I think it'll be easier to simply pass a global context that keeps a counter of the actively running threads, and pass that into each IW's CMS? On Thu, Apr 14, 2011 at 8:25 AM, Simon Willnauer simon.willna...@googlemail.com wrote: On Thu, Apr 14, 2011 at 5:20 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Today the ConcurrentMergeScheduler allows setting the max thread count and is bound to a single IndexWriter. However in the [common] case of multiple IndexWriters running in the same process, this disallows one from managing the aggregate number of merge threads executing at any given time. I think this can be fixed, shall I open an issue? go ahead! 
I think I have seen this suggestion somewhere; maybe you need to check whether there is an issue for it already. simon -- Kirill Zakharenko/Кирилл Захаренко E-Mail/Jabber: ear...@gmail.com Phone: +7 (495) 683-567-4 ICQ: 104465785
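Jason's suggestion of a global counter of actively running merge threads can be sketched with plain JDK concurrency primitives. Everything below, class and method names included, is a hypothetical illustration of the idea, not Lucene API:

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch: a process-wide cap on concurrently running merge
// threads, shared by several IndexWriters' merge schedulers. A Semaphore
// plays the role of the shared counter.
class GlobalMergeThrottle {
    private final Semaphore slots;

    GlobalMergeThrottle(int maxMergeThreads) {
        this.slots = new Semaphore(maxMergeThreads);
    }

    // A scheduler would call this before spawning a merge thread; if no
    // slot is free, the merge is deferred or run in the calling thread.
    boolean tryStartMerge() {
        return slots.tryAcquire();
    }

    // Called from the merge thread's finally block when the merge ends.
    void finishMerge() {
        slots.release();
    }
}
```

Each IndexWriter's merge scheduler would consult the same throttle instance before launching a merge thread, giving an aggregate cap across writers without coupling MergeScheduler to a single IW.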
[jira] [Commented] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.
[ https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019932#comment-13019932 ] Robert Muir commented on SOLR-2378: --- Just an idea: should we default to this implementation in trunk? It seems to be a significant reduction in RAM. FST-based Lookup (suggestions) for prefix matches. -- Key: SOLR-2378 URL: https://issues.apache.org/jira/browse/SOLR-2378 Project: Solr Issue Type: New Feature Components: spellchecker Reporter: Dawid Weiss Assignee: Dawid Weiss Labels: lookup, prefix Fix For: 4.0 Attachments: SOLR-2378.patch Implement a subclass of Lookup based on finite state automata/ transducers (Lucene FST package). This issue is for implementing a relatively basic prefix matcher, we will handle infixes and other types of input matches gradually. Impl. phases: - -write a DFA based suggester effectively identical to ternary tree based solution right now,- - -baseline benchmark against tern. tree (memory consumption, rebuilding speed, indexing speed; reuse Andrzej's benchmark code)- - -modify DFA to encode term weights directly in the automaton (optimize for onlyMostPopular case)- - -benchmark again- - -benchmark again- - -modify the tutorial on the wiki- [http://wiki.apache.org/solr/Suggester]
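For intuition on where the RAM reduction comes from: a ternary tree spends several object references per node, while an FST packs the whole dictionary into one compact, pointer-free structure. The sketch below is not Lucene's FST (which also shares suffixes and can encode weights); it is an illustrative stand-in showing prefix lookup over compact sorted data, with all names invented here:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Illustrative only: prefix lookup over a plain sorted array. Like an FST,
// this layout has no per-node objects; unlike an FST it does not share
// prefixes or suffixes, so the real structure is far more compact still.
class SortedPrefixLookup {
    private final String[] terms; // kept sorted for binary search

    SortedPrefixLookup(Collection<String> input) {
        terms = input.toArray(new String[0]);
        Arrays.sort(terms);
    }

    List<String> lookup(String prefix, int max) {
        // Find the insertion point of the prefix, then scan forward while
        // terms still start with it.
        int i = Arrays.binarySearch(terms, prefix);
        if (i < 0) i = -i - 1;
        List<String> out = new ArrayList<>();
        while (i < terms.length && out.size() < max && terms[i].startsWith(prefix)) {
            out.add(terms[i++]);
        }
        return out;
    }
}
```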
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 7107 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/7107/ No tests ran. Build Log (for compile errors): [...truncated 31 lines...]
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 7108 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/7108/ No tests ran. Build Log (for compile errors): [...truncated 7478 lines...]
[jira] [Commented] (SOLR-2193) Re-architect Update Handler
[ https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019953#comment-13019953 ] Jayson Minard commented on SOLR-2193: - Some of this was already solved in: https://issues.apache.org/jira/browse/SOLR-1155 (locking and re-opening index writer were fixed) Re-architect Update Handler --- Key: SOLR-2193 URL: https://issues.apache.org/jira/browse/SOLR-2193 Project: Solr Issue Type: Improvement Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.0 Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch The update handler needs an overhaul. A few goals I think we might want to look at: 1. Cleanup - drop DirectUpdateHandler(2) line - move to something like UpdateHandler, DefaultUpdateHandler 2. Expose the SolrIndexWriter in the api or add the proper abstractions to get done what we now do with special casing: if (directupdatehandler2) success else failish 3. Stop closing the IndexWriter and start using commit (still lazy IW init though). 4. Drop iwAccess, iwCommit locks and sync mostly at the Lucene level. 5. Keep NRT support in mind. 6. Keep microsharding in mind (maintain logical index as multiple physical indexes) 7. Address the current issues we face because multiple original/'reloaded' cores can have a different IndexWriter on the same index.
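Goal 3 (keep the IndexWriter open and publish changes with commit, while still initializing it lazily) can be sketched with a stand-in writer. FakeWriter below is only a placeholder for Lucene's IndexWriter, and every class and method name here is hypothetical, not Solr's actual update-handler API:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: one lazily created, long-lived writer per core;
// commit() makes pending changes durable without tearing the writer down,
// so adds never wait on writer re-creation.
class LongLivedWriterHandler {
    // Stand-in for Lucene's IndexWriter; counts documents instead of indexing.
    static class FakeWriter {
        final AtomicLong pending = new AtomicLong();
        final AtomicLong committed = new AtomicLong();
        void addDocument() { pending.incrementAndGet(); }
        void commit() { committed.addAndGet(pending.getAndSet(0)); }
    }

    private volatile FakeWriter writer; // lazy init; never closed per-commit

    private FakeWriter writer() {
        FakeWriter w = writer;
        if (w == null) {
            synchronized (this) { // double-checked lazy initialization
                if (writer == null) writer = new FakeWriter();
                w = writer;
            }
        }
        return w;
    }

    void add() { writer().addDocument(); }
    void commit() { writer().commit(); } // no close(), no re-open
    long committedDocs() { return writer().committed.get(); }
}
```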
[jira] [Commented] (SOLR-1155) Change DirectUpdateHandler2 to allow concurrent adds during an autocommit
[ https://issues.apache.org/jira/browse/SOLR-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019954#comment-13019954 ] Jayson Minard commented on SOLR-1155: - Thanks Yonik, I'll take a look at this to see if there was anything I learned that applies. SOLR-1155 has been used under heavy production load and is very stable against 1.4, so maybe Mark will take a peek; I posted a note on the other issue as well. Change DirectUpdateHandler2 to allow concurrent adds during an autocommit - Key: SOLR-1155 URL: https://issues.apache.org/jira/browse/SOLR-1155 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.3, 1.4 Reporter: Jayson Minard Fix For: Next Attachments: SOLR-1155-release1.4-rev834789.patch, SOLR-1155-trunk-rev834706.patch, Solr-1155.patch, Solr-1155.patch Currently DirectUpdateHandler2 will block adds during a commit, and it seems to be possible with recent changes to Lucene to allow them to run concurrently. See: http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--td23435224.html
[jira] [Commented] (SOLR-1155) Change DirectUpdateHandler2 to allow concurrent adds during an autocommit
[ https://issues.apache.org/jira/browse/SOLR-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019955#comment-13019955 ] Jayson Minard commented on SOLR-1155: - I'll look at updating this for 3.1 for those that need it on that release, and Mark's looks good for 4.x and beyond. Change DirectUpdateHandler2 to allow concurrent adds during an autocommit - Key: SOLR-1155 URL: https://issues.apache.org/jira/browse/SOLR-1155 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.3, 1.4 Reporter: Jayson Minard Fix For: Next Attachments: SOLR-1155-release1.4-rev834789.patch, SOLR-1155-trunk-rev834706.patch, Solr-1155.patch, Solr-1155.patch Currently DirectUpdateHandler2 will block adds during a commit, and it seems to be possible with recent changes to Lucene to allow them to run concurrently. See: http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--td23435224.html
[jira] [Commented] (SOLR-2193) Re-architect Update Handler
[ https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019961#comment-13019961 ] Jayson Minard commented on SOLR-2193: - SOLR-1155 is probably an easier change for Solr 3.1 due to its ancestry, so to get the same benefits I'll work to update it for that version, assuming this patch of yours is for 4.x onwards. Re-architect Update Handler --- Key: SOLR-2193 URL: https://issues.apache.org/jira/browse/SOLR-2193 Project: Solr Issue Type: Improvement Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.0 Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch The update handler needs an overhaul. A few goals I think we might want to look at: 1. Cleanup - drop DirectUpdateHandler(2) line - move to something like UpdateHandler, DefaultUpdateHandler 2. Expose the SolrIndexWriter in the api or add the proper abstractions to get done what we now do with special casing: if (directupdatehandler2) success else failish 3. Stop closing the IndexWriter and start using commit (still lazy IW init though). 4. Drop iwAccess, iwCommit locks and sync mostly at the Lucene level. 5. Keep NRT support in mind. 6. Keep microsharding in mind (maintain logical index as multiple physical indexes) 7. Address the current issues we face because multiple original/'reloaded' cores can have a different IndexWriter on the same index.
[jira] [Commented] (SOLR-2193) Re-architect Update Handler
[ https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019983#comment-13019983 ] Mark Miller commented on SOLR-2193: --- Yes - my plan for this was 4.x. Re-architect Update Handler --- Key: SOLR-2193 URL: https://issues.apache.org/jira/browse/SOLR-2193 Project: Solr Issue Type: Improvement Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.0 Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch The update handler needs an overhaul. A few goals I think we might want to look at: 1. Cleanup - drop DirectUpdateHandler(2) line - move to something like UpdateHandler, DefaultUpdateHandler 2. Expose the SolrIndexWriter in the api or add the proper abstractions to get done what we now do with special casing: if (directupdatehandler2) success else failish 3. Stop closing the IndexWriter and start using commit (still lazy IW init though). 4. Drop iwAccess, iwCommit locks and sync mostly at the Lucene level. 5. Keep NRT support in mind. 6. Keep microsharding in mind (maintain logical index as multiple physical indexes) 7. Address the current issues we face because multiple original/'reloaded' cores can have a different IndexWriter on the same index.
[jira] [Updated] (SOLR-2468) TestFunctionQuery fails always on windows
[ https://issues.apache.org/jira/browse/SOLR-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-2468: --- Attachment: SOLR-2468.patch This is a test I recently added for SOLR-2335; didn't realize some OSes would complain about quotes in filenames. I pulled the test apart to test the two different aspects independently, so now the esoteric file name testing just relies on being able to support spaces in filenames. TestFunctionQuery fails always on windows - Key: SOLR-2468 URL: https://issues.apache.org/jira/browse/SOLR-2468 Project: Solr Issue Type: Bug Reporter: Robert Muir Attachments: SOLR-2468.patch NOTE: reproduce with: ant test -Dtestcase=TestFunctionQuery -Dtestmethod=testExternalFieldValueSourceParser -Dtests.seed=1172323467847461017:3327452514993896990
Re: Lucene Merge failing on Open Files
On Wed, Apr 6, 2011 at 8:44 PM, Grant Ingersoll gsing...@apache.org wrote: Begin forwarded message: From: Michael McCandless luc...@mikemccandless.com Date: April 5, 2011 5:46:13 AM EDT To: simon.willna...@gmail.com Cc: Simon Willnauer simon.willna...@googlemail.com, java-u...@lucene.apache.org, paul_t...@fastmail.fm Subject: Re: Lucene Merge failing on Open Files Reply-To: java-u...@lucene.apache.org Yeah, that mergeFactor is way too high and will cause too-many-open-files (if the index has enough segments). This is one of the things that has always bothered me about Merge Factor. We state what the lower bound is, but we don't doc the upper bound. Should we even allow higher values? Of course, how does one pick the cutoff? I've seen up to about 100 be effective. But 3000 is a bit high (although, who knows what the future will hold) grant, we can at least add some documentation no? simon
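A back-of-the-envelope calculation shows why a very large mergeFactor collides with OS file-descriptor limits. The constants below (files per non-compound segment, segment counts per level) are illustrative assumptions for the sketch, not measured Lucene behavior:

```java
// Rough estimate (not Lucene API): with a log-merge policy each "level" can
// accumulate up to mergeFactor segments, a non-compound segment is roughly
// 8-12 files, and a running merge additionally holds all its inputs open.
class OpenFilesEstimate {
    static int worstCaseOpenFiles(int mergeFactor, int levels, int filesPerSegment) {
        int segments = mergeFactor * levels;             // segments that may coexist
        int duringMerge = mergeFactor * filesPerSegment; // inputs of one merge
        return segments * filesPerSegment + duringMerge;
    }

    public static void main(String[] args) {
        // mergeFactor=10 stays in the hundreds; mergeFactor=3000 lands in the
        // tens of thousands, far past a typical ulimit of 1024-4096.
        System.out.println(worstCaseOpenFiles(10, 3, 10));
        System.out.println(worstCaseOpenFiles(3000, 1, 10));
    }
}
```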
[jira] [Commented] (SOLR-2468) TestFunctionQuery fails always on windows
[ https://issues.apache.org/jira/browse/SOLR-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019993#comment-13019993 ] Robert Muir commented on SOLR-2468: --- +1 TestFunctionQuery fails always on windows - Key: SOLR-2468 URL: https://issues.apache.org/jira/browse/SOLR-2468 Project: Solr Issue Type: Bug Reporter: Robert Muir Attachments: SOLR-2468.patch NOTE: reproduce with: ant test -Dtestcase=TestFunctionQuery -Dtestmethod=testExternalFieldValueSourceParser -Dtests.seed=1172323467847461017:3327452514993896990
[jira] [Commented] (LUCENE-3028) IW.getReader() returns inconsistent reader on RT Branch
[ https://issues.apache.org/jira/browse/LUCENE-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019996#comment-13019996 ] selckin commented on LUCENE-3028: - Seems to fail once every 6-8 runs quite consistently (at least I think this is the issue) branches/realtime_search r1092329 {{ [junit] Testsuite: org.apache.lucene.index.TestRollingUpdates [junit] Testcase: testUpdateSameDoc(org.apache.lucene.index.TestRollingUpdates):Caused an ERROR [junit] MockDirectoryWrapper: cannot close: there are still open files: {_ho.fdt=1, _ho.prx=1, _ho.fdx=1, _ho.nrm=1, _j0.fdt=1, _ho.tis=1, _j0.fdx=1, _j0.tis=1, _j0.prx=1, _ho.frq=1, _ho.tvx=1, _ho.tvd=1, _j0.nrm=1, _ho.tvf=1, _j0.frq=1, _j0.tvf=1, _j0.tvd=1, _j0.tvx=1} [junit] java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are still open files: {_ho.fdt=1, _ho.prx=1, _ho.fdx=1, _ho.nrm=1, _j0.fdt=1, _ho.tis=1, _j0.fdx=1, _j0.tis=1, _j0.prx=1, _ho.frq=1, _ho.tvx=1, _ho.tvd=1, _j0.nrm=1, _ho.tvf=1, _j0.frq=1, _j0.tvf=1, _j0.tvd=1, _j0.tvx=1} [junit] at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:414) [junit] at org.apache.lucene.index.TestRollingUpdates.testUpdateSameDoc(TestRollingUpdates.java:104) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1226) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1154) [junit] Caused by: java.lang.RuntimeException: unclosed IndexInput [junit] at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:369) [junit] at org.apache.lucene.store.Directory.openInput(Directory.java:122) [junit] at org.apache.lucene.index.TermVectorsReader.init(TermVectorsReader.java:86) [junit] at org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:236) [junit] at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:495) [junit] at
org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:629) [junit] at org.apache.lucene.index.IndexWriter$ReaderPool.getReadOnlyClone(IndexWriter.java:587) [junit] at org.apache.lucene.index.DirectoryReader.init(DirectoryReader.java:172) [junit] at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:377) [junit] at org.apache.lucene.index.DirectoryReader.doReopenFromWriter(DirectoryReader.java:419) [junit] at org.apache.lucene.index.DirectoryReader.doReopen(DirectoryReader.java:432) [junit] at org.apache.lucene.index.DirectoryReader.reopen(DirectoryReader.java:392) [junit] at org.apache.lucene.index.TestRollingUpdates$IndexingThread.run(TestRollingUpdates.java:129) [junit] [junit] [junit] Testcase: testUpdateSameDoc(org.apache.lucene.index.TestRollingUpdates):FAILED [junit] Some threads threw uncaught exceptions! [junit] junit.framework.AssertionFailedError: Some threads threw uncaught exceptions! [junit] at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:521) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1226) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1154) [junit] [junit] [junit] Tests run: 2, Failures: 1, Errors: 1, Time elapsed: 6.649 sec [junit] [junit] - Standard Error - [junit] NOTE: reproduce with: ant test -Dtestcase=TestRollingUpdates -Dtestmethod=testUpdateSameDoc -Dtests.seed=-4094951767438954769:-1203905293622856057 [junit] NOTE: reproduce with: ant test -Dtestcase=TestRollingUpdates -Dtestmethod=testUpdateSameDoc -Dtests.seed=-4094951767438954769:-1203905293622856057 [junit] The following exceptions were thrown by threads: [junit] *** Thread: Thread-103 *** [junit] java.lang.AssertionError: expected: org.apache.lucene.index.DocumentsWriterDeleteQueue@18635827but was: org.apache.lucene.index.DocumentsWriterDeleteQueue@223074f3 false [junit] at 
org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:359) [junit] at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:346) [junit] at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1367) [junit] at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1339) [junit] at org.apache.lucene.index.TestRollingUpdates$IndexingThread.run(TestRollingUpdates.java:125) [junit] *** Thread: Thread-106 *** [junit] java.lang.AssertionError: expected: org.apache.lucene.index.DocumentsWriterDeleteQueue@18635827but was: org.apache.lucene.index.DocumentsWriterDeleteQueue@223074f3 false [junit]
[jira] [Resolved] (SOLR-2468) TestFunctionQuery fails always on windows
[ https://issues.apache.org/jira/browse/SOLR-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-2468. Resolution: Fixed Assignee: Hoss Man Committed revision 1092451. TestFunctionQuery fails always on windows - Key: SOLR-2468 URL: https://issues.apache.org/jira/browse/SOLR-2468 Project: Solr Issue Type: Bug Reporter: Robert Muir Assignee: Hoss Man Attachments: SOLR-2468.patch NOTE: reproduce with: ant test -Dtestcase=TestFunctionQuery -Dtestmethod=testExternalFieldValueSourceParser -Dtests.seed=1172323467847461017:3327452514993896990
[jira] [Commented] (LUCENE-3028) IW.getReader() returns inconsistent reader on RT Branch
[ https://issues.apache.org/jira/browse/LUCENE-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020002#comment-13020002 ] Simon Willnauer commented on LUCENE-3028: - hmm I can't reproduce it even after 1k runs :( IW.getReader() returns inconsistent reader on RT Branch --- Key: LUCENE-3028 URL: https://issues.apache.org/jira/browse/LUCENE-3028 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: Realtime Branch Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: Realtime Branch Attachments: LUCENE-3028.patch, LUCENE-3028.patch I extended the testcase TestRollingUpdates#testUpdateSameDoc to pull an NRT reader after each update and asserted that it always sees only one document. Yet, this fails with the current branch since there is a problem in how we flush in the getReader() case. What happens here is that we flush all threads and then release the lock (letting other flushes which came in after we entered the flushAllThread context, continue) so that we could concurrently get a new segment that transports global deletes without the corresponding add. They sneak in while we continue to open the NRT reader which in turn sees inconsistent results. I will upload a patch soon
[jira] [Updated] (LUCENE-3028) IW.getReader() returns inconsistent reader on RT Branch
[ https://issues.apache.org/jira/browse/LUCENE-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] selckin updated LUCENE-3028: Attachment: realtime-1.txt IW.getReader() returns inconsistent reader on RT Branch --- Key: LUCENE-3028 URL: https://issues.apache.org/jira/browse/LUCENE-3028 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: Realtime Branch Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: Realtime Branch Attachments: LUCENE-3028.patch, LUCENE-3028.patch, realtime-1.txt I extended the testcase TestRollingUpdates#testUpdateSameDoc to pull an NRT reader after each update and asserted that it always sees only one document. Yet, this fails with the current branch since there is a problem in how we flush in the getReader() case. What happens here is that we flush all threads and then release the lock (letting other flushes which came in after we entered the flushAllThread context, continue) so that we could concurrently get a new segment that transports global deletes without the corresponding add. They sneak in while we continue to open the NRT reader which in turn sees inconsistent results. I will upload a patch soon
[jira] [Commented] (LUCENE-3028) IW.getReader() returns inconsistent reader on RT Branch
[ https://issues.apache.org/jira/browse/LUCENE-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13020005#comment-13020005 ] Simon Willnauer commented on LUCENE-3028: - I just committed a fix for this - seems like the assert which resets the current flushing queue was at the wrong position. IW.getReader() returns inconsistent reader on RT Branch --- Key: LUCENE-3028 URL: https://issues.apache.org/jira/browse/LUCENE-3028 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: Realtime Branch Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: Realtime Branch Attachments: LUCENE-3028.patch, LUCENE-3028.patch, realtime-1.txt I extended the testcase TestRollingUpdates#testUpdateSameDoc to pull an NRT reader after each update and asserted that it always sees only one document. Yet, this fails with the current branch since there is a problem in how we flush in the getReader() case. What happens here is that we flush all threads and then release the lock (letting other flushes which came in after we entered the flushAllThread context, continue) so that we could concurrently get a new segment that transports global deletes without the corresponding add. They sneak in while we continue to open the NRT reader which in turn sees inconsistent results. I will upload a patch soon