Possible test framework improvement

2012-06-12 Thread Sami Siren
Hi,

While looking at some of the test failures, it occurred to me that it
would be great to have a tiny addition to the junit output for
successful tests. Currently, if a test succeeds, it only prints
something like this:

   [junit4] Suite: org.apache.solr.analysis.TestKeepFilterFactory
   [junit4] Completed on J0 in 0.22s, 1 test

If that output also included a timestamp for when the test started, it
would in some cases be helpful for seeing which other tests were running
at the same time. This information is implicitly available, since tests
log when they finish, but the proposed change would make it more obvious
and easier to spot.
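A rough sketch of what such a line could look like (the format and the helper below are hypothetical, not the actual test runner code):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical sketch: a suite-completion line that also carries the
// suite's start time, so overlapping suites are easy to spot by eye.
public class SuiteLine {
    static String format(String suite, long startMillis, double seconds, int tests) {
        String start = new SimpleDateFormat("HH:mm:ss.SSS").format(new Date(startMillis));
        return String.format(Locale.ROOT,
                "Suite: %s%n  Started %s, completed in %.2fs, %d test%s",
                suite, start, seconds, tests, tests == 1 ? "" : "s");
    }

    public static void main(String[] args) {
        System.out.println(format("org.apache.solr.analysis.TestKeepFilterFactory",
                System.currentTimeMillis() - 220, 0.22, 1));
    }
}
```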

--
 Sami Siren

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3540) MultiCoreExampleTest and MultiCoreEmbedded test clash with each other

2012-06-12 Thread Sami Siren (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sami Siren updated SOLR-3540:
-

Attachment: SOLR-3540.patch

This patch should fix the issue.

> MultiCoreExampleTest and MultiCoreEmbedded test clash with each other
> -
>
> Key: SOLR-3540
> URL: https://issues.apache.org/jira/browse/SOLR-3540
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Sami Siren
>Assignee: Sami Siren
>Priority: Minor
> Attachments: SOLR-3540.patch
>
>
> When those two tests are run at the same time, one of them fails with an 
> error like this: 
> {code}
> java.lang.AssertionError
> at __randomizedtesting.SeedInfo.seed([B44AE18D746BCD54:3062FA7EBBB8C061]:0)
> at org.apache.solr.update.TransactionLog.(TransactionLog.java:163)
> at org.apache.solr.update.TransactionLog.(TransactionLog.java:133)
> at org.apache.solr.update.UpdateLog.ensureLog(UpdateLog.java:636)
> {code}
> This is reproducible with:
> {code}
> ant -Dtests.jvms=14 test  
> "-Dtests.class=org.apache.solr.client.solrj.embedded.*"
> {code}
> Looks like this is because they share the directory 
> example/multicore/core0/data/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira






[jira] [Created] (SOLR-3540) MultiCoreExampleTest and MultiCoreEmbedded test clash with each other

2012-06-12 Thread Sami Siren (JIRA)
Sami Siren created SOLR-3540:


 Summary: MultiCoreExampleTest and MultiCoreEmbedded test clash 
with each other
 Key: SOLR-3540
 URL: https://issues.apache.org/jira/browse/SOLR-3540
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Sami Siren
Assignee: Sami Siren
Priority: Minor


When those two tests are run at the same time, one of them fails with an 
error like this: 

{code}
java.lang.AssertionError
at __randomizedtesting.SeedInfo.seed([B44AE18D746BCD54:3062FA7EBBB8C061]:0)
at org.apache.solr.update.TransactionLog.(TransactionLog.java:163)
at org.apache.solr.update.TransactionLog.(TransactionLog.java:133)
at org.apache.solr.update.UpdateLog.ensureLog(UpdateLog.java:636)
{code}

This is reproducible with:
{code}
ant -Dtests.jvms=14 test  
"-Dtests.class=org.apache.solr.client.solrj.embedded.*"
{code}

Looks like this is because they share the directory 
example/multicore/core0/data/
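One way to avoid this kind of clash, sketched here with a hypothetical helper rather than the actual patch, is to give every test run its own unique data directory:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: Files.createTempDirectory guarantees a unique path
// per call, so two suites running concurrently can never share a data dir.
public class UniqueDataDir {
    static Path newDataDir(String testName) throws IOException {
        // Unique suffix is chosen by the JVM, even within the same millisecond.
        return Files.createTempDirectory(testName + "-data-");
    }

    public static void main(String[] args) throws IOException {
        Path a = newDataDir("MultiCoreExampleTest");
        Path b = newDataDir("MultiCoreEmbeddedTest");
        System.out.println(a);
        System.out.println(b);
    }
}
```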




[jira] [Commented] (SOLR-3534) dismax and edismax should default to "df" when "qf" is absent.

2012-06-12 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294121#comment-13294121
 ] 

David Smiley commented on SOLR-3534:


Jack: I'll improve the exception wording as you suggest.

bq. Any idea what happens for the classic/Solr or flex query parsers if default 
search field is not present?

The lucene query parser (which is the default) doesn't technically require a 
default field, but if there is ambiguity in the query (e.g. a bare search 
word) then you get an exception.

> dismax and edismax should default to "df" when "qf" is absent.
> --
>
> Key: SOLR-3534
> URL: https://issues.apache.org/jira/browse/SOLR-3534
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Affects Versions: 4.0
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Attachments: 
> SOLR-3534_dismax_and_edismax_should_default_to_df_if_qf_is_absent.patch
>
>
> The dismax and edismax query parsers should default to "df" when the "qf" 
> parameter is absent.  They only use the defaultSearchField in schema.xml as a 
> fallback now.




[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2012-06-12 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294113#comment-13294113
 ] 

Lance Norskog commented on LUCENE-2899:
---

bq. This really should just be a part of the analysis modules (with the 
exception of the Solr example parts). I don't know exactly how we are handling 
Solr examples anymore, but I seem to recall the general consensus was to not 
proliferate them. Can we just expose the functionality in the main one?
A lot of Solr/Lucene features are only demoed in solrconfig/schema unit test 
files (DIH, for example). That is fine.
bq. The models are indeed tricky and I wonder how we can properly hook them 
into the tests, if at all.
D'oh! Forgot about that. If we have tagged data in the project, it helps show 
the other parts of an NLP suite. It's hard to get a full picture of the jigsaw 
puzzle if you don't know NLP software.



> Add OpenNLP Analysis capabilities as a module
> -
>
> Key: LUCENE-2899
> URL: https://issues.apache.org/jira/browse/LUCENE-2899
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2899.patch, opennlp_trunk.patch
>
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice 
> to have a submodule (under analysis) that exposed capabilities for it. Drew 
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under:
> modules/analysis/opennlp




[JENKINS] Lucene-Solr-trunk-Windows-Java6-64 - Build # 518 - Failure!

2012-06-12 Thread jenkins
Build: 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/518/

No tests ran.

Build Log:
[...truncated 4742 lines...]





Re: [JENKINS] Lucene-Solr-trunk-Linux-Java6-64 - Build # 880 - Failure!

2012-06-12 Thread Robert Muir
The assert in SimpleText is wrong; it should be >= lastStartOffset. I
committed a fix.
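The invariant can be sketched like this (illustrative code, not the actual SimpleText assert): a token's start offset must never move backwards, but it may equal the previous token's start offset when tokens are stacked at the same position, which is why the check is against lastStartOffset rather than lastEndOffset.

```java
// Illustrative sketch of the offset invariant described above.
public class OffsetCheck {
    static boolean offsetsOk(int[] startOffsets) {
        int last = -1;
        for (int s : startOffsets) {
            if (s < last) return false;  // offset went backwards: broken stream
            last = s;                    // equal start offsets are allowed
        }
        return true;
    }

    public static void main(String[] args) {
        // Stacked tokens sharing a start offset are fine...
        System.out.println(offsetsOk(new int[] {0, 0, 5}));  // prints true
        // ...but a start offset moving backwards is not.
        System.out.println(offsetsOk(new int[] {0, 5, 3}));  // prints false
    }
}
```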

On Tue, Jun 12, 2012 at 10:47 PM,   wrote:
> Build: 
> http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux-Java6-64/880/
>
> 1 tests failed.
> REGRESSION:  
> org.apache.lucene.analysis.TestGraphTokenizers.testDoubleMockGraphTokenFilterRandom
>
> Error Message:
> startOffset=3562 lastEndOffset=3574
>
> [stack trace identical to the original build report below]

Re: [JENKINS] Lucene-Solr-trunk-Linux-Java6-64 - Build # 880 - Failure!

2012-06-12 Thread Robert Muir
I'm looking into this.

On Tue, Jun 12, 2012 at 10:47 PM,   wrote:
> Build: 
> http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux-Java6-64/880/
>
> 1 tests failed.
> REGRESSION:  
> org.apache.lucene.analysis.TestGraphTokenizers.testDoubleMockGraphTokenFilterRandom
>
> Error Message:
> startOffset=3562 lastEndOffset=3574
>
> [stack trace identical to the original build report below]

[jira] [Updated] (LUCENE-4143) add backwards checkindex crosscheck

2012-06-12 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4143:


Attachment: LUCENE-4143.patch

> add backwards checkindex crosscheck
> ---
>
> Key: LUCENE-4143
> URL: https://issues.apache.org/jira/browse/LUCENE-4143
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/index
>Reporter: Robert Muir
> Attachments: LUCENE-4143.patch
>
>
> This is super slow, but ensures they are actually equal (as the existing 
> cross-check just checks that vectors are a subset of postings).
> I added a hack so that we only use it in MockDirectoryWrapper when the 
> delegate is a RAMDir and < 1MB in size, so it doesn't hurt test times.




[jira] [Created] (LUCENE-4143) add backwards checkindex crosscheck

2012-06-12 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-4143:
---

 Summary: add backwards checkindex crosscheck
 Key: LUCENE-4143
 URL: https://issues.apache.org/jira/browse/LUCENE-4143
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Robert Muir


This is super slow, but ensures they are actually equal (as the existing 
cross-check just checks that vectors are a subset of postings).

I added a hack so that we only use it in MockDirectoryWrapper when the delegate 
is a RAMDir and < 1MB in size, so it doesn't hurt test times.
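The gating described above could look roughly like this (names, structure, and threshold are taken from the description; the actual patch's code may differ):

```java
// Hypothetical sketch: run the slow full crosscheck only for small,
// in-memory directories so overall test times are unaffected.
public class CrossCheckGate {
    static final long MAX_BYTES = 1024 * 1024;  // the 1MB cap from the description

    static boolean shouldCrossCheck(boolean delegateIsRamDir, long sizeInBytes) {
        return delegateIsRamDir && sizeInBytes < MAX_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(shouldCrossCheck(true, 512 * 1024));        // small RAMDir: check
        System.out.println(shouldCrossCheck(true, 2L * 1024 * 1024));  // too big: skip
        System.out.println(shouldCrossCheck(false, 1024));             // on-disk dir: skip
    }
}
```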




[JENKINS] Lucene-Solr-trunk-Linux-Java6-64 - Build # 880 - Failure!

2012-06-12 Thread jenkins
Build: 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux-Java6-64/880/

1 tests failed.
REGRESSION:  
org.apache.lucene.analysis.TestGraphTokenizers.testDoubleMockGraphTokenFilterRandom

Error Message:
startOffset=3562 lastEndOffset=3574

Stack Trace:
java.lang.AssertionError: startOffset=3562 lastEndOffset=3574
at 
__randomizedtesting.SeedInfo.seed([C6631E3EBBD412A:8725212ECB3E5AFB]:0)
at 
org.apache.lucene.codecs.simpletext.SimpleTextFieldsWriter$SimpleTextPostingsWriter.addPosition(SimpleTextFieldsWriter.java:155)
at 
org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:531)
at 
org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
at 
org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:82)
at 
org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:480)
at 
org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
at 
org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:554)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2748)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2724)
at 
org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:904)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:863)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:827)
at 
org.apache.lucene.index.RandomIndexWriter.close(RandomIndexWriter.java:438)
at org.apache.lucene.util.IOUtils.close(IOUtils.java:143)
at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:472)
at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:378)
at 
org.apache.lucene.analysis.TestGraphTokenizers.testDoubleMockGraphTokenFilterRandom(TestGraphTokenizers.java:338)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at 
org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate

[jira] [Commented] (LUCENE-4142) AnalyzerWrapper doesn't work with CharFilters.

2012-06-12 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294082#comment-13294082
 ] 

Chris Male commented on LUCENE-4142:


+1

> AnalyzerWrapper doesn't work with CharFilters.
> --
>
> Key: LUCENE-4142
> URL: https://issues.apache.org/jira/browse/LUCENE-4142
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/analysis
>Affects Versions: 4.0
>Reporter: Robert Muir
> Attachments: LUCENE-4142.patch
>
>
> It doesn't override initReader (nor would it be able to, since it doesn't 
> have the fieldName). This gives unexpected behavior.




[JENKINS] Lucene-Solr-4.x-Windows-Java7-64 - Build # 52 - Still Failing!

2012-06-12 Thread jenkins
Build: 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows-Java7-64/52/

1 tests failed.
FAILED:  org.apache.solr.common.util.ContentStreamTest.testURLStream

Error Message:


Stack Trace:
java.lang.AssertionError
at 
__randomizedtesting.SeedInfo.seed([50828D73D92F3889:60069A5A2D9BEE0D]:0)
at org.junit.Assert.fail(Assert.java:92)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertTrue(Assert.java:54)
at 
org.apache.solr.common.util.ContentStreamTest.testURLStream(ContentStreamTest.java:109)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at 
org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
at 
org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)




Build Log:
[...truncated 11889 lines...]
   [junit4]  
   [junit4] Suite: org.apache.solr.client.solrj.SolrExampleBinaryTest
   [junit4] Completed in 8.13s, 21 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.common.util.NamedListTest
   [junit4] Completed in 0.01s, 1 test
   [junit4]  
   [junit4] Suite: 
org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest
   [junit4] Completed in 12.32s, 21 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.client.solrj.TestLBHttpSolrServer
   [junit4] Completed in 14.05s, 3 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.client.solrj.embedded.SolrExampleJettyTest
   [junit4] Completed in 7.35s, 22 tests
   [junit4]  
   [junit4] Suite: org.apache.solr.client.solrj.embedded.JettyWebappTest
   [junit4] Completed in 2.54s, 1 test
   [junit4]  
  

Re: [JENKINS] Lucene-Solr-trunk-Windows-Java6-64 - Build # 516 - Failure!

2012-06-12 Thread Yonik Seeley
I just checked in a fix (hopefully) for this.
The snap puller was creating a temp directory whose name only had
second precision. I've changed it to millisecond precision.
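The collision can be demonstrated with a small sketch (the naming pattern here is illustrative, not the actual SnapPuller code): two snapshots starting within the same second get identical names under a seconds-precision timestamp, but distinct names once milliseconds are included.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Illustrative sketch of the bug: a temp-dir name built from a timestamp.
public class TempDirName {
    static String name(long millis, String pattern) {
        return "index." + new SimpleDateFormat(pattern).format(new Date(millis));
    }

    public static void main(String[] args) {
        long t1 = 1339538400123L, t2 = 1339538400456L;  // same second, different ms
        // Seconds precision: the two names collide.
        System.out.println(name(t1, "yyyyMMddHHmmss").equals(name(t2, "yyyyMMddHHmmss")));     // prints true
        // Millisecond precision: the names differ.
        System.out.println(name(t1, "yyyyMMddHHmmssSSS").equals(name(t2, "yyyyMMddHHmmssSSS"))); // prints false
    }
}
```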

-Yonik
http://lucidimagination.com
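The collision the fix addresses can be sketched like this (illustrative only;
the class and naming pattern below are hypothetical, not the actual snap
puller code): two replication downloads started within the same wall-clock
second get the same seconds-precision directory name, while a
milliseconds-precision name keeps them distinct.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class TempDirNames {
    // Seconds-only precision: two events in the same second collide.
    static String secondsName(long millis) {
        return "index." + new SimpleDateFormat("yyyyMMddHHmmss").format(new Date(millis));
    }

    // Milliseconds precision: the same two events get distinct names.
    static String millisName(long millis) {
        return "index." + new SimpleDateFormat("yyyyMMddHHmmssSSS").format(new Date(millis));
    }

    public static void main(String[] args) {
        long t1 = 1339538400123L;   // two events 200 ms apart,
        long t2 = t1 + 200;         // within the same wall-clock second
        System.out.println(secondsName(t1).equals(secondsName(t2)));  // true: collision
        System.out.println(millisName(t1).equals(millisName(t2)));    // false: distinct
    }
}
```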


On Tue, Jun 12, 2012 at 5:40 PM,   wrote:
> Build: 
> http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/516/
>
> 1 tests failed.
> REGRESSION:  org.apache.solr.handler.TestReplicationHandler.test
>
> Error Message:
> expected:<498> but was:<0>
>
> Stack Trace:
> java.lang.AssertionError: expected:<498> but was:<0>
>        at 
> __randomizedtesting.SeedInfo.seed([1363050687D73DF1:9B373ADC292B5009]:0)
>        at org.junit.Assert.fail(Assert.java:93)
>        at org.junit.Assert.failNotEquals(Assert.java:647)
>        at org.junit.Assert.assertEquals(Assert.java:128)
>        at org.junit.Assert.assertEquals(Assert.java:472)
>        at org.junit.Assert.assertEquals(Assert.java:456)
>        at 
> org.apache.solr.handler.TestReplicationHandler.doTestIndexAndConfigReplication(TestReplicationHandler.java:391)
>        at 
> org.apache.solr.handler.TestReplicationHandler.test(TestReplicationHandler.java:250)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
>        at 
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
>        at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>        at 
> org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
>        at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>        at 
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>        at 
> org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>        at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>        at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
>        at 
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
>        at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>        at 
> org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>        at 
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
>        at 
> org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
>        at 
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>        at 
> org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
>        at 
> org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
>        at 
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
>        at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>        at 
> org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(Random

[jira] [Updated] (LUCENE-4141) don't allow Analyzer.offsetGap/posIncGap to be negative

2012-06-12 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4141:


Attachment: LUCENE-4141_test.patch

here are some initial tests: the overflow checks work, but negative values cause 
offsets to go backwards undetected.
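A minimal sketch of why a negative gap goes backwards undetected (this is
simplified bookkeeping, not Lucene's actual indexing code): the indexer keeps
a running character offset that is advanced by each value's length plus the
analyzer's offsetGap between values of a multi-valued field, so a negative
gap silently moves later offsets before earlier ones.

```java
public class OffsetGapDemo {
    // Running-offset bookkeeping between the values of a multi-valued field
    // (hypothetical simplification of what the indexer does).
    static int nextValueStart(int currentStart, String value, int offsetGap) {
        return currentStart + value.length() + offsetGap;
    }

    public static void main(String[] args) {
        // With a sane gap the next value starts after the previous one...
        System.out.println(nextValueStart(0, "abc", 1));   // 4
        // ...but a negative gap moves the offset backwards, undetected
        // unless the gap is validated up front:
        System.out.println(nextValueStart(0, "abc", -5));  // -2
    }
}
```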


> don't allow Analyzer.offsetGap/posIncGap to be negative
> ---
>
> Key: LUCENE-4141
> URL: https://issues.apache.org/jira/browse/LUCENE-4141
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Robert Muir
> Attachments: LUCENE-4141_test.patch
>
>
> Unrelated, but I thought about this looking at LUCENE-4139: we should check 
> that this doesn't make a corrupt index, but instead that IW throws a reasonable 
> exception.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4142) AnalyzerWrapper doesn't work with CharFilters.

2012-06-12 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4142:


Attachment: LUCENE-4142.patch

> AnalyzerWrapper doesn't work with CharFilters.
> --
>
> Key: LUCENE-4142
> URL: https://issues.apache.org/jira/browse/LUCENE-4142
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/analysis
>Affects Versions: 4.0
>Reporter: Robert Muir
> Attachments: LUCENE-4142.patch
>
>
> It doesn't override initReader (nor would it be able to) since it doesn't have 
> fieldName. This gives unexpected behavior.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4142) AnalyzerWrapper doesn't work with CharFilters.

2012-06-12 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-4142:
---

 Summary: AnalyzerWrapper doesn't work with CharFilters.
 Key: LUCENE-4142
 URL: https://issues.apache.org/jira/browse/LUCENE-4142
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 4.0
Reporter: Robert Muir
 Attachments: LUCENE-4142.patch

It doesn't override initReader (nor would it be able to) since it doesn't have 
fieldName. This gives unexpected behavior.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-4.x-Windows-Java7-64 - Build # 51 - Failure!

2012-06-12 Thread jenkins
Build: 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows-Java7-64/51/

1 tests failed.
REGRESSION:  org.apache.solr.handler.TestReplicationHandler.test

Error Message:
expected:<498> but was:<0>

Stack Trace:
java.lang.AssertionError: expected:<498> but was:<0>
at 
__randomizedtesting.SeedInfo.seed([FEF60F7A1CF29B59:76A230A0B20EF6A1]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at 
org.apache.solr.handler.TestReplicationHandler.doTestIndexAndConfigReplication(TestReplicationHandler.java:391)
at 
org.apache.solr.handler.TestReplicationHandler.test(TestReplicationHandler.java:250)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at 
org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
at 
org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)




Build Log:
[...truncated 14449 lines...]
   [junit4]   2> 37662 T2915 C178 REQ [collection1] webapp=/solr 
path=/replication 
params={command=filecontent&checksum=true&generation=7&wt=filestream&file=_3.fdx}
 status=0 QTime=0 
   [junit4]   2> 37665 T2915 C178 REQ [

[jira] [Commented] (LUCENE-4120) FST should use packed integer arrays

2012-06-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293988#comment-13293988
 ] 

Robert Muir commented on LUCENE-4120:
-

{quote}
Yes, it only affects packed FSTs. In this case, the backward compatibility 
would be rather easy to set-up (just fill a GrowableWriter instead of an int[]).
{quote}

Finally had a chance to glance through the patch. I was confusing myself about 
DocValues (it's unaffected here). So this is no backwards break to the index 
format, since we don't use packed FSTs in our standard codec. I wouldn't do any 
backwards compatibility.


> FST should use packed integer arrays
> 
>
> Key: LUCENE-4120
> URL: https://issues.apache.org/jira/browse/LUCENE-4120
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/FSTs
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-4120.patch, LUCENE-4120.patch, LUCENE-4120.patch
>
>
> There are some places where an int[] could be advantageously replaced with a 
> packed integer array.
> I am thinking (at least) of:
>  * FST.nodeAddress (GrowableWriter)
>  * FST.inCounts (GrowableWriter)
>  * FST.nodeRefToAddress (read-only Reader)
> The serialization/deserialization methods should be modified too in order to 
> take advantage of PackedInts.get{Reader,Writer}.
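For readers unfamiliar with packed integer arrays, here is a minimal sketch of
the idea behind PackedInts/GrowableWriter (this is not Lucene's
implementation): values that need only bitsPerValue bits each are stored
contiguously in a long[], instead of spending a full 32-bit int per value.

```java
public class PackedIntsSketch {
    final long[] blocks;
    final int bitsPerValue;
    final long mask;

    PackedIntsSketch(int count, int bitsPerValue) {
        this.bitsPerValue = bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        this.blocks = new long[(count * bitsPerValue + 63) / 64];
    }

    // Write-once sketch (uses |=, assumes the slot is still zero).
    void set(int index, long value) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        blocks[block] |= (value & mask) << shift;
        if (shift + bitsPerValue > 64) {            // value straddles two longs
            blocks[block + 1] |= (value & mask) >>> (64 - shift);
        }
    }

    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long v = blocks[block] >>> shift;
        if (shift + bitsPerValue > 64) {            // pick up the spilled bits
            v |= blocks[block + 1] << (64 - shift);
        }
        return v & mask;
    }

    public static void main(String[] args) {
        PackedIntsSketch p = new PackedIntsSketch(100, 7); // 7 bits per value
        p.set(0, 100);
        p.set(9, 63);  // bit position 63: straddles blocks[0] and blocks[1]
        System.out.println(p.get(0) + " " + p.get(9));     // 100 63
    }
}
```

With 7 bits per value, 100 values fit in 11 longs (88 bytes) instead of 400
bytes as an int[] -- the space win the issue is after.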

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2600) ensure example schema.xml has some mention/explanation of per field similarity vs similarityprovider vs (global) similarity

2012-06-12 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-2600:
---

Fix Version/s: (was: 4.1)
   4.0
 Assignee: Hoss Man

we've already seen questions about this, so i'll make sure we have at least one 
example

> ensure example schema.xml has some mention/explanation of per field 
> similarity vs similarityprovider vs (global) similarity
> ---
>
> Key: SOLR-2600
> URL: https://issues.apache.org/jira/browse/SOLR-2600
> Project: Solr
>  Issue Type: Task
>  Components: documentation
>Reporter: Hoss Man
>Assignee: Hoss Man
>Priority: Blocker
> Fix For: 4.0
>
>
> when SOLR-2338 was commited, there wasn't yet clear understanding of how much 
> the new feature per field similarity fields (vs custom similarity provider 
> (vs global similarity factory)) should be "advertised" in the example 
> configs, and what type of usage should be encouraged/promoted.
> it's likely that by the time 4.0 is released, new language specific field 
> types will already demonstrate these features, and no additional "artificial" 
> usages of them will be needed, but one way or another we should ensure that 
> they are either demoed or mentioned in comments

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2605) queryparser parses on whitespace

2012-06-12 Thread John Berryman (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293940#comment-13293940
 ] 

John Berryman commented on LUCENE-2605:
---

(How's it going Jack) Interesting idea, though I really need to crack into the 
QueryParser and play around a little bit before I have a strong opinion myself.

> queryparser parses on whitespace
> 
>
> Key: LUCENE-2605
> URL: https://issues.apache.org/jira/browse/LUCENE-2605
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/queryparser
>Reporter: Robert Muir
> Fix For: 4.1
>
>
> The queryparser parses input on whitespace, and sends each whitespace 
> separated term to its own independent token stream.
> This breaks the following at query-time, because they can't see across 
> whitespace boundaries:
> * n-gram analysis
> * shingles 
> * synonyms (especially multi-word for whitespace-separated languages)
> * languages where a 'word' can contain whitespace (e.g. vietnamese)
> Its also rather unexpected, as users think their 
> charfilters/tokenizers/tokenfilters will do the same thing at index and 
> querytime, but
> in many cases they can't. Instead, preferably the queryparser would parse 
> around only real 'operators'.
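The multi-word synonym case above can be sketched as follows (a toy
"analyzer", not Lucene code): at index time the whole field value reaches the
analysis chain, so a rule like "new york" -> "nyc" fires; at query time the
parser pre-splits on whitespace, so the rule never sees the full phrase.

```java
import java.util.List;
import java.util.Map;

public class WhitespaceParseDemo {
    // Hypothetical one-rule synonym "analyzer".
    static final Map<String, String> SYNONYMS = Map.of("new york", "nyc");

    static List<String> analyze(String text) {
        String syn = SYNONYMS.get(text.toLowerCase());
        return syn != null ? List.of(text, syn) : List.of(text);
    }

    public static void main(String[] args) {
        // Index time: the whole value reaches the analyzer -> synonym fires.
        System.out.println(analyze("new york"));          // [new york, nyc]
        // Query time: the parser pre-splits on whitespace -> it never fires.
        for (String chunk : "new york".split("\\s+")) {
            System.out.println(analyze(chunk));           // [new] then [york]
        }
    }
}
```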

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-Windows-Java6-64 - Build # 516 - Failure!

2012-06-12 Thread jenkins
Build: 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/516/

1 tests failed.
REGRESSION:  org.apache.solr.handler.TestReplicationHandler.test

Error Message:
expected:<498> but was:<0>

Stack Trace:
java.lang.AssertionError: expected:<498> but was:<0>
at 
__randomizedtesting.SeedInfo.seed([1363050687D73DF1:9B373ADC292B5009]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at 
org.apache.solr.handler.TestReplicationHandler.doTestIndexAndConfigReplication(TestReplicationHandler.java:391)
at 
org.apache.solr.handler.TestReplicationHandler.test(TestReplicationHandler.java:250)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at 
org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
at 
org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)




Build Log:
[...truncated 13183 lines...]
   [junit4]   2> 42893 T1985 C101 REQ [collection1] webapp=/solr 
path=/replication 
params={command=filecontent&checksum=true&generation=7&wt=filestream&file=_3.si}
 status=0 QTime=0 
   [junit4]   2> 42897 T1985 C101 REQ

Re: Grouping - Boosting large groups

2012-06-12 Thread corwin
Great! I will do that.

Thanks a lot.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Grouping-Boosting-large-groups-tp3988959p3989298.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3535) Add block support for XMLLoader

2012-06-12 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293924#comment-13293924
 ] 

Erik Hatcher commented on SOLR-3535:


bq. It seems like what we really want to express here is nested documents.

Great point, and I totally concur that the input should be hierarchical for the 
block join queries.  But do we also need a slightly lower-level, direct 
(non-hierarchical) way to call IndexWriter#addDocuments()?  Or is the Solr need 
here purely about hierarchy modeling?

> Add block support for XMLLoader
> ---
>
> Key: SOLR-3535
> URL: https://issues.apache.org/jira/browse/SOLR-3535
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Affects Versions: 4.1, 5.0
>Reporter: Mikhail Khludnev
>Priority: Minor
> Attachments: SOLR-3535.patch
>
>
> I'd like to add the following update xml message:
> 
> 
> 
> 
> out of scope for now: 
> * other update formats
> * update log support (NRT), should not be a big deal
> * overwrite feature support for block updates - it's more complicated, I'll 
> tell you why
> Alt
> * wdyt about adding attribute to the current tag {pre}{pre} 
> * or we can establish RunBlockUpdateProcessor which treat every  
>  as a block.
> *Test is included!!*
> How you'd suggest to improve the patch?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4085) improve TestBackwardsCompatibility to test Lucene 4.x features

2012-06-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293905#comment-13293905
 ] 

Robert Muir commented on LUCENE-4085:
-

I improved the situation to some extent in r1349510. We add docvalues fields of 
various types, a field with offsets, a field that only omits positions, etc.

> improve TestBackwardsCompatibility to test Lucene 4.x features
> --
>
> Key: LUCENE-4085
> URL: https://issues.apache.org/jira/browse/LUCENE-4085
> Project: Lucene - Java
>  Issue Type: Test
>  Components: general/test
>Affects Versions: 5.0
>Reporter: Robert Muir
>
> Currently TestBackwardsCompatibility doesn't test any of the new features of 
> 4.0: e.g. docvalues fields, fields with offsets in postings, etc etc.
> We should improve the index generation and testcases (in 5.x) to ensure we 
> don't break these things.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Grouping - Boosting large groups

2012-06-12 Thread Martijn v Groningen
I assumed you were using the queries module, which isn't in 3.x. After a look
at the 3.x codebase this doesn't seem to be a problem, since all the classes
you need are in the o.a.l.search.function package inside core Lucene. You can
use CustomScoreQuery & ValueSourceQuery instead of BoostedQuery.

Martijn

On 12 June 2012 21:24, corwin  wrote:
> That's a good idea, thanks for the tip Martijn. I'm not a fan of performing
> an extra search, but it does seem like it's unavoidable for this scenario.
>
> We are currently working with Lucene 3.5 and you mentioned that it assumes
> Lucene 4 or 3.6. Any particular reason for that? I prefer not upgrading just
> yet unless there's a feature that will specifically help me accomplish this.
>
> Thanks again,
>
> Corwin.
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Grouping-Boosting-large-groups-tp3988959p3989266.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>



-- 
Met vriendelijke groet,

Martijn van Groningen

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-4136) TestDocumentsWriterStallControl hang (reproducible)

2012-06-12 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-4136.
-

   Resolution: Fixed
Fix Version/s: 5.0
   4.0
Lucene Fields: New,Patch Available  (was: New)

committed to branch_4x & trunk

> TestDocumentsWriterStallControl hang (reproducible)
> ---
>
> Key: LUCENE-4136
> URL: https://issues.apache.org/jira/browse/LUCENE-4136
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Simon Willnauer
> Fix For: 4.0, 5.0
>
> Attachments: LUCENE-4136.patch
>
>
> On trunk (probably affects 4.0 too, but trunk is where i hit it):
> ant test -Dtestcase=TestDocumentsWriterStallControl 
> -Dtests.seed=9D5404FF4A909330

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-4136) TestDocumentsWriterStallControl hang (reproducible)

2012-06-12 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer reassigned LUCENE-4136:
---

Assignee: Simon Willnauer

> TestDocumentsWriterStallControl hang (reproducible)
> ---
>
> Key: LUCENE-4136
> URL: https://issues.apache.org/jira/browse/LUCENE-4136
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
>Assignee: Simon Willnauer
> Attachments: LUCENE-4136.patch
>
>
> On trunk (probably affects 4.0 too, but trunk is where i hit it):
> ant test -Dtestcase=TestDocumentsWriterStallControl 
> -Dtests.seed=9D5404FF4A909330

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3539) rethink softCommit=true|false param on commits?

2012-06-12 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293886#comment-13293886
 ] 

Hoss Man commented on SOLR-3539:



This is something that started to concern me while trying to update the 
tutorial.  I'm having a hard time articulating my concerns to myself, so this 
will largely be stream of consciousness...

Both of these params seem defined more in terms of what they *don't* do than 
what they actually do -- softCommit in particular -- and while they aren't too 
terrible to explain individually, it's very hard to clearly articulate how they 
interplay with each other.

* openSearcher
** true - opens a new searcher against this commit point
** false - does not open a new searcher against this commit point
* softCommit
** true - a new searcher is opened against the commit point, but no data is 
flushed to disk.
** false - the commit point is flushed to disk.

Certain combinations of these params seem redundant 
(openSearcher=true&softCommit=true), while others not only make no sense but 
are directly contradictory (openSearcher=false&softCommit=true)...

| - |softCommit=true|softCommit=false|
|openSearcher=true|openSearcher is redundant|OK|
|openSearcher=false|contradictory (openSearcher is currently ignored)|OK|

From a vocabulary standpoint, they also seem confusing to understand.  
Consider a new user, starting with the 4x example which contains the 
following...

{code}
   <autoCommit> 
     <maxTime>15000</maxTime> 
     <openSearcher>false</openSearcher> 
   </autoCommit>
{code}

Documents this user adds will automatically get flushed to disk, but won't be 
visible in search results until the user takes some explicit action.  The user, 
upon reading some docs or asking on the list, will become aware that he needs to 
open a new searcher, and will be guided to "do a commit" (or maybe a commit 
explicitly adding openSearcher=true).  But this is actually overkill for what 
the user needs, because it will also flush any pending docs to disk.  All the 
user really needs in order to "open a new searcher" is an explicit commit with 
softCommit=true.

-

I would like to suggest that we throw out the "softCommit" param and 
replace it with a "flush" (or "flushToDisk" or "persist") param, which is 
solely concerned with the persistence of the commit, and completely disjoint 
from searcher opening, which would be controlled entirely by the 
"openSearcher" param.

* openSearcher
** true - opens a new searcher against this commit point
** false - does not open a new searcher against this commit point
* flush
** true - flushes this commit point to stable storage
** false - does not flush this commit point to stable storage

Making the interaction much easier to understand...

| - |flush=true|flush=false|
|openSearcher=true|OK|OK|
|openSearcher=false|OK|No-Op|
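For illustration, the proposed semantics could be modeled as a tiny, library-free function -- this is just the table above restated in code (hypothetical names, not Solr API):

```java
// Hypothetical model of the proposed flush/openSearcher semantics; not Solr code.
class ProposedCommitSemantics {
  // Returns a description of what a commit with the given flags would do.
  static String describe(boolean openSearcher, boolean flush) {
    if (!openSearcher && !flush) {
      return "no-op"; // nothing is persisted and no searcher is opened
    }
    StringBuilder actions = new StringBuilder();
    if (flush) {
      actions.append("flush commit point to stable storage");
    }
    if (openSearcher) {
      if (actions.length() > 0) actions.append(" + ");
      actions.append("open new searcher");
    }
    return actions.toString();
  }

  public static void main(String[] args) {
    System.out.println(describe(true, true));   // both actions
    System.out.println(describe(false, false)); // the only no-op combination
  }
}
```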



I've mainly been thinking about this from a user perspective the last few days, 
so I haven't had a chance to figure out how much this would impact the internals 
related to softCommit right now.  I suspect there are a lot of places that 
would need to be tweaked, but hopefully most of them would just involve 
flipping logic (softCommit=true -> flush=false).  The biggest challenges i can 
think of are:
* how to deal with the autocommit options in solrconfig.xml.  in 3x we 
supported a single <autoCommit> block.  On the 4x branch we support one 
<autoCommit> block and one <autoSoftCommit> block -- should we continue to do 
that? would <autoSoftCommit> just implicitly specify flush=false? or should we 
try to generalize to support N <autoCommit> blocks where <openSearcher> and 
<flush> are config options for all of them?
* event listeners -- it looks like the SolrEventListener API had a 
postSoftCommit() method added to it, but it doesn't seem to be configurable in 
any way -- i think this is just for tests, but if it's intentionally being 
exposed we would need to revamp it ... off the cuff i would suggest removing 
postSoftCommit() and changing the postCommit() method to take in some new 
structure specifying the options on the commit.


Thoughts?
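To make the last config question concrete, a generalized setup could hypothetically look something like this (sketch only: the <flush> element does not exist in solrconfig.xml today, and the shape of repeated <autoCommit> blocks is an assumption):

```xml
<!-- Hypothetical sketch only: <flush> is not an existing solrconfig element. -->
<autoCommit>
  <maxTime>15000</maxTime>
  <flush>true</flush>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoCommit>
  <maxTime>1000</maxTime>
  <flush>false</flush>
  <openSearcher>true</openSearcher>  <!-- plays the role of today's soft commit -->
</autoCommit>
```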

> rethink softCommit=true|false param on commits?
> ---
>
> Key: SOLR-3539
> URL: https://issues.apache.org/jira/browse/SOLR-3539
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
> Fix For: 4.0
>
>
> I think the current NTR related options when doing a commit, particularly 
> "openSearcher="true|false" and "softCommit=true|false", is confusing, and we 
> should rethink them before they get baked into the user API in 4.0.


[jira] [Created] (SOLR-3539) rethink softCommit=true|false param on commits?

2012-06-12 Thread Hoss Man (JIRA)
Hoss Man created SOLR-3539:
--

 Summary: rethink softCommit=true|false param on commits?
 Key: SOLR-3539
 URL: https://issues.apache.org/jira/browse/SOLR-3539
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
 Fix For: 4.0


I think the current NTR related options when doing a commit, particularly 
"openSearcher="true|false" and "softCommit=true|false", is confusing, and we 
should rethink them before they get baked into the user API in 4.0.






[jira] [Commented] (LUCENE-4132) IndexWriterConfig live settings

2012-06-12 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293885#comment-13293885
 ] 

Simon Willnauer commented on LUCENE-4132:
-

I think the generics here are not very complicated and also not really user 
facing: they're only a tool to make things nicer for the user, which I think 
justifies them. So this looks good to me.

> IndexWriterConfig live settings
> ---
>
> Key: LUCENE-4132
> URL: https://issues.apache.org/jira/browse/LUCENE-4132
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Shai Erera
>Priority: Minor
> Fix For: 4.0, 5.0
>
> Attachments: LUCENE-4132.patch
>
>
> A while ago there was a discussion about making some IW settings "live" and I 
> remember that RAM buffer size was one of them. Judging from IW code, I see 
> that RAM buffer can be changed "live" as IW never caches it.
> However, I don't remember which other settings were decided to be "live" and 
> I don't see any documentation in IW nor IWC for that. IW.getConfig mentions:
> {code}
> * NOTE: some settings may be changed on the
> * returned {@link IndexWriterConfig}, and will take
> * effect in the current IndexWriter instance.  See the
> * javadocs for the specific setters in {@link
> * IndexWriterConfig} for details.
> {code}
> But there's no text on e.g. IWC.setRAMBuffer mentioning that.
> I think that it'd be good if we make it easier for users to tell which of the 
> settings are "live" ones. There are few possible ways to do it:
> * Introduce a custom @live.setting tag on the relevant IWC.set methods, and 
> add special text for them in build.xml
> ** Or, drop the tag and just document it clearly.
> * Separate IWC to two interfaces, LiveConfig and OneTimeConfig (name 
> proposals are welcome !), have IWC impl both, and introduce another 
> IW.getLiveConfig which will return that interface, thereby clearly letting 
> the user know which of the settings are "live".
> It'd be good if IWC itself could only expose setXYZ methods for the "live" 
> settings though. So perhaps, off the top of my head, we can do something like 
> this:
> * Introduce a Config object, which is essentially what IWC is today, and pass 
> it to IW.
> * IW will create a different object, IWC from that Config and IW.getConfig 
> will return IWC.
> * IWC itself will only have setXYZ methods for the "live" settings.
> It adds another object, but user code doesn't change - it still creates a 
> Config object when initializing IW, and need to handle a different type if it 
> ever calls IW.getConfig.
> Maybe that's not such a bad idea?




Re: Typo in test framework

2012-06-12 Thread Dawid Weiss
> I guess if it just spelled IGNORED/A, I wouldn't think it's a typo. If it's
> possible, can we have it spelled correctly? It's not critical if it's too

Hmm... It makes that column two characters wider! :)

D.




[jira] [Updated] (LUCENE-4120) FST should use packed integer arrays

2012-06-12 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-4120:
-

Attachment: LUCENE-4120.patch

bq. Can you move the imports under the copyright header in GrowableWriter.java?

Patch updated.

> FST should use packed integer arrays
> 
>
> Key: LUCENE-4120
> URL: https://issues.apache.org/jira/browse/LUCENE-4120
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/FSTs
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-4120.patch, LUCENE-4120.patch, LUCENE-4120.patch
>
>
> There are some places where an int[] could be advantageously replaced with a 
> packed integer array.
> I am thinking (at least) of:
>  * FST.nodeAddress (GrowableWriter)
>  * FST.inCounts (GrowableWriter)
>  * FST.nodeRefToAddress (read-only Reader)
> The serialization/deserialization methods should be modified too in order to 
> take advantage of PackedInts.get{Reader,Writer}.
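For readers unfamiliar with the idea, here is a minimal, library-free sketch of what a packed integer array does (an illustration of the concept only, not Lucene's actual PackedInts implementation): n-bit values are stored back to back in a long[] instead of spending a full int or long per value.

```java
// Conceptual sketch of a packed integer array; not the Lucene PackedInts API.
// Values of bitsPerValue bits (1..63) are stored contiguously in a long[].
class PackedArraySketch {
  private final long[] blocks;
  private final int bitsPerValue;
  private final long mask;

  PackedArraySketch(int size, int bitsPerValue) {
    this.bitsPerValue = bitsPerValue;
    this.mask = (1L << bitsPerValue) - 1;
    this.blocks = new long[(size * bitsPerValue + 63) / 64];
  }

  void set(int index, long value) {
    int bitPos = index * bitsPerValue;
    int block = bitPos >>> 6;   // bitPos / 64
    int offset = bitPos & 63;   // bitPos % 64
    // Clear then set the bits of the value that land in this block.
    blocks[block] &= ~(mask << offset);
    blocks[block] |= (value & mask) << offset;
    if (offset + bitsPerValue > 64) { // value spans into the next block
      blocks[block + 1] &= ~(mask >>> (64 - offset));
      blocks[block + 1] |= (value & mask) >>> (64 - offset);
    }
  }

  long get(int index) {
    int bitPos = index * bitsPerValue;
    int block = bitPos >>> 6;
    int offset = bitPos & 63;
    long result = blocks[block] >>> offset;
    if (offset + bitsPerValue > 64) { // pick up the spilled high bits
      result |= blocks[block + 1] << (64 - offset);
    }
    return result & mask;
  }

  public static void main(String[] args) {
    PackedArraySketch p = new PackedArraySketch(10, 5);
    p.set(3, 17);
    System.out.println(p.get(3)); // prints 17
  }
}
```

At 5 bits per value, a million counters take roughly 610 KB instead of the ~3.8 MB an int[] would need, which is why in-counts and node addresses are good candidates.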




[jira] [Commented] (LUCENE-4132) IndexWriterConfig live settings

2012-06-12 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293868#comment-13293868
 ] 

Shai Erera commented on LUCENE-4132:


The generics are there because I wanted to avoid duplicating code between 
LiveConfig and IWC, so that the live settings share the same setXYZ code. First 
I thought to write a separate LiveConfig class, but then the setter methods 
need to be duplicated. I'll take another look -- perhaps IWC.setRAMBuffer, for 
instance, can just delegate to a private LiveConfig instance's setter. That 
would keep the APIs free of generics, with perhaps some jdoc duplication ...

I can take a stab at something like that, unless you have another proposal. I 
don't want to let go of the builder pattern though.
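A rough sketch of that delegation idea (hypothetical class and method names -- this is not the real Lucene IndexWriterConfig API): the one-time config holds a private live-config object and forwards the live setters to it, so no generics appear in the public API:

```java
// Hypothetical sketch of the delegation approach; names are made up and do
// not match Lucene's real IndexWriterConfig API.
class LiveConfigSketch {
  private double ramBufferSizeMB = 16.0;

  LiveConfigSketch setRAMBufferSizeMB(double mb) {
    this.ramBufferSizeMB = mb;
    return this; // keeps the builder-style chaining
  }

  double getRAMBufferSizeMB() {
    return ramBufferSizeMB;
  }
}

class WriterConfigSketch {
  // All "live" settings are delegated here; a getConfig()-style method
  // would expose only this live view after the writer is constructed.
  private final LiveConfigSketch live = new LiveConfigSketch();
  private int maxThreadStates = 8; // example of a non-live, one-time setting

  WriterConfigSketch setRAMBufferSizeMB(double mb) {
    live.setRAMBufferSizeMB(mb); // delegate; only the signature is duplicated
    return this;
  }

  WriterConfigSketch setMaxThreadStates(int n) {
    this.maxThreadStates = n;
    return this;
  }

  LiveConfigSketch liveConfig() {
    return live;
  }

  public static void main(String[] args) {
    WriterConfigSketch cfg = new WriterConfigSketch()
        .setRAMBufferSizeMB(64.0)
        .setMaxThreadStates(4);
    // Later, "live" changes go through the live view only:
    cfg.liveConfig().setRAMBufferSizeMB(128.0);
    System.out.println(cfg.liveConfig().getRAMBufferSizeMB()); // prints 128.0
  }
}
```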

> IndexWriterConfig live settings
> ---
>
> Key: LUCENE-4132
> URL: https://issues.apache.org/jira/browse/LUCENE-4132
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Shai Erera
>Priority: Minor
> Fix For: 4.0, 5.0
>
> Attachments: LUCENE-4132.patch
>
>
> A while ago there was a discussion about making some IW settings "live" and I 
> remember that RAM buffer size was one of them. Judging from IW code, I see 
> that RAM buffer can be changed "live" as IW never caches it.
> However, I don't remember which other settings were decided to be "live" and 
> I don't see any documentation in IW nor IWC for that. IW.getConfig mentions:
> {code}
> * NOTE: some settings may be changed on the
> * returned {@link IndexWriterConfig}, and will take
> * effect in the current IndexWriter instance.  See the
> * javadocs for the specific setters in {@link
> * IndexWriterConfig} for details.
> {code}
> But there's no text on e.g. IWC.setRAMBuffer mentioning that.
> I think that it'd be good if we make it easier for users to tell which of the 
> settings are "live" ones. There are few possible ways to do it:
> * Introduce a custom @live.setting tag on the relevant IWC.set methods, and 
> add special text for them in build.xml
> ** Or, drop the tag and just document it clearly.
> * Separate IWC to two interfaces, LiveConfig and OneTimeConfig (name 
> proposals are welcome !), have IWC impl both, and introduce another 
> IW.getLiveConfig which will return that interface, thereby clearly letting 
> the user know which of the settings are "live".
> It'd be good if IWC itself could only expose setXYZ methods for the "live" 
> settings though. So perhaps, off the top of my head, we can do something like 
> this:
> * Introduce a Config object, which is essentially what IWC is today, and pass 
> it to IW.
> * IW will create a different object, IWC from that Config and IW.getConfig 
> will return IWC.
> * IWC itself will only have setXYZ methods for the "live" settings.
> It adds another object, but user code doesn't change - it still creates a 
> Config object when initializing IW, and need to handle a different type if it 
> ever calls IW.getConfig.
> Maybe that's not such a bad idea?




[jira] [Created] (SOLR-3538) Unloading a SolrCore object and specifying delete does not fully delete all Solr parts

2012-06-12 Thread Andre' Hazelwood (JIRA)
Andre' Hazelwood created SOLR-3538:
--

 Summary: Unloading a SolrCore object and specifying delete does 
not fully delete all Solr parts
 Key: SOLR-3538
 URL: https://issues.apache.org/jira/browse/SOLR-3538
 Project: Solr
  Issue Type: Bug
  Components: multicore
Affects Versions: 4.0
 Environment: Windows
Reporter: Andre' Hazelwood
Priority: Minor


If I issue a action=UNLOAD&delete=true request for a specific Solr Core on the 
CoreAdminHandler, all files are removed except files located in the tlog 
directory under the core.  We are trying to manage our cores from an outside 
system, so having the core not actually get deleted is a pain.

I would expect all files as well as the Core directory to be removed if the 
delete parameter is specified.




[jira] [Commented] (SOLR-3535) Add block support for XMLLoader

2012-06-12 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293867#comment-13293867
 ] 

Yonik Seeley commented on SOLR-3535:


It seems like what we really want to express here is nested documents.  
Directly expressing that in the transfer syntax (XML, JSON, or binary) would 
seem more natural and also allow us to handle/express multiple levels of 
nesting.  This also frees the user from having to think about details such as 
where the parent document goes (at the beginning or the end?).

Internally representing a parent and its child documents as a single 
SolrInputDocument also has a lot of benefits, and seems like the easiest 
path to get this working with all of the existing code (like transaction 
logging, forwarding docs based on ID in cloud mode, etc).
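For illustration, an XML transfer syntax for nested documents could hypothetically look like this (a sketch of the idea only, not a committed format; field names are made up):

```xml
<add>
  <doc>
    <field name="id">parent-1</field>
    <field name="type">product</field>
    <!-- child documents nested directly inside the parent -->
    <doc>
      <field name="id">child-1</field>
      <field name="type">sku</field>
    </doc>
    <doc>
      <field name="id">child-2</field>
      <field name="type">sku</field>
    </doc>
  </doc>
</add>
```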



> Add block support for XMLLoader
> ---
>
> Key: SOLR-3535
> URL: https://issues.apache.org/jira/browse/SOLR-3535
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Affects Versions: 4.1, 5.0
>Reporter: Mikhail Khludnev
>Priority: Minor
> Attachments: SOLR-3535.patch
>
>
> I'd like to add the following update xml message:
> 
> 
> 
> 
> out of scope for now: 
> * other update formats
> * update log support (NRT), should not be a big deal
> * overwrite feature support for block updates - it's more complicated, I'll 
> tell you why
> Alt
> * wdyt about adding attribute to the current tag {pre}{pre} 
> * or we can establish RunBlockUpdateProcessor which treat every  
>  as a block.
> *Test is included!!*
> How you'd suggest to improve the patch?




Re: Grouping - Boosting large groups

2012-06-12 Thread corwin
That's a good idea, thanks for the tip Martijn. I'm not a fan of performing
an extra search, but it does seem like it's unavoidable for this scenario.

We are currently working with Lucene 3.5 and you mentioned that it assumes
Lucene 4 or 3.6. Any particular reason for that? I prefer not upgrading just
yet unless there's a feature that will specifically help me accomplish this.

Thanks again,

Corwin.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Grouping-Boosting-large-groups-tp3988959p3989266.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.




[jira] [Comment Edited] (SOLR-3535) Add block support for XMLLoader

2012-06-12 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293858#comment-13293858
 ] 

Mikhail Khludnev edited comment on SOLR-3535 at 6/12/12 7:15 PM:
-

@Simon,
the intention of this patch is index support for the parent ticket SOLR-3076. 
BJQ magic is explained at 
http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html

I'm going to rework the patch by this week.  

  was (Author: mkhludnev):
@Simon,
the intention of this patch is index support for the parent ticket SOLR-3076. 
BJQ magic is explained at 
http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html

I'm going to rework the path by this week.  
  
> Add block support for XMLLoader
> ---
>
> Key: SOLR-3535
> URL: https://issues.apache.org/jira/browse/SOLR-3535
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Affects Versions: 4.1, 5.0
>Reporter: Mikhail Khludnev
>Priority: Minor
> Attachments: SOLR-3535.patch
>
>
> I'd like to add the following update xml message:
> 
> 
> 
> 
> out of scope for now: 
> * other update formats
> * update log support (NRT), should not be a big deal
> * overwrite feature support for block updates - it's more complicated, I'll 
> tell you why
> Alt
> * wdyt about adding attribute to the current tag {pre}{pre} 
> * or we can establish RunBlockUpdateProcessor which treat every  
>  as a block.
> *Test is included!!*
> How you'd suggest to improve the patch?




[jira] [Commented] (SOLR-3535) Add block support for XMLLoader

2012-06-12 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293858#comment-13293858
 ] 

Mikhail Khludnev commented on SOLR-3535:


@Simon,
the intention of this patch is index support for the parent ticket SOLR-3076. 
BJQ magic is explained at 
http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html

I'm going to rework the path by this week.  

> Add block support for XMLLoader
> ---
>
> Key: SOLR-3535
> URL: https://issues.apache.org/jira/browse/SOLR-3535
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Affects Versions: 4.1, 5.0
>Reporter: Mikhail Khludnev
>Priority: Minor
> Attachments: SOLR-3535.patch
>
>
> I'd like to add the following update xml message:
> 
> 
> 
> 
> out of scope for now: 
> * other update formats
> * update log support (NRT), should not be a big deal
> * overwrite feature support for block updates - it's more complicated, I'll 
> tell you why
> Alt
> * wdyt about adding attribute to the current tag {pre}{pre} 
> * or we can establish RunBlockUpdateProcessor which treat every  
>  as a block.
> *Test is included!!*
> How you'd suggest to improve the patch?




[jira] [Commented] (LUCENE-4132) IndexWriterConfig live settings

2012-06-12 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293844#comment-13293844
 ] 

Shai Erera commented on LUCENE-4132:


I love it too, and the changes would otherwise be too horrible -- we use this 
builder pattern everywhere. Remember, the changes in this issue are only about 
not confusing people, that's it. They hardly require users to change their code 
at all.

I don't quite understand what the issue with the generics is. If you don't look 
at the IWC / LC code, they're not visible at all. I mean, in your application 
code, you won't see any generics.

> IndexWriterConfig live settings
> ---
>
> Key: LUCENE-4132
> URL: https://issues.apache.org/jira/browse/LUCENE-4132
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Shai Erera
>Priority: Minor
> Fix For: 4.0, 5.0
>
> Attachments: LUCENE-4132.patch
>
>
> A while ago there was a discussion about making some IW settings "live" and I 
> remember that RAM buffer size was one of them. Judging from IW code, I see 
> that RAM buffer can be changed "live" as IW never caches it.
> However, I don't remember which other settings were decided to be "live" and 
> I don't see any documentation in IW nor IWC for that. IW.getConfig mentions:
> {code}
> * NOTE: some settings may be changed on the
> * returned {@link IndexWriterConfig}, and will take
> * effect in the current IndexWriter instance.  See the
> * javadocs for the specific setters in {@link
> * IndexWriterConfig} for details.
> {code}
> But there's no text on e.g. IWC.setRAMBuffer mentioning that.
> I think that it'd be good if we make it easier for users to tell which of the 
> settings are "live" ones. There are few possible ways to do it:
> * Introduce a custom @live.setting tag on the relevant IWC.set methods, and 
> add special text for them in build.xml
> ** Or, drop the tag and just document it clearly.
> * Separate IWC to two interfaces, LiveConfig and OneTimeConfig (name 
> proposals are welcome !), have IWC impl both, and introduce another 
> IW.getLiveConfig which will return that interface, thereby clearly letting 
> the user know which of the settings are "live".
> It'd be good if IWC itself could only expose setXYZ methods for the "live" 
> settings though. So perhaps, off the top of my head, we can do something like 
> this:
> * Introduce a Config object, which is essentially what IWC is today, and pass 
> it to IW.
> * IW will create a different object, IWC from that Config and IW.getConfig 
> will return IWC.
> * IWC itself will only have setXYZ methods for the "live" settings.
> It adds another object, but user code doesn't change - it still creates a 
> Config object when initializing IW, and need to handle a different type if it 
> ever calls IW.getConfig.
> Maybe that's not such a bad idea?




Re: Typo in test framework

2012-06-12 Thread Shai Erera
Thanks Dawid,

I guess if it just spelled IGNORED/A, I wouldn't think it's a typo. If it's
possible, can we have it spelled correctly? It's not critical if it's too
much work.

Shai

On Tue, Jun 12, 2012 at 9:20 PM, Dawid Weiss
wrote:

> Hi Shai.
>
> I think this question may be of relevance to others, so I allowed
> myself to CC the list. So:
>
> > I see these printed when I run test-core:
> >
> >  [junit4] IGNOR/A 0.00s | Test10KPulsings.test10kNotPulsed
> >  [junit4]> Assumption #1: 'nightly' test group is disabled (@Nightly)
> >
> > Is IGNOR a typo? Or is it a weird locale?
>
> JUnit has the notion of "ignored" test (marked with @Ignore) or
> "assumption-ignored" test which is physically executed but at some
> point ends with an AssumptionViolatedException:
>
>
> https://github.com/KentBeck/junit/blob/master/src/main/java/org/junit/internal/AssumptionViolatedException.java
>
> The primary distinction is that the test can evaluate a condition and
> decide to throw an assumption while @Ignore is unconditional. There
> are also other technical side-effects -- listeners do get informed
> about the cause of an assumption (an instance of the thrown exception)
> while they are not informed about any cause of the ignored test (I
> think because it was at some point assumed that tests can only be
> ignored for one reason -- @Ignore annotation). Assumption-ignore
> exceptions can happen simultaneously with other exceptions resulting
> from rules -- the behavior then is not clearly defined...
>
> Randomizedtesting's  task tries hard to report all the events
> that really happened and report them -- including assumption-failed
> tests. So IGNOR/A is an assumption-ignored test (as opposed to IGNORED
> which is a test ignored for other reasons).
>
> Hope this helps,
>
> Dawid
>


Re: Grouping - Boosting large groups

2012-06-12 Thread Martijn v Groningen
Hi Corwin,

This is not yet possible out of the box, but I think it can be done:
1) Create a Lucene collector that counts, for every group, the number of
documents that match. This collector basically computes a map with
the group value as key and a count as value.
2) Run this collector as an extra phase before you run the
TermFirstPassGroupingCollector.
3) Use BoostedQuery with a custom value source. The custom value
source can emit a boost value per document (via FunctionValues), and in
this case you base it on the document count
from the group-to-document-count map computed in step 1.

Note: This approach is more expensive than what you are doing now; it
requires an extra search.
Note: The approach assumes Lucene 4.0 (which isn't released), but it
should be possible with Lucene 3.6 (I think).
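The counting idea in steps 1 and 3 can be sketched without any Lucene API at all (a library-free illustration of the logic, not a real Collector; in practice the group value per matching doc would come from a FieldCache/DocValues-style lookup, and the boost formula is an arbitrary example):

```java
import java.util.HashMap;
import java.util.Map;

// Library-free sketch of steps 1 and 3: count matching docs per group,
// then derive a per-document boost from its group's size. Not Lucene API.
class GroupBoostSketch {
  // Step 1: one pass over the matching docs, counting per group value.
  static Map<String, Integer> countGroups(String[] groupValuePerMatchingDoc) {
    Map<String, Integer> counts = new HashMap<>();
    for (String group : groupValuePerMatchingDoc) {
      counts.merge(group, 1, Integer::sum);
    }
    return counts;
  }

  // Step 3: a value-source-style boost; here simply proportional to group size.
  static float boostFor(String group, Map<String, Integer> counts) {
    return 1.0f + counts.getOrDefault(group, 0);
  }

  public static void main(String[] args) {
    String[] matches = {"a", "b", "b", "b", "a"};
    Map<String, Integer> counts = countGroups(matches);
    System.out.println(boostFor("b", counts)); // prints 4.0
  }
}
```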

Make an issue in Jira about this if you start working on it. This and
similar group properties based sorting / boosting are much needed
features.

Martijn

On 11 June 2012 18:10, corwin  wrote:
> Hi forum,
>
> I've implemented grouping using the TermFirstPassGroupingCollector and
> TermSecondPassGroupingCollector, pretty much exactly as the example at the
> API. This works really well. I'm getting the groups sorted by the computed
> relevance, and within each group the docs are sorted by a numeric field. So
> far, so good.
>
> Now I want to make things more complicated by boosting larger groups in
> addition to the existing relevance sort. For example, if the first result
> has a relevancy score of 1 and the group has 2 docs and the second group has
> a score of 0.9 and 4 docs, I want to boost the second group so it will
> appear before the first.
>
> Basically I'm trying to boost the groups according to the number of elements
> in the groups.
>
> I couldn't figure out how to do that or find an example anywhere.
>
> I hope I'm making sense
>
> Thanks in advance.
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Grouping-Boosting-large-groups-tp3988959.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>
>



-- 
Met vriendelijke groet,

Martijn van Groningen




Re: Typo in test framework

2012-06-12 Thread Dawid Weiss
Hi Shai.

I think this question may be of relevance to others, so I allowed
myself to CC the list. So:

> I see these printed when I run test-core:
>
>  [junit4] IGNOR/A 0.00s | Test10KPulsings.test10kNotPulsed
>  [junit4]    > Assumption #1: 'nightly' test group is disabled (@Nightly)
>
> Is IGNOR a typo? Or is it a weird locale?

JUnit has the notion of "ignored" test (marked with @Ignore) or
"assumption-ignored" test which is physically executed but at some
point ends with an AssumptionViolatedException:

https://github.com/KentBeck/junit/blob/master/src/main/java/org/junit/internal/AssumptionViolatedException.java

The primary distinction is that the test can evaluate a condition and
decide to throw an assumption while @Ignore is unconditional. There
are also other technical side-effects -- listeners do get informed
about the cause of an assumption (an instance of the thrown exception)
while they are not informed about any cause of the ignored test (I
think because it was at some point assumed that tests can only be
ignored for one reason -- @Ignore annotation). Assumption-ignore
exceptions can happen simultaneously with other exceptions resulting
from rules -- the behavior then is not clearly defined...

Randomizedtesting's  task tries hard to report all the events
that really happened and report them -- including assumption-failed
tests. So IGNOR/A is an assumption-ignored test (as opposed to IGNORED
which is a test ignored for other reasons).

Hope this helps,

Dawid
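The distinction can be sketched in a few lines of plain Java (hypothetical names -- not the real JUnit or RandomizedTesting classes): an ignored test is never run at all, while an assumption-ignored test runs until it throws a dedicated exception:

```java
// Library-free sketch of "ignored" vs "assumption-ignored"; the class and
// status names are made up and do not match JUnit's real API.
class TestStatusSketch {
  static class AssumptionViolated extends RuntimeException {
    AssumptionViolated(String msg) { super(msg); }
  }

  // Inside a test body: evaluate a condition and abort if it does not hold.
  static void assume(boolean condition, String msg) {
    if (!condition) throw new AssumptionViolated(msg);
  }

  // A minimal runner: an @Ignore-style skip is unconditional and the test
  // body never executes; an assumption is decided by the running test itself.
  static String run(Runnable test, boolean ignoredAnnotation) {
    if (ignoredAnnotation) return "IGNORED";
    try {
      test.run();
      return "OK";
    } catch (AssumptionViolated e) {
      return "IGNOR/A"; // assumption-ignored: the test ran, then opted out
    } catch (RuntimeException e) {
      return "FAILED";
    }
  }

  public static void main(String[] args) {
    boolean nightly = false;
    System.out.println(run(() -> assume(nightly, "nightly only"), false)); // IGNOR/A
    System.out.println(run(() -> {}, true));                               // IGNORED
    System.out.println(run(() -> {}, false));                              // OK
  }
}
```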




[jira] [Commented] (LUCENE-4120) FST should use packed integer arrays

2012-06-12 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293812#comment-13293812
 ] 

Dawid Weiss commented on LUCENE-4120:
-

bq. I think that's fine; you can't change an FST once it's built (not yet 
anyway...).

Yeah, it'd be hard with the packed format. I once thought it'd be interesting 
to see incremental FST construction based on merging (much like it's done with 
inverted indexes). Deletes would still be difficult (or impossible), but 
additions should be relatively easy to merge.

> FST should use packed integer arrays
> 
>
> Key: LUCENE-4120
> URL: https://issues.apache.org/jira/browse/LUCENE-4120
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/FSTs
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-4120.patch, LUCENE-4120.patch
>
>
> There are some places where an int[] could be advantageously replaced with a 
> packed integer array.
> I am thinking (at least) of:
>  * FST.nodeAddress (GrowableWriter)
>  * FST.inCounts (GrowableWriter)
>  * FST.nodeRefToAddress (read-only Reader)
> The serialization/deserialization methods should be modified too in order to 
> take advantage of PackedInts.get{Reader,Writer}.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira






[jira] [Resolved] (LUCENE-4139) multivalued field with offsets makes corrupt index

2012-06-12 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-4139.
-

   Resolution: Fixed
Fix Version/s: 5.0
   4.0

> multivalued field with offsets makes corrupt index 
> 
>
> Key: LUCENE-4139
> URL: https://issues.apache.org/jira/browse/LUCENE-4139
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Robert Muir
> Fix For: 4.0, 5.0
>
> Attachments: LUCENE-4139.patch, LUCENE-4139.patch, LUCENE-4139.patch, 
> LUCENE-4139_test.patch, LUCENE-4139_test.patch
>
>
> I was trying to beef up TestBackwardsCompatibility (LUCENE-4085) but i 
> accidentally made a corrupt index due to a typo:
> {code}
> // a field with both offsets and term vectors for a cross-check
> FieldType customType3 = new FieldType(TextField.TYPE_STORED);
> customType3.setStoreTermVectors(true);
> customType3.setStoreTermVectorPositions(true);
> customType3.setStoreTermVectorOffsets(true);
> customType3.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
> doc.add(new Field("content3", "here is more content with aaa aaa aaa", 
> customType3));
> // a field that omits only positions
> FieldType customType4 = new FieldType(TextField.TYPE_STORED);
> customType4.setStoreTermVectors(true);
> customType4.setStoreTermVectorPositions(false);
> customType4.setStoreTermVectorOffsets(true);
> customType4.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
> // check out the copy-paste typo here! i forgot to change this to content4
>  doc.add(new Field("content3", "here is more content with aaa aaa aaa", 
> customType3));
> {code}




[jira] [Commented] (LUCENE-4132) IndexWriterConfig live settings

2012-06-12 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293792#comment-13293792
 ] 

Uwe Schindler commented on LUCENE-4132:
---

We could, but I am against it; I love IndexWriterConfig!

> IndexWriterConfig live settings
> ---
>
> Key: LUCENE-4132
> URL: https://issues.apache.org/jira/browse/LUCENE-4132
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Shai Erera
>Priority: Minor
> Fix For: 4.0, 5.0
>
> Attachments: LUCENE-4132.patch
>
>
> A while ago there was a discussion about making some IW settings "live" and I 
> remember that RAM buffer size was one of them. Judging from IW code, I see 
> that RAM buffer can be changed "live" as IW never caches it.
> However, I don't remember which other settings were decided to be "live" and 
> I don't see any documentation in IW nor IWC for that. IW.getConfig mentions:
> {code}
> * NOTE: some settings may be changed on the
> * returned {@link IndexWriterConfig}, and will take
> * effect in the current IndexWriter instance.  See the
> * javadocs for the specific setters in {@link
> * IndexWriterConfig} for details.
> {code}
> But there's no text on e.g. IWC.setRAMBuffer mentioning that.
> I think that it'd be good if we make it easier for users to tell which of the 
> settings are "live" ones. There are a few possible ways to do it:
> * Introduce a custom @live.setting tag on the relevant IWC.set methods, and 
> add special text for them in build.xml
> ** Or, drop the tag and just document it clearly.
> * Separate IWC into two interfaces, LiveConfig and OneTimeConfig (name 
> proposals are welcome!), have IWC implement both, and introduce another 
> IW.getLiveConfig which will return that interface, thereby clearly letting 
> the user know which of the settings are "live".
> It'd be good if IWC itself could only expose setXYZ methods for the "live" 
> settings though. So perhaps, off the top of my head, we can do something like 
> this:
> * Introduce a Config object, which is essentially what IWC is today, and pass 
> it to IW.
> * IW will create a different object, IWC from that Config and IW.getConfig 
> will return IWC.
> * IWC itself will only have setXYZ methods for the "live" settings.
> It adds another object, but user code doesn't change - it still creates a 
> Config object when initializing IW, and need to handle a different type if it 
> ever calls IW.getConfig.
> Maybe that's not such a bad idea?
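The split proposed above can be sketched roughly as follows. This is a toy illustration of the idea, not the actual patch; all class, method, and field names here (LiveConfig, Config, Writer, the RAM-buffer setter) are hypothetical stand-ins for whatever the real API would use.

```java
// A minimal sketch of the proposal: the full Config is consumed once by the
// writer's constructor, while getConfig() exposes only the "live" settings.
public class LiveSettingsSketch {

  // Returned from getConfig(): only setters that take effect on a live writer.
  static class LiveConfig {
    private volatile double ramBufferSizeMB = 16.0;
    void setRAMBufferSizeMB(double mb) { ramBufferSizeMB = mb; }
    double getRAMBufferSizeMB()        { return ramBufferSizeMB; }
  }

  // Passed to the writer's constructor: live settings plus one-time settings.
  static class Config extends LiveConfig {
    int maxThreadStates = 8; // example of a one-time, non-live setting
  }

  static class Writer {
    private final LiveConfig live;
    Writer(Config config) { this.live = config; } // live view of the same object
    LiveConfig getConfig() { return live; }       // callers see only live setters
  }

  public static void main(String[] args) {
    Writer w = new Writer(new Config());
    w.getConfig().setRAMBufferSizeMB(64.0); // takes effect "live"
    System.out.println(w.getConfig().getRAMBufferSizeMB());
  }
}
```

Because getConfig() returns the narrower type, the compiler itself tells the user which settings are live; user code still just builds a Config and hands it to the writer.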




[jira] [Commented] (LUCENE-4132) IndexWriterConfig live settings

2012-06-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293791#comment-13293791
 ] 

Michael McCandless commented on LUCENE-4132:


If we remove IWC's chained setters (return void), can we simplify this?




[jira] [Commented] (LUCENE-4132) IndexWriterConfig live settings

2012-06-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293788#comment-13293788
 ] 

Robert Muir commented on LUCENE-4132:
-

It's not certified by me. It's too confusing for a class everyone must use.

I don't care about the builder pattern; it simply isn't worth having 
confusing generics on a config class.





[jira] [Commented] (LUCENE-4132) IndexWriterConfig live settings

2012-06-12 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293784#comment-13293784
 ] 

Uwe Schindler commented on LUCENE-4132:
---

That's certified and suggested by the generics policeman. The generics are 
needed to make the builder API work correctly (compare Enum<E extends Enum<E>>).




[jira] [Commented] (LUCENE-4132) IndexWriterConfig live settings

2012-06-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293783#comment-13293783
 ] 

Robert Muir commented on LUCENE-4132:
-

I think the class hierarchy/generics are too tricky.
Why do we need generics?




[jira] [Commented] (LUCENE-4132) IndexWriterConfig live settings

2012-06-12 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293781#comment-13293781
 ] 

Shai Erera commented on LUCENE-4132:


I had a brief chat about IWC.usedByIW with Mike (was introduced in 
LUCENE-4084), and we both agree it's not needed anymore, as now with 
IW.getConfig() returning LiveConfig and IW taking IWC in its ctor, no one can 
pass the same instance returned from getConfig to a new IW, and so the relevant 
test can be nuked, together with that AtomicBoolean.

I'll nuke them then, and absorb ReadOnlyConfig into AbstractLiveConfig and 
stick with just two concrete classes: LiveConfig, returned from IW.getConfig, 
and IWC, given to its ctor.

I'll post a patch probably tomorrow.




[jira] [Commented] (LUCENE-4120) FST should use packed integer arrays

2012-06-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293780#comment-13293780
 ] 

Michael McCandless commented on LUCENE-4120:


Patch looks great!

bq. I can switch this method to Mutable but this means that it won't be 
possible to save a FST read from disk anymore (maybe not a problem?)

I think that's fine; you can't change an FST once it's built (not yet
anyway...).

bq. 0..1 gives more chances for different implementations to be selected. 
FASTEST=7 is only useful for bitsPerValue=1, so that a Direct8 is instantiated. 
If we used a uniformly distributed float between COMPACT=0 and FASTEST=7, a 
Direct* implementation would be used more than 6/7 of the time when 
bitsPerValue>=4. For example, if bitsPerValue=15, a Direct16 will be 
instantiated if acceptableOverheadRatio>=1/15=0.07 and a Packed64 otherwise. A 
lower upper bound for acceptableOverheadRatio makes the latter case more likely.

Ahh OK that makes sense, so let's leave it as 0..1.
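The arithmetic in that quote can be sketched as a toy decision rule. This is only an illustration of the trade-off being discussed, not the actual PackedInts code; the method name and the fallback label are made up for this sketch.

```java
public class OverheadSketch {
  // Hypothetical rule mirroring the quote above: use a direct (8/16/32/64-bit)
  // store when its wasted-bit ratio is within acceptableOverheadRatio,
  // otherwise fall back to packed storage.
  static String choose(int bitsPerValue, float acceptableOverheadRatio) {
    for (int width : new int[] {8, 16, 32, 64}) {
      if (width >= bitsPerValue
          && (width - bitsPerValue) / (float) bitsPerValue <= acceptableOverheadRatio) {
        return "Direct" + width;
      }
    }
    return "Packed64";
  }

  public static void main(String[] args) {
    // bitsPerValue=15: Direct16 wastes 1/15 ~= 0.067 bits per stored bit,
    // so the threshold 0.07 accepts it while 0.05 rejects it.
    System.out.println(choose(15, 0.07f));
    System.out.println(choose(15, 0.05f));
  }
}
```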

Can you move the imports under the copyright header in
GrowableWriter.java?






[jira] [Commented] (SOLR-3535) Add block support for XMLLoader

2012-06-12 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293761#comment-13293761
 ] 

Simon Rosenthal commented on SOLR-3535:
---

Mikhail:
It's not clear to me from the code/comments exactly what this issue/patch is 
meant to accomplish. I'm assuming the intention is to be able to atomically 
add every document in the block at once?

That is a use case I have encountered: a batch update of a set of records 
with new product price information, where you want to commit them only when 
the complete set has been indexed, regardless of autocommits being fired off 
or other processes issuing commits. If that's the intention, this patch is 
great!

I attempted to address the problem of undesired autocommits in SOLR-2664 
(enable/disable autocommit on the fly), but that patch is very out of date.

I do think it should be extended to updates in CSV/JSON and updates using the 
SolrJ API.

+1 for Erik's suggestion on the syntax.



> Add block support for XMLLoader
> ---
>
> Key: SOLR-3535
> URL: https://issues.apache.org/jira/browse/SOLR-3535
> Project: Solr
>  Issue Type: Sub-task
>  Components: update
>Affects Versions: 4.1, 5.0
>Reporter: Mikhail Khludnev
>Priority: Minor
> Attachments: SOLR-3535.patch
>
>
> I'd like to add the following update xml message:
> 
> 
> 
> 
> out of scope for now: 
> * other update formats
> * update log support (NRT), should not be a big deal
> * overwrite feature support for block updates - it's more complicated, I'll 
> tell you why
> Alt
> * wdyt about adding attribute to the current tag {pre}{pre} 
> * or we can establish RunBlockUpdateProcessor which treat every  
>  as a block.
> *Test is included!!*
> How you'd suggest to improve the patch?




[jira] [Commented] (LUCENE-4139) multivalued field with offsets makes corrupt index

2012-06-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293760#comment-13293760
 ] 

Michael McCandless commented on LUCENE-4139:


Patch looks good!  Nice find.  +1




[jira] [Updated] (LUCENE-4139) multivalued field with offsets makes corrupt index

2012-06-12 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4139:


Attachment: LUCENE-4139.patch

Stupid IDE; I forgot to press save. This one actually has the 'prevOffset -> 
offsetAccum' rename.




[jira] [Commented] (LUCENE-2605) queryparser parses on whitespace

2012-06-12 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293748#comment-13293748
 ] 

Jack Krupansky commented on LUCENE-2605:


My thought on the original issue is that most query parsers should accumulate 
adjacent terms without intervening operators as a "term list" (quoted phrases 
would be a second level of term list) and that there needs to be a "list" 
interface for query term analysis.

Rather than simply present a raw text stream for the sequence/list of terms, 
each term would be fed into the token stream with an attribute that indicates 
which source term it belongs to.

The synonym processor would see a clean flow of terms and do its processing, 
but would also need to associate an id with each term of a multi-term synonym 
phrase so that multiple multi-word synonym choices for the same input term(s) 
don't get mixed up (i.e., multiple tokens at the same position with no 
indication of which original synonym phrase they came from).

By having those IDs for each multi-term synonym phrase, the caller of the list 
analyzer could then reconstruct the tree of "OR" expressions for the various 
multi-term synonym phrases.


> queryparser parses on whitespace
> 
>
> Key: LUCENE-2605
> URL: https://issues.apache.org/jira/browse/LUCENE-2605
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/queryparser
>Reporter: Robert Muir
> Fix For: 4.1
>
>
> The queryparser parses input on whitespace, and sends each whitespace 
> separated term to its own independent token stream.
> This breaks the following at query-time, because they can't see across 
> whitespace boundaries:
> * n-gram analysis
> * shingles 
> * synonyms (especially multi-word for whitespace-separated languages)
> * languages where a 'word' can contain whitespace (e.g. vietnamese)
> It's also rather unexpected, as users think their 
> charfilters/tokenizers/tokenfilters will do the same thing at index and 
> querytime, but
> in many cases they can't. Instead, preferably the queryparser would parse 
> around only real 'operators'.




Re: Text Extraction Using iText

2012-06-12 Thread Jack Krupansky
Start by looking at the Tika code that integrates PDFBox since that is exactly 
where you want to end up – if you want to integrate your code with Tika and 
SolrCell.

http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pdf/
 

If you are going to replace PDFBox in Tika for SolrCell, that is one thing, but 
if you want to feed the output of your extractor directly to Solr from your own 
client application, see the Solr XML format and the SolrJ interface. 
Ultimately, your extractor will produce two things: 1) extracted content or 
body text, and 2) metadata, all of which are simply “fields” in a “Solr input 
document.”

http://wiki.apache.org/solr/UpdateXmlMessages 
http://wiki.apache.org/solr/Solrj
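
The Solr XML update format linked above is simple enough to sketch directly. In this illustrative helper, the field names ("author", "body") are hypothetical examples rather than a fixed schema; extracted body text and metadata both become plain fields of one input document.

```java
// Minimal sketch of a Solr XML update message. Field names are hypothetical
// examples; escape() handles the XML special characters in extracted text.
public class SolrXmlSketch {
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String addDoc(java.util.Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (java.util.Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(e.getKey()).append("\">")
              .append(escape(e.getValue())).append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        var doc = new java.util.LinkedHashMap<String, String>();
        doc.put("id", "doc-1");                        // unique key
        doc.put("author", "Roland");                   // metadata field
        doc.put("body", "text extracted with iText");  // content field
        System.out.println(addDoc(doc));
    }
}
```

A client would POST this message to Solr's update handler; SolrJ wraps the same idea in SolrInputDocument objects instead of hand-built XML.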

-- Jack Krupansky

From: Roland Ucker 
Sent: Tuesday, June 12, 2012 2:32 AM
To: dev@lucene.apache.org 
Subject: Text Extraction Using iText

Hello,

I would like to write my own PDF text/metadata extraction module using iText 
instead of Tika/PDFBox.

Where to start? Any hints?

Regards,
Roland
 

[jira] [Comment Edited] (LUCENE-4132) IndexWriterConfig live settings

2012-06-12 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293725#comment-13293725
 ] 

Shai Erera edited comment on LUCENE-4132 at 6/12/12 4:13 PM:
-

Phew, that was tricky, but here's the end result -- refactored 
IndexWriterConfig into the following class hierarchy:

* ReadOnlyConfig
** AbstractLiveConfig
*** LiveConfig
*** IndexWriterConfig

* IndexWriter now takes ReadOnlyConfig, which is an abstract class with all 
abstract getters.

* LiveConfig is returned from IndexWriter.getConfig(), and is initialized with 
the ReadOnlyConfig given to IW. It overrides all getters to delegate the call 
to the given (cloned) config. It is public but with a package-private ctor.

* IndexWriterConfig is still the entry object for users to initialize an 
IndexWriter, and adds its own setters for the non-live settings.

* The AbstractLiveConfig in the middle is used for generics and keeping the 
builder pattern. That way, LiveConfig.set1() and IndexWriterConfig.set1() 
return the proper type (LiveConfig or IndexWriterConfig respectively).

I would have liked IW to keep getting IWC in its ctor, but there's one test 
that prevents it: TestIndexWriterConfig.testIWCInvalidReuse, which initializes 
an IW, calls getConfig, and passes it to another IW (which is invalid). I don't 
know why it's invalid, as IW clones the given IWC, but that is one reason why I 
had to factor the getters out to a shared ReadOnlyConfig.

ROC is not that bad though -- it kind of protects against IW changing the given 
config ...

At least, no user code should change following these changes, except from 
changing the variable type used to cache IW.getConfig() to LiveConfig, which is 
what we want.

  was (Author: shaie):
Phew, that was tricky, but here's the end result -- refactored 
IndexWriterConfig into the following class hierarchy:

- ReadOnlyConfig
 |_ AbstractLiveConfig
   |_ LiveConfig
   |_ IndexWriterConfig

* IndexWriter now takes ReadOnlyConfig, which is an abstract class with all 
abstract getters.

* LiveConfig is returned from IndexWriter.getConfig(), and is initialized with 
the ReadOnlyConfig given to IW. It overrides all getters to delegate the call 
to the given (cloned) config. It is public but with a package-private ctor.

* IndexWriterConfig is still the entry object for users to initialize an 
IndexWriter, and adds its own setters for the non-live settings.

* The AbstractLiveConfig in the middle is used for generics and keeping the 
builder pattern. That way, LiveConfig.set1() and IndexWriterConfig.set1() 
return the proper type (LiveConfig or IndexWriterConfig respectively).

I would have liked IW to keep getting IWC in its ctor, but there's one test 
that prevents it: TestIndexWriterConfig.testIWCInvalidReuse, which initializes 
an IW, calls getConfig, and passes it to another IW (which is invalid). I don't 
know why it's invalid, as IW clones the given IWC, but that is one reason why I 
had to factor the getters out to a shared ReadOnlyConfig.

ROC is not that bad though -- it kind of protects against IW changing the given 
config ...

At least, no user code should change following these changes, except from 
changing the variable type used to cache IW.getConfig() to LiveConfig, which is 
what we want.
  
> IndexWriterConfig live settings
> ---
>
> Key: LUCENE-4132
> URL: https://issues.apache.org/jira/browse/LUCENE-4132
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Shai Erera
>Priority: Minor
> Fix For: 4.0, 5.0
>
> Attachments: LUCENE-4132.patch
>
>
> A while ago there was a discussion about making some IW settings "live" and I 
> remember that RAM buffer size was one of them. Judging from IW code, I see 
> that RAM buffer can be changed "live" as IW never caches it.
> However, I don't remember which other settings were decided to be "live" and 
> I don't see any documentation in IW nor IWC for that. IW.getConfig mentions:
> {code}
> * NOTE: some settings may be changed on the
> * returned {@link IndexWriterConfig}, and will take
> * effect in the current IndexWriter instance.  See the
> * javadocs for the specific setters in {@link
> * IndexWriterConfig} for details.
> {code}
> But there's no text on e.g. IWC.setRAMBuffer mentioning that.
> I think that it'd be good if we make it easier for users to tell which of the 
> settings are "live" ones. There are few possible ways to do it:
> * Introduce a custom @live.setting tag on the relevant IWC.set methods, and 
> add special text for them in build.xml
> ** Or, drop the tag and just document it clearly.
> * Separate IWC to two interfaces, LiveConfig and OneTimeConfig (name 
> proposals are welcome !), have IWC impl both, and introduce another 
> IW.ge

[jira] [Updated] (LUCENE-4132) IndexWriterConfig live settings

2012-06-12 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-4132:
---

Attachment: LUCENE-4132.patch

Phew, that was tricky, but here's the end result -- refactored 
IndexWriterConfig into the following class hierarchy:

- ReadOnlyConfig
 |_ AbstractLiveConfig
   |_ LiveConfig
   |_ IndexWriterConfig

* IndexWriter now takes ReadOnlyConfig, which is an abstract class with all 
abstract getters.

* LiveConfig is returned from IndexWriter.getConfig(), and is initialized with 
the ReadOnlyConfig given to IW. It overrides all getters to delegate the call 
to the given (cloned) config. It is public but with a package-private ctor.

* IndexWriterConfig is still the entry object for users to initialize an 
IndexWriter, and adds its own setters for the non-live settings.

* The AbstractLiveConfig in the middle is used for generics and keeping the 
builder pattern. That way, LiveConfig.set1() and IndexWriterConfig.set1() 
return the proper type (LiveConfig or IndexWriterConfig respectively).

I would have liked IW to keep getting IWC in its ctor, but there's one test 
that prevents it: TestIndexWriterConfig.testIWCInvalidReuse, which initializes 
an IW, calls getConfig, and passes it to another IW (which is invalid). I don't 
know why it's invalid, as IW clones the given IWC, but that is one reason why I 
had to factor the getters out to a shared ReadOnlyConfig.

ROC is not that bad though -- it kind of protects against IW changing the given 
config ...

At least, no user code should change following these changes, except from 
changing the variable type used to cache IW.getConfig() to LiveConfig, which is 
what we want.
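
The generics used to keep the builder pattern across the hierarchy can be sketched as follows. The class names follow the comment above, but the settings and method bodies are illustrative, not the actual patch: a self-referential type parameter lets each setter return the concrete subtype so chained calls compile on both LiveConfig and IndexWriterConfig.

```java
// Illustrative sketch of the self-referential generics that let both config
// subtypes chain setters without losing their concrete type.
abstract class AbstractLiveConfig<T extends AbstractLiveConfig<T>> {
    double ramBufferSizeMB = 16.0;

    @SuppressWarnings("unchecked")
    public T setRAMBufferSizeMB(double mb) { // a "live" setting
        this.ramBufferSizeMB = mb;
        return (T) this;                     // returns the concrete subtype
    }
}

class LiveConfig extends AbstractLiveConfig<LiveConfig> {}

class IndexWriterConfigSketch extends AbstractLiveConfig<IndexWriterConfigSketch> {
    int maxBufferedDocs = 1000;

    public IndexWriterConfigSketch setMaxBufferedDocs(int n) { // non-live setting
        this.maxBufferedDocs = n;
        return this;
    }
}

public class BuilderDemo {
    public static void main(String[] args) {
        // Chaining keeps the concrete type: a live setter followed by a
        // non-live one still compiles on IndexWriterConfigSketch.
        IndexWriterConfigSketch iwc =
            new IndexWriterConfigSketch().setRAMBufferSizeMB(64).setMaxBufferedDocs(500);
        System.out.println(iwc.ramBufferSizeMB + " " + iwc.maxBufferedDocs); // 64.0 500
    }
}
```

Without the type parameter, setRAMBufferSizeMB would return the abstract base type and the chained setMaxBufferedDocs call would not compile.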

> IndexWriterConfig live settings
> ---
>
> Key: LUCENE-4132
> URL: https://issues.apache.org/jira/browse/LUCENE-4132
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Shai Erera
>Priority: Minor
> Fix For: 4.0, 5.0
>
> Attachments: LUCENE-4132.patch
>
>
> A while ago there was a discussion about making some IW settings "live" and I 
> remember that RAM buffer size was one of them. Judging from IW code, I see 
> that RAM buffer can be changed "live" as IW never caches it.
> However, I don't remember which other settings were decided to be "live" and 
> I don't see any documentation in IW nor IWC for that. IW.getConfig mentions:
> {code}
> * NOTE: some settings may be changed on the
> * returned {@link IndexWriterConfig}, and will take
> * effect in the current IndexWriter instance.  See the
> * javadocs for the specific setters in {@link
> * IndexWriterConfig} for details.
> {code}
> But there's no text on e.g. IWC.setRAMBuffer mentioning that.
> I think that it'd be good if we make it easier for users to tell which of the 
> settings are "live" ones. There are few possible ways to do it:
> * Introduce a custom @live.setting tag on the relevant IWC.set methods, and 
> add special text for them in build.xml
> ** Or, drop the tag and just document it clearly.
> * Separate IWC to two interfaces, LiveConfig and OneTimeConfig (name 
> proposals are welcome !), have IWC impl both, and introduce another 
> IW.getLiveConfig which will return that interface, thereby clearly letting 
> the user know which of the settings are "live".
> It'd be good if IWC itself could only expose setXYZ methods for the "live" 
> settings though. So perhaps, off the top of my head, we can do something like 
> this:
> * Introduce a Config object, which is essentially what IWC is today, and pass 
> it to IW.
> * IW will create a different object, IWC from that Config and IW.getConfig 
> will return IWC.
> * IWC itself will only have setXYZ methods for the "live" settings.
> It adds another object, but user code doesn't change - it still creates a 
> Config object when initializing IW, and need to handle a different type if it 
> ever calls IW.getConfig.
> Maybe that's not such a bad idea?




[jira] [Comment Edited] (LUCENE-2605) queryparser parses on whitespace

2012-06-12 Thread John Berryman (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293722#comment-13293722
 ] 

John Berryman edited comment on LUCENE-2605 at 6/12/12 4:09 PM:


There is somewhat of a workaround for this for defType=lucene. Just escape 
every whitespace with a backslash. So instead of *{{new dress shoes}}* search 
for *{{new\ dress\ shoes}}*. Of course you lose the ability to use normal 
lucene syntax.

I was hoping that this workaround would also work for defType=dismax, but with 
or without the escaped whitespace, queries get interpreted the same, incorrect 
way. For instance, assume I have the following line in my synonyms.txt: 
*{{dress shoes => dress_shoes}}*. Further assume that I have a field 
*{{experiment}}* that gets analysed with synonyms. A search for *{{new dress 
shoes}}* (with or without escaped spaces) will be interpreted as 

*{{+((experiment:new)~0.01 (experiment:dress)~0.01 (experiment:shoes)~0.01) 
(experiment:"new dress_shoes"~3)~0.01}}*

The first clause is mandatory and contains independently analysed tokens, so 
this will only match documents that contain "dress", "new", or "shoes", but 
never "dress shoes" because analysis takes place as expected at index time.

  was (Author: berryman):
There is somewhat of a workaround for this for defType=lucene. Just escape 
every whitespace with *{{\}}* . So instead of *{{new dress shoes}}* search for 
*{{new\ dress\ shoes}}*. Of course you lose the ability to use normal lucene 
syntax.

I was hoping that this workaround would also work for defType=dismax, but with 
or without the escaped whitespace, queries get interpreted the same, incorrect 
way. For instance, assume I have the following line in my synonyms.txt: 
*{{dress shoes => dress_shoes}}*. Further assume that I have a field 
*{{experiment}}* that gets analysed with synonyms. A search for *{{new dress 
shoes}}* (with or without escaped spaces) will be interpreted as 

*{{+((experiment:new)~0.01 (experiment:dress)~0.01 (experiment:shoes)~0.01) 
(experiment:"new dress_shoes"~3)~0.01}}*

The first clause is mandatory and contains independently analysed tokens, so 
this will only match documents that contain "dress", "new", or "shoes", but 
never "dress shoes" because analysis takes place as expected at index time.
  
> queryparser parses on whitespace
> 
>
> Key: LUCENE-2605
> URL: https://issues.apache.org/jira/browse/LUCENE-2605
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/queryparser
>Reporter: Robert Muir
> Fix For: 4.1
>
>
> The queryparser parses input on whitespace, and sends each whitespace 
> separated term to its own independent token stream.
> This breaks the following at query-time, because they can't see across 
> whitespace boundaries:
> * n-gram analysis
> * shingles 
> * synonyms (especially multi-word for whitespace-separated languages)
> * languages where a 'word' can contain whitespace (e.g. vietnamese)
> It's also rather unexpected, as users think their 
> charfilters/tokenizers/tokenfilters will do the same thing at index and 
> querytime, but
> in many cases they can't. Instead, preferably the queryparser would parse 
> around only real 'operators'.




[jira] [Commented] (LUCENE-2605) queryparser parses on whitespace

2012-06-12 Thread John Berryman (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293722#comment-13293722
 ] 

John Berryman commented on LUCENE-2605:
---

There is somewhat of a workaround for this for defType=lucene. Just escape 
every whitespace with *{{\}}* . So instead of *{{new dress shoes}}* search for 
*{{new\ dress\ shoes}}*. Of course you lose the ability to use normal lucene 
syntax.

I was hoping that this workaround would also work for defType=dismax, but with 
or without the escaped whitespace, queries get interpreted the same, incorrect 
way. For instance, assume I have the following line in my synonyms.txt: 
*{{dress shoes => dress_shoes}}*. Further assume that I have a field 
*{{experiment}}* that gets analysed with synonyms. A search for *{{new dress 
shoes}}* (with or without escaped spaces) will be interpreted as 

*{{+((experiment:new)~0.01 (experiment:dress)~0.01 (experiment:shoes)~0.01) 
(experiment:"new dress_shoes"~3)~0.01}}*

The first clause is mandatory and contains independently analysed tokens, so 
this will only match documents that contain "dress", "new", or "shoes", but 
never "dress shoes" because analysis takes place as expected at index time.

> queryparser parses on whitespace
> 
>
> Key: LUCENE-2605
> URL: https://issues.apache.org/jira/browse/LUCENE-2605
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/queryparser
>Reporter: Robert Muir
> Fix For: 4.1
>
>
> The queryparser parses input on whitespace, and sends each whitespace 
> separated term to its own independent token stream.
> This breaks the following at query-time, because they can't see across 
> whitespace boundaries:
> * n-gram analysis
> * shingles 
> * synonyms (especially multi-word for whitespace-separated languages)
> * languages where a 'word' can contain whitespace (e.g. vietnamese)
> It's also rather unexpected, as users think their 
> charfilters/tokenizers/tokenfilters will do the same thing at index and 
> querytime, but
> in many cases they can't. Instead, preferably the queryparser would parse 
> around only real 'operators'.




Re: [JENKINS] Lucene-Solr-trunk-Windows-Java7-64 - Build # 300 - Still Failing!

2012-06-12 Thread Sami Siren
I am working on it. I have identified at least one problem: the overseer is
killed too often in that test. I'll let the test run locally for a bit and,
if everything looks good, commit a fix tomorrow.

--
 Sami Siren

On Tue, Jun 12, 2012 at 6:32 PM, Mark Miller  wrote:
> While working on the collections api, I have seen this on the odd occasion 
> locally as well.
>
> On Jun 12, 2012, at 10:43 AM, jenk...@sd-datasolutions.de wrote:
>
>> Build: 
>> http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java7-64/300/
>>
>> 1 tests failed.
>> FAILED:  org.apache.solr.cloud.OverseerTest.testShardLeaderChange
>>
>> Error Message:
>> Unexpected shard leader coll:collection1 shard:shard1 expected: but 
>> was:
>>
>> Stack Trace:
>> org.junit.ComparisonFailure: Unexpected shard leader coll:collection1 
>> shard:shard1 expected: but was:
>>       at 
>> __randomizedtesting.SeedInfo.seed([195A5E746C7F55C0:C709D98376E7A031]:0)
>>       at org.junit.Assert.assertEquals(Assert.java:125)
>>       at 
>> org.apache.solr.cloud.OverseerTest.verifyShardLeader(OverseerTest.java:522)
>>       at 
>> org.apache.solr.cloud.OverseerTest.testShardLeaderChange(OverseerTest.java:677)
>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>       at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>       at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>       at java.lang.reflect.Method.invoke(Method.java:601)
>>       at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
>>       at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
>>       at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
>>       at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
>>       at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
>>       at 
>> com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
>>       at 
>> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>>       at 
>> org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
>>       at 
>> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>>       at 
>> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>>       at 
>> org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>>       at 
>> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>>       at 
>> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>>       at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821)
>>       at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
>>       at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
>>       at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
>>       at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
>>       at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
>>       at 
>> com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
>>       at 
>> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>>       at 
>> org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>>       at 
>> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
>>       at 
>> org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
>>       at 
>> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>>       at 
>> org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
>>       at 
>> org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
>>       at 
>> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
>>       at 
>> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>>       at 
>> org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
>>       at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)

Text Extraction Using iText

2012-06-12 Thread Roland Ucker
Hello,

I would like to write my own PDF text/metadata extraction
module using iText instead of Tika/PDFBox.

Where to start? Any hints?

Regards,
Roland


Re: [JENKINS] Lucene-Solr-trunk-Windows-Java7-64 - Build # 300 - Still Failing!

2012-06-12 Thread Mark Miller
While working on the collections api, I have seen this on the odd occasion 
locally as well.

On Jun 12, 2012, at 10:43 AM, jenk...@sd-datasolutions.de wrote:

> Build: 
> http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java7-64/300/
> 
> 1 tests failed.
> FAILED:  org.apache.solr.cloud.OverseerTest.testShardLeaderChange
> 
> Error Message:
> Unexpected shard leader coll:collection1 shard:shard1 expected: but 
> was:
> 
> Stack Trace:
> org.junit.ComparisonFailure: Unexpected shard leader coll:collection1 
> shard:shard1 expected: but was:
>   at 
> __randomizedtesting.SeedInfo.seed([195A5E746C7F55C0:C709D98376E7A031]:0)
>   at org.junit.Assert.assertEquals(Assert.java:125)
>   at 
> org.apache.solr.cloud.OverseerTest.verifyShardLeader(OverseerTest.java:522)
>   at 
> org.apache.solr.cloud.OverseerTest.testShardLeaderChange(OverseerTest.java:677)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
>   at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
>   at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
>   at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
>   at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
>   at 
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
>   at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>   at 
> org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
>   at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>   at 
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>   at 
> org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>   at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>   at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>   at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821)
>   at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
>   at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
>   at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
>   at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
>   at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
>   at 
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
>   at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>   at 
> org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>   at 
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
>   at 
> org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
>   at 
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>   at 
> org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
>   at 
> org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
>   at 
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
>   at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>   at 
> org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
>   at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
>   at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
>   at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)
> 
> 
> 
> 
> Build Log:
> [...truncated 11002 lines...]
>   [junit4]   2>   at 
> org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
>   [junit4]   2>   at

[jira] [Updated] (LUCENE-4139) multivalued field with offsets makes corrupt index

2012-06-12 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4139:


Attachment: LUCENE-4139.patch

Updated patch: I renamed the prevOffset in writeOffset to offsetAccum (I think 
this is less misleading). Also added a random test.

> multivalued field with offsets makes corrupt index 
> 
>
> Key: LUCENE-4139
> URL: https://issues.apache.org/jira/browse/LUCENE-4139
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Robert Muir
> Attachments: LUCENE-4139.patch, LUCENE-4139.patch, 
> LUCENE-4139_test.patch, LUCENE-4139_test.patch
>
>
> I was trying to beef up TestBackwardsCompatibility (LUCENE-4085) but I 
> accidentally made a corrupt index due to a typo:
> {code}
> // a field with both offsets and term vectors for a cross-check
> FieldType customType3 = new FieldType(TextField.TYPE_STORED);
> customType3.setStoreTermVectors(true);
> customType3.setStoreTermVectorPositions(true);
> customType3.setStoreTermVectorOffsets(true);
> customType3.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
> doc.add(new Field("content3", "here is more content with aaa aaa aaa", 
> customType3));
> // a field that omits only positions
> FieldType customType4 = new FieldType(TextField.TYPE_STORED);
> customType4.setStoreTermVectors(true);
> customType4.setStoreTermVectorPositions(false);
> customType4.setStoreTermVectorOffsets(true);
> customType4.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
> // check out the copy-paste typo here! i forgot to change this to content4
>  doc.add(new Field("content3", "here is more content with aaa aaa aaa", 
> customType3));
> {code}




[jira] [Commented] (LUCENE-3440) FastVectorHighlighter: IDF-weighted terms for ordered fragments

2012-06-12 Thread Sebastian Lutze (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293669#comment-13293669
 ] 

Sebastian Lutze commented on LUCENE-3440:
-

Hi Koji,

bq. I'm going to close and mark this issue as resolved because I think Lucene 
part has been completed. 

that's really awesome! 

bq. Can you open a separate issue for Solr part? 

Sure. 

bq. This is a great improvement for FVH. I really appreciate what you've done! 

It was an honor for me! :) 


> FastVectorHighlighter: IDF-weighted terms for ordered fragments 
> 
>
> Key: LUCENE-3440
> URL: https://issues.apache.org/jira/browse/LUCENE-3440
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Sebastian Lutze
>Assignee: Koji Sekiguchi
>Priority: Minor
>  Labels: FastVectorHighlighter
> Fix For: 4.0, 5.0
>
> Attachments: LUCENE-3440.patch, LUCENE-3440.patch, LUCENE-3440.patch, 
> LUCENE-3440_3.6.1-SNAPSHOT.patch, LUCENE-4.0-SNAPSHOT-3440-9.patch, 
> weight-vs-boost_table01.html, weight-vs-boost_table02.html
>
>
> The FastVectorHighlighter assigns an equal weight to every term found in a 
> fragment, which ranks fragments with many words (or, in the worst case, many 
> very common words) higher than fragments that contain *all* of the terms 
> used in the original query. 
> This patch provides ordered fragments with IDF-weighted terms: 
> total weight = total weight + IDF for unique term per fragment * boost of 
> query; 
> The ranking-formula should be the same, or at least similar, to that one used 
> in org.apache.lucene.search.highlight.QueryTermScorer.
> The patch is simple, but it works for us. 
> Some ideas:
> - A better approach would be moving the whole fragments-scoring into a 
> separate class.
> - Switch scoring via parameter 
> - Exact phrases should be given an even better score, regardless of whether a 
> phrase-query was executed or not
> - edismax/dismax-parameters pf, ps and pf^boost should be observed and 
> corresponding fragments should be ranked higher 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira






[JENKINS] Lucene-Solr-trunk-Windows-Java7-64 - Build # 300 - Still Failing!

2012-06-12 Thread jenkins
Build: 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java7-64/300/

1 tests failed.
FAILED:  org.apache.solr.cloud.OverseerTest.testShardLeaderChange

Error Message:
Unexpected shard leader coll:collection1 shard:shard1 expected: but 
was:

Stack Trace:
org.junit.ComparisonFailure: Unexpected shard leader coll:collection1 
shard:shard1 expected: but was:
at 
__randomizedtesting.SeedInfo.seed([195A5E746C7F55C0:C709D98376E7A031]:0)
at org.junit.Assert.assertEquals(Assert.java:125)
at 
org.apache.solr.cloud.OverseerTest.verifyShardLeader(OverseerTest.java:522)
at 
org.apache.solr.cloud.OverseerTest.testShardLeaderChange(OverseerTest.java:677)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at 
org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
at 
org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)




Build Log:
[...truncated 11002 lines...]
   [junit4]   2>at 
org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
   [junit4]   2>at 
org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:289)
   [junit4]   2>at 
org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:286)
   [junit4]   2>at 
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecu

[jira] [Resolved] (LUCENE-3440) FastVectorHighlighter: IDF-weighted terms for ordered fragments

2012-06-12 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved LUCENE-3440.


   Resolution: Fixed
Fix Version/s: 5.0
 Assignee: Koji Sekiguchi

Thanks, Sebastian!

> FastVectorHighlighter: IDF-weighted terms for ordered fragments 
> 
>
> Key: LUCENE-3440
> URL: https://issues.apache.org/jira/browse/LUCENE-3440
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Sebastian Lutze
>Assignee: Koji Sekiguchi
>Priority: Minor
>  Labels: FastVectorHighlighter
> Fix For: 4.0, 5.0
>
> Attachments: LUCENE-3440.patch, LUCENE-3440.patch, LUCENE-3440.patch, 
> LUCENE-3440_3.6.1-SNAPSHOT.patch, LUCENE-4.0-SNAPSHOT-3440-9.patch, 
> weight-vs-boost_table01.html, weight-vs-boost_table02.html
>
>
> The FastVectorHighlighter uses an equal weight for every term found in a 
> fragment, which causes a higher ranking for fragments with a high number of 
> words or, in the worst case, a high number of very common words, than for 
> fragments that contain *all* of the terms used in the original query. 
> This patch provides ordered fragments with IDF-weighted terms: 
> total weight = total weight + IDF for unique term per fragment * boost of 
> query; 
> The ranking formula should be the same as, or at least similar to, the one used 
> in org.apache.lucene.search.highlight.QueryTermScorer.
> The patch is simple, but it works for us. 
> Some ideas:
> - A better approach would be moving the whole fragments-scoring into a 
> separate class.
> - Switch scoring via parameter 
> - Exact phrases should be given an even better score, regardless of whether a 
> phrase-query was executed or not
> - edismax/dismax-parameters pf, ps and pf^boost should be observed and 
> corresponding fragments should be ranked higher 







[jira] [Commented] (LUCENE-3440) FastVectorHighlighter: IDF-weighted terms for ordered fragments

2012-06-12 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293653#comment-13293653
 ] 

Koji Sekiguchi commented on LUCENE-3440:


Hi Sebastian,

I've committed LUCENE-4133.

I'm going to close and mark this issue as resolved because I think the Lucene 
part has been completed. Can you open a separate issue for the Solr part?

This is a great improvement for FVH. I really appreciate what you've done!

> FastVectorHighlighter: IDF-weighted terms for ordered fragments 
> 
>
> Key: LUCENE-3440
> URL: https://issues.apache.org/jira/browse/LUCENE-3440
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Sebastian Lutze
>Priority: Minor
>  Labels: FastVectorHighlighter
> Fix For: 4.0
>
> Attachments: LUCENE-3440.patch, LUCENE-3440.patch, LUCENE-3440.patch, 
> LUCENE-3440_3.6.1-SNAPSHOT.patch, LUCENE-4.0-SNAPSHOT-3440-9.patch, 
> weight-vs-boost_table01.html, weight-vs-boost_table02.html
>
>
> The FastVectorHighlighter uses an equal weight for every term found in a 
> fragment, which causes a higher ranking for fragments with a high number of 
> words or, in the worst case, a high number of very common words, than for 
> fragments that contain *all* of the terms used in the original query. 
> This patch provides ordered fragments with IDF-weighted terms: 
> total weight = total weight + IDF for unique term per fragment * boost of 
> query; 
> The ranking formula should be the same as, or at least similar to, the one used 
> in org.apache.lucene.search.highlight.QueryTermScorer.
> The patch is simple, but it works for us. 
> Some ideas:
> - A better approach would be moving the whole fragments-scoring into a 
> separate class.
> - Switch scoring via parameter 
> - Exact phrases should be given an even better score, regardless of whether a 
> phrase-query was executed or not
> - edismax/dismax-parameters pf, ps and pf^boost should be observed and 
> corresponding fragments should be ranked higher 







[jira] [Commented] (SOLR-3534) dismax and edismax should default to "df" when "qf" is absent.

2012-06-12 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293651#comment-13293651
 ] 

Jack Krupansky commented on SOLR-3534:
--

Looks like a reasonable compromise. I would edit the exception so that it reads 
like you just said: "neither 'qf', 'df' nor the default search field are 
present" or at least add "Neither" in front of what you currently have.

Any idea what happens with the classic/Solr or flex query parsers if the 
default search field is not present?


> dismax and edismax should default to "df" when "qf" is absent.
> --
>
> Key: SOLR-3534
> URL: https://issues.apache.org/jira/browse/SOLR-3534
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Affects Versions: 4.0
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Attachments: 
> SOLR-3534_dismax_and_edismax_should_default_to_df_if_qf_is_absent.patch
>
>
> The dismax and edismax query parsers should default to "df" when the "qf" 
> parameter is absent.  They only use the defaultSearchField in schema.xml as a 
> fallback now.







[jira] [Commented] (SOLR-3534) dismax and edismax should default to "df" when "qf" is absent.

2012-06-12 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293647#comment-13293647
 ] 

David Smiley commented on SOLR-3534:


Bernd:
In my patch I did throw an exception if neither 'qf', 'df' nor the default 
search field were present.

It's tempting to log warnings if a default is relied upon that is inadvisable 
(like defaultSearchField), but that could flood the logs.  A one-time flag 
could be set to prevent this, I guess.
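
The fallback order discussed in this thread (try 'qf', then 'df', then the 
schema's defaultSearchField, otherwise throw) can be sketched roughly as 
follows. The class and method names here are invented for illustration and are 
not Solr's actual internals:

```java
import java.util.Map;

// Hypothetical sketch of the SOLR-3534 resolution order; not Solr's real API.
public class QueryFieldFallback {
    static String resolveQueryFields(Map<String, String> params,
                                     String defaultSearchField) {
        // 1. An explicit "qf" parameter wins.
        String qf = params.get("qf");
        if (qf != null && !qf.trim().isEmpty()) return qf;
        // 2. Otherwise fall back to "df".
        String df = params.get("df");
        if (df != null && !df.trim().isEmpty()) return df;
        // 3. Last resort: the schema's defaultSearchField, if any.
        if (defaultSearchField != null) return defaultSearchField;
        throw new IllegalArgumentException(
            "Neither 'qf', 'df', nor the default search field are present");
    }
}
```

With only 'df' set, the sketch resolves to the 'df' value; with neither 
parameter and no default search field, it fails with the kind of message 
suggested earlier in the thread.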

> dismax and edismax should default to "df" when "qf" is absent.
> --
>
> Key: SOLR-3534
> URL: https://issues.apache.org/jira/browse/SOLR-3534
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Affects Versions: 4.0
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Attachments: 
> SOLR-3534_dismax_and_edismax_should_default_to_df_if_qf_is_absent.patch
>
>
> The dismax and edismax query parsers should default to "df" when the "qf" 
> parameter is absent.  They only use the defaultSearchField in schema.xml as a 
> fallback now.







[jira] [Resolved] (LUCENE-4133) FastVectorHighlighter: A weighted approach for ordered fragments

2012-06-12 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved LUCENE-4133.


   Resolution: Fixed
Fix Version/s: 5.0

Committed in trunk and 4x.

> FastVectorHighlighter: A weighted approach for ordered fragments
> 
>
> Key: LUCENE-4133
> URL: https://issues.apache.org/jira/browse/LUCENE-4133
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/highlighter
>Affects Versions: 4.0, 5.0
>Reporter: Sebastian Lutze
>Assignee: Koji Sekiguchi
>Priority: Minor
>  Labels: FastVectorHighlighter
> Fix For: 4.0, 5.0
>
> Attachments: LUCENE-4133.patch, LUCENE-4133.patch
>
>
> The FastVectorHighlighter currently disregards IDF-weights for matching terms 
> within generated fragments. In the worst case, a fragment that contains a 
> high number of very common words is scored higher than a fragment that 
> contains *all* of the terms used in the original query.
> This patch provides ordered fragments with IDF-weighted terms:
> *For each distinct matching term per fragment:* 
> _weight = weight + IDF * boost_
> *For each fragment:* 
> _weight = weight * length * 1 / sqrt( length )_
> |weight| total weight of fragment 
> |IDF| inverse document frequency for each distinct matching term
> |boost| query boost as provided, for example _term^2_
> |length| total number of non-distinct matching terms per fragment 
> *Method:*
> {code:java}
>   public void add( int startOffset, int endOffset, List<WeightedPhraseInfo> 
> phraseInfoList ) {
> 
> float totalBoost = 0;
> 
> List<SubInfo> subInfos = new ArrayList<SubInfo>();
> HashSet<String> distinctTerms = new HashSet<String>();
> 
> int length = 0;
> for( WeightedPhraseInfo phraseInfo : phraseInfoList ){
>   subInfos.add( new SubInfo( phraseInfo.getText(), 
> phraseInfo.getTermsOffsets(), phraseInfo.getSeqnum() ) );
>   for ( TermInfo ti :  phraseInfo.getTermsInfos()) {
> if ( distinctTerms.add( ti.getText() ) )
>   totalBoost += ti.getWeight() * phraseInfo.getBoost();
> length++;
>   }
> }
> totalBoost *= length * ( 1 / Math.sqrt( length ) );
> 
> getFragInfos().add( new WeightedFragInfo( startOffset, endOffset, 
> subInfos, totalBoost ) );
>   }
> {code}
> The ranking formula should be the same as, or at least similar to, the one used 
> in QueryTermScorer.
> *This patch contains:*
> * a changed class-member in FieldPhraseList (termInfos to termsInfos)
> * a changed local variable in SimpleFieldFragList (score to totalBoost)
> * adds a missing @Override in SimpleFragListBuilder
> * class WeightedFieldFragList, an implementation of FieldFragList
> * class WeightedFragListBuilder, an implementation of BaseFragListBuilder
> * class WeightedFragListBuilderTest, a simple test-case 
> * updated docs for FVH 
> Last part (see also LUCENE-4091, LUCENE-4107, LUCENE-4113) of LUCENE-3440. 
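
To make the two-step formula above concrete, here is a small self-contained 
numeric sketch. The term texts and IDF values are invented for illustration; 
this is not code from the patch:

```java
import java.util.*;

public class FragWeightSketch {
    // weight = sum of (IDF * boost) over *distinct* matching terms,
    // then scaled by length * (1 / sqrt(length)), i.e. sqrt(length).
    static float fragmentWeight(List<String> terms, Map<String, Float> idf,
                                float boost) {
        Set<String> distinct = new HashSet<String>();
        float totalBoost = 0f;
        int length = 0;
        for (String t : terms) {
            if (distinct.add(t)) {
                totalBoost += idf.get(t) * boost;  // count each term once
            }
            length++;  // but count every match toward fragment length
        }
        return totalBoost * length * (1f / (float) Math.sqrt(length));
    }

    public static void main(String[] args) {
        Map<String, Float> idf = new HashMap<String, Float>();
        idf.put("lucene", 4.0f);  // rare term, high IDF
        idf.put("the", 0.1f);     // very common term, low IDF
        // Two fragments of equal length; one matches the rare term.
        float rare = fragmentWeight(Arrays.asList("lucene", "the", "the"), idf, 1.0f);
        float common = fragmentWeight(Arrays.asList("the", "the", "the"), idf, 1.0f);
        System.out.println(rare > common);  // prints "true"
    }
}
```

In this sketch the fragment containing the rare, high-IDF term outscores an 
equally long fragment of only common words, which is the behavior the patch 
aims for.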







[jira] [Updated] (SOLR-3534) dismax and edismax should default to "df" when "qf" is absent.

2012-06-12 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-3534:
---

Attachment: 
SOLR-3534_dismax_and_edismax_should_default_to_df_if_qf_is_absent.patch

Attached is a patch, with a test.  I pulled out this logic into a static method 
so that both dismax and edismax could use it, just as it was done for parsing 
MM.  I'll apply this patch tomorrow.

> dismax and edismax should default to "df" when "qf" is absent.
> --
>
> Key: SOLR-3534
> URL: https://issues.apache.org/jira/browse/SOLR-3534
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Affects Versions: 4.0
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Attachments: 
> SOLR-3534_dismax_and_edismax_should_default_to_df_if_qf_is_absent.patch
>
>
> The dismax and edismax query parsers should default to "df" when the "qf" 
> parameter is absent.  They only use the defaultSearchField in schema.xml as a 
> fallback now.







[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2012-06-12 Thread Joern Kottmann (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293589#comment-13293589
 ] 

Joern Kottmann commented on LUCENE-2899:


For a test you can run OpenNLP over just a piece of training data; even when 
trained on a tiny amount of data it will give good results. That does not test 
OpenNLP itself, but it is sufficient for the desired interface testing.

> Add OpenNLP Analysis capabilities as a module
> -
>
> Key: LUCENE-2899
> URL: https://issues.apache.org/jira/browse/LUCENE-2899
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2899.patch, opennlp_trunk.patch
>
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice 
> to have a submodule (under analysis) that exposed capabilities for it. Drew 
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under:
> modules/analysis/opennlp







[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2012-06-12 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293582#comment-13293582
 ] 

Grant Ingersoll commented on LUCENE-2899:
-

This really should just be a part of the analysis modules (with the exception 
of the Solr example parts).  I don't know exactly how we are handling Solr 
examples anymore, but I seem to recall the general consensus was to not 
proliferate them.  Can we just expose the functionality in the main one?

I'll update the patch to move this to the module for starters.  Not sure yet 
what to do with the example part.

> Add OpenNLP Analysis capabilities as a module
> -
>
> Key: LUCENE-2899
> URL: https://issues.apache.org/jira/browse/LUCENE-2899
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2899.patch, opennlp_trunk.patch
>
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice 
> to have a submodule (under analysis) that exposed capabilities for it. Drew 
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under:
> modules/analysis/opennlp







[jira] [Commented] (LUCENE-4139) multivalued field with offsets makes corrupt index

2012-06-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293575#comment-13293575
 ] 

Robert Muir commented on LUCENE-4139:
-

{quote}
The problem is more complicated:
How would you sum up offsets for multivalued fields? How do you do this 
correctly? If you just sum up the offsets, they don't help you anymore with 
highlighting 
(if you get multiple stored fields), although I have no idea how this should 
work at all (highlighting MV fields)...
{quote}

Not really: TermVectorsConsumer does this fine, and has for many Lucene 
releases. The problem is that FreqProxTermsWriter does it wrong; see the patch.

> multivalued field with offsets makes corrupt index 
> 
>
> Key: LUCENE-4139
> URL: https://issues.apache.org/jira/browse/LUCENE-4139
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Robert Muir
> Attachments: LUCENE-4139.patch, LUCENE-4139_test.patch, 
> LUCENE-4139_test.patch
>
>
> I was trying to beef up TestBackwardsCompatibility (LUCENE-4085) but I 
> accidentally made a corrupt index due to a typo:
> {code}
> // a field with both offsets and term vectors for a cross-check
> FieldType customType3 = new FieldType(TextField.TYPE_STORED);
> customType3.setStoreTermVectors(true);
> customType3.setStoreTermVectorPositions(true);
> customType3.setStoreTermVectorOffsets(true);
> customType3.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
> doc.add(new Field("content3", "here is more content with aaa aaa aaa", 
> customType3));
> // a field that omits only positions
> FieldType customType4 = new FieldType(TextField.TYPE_STORED);
> customType4.setStoreTermVectors(true);
> customType4.setStoreTermVectorPositions(false);
> customType4.setStoreTermVectorOffsets(true);
> customType4.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
> // check out the copy-paste typo here! i forgot to change this to content4
>  doc.add(new Field("content3", "here is more content with aaa aaa aaa", 
> customType3));
> {code}







[jira] [Updated] (LUCENE-4139) multivalued field with offsets makes corrupt index

2012-06-12 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4139:


Attachment: LUCENE-4139.patch

Patch attached. It needs review, and maybe suggestions on how to make it more 
intuitive, but it fixes the bug.

> multivalued field with offsets makes corrupt index 
> 
>
> Key: LUCENE-4139
> URL: https://issues.apache.org/jira/browse/LUCENE-4139
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Robert Muir
> Attachments: LUCENE-4139.patch, LUCENE-4139_test.patch, 
> LUCENE-4139_test.patch
>
>
> I was trying to beef up TestBackwardsCompatibility (LUCENE-4085) but I 
> accidentally made a corrupt index due to a typo:
> {code}
> // a field with both offsets and term vectors for a cross-check
> FieldType customType3 = new FieldType(TextField.TYPE_STORED);
> customType3.setStoreTermVectors(true);
> customType3.setStoreTermVectorPositions(true);
> customType3.setStoreTermVectorOffsets(true);
> customType3.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
> doc.add(new Field("content3", "here is more content with aaa aaa aaa", 
> customType3));
> // a field that omits only positions
> FieldType customType4 = new FieldType(TextField.TYPE_STORED);
> customType4.setStoreTermVectors(true);
> customType4.setStoreTermVectorPositions(false);
> customType4.setStoreTermVectorOffsets(true);
> customType4.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
> // check out the copy-paste typo here! i forgot to change this to content4
>  doc.add(new Field("content3", "here is more content with aaa aaa aaa", 
> customType3));
> {code}







[JENKINS] Lucene-Solr-trunk-Windows-Java6-64 - Build # 513 - Failure!

2012-06-12 Thread jenkins
Build: 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/513/

1 tests failed.
REGRESSION:  org.apache.solr.handler.TestReplicationHandler.test

Error Message:
expected:<498> but was:<0>

Stack Trace:
java.lang.AssertionError: expected:<498> but was:<0>
at 
__randomizedtesting.SeedInfo.seed([1CD78192E87C5B56:9483BE48468036AE]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at 
org.apache.solr.handler.TestReplicationHandler.doTestIndexAndConfigReplication(TestReplicationHandler.java:391)
at 
org.apache.solr.handler.TestReplicationHandler.test(TestReplicationHandler.java:250)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at 
org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
at 
org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)




Build Log:
[...truncated 13340 lines...]
   [junit4]   2> 37703 T1436 C63 REQ [collection1] webapp=/solr path=/replication params={command=filecontent&checksum=true&generation=7&wt=filestream&file=_3.si} status=0 QTime=0 
   [junit4]   2> 37708 T1436 C63 REQ [

[jira] [Commented] (LUCENE-4139) multivalued field with offsets makes corrupt index

2012-06-12 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293561#comment-13293561
 ] 

Uwe Schindler commented on LUCENE-4139:
---

The problem is more complicated:
How would you sum up offsets for multivalued fields? How would you do this correctly?
If you just sum up the offsets, they don't help you anymore with highlighting 
(if you get multiple stored fields), although I have no idea how this should 
work at all (highlighting MV fields)...
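One way to picture the summing in question (a hypothetical sketch, not from any patch in this thread; class and constant names are made up): each value's token offsets are relative to that value's own text, so the indexer has to add a running base, the accumulated length of earlier values plus an offset gap, to make offsets absolute and keep them monotonic.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: carry a running base so per-value token offsets become absolute.
public class OffsetAccumulator {
    static final int OFFSET_GAP = 1; // illustrative gap between consecutive values

    // Returns absolute {start, end} offsets for the whitespace tokens of each value.
    public static List<int[]> absoluteOffsets(String[] values) {
        List<int[]> out = new ArrayList<>();
        int base = 0; // total length (plus gaps) of all earlier values
        for (String value : values) {
            int pos = 0;
            for (String token : value.split(" ")) {
                int start = value.indexOf(token, pos);
                int end = start + token.length();
                out.add(new int[] { base + start, base + end });
                pos = end;
            }
            base += value.length() + OFFSET_GAP; // never reset, so offsets stay monotonic
        }
        return out;
    }
}
```

If `base` were reset for each value instead of accumulated, the next value's startOffset would fall below the previous endOffset, which is exactly the "offsets go backwards" symptom reported in LUCENE-4139.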

> multivalued field with offsets makes corrupt index 
> 
>
> Key: LUCENE-4139
> URL: https://issues.apache.org/jira/browse/LUCENE-4139
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Robert Muir
> Attachments: LUCENE-4139_test.patch, LUCENE-4139_test.patch
>
>
> I was trying to beef up TestBackwardsCompatibility (LUCENE-4085) but i 
> accidentally made a corrupt index due to a typo:
> {code}
> // a field with both offsets and term vectors for a cross-check
> FieldType customType3 = new FieldType(TextField.TYPE_STORED);
> customType3.setStoreTermVectors(true);
> customType3.setStoreTermVectorPositions(true);
> customType3.setStoreTermVectorOffsets(true);
> customType3.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
> doc.add(new Field("content3", "here is more content with aaa aaa aaa", 
> customType3));
> // a field that omits only positions
> FieldType customType4 = new FieldType(TextField.TYPE_STORED);
> customType4.setStoreTermVectors(true);
> customType4.setStoreTermVectorPositions(false);
> customType4.setStoreTermVectorOffsets(true);
> customType4.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
> // check out the copy-paste typo here! i forgot to change this to content4
>  doc.add(new Field("content3", "here is more content with aaa aaa aaa", 
> customType3));
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2012-06-12 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293562#comment-13293562
 ] 

Grant Ingersoll commented on LUCENE-2899:
-

Cool!  

I think if we could just get a very small model that can be checked in and used 
for testing purposes, that is all that would be needed.  We don't really need 
to test OpenNLP, we just need to test that the code properly interfaces with 
OpenNLP, so a really small model should be fine.  

> Add OpenNLP Analysis capabilities as a module
> -
>
> Key: LUCENE-2899
> URL: https://issues.apache.org/jira/browse/LUCENE-2899
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2899.patch, opennlp_trunk.patch
>
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice 
> to have a submodule (under analysis) that exposed capabilities for it. Drew 
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under:
> modules/analysis/opennlp




[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2012-06-12 Thread Joern Kottmann (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293560#comment-13293560
 ] 

Joern Kottmann commented on LUCENE-2899:


I am using the mentioned Corpus Server together with the Apache UIMA Cas 
Editor for labeling projects. If someone wants to set something up to label 
data, we (OpenNLP people) are happy to help with that!

> Add OpenNLP Analysis capabilities as a module
> -
>
> Key: LUCENE-2899
> URL: https://issues.apache.org/jira/browse/LUCENE-2899
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2899.patch, opennlp_trunk.patch
>
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice 
> to have a submodule (under analysis) that exposed capabilities for it. Drew 
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under:
> modules/analysis/opennlp




[jira] [Created] (LUCENE-4141) don't allow Analyzer.offsetGap/posIncGap to be negative

2012-06-12 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-4141:
---

 Summary: don't allow Analyzer.offsetGap/posIncGap to be negative
 Key: LUCENE-4141
 URL: https://issues.apache.org/jira/browse/LUCENE-4141
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir


Unrelated, but I thought about this while looking at LUCENE-4139: we should check 
that this doesn't make a corrupt index but instead that IW throws a reasonable exception.




[jira] [Commented] (LUCENE-4139) multivalued field with offsets makes corrupt index

2012-06-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293554#comment-13293554
 ] 

Robert Muir commented on LUCENE-4139:
-

Looks like we aren't summing up offsets correctly for multivalued fields, so 
they go backwards.
I added this assert to the postings writer:
  assert offsetDelta >= 0 && offsetLength >= 0 : "startOffset=" + startOffset + ",lastOffset=" + lastOffset + ",endOffset=" + endOffset;

   [junit4]> Throwable #1: java.lang.AssertionError: startOffset=26,lastOffset=34,endOffset=29
   [junit4]>at __randomizedtesting.SeedInfo.seed([76B886A04FD18EEC:D9439B78AFF692]:0)
   [junit4]>at org.apache.lucene.codecs.lucene40.Lucene40PostingsWriter.addPosition(Lucene40PostingsWriter.java:255)

> multivalued field with offsets makes corrupt index 
> 
>
> Key: LUCENE-4139
> URL: https://issues.apache.org/jira/browse/LUCENE-4139
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Robert Muir
> Attachments: LUCENE-4139_test.patch, LUCENE-4139_test.patch
>
>
> I was trying to beef up TestBackwardsCompatibility (LUCENE-4085) but i 
> accidentally made a corrupt index due to a typo:
> {code}
> // a field with both offsets and term vectors for a cross-check
> FieldType customType3 = new FieldType(TextField.TYPE_STORED);
> customType3.setStoreTermVectors(true);
> customType3.setStoreTermVectorPositions(true);
> customType3.setStoreTermVectorOffsets(true);
> customType3.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
> doc.add(new Field("content3", "here is more content with aaa aaa aaa", 
> customType3));
> // a field that omits only positions
> FieldType customType4 = new FieldType(TextField.TYPE_STORED);
> customType4.setStoreTermVectors(true);
> customType4.setStoreTermVectorPositions(false);
> customType4.setStoreTermVectorOffsets(true);
> customType4.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
> // check out the copy-paste typo here! i forgot to change this to content4
>  doc.add(new Field("content3", "here is more content with aaa aaa aaa", 
> customType3));
> {code}




[JENKINS] Lucene-Solr-trunk-Windows-Java7-64 - Build # 299 - Still Failing!

2012-06-12 Thread jenkins
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java7-64/299/

2 tests failed.
REGRESSION:  org.apache.solr.cloud.OverseerTest.testShardLeaderChange

Error Message:
Unexpected shard leader coll:collection1 shard:shard1 expected: but was:

Stack Trace:
org.junit.ComparisonFailure: Unexpected shard leader coll:collection1 shard:shard1 expected: but was:
at __randomizedtesting.SeedInfo.seed([742AB9E72396E621:AA793E10390E13D0]:0)
at org.junit.Assert.assertEquals(Assert.java:125)
at org.apache.solr.cloud.OverseerTest.verifyShardLeader(OverseerTest.java:522)
at org.apache.solr.cloud.OverseerTest.testShardLeaderChange(OverseerTest.java:677)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)


FAILED:  org.apache.solr.handler.TestReplicationHandler.test

Error Message:
expected:<498> but was:<0>

Stack Trace:
java.lang.AssertionError: expected:<498> but was:<0>
at __randomizedtesting.SeedInfo.seed([742AB9E72396E621:FC7E863D8D6A8BD9]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
  

[jira] [Commented] (LUCENE-4139) multivalued field with offsets makes corrupt index

2012-06-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293553#comment-13293553
 ] 

Robert Muir commented on LUCENE-4139:
-

I don't know what's going on with offsets for multivalued fields; will try to dig:
{noformat}
java.lang.RuntimeException: vector term=[61 61 61] field=content3 doc=0: 
startOffset=64 differs from postings startOffset=-2147483622
{noformat}

> multivalued field with offsets makes corrupt index 
> 
>
> Key: LUCENE-4139
> URL: https://issues.apache.org/jira/browse/LUCENE-4139
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Robert Muir
> Attachments: LUCENE-4139_test.patch, LUCENE-4139_test.patch
>
>
> I was trying to beef up TestBackwardsCompatibility (LUCENE-4085) but i 
> accidentally made a corrupt index due to a typo:
> {code}
> // a field with both offsets and term vectors for a cross-check
> FieldType customType3 = new FieldType(TextField.TYPE_STORED);
> customType3.setStoreTermVectors(true);
> customType3.setStoreTermVectorPositions(true);
> customType3.setStoreTermVectorOffsets(true);
> customType3.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
> doc.add(new Field("content3", "here is more content with aaa aaa aaa", 
> customType3));
> // a field that omits only positions
> FieldType customType4 = new FieldType(TextField.TYPE_STORED);
> customType4.setStoreTermVectors(true);
> customType4.setStoreTermVectorPositions(false);
> customType4.setStoreTermVectorOffsets(true);
> customType4.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
> // check out the copy-paste typo here! i forgot to change this to content4
>  doc.add(new Field("content3", "here is more content with aaa aaa aaa", 
> customType3));
> {code}




[jira] [Updated] (LUCENE-4139) multivalued field with offsets makes corrupt index

2012-06-12 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4139:


Summary: multivalued field with offsets makes corrupt index  (was: mixing up indexOptions in same IW session makes corrupt index)

> multivalued field with offsets makes corrupt index 
> 
>
> Key: LUCENE-4139
> URL: https://issues.apache.org/jira/browse/LUCENE-4139
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Robert Muir
> Attachments: LUCENE-4139_test.patch, LUCENE-4139_test.patch
>
>
> I was trying to beef up TestBackwardsCompatibility (LUCENE-4085) but i 
> accidentally made a corrupt index due to a typo:
> {code}
> // a field with both offsets and term vectors for a cross-check
> FieldType customType3 = new FieldType(TextField.TYPE_STORED);
> customType3.setStoreTermVectors(true);
> customType3.setStoreTermVectorPositions(true);
> customType3.setStoreTermVectorOffsets(true);
> customType3.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
> doc.add(new Field("content3", "here is more content with aaa aaa aaa", 
> customType3));
> // a field that omits only positions
> FieldType customType4 = new FieldType(TextField.TYPE_STORED);
> customType4.setStoreTermVectors(true);
> customType4.setStoreTermVectorPositions(false);
> customType4.setStoreTermVectorOffsets(true);
> customType4.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
> // check out the copy-paste typo here! i forgot to change this to content4
>  doc.add(new Field("content3", "here is more content with aaa aaa aaa", 
> customType3));
> {code}




[jira] [Updated] (LUCENE-4139) mixing up indexOptions in same IW session makes corrupt index

2012-06-12 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4139:


Attachment: LUCENE-4139_test.patch

Updated test: actually the bug has nothing to do with mixing up field types, as 
I forgot to use the new field type too.

It happens when you have a multivalued field.

> mixing up indexOptions in same IW session makes corrupt index 
> ---
>
> Key: LUCENE-4139
> URL: https://issues.apache.org/jira/browse/LUCENE-4139
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Robert Muir
> Attachments: LUCENE-4139_test.patch, LUCENE-4139_test.patch
>
>
> I was trying to beef up TestBackwardsCompatibility (LUCENE-4085) but i 
> accidentally made a corrupt index due to a typo:
> {code}
> // a field with both offsets and term vectors for a cross-check
> FieldType customType3 = new FieldType(TextField.TYPE_STORED);
> customType3.setStoreTermVectors(true);
> customType3.setStoreTermVectorPositions(true);
> customType3.setStoreTermVectorOffsets(true);
> customType3.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
> doc.add(new Field("content3", "here is more content with aaa aaa aaa", 
> customType3));
> // a field that omits only positions
> FieldType customType4 = new FieldType(TextField.TYPE_STORED);
> customType4.setStoreTermVectors(true);
> customType4.setStoreTermVectorPositions(false);
> customType4.setStoreTermVectorOffsets(true);
> customType4.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
> // check out the copy-paste typo here! i forgot to change this to content4
>  doc.add(new Field("content3", "here is more content with aaa aaa aaa", 
> customType3));
> {code}




[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2012-06-12 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293529#comment-13293529
 ] 

Tommaso Teofili commented on LUCENE-2899:
-

bq. I wonder how hard it would be to create much smaller ones based on training 
just a few things.
There was the idea of using the OpenNLP CorpusServer with some wikinews 
articles to train them (going back to OPENNLP-385).

> Add OpenNLP Analysis capabilities as a module
> -
>
> Key: LUCENE-2899
> URL: https://issues.apache.org/jira/browse/LUCENE-2899
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2899.patch, opennlp_trunk.patch
>
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice 
> to have a submodule (under analysis) that exposed capabilities for it. Drew 
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under:
> modules/analysis/opennlp




[jira] [Updated] (LUCENE-4139) mixing up indexOptions in same IW session makes corrupt index

2012-06-12 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4139:


Attachment: LUCENE-4139_test.patch

simple test.

> mixing up indexOptions in same IW session makes corrupt index 
> ---
>
> Key: LUCENE-4139
> URL: https://issues.apache.org/jira/browse/LUCENE-4139
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Robert Muir
> Attachments: LUCENE-4139_test.patch
>
>
> I was trying to beef up TestBackwardsCompatibility (LUCENE-4085) but i 
> accidentally made a corrupt index due to a typo:
> {code}
> // a field with both offsets and term vectors for a cross-check
> FieldType customType3 = new FieldType(TextField.TYPE_STORED);
> customType3.setStoreTermVectors(true);
> customType3.setStoreTermVectorPositions(true);
> customType3.setStoreTermVectorOffsets(true);
> customType3.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
> doc.add(new Field("content3", "here is more content with aaa aaa aaa", 
> customType3));
> // a field that omits only positions
> FieldType customType4 = new FieldType(TextField.TYPE_STORED);
> customType4.setStoreTermVectors(true);
> customType4.setStoreTermVectorPositions(false);
> customType4.setStoreTermVectorOffsets(true);
> customType4.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
> // check out the copy-paste typo here! i forgot to change this to content4
>  doc.add(new Field("content3", "here is more content with aaa aaa aaa", 
> customType3));
> {code}




[jira] [Assigned] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2012-06-12 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned LUCENE-2899:
---

Assignee: Grant Ingersoll

> Add OpenNLP Analysis capabilities as a module
> -
>
> Key: LUCENE-2899
> URL: https://issues.apache.org/jira/browse/LUCENE-2899
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2899.patch, opennlp_trunk.patch
>
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice 
> to have a submodule (under analysis) that exposed capabilities for it. Drew 
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under:
> modules/analysis/opennlp




[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2012-06-12 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293526#comment-13293526
 ] 

Grant Ingersoll commented on LUCENE-2899:
-

Very cool Lance.  The models are indeed tricky and I wonder how we can properly 
hook them into the tests, if at all.  I wonder how hard it would be to create 
much smaller ones based on training just a few things.

> Add OpenNLP Analysis capabilities as a module
> -
>
> Key: LUCENE-2899
> URL: https://issues.apache.org/jira/browse/LUCENE-2899
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2899.patch, opennlp_trunk.patch
>
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice 
> to have a submodule (under analysis) that exposed capabilities for it. Drew 
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under:
> modules/analysis/opennlp




[jira] [Created] (LUCENE-4140) IndexWriterConfig has setFlushPolicy but the class is package private

2012-06-12 Thread selckin (JIRA)
selckin created LUCENE-4140:
---

 Summary: IndexWriterConfig has setFlushPolicy but the class is package private
 Key: LUCENE-4140
 URL: https://issues.apache.org/jira/browse/LUCENE-4140
 Project: Lucene - Java
  Issue Type: Bug
Reporter: selckin


4.0




[jira] [Created] (LUCENE-4139) mixing up indexOptions in same IW session makes corrupt index

2012-06-12 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-4139:
---

 Summary: mixing up indexOptions in same IW session makes corrupt index 
 Key: LUCENE-4139
 URL: https://issues.apache.org/jira/browse/LUCENE-4139
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir


I was trying to beef up TestBackwardsCompatibility (LUCENE-4085) but i 
accidentally made a corrupt index due to a typo:
{code}
// a field with both offsets and term vectors for a cross-check
FieldType customType3 = new FieldType(TextField.TYPE_STORED);
customType3.setStoreTermVectors(true);
customType3.setStoreTermVectorPositions(true);
customType3.setStoreTermVectorOffsets(true);
customType3.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
doc.add(new Field("content3", "here is more content with aaa aaa aaa", 
customType3));
// a field that omits only positions
FieldType customType4 = new FieldType(TextField.TYPE_STORED);
customType4.setStoreTermVectors(true);
customType4.setStoreTermVectorPositions(false);
customType4.setStoreTermVectorOffsets(true);
customType4.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
// check out the copy-paste typo here! i forgot to change this to content4
 doc.add(new Field("content3", "here is more content with aaa aaa aaa", 
customType3));
{code}




[jira] [Commented] (LUCENE-4120) FST should use packed integer arrays

2012-06-12 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293513#comment-13293513
 ] 

Adrien Grand commented on LUCENE-4120:
--

@Robert: Yes, it only affects packed FSTs. In this case, backward 
compatibility would be rather easy to set up (just fill a {{GrowableWriter}} 
instead of an {{int[]}}).

@Dawid: Thanks!
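The packed-array idea under discussion can be illustrated outside of Lucene. The sketch below is a minimal standalone version of the technique only — it is not Lucene's actual {{PackedInts}}/{{GrowableWriter}} API — packing values that each need only a few bits into a long[] backing store:

```java
// Minimal illustration of a packed integer array: values that each need only
// `bitsPerValue` bits (here assumed < 64) are packed into a long[] backing
// store. Sketch of the idea only, not Lucene's PackedInts/GrowableWriter API.
class PackedIntArray {
    private final long[] blocks;
    private final int bitsPerValue;
    private final long mask;

    PackedIntArray(int size, int bitsPerValue) {
        this.bitsPerValue = bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        // round up to enough 64-bit blocks for size * bitsPerValue bits
        this.blocks = new long[(size * bitsPerValue + 63) / 64];
    }

    void set(int index, long value) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        blocks[block] = (blocks[block] & ~(mask << shift)) | ((value & mask) << shift);
        int spill = shift + bitsPerValue - 64; // bits overflowing into the next block
        if (spill > 0) {
            long hi = (value & mask) >>> (bitsPerValue - spill);
            blocks[block + 1] = (blocks[block + 1] & ~((1L << spill) - 1)) | hi;
        }
    }

    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            value |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return value & mask;
    }

    public static void main(String[] args) {
        // 1000 values at 5 bits each fit in 79 longs instead of 1000 ints
        PackedIntArray arr = new PackedIntArray(1000, 5);
        arr.set(0, 31);
        arr.set(999, 17);
        System.out.println(arr.get(0) + " " + arr.get(999)); // prints "31 17"
    }
}
```

A growable variant would simply re-pack into a wider `bitsPerValue` when a stored value no longer fits, which is roughly what filling a {{GrowableWriter}} instead of an {{int[]}} buys.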

> FST should use packed integer arrays
> 
>
> Key: LUCENE-4120
> URL: https://issues.apache.org/jira/browse/LUCENE-4120
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/FSTs
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-4120.patch, LUCENE-4120.patch
>
>
> There are some places where an int[] could be advantageously replaced with a 
> packed integer array.
> I am thinking (at least) of:
>  * FST.nodeAddress (GrowableWriter)
>  * FST.inCounts (GrowableWriter)
>  * FST.nodeRefToAddress (read-only Reader)
> The serialization/deserialization methods should be modified too in order to 
> take advantage of PackedInts.get{Reader,Writer}.




[jira] [Commented] (LUCENE-4136) TestDocumentsWriterStallControl hang (reproducible)

2012-06-12 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293506#comment-13293506
 ] 

Robert Muir commented on LUCENE-4136:
-

+1 to commit (and remove @nightly): with the patch this seed completes in 9 
seconds. I also ran the test without a seed about 20 times and it never spun 
for minutes.

Maybe it's something about my number of CPUs? This reminds me of Uwe's 2-CPU 
problem with another test...

> TestDocumentsWriterStallControl hang (reproducible)
> ---
>
> Key: LUCENE-4136
> URL: https://issues.apache.org/jira/browse/LUCENE-4136
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-4136.patch
>
>
> On trunk (probably affects 4.0 too, but trunk is where i hit it):
> ant test -Dtestcase=TestDocumentsWriterStallControl 
> -Dtests.seed=9D5404FF4A909330




[jira] [Resolved] (LUCENE-2357) Reduce transient RAM usage while merging by using packed ints array for docID re-mapping

2012-06-12 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-2357.
--

Resolution: Fixed
  Assignee: Adrien Grand  (was: Michael McCandless)

> Reduce transient RAM usage while merging by using packed ints array for docID 
> re-mapping
> 
>
> Key: LUCENE-2357
> URL: https://issues.apache.org/jira/browse/LUCENE-2357
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: 4.0, 5.0
>
> Attachments: LUCENE-2357.patch, LUCENE-2357.patch, LUCENE-2357.patch, 
> LUCENE-2357.patch, LUCENE-2357.patch
>
>
> We allocate this int[] to remap docIDs due to compaction of deleted ones.
> This uses a lot of RAM for large segment merges, and can fail to allocate due 
> to fragmentation on 32-bit JREs.
> Now that we have packed ints, a simple fix would be to use a packed int 
> array... and maybe instead of storing the absolute docID in the mapping, we could 
> store the number of deleted docs seen so far (so the remap would do a lookup then 
> a subtract).  This may add some CPU cost to merging but should bring down 
> transient RAM usage quite a bit.
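
The "lookup then subtract" remap described in the issue can be sketched in a few lines. This is an illustration of the technique only, not the committed Lucene code, and it uses a plain int[] where the real fix would use a packed array:

```java
// Sketch of the "lookup then subtract" docID remap (illustration only, not
// the committed Lucene code). Instead of storing the new docID for every old
// docID, store the count of deleted docs seen before each old docID; the new
// docID is then oldDoc - delsBefore[oldDoc]. Since that counter only reaches
// the total delete count, it needs far fewer bits per entry than an absolute
// docID, which is where a packed array saves transient RAM.
class DocIdRemap {
    static int[] buildDelsBefore(boolean[] deleted) {
        int[] delsBefore = new int[deleted.length];
        int count = 0;
        for (int doc = 0; doc < deleted.length; doc++) {
            delsBefore[doc] = count;          // deletes strictly before `doc`
            if (deleted[doc]) count++;
        }
        return delsBefore;
    }

    /** Returns the compacted docID for a live old docID. */
    static int remap(int oldDoc, int[] delsBefore) {
        return oldDoc - delsBefore[oldDoc];   // lookup then subtract
    }

    public static void main(String[] args) {
        // docs 1 and 3 are deleted; live docs 0,2,4 compact to 0,1,2
        boolean[] deleted = { false, true, false, true, false };
        int[] delsBefore = buildDelsBefore(deleted);
        System.out.println(remap(0, delsBefore) + " "
                + remap(2, delsBefore) + " " + remap(4, delsBefore)); // prints "0 1 2"
    }
}
```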




[jira] [Commented] (LUCENE-2357) Reduce transient RAM usage while merging by using packed ints array for docID re-mapping

2012-06-12 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293504#comment-13293504
 ] 

Adrien Grand commented on LUCENE-2357:
--

Committed (r1349234 on trunk and r1349241 on branch 4.x).

> Reduce transient RAM usage while merging by using packed ints array for docID 
> re-mapping
> 
>
> Key: LUCENE-2357
> URL: https://issues.apache.org/jira/browse/LUCENE-2357
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 4.0, 5.0
>
> Attachments: LUCENE-2357.patch, LUCENE-2357.patch, LUCENE-2357.patch, 
> LUCENE-2357.patch, LUCENE-2357.patch
>
>
> We allocate this int[] to remap docIDs due to compaction of deleted ones.
> This uses a lot of RAM for large segment merges, and can fail to allocate due 
> to fragmentation on 32-bit JREs.
> Now that we have packed ints, a simple fix would be to use a packed int 
> array... and maybe instead of storing the absolute docID in the mapping, we could 
> store the number of deleted docs seen so far (so the remap would do a lookup then 
> a subtract).  This may add some CPU cost to merging but should bring down 
> transient RAM usage quite a bit.




[JENKINS] Lucene-Solr-4.x-Windows-Java7-64 - Build # 47 - Failure!

2012-06-12 Thread jenkins
Build: 
http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows-Java7-64/47/

1 tests failed.
REGRESSION:  org.apache.solr.handler.TestReplicationHandler.test

Error Message:
expected:<498> but was:<0>

Stack Trace:
java.lang.AssertionError: expected:<498> but was:<0>
at 
__randomizedtesting.SeedInfo.seed([DB13DF983319FB1C:5347E0429DE596E4]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at 
org.apache.solr.handler.TestReplicationHandler.doTestIndexAndConfigReplication(TestReplicationHandler.java:391)
at 
org.apache.solr.handler.TestReplicationHandler.test(TestReplicationHandler.java:250)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at 
org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
at 
org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)




Build Log:
[...truncated 14187 lines...]
   [junit4]   2> 42483 T1920 C112 REQ [collection1] webapp=/solr 
path=/replication 
params={command=filecontent&checksum=true&generation=7&wt=filestream&file=_3.fdx}
 status=0 QTime=0 
   [junit4]   2> 42491 T1920 C112 REQ [

[JENKINS] Solr-4.x - Build # 7 - Still Failing

2012-06-12 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Solr-4.x/7/

1 tests failed.
REGRESSION:  org.apache.solr.cloud.RecoveryZkTest.testDistribSearch

Error Message:
Thread threw an uncaught exception, thread: Thread[Lucene Merge Thread #1,6,]

Stack Trace:
java.lang.RuntimeException: Thread threw an uncaught exception, thread: 
Thread[Lucene Merge Thread #1,6,]
at 
com.carrotsearch.randomizedtesting.RunnerThreadGroup.processUncaught(RunnerThreadGroup.java:96)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:857)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at 
org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
at 
org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)
Caused by: org.apache.lucene.index.MergePolicy$MergeException: 
org.apache.lucene.store.AlreadyClosedException: this Directory is closed
at __randomizedtesting.SeedInfo.seed([EE2933489339AC5B]:0)
at 
org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:507)
at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:480)
Caused by: org.apache.lucene.store.AlreadyClosedException: this Directory is 
closed
at org.apache.lucene.store.Directory.ensureOpen(Directory.java:244)
at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:241)
at 
org.apache.lucene.index.IndexFileDeleter.refresh(IndexFileDeleter.java:321)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3149)
at 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:382)
at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:451)




Build Log:
[...truncated 43487 lines...]
   [junit4]   2>
commit{dir=/usr/home/hudson/hudson-slave/workspace/Solr-4.x/checkout/solr/build/solr-core/test/J0/org.apache.solr.cloud.RecoveryZkTest-1339494931897/control/data/index,segFN=segments_2,generation=2,filenames=[_5y_Lucene40_0.prx,
 _5z.fnm, _5y.si, _5w_Lucene40_0.frq, _5w_Lucene40_0.prx, _5u_nrm.cfe, 
_5t_Lucene40_0.prx, _60.fnm, _62.fdx, _5v.fnm, _5x_1.del, _62.fdt, 
_5u_Lucene40_0.prx, _5x_Lucene40_0.tim, _62_Lucene40_0.tip, _60_Lucene40_0.tip, 
_5w.fdt, _5u_nrm.cfs, _5z.fdt, _5w.fdx, _62.si, _5y_2.del, _5u_2.del, 
_5x_Lucene40_0.tip, _5x_Lucene40_0.frq, _5v.fdx, _5s.fnm, _5z_1.del, 
_62_nrm.cfs, _5v.fdt, _5s.fdt, _5z.fdx, _5s.fdx, _60.si, _5v_Lucene40_0.frq, 
_5x.si, _62_nrm.cfe, _5v_2.del, _5t.fdx, _5x.fnm, _60_Lucene40_0.tim, _5t.fnm, 
_62_Lucene40_0.frq, _5t.fdt, _5w.si, _5u.si, _5w_Lucene40_0.tip, _5t.si, 
_5t_Lucene40_0.frq, _5s.si, _5w_Lucene40_0.tim, _60_Lucene40_0.prx, 
_62_Lucene40_0.prx, _5z.si, _5t_Lucene40_0.tim, _5y_Lucene40_0.frq, 
_60_nrm.cfs, _5z_Lucene40_0.prx, _5w_nrm.cfs, _5t_Lucene40_0.tip, _60.fdx, 
_5u.fdx, _60.fdt, _5u.fdt, _5z_Lucene40_0.tim, _5z_Lucene40_0.tip, 
_5v_Lucene40_0.tip, _5v_Lucene40_0.tim, _5x.fdt, _5s_2.del, _5t_1.del, _5x.fdx, 
_5x_nrm.cfs, _5z_Lucene40_0.frq, _5u_Lucene40_0.frq, _5w_nrm.cfe, _5x_nrm.cfe, 
_5u.fnm, _5v_Lu

[jira] [Resolved] (LUCENE-4061) Improvements to DirectoryTaxonomyWriter (synchronization and others)

2012-06-12 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-4061.


Resolution: Fixed

Committed rev 1349214 (trunk) and 1349223 (4x).

> Improvements to DirectoryTaxonomyWriter (synchronization and others)
> 
>
> Key: LUCENE-4061
> URL: https://issues.apache.org/jira/browse/LUCENE-4061
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Shai Erera
>Assignee: Shai Erera
> Fix For: 4.0, 5.0
>
> Attachments: LUCENE-4061.patch, LUCENE-4061.patch
>
>
> DirTaxoWriter synchronizes in too many places. For instance addCategory() is 
> fully synchronized, while only a small part of it needs to be.
> Additionally, getCacheMemoryUsage looks bogus - it depends on the type of the 
> TaxoWriterCache. No code uses it, so I'd like to remove it -- whoever is 
> interested can query the specific cache impl it has. Currently, only 
> Cl2oTaxoWriterCache supports it.
> If the changes will be simple, I'll port them to 3.6.1 as well.
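
The kind of narrowing described in the issue — a fully synchronized method whose fast path never mutates anything — can be sketched generically. The names below are hypothetical; this is not the actual DirectoryTaxonomyWriter code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Generic sketch of narrowing synchronization (hypothetical names, not the
// actual DirectoryTaxonomyWriter code). The "coarse" version holds the lock
// for the entire call, including the common cache-hit path; the "fine"
// version does the read-only lookup lock-free and only synchronizes the
// mutating section, re-checking under the lock.
class CategoryRegistry {
    private final Map<String, Integer> cache = new ConcurrentHashMap<>();
    private final AtomicInteger nextOrdinal = new AtomicInteger();

    // Before: every caller serializes, even on a cache hit.
    synchronized int addCategoryCoarse(String category) {
        Integer ordinal = cache.get(category);
        if (ordinal != null) return ordinal;
        int ord = nextOrdinal.getAndIncrement();
        cache.put(category, ord);
        return ord;
    }

    // After: cache hits proceed without blocking; only inserting a new
    // category takes the lock.
    int addCategoryFine(String category) {
        Integer ordinal = cache.get(category);   // lock-free fast path
        if (ordinal != null) return ordinal;
        synchronized (this) {
            ordinal = cache.get(category);       // re-check under the lock
            if (ordinal != null) return ordinal;
            int ord = nextOrdinal.getAndIncrement();
            cache.put(category, ord);
            return ord;
        }
    }
}
```

Under contention the fine-grained version lets concurrent readers of already-registered categories run in parallel, which is the point of only synchronizing the small part that needs it.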



