[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_21) - Build # 6159 - Still Failing!

2013-06-16 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/6159/
Java: 32bit/jdk1.7.0_21 -server -XX:+UseG1GC

1 tests failed.
FAILED:  org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testContains {#5 seed=[594050BB28CB401B:ED310C391B43B6BA]}

Error Message:
Should have matched I #0:ShapePair(Rect(minX=61.0,maxX=67.0,minY=-122.0,maxY=110.0) , Rect(minX=10.0,maxX=61.0,minY=-113.0,maxY=126.0)) Q:Rect(minX=50.0,maxX=64.0,minY=39.0,maxY=66.0)
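
The geometry in this message can be checked by hand. The following is a minimal sketch, not the test's own code: the class and method names are hypothetical, and the side-by-side check is only a sufficient condition chosen for this case. It does not diagnose the failure; it only shows that neither indexed rectangle contains Q alone, while the two together (they meet at x=61.0) cover it.

```java
/** Hypothetical helper; only the coordinates come from the Jenkins message. */
final class ContainsCheckSketch {

    /** Axis-aligned rectangle, same field order as in the message. */
    static final class Rect {
        final double minX, maxX, minY, maxY;

        Rect(double minX, double maxX, double minY, double maxY) {
            this.minX = minX; this.maxX = maxX; this.minY = minY; this.maxY = maxY;
        }

        /** True if this rectangle fully covers the other one. */
        boolean contains(Rect o) {
            return minX <= o.minX && o.maxX <= maxX && minY <= o.minY && o.maxY <= maxY;
        }
    }

    /**
     * Sufficient (not necessary) check that a and b jointly cover q:
     * either one covers q alone, or they meet along x and q fits in
     * both y-ranges and in the combined x-span.
     */
    static boolean pairCovers(Rect a, Rect b, Rect q) {
        if (a.contains(q) || b.contains(q)) return true;
        Rect left = a.minX <= b.minX ? a : b;
        Rect right = (left == a) ? b : a;
        boolean yFits = left.minY <= q.minY && q.maxY <= left.maxY
                     && right.minY <= q.minY && q.maxY <= right.maxY;
        boolean xMeets = right.minX <= left.maxX;
        boolean xFits = left.minX <= q.minX && q.maxX <= Math.max(left.maxX, right.maxX);
        return yFits && xMeets && xFits;
    }

    public static void main(String[] args) {
        Rect r1 = new Rect(61.0, 67.0, -122.0, 110.0);
        Rect r2 = new Rect(10.0, 61.0, -113.0, 126.0);
        Rect q = new Rect(50.0, 64.0, 39.0, 66.0);
        System.out.println("r1 alone: " + r1.contains(q));
        System.out.println("r2 alone: " + r2.contains(q));
        System.out.println("pair:     " + pairCovers(r1, r2, q));
    }
}
```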

Stack Trace:
java.lang.AssertionError: Should have matched I #0:ShapePair(Rect(minX=61.0,maxX=67.0,minY=-122.0,maxY=110.0) , Rect(minX=10.0,maxX=61.0,minY=-113.0,maxY=126.0)) Q:Rect(minX=50.0,maxX=64.0,minY=39.0,maxY=66.0)
at __randomizedtesting.SeedInfo.seed([594050BB28CB401B:ED310C391B43B6BA]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.fail(SpatialOpRecursivePrefixTreeTest.java:289)
at org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.doTest(SpatialOpRecursivePrefixTreeTest.java:282)
at org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testContains(SpatialOpRecursivePrefixTreeTest.java:103)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 

[jira] [Updated] (LUCENE-5029) factor out a generic 'TermState' for better sharing in FST-based term dict

2013-06-16 Thread Han Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Han Jiang updated LUCENE-5029:
--

Attachment: LUCENE-5029.patch

This patch keeps the original 'customize termstate in PBF' design. 
It also pushes flushTermsBlock & readTermsBlock to the term dict side.

Now the rule is: if your PBF has some monotonic but 'don't care' values,
always fill in -1 for them, so that the term dict will reuse the previous values to
'pad' those -1s. Yes Mike, the algebra is really simple :)
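
The -1 rule above can be illustrated with a small sketch. This is not code from the patch: the class and method names are hypothetical, and it assumes each term carries a fixed-length long[] whose slots are compared against the previous term's slots. A slot holding -1 is replaced by the previous value, so its delta becomes 0 and monotonicity is preserved.

```java
import java.util.Arrays;

/** Hypothetical illustration of padding 'don't care' slots before delta-coding. */
final class DontCarePaddingSketch {

    /** Returns a copy of current in which every -1 is replaced by the previous term's value. */
    static long[] pad(long[] previous, long[] current) {
        long[] out = current.clone();
        for (int i = 0; i < out.length; i++) {
            if (out[i] == -1) {
                out[i] = previous[i];
            }
        }
        return out;
    }

    /** Slot-wise difference between the padded values and the previous term's values. */
    static long[] deltas(long[] previous, long[] padded) {
        long[] out = new long[padded.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = padded[i] - previous[i];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] previous = {100, 40, 7};
        long[] current = {130, -1, 9};   // the PBF does not care about slot 1 for this term
        long[] padded = pad(previous, current);
        System.out.println(Arrays.toString(padded));
        System.out.println(Arrays.toString(deltas(previous, padded)));
    }
}
```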

But I still have a problem removing that termBlockOrd from BlockTermState:
every time a caller uses seekExact(), it is expected to get a new term
state in which 'termBlockOrd' is involved. However I cannot fully 
understand how this variable works, and maybe we can use metadataUpto
to replace this? I'll try this later.

Can you put the TestDrillSideway fix in lucene3069 branch as well? 
Thanks :)


 factor out a generic 'TermState' for better sharing in FST-based term dict
 --

 Key: LUCENE-5029
 URL: https://issues.apache.org/jira/browse/LUCENE-5029
 Project: Lucene - Core
  Issue Type: Sub-task
Reporter: Han Jiang
Assignee: Han Jiang
Priority: Minor
 Fix For: 4.4

 Attachments: LUCENE-5029.algebra.patch, LUCENE-5029.algebra.patch, 
 LUCENE-5029.branch-init.patch, LUCENE-5029.patch, LUCENE-5029.patch, 
 LUCENE-5029.patch, LUCENE-5029.patch, LUCENE-5029.patch


 Currently, the two FST-based term dicts (memory codec & blocktree) both use 
 FST<BytesRef> as a base data structure. This might not share much data in 
 parent arcs, since the encoded BytesRef doesn't guarantee that 
 'Outputs.common()' always creates a long prefix. 
 For the current postings format, however, it is guaranteed that each FP (pointing to 
 .doc, .pos, etc.) increases monotonically with 'larger' terms. That 
 means that, between two Outputs, the Outputs from the smaller term can be safely 
 pushed towards the root. However, we always have some tricky TermState to deal 
 with (like the singletonDocID for the pulsing trick), so as Mike suggested, we 
 can simply cut the whole TermState into two parts: one part for comparison 
 and intersection, another for restoring generic data. Then the data structure 
 will be clear: this generic 'TermState' will consist of a fixed-length 
 LongsRef and a variable-length BytesRef. 
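
 To make the sharing argument concrete, here is a minimal sketch under stated assumptions. It is not the patch's Outputs implementation: the class and method names are hypothetical, and it assumes the "common" part of two fixed-length long[] outputs is their slot-wise minimum, which is what a monotonic FP allows. The shared part can sit on an arc nearer the root and each term keeps only its residual.

```java
import java.util.Arrays;

/** Hypothetical illustration of sharing the monotonic long[] part between two terms. */
final class GenericTermStateSketch {

    /** Slot-wise minimum: the part both outputs have in common. */
    static long[] common(long[] a, long[] b) {
        long[] out = new long[a.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.min(a[i], b[i]);
        }
        return out;
    }

    /** What remains of an output once the shared prefix has been pushed towards the root. */
    static long[] subtract(long[] output, long[] prefix) {
        long[] out = new long[output.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = output[i] - prefix[i];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] smallerTerm = {100, 40};   // e.g. file pointers for the smaller term
        long[] largerTerm = {130, 55};    // monotonic: never below the smaller term's values
        long[] shared = common(smallerTerm, largerTerm);
        System.out.println("shared:   " + Arrays.toString(shared));
        System.out.println("residual: " + Arrays.toString(subtract(largerTerm, shared)));
    }
}
```

 Because the values are monotonic, the shared part equals the smaller term's whole long[] and its own residual is all zeros, which is why it can be pushed towards the root.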

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5029) factor out a generic 'TermState' for better sharing in FST-based term dict

2013-06-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684612#comment-13684612
 ] 

Michael McCandless commented on LUCENE-5029:


Patch looks great, thanks Han!  It's so awesome to see all that hairy
terms block code disappearing from PostingsReader/Writer.

I think you should commit it to the branch and then we can iterate on
the following?:

I think only PostingsBaseWriter should have .longsSize(), and then the
terms dict should store this int itself and later load it at read
time.  This keeps the index self documenting, so an errant PBF that
reports the wrong longsSize at read time is not possible.  Also, I
think it should not take a FieldInfo.  Per-field-ness is handled
higher up (PerFieldPostingsFormat).
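
A minimal sketch of the "self documenting" idea, assuming a made-up file layout rather than Lucene's real one: the writer stores longsSize once in a header, and the reader sizes its arrays from that stored int instead of asking the postings format again. The class name and layout are hypothetical, and java.io streams stand in for Lucene's own I/O classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical layout: [longsSize][termCount][longsSize longs per term]. */
final class LongsSizeHeaderSketch {

    /** Writes longsSize once, then one fixed-length long[] per term. */
    static byte[] write(int longsSize, long[][] perTermLongs) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(longsSize);
            out.writeInt(perTermLongs.length);
            for (long[] longs : perTermLongs) {
                if (longs.length != longsSize) {
                    throw new IllegalArgumentException("expected " + longsSize + " longs per term");
                }
                for (long v : longs) {
                    out.writeLong(v);
                }
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the data back using only the longsSize recorded in the header. */
    static long[][] read(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int longsSize = in.readInt();
            int termCount = in.readInt();
            long[][] result = new long[termCount][longsSize];
            for (int t = 0; t < termCount; t++) {
                for (int i = 0; i < longsSize; i++) {
                    result[t][i] = in.readLong();
                }
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[][] terms = {{100, 40}, {130, 55}};
        long[][] back = read(write(2, terms));
        System.out.println(back.length + " terms, " + back[0].length + " longs each");
    }
}
```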

I think TempBlockTermsWriter.PendingMetaData should hold the byte[]
not the RAMOutputStream?  I think RAMOutputStream holds its buffer as
1KB sized chunks... we only need the RAMOutputStream while the PBF is
finishing that term; after that we can extract & convert to byte[] I
think.

Instead of -1 for don't care, I think TempPostingsWriterBase impls
should simply not change the value?  This is part of the contract.

Instead of making a separate PendingMetaData in the
TempBlockTermWriter, can we put the byte[] + long[] onto the existing
PendingTerm?  Then we can just pass the slice of PendingTerm down to
flushTermsBlock, fixing it to skip the block entries.

Can we rename nextTerm to decodeTerm?  (next used to be appropriate
when it was decoding the next term in the block... but that's an impl
detail of the terms dict now).

Separately from this effort, now that this issue will make the
per-term long[] visible to the terms dict, we can now easily
investigate better ways of storing that long[] data than simple
delta-coded vLongs, e.g. maybe Simple64 column stride would work
well.  But this is separate :)
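
For reference, a sketch of what "simple delta-coded vLongs" means for the per-term long[]. This is a standalone illustration, not Lucene's DataOutput code: the class and method names are hypothetical, and the encoding assumed is the usual variable-length scheme of 7 payload bits per byte with the high bit as a continuation flag, applied to slot-wise deltas from the previous term.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

/** Hypothetical illustration of delta-coding per-term long[] values as vLongs. */
final class DeltaVLongSketch {

    /** Writes a non-negative long, 7 bits per byte, low bits first. */
    static void writeVLong(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    /** Reads one value written by writeVLong. */
    static long readVLong(ByteArrayInputStream in) {
        long value = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) {
                throw new IllegalStateException("truncated vLong");
            }
            value |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
            shift += 7;
        }
    }

    /** Encodes each term's long[] as deltas against the previous term (all zeros for the first). */
    static byte[] encode(long[][] perTermLongs, int longsSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long[] previous = new long[longsSize];
        for (long[] current : perTermLongs) {
            for (int i = 0; i < longsSize; i++) {
                long delta = current[i] - previous[i];
                if (delta < 0) {
                    throw new IllegalArgumentException("values must be monotonic per slot");
                }
                writeVLong(out, delta);
            }
            previous = current;
        }
        return out.toByteArray();
    }

    /** Reverses encode by accumulating the deltas slot by slot. */
    static long[][] decode(byte[] data, int termCount, int longsSize) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        long[][] result = new long[termCount][longsSize];
        long[] previous = new long[longsSize];
        for (int t = 0; t < termCount; t++) {
            for (int i = 0; i < longsSize; i++) {
                result[t][i] = previous[i] + readVLong(in);
            }
            previous = result[t];
        }
        return result;
    }

    public static void main(String[] args) {
        long[][] terms = {{100, 40}, {130, 55}};
        byte[] encoded = encode(terms, 2);
        System.out.println(encoded.length + " bytes instead of " + (terms.length * 2 * 8));
    }
}
```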





[jira] [Commented] (LUCENE-5029) factor out a generic 'TermState' for better sharing in FST-based term dict

2013-06-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684613#comment-13684613
 ] 

Michael McCandless commented on LUCENE-5029:


bq. Can you put the TestDrillSideway fix in lucene3069 branch as well? 

Sure, I'll just sync up all trunk changes over ...




[jira] [Commented] (LUCENE-5029) factor out a generic 'TermState' for better sharing in FST-based term dict

2013-06-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684615#comment-13684615
 ] 

Michael McCandless commented on LUCENE-5029:


{quote}
But I still have a problem removing that termBlockOrd from BlockTermState:
every time a caller uses seekExact(), it is expected to get a new term
state in which 'termBlockOrd' is involved. However I cannot fully 
understand how this variable works, and maybe we can use metadataUpto
to replace this? I'll try this later.
{quote}

I think we won't be able to eliminate this, because the termBlockOrd (which 
records the position of this term in the block) is a necessary (from the term 
dict's standpoint) state for this term, because on seekExact followed by 
nextTerm, the terms dict needs to know which entry in the block to go to ...
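
A toy sketch of why the position within the block is part of the term's state. Only the name termBlockOrd comes from the discussion; the class, its methods, and the use of a sorted String[] as a "block" are hypothetical simplifications. After an exact seek, moving to the following term only works if the dictionary remembers which block entry it is on, and a saved state must carry that position to be restorable.

```java
import java.util.Arrays;

/** Hypothetical single-block terms dict that tracks the current entry's position. */
final class TermBlockOrdSketch {
    private final String[] block;   // sorted terms of one block
    private int termBlockOrd = -1;  // position of the current term within the block

    TermBlockOrdSketch(String... sortedTerms) {
        this.block = sortedTerms.clone();
    }

    /** Positions on the term if present; leaves the position unchanged otherwise. */
    boolean seekExact(String term) {
        int ord = Arrays.binarySearch(block, term);
        if (ord < 0) {
            return false;
        }
        termBlockOrd = ord;
        return true;
    }

    /** The state a caller would need to save in order to come back to this term. */
    int termState() {
        return termBlockOrd;
    }

    /** Repositions on a previously saved state. */
    void restore(int savedOrd) {
        termBlockOrd = savedOrd;
    }

    /** Advances to the following entry, or returns null at the end of the block. */
    String next() {
        if (termBlockOrd + 1 >= block.length) {
            return null;
        }
        return block[++termBlockOrd];
    }

    public static void main(String[] args) {
        TermBlockOrdSketch dict = new TermBlockOrdSketch("apple", "banana", "cherry");
        dict.seekExact("banana");
        int saved = dict.termState();
        System.out.println(dict.next());
        dict.restore(saved);
        System.out.println(dict.next());
    }
}
```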




[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary

2013-06-16 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684616#comment-13684616
 ] 

Commit Tag Bot commented on LUCENE-3069:


[lucene3069 commit] mikemccand
http://svn.apache.org/viewvc?view=revision&revision=1493493

LUCENE-3069: merge trunk changes over

 Lucene should have an entirely memory resident term dictionary
 --

 Key: LUCENE-3069
 URL: https://issues.apache.org/jira/browse/LUCENE-3069
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index, core/search
Affects Versions: 4.0-ALPHA
Reporter: Simon Willnauer
Assignee: Han Jiang
  Labels: gsoc2013
 Fix For: 4.4


 FST based TermDictionary has been a great improvement yet it still uses a 
 delta codec file for scanning to terms. Some environments have enough memory 
 available to keep the entire FST based term dict in memory. We should add a 
 TermDictionary implementation that encodes all needed information for each 
 term into the FST (custom fst.Output) and builds a FST from the entire term 
 not just the delta.




[jira] [Commented] (LUCENE-5029) factor out a generic 'TermState' for better sharing in FST-based term dict

2013-06-16 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684621#comment-13684621
 ] 

Commit Tag Bot commented on LUCENE-5029:


[lucene3069 commit] han
http://svn.apache.org/viewvc?view=revision&revision=1493494

LUCENE-5029: remove block based API from PBF




[jira] [Commented] (LUCENE-5029) factor out a generic 'TermState' for better sharing in FST-based term dict

2013-06-16 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684637#comment-13684637
 ] 

Commit Tag Bot commented on LUCENE-5029:


[lucene3069 commit] han
http://svn.apache.org/viewvc?view=revision&revision=1493502

LUCENE-5029 simplify contract on generic long[]




[jira] [Commented] (LUCENE-5029) factor out a generic 'TermState' for better sharing in FST-based term dict

2013-06-16 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684650#comment-13684650
 ] 

Commit Tag Bot commented on LUCENE-5029:


[lucene3069 commit] han
http://svn.apache.org/viewvc?view=revision&revision=1493508

LUCENE-5029: merge PendingMetaData into PendingTerm




[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary

2013-06-16 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684674#comment-13684674
 ] 

Commit Tag Bot commented on LUCENE-3069:


[lucene3069 commit] mikemccand
http://svn.apache.org/viewvc?view=revision&revision=1493516

LUCENE-3069: add nocommit/TODO




[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary

2013-06-16 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684676#comment-13684676
 ] 

Commit Tag Bot commented on LUCENE-3069:


[lucene3069 commit] han
http://svn.apache.org/viewvc?view=revision&revision=1493517

LUCENE-3069: setField now expose per-field info to term dict




[jira] [Resolved] (LUCENE-5029) factor out a generic 'TermState' for better sharing in FST-based term dict

2013-06-16 Thread Han Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Han Jiang resolved LUCENE-5029.
---

Resolution: Fixed

PostingsBase is now pluggable for non-based term dict, 
and the introduction of long[] and byte[] naturally helps 
the delta-encoding in both block-based term dict, and 
FST-based term dict.




[jira] [Comment Edited] (LUCENE-5029) factor out a generic 'TermState' for better sharing in FST-based term dict

2013-06-16 Thread Han Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684678#comment-13684678
 ] 

Han Jiang edited comment on LUCENE-5029 at 6/16/13 2:49 PM:


PostingsBase is now pluggable for non-block based term dict, 
and the introduction of long[] and byte[] naturally helps 
the delta-encoding in both block-based term dict, and 
FST-based term dict.

  was (Author: billy):
PostingsBase is now pluggable for non-based term dict, 
and the introduction of long[] and byte[] naturally helps 
the delta-encoding in both block-based term dict, and 
FST-based term dict.
  
 factor out a generic 'TermState' for better sharing in FST-based term dict
 --

 Key: LUCENE-5029
 URL: https://issues.apache.org/jira/browse/LUCENE-5029
 Project: Lucene - Core
  Issue Type: Sub-task
Reporter: Han Jiang
Assignee: Han Jiang
Priority: Minor
 Fix For: 4.4

 Attachments: LUCENE-5029.algebra.patch, LUCENE-5029.algebra.patch, 
 LUCENE-5029.branch-init.patch, LUCENE-5029.patch, LUCENE-5029.patch, 
 LUCENE-5029.patch, LUCENE-5029.patch, LUCENE-5029.patch


 Currently, the two FST-based term dicts (memory codec & blocktree) both use 
 FST<BytesRef> as the base data structure. This might not share much data in 
 parent arcs, since the encoded BytesRef doesn't guarantee that 
 'Outputs.common()' always creates a long prefix. 
 For the current postings format, however, it is guaranteed that each FP (pointing to 
 .doc, .pos, etc.) increases monotonically with 'larger' terms. That 
 means that, between two Outputs, the Outputs from the smaller term can be safely 
 pushed towards the root. However, we always have some tricky TermState to deal 
 with (like the singletonDocID for the pulsing trick), so as Mike suggested, we 
 can simply cut the whole TermState into two parts: one part for comparison 
 and intersection, another for restoring generic data. Then the data structure 
 will be clear: this generic 'TermState' will consist of a fixed-length 
 LongsRef and a variable-length BytesRef. 
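
To make the idea concrete, here is a minimal sketch of why a monotonic long[] part shares well. It is not Lucene's actual Outputs implementation: the class name GenericTermStateSketch and the helpers common, subtract and add are hypothetical, and both arrays are assumed to have the same fixed length. Because every file pointer only grows with larger terms, the element-wise minimum of two outputs can be pushed towards the root and only the deltas kept further down.

```java
import java.util.Arrays;

/** Hypothetical illustration only; these are not Lucene's actual classes. */
final class GenericTermStateSketch {
    final long[] longs;  // fixed-length, monotonic part (e.g. file pointers)
    final byte[] bytes;  // variable-length, opaque part (e.g. a pulsed docID)

    GenericTermStateSketch(long[] longs, byte[] bytes) {
        this.longs = longs.clone();
        this.bytes = bytes.clone();
    }

    /** Shared part of two long[] outputs: the element-wise minimum. */
    static long[] common(long[] a, long[] b) {
        long[] out = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = Math.min(a[i], b[i]);
        }
        return out;
    }

    /** What remains of value once the shared prefix has been pushed up. */
    static long[] subtract(long[] value, long[] prefix) {
        long[] out = new long[value.length];
        for (int i = 0; i < value.length; i++) {
            out[i] = value[i] - prefix[i];
        }
        return out;
    }

    /** Rebuilds the full value from a shared prefix and a delta. */
    static long[] add(long[] prefix, long[] delta) {
        long[] out = new long[prefix.length];
        for (int i = 0; i < prefix.length; i++) {
            out[i] = prefix[i] + delta[i];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] smaller = {100, 40};  // made-up file pointers for a smaller term
        long[] larger = {130, 55};   // made-up file pointers for a larger term
        long[] shared = common(smaller, larger);
        long[] delta = subtract(larger, shared);
        System.out.println(Arrays.toString(shared) + " + " + Arrays.toString(delta));
    }
}
```

The byte[] part is left opaque in this sketch; it would only be stored and restored, never compared.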

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4910) solr.xml persistence is completely broken

2013-06-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-4910:
-

Attachment: SOLR-4910.patch

OK, if all the tests pass (running now), I think this is ready and I'll put it 
up tonight or tomorrow unless there are objections.

This patch goes against trunk...

This takes care of 4 bugs and 5 things that testing flushed out; see 
solr/CHANGES.txt.

Shawn (or anyone, for that matter), if you have a chance to run this through any 
exercises it would be a Good Thing, especially to see whether I made the right 
decisions around swapping and renaming. 

Under any circumstances, though, unless someone finds something horribly wrong 
or the tests blow up, I think this improves solr.xml persistence significantly 
and I'll check it in Real Soon Now.

 solr.xml persistence is completely broken
 -

 Key: SOLR-4910
 URL: https://issues.apache.org/jira/browse/SOLR-4910
 Project: Solr
  Issue Type: Bug
Affects Versions: 5.0, 4.4
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Blocker
 Attachments: SOLR-4910.patch, SOLR-4910.patch, SOLR-4910.patch, 
 SOLR-4910.patch, SOLR-4910.patch


 I'm working on SOLR-4862 (persisting a created core doesn't preserve some 
 values) and, at least compared to the 4.3 code, persisting to solr.xml is 
 completely broken.
 I learned to hate persistence while working on SOLR-4196 etc. and I'm glad 
 it's going away. I frequently got lost in implicit properties (they're easy 
 to persist and shouldn't be) and in what should or shouldn't be persisted (e.g. the 
 translated ${var:default} or the original). It was a monster, so don't 
 think I'm nostalgic for the historical behavior.
 Before I dive back in, I want to get some idea of whether the current 
 behavior was intentional; I don't want to go back into that junk only 
 to undo someone else's work.
 Creating a new core (collection2 in my example) with persistence turned on in 
 solr.xml, for instance, changes the original definition for collection1 (stock 
 4.x as of tonight) from this:
 <core name="collection1" instanceDir="collection1" shard="${shard:}" 
 collection="${collection:collection1}" config="${solrconfig:solrconfig.xml}" 
 schema="${schema:schema.xml}"
   coreNodeName="${coreNodeName:}"/>
 to this:
 <core loadOnStartup="true" shard="${shard:}" instanceDir="collection1/" 
 transient="false" name="collection1" dataDir="data/" 
 collection="${collection:collection1}">
   <property name="name" value="collection1"/>
   <property name="config" value="solrconfig.xml"/>
   <property name="solr.core.instanceDir" value="solr/collection1/"/>
   <property name="transient" value="false"/>
   <property name="schema" value="schema.xml"/>
   <property name="loadOnStartup" value="true"/>
   <property name="solr.core.schemaName" value="schema.xml"/>
   <property name="solr.core.name" value="collection1"/>
   <property name="solr.core.dataDir" value="data/"/>
   <property name="instanceDir" value="collection1/"/>
   <property name="solr.core.configName" value="solrconfig.xml"/>
 </core>
 So, there are two questions:
 1. What is correct for 4.x?
 2. Do we care at all about 5.x?
 As much as I hate to say it, I think that we need to go back to the 4.3 
 behavior. It might be as simple as not persisting in the <property> tags 
 anything already in the original definition. I'm not quite sure what to put where 
 in the newly-created core, though; I suspect that the compact <core> + attribs 
 would be best (assuming there's no <property> tag already in the definition). 
 I really hate the mix of attributes on the <core> tag and <property> tags; 
 I wish we had one or the other.




[jira] [Commented] (SOLR-4910) solr.xml persistence is completely broken

2013-06-16 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684706#comment-13684706
 ] 

Shawn Heisey commented on SOLR-4910:


That was a bit of a bear to apply to 4x, but I got it done.  I don't have 
anything real set up with trunk where I can easily work on it with my index 
building code.  Perhaps I should set up a fourth index chain for that.

I will poke around a bit.

 solr.xml persistence is completely broken
 -

 Key: SOLR-4910
 URL: https://issues.apache.org/jira/browse/SOLR-4910
 Project: Solr
  Issue Type: Bug
Affects Versions: 5.0, 4.4
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Blocker
 Attachments: SOLR-4910.patch, SOLR-4910.patch, SOLR-4910.patch, 
 SOLR-4910.patch, SOLR-4910.patch





[jira] [Updated] (SOLR-4914) Refactor core persistence to reflect deprecating the core tags in solr.xml

2013-06-16 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated SOLR-4914:


Attachment: SOLR-4914.patch

Patch with my latest status.  It won't compile yet, as I haven't updated all 
tests to reflect the new API.  Will work on that next.

This combines core discovery and persistence into the CoresListPersistor 
interface.  All persistence logic is removed from CoreContainer and ConfigSolr 
and put into the two implementing classes of CorePropertiesPersistor and 
SolrXMLPersistor.

CoreDescriptor is tidied up a bit, and made effectively immutable (would be 
nicer to make it really immutable, maybe with an ImmutableProperties class).  
The original pre-substitution parameters are stored as well as the values after 
substitution, which makes persistence a lot easier (you just read from the 
original values).
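
As a rough illustration of keeping both the pre-substitution and the substituted values, here is a minimal sketch. It is not the code in the patch: the class CoreDescriptorSketch and its methods are hypothetical names, and the handling of an undefined variable without a default (the placeholder is left untouched) is a simplifying assumption.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; not the CoreDescriptor from the patch. */
final class CoreDescriptorSketch {
    // Matches ${name} and ${name:default}
    private static final Pattern VAR =
        Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    private final Map<String, String> original;     // as written in solr.xml
    private final Map<String, String> substituted;  // after ${var:default} resolution

    CoreDescriptorSketch(Map<String, String> attrs, Properties props) {
        Map<String, String> orig = new LinkedHashMap<>(attrs);
        Map<String, String> subst = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : orig.entrySet()) {
            subst.put(e.getKey(), substitute(e.getValue(), props));
        }
        // Unmodifiable views make the descriptor effectively immutable.
        this.original = Collections.unmodifiableMap(orig);
        this.substituted = Collections.unmodifiableMap(subst);
    }

    static String substitute(String value, Properties props) {
        Matcher m = VAR.matcher(value);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // group(2) is null when no default was given
            String resolved = props.getProperty(m.group(1), m.group(2));
            if (resolved == null) {
                resolved = m.group();  // assumption: leave unknown variables as-is
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(resolved));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    /** Value the running core should use. */
    String effective(String name) {
        return substituted.get(name);
    }

    /** Value to write back to solr.xml: always the original text. */
    String toPersist(String name) {
        return original.get(name);
    }
}
```

With this shape, persistence never has to guess whether a value came from a default or from a system property; it simply writes back what it read.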

Solr.xml persistence is also made a lot simpler, by just storing everything 
around the <cores/> tag as a flat string, and only updating the <core> tags.  
So you don't need to remember to add new solr.xml parameters to core 
persistence logic any more, and things like comments will be preserved.
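
The flat-string approach could look roughly like the sketch below. This is an illustration under stated assumptions, not the SolrXMLPersistor from the patch: the class SolrXmlSpliceSketch is a hypothetical name, it locates the cores element with plain string searches, and it assumes the element is written with separate opening and closing tags rather than in self-closing form.

```java
import java.util.List;

/** Hypothetical sketch; not the SolrXMLPersistor from the patch. */
final class SolrXmlSpliceSketch {
    private final String prefix;  // everything up to and including the opening cores tag
    private final String suffix;  // the closing cores tag and everything after it

    SolrXmlSpliceSketch(String xml) {
        int open = xml.indexOf("<cores");
        if (open < 0) {
            throw new IllegalArgumentException("no cores element found");
        }
        int openEnd = xml.indexOf('>', open) + 1;
        int close = xml.lastIndexOf("</cores>");
        if (openEnd == 0 || close < openEnd) {
            throw new IllegalArgumentException("cores element is not open/close form");
        }
        this.prefix = xml.substring(0, openEnd);
        this.suffix = xml.substring(close);
    }

    /** Re-emits the file with only the core entries replaced. */
    String render(List<String> coreTags) {
        StringBuilder sb = new StringBuilder(prefix).append('\n');
        for (String tag : coreTags) {
            sb.append(tag).append('\n');
        }
        return sb.append(suffix).toString();
    }
}
```

Because the prefix and suffix are copied verbatim, comments and any attributes the persistence code does not know about survive a rewrite unchanged.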

This is a pretty big patch, and there's still a fair amount to do, but I'd be 
grateful for some preliminary reviews.  I think it simplifies the whole core 
discovery/persistence logic a lot.

 Refactor core persistence to reflect deprecating the core tags in solr.xml
 

 Key: SOLR-4914
 URL: https://issues.apache.org/jira/browse/SOLR-4914
 Project: Solr
  Issue Type: Improvement
Affects Versions: 5.0
Reporter: Erick Erickson
 Attachments: SOLR-4914.patch, SOLR-4914.patch


 Alan Woodward has done some work to refactor how core persistence works that 
 we should work on going forward that I want to separate from a shorter-term 
 tactical problem (See SOLR-4910).
 I'm attaching Alan's patch to this JIRA and we'll carry it forward separately 
 from 4910.




[jira] [Assigned] (SOLR-4914) Refactor core persistence to reflect deprecating the core tags in solr.xml

2013-06-16 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward reassigned SOLR-4914:
---

Assignee: Alan Woodward

 Refactor core persistence to reflect deprecating the core tags in solr.xml
 

 Key: SOLR-4914
 URL: https://issues.apache.org/jira/browse/SOLR-4914
 Project: Solr
  Issue Type: Improvement
Affects Versions: 5.0
Reporter: Erick Erickson
Assignee: Alan Woodward
 Attachments: SOLR-4914.patch, SOLR-4914.patch





[jira] [Updated] (SOLR-4914) Factor out core discovery and persistence logic

2013-06-16 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated SOLR-4914:


Summary: Factor out core discovery and persistence logic  (was: Refactor 
core persistence to reflect deprecating the core tags in solr.xml)

 Factor out core discovery and persistence logic
 ---

 Key: SOLR-4914
 URL: https://issues.apache.org/jira/browse/SOLR-4914
 Project: Solr
  Issue Type: Improvement
Affects Versions: 5.0
Reporter: Erick Erickson
Assignee: Alan Woodward
 Attachments: SOLR-4914.patch, SOLR-4914.patch





unsubscribe

2013-06-16 Thread Michael Aro
unsubscribe


[jira] [Created] (SOLR-4930) Make PathHierarchyTokenizer use regex and optionally prefix the depth of the path.

2013-06-16 Thread John Berryman (JIRA)
John Berryman created SOLR-4930:
---

 Summary: Make PathHierarchyTokenizer use regex and optionally 
prefix the depth of the path.
 Key: SOLR-4930
 URL: https://issues.apache.org/jira/browse/SOLR-4930
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Reporter: John Berryman
Priority: Minor


The PathHierarchyTokenizer lacks a couple of features that I think are commonly 
needed.

1. Split and replace based upon regex.
2. Optionally prefix the token with the depth of the path token

Motivation: I recently had a client who asked me to index laws that were 
organized into chapters, sections, subsections, etc. The problem was that the 
section numbers used a mixture of delimiters (e.g. 13.4-64.2), so I had to use 
pattern replacement to map either delimiter to a tilde. But the next problem was 
that these could no longer be displayed as facets (at least not without extra 
code on the front end). Also, I wanted to prefix the depth of the path at the 
front of the token. Again, I can achieve this with pattern replacement, but it 
is ugly and non-performant.

I propose we:

* update PathHierarchyTokenizer so that if the parameters for delimiter or 
replacement are single characters, then the behavior of PathHierarchyTokenizer 
remains unchanged, but if the length of these arguments is greater than one, 
then they are interpreted as regexes.
* add a new parameter called depthPrefixNumChars that indicates how many 
characters will be used for a depth prefix - this defaults to zero
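
To show the intended token stream, here is a small standalone sketch of the two proposed behaviors. It is plain Java, not a Lucene Tokenizer and not the linked RegexPathHierarchyTokenizer: the class RegexPathHierarchySketch and its tokens method are hypothetical, the depth is assumed to be zero-based and zero-padded, and a leading delimiter is not treated specially as the real PathHierarchyTokenizer would.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of the proposal; not a Lucene Tokenizer. */
final class RegexPathHierarchySketch {
    /**
     * Splits path on delimiterRegex and emits one token per depth, joining the
     * parts with replacement and prefixing a zero-padded, zero-based depth that
     * is depthPrefixNumChars wide (no prefix when that is 0).
     */
    static List<String> tokens(String path, String delimiterRegex,
                               String replacement, int depthPrefixNumChars) {
        String[] parts = Pattern.compile(delimiterRegex).split(path);
        List<String> out = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (int depth = 0; depth < parts.length; depth++) {
            if (depth > 0) {
                prefix.append(replacement);
            }
            prefix.append(parts[depth]);
            String depthTag = depthPrefixNumChars > 0
                ? String.format("%0" + depthPrefixNumChars + "d", depth)
                : "";
            out.add(depthTag + prefix);
        }
        return out;
    }

    public static void main(String[] args) {
        // Mixed '.' and '-' delimiters, as in the section number from the motivation.
        System.out.println(tokens("13.4-64.2", "[.-]", "~", 2));
    }
}
```

A fixed-width depth prefix keeps the tokens sortable and makes it cheap to restrict a facet to one level of the hierarchy with a prefix filter.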

Here's my current first stab at it:
https://github.com/o19s/statedecoded/blob/master/solr_home/statedecoded/src/src/main/java/com/o19s/RegexPathHierarchyTokenizer.java
 This doesn't support the replacement or skip parameter yet. Before I go the 
rest of the way, I wanted to gauge interest and see if others need this.





[jira] [Updated] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2013-06-16 Thread Lance Norskog (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lance Norskog updated LUCENE-2899:
--

Attachment: LUCENE-2899-x.patch

Fixed the Chunker problem. I switched to the newly released version of the 
OpenNLP packages. The MaxEnt implementation (statistical modeling) for chunking 
changed slightly, and my test data now produces different noun & verb phrase 
chunks for the sample text.

At this point the only problem I know of is that the licenses are slightly 
wrong, so 'ant validate' fails.

These comments apply only to LUCENE-2899-x.patch, which applies to the current 
4.x and trunk codelines. LUCENE-2899.patch applies to the 4.0-4.3 
releases. It has not been upgraded to the new OpenNLP release.

 Add OpenNLP Analysis capabilities as a module
 -

 Key: LUCENE-2899
 URL: https://issues.apache.org/jira/browse/LUCENE-2899
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 4.4

 Attachments: LUCENE-2899-current.patch, LUCENE-2899.patch, 
 LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, 
 LUCENE-2899.patch, LUCENE-2899-RJN.patch, LUCENE-2899-x.patch, 
 LUCENE-2899-x.patch, LUCENE-2899-x.patch, OpenNLPFilter.java, 
 OpenNLPFilter.java, OpenNLPTokenizer.java, opennlp_trunk.patch


 Now that OpenNLP is an ASF project and has a nice license, it would be nice 
 to have a submodule (under analysis) that exposed capabilities for it. Drew 
 Farris, Tom Morton and I have code that does:
 * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
 would have to change slightly to buffer tokens)
 * NamedEntity recognition as a TokenFilter
 We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
 either payloads (PartOfSpeechAttribute?) on a token or at the same position.
 I'd propose it go under:
 modules/analysis/opennlp




[jira] [Commented] (SOLR-4910) solr.xml persistence is completely broken

2013-06-16 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684868#comment-13684868
 ] 

Commit Tag Bot commented on SOLR-4910:
--

[trunk commit] erick
http://svn.apache.org/viewvc?view=revision&revision=1493618

SOLR-4910, improvements to persisting solr.xml and misc other fixes, see 
CHANGES.txt

 solr.xml persistence is completely broken
 -

 Key: SOLR-4910
 URL: https://issues.apache.org/jira/browse/SOLR-4910
 Project: Solr
  Issue Type: Bug
Affects Versions: 5.0, 4.4
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Blocker
 Attachments: SOLR-4910.patch, SOLR-4910.patch, SOLR-4910.patch, 
 SOLR-4910.patch, SOLR-4910.patch





[jira] [Resolved] (SOLR-4910) solr.xml persistence is completely broken

2013-06-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-4910.
--

   Resolution: Fixed
Fix Version/s: 4.4
   5.0

trunk: 1493618
4x: 1493620

Also fixed in this commit:
SOLR-4862, CREATE fails to persist schema, config, and dataDir
SOLR-4363, not persisting coreLoadThreads in solr tag
SOLR-3900, logWatcher properties not persisted
SOLR-4852, cores defined as loadOnStartup=true, transient=false can't be 
searched

[~elyograg] Have at it, let's open up new JIRAs for anything you find, probably 
just assign them to me.

[~romseygeek] This commit probably makes your life more difficult, you may want 
to do an update sooner rather than later, this got somewhat bigger than I 
expected.

 solr.xml persistence is completely broken
 -

 Key: SOLR-4910
 URL: https://issues.apache.org/jira/browse/SOLR-4910
 Project: Solr
  Issue Type: Bug
Affects Versions: 5.0, 4.4
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Blocker
 Fix For: 5.0, 4.4

 Attachments: SOLR-4910.patch, SOLR-4910.patch, SOLR-4910.patch, 
 SOLR-4910.patch, SOLR-4910.patch





[jira] [Resolved] (SOLR-4862) Core admin action CREATE fails to persist some settings in solr.xml

2013-06-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-4862.
--

   Resolution: Fixed
Fix Version/s: 4.4
   5.0

Fixed as part of SOLR-4910

 Core admin action CREATE fails to persist some settings in solr.xml
 -

 Key: SOLR-4862
 URL: https://issues.apache.org/jira/browse/SOLR-4862
 Project: Solr
  Issue Type: Bug
  Components: multicore
Affects Versions: 4.3
Reporter: André Widhani
Assignee: Erick Erickson
Priority: Minor
 Fix For: 5.0, 4.4


 When I create a core with Core admin handler using these request parameters:
 action=CREATE
 name=core-tex69bbum21ctk1kq6lmkir-index3
 schema=/etc/opt/dcx/solr/conf/schema.xml
 instanceDir=/etc/opt/dcx/solr/
 config=/etc/opt/dcx/solr/conf/solrconfig.xml
 dataDir=/var/opt/dcx/solr/core-tex69bbum21ctk1kq6lmkir-index3
 in Solr 4.1, solr.xml would have the following entry:
 <core schema="/etc/opt/dcx/solr/conf/schema.xml" loadOnStartup="true" 
 instanceDir="/etc/opt/dcx/solr/" transient="false" 
 name="core-tex69bbum21ctk1kq6lmkir-index3" 
 config="/etc/opt/dcx/solr/conf/solrconfig.xml" 
 dataDir="/var/opt/dcx/solr/core-tex69bbum21ctk1kq6lmkir-index3/" 
 collection="core-tex69bbum21ctk1kq6lmkir-index3"/>
 while in Solr 4.3 schema, config and dataDir will be missing:
 <core loadOnStartup="true" instanceDir="/etc/opt/dcx/solr/" 
 transient="false" name="core-tex69bbum21ctk1kq6lmkir-index3" 
 collection="core-tex69bbum21ctk1kq6lmkir-index3"/>
 The new core would use the settings specified during CREATE, but after a Solr 
 restart they are lost (fall back to some defaults), as they are not persisted 
 in solr.xml. I should add that solr.xml has persistent="true" in the root 
 element.
 http://lucene.472066.n3.nabble.com/Core-admin-action-quot-CREATE-quot-fails-to-persist-some-settings-in-solr-xml-with-Solr-4-3-td4065786.html




[jira] [Resolved] (SOLR-4363) Inconsistent coreLoadThreads attributes in solr.xml between read/write

2013-06-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-4363.
--

   Resolution: Fixed
Fix Version/s: 4.4
   5.0

Fixed as part of SOLR-4910

 Inconsistent coreLoadThreads attributes in solr.xml between read/write
 

 Key: SOLR-4363
 URL: https://issues.apache.org/jira/browse/SOLR-4363
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.1
Reporter: Patanachai Tangchaisin
Assignee: Erick Erickson
Priority: Minor
 Fix For: 5.0, 4.4


 Solr reads coreLoadThreads from the <solr> element in solr.xml.
 However, when persistent is enabled in solr.xml, Solr writes the 
 coreLoadThreads attribute to the wrong element.
 Before starting Solr:
 {code}
 <solr persistent="true" coreLoadThreads="2">
   <cores host="localhost" adminPath="/admin/cores" hostPort="8983" 
 hostContext="solr">
.
 </solr>
 {code}
 After starting Solr:
 {code}
 <solr persistent="true">
   <cores host="localhost" adminPath="/admin/cores" coreLoadThreads="2" 
 hostPort="8080" hostContext="solr">
.
 </solr>
 {code}




[jira] [Resolved] (SOLR-3900) LogWatcher Config Not Persisted

2013-06-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-3900.
--

   Resolution: Fixed
Fix Version/s: 5.0

Fixed as part of SOLR-4910

 LogWatcher Config Not Persisted 
 

 Key: SOLR-3900
 URL: https://issues.apache.org/jira/browse/SOLR-3900
 Project: Solr
  Issue Type: Bug
  Components: multicore
Reporter: Michael Garski
Assignee: Erick Erickson
Priority: Minor
 Fix For: 5.0, 4.4


 When the solr.xml file is set to persistent="true", the <logging> element that 
 contains the LogWatcher configuration is not persisted to the new solr.xml 
 file that is written when managing the cores via core admin.




[jira] [Issue Comment Deleted] (SOLR-4852) If sharedLib is set to lib, classloader fails to find classes in lib

2013-06-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-4852:
-

Comment: was deleted

(was: Fixed as part of SOLR-4910)

 If sharedLib is set to lib, classloader fails to find classes in lib
 

 Key: SOLR-4852
 URL: https://issues.apache.org/jira/browse/SOLR-4852
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.4
 Environment: Linux bigindy5 2.6.32-358.6.1.el6.centos.plus.x86_64 #1 
 SMP Wed Apr 24 03:21:04 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
 java version 1.7.0_21
 Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
 Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
Reporter: Shawn Heisey
 Fix For: 5.0, 4.4

 Attachments: SOLR-4852.patch, SOLR-4852.patch, 
 SOLR-4852-test-failhard.txt


 I have some jars in the lib directory under solr.solr.home - DIH, ICU, and 
 MySQL.  If I set sharedLib in solr.xml to "lib" then the ICUTokenizer class 
 is not found, even though the jar is loaded (twice) during Solr startup.  If 
 I set sharedLib to another location that doesn't exist, the jars are only 
 loaded once and there is no problem.
 I'm using the old-style solr.xml on branch_4x revision 1485566.




[jira] [Resolved] (SOLR-4852) If sharedLib is set to lib, classloader fails to find classes in lib

2013-06-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-4852.
--

Resolution: Fixed

Fixed as part of SOLR-4910





[jira] [Reopened] (SOLR-4852) If sharedLib is set to lib, classloader fails to find classes in lib

2013-06-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson reopened SOLR-4852:
--


Sorry, got the wrong one when closing JIRAs related to SOLR-4910





[jira] [Resolved] (SOLR-4850) Cores defined as loadOnStartup=true and transient=true can't be queried

2013-06-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-4850.
--

   Resolution: Fixed
Fix Version/s: 4.4
   5.0

Fixed as part of SOLR-4910

 Cores defined as loadOnStartup=true and transient=true can't be queried 
 

 Key: SOLR-4850
 URL: https://issues.apache.org/jira/browse/SOLR-4850
 Project: Solr
  Issue Type: Bug
  Components: multicore
Affects Versions: 4.3, 4.2.1
Reporter: Lyubov Romanchuk
Assignee: Erick Erickson
 Fix For: 5.0, 4.4


 It seems that in order to query transient cores they must be defined with
 loadOnStartup=false.
 I define one core with loadOnStartup=true and transient=false, and the other
 cores with loadOnStartup=true and transient=true, and
 transientCacheSize is left at its default (Integer.MAX_VALUE).
 In this case CoreContainer.dynamicDescriptors will be empty, and
 CoreContainer.getCoreFromAnyList(String) and CoreContainer.getCore(String)
 return null for all transient cores.
 As a result such cores (loadOnStartup=true and transient=true) can't be 
 queried at all (neither from Query nor from Overview). 




[jira] [Commented] (SOLR-4910) solr.xml persistence is completely broken

2013-06-16 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684899#comment-13684899
 ] 

Commit Tag Bot commented on SOLR-4910:
--

[branch_4x commit] erick
http://svn.apache.org/viewvc?view=revision&revision=1493621

SOLR-4910, corrected typo in CHANGES.txt

 solr.xml persistence is completely broken
 -

 Key: SOLR-4910
 URL: https://issues.apache.org/jira/browse/SOLR-4910
 Project: Solr
  Issue Type: Bug
Affects Versions: 5.0, 4.4
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Blocker
 Fix For: 5.0, 4.4

 Attachments: SOLR-4910.patch, SOLR-4910.patch, SOLR-4910.patch, 
 SOLR-4910.patch, SOLR-4910.patch


 I'm working on SOLR-4862 (persisting a created core doesn't preserve some 
 values) and at least compared to 4.3 code, persisting to solr.xml is 
 completely broken.
 I learned to hate persistence while working on SOLR-4196  etc. and I'm glad 
 it's going away. I frequently got lost in implicit properties (they're easy 
 to persist and shouldn't be), what should/shouldn't be persisted (e.g. the 
 translated ${var:default} or the original), and it was a monster, so don't 
 think I'm nostalgic for the historical behavior.
 Before I dive back in I want to get some idea of whether the current 
 behavior was intentional; I don't want to go back into that junk only 
 to undo someone else's work.
 Creating a new core (collection2 in my example) with persistence turned on in 
 solr.xml for instance changes the original definition for collection1 (stock 
 4.x as of tonight) from this:
 <core name="collection1" instanceDir="collection1" shard="${shard:}" 
       collection="${collection:collection1}" config="${solrconfig:solrconfig.xml}" 
       schema="${schema:schema.xml}"
       coreNodeName="${coreNodeName:}"/>
 to this:
 <core loadOnStartup="true" shard="${shard:}" instanceDir="collection1/" 
       transient="false" name="collection1" dataDir="data/" 
       collection="${collection:collection1}">
   <property name="name" value="collection1"/>
   <property name="config" value="solrconfig.xml"/>
   <property name="solr.core.instanceDir" value="solr/collection1/"/>
   <property name="transient" value="false"/>
   <property name="schema" value="schema.xml"/>
   <property name="loadOnStartup" value="true"/>
   <property name="solr.core.schemaName" value="schema.xml"/>
   <property name="solr.core.name" value="collection1"/>
   <property name="solr.core.dataDir" value="data/"/>
   <property name="instanceDir" value="collection1/"/>
   <property name="solr.core.configName" value="solrconfig.xml"/>
 </core>
 So, there are two questions:
 1> what is correct for 4.x?
 2> do we care at all about 5.x?
 As much as I hate to say it, I think that we need to go back to the 4.3 
 behavior. It might be as simple as not persisting in the property tags 
 anything already in the original definition. I'm not quite sure what to put 
 where in the newly-created core though; I suspect that the compact core + 
 attribs would be best (assuming there's no property tag already in the 
 definition). I really hate the mix of attributes on the core tag and 
 property tags, and wish we had one or the other.
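
The distinction between persisting "the translated ${var:default} or the original" can be illustrated with a small Java sketch. It is not Solr's implementation: the class PlaceholderSketch and its resolve method are hypothetical names, and a placeholder with neither a property nor a default simply resolves to the empty string here, which is an assumption made to keep the example short.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not taken from the Solr code base.
final class PlaceholderSketch {
    // Matches ${name} or ${name:default}; the default part may be empty.
    private static final Pattern PLACEHOLDER =
        Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    /** Returns the translated value: each placeholder becomes its property or its default. */
    static String resolve(String original, Map<String, String> props) {
        Matcher m = PLACEHOLDER.matcher(original);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String fallback = m.group(2) == null ? "" : m.group(2);
            String value = props.getOrDefault(m.group(1), fallback);
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String original = "${solrconfig:solrconfig.xml}";
        // Original form: what the user wrote in solr.xml.
        System.out.println(original);
        // Translated form: what the running core actually uses.
        System.out.println(resolve(original, Map.of()));
        System.out.println(resolve(original, Map.of("solrconfig", "custom.xml")));
    }
}
```

Writing the translated form back to solr.xml would drop the ${...} indirection, which matches the change shown in the before/after core definitions above: config=${solrconfig:solrconfig.xml} ends up as a property with the literal value solrconfig.xml.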




[jira] [Commented] (SOLR-4910) solr.xml persistence is completely broken

2013-06-16 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684900#comment-13684900
 ] 

Commit Tag Bot commented on SOLR-4910:
--

[trunk commit] erick
http://svn.apache.org/viewvc?view=revision&revision=1493622

SOLR-4910, corrected typo in CHANGES.txt





Re: Adding a mixture of language models to Lucene 4.0

2013-06-16 Thread Otis Gospodnetic
Hi Nikita,

Speaking only for myself here... maybe explain more about what this
library does in plain English - what problem does it solve?  I had to
look up the paper (ha! a known item!):
http://www.cs.cmu.edu/~callan/Papers/sigir03-pto.pdf (add to README so
others don't have to search?)

To make it easy to add this to Lucene, you should:
* use and include ASL
* include ASL snippet in each Java class
* switch to Java for tests
* move to org.apache.lucene...

HTH,
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/





On Fri, Jun 14, 2013 at 7:43 PM, Nikita Zhiltsov
nikita.zhilt...@gmail.com wrote:
 Hi all,

 I've just published a tiny extension to Lucene 4.0, which enables a mixture
 of language models using standard FunctionQuery and ValueSource classes:
 https://github.com/nzhiltsov/lucene-mlm

 I'd like you to assess the possibility of integrating this code into Lucene.
 Appreciate any comments or fixes.

 NB. The implementation avoids using LMSimilarity per field basis, because it
 would break the computation of correct Dirichlet priors for non-matched
 terms, which the standard class LMSimilarity fails to include while
 calculating term frequencies and treats them as zero probability entries.
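
 The point about non-matched terms can be sketched in Java as follows. This is a reading of the standard Dirichlet-smoothed estimate combined linearly across fields, not code from lucene-mlm; the class MixtureLmSketch, its method names and the sample numbers are all hypothetical.

```java
// Hypothetical illustration only; not the lucene-mlm or Lucene API.
final class MixtureLmSketch {
    /** Dirichlet-smoothed p(t | field) = (tf + mu * p(t | collection)) / (fieldLength + mu). */
    static double dirichlet(long tf, long fieldLength, double collectionProb, double mu) {
        return (tf + mu * collectionProb) / (fieldLength + mu);
    }

    /** Mixture over fields: sum of lambda[f] * p(t | field f); lambdas are assumed to sum to 1. */
    static double mixture(long[] tf, long[] fieldLength, double[] collectionProb,
                          double[] lambda, double mu) {
        double p = 0.0;
        for (int f = 0; f < lambda.length; f++) {
            p += lambda[f] * dirichlet(tf[f], fieldLength[f], collectionProb[f], mu);
        }
        return p;
    }

    public static void main(String[] args) {
        // The term is absent from field 0 (tf = 0) and occurs twice in field 1.
        long[] tf = {0, 2};
        long[] fieldLength = {100, 100};
        double[] collectionProb = {0.01, 0.01};
        double[] lambda = {0.5, 0.5};
        double mu = 100.0;
        // Field 0 still contributes its prior mass instead of a zero probability.
        System.out.println(dirichlet(tf[0], fieldLength[0], collectionProb[0], mu));
        System.out.println(mixture(tf, fieldLength, collectionProb, lambda, mu));
    }
}
```

 With tf = 0 the smoothed estimate stays positive because of the mu * p(t | collection) term, so a field that does not contain the term still contributes to the mixture. Treating that field as a zero-probability entry would drop this contribution.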

 --

 Nikita Zhiltsov

 Visiting Graduate Student
 Emory University
 Intelligent Information Access Lab
 E500 Emerson Hall, Atlanta, Georgia, USA
 Phone: (404) 834-5364
 E-mail: znik...@emory.edu


 -
 Graduate Student, Research Fellow
 Kazan Federal University
 Computational Linguistics Laboratory
 Russia, 420008
 Kazan, Prof. Nuzhina Str., 1/37 room 117
 Skype: nickita.jhiltsov
 Personal page: http://cll.niimm.ksu.ru/~nzhiltsov
 E-mail: nikita.zhilt...@gmail.com

 -




[jira] [Commented] (SOLR-3076) Solr should support block joins

2013-06-16 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684935#comment-13684935
 ] 

Otis Gospodnetic commented on SOLR-3076:


This is issue #2-3 in terms of popularity.  Does it work in SolrCloud-type 
setups?


 Solr should support block joins
 ---

 Key: SOLR-3076
 URL: https://issues.apache.org/jira/browse/SOLR-3076
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
 Fix For: 5.0, 4.4

 Attachments: 27M-singlesegment-histogram.png, 27M-singlesegment.png, 
 bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch, 
 child-bjqparser.patch, dih-3076.patch, dih-config.xml, 
 parent-bjq-qparser.patch, parent-bjq-qparser.patch, Screen Shot 2012-07-17 at 
 1.12.11 AM.png, SOLR-3076-childDocs.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-7036-childDocs-solr-fork-trunk-patched, 
 solrconf-bjq-erschema-snippet.xml, solrconfig.xml.patch, 
 tochild-bjq-filtered-search-fix.patch


 Lucene has the ability to do block joins, we should add it to Solr.




[jira] [Commented] (SOLR-4221) Custom sharding

2013-06-16 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684936#comment-13684936
 ] 

Otis Gospodnetic commented on SOLR-4221:


[~noble.paul] should SOLR-4059 be closed as dupe?

 Custom sharding
 ---

 Key: SOLR-4221
 URL: https://issues.apache.org/jira/browse/SOLR-4221
 Project: Solr
  Issue Type: New Feature
Reporter: Yonik Seeley
Assignee: Noble Paul
 Attachments: SOLR-4221.patch


 Features to let users control everything about sharding/routing.




[jira] [Commented] (LUCENE-5017) SpatialOpRecursivePrefixTreeTest is failing

2013-06-16 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684951#comment-13684951
 ] 

Commit Tag Bot commented on LUCENE-5017:


[branch_4x commit] dsmiley
http://svn.apache.org/viewvc?view=revision&revision=1493637

LUCENE-5017: SpatialOpRecursivePrefixTreeTest Contains test bug.

 SpatialOpRecursivePrefixTreeTest is failing
 ---

 Key: LUCENE-5017
 URL: https://issues.apache.org/jira/browse/LUCENE-5017
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/spatial
Reporter: Michael McCandless
Assignee: David Smiley
 Fix For: 5.0, 4.4

 Attachments: LUCENE-5017_SpatialOpRecursivePrefixTreeTest_bug.patch


 This has been failing lately on trunk (e.g. on rev 1486339):
 {noformat}
 ant test  -Dtestcase=SpatialOpRecursivePrefixTreeTest 
 -Dtestmethod=testContains -Dtests.seed=456022665217DADF:2C2A2816BD2BA1C5 
 -Dtests.slow=true -Dtests.locale=nl_BE -Dtests.timezone=Poland 
 -Dtests.file.encoding=ISO-8859-1
 {noformat}
 Not sure what's up ...




Re: [JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.6.0_45) - Build # 6066 - Still Failing!

2013-06-16 Thread Smiley, David W.
So it turns out that most of the test failures were on the 4x branch, and that's
because I forgot to apply a test bug fix to the 4x branch last month (doh!).
But there is still a bug for me to find because trunk failed too, and I needed
-Dtests.multiplier=3 to reproduce it.

Thanks Dawid.

~ David

On 6/15/13 5:19 AM, Dawid Weiss dawid.we...@gmail.com wrote:

 [junit4:junit4]   2> NOTE: reproduce with: ant test
 -Dtestcase=SpatialOpRecursivePrefixTreeTest
 -Dtests.method=testContains {#1
 seed=[9166D28D6532217A:472BE5C4B7344982]} -Dtests.seed=9166D28D6532217A
 -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=uk_UA
 -Dtests.timezone=Etc/GMT-6 -Dtests.file.encoding=UTF-8

This is a problem with JUnit in general -- the name of a test method
is not really known and has to be derived from a Description object...
lots of hairy stuff. The info shown above in -Dtests.method has a full
seed (class and method-level) so if you run with:

-Dtests.seed=9166D28D6532217A:472BE5C4B7344982

it should reproduce (if it's reproducible) because then the seed is
fixed for all reiterations of @Repeat. If you provide only the first
part of the seed then the @Repeat annotation will pick a different
seed for each run (and the failures should still reproduce).

Try it.

Dawid
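
The advice above can be sketched as a small Java helper that takes the decorated test name from the report, strips the decoration, and passes the full seed through -Dtests.seed. The class SeedChainSketch and its methods are hypothetical; only the ant flags and the sample values are taken from the messages in this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of randomizedtesting or the Lucene build.
final class SeedChainSketch {
    // Matches the decoration appended to a repeated test's name,
    // e.g. " {#1 seed=[9166D28D6532217A:472BE5C4B7344982]}".
    private static final Pattern DECORATION =
        Pattern.compile("\\s*\\{#\\d+ seed=\\[([0-9A-Fa-f:]+)\\]\\}\\s*$");

    /** Returns the bare method name, e.g. "testContains". */
    static String plainMethodName(String reported) {
        return DECORATION.matcher(reported).replaceFirst("");
    }

    /** Returns the full seed from the decoration, or null if there is none. */
    static String fullSeed(String reported) {
        Matcher m = DECORATION.matcher(reported);
        return m.find() ? m.group(1) : null;
    }

    /** Builds a reproduce command that uses the bare method name and the full seed. */
    static String reproduceCommand(String testCase, String reported) {
        StringBuilder cmd = new StringBuilder("ant test -Dtestcase=").append(testCase)
            .append(" -Dtests.method=").append(plainMethodName(reported));
        String seed = fullSeed(reported);
        if (seed != null) {
            cmd.append(" -Dtests.seed=").append(seed);
        }
        return cmd.toString();
    }

    public static void main(String[] args) {
        String reported = "testContains {#1 seed=[9166D28D6532217A:472BE5C4B7344982]}";
        System.out.println(reproduceCommand("SpatialOpRecursivePrefixTreeTest", reported));
    }
}
```

Other flags from the original report (for example -Dtests.multiplier=3) may still be needed for the failure to reproduce; the sketch leaves them out.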


 Notice the -Dtests.method=testContains {#1
 seed=[9166D28D6532217A:472BE5C4B7344982]} part, which is wrong because if I
 do that, it'll not find the method to test.  If I change this to simply
 testContains, and set the seed normally -Dtests.seed=91 then I still
 can't reproduce the problem.  This test appears to have failed a bunch of
 times lately with different seeds.

 ~ David

 -- Forwarded message --
 From: Policeman Jenkins Server jenk...@thetaphi.de
 Date: Fri, Jun 14, 2013 at 9:33 PM
 Subject: [JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.6.0_45) - Build # 6066
 - Still Failing!
 To: dev@lucene.apache.org


 Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6066/
 Java: 32bit/jdk1.6.0_45 -server -XX:+UseSerialGC

 1 tests failed.
 FAILED:
 org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testContains
 {#1 seed=[9166D28D6532217A:472BE5C4B7344982]}

 Error Message:
 Shouldn't match I
 #0:ShapePair(Rect(minX=102.0,maxX=112.0,minY=-36.0,maxY=120.0) ,
 Rect(minX=168.0,maxX=175.0,minY=-1.0,maxY=11.0))
 Q:Rect(minX=0.0,maxX=256.0,minY=-128.0,maxY=128.0)

 Stack Trace:
 java.lang.AssertionError: Shouldn't match I
 #0:ShapePair(Rect(minX=102.0,maxX=112.0,minY=-36.0,maxY=120.0) ,
 Rect(minX=168.0,maxX=175.0,minY=-1.0,maxY=11.0))
 Q:Rect(minX=0.0,maxX=256.0,minY=-128.0,maxY=128.0)
 at __randomizedtesting.SeedInfo.seed([9166D28D6532217A:472BE5C4B7344982]:0)
 at org.junit.Assert.fail(Assert.java:93)
 at org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.fail(SpatialOpRecursivePrefixTreeTest.java:287)
 at org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.doTest(SpatialOpRecursivePrefixTreeTest.java:273)
 at org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testContains(SpatialOpRecursivePrefixTreeTest.java:101)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
 at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
 at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
 at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
 at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
 at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
 at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
 at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
 at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
 at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(

[jira] [Commented] (SOLR-3076) Solr should support block joins

2013-06-16 Thread Vadim Kirilchuk (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684996#comment-13684996
 ] 

Vadim Kirilchuk commented on SOLR-3076:
---

Otis, the patch has a test class 
solr/core/src/test/org/apache/solr/cloud/FullSolrCloudDistribCmdsTest.java with 
a method #testIndexQueryDeleteHierarchical which indexes, queries and then deletes 
hierarchical documents. However, it asserts only the sizes of parents, children and 
grandchildren with simple term queries (not bjq), so someone needs to check it 
manually or update the test.

By the way, what is the first issue? =)





[jira] [Comment Edited] (SOLR-3076) Solr should support block joins

2013-06-16 Thread Vadim Kirilchuk (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684996#comment-13684996
 ] 

Vadim Kirilchuk edited comment on SOLR-3076 at 6/17/13 5:46 AM:


Otis, the patch has a test class 
solr/core/src/test/org/apache/solr/cloud/FullSolrCloudDistribCmdsTest.java with 
a method #testIndexQueryDeleteHierarchical which indexes, queries and then deletes 
hierarchical documents. However, it asserts only the sizes of parents, children and 
grandchildren with simple term queries (not bjq), so someone needs to check it 
manually or update the test.

By the way, what is the #1 issue? =)

  was (Author: vkirilchuk):
Otis, the patch has a test class 
solr/core/src/test/org/apache/solr/cloud/FullSolrCloudDistribCmdsTest.java with 
a method #testIndexQueryDeleteHierarchical which indexes, queries and then deletes 
hierarchical documents. However, it asserts only the sizes of parents, children and 
grandchildren with simple term queries (not bjq), so someone needs to check it 
manually or update the test.

By the way, what is the first issue? =)
  
