Re: backward incompatibility with MockTokenFilter

2013-08-16 Thread Simon Willnauer
Hey John, this class is used for testing only. It's part of the
testing framework, and I don't think we can provide a migration
guide or backward compatibility for that package. If you rely on the
functionality, I suggest you fork the code into your code base or
move to an official alternative in the analysis jars.

simon

On Fri, Aug 16, 2013 at 7:06 AM, John Wang john.w...@gmail.com wrote:
 Hi folks:

 In release 4.3.1, MockTokenFilter has an API to turn position
 increments on/off, e.g.:

 set/getEnablePositionIncrements()

 In release 4.4.0 that API was removed, and the default behavior is now
 as if it were set to true.

 But I don't see this change documented or a migration suggestion.

 Please advise.

 Thanks

 -John
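The behavior change John describes concerns position increments. As a rough illustration (a toy model under assumed semantics, not the Lucene `TokenFilter` API), a filter that removes tokens can either leave a positional gap by bumping the next token's increment, or emit contiguous positions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model: emit "term:increment" pairs for a token stream after removing
// stopword tokens, with position increments enabled or disabled.
public class PositionIncrementDemo {
    public static List<String> filter(List<String> tokens, Set<String> stopwords,
                                      boolean enablePositionIncrements) {
        List<String> out = new ArrayList<>();
        int pendingIncrement = 1; // increment carried by the next kept token
        for (String tok : tokens) {
            if (stopwords.contains(tok)) {
                if (enablePositionIncrements) pendingIncrement++; // leave a gap
                continue; // token is removed either way
            }
            out.add(tok + ":" + pendingIncrement);
            pendingIncrement = 1;
        }
        return out;
    }

    public static void main(String[] args) {
        // Enabled (the 4.4.0 default): the removed token leaves a gap.
        System.out.println(filter(List.of("the", "quick", "fox"), Set.of("the"), true));
        // [quick:2, fox:1]
        // Disabled (the old optional mode): positions stay contiguous.
        System.out.println(filter(List.of("the", "quick", "fox"), Set.of("the"), false));
        // [quick:1, fox:1]
    }
}
```

The 4.4.0 change effectively hard-wires the first mode; phrase queries over the filtered field see the gap.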

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5168) ByteSliceReader assert trips with 32-bit oracle 1.7.0_25 + G1GC

2013-08-16 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741968#comment-13741968
 ] 

Dawid Weiss commented on LUCENE-5168:
-

I can reproduce the issue in a different scenario too (core tests), so it's 
quite definitely a compiler bug lurking somewhere.
{code}
   [junit4] ERROR   0.00s | TestSimpleExplanations (suite) 
   [junit4] Throwable #1: java.lang.AssertionError
   [junit4]at 
__randomizedtesting.SeedInfo.seed([8C5A2DB2970990FA]:0)
   [junit4]at 
org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:457)
   [junit4]at 
org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
   [junit4]at 
org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
   [junit4]at 
org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
   [junit4]at 
org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
   [junit4]at 
org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:501)
   [junit4]at 
org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:478)
   [junit4]at 
org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:615)
   [junit4]at 
org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:365)
   [junit4]at 
org.apache.lucene.index.RandomIndexWriter.getReader(RandomIndexWriter.java:307)
   [junit4]at 
org.apache.lucene.index.RandomIndexWriter.getReader(RandomIndexWriter.java:249)
   [junit4]at 
org.apache.lucene.search.TestExplanations.beforeClassTestExplanations(TestExplanations.java:82)
   [junit4]at java.lang.Thread.run(Thread.java:724)Throwable #2: 
java.lang.NullPointerException
   [junit4]at 
__randomizedtesting.SeedInfo.seed([8C5A2DB2970990FA]:0)
   [junit4]at 
org.apache.lucene.search.TestExplanations.afterClassTestExplanations(TestExplanations.java:63)
   [junit4]at java.lang.Thread.run(Thread.java:724)
   [junit4] Completed in 0.06s, 0 tests, 1 failure, 1 error  FAILURES!
{code}

Five failures out of a hundred full runs of Lucene's core tests, so it's not a 
frequent thing, but it does happen. Java 1.8 b102, 32-bit (Windows).

 ByteSliceReader assert trips with 32-bit oracle 1.7.0_25 + G1GC
 ---

 Key: LUCENE-5168
 URL: https://issues.apache.org/jira/browse/LUCENE-5168
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: java8-windows-4x-3075-console.txt


 This assertion trips (sometimes from different tests), if you run the 
 highlighting tests on branch_4x with r1512807.
 It reproduces about half the time, always only with 32-bit + G1GC (other 
 combinations do not seem to trip it; I didn't try looping or anything really, 
 though).
 {noformat}
 rmuir@beast:~/workspace/branch_4x$ svn up -r 1512807
 rmuir@beast:~/workspace/branch_4x$ ant clean
 rmuir@beast:~/workspace/branch_4x$ rm -rf .caches #this is important,
 otherwise master seed does not work!
 rmuir@beast:~/workspace/branch_4x/lucene/highlighter$ ant test
 -Dtests.jvms=2 -Dtests.seed=EBBFA6F4E80A7365 -Dargs=-server
 -XX:+UseG1GC
 {noformat}
 Originally showed up like this:
 {noformat}
 Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6874/
 Java: 32bit/jdk1.7.0_25 -server -XX:+UseG1GC
 1 tests failed.
 REGRESSION:  
 org.apache.lucene.search.postingshighlight.TestPostingsHighlighter.testUserFailedToIndexOffsets
 Error Message:
 Stack Trace:
 java.lang.AssertionError
 at 
 __randomizedtesting.SeedInfo.seed([EBBFA6F4E80A7365:1FBF811885F2D611]:0)
 at 
 org.apache.lucene.index.ByteSliceReader.readByte(ByteSliceReader.java:73)
 at org.apache.lucene.store.DataInput.readVInt(DataInput.java:108)
 at 
 org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:453)
 at 
 org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
 at org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
 at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
 at 
 org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
 at 
 org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:501)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (SOLR-3280) too many / sometimes stale CLOSE_WAIT connections from SnapPuller during / after replication

2013-08-16 Thread Bernd Fehling (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741972#comment-13741972
 ] 

Bernd Fehling commented on SOLR-3280:
-

After going from Solr 3.6 to 4.2.1 I haven't seen this anymore. There was a 
lot of rework done in SnapPuller due to multicore. Which version are you 
using?

 too many / sometimes stale CLOSE_WAIT connections from SnapPuller during / 
 after replication
 ---

 Key: SOLR-3280
 URL: https://issues.apache.org/jira/browse/SOLR-3280
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.5, 3.6, 4.0-ALPHA
Reporter: Bernd Fehling
Assignee: Robert Muir
Priority: Minor
 Attachments: SOLR-3280.patch


 There are sometimes too many, and also stale, CLOSE_WAIT connections 
 left over on the SLAVE server during/after replication.
 Normally GC should clean these up, but that is not always the case.
 Also, if a CLOSE_WAIT is hanging, the new replication won't load.
 The dirty workaround so far is to fake a TCP connection as root to that 
 connection and close it.
 After that the new replication will load, the old index and searcher are 
 released, and the system returns to normal operation.
 Background:
 The SnapPuller uses Apache HttpClient 3.x with the 
 MultiThreadedHttpConnectionManager.
 The manager holds a connection in CLOSE_WAIT after its use, for further 
 requests. This is done by calling releaseConnection. But if a connection is 
 stuck, it is not available any more, and a new connection from the pool is 
 used.
 Solution:
 After calling releaseConnection, clean up with closeIdleConnections(0).
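The proposed fix can be sketched with a toy pool model (this is not the commons-httpclient API; the `Conn` class and method bodies here are hypothetical stand-ins for the manager's behavior):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a pooled connection manager: releaseConnection() returns a
// connection to the pool still open (the peer may see it in CLOSE_WAIT),
// while closeIdleConnections(0) actually closes every idle connection.
public class IdlePoolDemo {
    static class Conn { boolean open = true; }

    private final List<Conn> idle = new ArrayList<>();

    public void releaseConnection(Conn c) {
        idle.add(c); // kept open for reuse; can linger half-closed
    }

    public void closeIdleConnections(long idleTimeoutMillis) {
        // A timeout of 0 means "close everything idle right now".
        for (Conn c : idle) c.open = false;
        idle.clear();
    }

    public int openIdleCount() { return idle.size(); }

    public static void main(String[] args) {
        IdlePoolDemo mgr = new IdlePoolDemo();
        Conn c = new Conn();
        mgr.releaseConnection(c);
        System.out.println(mgr.openIdleCount()); // 1: released but still open
        mgr.closeIdleConnections(0);             // the suggested cleanup
        System.out.println(mgr.openIdleCount()); // 0
    }
}
```

Calling the cleanup with a zero timeout right after each release trades connection reuse for a guarantee that no released socket is left dangling.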




[jira] [Updated] (LUCENE-5168) ByteSliceReader assert trips with 32-bit oracle 1.7.0_25 + G1GC

2013-08-16 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-5168:


Attachment: log.0100
log.0086
log.0078
log.0042
log.0025

Failed logs from 1.8b102 runs.




[jira] [Commented] (SOLR-5164) Can not create a collection via collections API (cloud mode)

2013-08-16 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742014#comment-13742014
 ] 

Alan Woodward commented on SOLR-5164:
-

Related:  SOLR-5099.

I think we need an explicit test for creating collections via the API, though.  
It's a bit scary that this bug can occur without the test suite complaining 
about it.  I'm busy for the next couple of days, but will have some time next 
week if nobody else gets there first.

 Can not create a collection via collections API (cloud mode)
 

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Blocker
 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml, as they're all relative.




[jira] [Commented] (LUCENE-5168) ByteSliceReader assert trips with 32-bit oracle 1.7.0_25 + G1GC

2013-08-16 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742015#comment-13742015
 ] 

Dawid Weiss commented on LUCENE-5168:
-

This issue also affects 1.7.0_21-b11 (32 bit).




[jira] [Commented] (SOLR-5152) EdgeNGramFilterFactory deletes token

2013-08-16 Thread Christoph Lingg (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742051#comment-13742051
 ] 

Christoph Lingg commented on SOLR-5152:
---

How about a property as in _WhitespaceTokenizerFactory_: preserveOriginal=1?

 EdgeNGramFilterFactory deletes token
 

 Key: SOLR-5152
 URL: https://issues.apache.org/jira/browse/SOLR-5152
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.4
Reporter: Christoph Lingg

 I am using EdgeNGramFilterFactory in my schema.xml
 {code:xml}
 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <!-- ... -->
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
             maxGramSize="10" side="front" />
   </analyzer>
 </fieldType>{code}
 Some tokens in my index consist of only one character, let's say {{R}}. 
 minGramSize is set to 2, which is bigger than the length of the token. I 
 expected the NGramFilter to leave {{R}} unchanged, but in fact it deletes 
 the token.
 For my use case this interpretation is undesirable, and probably for most 
 use cases too.
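The reported deletion, and the preserveOriginal behavior suggested in the comment, can be sketched as follows (assumed semantics for illustration, not the Lucene implementation; the preserveOriginal flag is the hypothetical addition under discussion):

```java
import java.util.ArrayList;
import java.util.List;

// Edge n-grams taken from the front of a token. A token shorter than
// minGramSize produces no grams at all, which is the "deletion" observed;
// a preserveOriginal flag would keep such tokens anyway.
public class EdgeNGramDemo {
    public static List<String> edgeNGrams(String token, int minGramSize,
                                          int maxGramSize, boolean preserveOriginal) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGramSize, token.length());
        for (int n = minGramSize; n <= upper; n++) {
            grams.add(token.substring(0, n)); // leading n characters
        }
        if (grams.isEmpty() && preserveOriginal) {
            grams.add(token); // keep short tokens instead of deleting them
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("R", 2, 10, false));    // []  -- token deleted
        System.out.println(edgeNGrams("R", 2, 10, true));     // [R]
        System.out.println(edgeNGrams("rock", 2, 10, false)); // [ro, roc, rock]
    }
}
```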




[jira] [Commented] (LUCENE-5178) doc values should allow configurable defaults

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742072#comment-13742072
 ] 

ASF subversion and git services commented on LUCENE-5178:
-

Commit 1514642 from [~rcmuir] in branch 'dev/branches/lucene5178'
[ https://svn.apache.org/r1514642 ]

LUCENE-5178: add 'missing' support to docvalues (simpletext only)

 doc values should allow configurable defaults
 -

 Key: LUCENE-5178
 URL: https://issues.apache.org/jira/browse/LUCENE-5178
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Yonik Seeley

 DocValues should somehow allow a configurable default per-field.
 Possible implementations include setting it on the field in the document or 
 registration of an IndexWriter callback.
 If we don't make the default configurable, then another option is to have 
 DocValues fields keep track of whether a value was indexed for that document 
 or not.




[jira] [Commented] (LUCENE-5178) doc values should allow configurable defaults

2013-08-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742073#comment-13742073
 ] 

Robert Muir commented on LUCENE-5178:
-

I created a patch with getDocsWithField (and the current 
FieldCache.getDocsWithField passing through to it) for doc values, so you know 
if a value is missing.

It also means that e.g. SortedDocValues returns a -1 ord for missing values, 
like the FieldCache, so it's completely consistent with FC there.

Currently SimpleText is the only codec implementing it; the other codecs 
return MatchAllBits (and that's how the back-compat will work, because they 
never had missing values before).

All tests are passing, but I want to think about strategies for the efficient 
codecs (Memory/Disk) before doing anything.

One other thing I like: if we do it this way, the codec has the chance to 
represent missing values more efficiently than if users do it themselves on 
top.
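The getDocsWithField idea can be sketched with a toy model (not the Lucene API; the class and constructor here are hypothetical): a per-field bitset records which documents actually had a value, and ord lookup returns -1 for missing documents, matching the FieldCache convention described above.

```java
import java.util.BitSet;
import java.util.Map;

// Toy doc-values field: ords maps docId -> ordinal for docs that have a
// value; the bitset answers "does this doc have a value at all?".
public class DocsWithFieldDemo {
    private final BitSet docsWithField = new BitSet();
    private final Map<Integer, Integer> ords;

    public DocsWithFieldDemo(Map<Integer, Integer> ords) {
        this.ords = ords;
        ords.keySet().forEach(docsWithField::set); // mark docs that had a value
    }

    public boolean hasValue(int docId) { return docsWithField.get(docId); }

    // -1 signals "no value for this document", like SortedDocValues/FieldCache.
    public int getOrd(int docId) { return hasValue(docId) ? ords.get(docId) : -1; }

    public static void main(String[] args) {
        DocsWithFieldDemo dv = new DocsWithFieldDemo(Map.of(0, 3, 2, 1));
        System.out.println(dv.getOrd(0)); // 3
        System.out.println(dv.getOrd(1)); // -1 -- doc 1 never got a value
    }
}
```

A codec that returns a MatchAllBits-style "every doc has a value" set for old segments preserves the previous behavior, which is the back-compat story mentioned in the comment.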





[jira] [Commented] (SOLR-5164) Can not create a collection via collections API (cloud mode)

2013-08-16 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742098#comment-13742098
 ] 

Erick Erickson commented on SOLR-5164:
--

Blast, I wish I'd paid more attention to SOLR-5099; it'd have saved me some 
time. Sigh.

[~romseygeek] I looked, and there are some collection creation tests, but I 
didn't dig enough to understand completely why the second "solr" in the path 
didn't trip this condition. What we don't seem to have is a way to restart 
from scratch. And in the case of SOLR-5099, core creation does succeed; it's 
the restart that's the problem.

FWIW.




[jira] [Resolved] (SOLR-5099) The core.properties not created during collection creation

2013-08-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-5099.
--

   Resolution: Fixed
Fix Version/s: 5.0
   4.5
 Assignee: Erick Erickson  (was: Alan Woodward)

Herb:

I stumbled across this as well. I sure wish I'd paid more attention to this 
JIRA before, you'd have saved me a couple of hours of head-scratching. Nice 
sleuthing, you nailed the problem.

Anyway, I'll check in the fixes for SOLR-5164 this morning and this will be 
fixed.

 The core.properties not created during collection creation
 --

 Key: SOLR-5099
 URL: https://issues.apache.org/jira/browse/SOLR-5099
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Herb Jiang
Assignee: Erick Erickson
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: CorePropertiesLocator.java.patch


 When using the new solr.xml structure, the core auto-discovery mechanism 
 tries to find core.properties.
 But I found that core.properties is not created when I dynamically create a 
 collection.
 The root issue is that the CorePropertiesLocator tries to create the 
 properties file before the instanceDir is created.
 The collection creation process completes and looks fine at runtime, but it 
 causes issues (cores are not auto-discovered after server restart).




Re: MoreLikeThis (MLT) - AND operator between the fields

2013-08-16 Thread Erick Erickson
I don't know enough about MLT to have an opinion one way or the other. But
it's
perfectly fine to open up a JIRA and attach your patch,
see: http://wiki.apache.org/solr/HowToContribute

Best
Erick


On Thu, Aug 15, 2013 at 12:13 PM, Kranti Parisa kranti.par...@gmail.com wrote:

 I was looking at the code and found that it is hard-coded to Occur.SHOULD
 in MoreLikeThisQuery.

 I customized the code to pass a new parameter *mlt.operator*=AND/OR;
 based on that, it computes the MLT documents. The default operator is set to
 OR. I also want to have an *mlt.sort* option, so I will be trying for that
 as well.

 Do you guys think we should make this part of the MLT feature?
 Please share your ideas. I can submit this change.


 Thanks & Regards,
 Kranti K Parisa
 http://www.linkedin.com/in/krantiparisa



 On Thu, Aug 15, 2013 at 12:05 AM, Kranti Parisa 
 kranti.par...@gmail.com wrote:

 Hi,

 It seems that when we pass multiple field names with the mlt.fl parameter, it
 ORs them to find the MLT documents.

 Is there a way to specify the AND operator? That is, if mlt.fl=language,year,
 then we should return the MLT documents whose language AND year
 field values are the same as the main query's result document.


 http://localhost:8180/solr/mltCore/mlt?q=id:1&wt=json&mlt=true&mlt.fl=language,year&fl=*,score&mlt.mindf=0&mlt.mintf=0&mlt.match.include=false

 The above query should return those documents whose field values
 (language, year) exactly match those of document id:1.

 Is this possible through any config or param? If not, I think it's worth
 having as a feature, because we don't know the values of those fields to
 apply as an FQ.


 Thanks & Regards,
 Kranti K Parisa
 http://www.linkedin.com/in/krantiparisa
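The AND/OR semantics proposed in this thread can be sketched with a toy matcher (mlt.operator is the hypothetical parameter under discussion, not a released Solr feature; this is illustrative, not the MoreLikeThisQuery code):

```java
import java.util.List;
import java.util.Map;

// OR (the current SHOULD behavior): a candidate matches if ANY listed field
// equals the seed document's value. AND (the proposal): ALL listed fields
// must match.
public class MltOperatorDemo {
    public static boolean matches(Map<String, String> seed, Map<String, String> candidate,
                                  List<String> fields, String operator) {
        boolean and = "AND".equalsIgnoreCase(operator);
        for (String f : fields) {
            boolean same = seed.get(f) != null && seed.get(f).equals(candidate.get(f));
            if (and && !same) return false; // AND: one mismatch rejects
            if (!and && same) return true;  // OR: one match accepts
        }
        return and; // AND: everything matched; OR: nothing matched
    }

    public static void main(String[] args) {
        Map<String, String> seed = Map.of("language", "en", "year", "2013");
        Map<String, String> cand = Map.of("language", "en", "year", "2012");
        System.out.println(matches(seed, cand, List.of("language", "year"), "OR"));  // true
        System.out.println(matches(seed, cand, List.of("language", "year"), "AND")); // false
    }
}
```

In Lucene terms the switch would amount to building the per-field clauses with Occur.MUST instead of the hard-coded Occur.SHOULD.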





[jira] [Created] (SOLR-5167) Ability to use AnalyzingInfixSuggester in Solr

2013-08-16 Thread Varun Thacker (JIRA)
Varun Thacker created SOLR-5167:
---

 Summary: Ability to use AnalyzingInfixSuggester in Solr
 Key: SOLR-5167
 URL: https://issues.apache.org/jira/browse/SOLR-5167
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Reporter: Varun Thacker
Priority: Minor
 Fix For: 4.5, 5.0


We should be able to use AnalyzingInfixSuggester in Solr by defining it in 
solrconfig.xml






[jira] [Commented] (SOLR-5149) Query facet to respect mincount

2013-08-16 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742123#comment-13742123
 ] 

Markus Jelsma commented on SOLR-5149:
-

The use cases mostly limit themselves to saving space when we have a large 
number of facet queries to return. Also, if our different clients toggle 
mincount with one setting but also have facet queries, we need additional code 
to maintain the behaviour. This is not a problem, only inconvenient.

Yes, facet.query.mincount sounds fine.
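The proposed facet.query.mincount semantics (a hypothetical parameter from the patch under discussion) can be sketched as a simple post-filter: facet query counts below mincount are dropped from the response instead of being returned as zeros.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Keep only facet-query entries whose count reaches mincount, preserving
// the order in which the facet queries were requested.
public class FacetMincountDemo {
    public static Map<String, Long> applyMincount(Map<String, Long> facetQueries,
                                                  long mincount) {
        Map<String, Long> kept = new LinkedHashMap<>();
        facetQueries.forEach((q, count) -> {
            if (count >= mincount) kept.put(q, count); // drop sub-mincount entries
        });
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("price:[0 TO 10]", 0L);
        counts.put("price:[10 TO 100]", 7L);
        System.out.println(applyMincount(counts, 1)); // {price:[10 TO 100]=7}
    }
}
```

This matches the space-saving use case above: with many facet queries and mincount=1, zero-count entries never reach the client.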

 Query facet to respect mincount
 ---

 Key: SOLR-5149
 URL: https://issues.apache.org/jira/browse/SOLR-5149
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 4.4
Reporter: Markus Jelsma
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-5149-trunk.patch







[jira] [Updated] (SOLR-5149) Query facet to respect mincount

2013-08-16 Thread Markus Jelsma (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated SOLR-5149:


Attachment: SOLR-5149-trunk.patch

Patch for trunk now introduces facet.query.mincount. There's no support for 
facet.zeros in this patch.




[jira] [Commented] (SOLR-5164) Can not create a collection via collections API (cloud mode)

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742134#comment-13742134
 ] 

ASF subversion and git services commented on SOLR-5164:
---

Commit 1514666 from [~erickoerickson] in branch 'dev/trunk'
[ https://svn.apache.org/r1514666 ]

SOLR-5164, Can not create a collection via collections API (cloud mode). Fixes 
SOLR-5099 too




[jira] [Commented] (SOLR-5099) The core.properties not created during collection creation

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742135#comment-13742135
 ] 

ASF subversion and git services commented on SOLR-5099:
---

Commit 1514666 from [~erickoerickson] in branch 'dev/trunk'
[ https://svn.apache.org/r1514666 ]

SOLR-5164, Can not create a collection via collections API (cloud mode). Fixes 
SOLR-5099 too




[jira] [Commented] (SOLR-5167) Ability to use AnalyzingInfixSuggester in Solr

2013-08-16 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742139#comment-13742139
 ] 

Varun Thacker commented on SOLR-5167:
-

We could define it like:
{noformat}
<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.AnalyzingInfixSuggester</str>
    <str name="field">name</str> <!-- the indexed field to derive suggestions from -->
    <str name="buildOnCommit">true</str>
    <str name="storeDir">suggester</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <str name="minPrefixChars">4</str>
  </lst>
</searchComponent>
{noformat}
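What makes the infix suggester different from a plain prefix suggester can be shown with a toy lookup (illustration only, not the AnalyzingInfixSuggester implementation): a match may start at any token inside the suggestion, not just at its beginning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Infix suggestion: the typed prefix may match ANY inner token of an entry,
// so "sugg" matches "analyzing infix suggester" even though the entry does
// not start with it.
public class InfixSuggestDemo {
    public static List<String> suggest(List<String> entries, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String entry : entries) {
            for (String token : entry.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (token.startsWith(p)) { // prefix match on any inner token
                    hits.add(entry);
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> entries = List.of("analyzing infix suggester",
                                       "fuzzy suggester", "spellcheck");
        System.out.println(suggest(entries, "sugg"));
        // [analyzing infix suggester, fuzzy suggester]
    }
}
```

The real suggester also analyzes entries with the configured field type (suggestAnalyzerFieldType above) and ranks results; this sketch only shows the infix-matching idea.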

 Ability to use AnalyzingInfixSuggester in Solr
 --

 Key: SOLR-5167
 URL: https://issues.apache.org/jira/browse/SOLR-5167
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Reporter: Varun Thacker
Priority: Minor
 Fix For: 4.5, 5.0


 We should be able to use AnalyzingInfixSuggester in Solr by defining it in 
 solrconfig.xml




[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742145#comment-13742145
 ] 

ASF subversion and git services commented on LUCENE-4583:
-

Commit 1514669 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1514669 ]

LUCENE-4583: IndexWriter no longer places a limit on length of DV binary fields 
(individual codecs still have their limits, including the default codec)

 StraightBytesDocValuesField fails if bytes > 32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Assignee: Michael McCandless
Priority: Critical
 Fix For: 5.0, 4.5

 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch


 I didn't observe any limitations on the size of a bytes-based DocValues field 
 value in the docs.  It appears that the limit is 32k, although I didn't get 
 any friendly error telling me that was the limit.  32k is kind of small IMO; 
 I suspect this limit is unintended and as such is a bug. The following 
 test fails:
 {code:java}
   public void testBigDocValue() throws IOException {
 Directory dir = newDirectory();
 IndexWriter writer = new IndexWriter(dir, writerConfig(false));
 Document doc = new Document();
 BytesRef bytes = new BytesRef((4+4)*4097);//4096 works
 bytes.length = bytes.bytes.length;//byte data doesn't matter
  doc.add(new StraightBytesDocValuesField("dvField", bytes));
 writer.addDocument(doc);
 writer.commit();
 writer.close();
 DirectoryReader reader = DirectoryReader.open(dir);
  DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
 //FAILS IF BYTES IS BIG!
 docValues.getSource().getBytes(0, bytes);
 reader.close();
 dir.close();
   }
 {code}
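The sizes in the test above are easy to check directly; this is a standalone sketch (plain Java, no Lucene dependency, hypothetical class name) showing that the failing BytesRef is just over the 32 KiB boundary while the reportedly passing one is exactly 32 KiB:

```java
// Standalone arithmetic check of the BytesRef sizes used in the test above:
// 4097 entries of (4+4) bytes exceeds 32 KiB; 4096 entries is exactly 32 KiB.
class DocValuesSizeCheck {
    static final int ENTRY_BYTES = 4 + 4;

    static int sizeFor(int entries) {
        return ENTRY_BYTES * entries;
    }
}
```

sizeFor(4097) is 32776 bytes, just past 32 * 1024 = 32768, which matches the reported ~32k limit.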




[jira] [Updated] (SOLR-5167) Ability to use AnalyzingInfixSuggester in Solr

2013-08-16 Thread Varun Thacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated SOLR-5167:


Attachment: SOLR-5167.patch

I have a few doubts about this impl.

1. AnalyzingInfixSuggester.store() and AnalyzingInfixSuggester.load() return 
true instead of false. I'm not sure whether this is right.

2. Suggester.reload() throws a FileNotFoundException since no file actually 
gets written. Any suggestions on what the right approach would be?

 Ability to use AnalyzingInfixSuggester in Solr
 --

 Key: SOLR-5167
 URL: https://issues.apache.org/jira/browse/SOLR-5167
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Reporter: Varun Thacker
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-5167.patch


 We should be able to use AnalyzingInfixSuggester in Solr by defining it in 
 solrconfig.xml




[jira] [Updated] (LUCENE-5178) doc values should expose missing values (or allow configurable defaults)

2013-08-16 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated LUCENE-5178:
-

Summary: doc values should expose missing values (or allow configurable 
defaults)  (was: doc values should allow configurable defaults)

 doc values should expose missing values (or allow configurable defaults)
 

 Key: LUCENE-5178
 URL: https://issues.apache.org/jira/browse/LUCENE-5178
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Yonik Seeley

 DocValues should somehow allow a configurable default per-field.
 Possible implementations include setting it on the field in the document or 
 registration of an IndexWriter callback.
 If we don't make the default configurable, then another option is to have 
 DocValues fields keep track of whether a value was indexed for that document 
 or not.




[jira] [Commented] (SOLR-5099) The core.properties not created during collection creation

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742163#comment-13742163
 ] 

ASF subversion and git services commented on SOLR-5099:
---

Commit 1514684 from [~erickoerickson] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1514684 ]

SOLR-5164, Can not create a collection via collections API (cloud mode). Fixes 
SOLR-5099 too

 The core.properties not created during collection creation
 --

 Key: SOLR-5099
 URL: https://issues.apache.org/jira/browse/SOLR-5099
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Herb Jiang
Assignee: Erick Erickson
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: CorePropertiesLocator.java.patch


 When using the new solr.xml structure, the core auto-discovery mechanism 
 tries to find core.properties. 
 But I found that core.properties cannot be created when I dynamically create a 
 collection.
 The root issue is that CorePropertiesLocator tries to create the properties file 
 before the instanceDir is created. 
 The collection creation process completes and looks fine at runtime, but it 
 will cause issues (cores are not auto-discovered after server restart).




[jira] [Commented] (SOLR-5164) Can not create a collection via collections API (cloud mode)

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742162#comment-13742162
 ] 

ASF subversion and git services commented on SOLR-5164:
---

Commit 1514684 from [~erickoerickson] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1514684 ]

SOLR-5164, Can not create a collection via collections API (cloud mode). Fixes 
SOLR-5099 too

 Can not create a collection via collections API (cloud mode)
 

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Blocker
 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml as they're all relative.




[jira] [Resolved] (SOLR-5164) Can not create a collection via collections API (cloud mode)

2013-08-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-5164.
--

   Resolution: Fixed
Fix Version/s: 5.0
   4.5

 Can not create a collection via collections API (cloud mode)
 

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Blocker
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml as they're all relative.




[jira] [Commented] (LUCENE-5178) doc values should expose missing values (or allow configurable defaults)

2013-08-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742182#comment-13742182
 ] 

Yonik Seeley commented on LUCENE-5178:
--

Yes, I think tracking/exposing missing values is the best option, esp. for 
numerics where you can use the full range and still tell if there was a value 
or not.
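A minimal sketch of that option (hypothetical, stdlib-only; not Lucene's actual DocValues classes): a parallel BitSet records which docs have a value, so every long, including 0, stays usable as a real value:

```java
import java.util.BitSet;

// Sketch of "track missing separately": values[] holds the data, hasValue
// records existence, so no sentinel value is stolen from the numeric range.
class MissingAwareDocValues {
    private final long[] values;
    private final BitSet hasValue = new BitSet();

    MissingAwareDocValues(int maxDoc) {
        this.values = new long[maxDoc];
    }

    void set(int docId, long value) {
        values[docId] = value;
        hasValue.set(docId);
    }

    // Missing-ness is separate state; 0 is a perfectly legal stored value.
    boolean exists(int docId) {
        return hasValue.get(docId);
    }

    long get(int docId) {
        return values[docId];
    }
}
```

With this layout, sort-missing-first/last and facet-missing can be answered from the BitSet without a per-field configurable default.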

 doc values should expose missing values (or allow configurable defaults)
 

 Key: LUCENE-5178
 URL: https://issues.apache.org/jira/browse/LUCENE-5178
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Yonik Seeley

 DocValues should somehow allow a configurable default per-field.
 Possible implementations include setting it on the field in the document or 
 registration of an IndexWriter callback.
 If we don't make the default configurable, then another option is to have 
 DocValues fields keep track of whether a value was indexed for that document 
 or not.




[jira] [Commented] (LUCENE-5178) doc values should expose missing values (or allow configurable defaults)

2013-08-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742186#comment-13742186
 ] 

Robert Muir commented on LUCENE-5178:
-

OK. I can remove the Solr defaultValue check here too: I have to fix the tests 
to cover sort missing first/last and facet missing etc. anyway (currently the dv 
tests avoid that).

 doc values should expose missing values (or allow configurable defaults)
 

 Key: LUCENE-5178
 URL: https://issues.apache.org/jira/browse/LUCENE-5178
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Yonik Seeley

 DocValues should somehow allow a configurable default per-field.
 Possible implementations include setting it on the field in the document or 
 registration of an IndexWriter callback.
 If we don't make the default configurable, then another option is to have 
 DocValues fields keep track of whether a value was indexed for that document 
 or not.




[jira] [Updated] (SOLR-4718) Allow solr.xml to be stored in zookeeper

2013-08-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-4718:
-

Attachment: SOLR-4718.patch

Alan's patch with some modifications and with the new test cases.

 Allow solr.xml to be stored in zookeeper
 

 Key: SOLR-4718
 URL: https://issues.apache.org/jira/browse/SOLR-4718
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.3, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-4718-alternative.patch, SOLR-4718.patch, 
 SOLR-4718.patch, SOLR-4718.patch, SOLR-4718.patch, SOLR-4718.patch, 
 SOLR-4718.patch


 So the near-final piece of this puzzle is to make solr.xml be storable in 
 Zookeeper. Code-wise in terms of Solr, this doesn't look very difficult, I'm 
 working on it now.
 More interesting is how to get the configuration into ZK in the first place: 
 enhancements to ZkCli? Or bootstrap-conf? Other? I'm punting on that for this 
 patch.
 Second level is how to tell Solr to get the file from ZK. Some possibilities:
 1 A system prop, -DzkSolrXmlPath=blah where blah is the path _on zk_ where 
 the file is. Would require -DzkHost or -DzkRun as well.
pros - simple, I can wrap my head around it.
  - easy to script
cons - can't run multiple JVMs pointing to different files. Is this 
 really a problem?
 2 New solr.xml element. Something like:
  <solr>
    <solrcloud>
      <str name="zkHost">zkurl</str>
      <str name="zkSolrXmlPath">whatever</str>
    </solrcloud>
  </solr>
Really, this form would hinge on the presence or absence of zkSolrXmlPath. 
 If present, go up and look for the indicated solr.xml file on ZK. Any 
 properties in the ZK version would overwrite anything in the local copy.
 NOTE: I'm really not very interested in supporting this as an option for 
 old-style solr.xml unless it's _really_ easy. For instance, what if the local 
 solr.xml is new-style and the one in ZK is old-style? Or vice-versa? Since 
 old-style is going away, this doesn't seem like it's worth the effort.
 pros - No new mechanisms
 cons - once again requires that there be a solr.xml file on each client. 
 Admittedly for installations that didn't care much about multiple JVMs, it 
 could be a stock file that didn't change...
 For now, I'm going to just manually push solr.xml to ZK, then read it based 
 on a sysprop. That'll get the structure in place while we debate. Not going 
 to check this in until there's some consensus though.




[jira] [Commented] (LUCENE-5168) ByteSliceReader assert trips with 32-bit oracle 1.7.0_25 + G1GC

2013-08-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742233#comment-13742233
 ] 

Robert Muir commented on LUCENE-5168:
-

Out of curiosity, were those failures also with G1GC?

 ByteSliceReader assert trips with 32-bit oracle 1.7.0_25 + G1GC
 ---

 Key: LUCENE-5168
 URL: https://issues.apache.org/jira/browse/LUCENE-5168
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: java8-windows-4x-3075-console.txt, log.0025, log.0042, 
 log.0078, log.0086, log.0100


 This assertion trips (sometimes from different tests), if you run the 
 highlighting tests on branch_4x with r1512807.
 It reproduces about half the time, always only with 32bit + G1GC (other 
 combinations do not seem to trip it; I didn't try looping or anything really 
 though).
 {noformat}
 rmuir@beast:~/workspace/branch_4x$ svn up -r 1512807
 rmuir@beast:~/workspace/branch_4x$ ant clean
 rmuir@beast:~/workspace/branch_4x$ rm -rf .caches #this is important,
 otherwise master seed does not work!
 rmuir@beast:~/workspace/branch_4x/lucene/highlighter$ ant test
 -Dtests.jvms=2 -Dtests.seed=EBBFA6F4E80A7365 -Dargs="-server
 -XX:+UseG1GC"
 {noformat}
 Originally showed up like this:
 {noformat}
 Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6874/
 Java: 32bit/jdk1.7.0_25 -server -XX:+UseG1GC
 1 tests failed.
 REGRESSION:  
 org.apache.lucene.search.postingshighlight.TestPostingsHighlighter.testUserFailedToIndexOffsets
 Error Message:
 Stack Trace:
 java.lang.AssertionError
 at 
 __randomizedtesting.SeedInfo.seed([EBBFA6F4E80A7365:1FBF811885F2D611]:0)
 at 
 org.apache.lucene.index.ByteSliceReader.readByte(ByteSliceReader.java:73)
 at org.apache.lucene.store.DataInput.readVInt(DataInput.java:108)
 at 
 org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:453)
 at 
 org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
 at org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
 at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
 at 
 org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
 at 
 org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:501)
 {noformat}




[jira] [Updated] (SOLR-5156) Provide a way to move the contents of a file to ZooKeeper with ZkCLI

2013-08-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-5156:
-

Attachment: SOLR-5156.patch

I'll commit this shortly

 Provide a way to move the contents of a file to ZooKeeper with ZkCLI
 

 Key: SOLR-5156
 URL: https://issues.apache.org/jira/browse/SOLR-5156
 Project: Solr
  Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-5156.patch, SOLR-5156.patch


 Spinoff from SOLR-4718. We don't have any good way of putting solr.xml up in 
 Zookeeper in the first place. So while we can fake getting the file up there 
 we need a way consistent with ZkCLI




[jira] [Created] (LUCENE-5179) Refactoring on PostingsWriterBase for delta-encoding

2013-08-16 Thread Han Jiang (JIRA)
Han Jiang created LUCENE-5179:
-

 Summary: Refactoring on PostingsWriterBase for delta-encoding
 Key: LUCENE-5179
 URL: https://issues.apache.org/jira/browse/LUCENE-5179
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Han Jiang
Assignee: Han Jiang
 Fix For: 5.0, 4.5


A further step from LUCENE-5029.

The short story is, the previous API change brings two problems:
* it somewhat breaks backward compatibility: although we can still read the old
  format, we can no longer reproduce it;
* the pulsing codec has a problem with it.

And long story...

With the change, current PostingsBase API will be like this:

* term dict tells PBF we start a new term (via startTerm());
* PBF adds docs, positions and other postings data;
* term dict tells PBF all the data for current term is completed (via 
finishTerm()),
  then PBF returns the metadata for current term (as long[] and byte[]);
* term dict might buffer all the metadata in an ArrayList. When all the terms are
  collected, it then decides how those metadata will be located on disk.

So after the API change, PBF no longer has that annoying 'flushTermBlock'; instead
the term dict maintains the (term, metadata) list.

However, for each term we'll now write the long[] blob before the byte[], so the index
format is not consistent with pre-4.5.
Like in Lucene41, the metadata could be written as longA,bytesA,longB, but now we
have to write it as longA,longB,bytesA.

Another problem is, the pulsing codec cannot tell the wrapped PBF how the metadata
is delta-encoded; after all, PulsingPostingsWriter is only a PBF.

For example, we have terms=[a, a1, a2, b, b1, b2] and itemsInBlock=2, so
theoretically we'll finally have three blocks in BTTR: [a b], [a1 a2], [b1 b2].
With this approach, the metadata of term b is delta-encoded based on the metadata
of a, but when the term dict tells PBF to finishTerm(b), it might naively do the
delta encoding based on term a2.

So I think maybe we can introduce a method 'encodeTerm(long[], DataOutput out,
FieldInfo, TermState, boolean absolute)',
so that during metadata flush we can control how the current term is written. The
term dict will buffer TermState, which implicitly holds metadata like we do on
the PBReader side.

For example, if we want to reproduce the old Lucene41 format, we can simply set
longsSize==0; then PBF writes the old format (longA,bytesA,longB) to DataOutput,
and the compatibility issue is solved.
For the pulsing codec, it will also be able to tell the lower level how to
encode metadata.
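The proposed 'absolute' flag can be sketched in isolation (a hypothetical stdlib-only class, not the actual PostingsWriterBase API): the first term of a block is encoded as-is, and subsequent terms as deltas against the previous term's metadata longs:

```java
// Hypothetical sketch of delta-encoding term metadata with an 'absolute' flag,
// mirroring the intent of encodeTerm(long[], DataOutput, FieldInfo, TermState,
// boolean absolute): the term dict decides where blocks start, so it controls
// which terms are encoded absolutely and which as deltas.
class TermMetadataEncoder {
    private long[] last; // metadata of the previously encoded term, if any

    long[] encode(long[] metadata, boolean absolute) {
        long[] out = new long[metadata.length];
        for (int i = 0; i < metadata.length; i++) {
            // absolute: write the raw value (block start, or first term ever);
            // otherwise: write the delta against the previous term.
            out[i] = (absolute || last == null) ? metadata[i] : metadata[i] - last[i];
        }
        last = metadata.clone();
        return out;
    }
}
```

In the [a b] / [a1 a2] / [b1 b2] example above, the term dict would call encode(b, false) right after a, rather than letting a wrapped writer naively delta against a2.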




[jira] [Commented] (LUCENE-5168) ByteSliceReader assert trips with 32-bit oracle 1.7.0_25 + G1GC

2013-08-16 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742243#comment-13742243
 ] 

Dawid Weiss commented on LUCENE-5168:
-

Yes. This just has to be complex though because it's not just GC-related. 
Disabling escape analysis also makes the tests pass, so does removing inlining.

I managed to find a reproducible scenario under 1.8 (fastdebug) which is great 
because now I can dump the assembly. It's still terribly large...

Anyway, the blame still seems to point to readVInt :) Really, not joking. I 
added a sysout in 
{code}
final int code = freq.readVInt();
{code}
This is consistent when the test passes, but when it fails you get a difference:
{code}
// normal run
code::0 true
code::4 true
code::2 true

// error run
code::0 true
code::3 true
code::4 true
{code}
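For context, readVInt decodes the standard variable-length int format (7 payload bits per byte, high bit set on continuation bytes); a minimal standalone decoder (a sketch, not Lucene's DataInput) looks like:

```java
// Minimal standalone VInt decoder matching the format DataInput.readVInt
// reads: each byte contributes 7 low bits; the high bit signals continuation.
class VIntReader {
    static int readVInt(byte[] buf, int pos) {
        byte b = buf[pos++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }
}
```

The decode is pure byte arithmetic, which is why a wrong `code` value here points at miscompiled reads from the byte slices rather than at the decoding logic itself.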



 ByteSliceReader assert trips with 32-bit oracle 1.7.0_25 + G1GC
 ---

 Key: LUCENE-5168
 URL: https://issues.apache.org/jira/browse/LUCENE-5168
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: java8-windows-4x-3075-console.txt, log.0025, log.0042, 
 log.0078, log.0086, log.0100


 This assertion trips (sometimes from different tests), if you run the 
 highlighting tests on branch_4x with r1512807.
 It reproduces about half the time, always only with 32bit + G1GC (other 
 combinations do not seem to trip it; I didn't try looping or anything really 
 though).
 {noformat}
 rmuir@beast:~/workspace/branch_4x$ svn up -r 1512807
 rmuir@beast:~/workspace/branch_4x$ ant clean
 rmuir@beast:~/workspace/branch_4x$ rm -rf .caches #this is important,
 otherwise master seed does not work!
 rmuir@beast:~/workspace/branch_4x/lucene/highlighter$ ant test
 -Dtests.jvms=2 -Dtests.seed=EBBFA6F4E80A7365 -Dargs="-server
 -XX:+UseG1GC"
 {noformat}
 Originally showed up like this:
 {noformat}
 Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6874/
 Java: 32bit/jdk1.7.0_25 -server -XX:+UseG1GC
 1 tests failed.
 REGRESSION:  
 org.apache.lucene.search.postingshighlight.TestPostingsHighlighter.testUserFailedToIndexOffsets
 Error Message:
 Stack Trace:
 java.lang.AssertionError
 at 
 __randomizedtesting.SeedInfo.seed([EBBFA6F4E80A7365:1FBF811885F2D611]:0)
 at 
 org.apache.lucene.index.ByteSliceReader.readByte(ByteSliceReader.java:73)
 at org.apache.lucene.store.DataInput.readVInt(DataInput.java:108)
 at 
 org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:453)
 at 
 org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
 at org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
 at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
 at 
 org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
 at 
 org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:501)
 {noformat}




[jira] [Reopened] (SOLR-5164) Can not create a collection via collections API (cloud mode)

2013-08-16 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reopened SOLR-5164:
---


We should add a test case to the collections api that catches this.

Also, did this affect 4.4? The Affects Versions field seems to indicate not. If 
that's the case, there should be no separate CHANGES entry.

 Can not create a collection via collections API (cloud mode)
 

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Blocker
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml as they're all relative.




[jira] [Reopened] (SOLR-5099) The core.properties not created during collection creation

2013-08-16 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reopened SOLR-5099:
---


We need a test for this as well - I'm happy to do it if no one else does, but 
let's not resolve these types of bugs until we have tests for them.

 The core.properties not created during collection creation
 --

 Key: SOLR-5099
 URL: https://issues.apache.org/jira/browse/SOLR-5099
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Herb Jiang
Assignee: Erick Erickson
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: CorePropertiesLocator.java.patch


 When using the new solr.xml structure, the core auto-discovery mechanism 
 tries to find core.properties. 
 But I found that core.properties cannot be created when I dynamically create a 
 collection.
 The root issue is that CorePropertiesLocator tries to create the properties file 
 before the instanceDir is created. 
 The collection creation process completes and looks fine at runtime, but it 
 will cause issues (cores are not auto-discovered after server restart).




[jira] [Commented] (SOLR-5164) Can not create a collection via collections API (cloud mode)

2013-08-16 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742259#comment-13742259
 ] 

Erick Erickson commented on SOLR-5164:
--

Yeah, we should have a test, but this has been a pretty big rathole for me 
already and I didn't see a simple way to create a test; see my earlier comment.

No, it didn't affect 4.4 so I'll take the entry out of CHANGES.txt in the next 
JIRA I fix (should be this morning sometime).



 Can not create a collection via collections API (cloud mode)
 

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Blocker
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml as they're all relative.




[jira] [Commented] (SOLR-5164) Can not create a collection via collections API (cloud mode)

2013-08-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742258#comment-13742258
 ] 

Mark Miller commented on SOLR-5164:
---

I've reopened SOLR-5099 as well - tests for these bugs are as important as the 
fixes.

 Can not create a collection via collections API (cloud mode)
 

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Blocker
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml as they're all relative.




[jira] [Updated] (SOLR-5156) Provide a way to move the contents of a file to ZooKeeper with ZkCLI

2013-08-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-5156:
-

Attachment: SOLR-5156.patch

Final patch with bogus nocommit removed, passing precommit checks.

 Provide a way to move the contents of a file to ZooKeeper with ZkCLI
 

 Key: SOLR-5156
 URL: https://issues.apache.org/jira/browse/SOLR-5156
 Project: Solr
  Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-5156.patch, SOLR-5156.patch, SOLR-5156.patch


 Spinoff from SOLR-4718. We don't have any good way of putting solr.xml up in 
 Zookeeper in the first place. So while we can fake getting the file up there 
 we need a way consistent with ZkCLI




[jira] [Commented] (SOLR-5164) Can not create a collection via collections API (cloud mode)

2013-08-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742261#comment-13742261
 ] 

Mark Miller commented on SOLR-5164:
---

bq. but this has been a pretty big rathole for me already and I didn't see a 
simple way to create a test

That's fine, but please don't resolve the issue then. Bug fixes for really ugly 
issues like these absolutely need tests to make sure they don't keep coming 
back. We have seen that type of thing a lot recently - we fix something like 
this and it just breaks a couple months later in a new refactoring. You don't 
have to write the tests, but you might ask for some advice or help from someone 
else on it before resolving the issue. I'm happy to help make sure these 
problems have tests.

 Can not create a collection via collections API (cloud mode)
 

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Blocker
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml, as they're all relative.




[jira] [Updated] (SOLR-4718) Allow solr.xml to be stored in zookeeper

2013-08-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-4718:
-

Attachment: SOLR-4718.patch

Final patch, with CHANGES.txt entry.

 Allow solr.xml to be stored in zookeeper
 

 Key: SOLR-4718
 URL: https://issues.apache.org/jira/browse/SOLR-4718
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.3, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-4718-alternative.patch, SOLR-4718.patch, 
 SOLR-4718.patch, SOLR-4718.patch, SOLR-4718.patch, SOLR-4718.patch, 
 SOLR-4718.patch, SOLR-4718.patch


 So the near-final piece of this puzzle is to make solr.xml be storable in 
 Zookeeper. Code-wise in terms of Solr, this doesn't look very difficult, I'm 
 working on it now.
 More interesting is how to get the configuration into ZK in the first place, 
 enhancements to ZkCli? Or bootstrap-conf? Other? I'm punting on that for this 
 patch.
 Second level is how to tell Solr to get the file from ZK. Some possibilities:
 1 A system prop, -DzkSolrXmlPath=blah where blah is the path _on zk_ where 
 the file is. Would require -DzkHost or -DzkRun as well.
pros - simple, I can wrap my head around it.
  - easy to script
cons - can't run multiple JVMs pointing to different files. Is this 
 really a problem?
 2 New solr.xml element. Something like:
 <solr>
   <solrcloud>
     <str name="zkHost">zkurl</str>
     <str name="zkSolrXmlPath">whatever</str>
   </solrcloud>
 </solr>
Really, this form would hinge on the presence or absence of zkSolrXmlPath. 
 If present, go up and look for the indicated solr.xml file on ZK. Any 
 properties in the ZK version would overwrite anything in the local copy.
 NOTE: I'm really not very interested in supporting this as an option for 
 old-style solr.xml unless it's _really_ easy. For instance, what if the local 
 solr.xml is new-style and the one in ZK is old-style? Or vice-versa? Since 
 old-style is going away, this doesn't seem like it's worth the effort.
 pros - No new mechanisms
 cons - once again requires that there be a solr.xml file on each client. 
 Admittedly for installations that didn't care much about multiple JVMs, it 
 could be a stock file that didn't change...
 For now, I'm going to just manually push solr.xml to ZK, then read it based 
 on a sysprop. That'll get the structure in place while we debate. Not going 
 to check this in until there's some consensus though.
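 A minimal sketch of the decision logic behind option (1) above, assuming the 
 two system properties named in the proposal; the method and return values 
 here are illustrative, not the actual Solr loading code:

```java
// Hedged sketch: choose where to load solr.xml from based on the
// zkSolrXmlPath system property, requiring zkHost alongside it as the
// proposal describes. Names follow the proposal text, not real Solr code.
public class SolrXmlSourceDemo {
    static String describeSource(String zkSolrXmlPath, String zkHost) {
        if (zkSolrXmlPath != null) {
            if (zkHost == null) {
                // the proposal says -DzkSolrXmlPath requires -DzkHost or -DzkRun
                throw new IllegalArgumentException("zkSolrXmlPath requires zkHost (or zkRun)");
            }
            return "zookeeper:" + zkSolrXmlPath; // read solr.xml from this znode
        }
        return "local:solr.xml"; // fall back to the local file
    }

    public static void main(String[] args) {
        System.out.println(describeSource(System.getProperty("zkSolrXmlPath"),
                                          System.getProperty("zkHost")));
    }
}
```

 With no properties set this prints "local:solr.xml"; with both set it points 
 at the znode.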




[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 737 - Failure!

2013-08-16 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/737/
Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 9920 lines...]
   [junit4] ERROR: JVM J0 ended with an exception, command line: 
/Library/Java/JavaVirtualMachines/jdk1.7.0_25.jdk/Contents/Home/jre/bin/java 
-XX:-UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/heapdumps
 -Dtests.prefix=tests -Dtests.seed=DBD6FE1DD046F358 -Xmx512M -Dtests.iters= 
-Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random 
-Dtests.postingsformat=random -Dtests.docvaluesformat=random 
-Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random 
-Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 
-Dtests.cleanthreads=perClass 
-Djava.util.logging.config.file=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/logging.properties
 -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true 
-Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. 
-Djava.io.tmpdir=. 
-Djunit4.tempDir=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp
 
-Dclover.db.dir=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db
 -Djava.security.manager=org.apache.lucene.util.TestSecurityManager 
-Djava.security.policy=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy
 -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 
-Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory 
-Djava.awt.headless=true -Dtests.disableHdfs=true -Dfile.encoding=UTF-8 
-classpath 

[jira] [Updated] (SOLR-5164) Creating collections via the Collections API does not work with lib include directives.

2013-08-16 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5164:
--

Component/s: SolrCloud
   Priority: Critical  (was: Blocker)
Summary: Creating collections via the Collections API does not work 
with lib include directives.  (was: Can not create a collection via collections 
API (cloud mode))

 Creating collections via the Collections API does not work with lib include 
 directives.
 ---

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml, as they're all relative.




[jira] [Updated] (SOLR-5164) Creating collections via the Collections API fails due to core being created in the wrong directory

2013-08-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-5164:
-

Summary: Creating collections via the Collections API fails due to core 
being created in the wrong directory  (was: Creating collections via the 
Collections API does not work with lib include directives.)

 Creating collections via the Collections API fails due to core being created 
 in the wrong directory
 ---

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml, as they're all relative.




[jira] [Commented] (SOLR-5164) Creating collections via the Collections API fails due to core being created in the wrong directory

2013-08-16 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742360#comment-13742360
 ] 

Erick Erickson commented on SOLR-5164:
--

Well, the code is fixed, how about raising another JIRA instead?


 Creating collections via the Collections API fails due to core being created 
 in the wrong directory
 ---

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml, as they're all relative.




[jira] [Commented] (SOLR-5156) Provide a way to move the contents of a file to ZooKeeper with ZkCLI

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742364#comment-13742364
 ] 

ASF subversion and git services commented on SOLR-5156:
---

Commit 1514776 from [~erickoerickson] in branch 'dev/trunk'
[ https://svn.apache.org/r1514776 ]

SOLR-5156 Provide a way to move the contents of a file to ZooKeeper with ZkCLI

 Provide a way to move the contents of a file to ZooKeeper with ZkCLI
 

 Key: SOLR-5156
 URL: https://issues.apache.org/jira/browse/SOLR-5156
 Project: Solr
  Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-5156.patch, SOLR-5156.patch, SOLR-5156.patch


 Spinoff from SOLR-4718. We don't have any good way of putting solr.xml up in 
 Zookeeper in the first place. So while we can fake getting the file up there 
 we need a way consistent with ZkCLI




[jira] [Assigned] (SOLR-5099) The core.properties not created during collection creation

2013-08-16 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reassigned SOLR-5099:
-

Assignee: Mark Miller  (was: Erick Erickson)

 The core.properties not created during collection creation
 --

 Key: SOLR-5099
 URL: https://issues.apache.org/jira/browse/SOLR-5099
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Herb Jiang
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: CorePropertiesLocator.java.patch


 When using the new solr.xml structure, the core auto-discovery mechanism 
 tries to find core.properties. 
 But I found that core.properties cannot be created when I dynamically create 
 a collection.
 The root issue is that the CorePropertiesLocator tries to create the 
 properties file before the instanceDir is created. 
 The collection creation process completes and looks fine at runtime, but it 
 will cause issues (cores are not auto-discovered after server restart).
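 The shape of the ordering fix described above can be sketched as follows; 
 the class and method names are illustrative, not the actual 
 CorePropertiesLocator code:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hedged sketch: ensure the instance directory exists *before* writing
// core.properties into it, so the core can be auto-discovered on restart.
public class CorePropsDemo {
    static void writeCoreProperties(Path instanceDir, Properties props) throws IOException {
        Files.createDirectories(instanceDir); // without this, the store below fails
        try (OutputStream out = Files.newOutputStream(instanceDir.resolve("core.properties"))) {
            props.store(out, null);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("solr").resolve("collection1_shard1_replica1");
        Properties p = new Properties();
        p.setProperty("name", "collection1_shard1_replica1");
        writeCoreProperties(dir, p);
        System.out.println(Files.exists(dir.resolve("core.properties"))); // true
    }
}
```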




[jira] [Assigned] (SOLR-5164) Creating collections via the Collections API fails due to core being created in the wrong directory

2013-08-16 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reassigned SOLR-5164:
-

Assignee: Mark Miller  (was: Erick Erickson)

 Creating collections via the Collections API fails due to core being created 
 in the wrong directory
 ---

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml, as they're all relative.




[jira] [Commented] (SOLR-5164) Creating collections via the Collections API fails due to core being created in the wrong directory

2013-08-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742388#comment-13742388
 ] 

Mark Miller commented on SOLR-5164:
---

I don't consider this fixed without a test. The two issues are critical and 
somewhat complicated issues.

I'm going to write the tests - without them, we only have your word they are 
fixed today and a random guess they will still be fixed tomorrow or the next 
day. These two issues are much too critical to not consider a test part of the 
issue.

I'll finish the issues.

 Creating collections via the Collections API fails due to core being created 
 in the wrong directory
 ---

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml, as they're all relative.




[jira] [Updated] (LUCENE-5179) Refactoring on PostingsWriterBase for delta-encoding

2013-08-16 Thread Han Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Han Jiang updated LUCENE-5179:
--

Attachment: LUCENE-5179.patch

Patch for branch3069, tests pass for all 'temp' postings format.

 Refactoring on PostingsWriterBase for delta-encoding
 

 Key: LUCENE-5179
 URL: https://issues.apache.org/jira/browse/LUCENE-5179
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Han Jiang
Assignee: Han Jiang
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5179.patch


 A further step from LUCENE-5029.
 The short story is, previous API change brings two problems:
 * it somewhat breaks backward compatibility: although we can still read old 
 format,
   we can no longer reproduce it;
 * the pulsing codec has a problem with it.
 And long story...
 With the change, current PostingsBase API will be like this:
 * term dict tells PBF we start a new term (via startTerm());
 * PBF adds docs, positions and other postings data;
 * term dict tells PBF all the data for current term is completed (via 
 finishTerm()),
   then PBF returns the metadata for current term (as long[] and byte[]);
 * term dict might buffer all the metadata in an ArrayList. When all the 
 terms are collected,
   it then decides how those metadata will be located on disk.
 So after the API change, PBF no longer has that annoying 'flushTermBlock', 
 and instead
 the term dict maintains the term/metadata list.
 However, for each term we'll now write the long[] blob before the byte[], so 
 the index format is not consistent with pre-4.5.
 Like in Lucene41, the metadata could be written as longA,bytesA,longB, but 
 now we have to write it as longA,longB,bytesA.
 Another problem is that the pulsing codec cannot tell the wrapped PBF how 
 the metadata is delta-encoded; after all,
 PulsingPostingsWriter is only a PBF.
 For example, we have terms=[a, a1, a2, b, b1, b2] and 
 itemsInBlock=2, so theoretically
 we'll finally have three blocks in BTTR: [a b]  [a1 a2]  [b1 b2]. 
 With this
 approach, the metadata of term b is delta-encoded based on the metadata of 
 a, but when the term dict tells
 PBF to finishTerm(b), it might naively delta-encode based on term a2.
 So I think maybe we can introduce a method 'encodeTerm(long[], DataOutput 
 out, FieldInfo, TermState, boolean absolute)',
 so that during metadata flush, we can control how the current term is 
 written. And the term dict will buffer TermState, which
 implicitly holds metadata, like we do on the PBReader side.
 For example, if we want to reproduce the old Lucene41 format, we can simply 
 set longsSize==0; then PBF
 writes the old format (longA,bytesA,longB) to DataOutput, and the 
 compatibility issue is solved.
 For pulsing codec, it will also be able to tell lower level how to encode 
 metadata.
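 The delta-encoding contract proposed above can be sketched minimally as 
 follows, assuming the term dictionary passes absolute=true at block starts 
 (itemsInBlock=2, as in the example); all names are illustrative, not the 
 real Lucene API:

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of the proposed encodeTerm(..., boolean absolute) contract:
// the term dict decides *when* a block starts and passes absolute=true there,
// so the writer resets its delta base instead of delta-encoding against the
// last term of the previous block.
public class DeltaEncodeDemo {
    private long lastFP; // delta base, e.g. a file pointer into the postings file

    // Encode one term's metadata; absolute=true means "do not delta against
    // the previous term" (first term of a block).
    long encodeTerm(long fp, boolean absolute) {
        long out = absolute ? fp : fp - lastFP;
        lastFP = fp;
        return out;
    }

    public static void main(String[] args) {
        DeltaEncodeDemo w = new DeltaEncodeDemo();
        long[] filePointers = {100, 140, 190, 260};
        List<Long> encoded = new ArrayList<>();
        for (int i = 0; i < filePointers.length; i++) {
            boolean blockStart = (i % 2 == 0); // itemsInBlock == 2
            encoded.add(w.encodeTerm(filePointers[i], blockStart));
        }
        System.out.println(encoded); // [100, 40, 190, 70]: absolute, delta, absolute, delta
    }
}
```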




[jira] [Commented] (SOLR-5099) The core.properties not created during collection creation

2013-08-16 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742403#comment-13742403
 ] 

Erick Erickson commented on SOLR-5099:
--

FWIW, a separate test case would be fine here, but note that the actual fix is 
part of SOLR-5164. I didn't see Herb's patch until after I'd found it as part 
of SOLR-5164.

 The core.properties not created during collection creation
 --

 Key: SOLR-5099
 URL: https://issues.apache.org/jira/browse/SOLR-5099
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Herb Jiang
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: CorePropertiesLocator.java.patch


 When using the new solr.xml structure, the core auto-discovery mechanism 
 tries to find core.properties. 
 But I found that core.properties cannot be created when I dynamically create 
 a collection.
 The root issue is that the CorePropertiesLocator tries to create the 
 properties file before the instanceDir is created. 
 The collection creation process completes and looks fine at runtime, but it 
 will cause issues (cores are not auto-discovered after server restart).




[jira] [Commented] (SOLR-3936) QueryElevationComponent: Wrong order when result grouping is activated

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742405#comment-13742405
 ] 

ASF subversion and git services commented on SOLR-3936:
---

Commit 1514795 from hoss...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1514795 ]

SOLR-3936: Fixed QueryElevationComponent sorting when used with Grouping

 QueryElevationComponent: Wrong order when result grouping is activated
 --

 Key: SOLR-3936
 URL: https://issues.apache.org/jira/browse/SOLR-3936
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 4.0
Reporter: Michael Berger
Assignee: Hoss Man
 Attachments: SOLR-3936.patch, SOLR-3936.patch


 When I use elevation together with grouping, I do not get the expected 
 result order.
 I tried it with the standard solr example:
 http://localhost:8983/solr/elevate?enableElevation=true&fl=score%2C[elevated]%2Cid%2Cname&forceElevation=true&group.field=manu&group=on&indent=on&q=ipod&wt=json
  
 but the results ignored the elevation: 
 {
   "responseHeader":{
     "status":0,
     "QTime":2,
     "params":{
       "enableElevation":"true",
       "fl":"score,[elevated],id,name",
       "indent":"on",
       "q":"ipod",
       "forceElevation":"true",
       "group.field":"manu",
       "group":"on",
       "wt":"json"}},
   "grouped":{
     "manu":{
       "matches":2,
       "groups":[{
           "groupValue":"belkin",
           "doclist":{"numFound":1,"start":0,"maxScore":0.7698604,"docs":[
               {
                 "id":"F8V7067-APL-KIT",
                 "name":"Belkin Mobile Power Cord for iPod w/ Dock",
                 "score":0.7698604,
                 "[elevated]":false}]
           }},
         {
           "groupValue":"inc",
           "doclist":{"numFound":1,"start":0,"maxScore":0.28869766,"docs":[
               {
                 "id":"MA147LL/A",
                 "name":"Apple 60 GB iPod with Video Playback Black",
                 "score":0.28869766,
                 "[elevated]":true}]
           }}]}}}
 the elevate.xml defines the following rules:
 <query text="ipod">
   <doc id="MA147LL/A" />  <!-- put the actual ipod at the top -->
   <doc id="IW-02" exclude="true" /> <!-- exclude this cable -->
 </query>
 
 </elevate>




[jira] [Commented] (SOLR-4718) Allow solr.xml to be stored in zookeeper

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742407#comment-13742407
 ] 

ASF subversion and git services commented on SOLR-4718:
---

Commit 1514800 from [~erickoerickson] in branch 'dev/trunk'
[ https://svn.apache.org/r1514800 ]

SOLR-4718 Allow solr.xml to be stored in ZooKeeper

 Allow solr.xml to be stored in zookeeper
 

 Key: SOLR-4718
 URL: https://issues.apache.org/jira/browse/SOLR-4718
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.3, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-4718-alternative.patch, SOLR-4718.patch, 
 SOLR-4718.patch, SOLR-4718.patch, SOLR-4718.patch, SOLR-4718.patch, 
 SOLR-4718.patch, SOLR-4718.patch


 So the near-final piece of this puzzle is to make solr.xml be storable in 
 Zookeeper. Code-wise in terms of Solr, this doesn't look very difficult, I'm 
 working on it now.
 More interesting is how to get the configuration into ZK in the first place, 
 enhancements to ZkCli? Or bootstrap-conf? Other? I'm punting on that for this 
 patch.
 Second level is how to tell Solr to get the file from ZK. Some possibilities:
 1 A system prop, -DzkSolrXmlPath=blah where blah is the path _on zk_ where 
 the file is. Would require -DzkHost or -DzkRun as well.
pros - simple, I can wrap my head around it.
  - easy to script
cons - can't run multiple JVMs pointing to different files. Is this 
 really a problem?
 2 New solr.xml element. Something like:
 <solr>
   <solrcloud>
     <str name="zkHost">zkurl</str>
     <str name="zkSolrXmlPath">whatever</str>
   </solrcloud>
 </solr>
Really, this form would hinge on the presence or absence of zkSolrXmlPath. 
 If present, go up and look for the indicated solr.xml file on ZK. Any 
 properties in the ZK version would overwrite anything in the local copy.
 NOTE: I'm really not very interested in supporting this as an option for 
 old-style solr.xml unless it's _really_ easy. For instance, what if the local 
 solr.xml is new-style and the one in ZK is old-style? Or vice-versa? Since 
 old-style is going away, this doesn't seem like it's worth the effort.
 pros - No new mechanisms
 cons - once again requires that there be a solr.xml file on each client. 
 Admittedly for installations that didn't care much about multiple JVMs, it 
 could be a stock file that didn't change...
 For now, I'm going to just manually push solr.xml to ZK, then read it based 
 on a sysprop. That'll get the structure in place while we debate. Not going 
 to check this in until there's some consensus though.




[jira] [Commented] (SOLR-5150) HdfsIndexInput may not fully read requested bytes.

2013-08-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742432#comment-13742432
 ] 

Mark Miller commented on SOLR-5150:
---

I've held off on committing this because some performance tests indicate the 
upstream blur patch may have been more performant for merging/flushing while 
the current patch is *much* more performant for queries.

We might be able to use one or the other based on the IOContext.

I'm waiting until I can get some more results and testing done though - I've 
seen lots of random deadlock situations in some of my testing with the upstream 
blur fix (synchronization around two calls).

 HdfsIndexInput may not fully read requested bytes.
 --

 Key: SOLR-5150
 URL: https://issues.apache.org/jira/browse/SOLR-5150
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.4
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.5, 5.0

 Attachments: SOLR-5150.patch


 Patrick Hunt noticed that our HdfsDirectory code was a bit behind Blur here - 
 the read call we are using may not read all of the requested bytes - it 
 returns the number of bytes actually read - which we ignore.
 Blur moved to using a seek and then readFully call - synchronizing across the 
 two calls to deal with clones.
 We have seen that really kills performance, and using the readFully call that 
 lets you pass the position rather than first doing a seek, performs much 
 better and does not require the synchronization.
 I also noticed that the seekInternal impl should not seek but be a no op 
 since we are seeking on the read.
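 The short-read problem and the readFully-style fix described above can be 
 sketched like this; DribblingStream is a stand-in for a network/HDFS stream, 
 not the actual HdfsDirectory code:

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hedged sketch: InputStream.read(b, off, len) may legally return fewer than
// len bytes; ignoring its return value silently truncates the read. A
// readFully-style loop keeps reading until the requested range is filled.
public class ReadFullyDemo {
    // Simulates a stream that returns at most 3 bytes per read() call.
    static class DribblingStream extends ByteArrayInputStream {
        DribblingStream(byte[] buf) { super(buf); }
        @Override
        public int read(byte[] b, int off, int len) {
            return super.read(b, off, Math.min(len, 3)); // short reads
        }
    }

    // Loop until len bytes have been read, or fail on EOF.
    static void readFully(InputStream in, byte[] b, int off, int len) throws IOException {
        while (len > 0) {
            int n = in.read(b, off, len);
            if (n < 0) throw new EOFException();
            off += n;
            len -= n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789".getBytes();
        byte[] buf = new byte[10];
        int n = new DribblingStream(data).read(buf, 0, 10);
        System.out.println("single read returned " + n + " of 10 bytes");
        readFully(new DribblingStream(data), buf, 0, 10);
        System.out.println(new String(buf)); // full contents
    }
}
```

 The positioned readFully variant the patch prefers additionally takes the 
 file position as an argument, avoiding the seek-then-read pair and the 
 synchronization it needs.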




[jira] [Commented] (SOLR-5150) HdfsIndexInput may not fully read requested bytes.

2013-08-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742433#comment-13742433
 ] 

Mark Miller commented on SOLR-5150:
---

@phunt was on vacation, but is now back and may have some thoughts on this 
issue as well.

 HdfsIndexInput may not fully read requested bytes.
 --

 Key: SOLR-5150
 URL: https://issues.apache.org/jira/browse/SOLR-5150
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.4
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.5, 5.0

 Attachments: SOLR-5150.patch


 Patrick Hunt noticed that our HdfsDirectory code was a bit behind Blur here - 
 the read call we are using may not read all of the requested bytes - it 
 returns the number of bytes actually read - which we ignore.
 Blur moved to using a seek and then readFully call - synchronizing across the 
 two calls to deal with clones.
 We have seen that really kills performance, and using the readFully call that 
 lets you pass the position rather than first doing a seek, performs much 
 better and does not require the synchronization.
 I also noticed that the seekInternal impl should not seek but be a no op 
 since we are seeking on the read.




[jira] [Comment Edited] (SOLR-5150) HdfsIndexInput may not fully read requested bytes.

2013-08-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742433#comment-13742433
 ] 

Mark Miller edited comment on SOLR-5150 at 8/16/13 5:37 PM:


[~phunt] was on vacation, but is now back and may have some thoughts on this 
issue as well.

  was (Author: markrmil...@gmail.com):
@phunt was on vacation, but is now back and may have some thoughts on this 
issue as well.
  




[jira] [Comment Edited] (SOLR-5150) HdfsIndexInput may not fully read requested bytes.

2013-08-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742432#comment-13742432
 ] 

Mark Miller edited comment on SOLR-5150 at 8/16/13 5:38 PM:


I've held off on committing this because some performance tests indicate the 
upstream blur patch may have been more performant for merging/flushing while 
the current patch is *much* more performant for queries.

We might be able to use one or the other based on the IOContext.

I'm waiting until I can get some more results and testing done though - I've 
seen lots of random deadlock situations in some of my testing with the upstream 
blur fix (synchronization around two calls).

  was (Author: markrmil...@gmail.com):
I've held off on committing this because some performance tests indicate 
the upstream blur patch may have been more performant for merging/flushing 
while the current patch is *much* more performant for queries.

We might be able to use one or the other based on the IOContext.

I'm waiting until I can get some more results and testing done though - I've 
seen lots of random deadlock situations in some of my testing with the upstream 
blue fix (synchronization around two calls).
  




[jira] [Commented] (SOLR-5159) Manifest includes non-parsed maven variables

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742434#comment-13742434
 ] 

ASF subversion and git services commented on SOLR-5159:
---

Commit 1514813 from [~steve_rowe] in branch 'dev/trunk'
[ https://svn.apache.org/r1514813 ]

SOLR-5159: Manifest includes non-parsed maven variables

 Manifest includes non-parsed maven variables
 

 Key: SOLR-5159
 URL: https://issues.apache.org/jira/browse/SOLR-5159
 Project: Solr
  Issue Type: Bug
  Components: Build
Affects Versions: 4.4, 4.5, 5.0
 Environment: Apache Maven 3.0.5
Reporter: Artem Karpenko
Assignee: Steve Rowe
Priority: Minor
  Labels: maven-bundle-plugin, maven3,
 Attachments: SOLR-5159.patch


 When building Lucene/Solr with Apache Maven 3, all MANIFEST.MF files included 
 into JAR artifacts contain non-parsed POM variables: namely, there are 
 entries like
 Specification-Version: 5.0.0.$\{now.version}
 In the end, Solr displays these values on admin page in Versions section.
 This is caused by unresolved bug in maven-bundle-plugin (FELIX-3392). 




[jira] [Commented] (SOLR-5159) Manifest includes non-parsed maven variables

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742435#comment-13742435
 ] 

ASF subversion and git services commented on SOLR-5159:
---

Commit 1514814 from [~steve_rowe] in branch 'dev/trunk'
[ https://svn.apache.org/r1514814 ]

SOLR-5159: fix typo in CHANGES entry





[jira] [Commented] (SOLR-5159) Manifest includes non-parsed maven variables

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742439#comment-13742439
 ] 

ASF subversion and git services commented on SOLR-5159:
---

Commit 1514816 from [~steve_rowe] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1514816 ]

SOLR-5159: Manifest includes non-parsed maven variables (merged trunk r1514813 
and r1514814)





[jira] [Commented] (SOLR-5150) HdfsIndexInput may not fully read requested bytes.

2013-08-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742441#comment-13742441
 ] 

Mark Miller commented on SOLR-5150:
---

To describe that more fully: not a deadlock - just really long pauses - no CPU or 
hard-drive usage by either the HDFS processes or Solr for a *long* time - the 
threads seemed to be hanging in socket waits of some kind.

That is how I first saw the slowdown with the blur fix - I was running one of 
the HdfsDirectory tests on my mac and it took 10 min instead of 14 seconds. On 
linux, the test was still fast. Some other perf tests around querying took a 
nose dive on linux as well though. Meanwhile, some tests involving indexing 
sped up.

The current patch sped that test back up on my mac and fixed the query perf 
test.

We might be able to get the best of both worlds, or the synchronized version 
might not be worth it.





[jira] [Commented] (SOLR-5156) Provide a way to move the contents of a file to ZooKeeper with ZkCLI

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742446#comment-13742446
 ] 

ASF subversion and git services commented on SOLR-5156:
---

Commit 1514821 from [~erickoerickson] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1514821 ]

SOLR-5156 Provide a way to move the contents of a file to ZooKeeper with ZkCLI

 Provide a way to move the contents of a file to ZooKeeper with ZkCLI
 

 Key: SOLR-5156
 URL: https://issues.apache.org/jira/browse/SOLR-5156
 Project: Solr
  Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-5156.patch, SOLR-5156.patch, SOLR-5156.patch


 Spinoff from SOLR-4718. We don't have any good way of putting solr.xml up in 
 Zookeeper in the first place. So while we can fake getting the file up there 
 we need a way consistent with ZkCLI




[jira] [Resolved] (SOLR-5156) Provide a way to move the contents of a file to ZooKeeper with ZkCLI

2013-08-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-5156.
--

   Resolution: Fixed
Fix Version/s: 5.0
   4.5





[jira] [Created] (SOLR-5168) BJQParserTest reproducible failures

2013-08-16 Thread Hoss Man (JIRA)
Hoss Man created SOLR-5168:
--

 Summary: BJQParserTest reproducible failures
 Key: SOLR-5168
 URL: https://issues.apache.org/jira/browse/SOLR-5168
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Yonik Seeley


two recent Jenkins builds have uncovered some test seeds that cause failures in 
multiple test methods in BJQParserTest.  These seeds reproduce reliably (as of 
trunk r1514815) ...

{noformat}
ant test  -Dtestcase=BJQParserTest -Dtests.seed=7A613F321CE87F5B 
-Dtests.multiplier=3 -Dtests.slow=true

ant test  -Dtestcase=BJQParserTest -Dtests.seed=1DC8055F837E437E 
-Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true 
{noformat}





[jira] [Resolved] (SOLR-5159) Manifest includes non-parsed maven variables

2013-08-16 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe resolved SOLR-5159.
--

   Resolution: Fixed
Fix Version/s: 5.0
   4.5

bq. I want to verify Maven2 locally, and I also want to compare all manifest 
entries with the Ant-produced ones - the solr entries were changed recently, 
and I want to keep them in sync.

Maven 2.2.1 works fine.

I compared the Ant-built and Maven-built manifests, and the Maven-built ones of 
course have lots of Bnd-produced entries not in the Ant-built ones.  There are 
two other differences:

# The Maven-built manifest contains Implementation-Vendor-Id (with Maven 
coordinate groupId as the value: org.apache.lucene or org.apache.solr).  I 
think this is fine to leave in, and maybe the Ant-built manifests should get it 
too?
# The Maven-built manifests have the old style {{Specification-Version}}, 
including a timestamp, e.g. {{5.0.0.2013.08.16.12.36.14}}, where the Ant-built 
manifests just have the version, e.g. {{5.0-SNAPSHOT}}.  The latter is actually 
syntactically incorrect, since the value should only have digits and periods.  
I've left it as the old style in the Maven version, since it's not a syntax 
error, and since the Maven versions will only ever be produced by end-users - 
all snapshot and release Maven artifacts are produced by Ant.

I've committed to trunk and branch_4x.

Thanks Artem!





[jira] [Commented] (SOLR-5150) HdfsIndexInput may not fully read requested bytes.

2013-08-16 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742465#comment-13742465
 ] 

Uwe Schindler commented on SOLR-5150:
-

Hi Mark,
I think your version should be preferred in both cases. The Apache Blur 
upstream version looks like SimpleFSIndexInput (which has synchronization on 
the RandomAccessFile). The difference is here, that reading from a real file 
has no network involved (at least not for local filesystems) so the time spent 
in the locked code block is shorter. Still SimpleFSDir is bad for queries.
When merging the whole stuff works single-threaded per file so you would see no 
difference in both approaches. If the positional readFully approach would be 
slower, then this would be clearly a bug in Hdfs.
Another alternative would be: When cloning a file also clone the underlying 
Hdfs connection. With RandomAccessFile we cannot do this in the JDK (we have no 
dup() for file descriptors), but if Hdfs supports some dup() like approach with 
delete on-last close semantics (the file could already be deleted when you dup 
the file descriptor) you could create 2 different connections for each thread.
The backside: Lucene never closes clones - one reason why I gave up on 
implementing a Windows-optimized directory that would clone the underlying file 
descriptor: the clone would never close the dup :(
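The positional read Uwe favors has a direct JDK analogue: FileChannel.read(ByteBuffer, long position) reads at an absolute offset without touching the channel's own cursor, so clones sharing one channel need no common lock (unlike RandomAccessFile's seek-then-read). A minimal sketch of the pattern (illustrative only, not Solr's actual HdfsIndexInput):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PositionalReadDemo {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("pread", ".bin");
        Files.write(tmp, "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII));
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            // Two "clones" reading at different offsets, no seek(), no lock:
            // the absolute-position read never moves the channel's cursor.
            // (A real implementation would still loop, since a positional
            // read may also be partial.)
            ByteBuffer a = ByteBuffer.allocate(4);
            ByteBuffer b = ByteBuffer.allocate(4);
            ch.read(a, 0);   // bytes 0..3
            ch.read(b, 10);  // bytes 10..13
            System.out.println(new String(a.array(), StandardCharsets.US_ASCII));
            System.out.println(new String(b.array(), StandardCharsets.US_ASCII));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```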





[jira] [Comment Edited] (SOLR-5150) HdfsIndexInput may not fully read requested bytes.

2013-08-16 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742465#comment-13742465
 ] 

Uwe Schindler edited comment on SOLR-5150 at 8/16/13 6:00 PM:
--

Hi Mark,
I think your version should be preferred in both cases. The Apache Blur 
upstream version looks like SimpleFSIndexInput (which has synchronization on 
the RandomAccessFile). The difference is here, that reading from a real file 
has no network involved (at least not for local filesystems) so the time spent 
in the locked code block is shorter. Still SimpleFSDir is bad for queries.
When merging the whole stuff works single-threaded per file so you would see no 
difference in both approaches. If the positional readFully approach would be 
slower, then this would be clearly a bug in Hdfs.
Another alternative would be: When cloning a file also clone the underlying 
Hdfs connection. With RandomAccessFile we cannot do this in the JDK (we have no 
dup() for file descriptors), but if Hdfs supports some dup() like approach with 
delete on-last close semantics (the file could already be deleted when you dup 
the file descriptor) you could create 2 different connections for each thread.
The backside: Lucene never closes clones - one reason why I gave up on 
implementing a Windows-optimized directory that would clone the underlying file 
descriptor: the clone would never close the dup :(

  was (Author: thetaphi):
Hi Mark,
I think your version should be preferred in both cases. The Apache Blur 
upstream version looks like SimpleFSIndexInput (which has synchronization on 
the RandomAccessFile). The difference is here, that reading from a real file 
has no network involved (at least not for local filesystems) so the time spent 
in the locked code block is shorter. Still SimpleFSDir is bad for queries.
When merging the whole stuff works single-threaded per file so you would see so 
difference in both approaches. If the positional readFully approach would be 
slower, then this would be clearly a bug in Hdfs.
Another alternative would be: When cloning a file also clone the underlying 
Hdfs connection. With RandomAccessFile we cannot do this in the JDK (we have no 
dup() for file descriptors), but if Hdfs supports some dup() like approach with 
delete on-last close semantics (the file could already be deleted when you dup 
the file descriptor) you could create 2 different connection for each thread.
The backside: Lucene never closes clones - one reason why I gave up on 
implementig a Windows-Optimized directory that would clone underlying file 
descriptor: The clone would never close the dup :(
  




[jira] [Commented] (SOLR-5168) BJQParserTest reproducible failures

2013-08-16 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742467#comment-13742467
 ] 

Hoss Man commented on SOLR-5168:


One of those seeds (1DC8055F837E437E) causes MockRandomMergePolicy to be picked 
-- but a cursory review of the test (and my cursory understanding of the block 
join queries) doesn't suggest any reason why that should cause a problem for 
this test -- the only time a commit *might* happen in the test is at the end of 
an entire block.

The other seed (7A613F321CE87F5B) just uses LogDocMergePolicy, so even if my 
cursory understandings above are incorrect, there really seems to be a bug when 
this seed is used.





[jira] [Commented] (SOLR-3280) to many / sometimes stale CLOSE_WAIT connections from SnapPuller during / after replication

2013-08-16 Thread David Fu (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742478#comment-13742478
 ] 

David Fu commented on SOLR-3280:


I am still on 3.4. I noticed that Solr 4 pretty much reimplemented the 
SnapPuller, and I am thinking about upgrading to v4. Just out of curiosity, 
what issues did you face in the process of upgrading from 3.6 to 4.2.1?

 to many / sometimes stale CLOSE_WAIT connections from SnapPuller during / 
 after replication
 ---

 Key: SOLR-3280
 URL: https://issues.apache.org/jira/browse/SOLR-3280
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.5, 3.6, 4.0-ALPHA
Reporter: Bernd Fehling
Assignee: Robert Muir
Priority: Minor
 Attachments: SOLR-3280.patch


 There are sometimes too many, and also stale, CLOSE_WAIT connections 
 during/after replication left over on the SLAVE server.
 Normally GC should clean this up, but that is not always the case.
 Also, if a CLOSE_WAIT is hanging, the new replication won't load.
 The dirty workaround so far is to fake a TCP connection as root to that 
 connection and close it. 
 After that the new replication will load, the old index and searcher are 
 released, and the system
 returns to normal operation.
 Background:
 The SnapPuller is using Apache httpclient 3.x and uses the 
 MultiThreadedHttpConnectionManager.
 The manager holds a connection in CLOSE_WAIT after its use for further 
 requests.
 This is done by calling releaseConnection. But if a connection is stuck it is 
 not available any more and a new
 connection from the pool is used.
 Solution:
 After calling releaseConnection clean up with closeIdleConnections(0).




[jira] [Commented] (SOLR-5159) Manifest includes non-parsed maven variables

2013-08-16 Thread Artem Karpenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742480#comment-13742480
 ] 

Artem Karpenko commented on SOLR-5159:
--

Great, thank you Steve, I was glad to help.





[jira] [Commented] (SOLR-3936) QueryElevationComponent: Wrong order when result grouping is activated

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742500#comment-13742500
 ] 

ASF subversion and git services commented on SOLR-3936:
---

Commit 1514836 from hoss...@apache.org in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1514836 ]

SOLR-3936: Fixed QueryElevationComponent sorting when used with Grouping (merge 
r1514795)

 QueryElevationComponent: Wrong order when result grouping is activated
 --

 Key: SOLR-3936
 URL: https://issues.apache.org/jira/browse/SOLR-3936
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 4.0
Reporter: Michael Berger
Assignee: Hoss Man
 Attachments: SOLR-3936.patch, SOLR-3936.patch


 When I use elevation together with grouping I do not get the expected result 
 order.
 I tried it with the standard solr example:
 http://localhost:8983/solr/elevate?enableElevation=true&fl=score%2C[elevated]%2Cid%2Cname&forceElevation=true&group.field=manu&group=on&indent=on&q=ipod&wt=json
 but the results ignored the elevation:
 {
   "responseHeader":{
     "status":0,
     "QTime":2,
     "params":{
       "enableElevation":"true",
       "fl":"score,[elevated],id,name",
       "indent":"on",
       "q":"ipod",
       "forceElevation":"true",
       "group.field":"manu",
       "group":"on",
       "wt":"json"}},
   "grouped":{
     "manu":{
       "matches":2,
       "groups":[{
           "groupValue":"belkin",
           "doclist":{"numFound":1,"start":0,"maxScore":0.7698604,"docs":[
             {
               "id":"F8V7067-APL-KIT",
               "name":"Belkin Mobile Power Cord for iPod w/ Dock",
               "score":0.7698604,
               "[elevated]":false}]
           }},
         {
           "groupValue":"inc",
           "doclist":{"numFound":1,"start":0,"maxScore":0.28869766,"docs":[
             {
               "id":"MA147LL/A",
               "name":"Apple 60 GB iPod with Video Playback Black",
               "score":0.28869766,
               "[elevated]":true}]
           }}]}}}
 the elevate.xml defines the following rules:
 <query text="ipod">
   <doc id="MA147LL/A" />  <!-- put the actual ipod at the top -->
   <doc id="IW-02" exclude="true" />  <!-- exclude this cable -->
 </query>
 </elevate>




[jira] [Resolved] (SOLR-5135) Deleting a collection should be extra aggressive in the face of failures.

2013-08-16 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-5135.
---

Resolution: Fixed

 Deleting a collection should be extra aggressive in the face of failures.
 -

 Key: SOLR-5135
 URL: https://issues.apache.org/jira/browse/SOLR-5135
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.5, 5.0

 Attachments: SOLR-5135.patch


 Until Zk is the source of truth for the cluster, zk and local node states can 
 get out of whack in certain situations - as a result, sometimes you cannot 
 clean out all of the remnants of a collection to recreate it. For example, if 
 the collection is listed in zk under /collections, but is not in 
 clusterstate.json, you cannot remove or create the collection again due to a 
 early exception in the collection removal chain.
 I think we should probably still return the error - but also delete as much 
 as we can.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-3936) QueryElevationComponent: Wrong order when result grouping is activated

2013-08-16 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-3936.


   Resolution: Fixed
Fix Version/s: 5.0
   4.5

Thanks again Michael!

 QueryElevationComponent: Wrong order when result grouping is activated
 --

 Key: SOLR-3936
 URL: https://issues.apache.org/jira/browse/SOLR-3936
 Project: Solr
  Issue Type: Bug
  Components: SearchComponents - other
Affects Versions: 4.0
Reporter: Michael Berger
Assignee: Hoss Man
 Fix For: 4.5, 5.0

 Attachments: SOLR-3936.patch, SOLR-3936.patch


 When I use elevation together with grouping I got not the expected result 
 order.
 I tried it with the standard solr example:
 http://localhost:8983/solr/elevate?enableElevation=true&fl=score%2C[elevated]%2Cid%2Cname&forceElevation=true&group.field=manu&group=on&indent=on&q=ipod&wt=json
  
 but the results ignored the elevation: 
 { 
   "responseHeader":{ 
     "status":0, 
     "QTime":2, 
     "params":{ 
       "enableElevation":"true", 
       "fl":"score,[elevated],id,name", 
       "indent":"on", 
       "q":"ipod", 
       "forceElevation":"true", 
       "group.field":"manu", 
       "group":"on", 
       "wt":"json"}}, 
   "grouped":{ 
     "manu":{ 
       "matches":2, 
       "groups":[{ 
           "groupValue":"belkin", 
           "doclist":{"numFound":1,"start":0,"maxScore":0.7698604,"docs":[ 
             { 
               "id":"F8V7067-APL-KIT", 
               "name":"Belkin Mobile Power Cord for iPod w/ Dock", 
               "score":0.7698604, 
               "[elevated]":false}] 
           }}, 
         { 
           "groupValue":"inc", 
           "doclist":{"numFound":1,"start":0,"maxScore":0.28869766,"docs":[ 
             { 
               "id":"MA147LL/A", 
               "name":"Apple 60 GB iPod with Video Playback Black", 
               "score":0.28869766, 
               "[elevated]":true}] 
           }}]}}}
 the elevate.xml defines the following rules :
 <query text="ipod">
   <doc id="MA147LL/A" />  <!-- put the actual ipod at the top -->
   <doc id="IW-02" exclude="true" /> <!-- exclude this cable -->
 </query>
  
 </elevate>




[jira] [Resolved] (SOLR-4718) Allow solr.xml to be stored in zookeeper

2013-08-16 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-4718.
--

   Resolution: Fixed
Fix Version/s: 5.0
   4.5

 Allow solr.xml to be stored in zookeeper
 

 Key: SOLR-4718
 URL: https://issues.apache.org/jira/browse/SOLR-4718
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.3, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Fix For: 4.5, 5.0

 Attachments: SOLR-4718-alternative.patch, SOLR-4718.patch, 
 SOLR-4718.patch, SOLR-4718.patch, SOLR-4718.patch, SOLR-4718.patch, 
 SOLR-4718.patch, SOLR-4718.patch


 So the near-final piece of this puzzle is to make solr.xml be storable in 
 Zookeeper. Code-wise in terms of Solr, this doesn't look very difficult, I'm 
 working on it now.
 More interesting is how to get the configuration into ZK in the first place, 
 enhancements to ZkCli? Or bootstrap-conf? Other? I'm punting on that for this 
 patch.
 Second level is how to tell Solr to get the file from ZK. Some possibilities:
 1) A system prop, -DzkSolrXmlPath=blah where blah is the path _on zk_ where 
 the file is. Would require -DzkHost or -DzkRun as well.
pros - simple, I can wrap my head around it.
  - easy to script
cons - can't run multiple JVMs pointing to different files. Is this 
 really a problem?
 2) A new solr.xml element. Something like:
 <solr>
   <solrcloud>
     <str name="zkHost">zkurl</str>
     <str name="zkSolrXmlPath">whatever</str>
   </solrcloud>
 </solr>
Really, this form would hinge on the presence or absence of zkSolrXmlPath. 
 If present, go up and look for the indicated solr.xml file on ZK. Any 
 properties in the ZK version would overwrite anything in the local copy.
 NOTE: I'm really not very interested in supporting this as an option for 
 old-style solr.xml unless it's _really_ easy. For instance, what if the local 
 solr.xml is new-style and the one in ZK is old-style? Or vice-versa? Since 
 old-style is going away, this doesn't seem like it's worth the effort.
 pros - No new mechanisms
 cons - once again requires that there be a solr.xml file on each client. 
 Admittedly for installations that didn't care much about multiple JVMs, it 
 could be a stock file that didn't change...
 For now, I'm going to just manually push solr.xml to ZK, then read it based 
 on a sysprop. That'll get the structure in place while we debate. Not going 
 to check this in until there's some consensus though.
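
For anyone wanting to try the "manually push solr.xml to ZK" step, a hedged
sketch follows. The ZkCLI class is real, but the putfile command name,
classpath, and paths below are assumptions that may not match every 4.x
build; check ZkCLI's own usage output for your version:

```shell
# Upload a local solr.xml into ZooKeeper at /solr.xml via Solr's ZkCLI.
# The classpath assumes the stock example layout; adjust for your install.
java -classpath "example/solr-webapp/webapp/WEB-INF/lib/*:example/lib/ext/*" \
  org.apache.solr.cloud.ZkCLI \
  -zkhost localhost:9983 \
  -cmd putfile /solr.xml /local/path/solr.xml
```

Solr could then be pointed at it with something like -DzkSolrXmlPath=/solr.xml
together with -DzkHost, per option 1 above.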




[jira] [Commented] (SOLR-4718) Allow solr.xml to be stored in zookeeper

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742512#comment-13742512
 ] 

ASF subversion and git services commented on SOLR-4718:
---

Commit 1514843 from [~erickoerickson] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1514843 ]

SOLR-4718 Allow solr.xml to be stored in ZooKeeper

 Allow solr.xml to be stored in zookeeper
 

 Key: SOLR-4718
 URL: https://issues.apache.org/jira/browse/SOLR-4718
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.3, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-4718-alternative.patch, SOLR-4718.patch, 
 SOLR-4718.patch, SOLR-4718.patch, SOLR-4718.patch, SOLR-4718.patch, 
 SOLR-4718.patch, SOLR-4718.patch


 So the near-final piece of this puzzle is to make solr.xml be storable in 
 Zookeeper. Code-wise in terms of Solr, this doesn't look very difficult, I'm 
 working on it now.
 More interesting is how to get the configuration into ZK in the first place, 
 enhancements to ZkCli? Or bootstrap-conf? Other? I'm punting on that for this 
 patch.
 Second level is how to tell Solr to get the file from ZK. Some possibilities:
 1) A system prop, -DzkSolrXmlPath=blah where blah is the path _on zk_ where 
 the file is. Would require -DzkHost or -DzkRun as well.
pros - simple, I can wrap my head around it.
  - easy to script
cons - can't run multiple JVMs pointing to different files. Is this 
 really a problem?
 2) A new solr.xml element. Something like:
 <solr>
   <solrcloud>
     <str name="zkHost">zkurl</str>
     <str name="zkSolrXmlPath">whatever</str>
   </solrcloud>
 </solr>
Really, this form would hinge on the presence or absence of zkSolrXmlPath. 
 If present, go up and look for the indicated solr.xml file on ZK. Any 
 properties in the ZK version would overwrite anything in the local copy.
 NOTE: I'm really not very interested in supporting this as an option for 
 old-style solr.xml unless it's _really_ easy. For instance, what if the local 
 solr.xml is new-style and the one in ZK is old-style? Or vice-versa? Since 
 old-style is going away, this doesn't seem like it's worth the effort.
 pros - No new mechanisms
 cons - once again requires that there be a solr.xml file on each client. 
 Admittedly for installations that didn't care much about multiple JVMs, it 
 could be a stock file that didn't change...
 For now, I'm going to just manually push solr.xml to ZK, then read it based 
 on a sysprop. That'll get the structure in place while we debate. Not going 
 to check this in until there's some consensus though.




[jira] [Commented] (LUCENE-5171) AnalyzingSuggester and FuzzySuggester should be able to share same FST

2013-08-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742513#comment-13742513
 ] 

Michael McCandless commented on LUCENE-5171:


If you use FuzzySuggester with maxEdits=0, does it work?

Or, maybe we should simply merge these two suggesters into AnalyzingSuggester 
and default maxEdits to 0?

 AnalyzingSuggester and FuzzySuggester should be able to share same FST
 --

 Key: LUCENE-5171
 URL: https://issues.apache.org/jira/browse/LUCENE-5171
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/other
Affects Versions: 4.4, 4.3.1
Reporter: Anna Björk Nikulásdóttir
Priority: Minor

 In my code I use both suggesters for the same FST. I use 
 AnalyzingSuggester#store() to create the FST and later on 
 AnalyzingSuggester#load() and FuzzySuggester#load() to use it.
 This approach works very well but it unnecessarily creates 2 fst instances 
 resulting in 2x memory consumption.
 It seems that for the time being both suggesters use the same FST format.
 The following trivial method in AnalyzingSuggester provides the possibility 
 to share the same FST among different instances of AnalyzingSuggester. It has 
 been tested in the above scenario:
   public boolean shareFstFrom(AnalyzingSuggester instance)
   {
 if (instance.fst == null) {
   return false;
 }
 this.fst = instance.fst;
 this.maxAnalyzedPathsForOneInput = instance.maxAnalyzedPathsForOneInput;
 this.hasPayloads = instance.hasPayloads;
 return true;
   }
 One could use it like this:
   analyzingSugg = new AnalyzingSuggester(...);
   fuzzySugg = new FuzzySuggester(...);
   analyzingSugg.load(someInputStream);
   fuzzySugg.shareFstFrom(analyzingSugg);




[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742521#comment-13742521
 ] 

ASF subversion and git services commented on LUCENE-4583:
-

Commit 1514848 from [~mikemccand] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1514848 ]

LUCENE-4583: IndexWriter no longer places a limit on length of DV binary fields 
(individual codecs still have their limits, including the default codec)

 StraightBytesDocValuesField fails if bytes > 32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Assignee: Michael McCandless
Priority: Critical
 Fix For: 5.0, 4.5

 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch


 I didn't observe any limitations on the size of a bytes based DocValues field 
 value in the docs.  It appears that the limit is 32k, although I didn't get 
 any friendly error telling me that was the limit.  32k is kind of small IMO; 
 I suspect this limit is unintended and as such is a bug. The following 
 test fails:
 {code:java}
   public void testBigDocValue() throws IOException {
 Directory dir = newDirectory();
 IndexWriter writer = new IndexWriter(dir, writerConfig(false));
 Document doc = new Document();
 BytesRef bytes = new BytesRef((4+4)*4097);//4096 works
 bytes.length = bytes.bytes.length;//byte data doesn't matter
 doc.add(new StraightBytesDocValuesField("dvField", bytes));
 writer.addDocument(doc);
 writer.commit();
 writer.close();
 DirectoryReader reader = DirectoryReader.open(dir);
 DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
 //FAILS IF BYTES IS BIG!
 docValues.getSource().getBytes(0, bytes);
 reader.close();
 dir.close();
   }
 {code}
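
As a side note on the magic numbers in the test above, a minimal arithmetic
sketch (plain Java, not Lucene code) shows the failing allocation sits just
past a 32 KB boundary, which matches the observed ~32k limit:

```java
// Illustrative arithmetic only: 4096 chunks of (4+4) bytes land exactly
// on 32 KB (32768 bytes), while 4097 chunks go one chunk past it.
public class LimitCheck {
    public static void main(String[] args) {
        int works = (4 + 4) * 4096; // 32768 bytes: at the 32 KB boundary
        int fails = (4 + 4) * 4097; // 32776 bytes: just past it
        System.out.println(works + " " + fails);
    }
}
```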




[jira] [Resolved] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

2013-08-16 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-4583.


Resolution: Fixed

 StraightBytesDocValuesField fails if bytes > 32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Assignee: Michael McCandless
Priority: Critical
 Fix For: 5.0, 4.5

 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch


 I didn't observe any limitations on the size of a bytes based DocValues field 
 value in the docs.  It appears that the limit is 32k, although I didn't get 
 any friendly error telling me that was the limit.  32k is kind of small IMO; 
 I suspect this limit is unintended and as such is a bug. The following 
 test fails:
 {code:java}
   public void testBigDocValue() throws IOException {
 Directory dir = newDirectory();
 IndexWriter writer = new IndexWriter(dir, writerConfig(false));
 Document doc = new Document();
 BytesRef bytes = new BytesRef((4+4)*4097);//4096 works
 bytes.length = bytes.bytes.length;//byte data doesn't matter
 doc.add(new StraightBytesDocValuesField("dvField", bytes));
 writer.addDocument(doc);
 writer.commit();
 writer.close();
 DirectoryReader reader = DirectoryReader.open(dir);
 DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
 //FAILS IF BYTES IS BIG!
 docValues.getSource().getBytes(0, bytes);
 reader.close();
 dir.close();
   }
 {code}




[jira] [Updated] (SOLR-5164) In some cases, creating collections via the Collections API due to core being created in the wrong directory

2013-08-16 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5164:
--

Summary: In some cases, creating collections via the Collections API due to 
core being created in the wrong directory  (was: Creating collections via the 
Collections API fails due to core being created in the wrong directory)

 In some cases, creating collections via the Collections API due to core being 
 created in the wrong directory
 

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml, as they're all relative.




[jira] [Created] (SOLR-5169) Provide a way to query for zookeeper quorum state and other cloud-related info

2013-08-16 Thread Shawn Heisey (JIRA)
Shawn Heisey created SOLR-5169:
--

 Summary: Provide a way to query for zookeeper quorum state and 
other cloud-related info
 Key: SOLR-5169
 URL: https://issues.apache.org/jira/browse/SOLR-5169
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.4
Reporter: Shawn Heisey
Priority: Minor
 Fix For: 4.5, 5.0


There should be a way, either through an existing admin handler or a new one, 
to get an up-to-the-moment zookeeper status.  There may be other status 
information related to SolrCloud that could be included as well.





[jira] [Commented] (SOLR-5099) The core.properties not created during collection creation

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742534#comment-13742534
 ] 

ASF subversion and git services commented on SOLR-5099:
---

Commit 1514857 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1514857 ]

SOLR-5164: add relative solr.home testing to some tests, explicitly check for 
expected instanceDir handling with relative solr.home
SOLR-5099: explicity check for proper solrcore.properties creation
Speed up some tests by setting leaderVoteWait to 0

 The core.properties not created during collection creation
 --

 Key: SOLR-5099
 URL: https://issues.apache.org/jira/browse/SOLR-5099
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Herb Jiang
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: CorePropertiesLocator.java.patch


 When using the new solr.xml structure, the core auto-discovery mechanism 
 tries to find core.properties. 
 But I found that core.properties cannot be created when I dynamically create 
 a collection.
 The root issue is that CorePropertiesLocator tries to create the properties 
 file before the instanceDir is created. 
 The collection creation process completes and looks fine at runtime, but it 
 will cause issues later (cores are not auto-discovered after server restart).




[jira] [Commented] (SOLR-5164) In some cases, creating collections via the Collections API due to core being created in the wrong directory

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742533#comment-13742533
 ] 

ASF subversion and git services commented on SOLR-5164:
---

Commit 1514857 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1514857 ]

SOLR-5164: add relative solr.home testing to some tests, explicitly check for 
expected instanceDir handling with relative solr.home
SOLR-5099: explicity check for proper solrcore.properties creation
Speed up some tests by setting leaderVoteWait to 0

 In some cases, creating collections via the Collections API due to core being 
 created in the wrong directory
 

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml, as they're all relative.




[jira] [Commented] (SOLR-5164) In some cases, creating collections via the Collections API due to core being created in the wrong directory

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742537#comment-13742537
 ] 

ASF subversion and git services commented on SOLR-5164:
---

Commit 1514858 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1514858 ]

SOLR-5164: add relative solr.home testing to some tests, explicitly check for 
expected instanceDir handling with relative solr.home
SOLR-5099: explicity check for proper solrcore.properties creation
Speed up some tests by setting leaderVoteWait to 0

 In some cases, creating collections via the Collections API due to core being 
 created in the wrong directory
 

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the pathing for all the 
 <lib> directives in solrconfig.xml, as they're all relative.




[jira] [Commented] (SOLR-5099) The core.properties not created during collection creation

2013-08-16 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742538#comment-13742538
 ] 

ASF subversion and git services commented on SOLR-5099:
---

Commit 1514858 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1514858 ]

SOLR-5164: add relative solr.home testing to some tests, explicitly check for 
expected instanceDir handling with relative solr.home
SOLR-5099: explicity check for proper solrcore.properties creation
Speed up some tests by setting leaderVoteWait to 0

 The core.properties not created during collection creation
 --

 Key: SOLR-5099
 URL: https://issues.apache.org/jira/browse/SOLR-5099
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Herb Jiang
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: CorePropertiesLocator.java.patch


 When using the new solr.xml structure, the core auto-discovery mechanism 
 tries to find core.properties. 
 But I found that core.properties cannot be created when I dynamically create 
 a collection.
 The root issue is that CorePropertiesLocator tries to create the properties 
 file before the instanceDir is created. 
 The collection creation process completes and looks fine at runtime, but it 
 will cause issues later (cores are not auto-discovered after server restart).




[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_25) - Build # 7040 - Still Failing!

2013-08-16 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/7040/
Java: 32bit/jdk1.7.0_25 -server -XX:+UseSerialGC

3 tests failed.
FAILED:  org.apache.solr.client.solrj.impl.CloudSolrServerTest.testDistribSearch

Error Message:
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 
127.0.0.1:59345 within 3 ms

Stack Trace:
java.lang.RuntimeException: java.util.concurrent.TimeoutException: Could not 
connect to ZooKeeper 127.0.0.1:59345 within 3 ms
at 
__randomizedtesting.SeedInfo.seed([19D57DD7E754F289:9833F3CF900B92B5]:0)
at 
org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:130)
at 
org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:93)
at 
org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:84)
at 
org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:89)
at 
org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:83)
at 
org.apache.solr.cloud.AbstractDistribZkTestBase.setUp(AbstractDistribZkTestBase.java:70)
at 
org.apache.solr.cloud.AbstractFullDistribZkTestBase.setUp(AbstractFullDistribZkTestBase.java:193)
at 
org.apache.solr.client.solrj.impl.CloudSolrServerTest.setUp(CloudSolrServerTest.java:71)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:771)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 

[jira] [Commented] (LUCENE-5179) Refactoring on PostingsWriterBase for delta-encoding

2013-08-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742561#comment-13742561
 ] 

Michael McCandless commented on LUCENE-5179:


So, the idea with this patch is to go back to letting the PBF encode
the metadata for the term?  Just, one term at a time, not the whole
block that we have on trunk today.

And the reason for this is back-compat?  Ie, so that in test-framework
we can have writers for the old formats?

One thing that this change precludes is having the terms dict use
different encodings than simple delta vInt to encode the long[]
metadata, e.g. Simple9/16 or something?  But that's OK ... we can
explore those later.
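For reference, a minimal sketch of what "simple delta vInt" encoding of the long[] metadata could look like (a toy illustration with made-up class names, not Lucene's actual DataOutput implementation):

```java
import java.io.ByteArrayOutputStream;

// Toy delta + vInt codec for a term's long[] metadata array.
// Hypothetical sketch; Lucene's real writers use DataOutput.writeVLong.
public class DeltaVIntSketch {
    // Write v as a variable-length long: 7 payload bits per byte, high bit = "more".
    static void writeVLong(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    // Delta-encode the current term's metadata against the previous term's.
    // Assumes non-negative deltas, which is what makes the encoding compact.
    public static byte[] encode(long[] prev, long[] cur) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < cur.length; i++) {
            writeVLong(out, cur[i] - prev[i]);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        long[] prev = {100, 2000};
        long[] cur  = {160, 2003};
        byte[] bytes = encode(prev, cur);
        // Deltas 60 and 3 each fit in a single vInt byte.
        System.out.println(bytes.length); // 2
    }
}
```

Something like Simple9/16 would instead pack several deltas into fixed-width words, which is exactly the flexibility being discussed.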

It's sort of frustrating to have to compromise the design just for
back-compat ... e.g. we could instead cheat a bit and have the
writers write the newer format.  It's easy to make the readers read
either format, right?

But ... I don't understand how this change helps Pulsing, or rather
why Pulsing would have trouble w/ the API we have today?


 Refactoring on PostingsWriterBase for delta-encoding
 

 Key: LUCENE-5179
 URL: https://issues.apache.org/jira/browse/LUCENE-5179
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Han Jiang
Assignee: Han Jiang
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5179.patch


 A further step from LUCENE-5029.
 The short story is, the previous API change brings two problems:
 * it somewhat breaks backward compatibility: although we can still read the old
   format, we can no longer reproduce it;
 * the pulsing codec has a problem with it.
 And the long story...
 With the change, the current PostingsBase API works like this:
 * the term dict tells the PBF we start a new term (via startTerm());
 * the PBF adds docs, positions and other postings data;
 * the term dict tells the PBF that all the data for the current term is complete
   (via finishTerm()), then the PBF returns the metadata for the current term
   (as long[] and byte[]);
 * the term dict might buffer all the metadata in an ArrayList; when all the terms
   are collected, it then decides how that metadata will be laid out on disk.
 So after the API change, the PBF no longer has that annoying 'flushTermBlock';
 instead the term dict maintains the term/metadata list.
 However, for each term we'll now write the long[] blob before the byte[], so the
 index format is not consistent with pre-4.5:
 in Lucene41 the metadata could be written as longA,bytesA,longB, but now
 we have to write longA,longB,bytesA.
 Another problem is that the pulsing codec cannot tell the wrapped PBF how the
 metadata is delta-encoded; after all, PulsingPostingsWriter is only a PBF.
 For example, suppose we have terms=[a, a1, a2, b, b1, b2] and
 itemsInBlock=2, so theoretically
 we'll finally have three blocks in BTTR: [a b]  [a1 a2]  [b1 b2].  With this
 approach, the metadata of term b is delta-encoded based on the metadata of a,
 but when the term dict tells
 the PBF to finishTerm(b), it might naively do the delta encoding based on term a2.
 So I think maybe we can introduce a method 'encodeTerm(long[], DataOutput
 out, FieldInfo, TermState, boolean absolute)',
 so that during metadata flush we can control how the current term is written.
 The term dict will buffer TermState, which
 implicitly holds the metadata, as we do on the PBReader side.
 For example, if we want to reproduce the old Lucene41 format, we can simply set
 longsSize==0; then the PBF
 writes the old format (longA,bytesA,longB) to DataOutput, and the compatibility
 issue is solved.
 For the pulsing codec, it will also be able to tell the lower level how to encode
 metadata.
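 To make the proposal concrete, here is a toy sketch of the encodeTerm contract.
 The boolean flag mirrors the 'absolute' parameter proposed above; the class and
 field names are hypothetical:

```java
// Hypothetical sketch of the proposed encodeTerm(..., boolean absolute) idea:
// the term dict (not the PBF) decides when delta-encoding restarts, so a block
// boundary can force an absolute (non-delta) encoding of the metadata.
public class EncodeTermSketch {
    private long[] last; // metadata of the previously encoded term

    // Returns the values actually written: absolute at block starts, deltas otherwise.
    public long[] encodeTerm(long[] longs, boolean absolute) {
        long[] written = new long[longs.length];
        for (int i = 0; i < longs.length; i++) {
            written[i] = (absolute || last == null) ? longs[i] : longs[i] - last[i];
        }
        last = longs.clone();
        return written;
    }

    public static void main(String[] args) {
        EncodeTermSketch w = new EncodeTermSketch();
        System.out.println(w.encodeTerm(new long[]{10}, true)[0]);  // 10: block start, absolute
        System.out.println(w.encodeTerm(new long[]{25}, false)[0]); // 15: delta vs previous term
        System.out.println(w.encodeTerm(new long[]{40}, true)[0]);  // 40: new block, absolute again
    }
}
```

 A wrapping codec (pulsing-like) would simply pass the flag through to the wrapped
 writer, so the delta base is always the term the term dict intends.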

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_25) - Build # 7040 - Still Failing!

2013-08-16 Thread Mark Miller
I'll look to see if this was somehow me soon - all tests passed locally. 

Mark 

Sent from my iPhone

On Aug 16, 2013, at 3:54 PM, Policeman Jenkins Server jenk...@thetaphi.de 
wrote:

 Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/7040/
 Java: 32bit/jdk1.7.0_25 -server -XX:+UseSerialGC
 
 3 tests failed.
 FAILED:  
 org.apache.solr.client.solrj.impl.CloudSolrServerTest.testDistribSearch
 
 Error Message:
 java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 
 127.0.0.1:59345 within 3 ms
 
 Stack Trace:
 java.lang.RuntimeException: java.util.concurrent.TimeoutException: Could not 
 connect to ZooKeeper 127.0.0.1:59345 within 3 ms
 at __randomizedtesting.SeedInfo.seed([19D57DD7E754F289:9833F3CF900B92B5]:0)
 at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:130)
 at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:93)
 at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:84)
 at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:89)
 at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:83)
 at org.apache.solr.cloud.AbstractDistribZkTestBase.setUp(AbstractDistribZkTestBase.java:70)
 at org.apache.solr.cloud.AbstractFullDistribZkTestBase.setUp(AbstractFullDistribZkTestBase.java:193)
 at org.apache.solr.client.solrj.impl.CloudSolrServerTest.setUp(CloudSolrServerTest.java:71)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
 at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
 at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:771)
 at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
 at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
 at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
 at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
 at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
 at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
 at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
 at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
 at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
 at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
 at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
 at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
 at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
 at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
 at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
 at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
 at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
 

[jira] [Commented] (SOLR-5168) BJQParserTest reproducible failures

2013-08-16 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742569#comment-13742569
 ] 

Mikhail Khludnev commented on SOLR-5168:


I wonder how it could work (it seems I wrote it myself - my fault).

https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test/org/apache/solr/search/join/BJQParserTest.java#L56

The test doesn't use block add but adds docs one by one, hence a block can be 
broken by a commit:
{code}
 public static void createIndex()
...
   assertU(add(doc(idDoc)));
{code} 
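A toy model of that failure mode (plain Java, no Solr involved; all names are made up) shows why one-by-one adds can split a parent/child block across segments, while an atomic block add cannot:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: block join requires a parent and its children to be adjacent in
// one segment. Adding docs one at a time lets a flush/commit land in the middle
// of a block; an atomic addDocuments(block)-style call cannot be split.
public class BlockFlushSketch {
    // Docs added individually; a flush every `flushEvery` docs can cut a block.
    public static List<List<String>> indexOneByOne(List<String> docs, int flushEvery) {
        List<List<String>> segments = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String d : docs) {
            current.add(d);
            if (current.size() == flushEvery) { // commit may fall mid-block
                segments.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) segments.add(current);
        return segments;
    }

    // Each block is added atomically: it always lands whole in one segment.
    public static List<List<String>> indexAsBlocks(List<List<String>> blocks) {
        return new ArrayList<>(blocks);
    }

    public static void main(String[] args) {
        List<String> block = List.of("child1", "child2", "parent");
        List<List<String>> segs = indexOneByOne(block, 2);
        // The parent ended up in a different segment than its children:
        System.out.println(segs.size()); // 2
    }
}
```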

 BJQParserTest reproducible failures
 ---

 Key: SOLR-5168
 URL: https://issues.apache.org/jira/browse/SOLR-5168
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Yonik Seeley

 two recent Jenkins builds have uncovered some test seeds that cause failures 
 in multiple test methods in BJQParserTest.  These seeds reproduce reliably 
 (as of trunk r1514815) ...
 {noformat}
 ant test  -Dtestcase=BJQParserTest -Dtests.seed=7A613F321CE87F5B 
 -Dtests.multiplier=3 -Dtests.slow=true
 ant test  -Dtestcase=BJQParserTest -Dtests.seed=1DC8055F837E437E 
 -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true 
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5179) Refactoring on PostingsWriterBase for delta-encoding

2013-08-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742575#comment-13742575
 ] 

Robert Muir commented on LUCENE-5179:
-

Is it for real back-compat or for impersonation?

 Refactoring on PostingsWriterBase for delta-encoding
 

 Key: LUCENE-5179
 URL: https://issues.apache.org/jira/browse/LUCENE-5179
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Han Jiang
Assignee: Han Jiang
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5179.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5125) Distributed MoreLikeThis fails with NullPointerException, shard query gives EarlyTerminatingCollectorException

2013-08-16 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742600#comment-13742600
 ] 

Shawn Heisey commented on SOLR-5125:


Does anyone have any ideas here? The same thing happens with a 4x snapshot:

4.5-SNAPSHOT 1514424 - ncindex - 2013-08-15 12:56:50

 Distributed MoreLikeThis fails with NullPointerException, shard query gives 
 EarlyTerminatingCollectorException
 --

 Key: SOLR-5125
 URL: https://issues.apache.org/jira/browse/SOLR-5125
 Project: Solr
  Issue Type: Bug
  Components: MoreLikeThis
Affects Versions: 4.4
Reporter: Shawn Heisey
 Fix For: 4.5, 5.0


 A distributed MoreLikeThis query that works perfectly on 4.2.1 is failing on 
 4.4.0.  The original query returns a NullPointerException.  The Solr log 
 shows that the shard queries are throwing EarlyTerminatingCollectorException. 
  Full details to follow in the comments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5179) Refactoring on PostingsWriterBase for delta-encoding

2013-08-16 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742605#comment-13742605
 ] 

Michael McCandless commented on LUCENE-5179:


I believe it's for impersonation.  Real back-compat (reader can read the old 
index format using the new APIs) should work fine, I think?

 Refactoring on PostingsWriterBase for delta-encoding
 

 Key: LUCENE-5179
 URL: https://issues.apache.org/jira/browse/LUCENE-5179
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Han Jiang
Assignee: Han Jiang
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5179.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5168) BJQParserTest reproducible failures

2013-08-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742607#comment-13742607
 ] 

Yonik Seeley commented on SOLR-5168:


bq. test doesn't use block add

Yeah, I thought that was on purpose to test the query separately from any block 
indexing.
Simplest fix would be to disable the random IW stuff for this test (it would 
always work if the buffering in IW is enough such that the docs are flushed to 
a single segment).  Optimizing after the fact, in conjunction with the log 
merge policy, would also work.

 BJQParserTest reproducible failures
 ---

 Key: SOLR-5168
 URL: https://issues.apache.org/jira/browse/SOLR-5168
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Yonik Seeley


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5099) The core.properties not created during collection creation

2013-08-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742615#comment-13742615
 ] 

Mark Miller commented on SOLR-5099:
---

For bug fixes to unreleased issues that a non-committer contributes to, we 
should add credit to the issue that caused the bug. If it's minor in comparison 
to the original issue, we tend to create sub-entries in CHANGES - see some 
previous examples in CHANGES. I'll make an update here.

 The core.properties not created during collection creation
 --

 Key: SOLR-5099
 URL: https://issues.apache.org/jira/browse/SOLR-5099
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Herb Jiang
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: CorePropertiesLocator.java.patch


 When using the new solr.xml structure, the core auto-discovery mechanism 
 tries to find core.properties. 
 But I found that core.properties cannot be created when I dynamically create a 
 collection.
 The root issue is that CorePropertiesLocator tries to create the properties 
 file before the instanceDir is created. 
 The collection creation process will complete and look fine at runtime, but it 
 will cause issues (cores are not auto-discovered after a server restart).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5170) Spatial multi-value distance sort via DocValues

2013-08-16 Thread David Smiley (JIRA)
David Smiley created SOLR-5170:
--

 Summary: Spatial multi-value distance sort via DocValues
 Key: SOLR-5170
 URL: https://issues.apache.org/jira/browse/SOLR-5170
 Project: Solr
  Issue Type: New Feature
  Components: spatial
Reporter: David Smiley
Assignee: David Smiley


The attached patch implements spatial multi-value distance sorting.  In other 
words, a document can have more than one point per field, and using a provided 
function query, it will return the distance to the closest point.  The data 
goes into binary DocValues, and as such it's pretty friendly to realtime search 
requirements, and it only uses 8 bytes per point.
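As an illustration of the 8-bytes-per-point claim, one plausible encoding (a hypothetical sketch, not necessarily the patch's actual layout) packs lat/lon as two 32-bit floats into a single long, with the sort key being the distance to the closest of a document's points:

```java
// Hypothetical sketch: one point fits in 8 bytes by packing lat/lon as two
// 32-bit floats into a long; a doc's points become a long[] in binary DocValues.
public class PackedPointSketch {
    public static long pack(float lat, float lon) {
        return ((long) Float.floatToIntBits(lat) << 32)
             | (Float.floatToIntBits(lon) & 0xFFFFFFFFL);
    }

    public static float lat(long packed) { return Float.intBitsToFloat((int) (packed >>> 32)); }
    public static float lon(long packed) { return Float.intBitsToFloat((int) packed); }

    // Sort key: squared Euclidean distance to the closest of a doc's points.
    // (A real implementation would use a proper geodesic distance function.)
    public static double closestSq(long[] points, float qLat, float qLon) {
        double best = Double.POSITIVE_INFINITY;
        for (long p : points) {
            double dLat = lat(p) - qLat, dLon = lon(p) - qLon;
            best = Math.min(best, dLat * dLat + dLon * dLon);
        }
        return best;
    }

    public static void main(String[] args) {
        long p = pack(40.7f, -74.0f);
        System.out.println(lat(p) + "," + lon(p)); // float bits round-trip exactly
    }
}
```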

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5168) BJQParserTest reproducible failures

2013-08-16 Thread Mikhail Khludnev (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-5168:
---

Attachment: BJQTest.patch

first patch. it solves most of tests but testGrandChildren() still fails on 
broken block. 

 BJQParserTest reproducible failures
 ---

 Key: SOLR-5168
 URL: https://issues.apache.org/jira/browse/SOLR-5168
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Yonik Seeley
 Attachments: BJQTest.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5170) Spatial multi-value distance sort via DocValues

2013-08-16 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-5170:
---

Attachment: SOLR-5170_spatial_multi-value_sort_via_docvalues.patch


*The first patch is not committable*.
* The biggest reason is that there's an awkward hack to work around the fact 
that a Solr FieldType can't aggregate multiple values into a single 
BinaryDocValuesField. So I've got this UpdateRequestProcessor that works in 
concert with the field.  SOLR-4329
* Secondly it needs more tests. It's been working in quasi-production for many 
months, though.
* And thirdly, I'd prefer to see this mechanism integrated into the lucene 
spatial framework somehow.

If you want to know how to use it, look at the tests.  I'm providing this 
because I got permission to open-source it and people want this capability.  
Once SOLR-4329 is addressed then I'll work on this code more to make it 
commit-worthy.

 Spatial multi-value distance sort via DocValues
 ---

 Key: SOLR-5170
 URL: https://issues.apache.org/jira/browse/SOLR-5170
 Project: Solr
  Issue Type: New Feature
  Components: spatial
Reporter: David Smiley
Assignee: David Smiley
 Attachments: SOLR-5170_spatial_multi-value_sort_via_docvalues.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues

2013-08-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742654#comment-13742654
 ] 

Robert Muir commented on SOLR-5170:
---

why use BINARY vs SORTED_SET? that has a much easier fit in Solr to boot. it's 
designed for multiple values...

 Spatial multi-value distance sort via DocValues
 ---

 Key: SOLR-5170
 URL: https://issues.apache.org/jira/browse/SOLR-5170
 Project: Solr
  Issue Type: New Feature
  Components: spatial
Reporter: David Smiley
Assignee: David Smiley
 Attachments: SOLR-5170_spatial_multi-value_sort_via_docvalues.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Luceneutil high variability between runs

2013-08-16 Thread Tom Burton-West
Hello,

I'm trying to benchmark a change to BM25Similarity (LUCENE-5175) using
luceneutil

I'm running this on a lightly loaded machine with a load average (top) of
about 0.01 when the benchmark is not running.

I made the following changes:
1) localrun.py changed Competition(debug=True) to Competition(debug=False)
2) made the following changes to localconstants.py per Robert Muir's
suggestion:
JAVA_COMMAND = 'java -server -Xms4g -Xmx4g'
SEARCH_NUM_THREADS = 1
3) for the BM25 tests set SIMILARITY_DEFAULT='BM25Similarity'
4) for the BM25 tests uncommented the following line from searchBench.py
#verifyScores = False

Attached is output from iter 19 of several runs

The first 4 runs show consistently that the modified version is somewhere
between 6% and 8% slower on the tasks with the highest difference between
trunk and patch.
However, if you look at the baseline TaskQPS for HighTerm, for example,
run 3 is about 55 and run 1 is about 88.  So the difference for this task
between different runs of the bench program is much higher than the
difference between trunk and modified/patch within a run.

Is this to be expected?  Is there a reason I should believe the
differences shown within a run reflect the true differences?

Seeing this variability, I then switched DEFAULT_SIMILARITY back to
DefaultSimilarity.  In this case trunk and my_modified, should be
exercising exactly the same code, since the only changes in the patch are
the addition of a test case for BM25Similarity and a change to
BM25Similarity.

In this case the modified version varies from -6.2% difference from the
base to +4.4% difference from the base for LowTerm.
Comparing  QPS for the base case for HighTerm between different runs we can
see it varies from about 21 for run 1 to 76 for run 3.

Is this kind of variation between runs of the benchmark to be expected?

Any suggestions about where to look to reduce the variations between runs?

Tom

BM25Similarity runs where my_modified_version is LUCENE-


 tail -33 BM25SimRun1 |head -5
Report after iter 19:
TaskQPS baseline  StdDevQPS my_modified_version  
StdDevPct diff
HighTerm   87.91 (13.2%)   81.02  (8.5%)   
-7.8% ( -26% -   16%)
 MedTerm  111.81 (13.2%)  103.11  (8.4%)   
-7.8% ( -25% -   15%)
 LowTerm  411.44 (17.7%)  382.47 (14.5%)   
-7.0% ( -33% -   30%)
[tburtonw@alamo runs]$ tail -33 BM25SimRun2 |head -5
Report after iter 19:
TaskQPS baseline  StdDevQPS my_modified_version  
StdDevPct diff
HighTerm   62.15  (6.4%)   58.10  (7.1%)   
-6.5% ( -18% -7%)
 MedTerm  139.11  (4.5%)  130.22  (7.5%)   
-6.4% ( -17% -5%)
 LowTerm  391.93 (10.5%)  373.71 (13.1%)   
-4.6% ( -25% -   21%)
[tburtonw@alamo runs]$ tail -33 BM25SimRun3 |head -5
Report after iter 19:
TaskQPS baseline  StdDevQPS my_modified_version  
StdDevPct diff
HighTerm   54.85  (6.5%)   50.18  (1.6%)   
-8.5% ( -15% -0%)
 MedTerm  146.04  (8.6%)  137.31  (4.7%)   
-6.0% ( -17% -8%)
OrNotHighLow   45.85 (11.1%)   43.37 (10.6%)   
-5.4% ( -24% -   18%)
[tburtonw@alamo runs]$ tail -33 BM25SimRun4 |head -5
Report after iter 19:
TaskQPS baseline  StdDevQPS my_modified_version  
StdDevPct diff
OrNotHighMed   49.40  (8.7%)   45.37  (8.8%)   
-8.2% ( -23% -   10%)
OrNotHighLow   65.48  (8.7%)   60.19  (9.0%)   
-8.1% ( -23% -   10%)
   OrNotHighHigh   37.06  (8.2%)   34.18  (8.2%)   
-7.8% ( -22% -9%)

==
Default similarity, which is not modified by the BM25 patch

DefaultSimRun1
 LowTerm  398.97 (17.9%)  398.94 (18.1%)   
-0.0% ( -30% -   43%)
HighTerm   21.13 (12.1%)   21.45 (12.2%)
1.5% ( -20% -   29%)
DefaultSimRun2
 LowTerm  406.93 (17.1%)  381.51 (15.8%)   
-6.2% ( -33% -   32%)
HighTerm   59.21  (2.5%)   59.70  (3.5%)
0.8% (  -5% -7%)
DefaultSimRun3
 LowTerm  431.59 (18.5%)  450.55 (16.8%)
4.4% ( -26% -   48%)
HighTerm   76.45  (2.0%)   76.45  (1.7%)
0.0% (  -3% -3%)
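One way to quantify the cross-run spread described above is the coefficient of variation of the baseline QPS across independent runs. Using the HighTerm baseline values quoted from BM25 runs 1-3 gives roughly a 20% CV, far larger than the ~7% within-run difference being measured:

```java
// Coefficient of variation of baseline HighTerm QPS across independent runs
// (values taken from the run outputs quoted above). A cross-run CV this large
// dwarfs the ~7% within-run delta, so comparisons are only meaningful within a run.
public class RunVariance {
    public static double cv(double[] xs) {
        double mean = 0;
        for (double x : xs) mean += x;
        mean /= xs.length;
        double var = 0;
        for (double x : xs) var += (x - mean) * (x - mean);
        var /= xs.length; // population variance
        return Math.sqrt(var) / mean;
    }

    public static void main(String[] args) {
        double[] highTermBaseline = {87.91, 62.15, 54.85}; // runs 1-3
        System.out.printf("CV = %.0f%%%n", 100 * cv(highTermBaseline));
    }
}
```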



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5179) Refactoring on PostingsWriterBase for delta-encoding

2013-08-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742665#comment-13742665
 ] 

Robert Muir commented on LUCENE-5179:
-

we have had imperfect impersonation before (for example, 
PreFlexRWFieldInfosReader).

But the idea was to exercise to the best extent possible: e.g. if somehow we 
can make a Reader in the RW package (impersonator) that subclasses the real 
reader and overrides the term metadata piece, at least we are still testing the 
postings lists and term bytes and so on.

and the real reader in lucene/core still gets some basic tests from 
TestBackwardsCompatibility.
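The subclass-and-override pattern being described can be sketched generically (class and method names here are hypothetical; the real ones would live in the Lucene test-framework RW packages):

```java
// Hypothetical sketch of the impersonation pattern: the test-only RW reader
// subclasses the real reader and overrides only the term-metadata piece, so the
// shared code paths (postings lists, term bytes, ...) are still exercised by
// the real class.
public class ImpersonationSketch {
    static class RealTermsReader {
        String decodeMetadata(byte[] blob) { return "new-format:" + blob.length; }
        // Shared path: everything except metadata decoding stays in the real reader.
        final String readTerm(byte[] blob) { return decodeMetadata(blob); }
    }

    // Test-only reader: inherits everything, swaps only the metadata decoding.
    static class OldFormatRWReader extends RealTermsReader {
        @Override
        String decodeMetadata(byte[] blob) { return "old-format:" + blob.length; }
    }

    public static void main(String[] args) {
        System.out.println(new RealTermsReader().readTerm(new byte[4]));   // new-format:4
        System.out.println(new OldFormatRWReader().readTerm(new byte[4])); // old-format:4
    }
}
```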

 Refactoring on PostingsWriterBase for delta-encoding
 

 Key: LUCENE-5179
 URL: https://issues.apache.org/jira/browse/LUCENE-5179
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Han Jiang
Assignee: Han Jiang
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5179.patch


 A further step from LUCENE-5029.
 The short story is, the previous API change brings two problems:
 * it somewhat breaks backward compatibility: although we can still read the old 
 format,
   we can no longer reproduce it;
 * the pulsing codec has a problem with it.
 And the long story...
 And long story...
 With the change, the current PostingsBase API will be like this:
 * the term dict tells the PBF we start a new term (via startTerm());
 * the PBF adds docs, positions and other postings data;
 * the term dict tells the PBF all the data for the current term is complete (via 
 finishTerm()),
   then the PBF returns the metadata for the current term (as long[] and byte[]);
 * the term dict might buffer all the metadata in an ArrayList. When all the terms 
 are collected,
   it then decides how that metadata will be located on disk.
 So after the API change, the PBF no longer has that annoying 'flushTermBlock', 
 and instead
 the term dict maintains the term/metadata list.
 However, for each term we'll now write the long[] blob before the byte[], so the 
 index format is not consistent with pre-4.5.
 For example, in Lucene41 the metadata could be written as longA,bytesA,longB, but now 
 we have to write it as longA,longB,bytesA.
 Another problem is, the pulsing codec cannot tell the wrapped PBF how the metadata is 
 delta-encoded, since after all
 PulsingPostingsWriter is only a PBF.
 For example, if we have terms=[a, a1, a2, b, b1, b2] and 
 itemsInBlock=2, then theoretically
 we'll finally have three blocks in BTTR: [a b]  [a1 a2]  [b1 b2]. 
 With this
 approach, the metadata of term b is delta-encoded based on the metadata of a, 
 but when the term dict tells the
 PBF to finishTerm(b), it might naively do the delta encoding based on term a2.
 So I think maybe we can introduce a method 'encodeTerm(long[], DataOutput 
 out, FieldInfo, TermState, boolean absolute)',
 so that during metadata flush we can control how the current term is written. 
 And the term dict will buffer TermState, which
 implicitly holds the metadata like we do on the PBReader side.
 For example, if we want to reproduce the old Lucene41 format, we can simply set 
 longsSize==0; then the PBF
 writes the old format (longA,bytesA,longB) to the DataOutput, and the compatibility 
 issue is solved.
 For the pulsing codec, it will also be able to tell the lower level how to encode 
 its metadata.
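
 As a rough illustration of the idea (a sketch only — the class and method names
 here are hypothetical stand-ins, not the real PostingsWriterBase API), the
 'absolute' flag lets the term dict decide per term whether metadata is written
 as-is or as a delta against the previous term:

 ```java
 import java.io.ByteArrayOutputStream;
 import java.io.DataOutputStream;
 import java.io.IOException;
 import java.util.Arrays;

 // Sketch of the proposed encodeTerm(...) hook. When 'absolute' is true the
 // metadata longs are written as-is (e.g. the first term in a block);
 // otherwise only the delta against the previously written term is emitted.
 class TermMetadataEncoder {
     private long[] lastLongs;

     long[] encodeTerm(long[] longs, DataOutputStream out, boolean absolute) throws IOException {
         long[] written = new long[longs.length];
         for (int i = 0; i < longs.length; i++) {
             written[i] = absolute ? longs[i] : longs[i] - lastLongs[i];
             out.writeLong(written[i]);
         }
         lastLongs = longs.clone();  // base for the next delta
         return written;
     }
 }

 public class EncodeTermDemo {
     public static void main(String[] args) throws IOException {
         TermMetadataEncoder enc = new TermMetadataEncoder();
         DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
         // Term "a": first term of its block, so the term dict requests absolute encoding.
         System.out.println(Arrays.toString(enc.encodeTerm(new long[]{100, 7}, out, true)));
         // Term "b": the term dict (not the wrapped writer) chooses the base,
         // so it can request a delta against "a" rather than against "a2".
         System.out.println(Arrays.toString(enc.encodeTerm(new long[]{130, 9}, out, false)));
     }
 }
 ```

 The point of the sketch is that the caller controls the 'absolute' flag, which
 is exactly what a wrapping codec like pulsing cannot do with finishTerm() alone.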

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Luceneutil high variability between runs

2013-08-16 Thread Robert Muir
I think the raw values don't matter so much because there is some
randomization involved? And the same random seed is used...

Your DefaultSimilarity runs look pretty stable: it's between 0.0% and
1.5% variation, which is about as good as it gets for HighTerm.

LowTerm, I am guessing, is always noisy because those queries are so fast. A few
of these measures at least are; I know IntNRQ in particular is :)
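
A quick back-of-the-envelope way to put a number on the cross-run noise (a
sketch, not part of luceneutil) is the coefficient of variation of the baseline
QPS across runs — here using the three HighTerm DefaultSimilarity baselines Tom
quotes below (21.13, 59.21, 76.45):

```java
import java.util.Arrays;

// Quantify run-to-run benchmark noise as a coefficient of variation (CV):
// population stddev divided by the mean of the baseline QPS across runs.
public class QpsVariation {
    static double coefficientOfVariation(double[] qps) {
        double mean = Arrays.stream(qps).average().orElse(0);
        double var = Arrays.stream(qps).map(x -> (x - mean) * (x - mean)).average().orElse(0);
        return Math.sqrt(var) / mean;
    }

    public static void main(String[] args) {
        // HighTerm baseline QPS from the three DefaultSimilarity runs quoted below.
        double[] highTerm = {21.13, 59.21, 76.45};
        System.out.printf("CV = %.0f%%%n", 100 * coefficientOfVariation(highTerm));
    }
}
```

That works out to roughly 44% across runs, versus at most a few percent of
within-run difference — consistent with the advice that only the within-run
comparison (same seed, same tasks) is meaningful.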

On Fri, Aug 16, 2013 at 6:20 PM, Tom Burton-West tburt...@umich.edu wrote:
 Hello,

 I'm trying to benchmark a change to BM25Similarity (LUCENE-5175) using
 luceneutil.

 I'm running this on a lightly loaded machine with a load average (top) of
 about 0.01 when the benchmark is not running.

 I made the following changes:
 1) localrun.py changed Competition(debug=True) to Competition(debug=False)
 2) made the following changes to localconstants.py per Robert Muir's
 suggestion:
 JAVA_COMMAND = 'java -server -Xms4g -Xmx4g'
 SEARCH_NUM_THREADS = 1
 3) for the BM25 tests set SIMILARITY_DEFAULT='BM25Similarity'
 4) for the BM25 tests uncommented the following line from searchBench.py
 #verifyScores = False

 Attached is output from iter 19 of several runs

 The first 4 runs show consistently that the modified version is somewhere
 between 6% and 8% slower on the tasks with the highest difference between
 trunk and patch.
 However, if you look at the baseline QPS for HighTerm, for example, run
 3 is about 55 and run 1 is about 88. So the difference for this task
 between different runs of the bench program is much higher than the
 differences between trunk and modified/patch within a run.

 Is this to be expected? Is there a reason I should believe the
 differences shown within a run reflect the true differences?

 Seeing this variability, I then switched DEFAULT_SIMILARITY back to
 DefaultSimilarity. In this case trunk and my_modified should be
 exercising exactly the same code, since the only changes in the patch are
 the addition of a test case for BM25Similarity and a change to
 BM25Similarity.

 In this case the modified version varies from -6.2% to +4.4% difference
 from the base for LowTerm.
 Comparing QPS for the base case for HighTerm between different runs, we can
 see it varies from about 21 for run 1 to 76 for run 3.

 Is this kind of  variation between runs of the benchmark to be expected?

 Any suggestions about where to look to reduce the variations between runs?

 Tom











[jira] [Commented] (SOLR-5164) In some cases, creating collections via the Collections API due to core being created in the wrong directory

2013-08-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742719#comment-13742719
 ] 

Mark Miller commented on SOLR-5164:
---

I added some important new testing - we were not really testing with a relative 
solr.home at all; now the test randomly uses one. I also added explicit 
testing to make sure the instance dir for Collections API-created cores is 
correct.

 In some cases, creating collections via the Collections API due to core being 
 created in the wrong directory
 

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the paths for all the 
 "lib" directives in solrconfig.xml, as they're all relative.





[jira] [Commented] (SOLR-5099) The core.properties not created during collection creation

2013-08-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742720#comment-13742720
 ] 

Mark Miller commented on SOLR-5099:
---

I added an explicit test to make sure the core.properties file is created.

 The core.properties not created during collection creation
 --

 Key: SOLR-5099
 URL: https://issues.apache.org/jira/browse/SOLR-5099
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Herb Jiang
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: CorePropertiesLocator.java.patch


 When using the new solr.xml structure, the core auto-discovery mechanism 
 tries to find core.properties. 
 But I found that core.properties is not created when I dynamically create a 
 collection.
 The root issue is that CorePropertiesLocator tries to create the properties file 
 before the instanceDir is created. 
 The collection creation process completes and looks fine at runtime, but it 
 causes issues later (cores are not auto-discovered after server restart).





[jira] [Resolved] (SOLR-5164) In some cases, creating collections via the Collections API due to core being created in the wrong directory

2013-08-16 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-5164.
---

Resolution: Fixed

 In some cases, creating collections via the Collections API due to core being 
 created in the wrong directory
 

 Key: SOLR-5164
 URL: https://issues.apache.org/jira/browse/SOLR-5164
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.5, 5.0
Reporter: Erick Erickson
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.5, 5.0

 Attachments: SOLR-5164.patch


 When you try to create a collection in SolrCloud, the instanceDir that gets 
 created has an extra "solr" in it, which messes up the paths for all the 
 "lib" directives in solrconfig.xml, as they're all relative.





  1   2   >