[jira] [Commented] (SOLR-4718) Allow solr.xml to be stored in zookeeper

2013-08-11 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736216#comment-13736216
 ] 

Noble Paul commented on SOLR-4718:
--

I'm wondering why solr.xml is an XML file at all. From what I see, why can't we just 
make it a properties file?

 Allow solr.xml to be stored in zookeeper
 

 Key: SOLR-4718
 URL: https://issues.apache.org/jira/browse/SOLR-4718
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.3, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-4718.patch


 So the near-final piece of this puzzle is to make solr.xml be storable in 
 Zookeeper. Code-wise in terms of Solr, this doesn't look very difficult, I'm 
 working on it now.
 More interesting is how to get the configuration into ZK in the first place - 
 enhancements to ZkCli? Or bootstrap-conf? Other? I'm punting on that for this 
 patch.
 Second level is how to tell Solr to get the file from ZK. Some possibilities:
 1. A system prop, -DzkSolrXmlPath=blah where blah is the path _on zk_ where 
 the file is. Would require -DzkHost or -DzkRun as well.
pros - simple, I can wrap my head around it.
  - easy to script
cons - can't run multiple JVMs pointing to different files. Is this 
 really a problem?
 2. New solr.xml element. Something like:
 <solr>
   <solrcloud>
     <str name="zkHost">zkurl</str>
     <str name="zkSolrXmlPath">whatever</str>
   </solrcloud>
 </solr>
Really, this form would hinge on the presence or absence of zkSolrXmlPath. 
 If present, go up and look for the indicated solr.xml file on ZK. Any 
 properties in the ZK version would overwrite anything in the local copy.
 NOTE: I'm really not very interested in supporting this as an option for 
 old-style solr.xml unless it's _really_ easy. For instance, what if the local 
 solr.xml is new-style and the one in ZK is old-style? Or vice-versa? Since 
 old-style is going away, this doesn't seem like it's worth the effort.
 pros - No new mechanisms
 cons - once again requires that there be a solr.xml file on each client. 
 Admittedly for installations that didn't care much about multiple JVMs, it 
 could be a stock file that didn't change...
 For now, I'm going to just manually push solr.xml to ZK, then read it based 
 on a sysprop. That'll get the structure in place while we debate. Not going 
 to check this in until there's some consensus though.
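
 A rough sketch of what option 1 above could look like, using the plain ZooKeeper client; the class name, fallback behaviour and error handling here are illustrative assumptions, not part of the attached patch:
 {code}
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

import java.nio.file.Files;
import java.nio.file.Paths;

// Illustration only: fetch solr.xml from ZK when -DzkSolrXmlPath is set,
// otherwise fall back to the local file. Names are assumptions, not the patch.
public class ZkSolrXmlSketch {
  public static byte[] loadSolrXml() throws Exception {
    String zkHost = System.getProperty("zkHost");
    String zkPath = System.getProperty("zkSolrXmlPath"); // path _on zk_, e.g. /solr.xml
    if (zkHost == null || zkPath == null) {
      return Files.readAllBytes(Paths.get("solr.xml")); // local copy as before
    }
    ZooKeeper zk = new ZooKeeper(zkHost, 15000, event -> {});
    try {
      return zk.getData(zkPath, false, new Stat()); // ZK copy wins over the local one
    } finally {
      zk.close();
    }
  }
}
 {code}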

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4906) PostingsHighlighter's PassageFormatter should allow for rendering to arbitrary objects

2013-08-11 Thread Luca Cavanna (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736240#comment-13736240
 ] 

Luca Cavanna commented on LUCENE-4906:
--

Hi Mike,
I definitely agree that the highlighting API should be simple, and the postings 
highlighter is probably the only one that's really easy to use.

On the other hand, I think it's good to make explicit that if you use a 
Formatter<YourObject>, YourObject is what you're going to get back from the 
highlighter. People using the string version wouldn't notice the change, while 
advanced users would have to extend the base class and get type safety too, 
which in my opinion makes it clearer and easier. Using Object feels to me a 
little old-fashioned and bogus, but again that's probably just me :)

I do trust your experience though. If you think the object version is better 
that's fine with me. What I care about is that this improvement gets committed 
soon, since it's a really useful one ;)

Thanks a lot for sharing your ideas
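
A tiny sketch of the generic variant discussed above (names are hypothetical; the current PassageFormatter returns String, and the alternative on the table is returning Object):
{code}
import org.apache.lucene.search.postingshighlight.Passage;

// Hypothetical generic formatter, as described above; not the actual Lucene API.
public abstract class TypedPassageFormatter<T> {
  /** Render the passages matched in one field's content to a result of type T. */
  public abstract T format(Passage[] passages, String content);
}

// A TypedPassageFormatter<String> would keep today's behaviour, while e.g. a
// TypedPassageFormatter<MyJsonObject> could build the JSON tree directly and
// avoid the render-to-string-and-reparse round trip from the issue description.
{code}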

 PostingsHighlighter's PassageFormatter should allow for rendering to 
 arbitrary objects
 --

 Key: LUCENE-4906
 URL: https://issues.apache.org/jira/browse/LUCENE-4906
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
 Attachments: LUCENE-4906.patch, LUCENE-4906.patch


 For example, in a server, I may want to render the highlight result to 
 JsonObject to send back to the front-end. Today since we render to string, I 
 have to render to JSON string and then re-parse to JsonObject, which is 
 inefficient...
 Or, if (Rob's idea:) we make a query that's like MoreLikeThis but it pulls 
 terms from snippets instead, so you get proximity-influenced salient/expanded 
 terms, then perhaps that renders to just an array of tokens or fragments or 
 something from each snippet.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4906) PostingsHighlighter's PassageFormatter should allow for rendering to arbitrary objects

2013-08-11 Thread Luca Cavanna (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736242#comment-13736242
 ] 

Luca Cavanna commented on LUCENE-4906:
--

One more thing: re-reading Robert's previous comments, I also find interesting 
his idea of changing the API to return a proper object instead of the 
Map<String, String[]>, or the String[] for the simplest methods. I wonder if 
it's worth addressing this as well in this issue, or if the current API is 
clear enough in your opinion. Any thoughts?

 PostingsHighlighter's PassageFormatter should allow for rendering to 
 arbitrary objects
 --

 Key: LUCENE-4906
 URL: https://issues.apache.org/jira/browse/LUCENE-4906
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
 Attachments: LUCENE-4906.patch, LUCENE-4906.patch


 For example, in a server, I may want to render the highlight result to 
 JsonObject to send back to the front-end. Today since we render to string, I 
 have to render to JSON string and then re-parse to JsonObject, which is 
 inefficient...
 Or, if (Rob's idea:) we make a query that's like MoreLikeThis but it pulls 
 terms from snippets instead, so you get proximity-influenced salient/expanded 
 terms, then perhaps that renders to just an array of tokens or fragments or 
 something from each snippet.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5164) Remove the OOM catching in SimpleFSDirectory and NIOFSDirectory

2013-08-11 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5164:
--

Attachment: LUCENE-5164.patch

New patch:
- The NIOFS read loop was further cleaned up and simplified by using the 
ByteBuffer tracking.
- The setter/getter in FSDirectory are now no-ops (deprecated).
- Every implementation has its own chunk size, which fits the underlying IO 
layer. For RandomAccessFile this is 8192 bytes.

I decided not to put the chunking into Buffered*, as it is a separate concern 
and would complicate the Buffered* code even more.
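
To illustrate the chunking idea, here is a simplified read loop bounded by the 8192-byte chunk size mentioned above (not the actual SimpleFSIndexInput code, just the shape of it):
{code}
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;

// Simplified illustration: never ask the IO layer for more than CHUNK_SIZE
// bytes at once, so no huge direct buffers get allocated under the hood.
final class ChunkedReadSketch {
  static final int CHUNK_SIZE = 8192; // default for RandomAccessFile per the comment above

  static void readFully(RandomAccessFile raf, byte[] b, int offset, int len) throws IOException {
    while (len > 0) {
      final int toRead = Math.min(CHUNK_SIZE, len);
      final int read = raf.read(b, offset, toRead);
      if (read < 0) {
        throw new EOFException("read past EOF");
      }
      offset += read;
      len -= read;
    }
  }
}
{code}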

 Remove the OOM catching in SimpleFSDirectory and NIOFSDirectory
 ---

 Key: LUCENE-5164
 URL: https://issues.apache.org/jira/browse/LUCENE-5164
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Attachments: LUCENE-5164.patch, LUCENE-5164.patch


 Followup from LUCENE-5161:
 In former times we added the OOM catching in NIOFSDir and SimpleFSDir 
 because nobody understood why the OOM could happen on FileChannel.read() or 
 SimpleFSDir.read(). By reading the Java code it's easy to understand (it 
 allocates direct buffers with the same size as the requested length to read). As 
 we now have chunking reduced to a few kilobytes, the spurious OOMs cannot 
 happen anymore.
 In fact we might hide a *real* OOM! So we should remove it.
 I am also not sure if we should make the chunk size configurable in FSDirectory 
 at all! It makes no sense to me (it was in fact only added for people that 
 hit the OOM to fine-tune).
 In my opinion we should remove the setter in trunk and keep it deprecated in 
 4.x. The buffer size in trunk is then equal to the defaults from LUCENE-5161.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5164) Remove the OOM catching in SimpleFSDirectory and NIOFSDirectory

2013-08-11 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5164:
--

Attachment: LUCENE-5164.patch

Improved the test in TestDirectory to ensure that chunking works correctly.

This is now ready.

 Remove the OOM catching in SimpleFSDirectory and NIOFSDirectory
 ---

 Key: LUCENE-5164
 URL: https://issues.apache.org/jira/browse/LUCENE-5164
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Attachments: LUCENE-5164.patch, LUCENE-5164.patch, LUCENE-5164.patch


 Followup from LUCENE-5161:
 In former times we added the OOM catching in NIOFSDir and SimpleFSDir 
 because nobody understood why the OOM could happen on FileChannel.read() or 
 SimpleFSDir.read(). By reading the Java code it's easy to understand (it 
 allocates direct buffers with the same size as the requested length to read). As 
 we now have chunking reduced to a few kilobytes, the spurious OOMs cannot 
 happen anymore.
 In fact we might hide a *real* OOM! So we should remove it.
 I am also not sure if we should make the chunk size configurable in FSDirectory 
 at all! It makes no sense to me (it was in fact only added for people that 
 hit the OOM to fine-tune).
 In my opinion we should remove the setter in trunk and keep it deprecated in 
 4.x. The buffer size in trunk is then equal to the defaults from LUCENE-5161.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5101) make it easier to plugin different bitset implementations to CachingWrapperFilter

2013-08-11 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736266#comment-13736266
 ] 

Paul Elschot commented on LUCENE-5101:
--

Patch looks good to me, too.

Hopefully we'll get some early feedback about performance.


 make it easier to plugin different bitset implementations to 
 CachingWrapperFilter
 -

 Key: LUCENE-5101
 URL: https://issues.apache.org/jira/browse/LUCENE-5101
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: DocIdSetBenchmark.java, LUCENE-5101.patch, 
 LUCENE-5101.patch


 Currently this is possible, but it's not so friendly:
 {code}
  protected DocIdSet docIdSetToCache(DocIdSet docIdSet, AtomicReader reader) throws IOException {
    if (docIdSet == null) {
      // this is better than returning null, as the nonnull result can be cached
      return EMPTY_DOCIDSET;
    } else if (docIdSet.isCacheable()) {
      return docIdSet;
    } else {
      final DocIdSetIterator it = docIdSet.iterator();
      // null is allowed to be returned by iterator(),
      // in this case we wrap with the sentinel set,
      // which is cacheable.
      if (it == null) {
        return EMPTY_DOCIDSET;
      } else {
        /* INTERESTING PART */
        final FixedBitSet bits = new FixedBitSet(reader.maxDoc());
        bits.or(it);
        return bits;
        /* END INTERESTING PART */
      }
    }
  }
 {code}
 Is there any value to having all this other logic in the protected API? It 
 seems like something that's not useful for a subclass... Maybe this stuff can 
 become final, and INTERESTING PART calls a simpler method, something like:
 {code}
 protected DocIdSet cacheImpl(DocIdSetIterator iterator, AtomicReader reader) {
   final FixedBitSet bits = new FixedBitSet(reader.maxDoc());
   bits.or(iterator);
   return bits;
 }
 {code}
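
 If such a hook existed, a subclass that wants to swap in a different DocIdSet 
 implementation might look roughly like this (a sketch against the proposed, not 
 yet existing, cacheImpl signature):
 {code}
import java.io.IOException;

import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.search.CachingWrapperFilter;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.Filter;
import org.apache.lucene.util.FixedBitSet;

// Sketch only: assumes the proposed cacheImpl() hook exists on CachingWrapperFilter.
public class PluggableCachingWrapperFilter extends CachingWrapperFilter {
  public PluggableCachingWrapperFilter(Filter filter) {
    super(filter);
  }

  @Override
  protected DocIdSet cacheImpl(DocIdSetIterator iterator, AtomicReader reader) throws IOException {
    // Replace FixedBitSet with any other cacheable DocIdSet implementation here.
    final FixedBitSet bits = new FixedBitSet(reader.maxDoc());
    bits.or(iterator);
    return bits;
  }
}
 {code}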

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5164) Remove the OOM catching in SimpleFSDirectory and NIOFSDirectory

2013-08-11 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5164:
--

Attachment: LUCENE-5164.patch

Explicitly pass the buffer size as CHUNK_SIZE to BufferedIndexOutput for 
FSDirectory.

 Remove the OOM catching in SimpleFSDirectory and NIOFSDirectory
 ---

 Key: LUCENE-5164
 URL: https://issues.apache.org/jira/browse/LUCENE-5164
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Attachments: LUCENE-5164.patch, LUCENE-5164.patch, LUCENE-5164.patch, 
 LUCENE-5164.patch


 Followup from LUCENE-5161:
 In former times we added the OOM catching in NIOFSDir and SimpleFSDir 
 because nobody understood why the OOM could happen on FileChannel.read() or 
 SimpleFSDir.read(). By reading the Java code it's easy to understand (it 
 allocates direct buffers with the same size as the requested length to read). As 
 we now have chunking reduced to a few kilobytes, the spurious OOMs cannot 
 happen anymore.
 In fact we might hide a *real* OOM! So we should remove it.
 I am also not sure if we should make the chunk size configurable in FSDirectory 
 at all! It makes no sense to me (it was in fact only added for people that 
 hit the OOM to fine-tune).
 In my opinion we should remove the setter in trunk and keep it deprecated in 
 4.x. The buffer size in trunk is then equal to the defaults from LUCENE-5161.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4718) Allow solr.xml to be stored in zookeeper

2013-08-11 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736270#comment-13736270
 ] 

Erick Erickson commented on SOLR-4718:
--

I tried that once. It gets a little ugly with entries like solr.cloud=. The XML 
does divide things nicely into sections. Having a solr.xml allows a clean 
specification of a local properties file; there's no confusion between solr.xml 
and solr.properties. But that could be handled by convention, since we really 
haven't decided what the local properties file would be (something like 
solr.properties and solrlocal.properties).

But personally I don't want to go through the hassle of changing from solr.xml. 
I agree that functionally we should be able to get by with a properties file, 
but the fact that it's XML is built into the code in a lot of places, and 
untangling the XML-ish nature is more time consuming (at least it was the last 
time I did it and then reverted) than valuable, I think.

 Allow solr.xml to be stored in zookeeper
 

 Key: SOLR-4718
 URL: https://issues.apache.org/jira/browse/SOLR-4718
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.3, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-4718.patch


 So the near-final piece of this puzzle is to make solr.xml be storable in 
 Zookeeper. Code-wise in terms of Solr, this doesn't look very difficult, I'm 
 working on it now.
 More interesting is how to get the configuration into ZK in the first place - 
 enhancements to ZkCli? Or bootstrap-conf? Other? I'm punting on that for this 
 patch.
 Second level is how to tell Solr to get the file from ZK. Some possibilities:
 1. A system prop, -DzkSolrXmlPath=blah where blah is the path _on zk_ where 
 the file is. Would require -DzkHost or -DzkRun as well.
pros - simple, I can wrap my head around it.
  - easy to script
cons - can't run multiple JVMs pointing to different files. Is this 
 really a problem?
 2. New solr.xml element. Something like:
 <solr>
   <solrcloud>
     <str name="zkHost">zkurl</str>
     <str name="zkSolrXmlPath">whatever</str>
   </solrcloud>
 </solr>
Really, this form would hinge on the presence or absence of zkSolrXmlPath. 
 If present, go up and look for the indicated solr.xml file on ZK. Any 
 properties in the ZK version would overwrite anything in the local copy.
 NOTE: I'm really not very interested in supporting this as an option for 
 old-style solr.xml unless it's _really_ easy. For instance, what if the local 
 solr.xml is new-style and the one in ZK is old-style? Or vice-versa? Since 
 old-style is going away, this doesn't seem like it's worth the effort.
 pros - No new mechanisms
 cons - once again requires that there be a solr.xml file on each client. 
 Admittedly for installations that didn't care much about multiple JVMs, it 
 could be a stock file that didn't change...
 For now, I'm going to just manually push solr.xml to ZK, then read it based 
 on a sysprop. That'll get the structure in place while we debate. Not going 
 to check this in until there's some consensus though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4718) Allow solr.xml to be stored in zookeeper

2013-08-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736275#comment-13736275
 ] 

Mark Miller commented on SOLR-4718:
---

The idea has come up before - there was a preference for the better hierarchy 
support of XML vs. properties, as well as consistency with the SolrCore 
configuration.

It has little to do with putting it in ZooKeeper though - we want the same 
format as if it were on disk.

 Allow solr.xml to be stored in zookeeper
 

 Key: SOLR-4718
 URL: https://issues.apache.org/jira/browse/SOLR-4718
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.3, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-4718.patch


 So the near-final piece of this puzzle is to make solr.xml be storable in 
 Zookeeper. Code-wise in terms of Solr, this doesn't look very difficult, I'm 
 working on it now.
 More interesting is how to get the configuration into ZK in the first place - 
 enhancements to ZkCli? Or bootstrap-conf? Other? I'm punting on that for this 
 patch.
 Second level is how to tell Solr to get the file from ZK. Some possibilities:
 1. A system prop, -DzkSolrXmlPath=blah where blah is the path _on zk_ where 
 the file is. Would require -DzkHost or -DzkRun as well.
pros - simple, I can wrap my head around it.
  - easy to script
cons - can't run multiple JVMs pointing to different files. Is this 
 really a problem?
 2. New solr.xml element. Something like:
 <solr>
   <solrcloud>
     <str name="zkHost">zkurl</str>
     <str name="zkSolrXmlPath">whatever</str>
   </solrcloud>
 </solr>
Really, this form would hinge on the presence or absence of zkSolrXmlPath. 
 If present, go up and look for the indicated solr.xml file on ZK. Any 
 properties in the ZK version would overwrite anything in the local copy.
 NOTE: I'm really not very interested in supporting this as an option for 
 old-style solr.xml unless it's _really_ easy. For instance, what if the local 
 solr.xml is new-style and the one in ZK is old-style? Or vice-versa? Since 
 old-style is going away, this doesn't seem like it's worth the effort.
 pros - No new mechanisms
 cons - once again requires that there be a solr.xml file on each client. 
 Admittedly for installations that didn't care much about multiple JVMs, it 
 could be a stock file that didn't change...
 For now, I'm going to just manually push solr.xml to ZK, then read it based 
 on a sysprop. That'll get the structure in place while we debate. Not going 
 to check this in until there's some consensus though.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5164) Remove the OOM catching in SimpleFSDirectory and NIOFSDirectory

2013-08-11 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5164:
--

Attachment: LUCENE-5164.patch

New patch again, this time with better reuse of NIOFS' ByteBuffer!

 Remove the OOM catching in SimpleFSDirectory and NIOFSDirectory
 ---

 Key: LUCENE-5164
 URL: https://issues.apache.org/jira/browse/LUCENE-5164
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Attachments: LUCENE-5164.patch, LUCENE-5164.patch, LUCENE-5164.patch, 
 LUCENE-5164.patch, LUCENE-5164.patch


 Followup from LUCENE-5161:
 In former times we added the OOM catching in NIOFSDir and SimpleFSDir 
 because nobody understood why the OOM could happen on FileChannel.read() or 
 SimpleFSDir.read(). By reading the Java code it's easy to understand (it 
 allocates direct buffers with the same size as the requested length to read). As 
 we now have chunking reduced to a few kilobytes, the spurious OOMs cannot 
 happen anymore.
 In fact we might hide a *real* OOM! So we should remove it.
 I am also not sure if we should make the chunk size configurable in FSDirectory 
 at all! It makes no sense to me (it was in fact only added for people that 
 hit the OOM to fine-tune).
 In my opinion we should remove the setter in trunk and keep it deprecated in 
 4.x. The buffer size in trunk is then equal to the defaults from LUCENE-5161.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5164) Remove the OOM catching in SimpleFSDirectory and NIOFSDirectory

2013-08-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736288#comment-13736288
 ] 

ASF subversion and git services commented on LUCENE-5164:
-

Commit 1512937 from [~thetaphi] in branch 'dev/trunk'
[ https://svn.apache.org/r1512937 ]

LUCENE-5164: Fix default chunk sizes in FSDirectory to not be unnecessarily 
large (now 8192 bytes); also use chunking when writing to index files. 
FSDirectory#setReadChunkSize() is now deprecated and will be removed in Lucene 
5.0

 Remove the OOM catching in SimpleFSDirectory and NIOFSDirectory
 ---

 Key: LUCENE-5164
 URL: https://issues.apache.org/jira/browse/LUCENE-5164
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Attachments: LUCENE-5164.patch, LUCENE-5164.patch, LUCENE-5164.patch, 
 LUCENE-5164.patch, LUCENE-5164.patch


 Followup from LUCENE-5161:
 In former times we added the OOM catching in NIOFSDir and SimpleFSDir 
 because nobody understood why the OOM could happen on FileChannel.read() or 
 SimpleFSDir.read(). By reading the Java code it's easy to understand (it 
 allocates direct buffers with the same size as the requested length to read). As 
 we now have chunking reduced to a few kilobytes, the spurious OOMs cannot 
 happen anymore.
 In fact we might hide a *real* OOM! So we should remove it.
 I am also not sure if we should make the chunk size configurable in FSDirectory 
 at all! It makes no sense to me (it was in fact only added for people that 
 hit the OOM to fine-tune).
 In my opinion we should remove the setter in trunk and keep it deprecated in 
 4.x. The buffer size in trunk is then equal to the defaults from LUCENE-5161.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4956) make maxBufferedAddsPerServer configurable

2013-08-11 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736291#comment-13736291
 ] 

Erick Erickson commented on SOLR-4956:
--

So it sounds like we have competing needs here. On the one hand
we have several anecdotal statements that upping the buffer size
had significant impact on throughput.

On the other, just upping the buffer size has potential for Bad
Outcomes.

So it seems we have three options here:
1. Make it configurable, with a warning that changing it may lead to Bad Stuff.
2. Leave it as-is and forget about it.
3. Do the harder thing: see if we can figure out why changing the batch size
   makes such a difference and fix the underlying cause (if there is one).

I'm totally unfamiliar with the code, but the 20,000 ft. smell is
that there's something about the intra-node routing code that's
very inefficient and making the buffers bigger is masking that. On
the surface, just sending the packets around doesn't seem like it
should spike the CPU that much... But like I said, I haven't looked
at the code at all.
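
For context, here's a toy sketch of the buffering behaviour the knob controls 
(not the actual Solr distributor code; names and types are simplified 
assumptions):
{code}
import java.util.ArrayList;
import java.util.List;

// Toy illustration of per-replica batching: adds are queued and sent in
// batches of maxBufferedAddsPerServer, so a larger value means fewer, larger
// requests to each replica.
class ReplicaAddBuffer {
  private final int maxBufferedAddsPerServer; // hard-coded to 10 today; the proposal is to make it configurable
  private final List<String> pending = new ArrayList<>();

  ReplicaAddBuffer(int maxBufferedAddsPerServer) {
    this.maxBufferedAddsPerServer = maxBufferedAddsPerServer;
  }

  void add(String doc) {
    pending.add(doc);
    if (pending.size() >= maxBufferedAddsPerServer) {
      flush();
    }
  }

  void flush() {
    if (pending.isEmpty()) {
      return;
    }
    sendToReplica(new ArrayList<>(pending)); // one request carrying the whole batch
    pending.clear();
  }

  private void sendToReplica(List<String> batch) {
    // network call elided in this sketch
  }
}
{code}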

 make maxBufferedAddsPerServer configurable
 --

 Key: SOLR-4956
 URL: https://issues.apache.org/jira/browse/SOLR-4956
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.3, 5.0
Reporter: Erick Erickson

 Anecdotal users-list evidence indicates that in high-throughput situations, 
 the default of 10 docs/batch for inter-shard batching can generate 
 significant CPU load. See the thread titled "Sharding and Replication" on 
 June 19th, but the gist is below.
 I haven't poked around, but it's a little surprising on the surface that Asif 
 is seeing this kind of difference. So I'm wondering if this change indicates 
 some other underlying issue. Regardless, this seems like it would be good to 
 investigate.
 Here's the gist of Asif's experience from the thread:
 It's a completely practical problem - we are exploring Solr to build a
 real-time analytics/data solution for a system handling about 1000 qps. We have
 various metrics that are stored as different collections on the cloud,
 which means a very high amount of writes. The cloud also needs to support
 about 300-400 qps.
 We initially tested with a single Solr node on a 16 core / 24 GB box for a
 single metric. We saw that writes were not an issue at all - Solr was
 handling it extremely well. We were also able to achieve about 200 qps from
 a single node.
 When we set up the cloud (an ensemble on 6 boxes), we saw very high CPU
 usage on the replicas. Up to 10 cores were getting used for writes on the
 replicas. Hence my concern with respect to batch updates for the replicas.
 BTW, I altered the maxBufferedAddsPerServer to 1000 - and now CPU usage is
 very similar to the single-node installation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3076) Solr(Cloud) should support block joins

2013-08-11 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736292#comment-13736292
 ] 

Yonik Seeley commented on SOLR-3076:


bq. why PARENT:true should ignore deletions?

In an earlier iteration, it needed to... but now I think it's just desirable 
(as opposed to required) because it's more efficient (less backtracking over 
deleted docs), and more resilient to accidental error conditions (like when 
someone deletes a parent doc but not its children).

bq. I propose to revise the idea of rewindable docIDset iterator 

See LUCENE-5092, it looks like something like that has been rejected.

As far as maintenance, the current stuff makes some things easier to tweak.  I 
already did so for the child parser to make it fit better with how we put 
together full queries.  Anyway, the important parts are the public interfaces 
(the XML doc, and \{!parent} \{!child} parsers and semantics).  If we're happy 
with those, I think we should commit at this point - this issue has been open 
for far too long!
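
For reference, the public interface boils down to query syntax like the 
following (a SolrJ sketch; the field names content_type, comment_text and title 
are made-up examples, not from the patch):
{code}
import org.apache.solr.client.solrj.SolrQuery;

// Hypothetical field names; the {!parent ...} / {!child ...} parsers are the
// public interface discussed above.
public class BlockJoinQueryExamples {
  public static void main(String[] args) {
    // Return parent docs whose children match comment_text:great
    SolrQuery toParent = new SolrQuery("{!parent which=\"content_type:parent\"}comment_text:great");

    // Return child docs whose parent matches title:solr
    SolrQuery toChild = new SolrQuery("{!child of=\"content_type:parent\"}title:solr");

    System.out.println(toParent.getQuery());
    System.out.println(toChild.getQuery());
  }
}
{code}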

 Solr(Cloud) should support block joins
 --

 Key: SOLR-3076
 URL: https://issues.apache.org/jira/browse/SOLR-3076
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Yonik Seeley
 Fix For: 4.5, 5.0

 Attachments: 27M-singlesegment-histogram.png, 27M-singlesegment.png, 
 bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch, 
 child-bjqparser.patch, dih-3076.patch, dih-config.xml, 
 parent-bjq-qparser.patch, parent-bjq-qparser.patch, Screen Shot 2012-07-17 at 
 1.12.11 AM.png, SOLR-3076-childDocs.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-7036-childDocs-solr-fork-trunk-patched, 
 solrconf-bjq-erschema-snippet.xml, solrconfig.xml.patch, 
 tochild-bjq-filtered-search-fix.patch


 Lucene has the ability to do block joins, we should add it to Solr.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5164) Remove the OOM catching in SimpleFSDirectory and NIOFSDirectory

2013-08-11 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5164:
--

Attachment: LUCENE-5164-4x.patch

Patch for 4.x (the merge was complicated because of many changes - Java 7).

 Remove the OOM catching in SimpleFSDirectory and NIOFSDirectory
 ---

 Key: LUCENE-5164
 URL: https://issues.apache.org/jira/browse/LUCENE-5164
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Attachments: LUCENE-5164-4x.patch, LUCENE-5164.patch, 
 LUCENE-5164.patch, LUCENE-5164.patch, LUCENE-5164.patch, LUCENE-5164.patch


 Followup from LUCENE-5161:
 In former times we added the OOM catching in NIOFSDir and SimpleFSDir 
 because nobody understood why the OOM could happen on FileChannel.read() or 
 SimpleFSDir.read(). By reading the Java code it's easy to understand (it 
 allocates direct buffers with the same size as the requested length to read). As 
 we now have chunking reduced to a few kilobytes, the spurious OOMs cannot 
 happen anymore.
 In fact we might hide a *real* OOM! So we should remove it.
 I am also not sure if we should make the chunk size configurable in FSDirectory 
 at all! It makes no sense to me (it was in fact only added for people that 
 hit the OOM to fine-tune).
 In my opinion we should remove the setter in trunk and keep it deprecated in 
 4.x. The buffer size in trunk is then equal to the defaults from LUCENE-5161.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5164) Remove the OOM catching in SimpleFSDirectory and NIOFSDirectory

2013-08-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736294#comment-13736294
 ] 

ASF subversion and git services commented on LUCENE-5164:
-

Commit 1512949 from [~thetaphi] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1512949 ]

Merged revision(s) 1512937 from lucene/dev/trunk:
LUCENE-5164: Fix default chunk sizes in FSDirectory to not be unnecessarily 
large (now 8192 bytes); also use chunking when writing to index files. 
FSDirectory#setReadChunkSize() is now deprecated and will be removed in Lucene 
5.0

 Remove the OOM catching in SimpleFSDirectory and NIOFSDirectory
 ---

 Key: LUCENE-5164
 URL: https://issues.apache.org/jira/browse/LUCENE-5164
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Attachments: LUCENE-5164-4x.patch, LUCENE-5164.patch, 
 LUCENE-5164.patch, LUCENE-5164.patch, LUCENE-5164.patch, LUCENE-5164.patch


 Followup from LUCENE-5161:
 In former times we added the OOM catching in NIOFSDir and SimpleFSDir 
 because nobody understood why the OOM could happen on FileChannel.read() or 
 SimpleFSDir.read(). By reading the Java code it's easy to understand (it 
 allocates direct buffers with the same size as the requested length to read). As 
 we now have chunking reduced to a few kilobytes, the spurious OOMs cannot 
 happen anymore.
 In fact we might hide a *real* OOM! So we should remove it.
 I am also not sure if we should make the chunk size configurable in FSDirectory 
 at all! It makes no sense to me (it was in fact only added for people that 
 hit the OOM to fine-tune).
 In my opinion we should remove the setter in trunk and keep it deprecated in 
 4.x. The buffer size in trunk is then equal to the defaults from LUCENE-5161.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5164) Remove the OOM catching in SimpleFSDirectory and NIOFSDirectory

2013-08-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736295#comment-13736295
 ] 

ASF subversion and git services commented on LUCENE-5164:
-

Commit 1512951 from [~thetaphi] in branch 'dev/trunk'
[ https://svn.apache.org/r1512951 ]

LUCENE-5164: Remove deprecated stuff in trunk.

 Remove the OOM catching in SimpleFSDirectory and NIOFSDirectory
 ---

 Key: LUCENE-5164
 URL: https://issues.apache.org/jira/browse/LUCENE-5164
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Attachments: LUCENE-5164-4x.patch, LUCENE-5164.patch, 
 LUCENE-5164.patch, LUCENE-5164.patch, LUCENE-5164.patch, LUCENE-5164.patch


 Followup from LUCENE-5161:
 In former times we added the OOM catching in NIOFSDir and SimpleFSDir 
 because nobody understood why the OOM could happen on FileChannel.read() or 
 SimpleFSDir.read(). By reading the Java code it's easy to understand (it 
 allocates direct buffers with the same size as the requested length to read). As 
 we now have chunking reduced to a few kilobytes, the spurious OOMs cannot 
 happen anymore.
 In fact we might hide a *real* OOM! So we should remove it.
 I am also not sure if we should make the chunk size configurable in FSDirectory 
 at all! It makes no sense to me (it was in fact only added for people that 
 hit the OOM to fine-tune).
 In my opinion we should remove the setter in trunk and keep it deprecated in 
 4.x. The buffer size in trunk is then equal to the defaults from LUCENE-5161.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5164) Remove the OOM catching in SimpleFSDirectory and NIOFSDirectory

2013-08-11 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-5164.
---

Resolution: Fixed

Thanks Robert and Grant for the fruitful discussions!

 Remove the OOM catching in SimpleFSDirectory and NIOFSDirectory
 ---

 Key: LUCENE-5164
 URL: https://issues.apache.org/jira/browse/LUCENE-5164
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Attachments: LUCENE-5164-4x.patch, LUCENE-5164.patch, 
 LUCENE-5164.patch, LUCENE-5164.patch, LUCENE-5164.patch, LUCENE-5164.patch


 Followup from LUCENE-5161:
 In former times we added the OOM catching in NIOFSDir and SimpleFSDir 
 because nobody understood why the OOM could happen on FileChannel.read() or 
 SimpleFSDir.read(). By reading the Java code it's easy to understand (it 
 allocates direct buffers with the same size as the requested length to read). As 
 we now have chunking reduced to a few kilobytes, the spurious OOMs cannot 
 happen anymore.
 In fact we might hide a *real* OOM! So we should remove it.
 I am also not sure if we should make the chunk size configurable in FSDirectory 
 at all! It makes no sense to me (it was in fact only added for people that 
 hit the OOM to fine-tune).
 In my opinion we should remove the setter in trunk and keep it deprecated in 
 4.x. The buffer size in trunk is then equal to the defaults from LUCENE-5161.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4956) make maxBufferedAddsPerServer configurable

2013-08-11 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736298#comment-13736298
 ] 

Yonik Seeley commented on SOLR-4956:


Buffering can also slow indexing speeds...
Say you up the buffer to 100 docs and then you send in a batch of 50.  All 50 
docs will be indexed locally and only then will all 50 be sent to the replica 
(where we have to wait for all 50 docs to be indexed again).

 make maxBufferedAddsPerServer configurable
 --

 Key: SOLR-4956
 URL: https://issues.apache.org/jira/browse/SOLR-4956
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.3, 5.0
Reporter: Erick Erickson

 Anecdotal users-list evidence indicates that in high-throughput situations, 
 the default of 10 docs/batch for inter-shard batching can generate 
 significant CPU load. See the thread titled "Sharding and Replication" on 
 June 19th, but the gist is below.
 I haven't poked around, but it's a little surprising on the surface that Asif 
 is seeing this kind of difference. So I'm wondering if this change indicates 
 some other underlying issue. Regardless, this seems like it would be good to 
 investigate.
 Here's the gist of Asif's experience from the thread:
 It's a completely practical problem - we are exploring Solr to build a
 real-time analytics/data solution for a system handling about 1000 qps. We have
 various metrics that are stored as different collections on the cloud,
 which means a very high amount of writes. The cloud also needs to support
 about 300-400 qps.
 We initially tested with a single Solr node on a 16 core / 24 GB box for a
 single metric. We saw that writes were not an issue at all - Solr was
 handling it extremely well. We were also able to achieve about 200 qps from
 a single node.
 When we set up the cloud (an ensemble on 6 boxes), we saw very high CPU
 usage on the replicas. Up to 10 cores were getting used for writes on the
 replicas. Hence my concern with respect to batch updates for the replicas.
 BTW, I altered the maxBufferedAddsPerServer to 1000 - and now CPU usage is
 very similar to the single-node installation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5134) Have HdfsIndexOutput extend BufferedIndexOutput

2013-08-11 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5134:
--

Attachment: SOLR-5134.patch

Thanks Uwe - new patch attached.

 Have HdfsIndexOutput extend BufferedIndexOutput
 ---

 Key: SOLR-5134
 URL: https://issues.apache.org/jira/browse/SOLR-5134
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.4
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.5, 5.0

 Attachments: SOLR-5134.patch, SOLR-5134.patch


 Upstream Blur has moved HdfsIndexOutput to use BufferedIndexOutput and the 
 simple FS IndexOutput does as well - seems we should do the same.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4956) make maxBufferedAddsPerServer configurable

2013-08-11 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736301#comment-13736301
 ] 

Erick Erickson commented on SOLR-4956:
--

Hmmm, sounds like we need more details. I wonder whether the
situations where buffering up more docs helps are also
situations in which there is only a leader?

I guess the thing that's puzzling me is that the reported high
CPU rates are related to internal buffering sizes.


 make maxBufferedAddsPerServer configurable
 --

 Key: SOLR-4956
 URL: https://issues.apache.org/jira/browse/SOLR-4956
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.3, 5.0
Reporter: Erick Erickson

 Anecdotal users-list evidence indicates that in high-throughput situations, 
 the default of 10 docs/batch for inter-shard batching can generate 
 significant CPU load. See the thread titled "Sharding and Replication" on 
 June 19th, but the gist is below.
 I haven't poked around, but it's a little surprising on the surface that Asif 
 is seeing this kind of difference. So I'm wondering if this change indicates 
 some other underlying issue. Regardless, this seems like it would be good to 
 investigate.
 Here's the gist of Asif's experience from the thread:
 It's a completely practical problem - we are exploring Solr to build a
 real-time analytics/data solution for a system handling about 1000 qps. We have
 various metrics that are stored as different collections on the cloud,
 which means a very high amount of writes. The cloud also needs to support
 about 300-400 qps.
 We initially tested with a single Solr node on a 16 core / 24 GB box for a
 single metric. We saw that writes were not an issue at all - Solr was
 handling it extremely well. We were also able to achieve about 200 qps from
 a single node.
 When we set up the cloud (an ensemble on 6 boxes), we saw very high CPU
 usage on the replicas. Up to 10 cores were getting used for writes on the
 replicas. Hence my concern with respect to batch updates for the replicas.
 BTW, I altered the maxBufferedAddsPerServer to 1000 - and now CPU usage is
 very similar to the single-node installation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5164) Remove the OOM catching in SimpleFSDirectory and NIOFSDirectory

2013-08-11 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5164:
--

Fix Version/s: 4.5
   5.0

 Remove the OOM catching in SimpleFSDirectory and NIOFSDirectory
 ---

 Key: LUCENE-5164
 URL: https://issues.apache.org/jira/browse/LUCENE-5164
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 5.0, 4.5

 Attachments: LUCENE-5164-4x.patch, LUCENE-5164.patch, 
 LUCENE-5164.patch, LUCENE-5164.patch, LUCENE-5164.patch, LUCENE-5164.patch


 Followup from LUCENE-5161:
 In former times we added the OOM catching in NIOFSDir and SimpleFSDir 
 because nobody understood why the OOM could happen on FileChannel.read() or 
 SimpleFSDir.read(). By reading the Java code it's easy to understand (it 
 allocates direct buffers with the same size as the requested length to read). As 
 we now have chunking reduced to a few kilobytes, the spurious OOMs cannot 
 happen anymore.
 In fact we might hide a *real* OOM! So we should remove it.
 I am also not sure if we should make the chunk size configurable in FSDirectory 
 at all! It makes no sense to me (it was in fact only added for people that 
 hit the OOM to fine-tune).
 In my opinion we should remove the setter in trunk and keep it deprecated in 
 4.x. The buffer size in trunk is then equal to the defaults from LUCENE-5161.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5135) Deleting a collection should be extra aggressive in the face of failures.

2013-08-11 Thread Mark Miller (JIRA)
Mark Miller created SOLR-5135:
-

 Summary: Deleting a collection should be extra aggressive in the 
face of failures.
 Key: SOLR-5135
 URL: https://issues.apache.org/jira/browse/SOLR-5135
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.5, 5.0


Until Zk is the source of truth for the cluster, zk and local node states can 
get out of whack in certain situations - as a result, sometimes you cannot 
clean out all of the remnants of a collection to recreate it. For example, if 
the collection is listed in zk under /collections, but is not in 
clusterstate.json, you cannot remove or create the collection again due to an 
early exception in the collection removal chain.

I think we should probably still return the error - but also delete as much as 
we can.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_25) - Build # 6874 - Failure!

2013-08-11 Thread Robert Muir
I cannot reproduce this.

On Sat, Aug 10, 2013 at 1:44 PM, Policeman Jenkins Server
jenk...@thetaphi.de wrote:
 Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6874/
 Java: 32bit/jdk1.7.0_25 -server -XX:+UseG1GC

 1 tests failed.
 REGRESSION:  
 org.apache.lucene.search.postingshighlight.TestPostingsHighlighter.testUserFailedToIndexOffsets

 Error Message:


 Stack Trace:
 java.lang.AssertionError
 at 
 __randomizedtesting.SeedInfo.seed([EBBFA6F4E80A7365:1FBF811885F2D611]:0)
 at 
 org.apache.lucene.index.ByteSliceReader.readByte(ByteSliceReader.java:73)
 at org.apache.lucene.store.DataInput.readVInt(DataInput.java:108)
 at 
 org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:453)
 at 
 org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
 at org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
 at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
 at 
 org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
 at 
 org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:501)
 at 
 org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:478)
 at 
 org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:615)
 at 
 org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2760)
 at 
 org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2909)
 at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2884)
 at 
 org.apache.lucene.index.RandomIndexWriter.getReader(RandomIndexWriter.java:312)
 at 
 org.apache.lucene.index.RandomIndexWriter.getReader(RandomIndexWriter.java:249)
 at 
 org.apache.lucene.search.postingshighlight.TestPostingsHighlighter.testUserFailedToIndexOffsets(TestPostingsHighlighter.java:295)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
 at 
 org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
 at 
 org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
 at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
 at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at 
 org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
 at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
 at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
 at 
 org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
 at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at 
 com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
 at 
 

[jira] [Commented] (SOLR-5135) Deleting a collection should be extra aggressive in the face of failures.

2013-08-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736374#comment-13736374
 ] 

Mark Miller commented on SOLR-5135:
---

bq.  or create the collection again due

Seems you can actually create it again - we check existence against 
clusterstate.json rather than the /collections node - but we should still remove 
the remnants.

 Deleting a collection should be extra aggressive in the face of failures.
 -

 Key: SOLR-5135
 URL: https://issues.apache.org/jira/browse/SOLR-5135
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.5, 5.0


 Until Zk is the source of truth for the cluster, zk and local node states can 
 get out of whack in certain situations - as a result, sometimes you cannot 
 clean out all of the remnants of a collection to recreate it. For example, if 
 the collection is listed in zk under /collections, but is not in 
 clusterstate.json, you cannot remove or create the collection again due to an 
 early exception in the collection removal chain.
 I think we should probably still return the error - but also delete as much 
 as we can.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3076) Solr(Cloud) should support block joins

2013-08-11 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736386#comment-13736386
 ] 

Mikhail Khludnev commented on SOLR-3076:


bq. like when someone deletes a parent doc but not it's children 

I've thought it so. However, there is an argument provided by one of my 
colleagues and the brightest engineers ever (Nina G): such courtesy works until 
merge happens, and after merge/expunge deletes it's a pain. So, beside it's 
inconsistent, I even thought it wont be passed by random tests.   

bq. See LUCENE-5092, it looks like something like that has been rejected.
That approach has performance implications, but I propose nothing more than API
massaging, without any real implementation changes/extensions: let BJQ work with
something which is either a CachingWrapperFilter or BitDocSet.getTopFilter().
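
For illustration, a rough sketch of what that API massaging could look like on the
Solr side. It assumes the Lucene 4.x ToParentBlockJoinQuery(Query, Filter, ScoreMode)
constructor and Solr's DocSet.getTopFilter(); note that with the code as it stands
this would still hit the FixedBitSet expectation discussed in LUCENE-5092 - relaxing
that expectation is exactly what is being proposed here:

import java.io.IOException;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.join.ScoreMode;
import org.apache.lucene.search.join.ToParentBlockJoinQuery;
import org.apache.solr.search.DocSet;
import org.apache.solr.search.SolrIndexSearcher;

class BlockJoinFromFilterCache {
  // Build a child->parent block join whose parents filter comes from Solr's own
  // filter cache (a top-level DocSet view) instead of a FixedBitSet-producing
  // CachingWrapperFilter.
  static Query childToParent(SolrIndexSearcher searcher, Query parentsQuery,
                             Query childQuery) throws IOException {
    DocSet parents = searcher.getDocSet(parentsQuery); // cached in Solr's filterCache
    Filter parentsFilter = parents.getTopFilter();     // top-level Filter view of it
    return new ToParentBlockJoinQuery(childQuery, parentsFilter, ScoreMode.None);
  }
}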

bq.  If we're happy with those, I think we should commit at this point - this 
issue has been open for far too long) not found.

I got your point. It makes sense. We just need to raise a follow-up issue - unify
the BJQs across Lucene and Solr - and ideally address it before the next release.
Otherwise it's just a way to upset a user: if someone is happy with BJQ in
Lucene, it should be clear that with this parser he gets different BJQs. As an
alternative intermediate measure, don't you think it's more honest to store a
CachingWrapperFilter in Solr's filterCache (via an ugly hack, for sure), and then
follow up and address it soon?

Thanks

 Solr(Cloud) should support block joins
 --

 Key: SOLR-3076
 URL: https://issues.apache.org/jira/browse/SOLR-3076
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Yonik Seeley
 Fix For: 4.5, 5.0

 Attachments: 27M-singlesegment-histogram.png, 27M-singlesegment.png, 
 bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch, 
 child-bjqparser.patch, dih-3076.patch, dih-config.xml, 
 parent-bjq-qparser.patch, parent-bjq-qparser.patch, Screen Shot 2012-07-17 at 
 1.12.11 AM.png, SOLR-3076-childDocs.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-7036-childDocs-solr-fork-trunk-patched, 
 solrconf-bjq-erschema-snippet.xml, solrconfig.xml.patch, 
 tochild-bjq-filtered-search-fix.patch


 Lucene has the ability to do block joins, we should add it to Solr.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5092) join: don't expect all filters to be FixedBitSet instances

2013-08-11 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736388#comment-13736388
 ] 

Mikhail Khludnev commented on LUCENE-5092:
--

bq.  In my opinion the order of child / parent documents should be reversed, so 
the search for the parent (or child dont know) could go forward only.

[~thetaphi] in this case, after I advance()/nextDoc() the child scorer, how can the
parent scorer reach the parent doc, which is before the matched child?
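
For context, a rough sketch (against the Lucene 4.x FixedBitSet API; the class and
method names are illustrative) of the backward seek the issue description below
refers to - which is why a forward-only iterator is not enough today:

import org.apache.lucene.util.FixedBitSet;

class ParentBlockBounds {
  // Given a parent doc id and the bit set of all parent docs (parents are indexed
  // after their children), the block of children starts right after the previous
  // parent - found by seeking backwards with prevSetBit.
  static int firstChildOf(FixedBitSet parents, int parentDoc) {
    int prevParent = parentDoc == 0 ? -1 : parents.prevSetBit(parentDoc - 1);
    return prevParent + 1;
  }
}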

 join: don't expect all filters to be FixedBitSet instances
 --

 Key: LUCENE-5092
 URL: https://issues.apache.org/jira/browse/LUCENE-5092
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/join
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Attachments: LUCENE-5092.patch


 The join module throws exceptions when the parents filter isn't a 
 FixedBitSet. The reason is that the join module relies on prevSetBit to find 
 the first child document given a parent ID.
 As suggested by Uwe and Paul Elschot on LUCENE-5081, we could fix it by 
 exposing methods in the iterators to iterate backwards. When the join module
 gets an iterator which isn't able to iterate backwards, it would just need to
 dump its content into another DocIdSet that supports backward iteration, 
 FixedBitSet for example.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-3076) Solr(Cloud) should support block joins

2013-08-11 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736386#comment-13736386
 ] 

Mikhail Khludnev edited comment on SOLR-3076 at 8/11/13 7:22 PM:
-

bq. like when someone deletes a parent doc but not it's children 

I thought so too. However, there is an argument provided by one of my
colleagues and the brightest engineer ever (Nina G) - such a courtesy works until a
merge happens, and after a merge/expungeDeletes it's a pain. So, besides being
inconsistent, I even thought it wouldn't pass the random tests.

bq. See LUCENE-5092, it looks like something like that has been rejected.
That approach has performance implications, but I propose nothing more than API
massaging, without any real implementation changes/extensions: let BJQ work with
something which is either a CachingWrapperFilter or BitDocSet.getTopFilter().

bq.  If we're happy with those, I think we should commit at this point -  

I got your point. It makes sense. We just need to raise a follow-up issue - unify
the BJQs across Lucene and Solr - and ideally address it before the next release.
Otherwise it's just a way to upset a user: if someone is happy with BJQ in
Lucene, it should be clear that with this parser he gets different BJQs. As an
alternative intermediate measure, don't you think it's more honest to store a
CachingWrapperFilter in Solr's filterCache (via an ugly hack, for sure), and then
follow up and address it soon?

bq. this issue has been open for far too long)
but who really cares?  

Thanks

 Solr(Cloud) should support block joins
 --

 Key: SOLR-3076
 URL: https://issues.apache.org/jira/browse/SOLR-3076
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Yonik Seeley
 Fix For: 4.5, 5.0

 Attachments: 27M-singlesegment-histogram.png, 27M-singlesegment.png, 
 bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch, 
 child-bjqparser.patch, dih-3076.patch, dih-config.xml, 
 parent-bjq-qparser.patch, parent-bjq-qparser.patch, Screen Shot 2012-07-17 at 
 1.12.11 AM.png, SOLR-3076-childDocs.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-7036-childDocs-solr-fork-trunk-patched, 
 solrconf-bjq-erschema-snippet.xml, solrconfig.xml.patch, 
 tochild-bjq-filtered-search-fix.patch


 Lucene has the ability to do block joins, we should add it to Solr.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3076) Solr(Cloud) should support block joins

2013-08-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736401#comment-13736401
 ] 

Robert Muir commented on SOLR-3076:
---

why take working per-segment code and make it slower/top-level?

 Solr(Cloud) should support block joins
 --

 Key: SOLR-3076
 URL: https://issues.apache.org/jira/browse/SOLR-3076
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Yonik Seeley
 Fix For: 4.5, 5.0

 Attachments: 27M-singlesegment-histogram.png, 27M-singlesegment.png, 
 bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch, 
 child-bjqparser.patch, dih-3076.patch, dih-config.xml, 
 parent-bjq-qparser.patch, parent-bjq-qparser.patch, Screen Shot 2012-07-17 at 
 1.12.11 AM.png, SOLR-3076-childDocs.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-7036-childDocs-solr-fork-trunk-patched, 
 solrconf-bjq-erschema-snippet.xml, solrconfig.xml.patch, 
 tochild-bjq-filtered-search-fix.patch


 Lucene has the ability to do block joins, we should add it to Solr.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3076) Solr(Cloud) should support block joins

2013-08-11 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736406#comment-13736406
 ] 

Yonik Seeley commented on SOLR-3076:


Per-segment caches aren't the focus of this issue (although we should add a
generic per-segment cache that can be sized/managed in a different issue).

 Solr(Cloud) should support block joins
 --

 Key: SOLR-3076
 URL: https://issues.apache.org/jira/browse/SOLR-3076
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Yonik Seeley
 Fix For: 4.5, 5.0

 Attachments: 27M-singlesegment-histogram.png, 27M-singlesegment.png, 
 bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch, 
 child-bjqparser.patch, dih-3076.patch, dih-config.xml, 
 parent-bjq-qparser.patch, parent-bjq-qparser.patch, Screen Shot 2012-07-17 at 
 1.12.11 AM.png, SOLR-3076-childDocs.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-7036-childDocs-solr-fork-trunk-patched, 
 solrconf-bjq-erschema-snippet.xml, solrconfig.xml.patch, 
 tochild-bjq-filtered-search-fix.patch


 Lucene has the ability to do block joins, we should add it to Solr.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3076) Solr(Cloud) should support block joins

2013-08-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736411#comment-13736411
 ] 

Robert Muir commented on SOLR-3076:
---

The previous patches were per-segment. There is no reason for it to be 
top-level!

 Solr(Cloud) should support block joins
 --

 Key: SOLR-3076
 URL: https://issues.apache.org/jira/browse/SOLR-3076
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Yonik Seeley
 Fix For: 4.5, 5.0

 Attachments: 27M-singlesegment-histogram.png, 27M-singlesegment.png, 
 bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch, 
 child-bjqparser.patch, dih-3076.patch, dih-config.xml, 
 parent-bjq-qparser.patch, parent-bjq-qparser.patch, Screen Shot 2012-07-17 at 
 1.12.11 AM.png, SOLR-3076-childDocs.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-7036-childDocs-solr-fork-trunk-patched, 
 solrconf-bjq-erschema-snippet.xml, solrconfig.xml.patch, 
 tochild-bjq-filtered-search-fix.patch


 Lucene has the ability to do block joins, we should add it to Solr.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5135) Deleting a collection should be extra aggressive in the face of failures.

2013-08-11 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5135:
--

Attachment: SOLR-5135.patch

The attached patch adds an attempt to remove the /collections zk node in a finally
block after trying to remove all of the cores.
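
For illustration, a rough sketch of the shape of that change (the class, the
unloadAllCores helper and the use of SolrZkClient's recursive clean() are
illustrative assumptions - see the attached patch for the real code):

import org.apache.solr.common.cloud.SolrZkClient;

class AggressiveCollectionDelete {
  private final SolrZkClient zkClient;

  AggressiveCollectionDelete(SolrZkClient zkClient) {
    this.zkClient = zkClient;
  }

  // Delete as much as possible: even if unloading cores fails part-way, the
  // finally block still removes the /collections/<name> remnants so the
  // collection can be recreated later. Any earlier exception still propagates.
  void delete(String collection) throws Exception {
    try {
      unloadAllCores(collection);
    } finally {
      zkClient.clean("/collections/" + collection);
    }
  }

  private void unloadAllCores(String collection) throws Exception {
    // placeholder for the per-replica core unload requests
  }
}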

 Deleting a collection should be extra aggressive in the face of failures.
 -

 Key: SOLR-5135
 URL: https://issues.apache.org/jira/browse/SOLR-5135
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.5, 5.0

 Attachments: SOLR-5135.patch


 Until Zk is the source of truth for the cluster, zk and local node states can 
 get out of whack in certain situations - as a result, sometimes you cannot 
 clean out all of the remnants of a collection to recreate it. For example, if 
 the collection is listed in zk under /collections, but is not in 
 clusterstate.json, you cannot remove or create the collection again due to an
 early exception in the collection removal chain.
 I think we should probably still return the error - but also delete as much 
 as we can.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3076) Solr(Cloud) should support block joins

2013-08-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736417#comment-13736417
 ] 

Michael McCandless commented on SOLR-3076:
--

I don't understand all the design constraints here, but I really don't like
the internal fork (full copy) of the ToParent/ChildBlockJoinQuery sources.

Why is this necessary?  Is it to cut over to the top-level filter cache?

We should not fork our sources if we can help it ...

 Solr(Cloud) should support block joins
 --

 Key: SOLR-3076
 URL: https://issues.apache.org/jira/browse/SOLR-3076
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Yonik Seeley
 Fix For: 4.5, 5.0

 Attachments: 27M-singlesegment-histogram.png, 27M-singlesegment.png, 
 bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch, 
 child-bjqparser.patch, dih-3076.patch, dih-config.xml, 
 parent-bjq-qparser.patch, parent-bjq-qparser.patch, Screen Shot 2012-07-17 at 
 1.12.11 AM.png, SOLR-3076-childDocs.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-7036-childDocs-solr-fork-trunk-patched, 
 solrconf-bjq-erschema-snippet.xml, solrconfig.xml.patch, 
 tochild-bjq-filtered-search-fix.patch


 Lucene has the ability to do block joins, we should add it to Solr.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_25) - Build # 6874 - Failure!

2013-08-11 Thread Robert Muir
I tried even harder (with exact JVM, tests.jvms, master seed, svn rev,
and same JVM args):

At revision 1512807
rmuir@beast:~/workspace/branch_4x/lucene/highlighter$ export
JAVA_HOME=/home/rmuir/Downloads/32bit7/jdk1.7.0_25
rmuir@beast:~/workspace/branch_4x/lucene/highlighter$ ant test
-Dtests.jvms=2 -Dtests.seed=EBBFA6F4E80A7365 -Dargs=-server
-XX:+UseG1GC

BUILD SUCCESSFUL
Total time: 18 seconds

On Sun, Aug 11, 2013 at 2:17 PM, Robert Muir rcm...@gmail.com wrote:
 I cannot reproduce this.

 On Sat, Aug 10, 2013 at 1:44 PM, Policeman Jenkins Server
 jenk...@thetaphi.de wrote:
 Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6874/
 Java: 32bit/jdk1.7.0_25 -server -XX:+UseG1GC

 1 tests failed.
 REGRESSION:  
 org.apache.lucene.search.postingshighlight.TestPostingsHighlighter.testUserFailedToIndexOffsets

 Error Message:


 Stack Trace:
 java.lang.AssertionError
 at 
 __randomizedtesting.SeedInfo.seed([EBBFA6F4E80A7365:1FBF811885F2D611]:0)
 at 
 org.apache.lucene.index.ByteSliceReader.readByte(ByteSliceReader.java:73)
 at org.apache.lucene.store.DataInput.readVInt(DataInput.java:108)
 at 
 org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:453)
 at 
 org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
 at org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
 at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
 at 
 org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
 at 
 org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:501)
 at 
 org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:478)
 at 
 org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:615)
 at 
 org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2760)
 at 
 org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2909)
 at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2884)
 at 
 org.apache.lucene.index.RandomIndexWriter.getReader(RandomIndexWriter.java:312)
 at 
 org.apache.lucene.index.RandomIndexWriter.getReader(RandomIndexWriter.java:249)
 at 
 org.apache.lucene.search.postingshighlight.TestPostingsHighlighter.testUserFailedToIndexOffsets(TestPostingsHighlighter.java:295)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
 at 
 org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
 at 
 org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
 at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
 at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 at 
 org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
 at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
 at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
 at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
 at 
 

[jira] [Commented] (SOLR-3076) Solr(Cloud) should support block joins

2013-08-11 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736504#comment-13736504
 ] 

Yonik Seeley commented on SOLR-3076:


The important parts of this issue are:
- Serialization formats (XML, javabin, etc)
- join semantics
- join syntax... i.e. {!child ...} {!parent ...} (see the sketch after this list)
- common public Solr Java APIs: SolrInputDocument, UpdateHandler/UpdateProcessor
- correctness
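
For readers following the thread, a rough sketch of how those public pieces fit
together from SolrJ, as proposed on this issue (field names such as type_s and
comment_t are illustrative, and the exact parser parameter names could still change
before commit):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.common.SolrInputDocument;

class BlockJoinSketch {
  // Index a parent with one nested child document (the nested <doc>/javabin block).
  static SolrInputDocument exampleBlock() {
    SolrInputDocument parent = new SolrInputDocument();
    parent.addField("id", "book1");
    parent.addField("type_s", "parent");            // illustrative parent marker field
    SolrInputDocument child = new SolrInputDocument();
    child.addField("id", "book1_review1");
    child.addField("comment_t", "great read");
    parent.addChildDocument(child);                  // proposed SolrInputDocument API
    return parent;
  }

  // Match children, return their parents.
  static SolrQuery toParent() {
    return new SolrQuery("{!parent which=\"type_s:parent\"}comment_t:great");
  }

  // Match parents, return their children.
  static SolrQuery toChild() {
    return new SolrQuery("{!child of=\"type_s:parent\"}id:book1");
  }
}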

Other things are implementation details that can be improved over time.
We should be aware of things we *don't* want to support long term... this
is why I removed the external/custom cache dependency (in addition to the
usability implications).

As far as per-segment goes, some of the previous patches had issues: caching a
SolrCache in QParser instances, double-caching (the filter used by the join would
be cached separately from the same filter used in all other contexts), and custom
caches defined in solrconfig.xml - not to mention my general dislike for weak
references.
Since per-segment filter caching is an orthogonal issue (and it would be
best to be able to specify this on a per-filter basis), I decided it was
best to leave per-segment filters for a different issue and create queries
that would work well with the way Solr currently does its filter caching
and request construction.

Additionally, how to deal with the going backwards problem / expecting
all filters to be FixedBitSet (which Solr doesn't use) is still up in the
air: LUCENE-5092.  There's no reason to wait for that to get hashed out
before giving Solr users block child/parent join functionality.  Those details
of the Java APIs just don't matter to Solr users.

These query classes in question are package-private classes that Solr
users do not see - truly an implementation detail.  Changing them in
the future (as long as the behavior is equivalent) would not even
warrant mention in release notes (unless performance had been improved).

Can there still be implementation improvements? Absolutely!  But I'm
personally currently out of time on this issue, and I feel comfortable
with supporting the public APIs we've come up with for some time to come.
Since no one seems to have issues with any of the important parts like
the public APIs, I plan on committing this shortly.  Additional
improvements/optimizations can come from follow-on patches.



 Solr(Cloud) should support block joins
 --

 Key: SOLR-3076
 URL: https://issues.apache.org/jira/browse/SOLR-3076
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Yonik Seeley
 Fix For: 4.5, 5.0

 Attachments: 27M-singlesegment-histogram.png, 27M-singlesegment.png, 
 bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch, 
 child-bjqparser.patch, dih-3076.patch, dih-config.xml, 
 parent-bjq-qparser.patch, parent-bjq-qparser.patch, Screen Shot 2012-07-17 at 
 1.12.11 AM.png, SOLR-3076-childDocs.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-7036-childDocs-solr-fork-trunk-patched, 
 solrconf-bjq-erschema-snippet.xml, solrconfig.xml.patch, 
 tochild-bjq-filtered-search-fix.patch


 Lucene has the ability to do block joins, we should add it to Solr.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3076) Solr(Cloud) should support block joins

2013-08-11 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13736508#comment-13736508
 ] 

Yonik Seeley commented on SOLR-3076:


bq. However, there is an argument provided by one of my colleagues and the 
brightest engineer ever (Nina G) - such courtesy works until merge happens, and 
after merge/expunge deletes it's a pain.

Ah, right you (and Nina G) are!  The inconsistency here (working until a merge) 
is worse than any performance difference.  I'll change it.

 Solr(Cloud) should support block joins
 --

 Key: SOLR-3076
 URL: https://issues.apache.org/jira/browse/SOLR-3076
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Yonik Seeley
 Fix For: 4.5, 5.0

 Attachments: 27M-singlesegment-histogram.png, 27M-singlesegment.png, 
 bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch, 
 child-bjqparser.patch, dih-3076.patch, dih-config.xml, 
 parent-bjq-qparser.patch, parent-bjq-qparser.patch, Screen Shot 2012-07-17 at 
 1.12.11 AM.png, SOLR-3076-childDocs.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-7036-childDocs-solr-fork-trunk-patched, 
 solrconf-bjq-erschema-snippet.xml, solrconfig.xml.patch, 
 tochild-bjq-filtered-search-fix.patch


 Lucene has the ability to do block joins, we should add it to Solr.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org