[jira] [Commented] (LUCENE-4569) Allow customization of column stride field and norms via indexing chain
[ https://issues.apache.org/jira/browse/LUCENE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505294#comment-13505294 ]

Chris Male commented on LUCENE-4569:

John, I don't really know much about the API you're wanting to change, but to help me understand, can you explain more about what you're trying to do in your custom indexing format / code? I think one of the major motivations for Codecs is to allow this sort of customization through their API (there are already Codecs for holding this in memory).

Allow customization of column stride field and norms via indexing chain
-----------------------------------------------------------------------
Key: LUCENE-4569
URL: https://issues.apache.org/jira/browse/LUCENE-4569
Project: Lucene - Core
Issue Type: Improvement
Components: core/index
Affects Versions: 4.0
Reporter: John Wang
Attachments: patch.diff

We are building an in-memory indexing format and managing our own segments. We are doing this by implementing a custom IndexingChain. We would like to support column-stride fields and norms without having to wire in a codec (since we are managing our postings differently). The suggested change is consistent with the API support for passing in a custom InvertedDocConsumer.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505295#comment-13505295 ]

Per Steffensen commented on SOLR-4114:

bq. As far as terminology, when I say replicationFactor of 3, I mean 3 copies of the data. I also count the leader as a replica of a shard (which is logical). It follows from the clusterstate.json, which lists all replicas for a shard and one of them just has a flag indicating it's the leader. This also makes it easier to talk about a shard having 0 replicas (meaning there is not even a leader).

Ok, it's just that the replicationFactor you specify in your request is the other thing. You get replicationFactor + 1 shards per slice, if we define replicationFactor as the one you give in your request.

Collection API: Allow multiple shards from one collection on the same Solr server
---------------------------------------------------------------------------------
Key: SOLR-4114
URL: https://issues.apache.org/jira/browse/SOLR-4114
Project: Solr
Issue Type: New Feature
Components: multicore, SolrCloud
Affects Versions: 4.0
Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
Labels: collection-api, multicore, shard, shard-allocation
Attachments: SOLR-4114.patch

We should support running multiple shards from one collection on the same Solr server - e.g. run a collection with 8 shards on a 4-server Solr cluster (each Solr server running 2 shards). Performance tests on our side have shown that this is a good idea, and it is also a good idea for easy elasticity later on - it is much easier to move an entire existing shard from one Solr server to another that just joined the cluster than it is to split an existing shard between the Solr server that used to run it and the new one. See the dev mailing list discussion "Multiple shards for one collection on the same Solr server".
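The 8-shards-on-4-servers layout from the issue description can be sketched as a simple round-robin placement. This is an illustrative sketch only, in plain Java - it is not Solr's actual shard-assignment code, and all class and server names are made up:

```java
import java.util.*;

// Round-robin placement of a collection's shards over Solr servers: a
// collection with more shards than servers ends up with several shards of
// the same collection on one server, which is the behavior this issue asks
// the Collection API to allow.
public class RoundRobinPlacement {
    static Map<String, List<String>> place(List<String> shards, List<String> servers) {
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (String server : servers) {
            assignment.put(server, new ArrayList<>());
        }
        for (int i = 0; i < shards.size(); i++) {
            // The i-th shard goes to the (i mod serverCount)-th server.
            assignment.get(servers.get(i % servers.size())).add(shards.get(i));
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> shards = new ArrayList<>();
        for (int i = 1; i <= 8; i++) {
            shards.add("shard" + i);
        }
        List<String> servers = Arrays.asList("solr1", "solr2", "solr3", "solr4");
        // 8 shards over 4 servers -> each server ends up with 2 shards.
        System.out.println(place(shards, servers));
        // {solr1=[shard1, shard5], solr2=[shard2, shard6], solr3=[shard3, shard7], solr4=[shard4, shard8]}
    }
}
```

Moving a whole shard then amounts to reassigning one list entry to another server, whereas splitting a shard would require repartitioning its data - which is the elasticity argument made above.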
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505296#comment-13505296 ]

Per Steffensen commented on SOLR-4114:

bq. Solr 3.X to Solr 4.X back compat is not considered the same as Solr 4.0 to Solr 4.1 back compat.

Of course, I agree! But anyway...
[jira] [Commented] (SOLR-2141) NullPointerException when using escapeSql function
[ https://issues.apache.org/jira/browse/SOLR-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505337#comment-13505337 ]

Dominik Siebel commented on SOLR-2141:

Hi James, sorry, I already forgot about that. Thanks for the good work!

NullPointerException when using escapeSql function
--------------------------------------------------
Key: SOLR-2141
URL: https://issues.apache.org/jira/browse/SOLR-2141
Project: Solr
Issue Type: Bug
Components: contrib - DataImportHandler
Affects Versions: 1.4.1, 4.0
Environment: openjdk 1.6.0 b12
Reporter: Edward Rudd
Assignee: James Dyer
Fix For: 4.1, 5.0
Attachments: dih-config.xml, dih-file.xml, SOLR-2141.b341f5b.patch, SOLR-2141.patch, SOLR-2141.patch, SOLR-2141.patch, SOLR-2141.patch, SOLR-2141.patch, SOLR-2141.patch, SOLR-2141-sample.patch, SOLR-2141-test.patch

I have two entities defined, nested in each other:

<entity name="article" query="select category, subcategory from articles">
  <entity name="other" query="select other from othertable where category='${dataimporter.functions.escapeSql(article.category)}' AND subcategory='${dataimporter.functions.escapeSql(article.subcategory)}'"/>
</entity>

Now, when I run that it bombs on any article where subcategory = '' (it's a NOT NULL column, so the empty string is there). If I do where subcategory != '' in the article query it works fine (aside from not pulling in all of the articles).
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NullPointerException
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:424)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
Caused by: java.lang.NullPointerException
    at org.apache.solr.handler.dataimport.EvaluatorBag$1.evaluate(EvaluatorBag.java:75)
    at org.apache.solr.handler.dataimport.EvaluatorBag$5.get(EvaluatorBag.java:216)
    at org.apache.solr.handler.dataimport.EvaluatorBag$5.get(EvaluatorBag.java:204)
    at org.apache.solr.handler.dataimport.VariableResolverImpl.resolve(VariableResolverImpl.java:107)
    at org.apache.solr.handler.dataimport.TemplateString.fillTokens(TemplateString.java:81)
    at org.apache.solr.handler.dataimport.TemplateString.replaceTokens(TemplateString.java:75)
    at org.apache.solr.handler.dataimport.VariableResolverImpl.replaceTokens(VariableResolverImpl.java:87)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
    ... 6 more
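The NPE above surfaces in the escapeSql evaluator when the value it resolves is null. A hypothetical null-tolerant version of the escaping logic (the class and method below are illustrative only, not the actual DataImportHandler implementation) would guard the null case before touching the string:

```java
// Sketch of a null-safe SQL-escape helper: return an empty string for null
// input instead of dereferencing it unconditionally and throwing an NPE.
public class SqlEscape {
    static String escapeSql(String s) {
        if (s == null) {
            return "";                  // tolerate null instead of throwing
        }
        return s.replace("'", "''")     // double single quotes for SQL literals
                .replace("\"", "\"\""); // double double quotes likewise
    }

    public static void main(String[] args) {
        System.out.println(escapeSql("O'Brien"));      // O''Brien
        System.out.println(escapeSql(null).isEmpty()); // true
    }
}
```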
[jira] [Created] (SOLR-4115) WordBreakSpellChecker throws ArrayIndexOutOfBoundsException for random query string
Andreas Hubold created SOLR-4115:
---------------------------------
Summary: WordBreakSpellChecker throws ArrayIndexOutOfBoundsException for random query string
Key: SOLR-4115
URL: https://issues.apache.org/jira/browse/SOLR-4115
Project: Solr
Issue Type: Bug
Components: spellchecker
Affects Versions: 4.0
Environment: java version 1.6.0_37, Java(TM) SE Runtime Environment (build 1.6.0_37-b06), Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
Reporter: Andreas Hubold

The following SolrJ test code causes an ArrayIndexOutOfBoundsException in the WordBreakSpellChecker. I tested this with the Solr 4.0.0 example webapp started with {{java -jar start.jar}}.

{code:java}
@Test
public void testWordbreakSpellchecker() throws Exception {
  SolrQuery q = new SolrQuery("\uD864\uDC79");
  q.setRequestHandler("/browse");
  q.setParam("spellcheck.dictionary", "wordbreak");
  HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
  server.query(q, SolrRequest.METHOD.POST);
}
{code}

{noformat}
INFO: [collection1] webapp=/solr path=/browse params={spellcheck.dictionary=wordbreak&qt=/browse&wt=javabin&q=?&version=2} hits=0 status=500 QTime=11
Nov 28, 2012 11:23:01 AM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:599)
    at org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:165)
    at org.apache.lucene.index.Term.text(Term.java:72)
    at org.apache.lucene.search.spell.WordBreakSpellChecker.generateSuggestWord(WordBreakSpellChecker.java:350)
    at org.apache.lucene.search.spell.WordBreakSpellChecker.generateBreakUpSuggestions(WordBreakSpellChecker.java:283)
    at org.apache.lucene.search.spell.WordBreakSpellChecker.suggestWordBreaks(WordBreakSpellChecker.java:122)
    at org.apache.solr.spelling.WordBreakSolrSpellChecker.getSuggestions(WordBreakSolrSpellChecker.java:229)
    at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:172)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
    at org.eclipse.jetty.server.Server.handle(Server.java:351)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
    at java.lang.Thread.run(Thread.java:662)
{noformat}

The query string
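The query string in the test is a single supplementary code point, U+29079, written in Java source as the surrogate pair \uD864\uDC79. The mismatch that makes such characters hazardous for byte-to-char buffer sizing can be shown standalone with the plain JDK (no Lucene code involved):

```java
import java.nio.charset.StandardCharsets;

// A supplementary character is one code point but two UTF-16 chars and four
// UTF-8 bytes. A UTF-8 decoder that sizes its char output too small for such
// a sequence, yet writes two chars, can overrun its array -- consistent with
// the ArrayIndexOutOfBoundsException in UnicodeUtil.UTF8toUTF16 above.
public class SurrogatePairDemo {
    public static void main(String[] args) {
        String s = "\uD864\uDC79";                           // U+29079
        System.out.println(s.length());                      // 2 UTF-16 chars
        System.out.println(s.codePointCount(0, s.length())); // 1 code point
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 4 bytes
        System.out.println(Character.isHighSurrogate(s.charAt(0)));    // true
    }
}
```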
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505368#comment-13505368 ]

Per Steffensen commented on SOLR-4114:

bq. As far as terminology, when I say replicationFactor of 3, I mean 3 copies of the data. I also count the leader as a replica of a shard (which is logical). It follows from the clusterstate.json, which lists all replicas for a shard and one of them just has a flag indicating it's the leader. This also makes it easier to talk about a shard having 0 replicas (meaning there is not even a leader).

I understand that you can view all shards under a slice as replicas, but in my mind replica is also a role that a shard plays at runtime - all shards but one under a slice play the replica role at runtime, and the remaining shard plays the leader role. To avoid creating too much confusion, I suggest you use the term "shard" for all the instances under a slice, and use the term "replica" only for a role that a shard plays at runtime. That would of course require changes, e.g. to the Slice class, where getReplicas, getReplicasCopy and getReplicasMap would need to be renamed to getShardsXXX. It probably shouldn't be done now, but as part of a cross-code cleanup in term usage. Suggested terms:
* collection: A big logical bucket to fill data into
* slice: A logical part of a collection. A part of the data going into a collection goes into a particular slice. Slices for a particular collection are non-overlapping
* shard: A physical instance of a slice. Running without replication there is one shard per slice. Running with replication-factor X there are X+1 shards per slice.
* replica and leader: Roles played by shards at runtime. As soon as the system is not running there are no replicas/leaders - there are just shards
* node-base-url: The prefix/base (up to and including the webapp-context) of the URL for a specific Solr server
* node-name: A logical name for the Solr server - the same as node-base-url except /'s are replaced by _'s and the protocol part (http(s)://) is removed
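The proposed terminology, reduced to its arithmetic (an illustrative sketch, not Solr code): under the request-parameter meaning of replication-factor X, each slice is realized by X + 1 shards, one of which acts as leader at runtime; under the clusterstate.json meaning, replicationFactor itself is the total number of copies.

```java
// Shards per slice under the two replicationFactor conventions discussed in
// this thread. Hypothetical helper, named for illustration only.
public class SliceMath {
    // Request-parameter convention: X extra copies besides the leader.
    static int shardsPerSlice(int replicationFactorParam) {
        return replicationFactorParam + 1; // one leader + X replicas
    }

    public static void main(String[] args) {
        System.out.println(shardsPerSlice(0)); // 1: just a leader, no replicas
        System.out.println(shardsPerSlice(2)); // 3 physical shards per slice
    }
}
```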
[jira] [Commented] (SOLR-3377) eDismax: A fielded query wrapped by parens is not recognized
[ https://issues.apache.org/jira/browse/SOLR-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505370#comment-13505370 ]

Leonhard Maylein commented on SOLR-3377:

I do not agree that this issue is solved. I've tried the following combination with SOLR 4.0.0:

q: +sw:(a b) +ti:(c d)
qf: freitext exttext^0.5
pf: freitext^6 exttext^3

The result is:

<str name="rawquerystring">+sw:(a b) +ti:(c d)</str>
<str name="querystring">+sw:(a b) +ti:(c d)</str>
<str name="parsedquery">(+(+(sw:a sw:b) +(ti:c ti:d)) DisjunctionMaxQuery((freitext:b d^6.0)) DisjunctionMaxQuery((exttext:b d^3.0)))/no_coord</str>

There should be no splitting on the qf/pf fields and therefore no DisjunctionMaxQueries. The query '+(sw:a sw:b) +(ti:c ti:d)' works as expected.

eDismax: A fielded query wrapped by parens is not recognized
------------------------------------------------------------
Key: SOLR-3377
URL: https://issues.apache.org/jira/browse/SOLR-3377
Project: Solr
Issue Type: Bug
Components: query parsers
Affects Versions: 3.6
Reporter: Jan Høydahl
Assignee: Yonik Seeley
Priority: Critical
Fix For: 4.0-BETA
Attachments: SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch

As reported by bernd on the user list, a query like this {{q=(name:test)}} will yield 0 hits in 3.6 while it worked in 3.5. It works without the parens.
[jira] [Comment Edited] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505368#comment-13505368 ]

Per Steffensen edited comment on SOLR-4114 at 11/28/12 11:20 AM:

bq. As far as terminology, when I say replicationFactor of 3, I mean 3 copies of the data. I also count the leader as a replica of a shard (which is logical). It follows from the clusterstate.json, which lists all replicas for a shard and one of them just has a flag indicating it's the leader. This also makes it easier to talk about a shard having 0 replicas (meaning there is not even a leader).

I understand that you can view all shards under a slice as replicas, but in my mind replica is also a role that a shard plays at runtime - all shards but one under a slice play the replica role at runtime, and the remaining shard plays the leader role at runtime. To avoid creating too much confusion, I suggest you use the term "shard" for all the instances under a slice, and use the terms "replica" and "leader" only for roles that a shard plays at runtime. That would of course require changes, e.g. to the Slice class, where getReplicas, getReplicasCopy and getReplicasMap would need to be renamed to getShardsXXX. It probably shouldn't be done now, but as part of a cross-code cleanup in term usage. Today there is a heavy mixup of term usage in the code - replica and shard are sometimes used for a node, replica and shard are used for the same thing, etc. Suggested terms:
* collection: A big logical bucket to fill data into
* slice: A logical part of a collection. A part of the data going into a collection goes into a particular slice. Slices for a particular collection are non-overlapping
* shard: A physical instance of a slice. Running without replication there is one shard per slice. Running with replication-factor X there are X+1 shards per slice.
* replica and leader: Roles played by shards at runtime. As soon as the system is not running there are no replicas/leaders - there are just shards
* node-base-url: The prefix/base (up to and including the webapp-context) of the URL for a specific Solr server
* node-name: A logical name for the Solr server - the same as node-base-url except /'s are replaced by _'s and the protocol part (http(s)://) is removed
[jira] [Commented] (SOLR-2368) Improve extended dismax (edismax) parser
[ https://issues.apache.org/jira/browse/SOLR-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505374#comment-13505374 ]

Leonhard Maylein commented on SOLR-2368:

Please consider also incorporating SOLR-3377, which is marked as fixed but is not completely solved (see my comment on SOLR-3377).

Improve extended dismax (edismax) parser
----------------------------------------
Key: SOLR-2368
URL: https://issues.apache.org/jira/browse/SOLR-2368
Project: Solr
Issue Type: Improvement
Components: query parsers
Reporter: Yonik Seeley
Labels: QueryParser

This is a mother issue to track further improvements for the eDismax parser. The goal is to be able to deprecate and remove the old dismax once edismax satisfies all use cases of dismax.
[jira] [Commented] (SOLR-4032) Files larger than an internal buffer size fail to replicate
[ https://issues.apache.org/jira/browse/SOLR-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505378#comment-13505378 ]

Markus Jelsma commented on SOLR-4032:

Great work, it seems this issue is indeed resolved, as I cannot reproduce this exact problem. But another EOF exception pops up; I'll open a new issue.

Files larger than an internal buffer size fail to replicate
-----------------------------------------------------------
Key: SOLR-4032
URL: https://issues.apache.org/jira/browse/SOLR-4032
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 5.0
Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 12:37:38, Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
Reporter: Markus Jelsma
Assignee: Mark Miller
Priority: Blocker
Fix For: 5.0
Attachments: SOLR-4032.patch

Please see: http://lucene.472066.n3.nabble.com/trunk-is-unable-to-replicate-between-nodes-Unable-to-download-completely-td4017049.html and http://lucene.472066.n3.nabble.com/Possible-memory-leak-in-recovery-td4017833.html
[jira] [Created] (SOLR-4116) Log Replay [recoveryExecutor-8-thread-1] - : java.io.EOFException
Markus Jelsma created SOLR-4116:
---------------------------------
Summary: Log Replay [recoveryExecutor-8-thread-1] - : java.io.EOFException
Key: SOLR-4116
URL: https://issues.apache.org/jira/browse/SOLR-4116
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 5.0
Environment: 5.0.0.2012.11.28.10.42.06, Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
Reporter: Markus Jelsma
Fix For: 5.0

With SOLR-4032 fixed we see other issues when randomly taking down nodes (nicely via tomcat restart) while indexing a few million web pages from Hadoop. We do make sure that at least one node is up for a shard, but due to recovery issues it may not be live.

{code}
2012-11-28 11:32:33,086 WARN [solr.update.UpdateLog] - [recoveryExecutor-8-thread-1] - : Starting log replay tlog{file=/opt/solr/cores/openindex_e/data/tlog/tlog.028 refcount=2} active=false starting pos=0
2012-11-28 11:32:41,873 ERROR [solr.update.UpdateLog] - [recoveryExecutor-8-thread-1] - : java.io.EOFException
    at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:151)
    at org.apache.solr.common.util.JavaBinCodec.readStr(JavaBinCodec.java:479)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:176)
    at org.apache.solr.common.util.JavaBinCodec.readSolrInputDocument(JavaBinCodec.java:374)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:225)
    at org.apache.solr.common.util.JavaBinCodec.readArray(JavaBinCodec.java:451)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:182)
    at org.apache.solr.update.TransactionLog$LogReader.next(TransactionLog.java:618)
    at org.apache.solr.update.UpdateLog$LogReplayer.doReplay(UpdateLog.java:1198)
    at org.apache.solr.update.UpdateLog$LogReplayer.run(UpdateLog.java:1143)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
{code}
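The failure mode in the trace above can be reproduced generically with plain java.io (this is not Solr's FastInputStream, just the same contract): readFully() on a stream that ends mid-record throws EOFException, which is what a transaction log truncated by a hard stop looks like during replay.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Demonstrates readFully() hitting end-of-stream before the requested number
// of bytes is available -- the generic shape of replaying a truncated log.
public class TruncatedReadDemo {
    public static void main(String[] args) throws IOException {
        byte[] truncated = new byte[] {1, 2, 3};   // record cut short
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(truncated));
        byte[] buf = new byte[8];                  // reader expects 8 bytes
        try {
            in.readFully(buf);
        } catch (EOFException e) {
            System.out.println("EOFException: stream ended mid-record");
        }
    }
}
```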
[jira] [Created] (SOLR-4117) IO error while trying to get the size of the Directory
Markus Jelsma created SOLR-4117:
---------------------------------
Summary: IO error while trying to get the size of the Directory
Key: SOLR-4117
URL: https://issues.apache.org/jira/browse/SOLR-4117
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 5.0
Environment: 5.0.0.2012.11.28.10.42.06, Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
Reporter: Markus Jelsma
Fix For: 5.0

With SOLR-4032 fixed we see other issues when randomly taking down nodes (nicely via tomcat restart) while indexing a few million web pages from Hadoop. We do make sure that at least one node is up for a shard, but due to recovery issues it may not be live. One node seems to work but generates IO errors in the log and a ZooKeeperException in the GUI. In the GUI we only see:

{code}
SolrCore Initialization Failures openindex_f: org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException: Please check your logs for more information
{code}

and in the log we only see the following exception:

{code}
2012-11-28 11:47:26,652 ERROR [solr.handler.ReplicationHandler] - [http-8080-exec-28] - : IO error while trying to get the size of the Directory:org.apache.lucene.store.NoSuchDirectoryException: directory '/opt/solr/cores/shard_f/data/index' does not exist
    at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:217)
    at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240)
    at org.apache.lucene.store.NRTCachingDirectory.listAll(NRTCachingDirectory.java:132)
    at org.apache.solr.core.DirectoryFactory.sizeOfDirectory(DirectoryFactory.java:146)
    at org.apache.solr.handler.ReplicationHandler.getIndexSize(ReplicationHandler.java:472)
    at org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:568)
    at org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:213)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:476)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
    at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
{code}
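A guarded directory-size helper can show the shape of the fix for this kind of error (an illustrative sketch in plain java.io, not the actual ReplicationHandler/DirectoryFactory code): return a sentinel when the directory is missing instead of letting the listing fail, which is how a not-yet-created data/index directory produces the NoSuchDirectoryException above.

```java
import java.io.File;

// Recursively sums file sizes under a directory, returning -1 instead of
// failing when the directory is missing (e.g. an index dir not created yet).
public class DirSize {
    static long sizeOf(File dir) {
        if (!dir.isDirectory()) {
            return -1L;                // missing, or not a directory
        }
        File[] files = dir.listFiles();
        if (files == null) {
            return -1L;                // removed concurrently, or I/O error
        }
        long total = 0;
        for (File f : files) {
            total += f.isDirectory() ? sizeOf(f) : f.length();
        }
        return total;
    }

    public static void main(String[] args) {
        // A path that does not exist yields the sentinel, not an exception.
        System.out.println(sizeOf(new File("/definitely/missing/index"))); // -1
    }
}
```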
[jira] [Comment Edited] (SOLR-4032) Files larger than an internal buffer size fail to replicate
[ https://issues.apache.org/jira/browse/SOLR-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505378#comment-13505378 ] Markus Jelsma edited comment on SOLR-4032 at 11/28/12 11:51 AM: Great work, it seems this issue is indeed resolved as I cannot reproduce this exact problem. But another EOF exception pops up; I'll open a new issue. edit: another issue popped up as well, added SOLR-4116 and SOLR-4117 was (Author: markus17): Great work, it seems this issue is indeed resolved as I cannot reproduce this exact problem. But another EOF exception pops up; I'll open a new issue. Files larger than an internal buffer size fail to replicate --- Key: SOLR-4032 URL: https://issues.apache.org/jira/browse/SOLR-4032 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 5.0 Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 12:37:38 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2. Reporter: Markus Jelsma Assignee: Mark Miller Priority: Blocker Fix For: 5.0 Attachments: SOLR-4032.patch Please see: http://lucene.472066.n3.nabble.com/trunk-is-unable-to-replicate-between-nodes-Unable-to-download-completely-td4017049.html and http://lucene.472066.n3.nabble.com/Possible-memory-leak-in-recovery-td4017833.html
[jira] [Updated] (SOLR-4117) IO error while trying to get the size of the Directory
[ https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated SOLR-4117: Priority: Minor (was: Major) This issue is the same as reported in SOLR-4032. It does not resolve itself, as it did before in SOLR-4032, when reloading a core or restarting the servlet container. The ZooKeeper exception in the GUI is gone after restart, so it's likely not related. IO error while trying to get the size of the Directory -- Key: SOLR-4117 URL: https://issues.apache.org/jira/browse/SOLR-4117 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 5.0 Environment: 5.0.0.2012.11.28.10.42.06 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2. Reporter: Markus Jelsma Priority: Minor Fix For: 5.0 With SOLR-4032 fixed we see other issues when randomly taking down nodes (nicely via Tomcat restart) while indexing a few million web pages from Hadoop. We do make sure that at least one node is up for a shard but due to recovery issues it may not be live. One node seems to work but generates IO errors in the log and a ZooKeeperException in the GUI.
[jira] [Comment Edited] (SOLR-4117) IO error while trying to get the size of the Directory
[ https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505391#comment-13505391 ] Markus Jelsma edited comment on SOLR-4117 at 11/28/12 12:17 PM: This issue is the same as reported in SOLR-4032. It does not resolve itself, as it did before in SOLR-4032, when reloading a core or restarting the servlet container. The ZooKeeper exception in the GUI is gone after restart, so it's likely not related. edit: the index.properties file in both cores points to the correct index.LARGE_NUMBER directory but NRTDir tries ./data/index regardless. was (Author: markus17): This issue is the same as reported in SOLR-4032. It does not resolve itself, as it did before in SOLR-4032, when reloading a core or restarting the servlet container. The ZooKeeper exception in the GUI is gone after restart, so it's likely not related. IO error while trying to get the size of the Directory -- Key: SOLR-4117 URL: https://issues.apache.org/jira/browse/SOLR-4117 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 5.0 Environment: 5.0.0.2012.11.28.10.42.06 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2. Reporter: Markus Jelsma Priority: Minor Fix For: 5.0 With SOLR-4032 fixed we see other issues when randomly taking down nodes (nicely via Tomcat restart) while indexing a few million web pages from Hadoop. We do make sure that at least one node is up for a shard but due to recovery issues it may not be live. One node seems to work but generates IO errors in the log and a ZooKeeperException in the GUI.
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505397#comment-13505397 ] Per Steffensen commented on SOLR-4114: -- Patch including the maxShardsPerNode feature coming up. And (much) better testing of the create operation of the Collections API. Collection API: Allow multiple shards from one collection on the same Solr server - Key: SOLR-4114 URL: https://issues.apache.org/jira/browse/SOLR-4114 Project: Solr Issue Type: New Feature Components: multicore, SolrCloud Affects Versions: 4.0 Environment: Solr 4.0.0 release Reporter: Per Steffensen Assignee: Per Steffensen Labels: collection-api, multicore, shard, shard-allocation Attachments: SOLR-4114.patch We should support running multiple shards from one collection on the same Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster (each Solr server running 2 shards). Performance tests at our side have shown that this is a good idea, and it is also a good idea for easy elasticity later on - it is much easier to move an entire existing shard from one Solr server to another one that just joined the cluster than it is to split an existing shard among the Solr server that used to run it and the new one. See dev mailing list discussion Multiple shards for one collection on the same Solr server
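The maxShardsPerNode idea mentioned above can be pictured with a small round-robin sketch. This is purely illustrative, not Solr's actual Overseer/Collections API code; the class and method names (ShardAssigner, assign) are hypothetical:

```java
import java.util.*;

// Hypothetical sketch of round-robin shard placement under a
// maxShardsPerNode constraint, as discussed for the Collections API
// create operation. Illustrative only - not Solr's real assignment code.
public class ShardAssigner {
    /** Returns shard -> node assignments, or throws if the cluster is too small. */
    public static Map<String, String> assign(int numShards, List<String> nodes, int maxShardsPerNode) {
        if (numShards > nodes.size() * maxShardsPerNode) {
            throw new IllegalArgumentException("Cannot place " + numShards + " shards on "
                + nodes.size() + " nodes with maxShardsPerNode=" + maxShardsPerNode);
        }
        Map<String, String> assignment = new LinkedHashMap<>();
        for (int i = 0; i < numShards; i++) {
            // Round-robin bounds each node at ceil(numShards / nodes.size()),
            // which the check above guarantees is <= maxShardsPerNode.
            assignment.put("shard" + (i + 1), nodes.get(i % nodes.size()));
        }
        return assignment;
    }

    public static void main(String[] args) {
        // The example from the issue: 8 shards on a 4-node cluster, 2 per node.
        System.out.println(assign(8, Arrays.asList("n1", "n2", "n3", "n4"), 2));
    }
}
```

With 8 shards and 4 nodes each node ends up running exactly 2 shards; asking for 9 shards with maxShardsPerNode=2 would be rejected up front.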
[jira] [Commented] (SOLR-4028) When using ZK chroot, it would be nice if Solr would create the initial path when it doesn't exist.
[ https://issues.apache.org/jira/browse/SOLR-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505415#comment-13505415 ] Tomás Fernández Löbbe commented on SOLR-4028: - I think I see the issue here: the problem would be that if someone mistypes the initial path, instead of throwing an exception and stopping, we would create a new path and probably hide an error. However, we do create paths for the overseer and upload configs automatically; I think creating the initial path is more consistent with the current behavior than stopping startup. Other options I thought of are: • Only create the initial path when bootstrap_conf is true (or bootstrap_confdir). This could still have the same issue described above. • Add a new parameter to force creation, something like -DzkHost.create=true. This could add unnecessary parameters and configuration complexity. When using ZK chroot, it would be nice if Solr would create the initial path when it doesn't exist. --- Key: SOLR-4028 URL: https://issues.apache.org/jira/browse/SOLR-4028 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Tomás Fernández Löbbe Priority: Minor Attachments: SOLR-4028.patch I think this would make it easier to test and develop with SolrCloud; in order to start with a fresh ZK directory now the approach is to delete ZK data, with this improvement one could just add a chroot to the zkHost like: java -DzkHost=localhost:2181/testXYZ -jar start.jar Right now this is possible but you have to manually create the initial path.
[jira] [Assigned] (SOLR-4117) IO error while trying to get the size of the Directory
[ https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reassigned SOLR-4117: - Assignee: Mark Miller IO error while trying to get the size of the Directory -- Key: SOLR-4117 URL: https://issues.apache.org/jira/browse/SOLR-4117 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 5.0 Environment: 5.0.0.2012.11.28.10.42.06 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2. Reporter: Markus Jelsma Assignee: Mark Miller Priority: Minor Fix For: 5.0 With SOLR-4032 fixed we see other issues when randomly taking down nodes (nicely via Tomcat restart) while indexing a few million web pages from Hadoop. We do make sure that at least one node is up for a shard but due to recovery issues it may not be live. One node seems to work but generates IO errors in the log and a ZooKeeperException in the GUI.
[jira] [Commented] (SOLR-3377) eDismax: A fielded query wrapped by parens is not recognized
[ https://issues.apache.org/jira/browse/SOLR-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505453#comment-13505453 ] Jack Krupansky commented on SOLR-3377: -- Leonhard, your use case seems rather different from that of this Jira. I presume that you are referring to the generated phrase query boost being a little odd, or maybe that the phrase boost should not occur when the terms are queried against fields not listed in the pf parameter. Feel free to raise that as a separate issue. You refer to splitting, but I don't see any term splitting in this example. eDismax: A fielded query wrapped by parens is not recognized Key: SOLR-3377 URL: https://issues.apache.org/jira/browse/SOLR-3377 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 3.6 Reporter: Jan Høydahl Assignee: Yonik Seeley Priority: Critical Fix For: 4.0-BETA Attachments: SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch As reported by bernd on the user list, a query like this {{q=(name:test)}} will yield 0 hits in 3.6 while it worked in 3.5. It works without the parens.
[jira] [Commented] (SOLR-4116) Log Replay [recoveryExecutor-8-thread-1] - : java.io.EOFException
[ https://issues.apache.org/jira/browse/SOLR-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505458#comment-13505458 ] Yonik Seeley commented on SOLR-4116: I don't know what tomcat restart does, but perhaps it's not as nice as you think if it causes a log replay on restart? Anyway, bringing down a server roughly enough (like kill -9) can cause truncated tlog files. But truncated log files are expected and should not cause fatal exceptions (and we have tests for that). This exception causes the core not to come up? Log Replay [recoveryExecutor-8-thread-1] - : java.io.EOFException - Key: SOLR-4116 URL: https://issues.apache.org/jira/browse/SOLR-4116 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 5.0 Environment: 5.0.0.2012.11.28.10.42.06 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2. Reporter: Markus Jelsma Fix For: 5.0 With SOLR-4032 fixed we see other issues when randomly taking down nodes (nicely via tomcat restart) while indexing a few million web pages from Hadoop. We do make sure that at least one node is up for a shard but due to recovery issues it may not be live. 
{code} 2012-11-28 11:32:33,086 WARN [solr.update.UpdateLog] - [recoveryExecutor-8-thread-1] - : Starting log replay tlog{file=/opt/solr/cores/openindex_e/data/tlog/tlog.028 refcount=2} active=false starting pos=0 2012-11-28 11:32:41,873 ERROR [solr.update.UpdateLog] - [recoveryExecutor-8-thread-1] - : java.io.EOFException at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:151) at org.apache.solr.common.util.JavaBinCodec.readStr(JavaBinCodec.java:479) at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:176) at org.apache.solr.common.util.JavaBinCodec.readSolrInputDocument(JavaBinCodec.java:374) at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:225) at org.apache.solr.common.util.JavaBinCodec.readArray(JavaBinCodec.java:451) at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:182) at org.apache.solr.update.TransactionLog$LogReader.next(TransactionLog.java:618) at org.apache.solr.update.UpdateLog$LogReplayer.doReplay(UpdateLog.java:1198) at org.apache.solr.update.UpdateLog$LogReplayer.run(UpdateLog.java:1143) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. 
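Yonik's point above - that a tlog truncated by a hard kill should simply end replay rather than be fatal - can be sketched with plain JDK streams. This is an illustrative toy, not Solr's actual UpdateLog/TransactionLog code; the length-prefixed record format and all names are hypothetical:

```java
import java.io.*;
import java.util.*;

// Illustrative sketch of tolerant log replay: an EOFException on a
// partially written final record is treated as end-of-log, not an error.
// Not Solr's real tlog format - a toy length-prefixed record stream.
public class TolerantLogReader {
    /** Reads length-prefixed UTF-8 records; a truncated tail is silently dropped. */
    public static List<String> replay(byte[] log) throws IOException {
        List<String> records = new ArrayList<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(log));
        try {
            while (in.available() > 0) {
                int len = in.readInt();   // may hit EOF mid-int
                byte[] buf = new byte[len];
                in.readFully(buf);        // may hit EOF mid-record
                records.add(new String(buf, "UTF-8"));
            }
        } catch (EOFException expected) {
            // A kill -9 can leave a half-written last record; keep what we have.
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        for (String s : new String[] {"doc1", "doc2"}) {
            byte[] b = s.getBytes("UTF-8");
            out.writeInt(b.length);
            out.write(b);
        }
        out.writeInt(100); // start of a record that was never fully written
        System.out.println(replay(bos.toByteArray())); // [doc1, doc2]
    }
}
```

The replayer recovers both complete records and ignores the dangling length prefix, which mirrors the expectation that truncated tlog files "should not cause fatal exceptions".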
[jira] [Commented] (SOLR-4028) When using ZK chroot, it would be nice if Solr would create the initial path when it doesn't exist.
[ https://issues.apache.org/jira/browse/SOLR-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505462#comment-13505462 ] Mark Miller commented on SOLR-4028: --- Yeah, I think that was perhaps the concern - basically, it seems ops-type people prefer being explicit. Other paths are auto-created, but they are not arbitrary paths supplied by the user as a connect string - I guess it's a little different. If you are trying to connect to an existing node and type something wrong, you just create a new one rather than getting an error. I don't know what's best, but like I said, I guess I lean towards auto-creating. When using ZK chroot, it would be nice if Solr would create the initial path when it doesn't exist. --- Key: SOLR-4028 URL: https://issues.apache.org/jira/browse/SOLR-4028 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Tomás Fernández Löbbe Priority: Minor Attachments: SOLR-4028.patch I think this would make it easier to test and develop with SolrCloud; in order to start with a fresh ZK directory now the approach is to delete ZK data, with this improvement one could just add a chroot to the zkHost like: java -DzkHost=localhost:2181/testXYZ -jar start.jar Right now this is possible but you have to manually create the initial path.
[jira] [Commented] (SOLR-4116) Log Replay [recoveryExecutor-8-thread-1] - : java.io.EOFException
[ https://issues.apache.org/jira/browse/SOLR-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505466#comment-13505466 ] Markus Jelsma commented on SOLR-4116: - Restarting or stopping Tomcat shuts down CoreContainer and stops recovery, i believe this is nice enough or isn't it? This error does not cause the core not to come up. {code} 2012-11-28 14:10:15,227 INFO [solr.core.CoreContainer] - [Thread-6] - : Shutting down CoreContainer instance=1830423861 2012-11-28 14:10:15,227 WARN [solr.cloud.RecoveryStrategy] - [Thread-6] - : Stopping recovery for zkNodeName=178.21.118.195:8080_solr_shard_fcore=shard_f 2012-11-28 14:10:15,227 WARN [solr.cloud.RecoveryStrategy] - [Thread-6] - : Stopping recovery for zkNodeName=178.21.118.195:8080_solr_shard_gcore=shard_g 2012-11-28 14:10:15,227 INFO [solr.core.SolrCore] - [Thread-6] - : [shard_f] CLOSING SolrCore org.apache.solr.core.SolrCore@513c952f 2012-11-28 14:10:15,230 INFO [solr.update.UpdateHandler] - [Thread-6] - : closing DirectUpdateHandler2{commits=1,autocommit maxTime=12ms,autocommits=0,soft autocommit maxTime=1ms,soft autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=0,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0} 2012-11-28 14:10:15,231 INFO [solr.core.SolrCore] - [Thread-6] - : Closing SolrCoreState 2012-11-28 14:10:15,231 INFO [solr.update.DefaultSolrCoreState] - [Thread-6] - : SolrCoreState ref count has reached 0 - closing IndexWriter 2012-11-28 14:10:15,231 INFO [solr.update.DefaultSolrCoreState] - [Thread-6] - : closing IndexWriter with IndexWriterCloser 2012-11-28 14:10:15,234 INFO [solr.core.CachingDirectoryFactory] - [Thread-6] - : Releasing directory:/opt/solr/cores/shard_f/data/index.20121128113300496 2012-11-28 14:10:15,235 INFO [solr.core.SolrCore] - [Thread-6] - : [shard_f] Closing main searcher on request. 
2012-11-28 14:10:15,244 INFO [solr.core.CachingDirectoryFactory] - [Thread-6] - : Releasing directory:/opt/solr/cores/shard_f/data/index.20121128113300496 2012-11-28 14:10:15,244 INFO [solr.core.SolrCore] - [Thread-6] - : [shard_g] CLOSING SolrCore org.apache.solr.core.SolrCore@24be0446 2012-11-28 14:10:15,248 INFO [solr.update.UpdateHandler] - [Thread-6] - : closing DirectUpdateHandler2{commits=1,autocommit maxTime=12ms,autocommits=0,soft autocommit maxTime=1ms,soft autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=0,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0} 2012-11-28 14:10:15,248 INFO [solr.core.SolrCore] - [Thread-6] - : Closing SolrCoreState 2012-11-28 14:10:15,248 INFO [solr.update.DefaultSolrCoreState] - [Thread-6] - : SolrCoreState ref count has reached 0 - closing IndexWriter 2012-11-28 14:10:15,248 INFO [solr.update.DefaultSolrCoreState] - [Thread-6] - : closing IndexWriter with IndexWriterCloser 2012-11-28 14:10:15,250 INFO [solr.core.CachingDirectoryFactory] - [Thread-6] - : Releasing directory:/opt/solr/cores/shard_g/data/index.20121128113035951 2012-11-28 14:10:15,250 INFO [solr.core.SolrCore] - [Thread-6] - : [shard_g] Closing main searcher on request. 2012-11-28 14:10:15,256 INFO [solr.core.CachingDirectoryFactory] - [Thread-6] - : Releasing directory:/opt/solr/cores/shard_g/data/index.20121128113035951 2012-11-28 14:10:15,281 INFO [apache.zookeeper.ZooKeeper] - [Thread-6] - : Session: 0x13b4668803e000f closed {code} Log Replay [recoveryExecutor-8-thread-1] - : java.io.EOFException - Key: SOLR-4116 URL: https://issues.apache.org/jira/browse/SOLR-4116 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 5.0 Environment: 5.0.0.2012.11.28.10.42.06 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2. 
Reporter: Markus Jelsma Fix For: 5.0 With SOLR-4032 fixed we see other issues when randomly taking down nodes (nicely via tomcat restart) while indexing a few million web pages from Hadoop. We do make sure that at least one node is up for a shard but due to recovery issues it may not be live.
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505468#comment-13505468 ] Mark Miller commented on SOLR-4114: --- bq. fixed in collectionCmd (used for delete and reload) but not in createCollection This fix belongs with the issue that fixed delete and reload - I'm going to fix it there. Collection API: Allow multiple shards from one collection on the same Solr server - Key: SOLR-4114 URL: https://issues.apache.org/jira/browse/SOLR-4114 Project: Solr Issue Type: New Feature Components: multicore, SolrCloud Affects Versions: 4.0 Environment: Solr 4.0.0 release Reporter: Per Steffensen Assignee: Per Steffensen Labels: collection-api, multicore, shard, shard-allocation Attachments: SOLR-4114.patch We should support running multiple shards from one collection on the same Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster (each Solr server running 2 shards). Performance tests at our side have shown that this is a good idea, and it is also a good idea for easy elasticity later on - it is much easier to move an entire existing shard from one Solr server to another one that just joined the cluster than it is to split an existing shard among the Solr server that used to run it and the new one. See dev mailing list discussion Multiple shards for one collection on the same Solr server
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505471#comment-13505471 ] Yonik Seeley commented on SOLR-4114: bq. Ok, its just than the replicationFactor you specify in your request is the other thing.

Hmmm, you're right. Note: "replicationFactor defines the maximum number of replicas created in addition to the leader, from amongst the nodes currently running." That's not consistent with the original definition (http://wiki.apache.org/solr/NewSolrCloudDesign), the way the state is represented in clusterstate, or the way others use the term, such as in HBase/HDFS, Cassandra, Oracle, etc. The important part is how many times the data is stored (the replication factor), and things like leaders are more of an implementation detail. Luckily we don't yet store this in the cluster, so there's no back-compat issue with existing clusters. There's only a change when creating a new cluster, but that seems relatively minor. Given that, I'd lean toward changing this parameter to be in line with common usage. Per: this is unrelated to your patch of course - it just happened to come up here.
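The difference between the two interpretations of replicationFactor being discussed can be made concrete with a small sketch. The helper names below are purely illustrative, not Solr APIs; they just contrast "total copies of the data" (HDFS/Cassandra usage) with "replicas in addition to the leader" (the parameter's behavior at the time of this discussion):

```java
// Hypothetical helpers contrasting the two meanings of replicationFactor.
public class ReplicationFactorDemo {
    // Common usage (HDFS, Cassandra, etc.): replicationFactor == total copies of the data.
    static int totalCopiesCommon(int replicationFactor) {
        return replicationFactor;
    }

    // Solr's parameter as documented here: replicas created *in addition to* the leader.
    static int totalCopiesSolrParam(int replicationFactor) {
        return replicationFactor + 1; // leader + replicas
    }

    public static void main(String[] args) {
        // replicationFactor=2: common usage stores the data twice,
        // while the Solr parameter yields three copies (1 leader + 2 replicas).
        System.out.println(totalCopiesCommon(2));    // 2
        System.out.println(totalCopiesSolrParam(2)); // 3
    }
}
```

Under Yonik's proposal the parameter would move to the first definition, so a request with replicationFactor=2 would store the data twice in total.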
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505472#comment-13505472 ] Per Steffensen commented on SOLR-4114: -- bq. This fix belongs with the issue that fixed delete and reload - I'm going to fix it there.

Yes, of course. It is just hard for me to split up the patch, because it is all needed for the tests to be green. But commit-wise it belongs to the other issue.
[jira] [Comment Edited] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505472#comment-13505472 ] Per Steffensen edited comment on SOLR-4114 at 11/28/12 2:28 PM: bq. This fix belongs with the issue that fixed delete and reload - I'm going to fix it there.

Yes, of course. It is just hard for me to split up the patch, because it is all needed for the tests to be green - and I really want to give you a patch that fits on top of a certain revision where all tests are green once you apply the patch. But commit-wise it belongs to the other issue.

was (Author: steff1193): bq. This fix belongs with the issue that fixed delete and reload - I'm going to fix it there. Yes of course, it is just hard for me to split up the patch, because it is all needed for the tests to be green. But commit-wise it belongs to the other issue.
[jira] [Commented] (SOLR-3926) solrj should support better way of finding active sorts
[ https://issues.apache.org/jira/browse/SOLR-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505483#comment-13505483 ] Eirik Lygre commented on SOLR-3926: --- I'll take the blame for guiding Yonik down the Map-path; at the time (while parsing the sort-field), returning a LinkedHashMap was an easy way to achieve the business objectives. Then, as the idea developed, it became less so. Anyway, that's why we review, right? Here is an extended view of my current implementation. It will probably not be like this, ref questions below :-)

{code}
public String getSortField();
public SolrQuery setSorts(List<SortClause> value);
public SolrQuery clearSorts();
public List<SortClause> getSorts();
public SolrQuery setSort(SortClause sortClause);
public SolrQuery addSort(SortClause sortClause);
public SolrQuery addOrUpdateSort(SortClause sortClause);
public SolrQuery removeSort(String itemName);

public static class SortClause {
  public static SortClause create(String item, ORDER order);
  public static SortClause create(String item, String order);
  public static SortClause asc(String item);
  public static SortClause desc(String item);
  public String getItem();
  public ORDER getOrder();
}
{code}

Some questions, illustrated by code examples. Some questions relate to apis shown above, and are REMOVE? questions; some questions relate to apis *not* shown above, and are ADD? questions. Note that some of the examples use stuff from other questions.

{code}
// Usage, per the api above
query.setSort(SolrQuery.SortClause.desc("rating"));
query.setSort(SolrQuery.SortClause.create("rating", SolrQuery.ORDER.desc));
query.setSort(SolrQuery.SortClause.create("rating", SolrQuery.ORDER.valueOf("desc")));
query.setSort(SolrQuery.SortClause.create("rating", "asc"));
query.removeSort("rating");
{code}

I want to retain query.removeSort(String), because that's really the use case (remove a sort based on item name, ignoring ordering).
I'm not really sure about query.removeSort(SortClause), which does in fact only use the item name, but it would be symmetrical to the add-functions.

{code}
// Q1: Should we REMOVE query.removeSort(String)?
query.addSort(new SolrQuery.SortClause("rating", SolrQuery.ORDER.desc));
query.addSort(new SolrQuery.SortClause("price", SolrQuery.ORDER.asc));
query.removeSort("rating");

// Q2: Should we ADD query.removeSort(SortClause)?
query.addSort(new SolrQuery.SortClause("rating", SolrQuery.ORDER.desc));
query.addSort(new SolrQuery.SortClause("price", SolrQuery.ORDER.asc));
query.removeSort(new SolrQuery.SortClause("price", SolrQuery.ORDER.desc)); // Remove regardless of order
{code}

We might build convenience functions query.xxxSort(String, ORDER) and query.xxxSort(String, String) as shown below. It would make usage simpler, but come with a footprint. The SortClause.asc(), .desc() and .create() factory functions described below make this less needed, I think:

{code}
// Q3: Should we ADD convenience functions query.xxxSort(String, ORDER)?
query.addSort("price", SolrQuery.ORDER.asc);

// Q4: Should we ADD convenience functions query.xxxSort(String, String)?
query.addSort("price", "asc");
{code}

The api currently has convenience functions for creating SortClause. The functions asc() and desc() make it easier (and more compact) to create a SortClause. The create() functions are there for symmetry (always use static methods instead of constructors). The constructors aren't public, but maybe they should be?
{code}
// Q5: Should we REMOVE the asc() and desc() convenience factory methods?
query.setSort(SolrQuery.SortClause.desc("rating"));
query.setSort(SolrQuery.SortClause.asc("rating"));

// Q6: Should we REMOVE the create(String, ORDER) convenience factory method (use a constructor instead)?
query.setSort(SolrQuery.SortClause.create("rating", SolrQuery.ORDER.desc));
query.setSort(SolrQuery.SortClause.create("rating", SolrQuery.ORDER.valueOf("desc")));

// Q7: Should we REMOVE the create(String, String) convenience factory method (complements Q6, when the order is in fact a string)?
query.setSort(SolrQuery.SortClause.create("rating", "desc"));

// Q8: Should we ADD a simple constructor, typically instead of Q5-Q7?
query.setSort(new SolrQuery.SortClause("rating", SolrQuery.ORDER.desc));
query.setSort(new SolrQuery.SortClause("rating", SolrQuery.ORDER.valueOf("desc")));
{code}

A couple of other items:
Q9: Currently, SortClause is an inner class of SolrQuery. Let me know if this is an issue.
Q10: What the heck do we call the thing to sort? I don't want to call it a field, since it can be many other things. I've chosen to call it an item, but is there another, better name?
Q11: Should we have SortClause.hashCode() and SortClause.equals()?

solrj should support better way of finding active sorts --- Key: SOLR-3926 URL: https://issues.apache.org/jira/browse/SOLR-3926
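For Q11, one possible shape of equals() and hashCode() over the (item, order) pair can be sketched as below. This is a hedged illustration, not the final SolrJ class - the real implementation might, for instance, compare item names case-insensitively or expose the factory methods from the proposal instead of a public constructor:

```java
import java.util.Objects;

// Minimal sketch of the proposed SortClause with value-based equality.
// Names mirror the proposal above; the constructor visibility and the
// equality semantics are assumptions for illustration.
public final class SortClause {
    public enum ORDER { asc, desc }

    private final String item;
    private final ORDER order;

    public SortClause(String item, ORDER order) {
        this.item = item;
        this.order = order;
    }

    public String getItem() { return item; }
    public ORDER getOrder() { return order; }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof SortClause)) return false;
        SortClause that = (SortClause) other;
        // Two clauses are equal when both the item name and the order match.
        return Objects.equals(item, that.item) && order == that.order;
    }

    @Override
    public int hashCode() {
        return Objects.hash(item, order);
    }
}
```

Having equals()/hashCode() would also make Q2's removeSort(SortClause) behave predictably in list-based implementations, although the proposal notes removal should probably match on item name alone.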
[jira] [Commented] (SOLR-4028) When using ZK chroot, it would be nice if Solr would create the initial path when it doesn't exist.
[ https://issues.apache.org/jira/browse/SOLR-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505493#comment-13505493 ] Yonik Seeley commented on SOLR-4028: bq. I think I see the issue here, the problem would be if someone mistype the initial path, instead of throwing exceptions and stopping, we would be creating a new path and probably hiding an error.

That can go the other direction too? A config could be created under /solr and then someone could try to join it by forgetting to specify that root in zkHost.

bq. Only create the initial path when bootstrap_conf is true (or bootstrap_confdir).

As long as we need some sort of explicit bootstrap, that seems reasonable.

bq. Add a new parameter to force creation, something like -DzkHost.create=true.

Anything that creates a skeleton layout of a new cluster should work the same (auto-create the root if it doesn't exist). ZkCLI -cmd bootstrap, for example. Not sure if there are others.

When using ZK chroot, it would be nice if Solr would create the initial path when it doesn't exist. --- Key: SOLR-4028 URL: https://issues.apache.org/jira/browse/SOLR-4028 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Tomás Fernández Löbbe Priority: Minor Attachments: SOLR-4028.patch

I think this would make it easier to test and develop with SolrCloud. In order to start with a fresh ZK directory, the current approach is to delete the ZK data; with this improvement one could just add a chroot to the zkHost, like: java -DzkHost=localhost:2181/testXYZ -jar start.jar. Right now this is possible but you have to manually create the initial path.
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505496#comment-13505496 ] Per Steffensen commented on SOLR-4114: -- bq. Per: this is unrelated to your patch of course - it just happened to come up here.

No problem. I could make it part of this patch if you want, but I'm not sure I agree with your way of interpreting the term replication-factor. I would expect replication-factor to say something about how many times the data is REPLICATED. If I run with only one copy of the data for each slice, I would logically say that my data is not replicated, and that matches a replication-factor of 0. I used HDFS and HBase a little a year or so ago, but I'm not sure what meaning they put into the term replica. I've also worked a lot with ElasticSearch (which I believe is more of a counterpart to Solr), and in ElasticSearch I believe they use the term replica for the number of ADDITIONAL copies of the data - equal to your/our current implementation in Solr.
[jira] [Reopened] (SOLR-4055) Remove/Reload the collection has the thread safe issue.
[ https://issues.apache.org/jira/browse/SOLR-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reopened SOLR-4055: --- See SOLR-4114 - we missed a spot.

Remove/Reload the collection has the thread safe issue. --- Key: SOLR-4055 URL: https://issues.apache.org/jira/browse/SOLR-4055 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0 Environment: Solr cloud Reporter: Raintung Li Assignee: Mark Miller Fix For: 4.1, 5.0 Attachments: patch-4055

The OverseerCollectionProcessor class has a thread-safety issue in its collectionCmd method. The major issue is that the ModifiableSolrParams params instance is handed to other threads (HttpShardHandler.submit), so modifying the parameters affects what those threads see. In collectionCmd, the value is changed via params.set(CoreAdminParams.CORE, node.getStr(ZkStateReader.CORE_NAME_PROP)), so the thread sending the HTTP request can pick up the wrong core name. The result is that it can't delete/reload the right core. The easy fix is to clone the ModifiableSolrParams for every request.
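The clone-per-request fix described above can be illustrated with a self-contained stand-in. The Params class below is hypothetical - it only mimics the relevant behavior of a mutable params object with a copy constructor (ModifiableSolrParams does have such a constructor, but this is not the actual Solr code):

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in demonstrating why a shared mutable params instance must be
// cloned before each per-core mutation, instead of being mutated in place
// while previously submitted request threads may still read it.
public class CloneParamsDemo {
    static class Params {
        private final Map<String, String> map = new HashMap<>();
        Params() {}
        Params(Params other) { map.putAll(other.map); } // copy constructor
        void set(String k, String v) { map.put(k, v); }
        String get(String k) { return map.get(k); }
    }

    public static void main(String[] args) {
        Params shared = new Params();
        shared.set("action", "RELOAD");

        // Buggy pattern: shared.set("core", ...) once per node on the SAME
        // instance; a request thread still holding it sees the latest core name.
        // Fixed pattern: clone first, so each request owns its own snapshot.
        Params forCore1 = new Params(shared);
        forCore1.set("core", "core1");
        Params forCore2 = new Params(shared);
        forCore2.set("core", "core2");

        // Each request keeps its own core name; the shared instance is untouched.
        System.out.println(forCore1.get("core") + " " + forCore2.get("core")); // core1 core2
    }
}
```

With the in-place pattern, both submitted requests could observe "core2" once the second mutation lands, which matches the "can't delete/reload the right core" symptom in the report.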
[jira] [Commented] (SOLR-4028) When using ZK chroot, it would be nice if Solr would create the initial path when it doesn't exist.
[ https://issues.apache.org/jira/browse/SOLR-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505500#comment-13505500 ] Tomás Fernández Löbbe commented on SOLR-4028: - bq. That can go the other direction too? A config could be created under /solr and then someone could try to join it by forgetting to specify that root in zkHost.

This can happen today too.

bq. Anything that creates a skeleton layout of a new cluster should work the same (auto-create the root if it doesn't exist). ZkCLI -cmd bootstrap for example. Not sure if there are others.

Yes, I agree.
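The "create the initial chroot path" behavior under discussion amounts to walking the chroot and creating each missing intermediate node. A sketch using an in-memory stand-in for ZooKeeper (FakeZk and ensurePathExists are hypothetical names; the real change would go through Solr's ZooKeeper client):

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative chroot bootstrap: ensure every intermediate node of the
// chroot path exists, e.g. /apps/solr/testXYZ -> /apps, /apps/solr, ...
public class ChrootDemo {
    static class FakeZk {
        final Set<String> nodes = new HashSet<>();
        boolean exists(String path) { return nodes.contains(path); }
        void create(String path) { nodes.add(path); }
    }

    static void ensurePathExists(FakeZk zk, String chroot) {
        StringBuilder current = new StringBuilder();
        for (String part : chroot.split("/")) {
            if (part.isEmpty()) continue; // skip the leading slash segment
            current.append('/').append(part);
            String path = current.toString();
            if (!zk.exists(path)) {
                zk.create(path); // a real client would also pass ACLs and CreateMode
            }
        }
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        ensurePathExists(zk, "/solr/testXYZ");
        System.out.println(zk.exists("/solr") + " " + zk.exists("/solr/testXYZ")); // true true
    }
}
```

The open question in the thread is only *when* to run this logic (explicit bootstrap vs. always), not how - a mistyped chroot that is silently created is exactly the failure mode Tomás describes.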
[jira] [Commented] (SOLR-4117) IO error while trying to get the size of the Directory
[ https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505507#comment-13505507 ] Markus Jelsma commented on SOLR-4117: - I have another node now logging the same exception for a core that has 0 docs, which is not the leader, but clusterstate says the node is active and it does not attempt recovery. To my surprise it has two index.NUMBER directories of different sizes, and index.properties points to the largest directory. The node won't come back up properly. Search and indexing work but accessing the GUI is impossible:

{code}
2012-11-28 14:50:00,026 ERROR [solr.servlet.SolrDispatchFilter] - [http-8080-exec-6] - : null:org.apache.solr.common.SolrException: Error handling 'status' action
	at org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:724)
	at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:157)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
	at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:372)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
	at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
	at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.solr.common.SolrException: java.util.concurrent.RejectedExecutionException
	at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1674)
	at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1330)
	at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1265)
	at org.apache.solr.handler.admin.CoreAdminHandler.getCoreStatus(CoreAdminHandler.java:996)
	at org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:710)
	... 18 more
Caused by: java.util.concurrent.RejectedExecutionException
	at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
	at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:92)
	at java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:603)
	at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1605)
	... 22 more
{code}

IO error while trying to get the size of the Directory --- Key: SOLR-4117 URL: https://issues.apache.org/jira/browse/SOLR-4117 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 5.0 Environment: 5.0.0.2012.11.28.10.42.06 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
Reporter: Markus Jelsma Assignee: Mark Miller Priority: Minor Fix For: 5.0 With SOLR-4032 fixed we see other issues when randomly taking down nodes (nicely via tomcat restart) while indexing a few million web pages from Hadoop. We do make sure that at least one node is up for a shard, but due to recovery issues it may not be live. One node seems to work but generates IO errors in the log and a ZooKeeperException in the GUI. In the GUI we only see: {code} SolrCore Initialization Failures openindex_f: org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException: Please check your logs for more
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505506#comment-13505506 ] Per Steffensen commented on SOLR-4114: -- Another more urgent problem (for me) is that I need to make another change to the Solr Collection API before we can use it as a replacement for what we already do in our project (where we create each shard one by one in OUR code). We split our set of Solr servers into two subsets - Data-Solrs and Search-Solrs. The Search-Solrs are not supposed to carry any data and therefore not to be occupied by indexing. Search-Solrs instead play the role of receiving queries from the outside, sub-querying the Data-Solrs and combining the final total response to the outside. Data-Solrs are where we create the data-carrying collections. Data-Solrs need more CPU and IO capabilities while Search-Solrs need more RAM - hence the split-up. Therefore I need to be able to provide a list of Solrs to the create operation of the Solr Collection API. The shards for the collection are then only allowed to be spread over the Solrs in this list - the default list could be all Solrs. As this list we, in our Solr-based project, will give our list of Data-Solrs. Can I add such a feature to this SOLR-4114 and include it in a combined patch, or do you prefer another ticket for this change? I can create another issue but provide a combined patch. Are you interested in such a feature at all? That is, a feature where the create operation takes a list of Solrs to spread the created shards over.
[jira] [Comment Edited] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505506#comment-13505506 ] Per Steffensen edited comment on SOLR-4114 at 11/28/12 2:54 PM: Another more urgent problem (for me) is that I need to make another change to the Solr Collection API before we can use it as a replacement for what we already do in our project (where we create each shard one by one in OUR code). We split our set of Solr servers into two subsets - Data-Solrs and Search-Solrs. The Search-Solrs are not supposed to carry any data and therefore not to be occupied by indexing. Search-Solrs instead play the role of receiving queries from the outside, sub-querying the Data-Solrs and combining the final total response to the outside. Data-Solrs are where we create the data-carrying collections. Data-Solrs need more CPU and IO capabilities while Search-Solrs need more RAM - hence the split-up. Therefore I need to be able to provide a list of Solrs to the create operation of the Solr Collection API. The shards of the collection to be created are then only allowed to be spread over the Solrs in this list - the default list could be all Solrs. As this list we, in our Solr-based project, will give our list of Data-Solrs. Can I add such a feature to this SOLR-4114 and include it in a combined patch, or do you prefer another ticket for this change? I can create another issue but provide a combined patch. Are you interested in such a feature at all? That is, a feature where the create operation takes a list of Solrs to spread the created shards over.

was (Author: steff1193): Another more urgent problem (for me) is that I need to do another change to the Solr Collection API, before we can use it as a replacement for what we already do in our project (where we create each shard one by one in OUR code). We split our set of Solr servers into two subsets - Data-Solrs and Search-Solrs.
The Search-Solrs are not supposed to carry any data and therefore to be occupied by indexing. Search-Solr instead play the role of receiving queries from the outside, sub-quering the Data-Solrs and combining the final total response to the outside. Data-Solrs are where we create the data-carrying collections. Data-Solrs need more CPU and IO-capabilities while Search-Solrs need more RAM - hence the splitup. Therefore I need to be able to provide a list of Solrs to the create operation of the Solr Collection API. The shards are then only allowed to be spread shards for the collection over the Solrs in this list - default list could be all Solrs. As this list we, in our Solr-based projbect, will give our list of Data-Solrs. Can I add such a feature to this SOLR-4114 and include it in a combined patch, or do you prefer another ticket for this change? I can create another issue but provide a combined patch. Are you interrested in such a feature at all? That is, a feature where the create operation takes a list of Solrs to spread the created shards over.
[jira] [Updated] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Per Steffensen updated SOLR-4114: - Attachment: SOLR-4114.patch

New patch SOLR-4114.patch attached (not including the only-spread-shards-over-solrs-mentioned-in-provided-list thingy). New, compared to the first patch:
* maxShardsPerNode implemented
* Tests (BasicDistributedZkTest.testCollectionAPI) now test additional stuff
** That the expected number of shards are actually created
** That if there is not room for all the shards due to the provided maxShardsPerNode, nothing is created
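The maxShardsPerNode behavior described in the patch notes can be sketched as a feasibility check plus round-robin assignment. assignShards and the data shapes below are hypothetical, not the actual OverseerCollectionProcessor code; the key property is the all-or-nothing check:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: assign numShards over the given nodes, respecting maxShardsPerNode.
// Returns null (create nothing) when the limit makes the layout impossible.
public class MaxShardsPerNodeDemo {
    static Map<String, List<String>> assignShards(List<String> nodes, int numShards, int maxShardsPerNode) {
        // If there is not room for all the shards, nothing is created.
        if ((long) nodes.size() * maxShardsPerNode < numShards) {
            return null;
        }
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (String node : nodes) {
            assignment.put(node, new ArrayList<>());
        }
        // Round-robin keeps the per-node counts within 1 of each other, so the
        // feasibility check above guarantees no node exceeds maxShardsPerNode.
        for (int i = 0; i < numShards; i++) {
            assignment.get(nodes.get(i % nodes.size())).add("shard" + (i + 1));
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("n1", "n2", "n3", "n4");
        System.out.println(assignShards(nodes, 8, 2)); // 2 shards per node
        System.out.println(assignShards(nodes, 9, 2)); // null: no room
    }
}
```

This mirrors the tested behavior: 8 shards fit on 4 nodes with maxShardsPerNode=2 (the ticket's motivating layout), while a 9th shard would cause the whole create to be rejected.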
[jira] [Commented] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document
[ https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505528#comment-13505528 ] David Smiley commented on LUCENE-4574: -- But Robert, if I simply change the scenario slightly such that there is more than one sort field, TopScoreDocCollector (the specific collector I think you actually meant to suggest) is no longer suitable. Is your concern that the overhead might be too much? It seems so small to me; it only caches the last docid-score pair. My patch only did the score caching for OneComparatorScoring[No]MaxScoreCollector, but after further experimentation, modifying the test to sort on an additional field, it appears that all subclasses of TopFieldCollector are affected.

FunctionQuery ValueSource value computed twice per document --- Key: LUCENE-4574 URL: https://issues.apache.org/jira/browse/LUCENE-4574 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.0, 4.1 Reporter: David Smiley Attachments: LUCENE-4574.patch, Test_for_LUCENE-4574.patch

I was working on a custom ValueSource and did some basic profiling and debugging to see if it was being used optimally. To my surprise, the value was being fetched twice per document in a row. This computation isn't exactly cheap to calculate, so this is a big problem. I was able to work around this problem trivially on my end by caching the last value with its corresponding docid in my FunctionValues implementation.
Here is an excerpt of the code path to the first execution: {noformat} at org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48) at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153) at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291) at org.apache.lucene.search.Scorer.score(Scorer.java:62) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280) {noformat} And here is the 2nd call: {noformat} at org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48) at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153) at org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56) at org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951) at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312) at org.apache.lucene.search.Scorer.score(Scorer.java:62) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280) {noformat} The 2nd call appears to use some score caching mechanism, which is all well and good, but that same mechanism wasn't used in the first call so there's no cached value to retrieve.
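The work-around described in the report (caching the last docid/value pair inside the FunctionValues implementation) can be sketched as follows. `FloatValues` and `LastDocCache` are simplified stand-ins defined here for illustration, not the real Lucene FunctionValues API:

```java
// Stand-in sketch of a one-entry, last-docid cache: a second floatVal(doc)
// call for the same document is served from the cache instead of recomputing.
public class CachingValues {
    interface FloatValues { float floatVal(int doc); }

    /** Wraps an expensive source and memoizes the most recent docid's value. */
    static class LastDocCache implements FloatValues {
        private final FloatValues delegate;
        private int lastDoc = -1;
        private float lastVal;

        LastDocCache(FloatValues delegate) { this.delegate = delegate; }

        @Override public float floatVal(int doc) {
            if (doc != lastDoc) {           // only recompute on a new document
                lastVal = delegate.floatVal(doc);
                lastDoc = doc;
            }
            return lastVal;
        }
    }

    /** Returns how many times the underlying (expensive) value was computed. */
    static int demo() {
        final int[] calls = {0};
        FloatValues expensive = doc -> { calls[0]++; return doc * 2.0f; };
        FloatValues cached = new LastDocCache(expensive);
        cached.floatVal(5);  // computed
        cached.floatVal(5);  // same doc: served from the one-entry cache
        cached.floatVal(6);  // new doc: computed again
        return calls[0];
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 2
    }
}
```

Since collectors visit documents in order and call score twice in a row for the same doc, a single-entry cache like this is enough to absorb the duplicate call.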
[jira] [Commented] (SOLR-4117) IO error while trying to get the size of the Directory
[ https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505530#comment-13505530 ] Eks Dev commented on SOLR-4117: --- fwiw, we *think* we observed the following problem in a simple master slave setup with NRTCachingDirectory... I am not sure it has something to do with this issue, because we did not see this exception. Anyhow, on replication, the slave gets the index from the master and works fine, then on: 1. graceful restart, the world looks fine 2. kill -9 or such, solr does not start because the index gets corrupt (should actually not happen) We speculate that solr now does replication directly to the Directory implementation and does not ensure that replicated files get fsck-ed completely after replication. As far as I remember, replication was going to /temp (disk) and then moving files if all went ok, working under the assumption that everything is already persisted. Maybe this invariant does not hold any more and some explicit fsck is needed for caching directories? I might be completely wrong, we just observed symptoms in a not really debug-friendly environment IO error while trying to get the size of the Directory -- Key: SOLR-4117 URL: https://issues.apache.org/jira/browse/SOLR-4117 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 5.0 Environment: 5.0.0.2012.11.28.10.42.06 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2. Reporter: Markus Jelsma Assignee: Mark Miller Priority: Minor Fix For: 5.0 With SOLR-4032 fixed we see other issues when randomly taking down nodes (nicely via tomcat restart) while indexing a few million web pages from Hadoop. We do make sure that at least one node is up for a shard but due to recovery issues it may not be live. One node seems to work but generates IO errors in the log and a ZooKeeperException in the GUI.
In the GUI we only see: {code} SolrCore Initialization Failures openindex_f: org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException: Please check your logs for more information {code} and in the log we only see the following exception: {code} 2012-11-28 11:47:26,652 ERROR [solr.handler.ReplicationHandler] - [http-8080-exec-28] - : IO error while trying to get the size of the Directory:org.apache.lucene.store.NoSuchDirectoryException: directory '/opt/solr/cores/shard_f/data/index' does not exist at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:217) at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240) at org.apache.lucene.store.NRTCachingDirectory.listAll(NRTCachingDirectory.java:132) at org.apache.solr.core.DirectoryFactory.sizeOfDirectory(DirectoryFactory.java:146) at org.apache.solr.handler.ReplicationHandler.getIndexSize(ReplicationHandler.java:472) at org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:568) at org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:213) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:476) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) 
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889) at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274) at
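The NoSuchDirectoryException above comes from sizing a directory that has disappeared underneath the handler. One defensive shape is to treat a vanished directory as size 0 instead of letting the exception propagate. This sketch uses plain java.nio.file rather than Lucene's Directory API, so `SafeDirSize` is purely illustrative, not Solr's actual fix:

```java
import java.io.IOException;
import java.nio.file.*;

// Illustrative guard: sum regular-file sizes in a directory, tolerating the
// directory vanishing (e.g. mid-replication) by reporting what we have so far.
public class SafeDirSize {
    /** Total bytes of regular files directly under dir; 0 if dir is missing. */
    public static long sizeOf(Path dir) {
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path p : files) {
                if (Files.isRegularFile(p)) total += Files.size(p);
            }
        } catch (IOException | DirectoryIteratorException e) {
            // directory missing or removed mid-scan: report the partial total
        }
        return total;
    }

    public static void main(String[] args) {
        // A missing directory yields 0 instead of an exception.
        System.out.println(sizeOf(Paths.get("/definitely/not/there"))); // 0
    }
}
```

A status/details endpoint like ReplicationHandler's arguably should degrade this way, since directory removal is a normal race during index replacement.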
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505531#comment-13505531 ] Mark Miller commented on SOLR-4114: --- When grabbing the params fix I noticed you set the data dir to something like shardname+_data - that's not strictly necessary, right? Since each core should have its own instance dir? I've been thinking about how to set custom datadirs with this api - it would be nice to be able to specify the data dir - and in some cases perhaps base it on something like the core name rather than just some static string. But have you found it 'necessary' with your work?
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505533#comment-13505533 ] Radim Kolar commented on SOLR-4114: --- Could you not do the same thing as Elastic Search? Build the index with a number of shards (the initial number is 5). If there is 1 machine in the cluster, then all shards are on this machine. If you add more machines, they will move to the other machines. It is way simpler for administration.
RE: dismax vs edismax
I absolutely agree with your first point. For the second point, I agree for defType (so that it affects 'q') but not all the other numerous spots. As far as pushing down into Lucene; if you haven't noticed, Solr now has its own copy of Lucene's query parser so it can customize it. This is a good thing that was inevitable IMO. ~ David From: Jack Krupansky-2 [via Lucene] [ml-node+s472066n4022841...@n3.nabble.com] Sent: Wednesday, November 28, 2012 1:07 AM To: Smiley, David W. Subject: Re: dismax vs edismax My view is that if we simply added an option to edismax to restrict the syntax to the very limited syntax of dismax, then we could have one, common xdismax query parser. And then, why not simply rename the current Solr query parser to classic and make the new xdismax be the default Solr query parser. And then... push a lot of the so-called Solr-specific features down into the Lucene query parser (abstracting away the specifics of Solr schema, Solr plugin, Solr parameter format, etc.) and then we can have one, unified query parser for Lucene and Solr. But... not everyone is persuaded! -- Jack Krupansky -Original Message- From: David Smiley (@MITRE.org) Sent: Tuesday, November 27, 2012 11:43 PM To: [hidden email] Subject: dismax vs edismax It was my hope that by now, the dismax/edismax distinction would be a thing of the past, such that we'd simply call this by one name, simply dismax. From memories of various JIRA commentary, Jan wants this too and made great progress enhancing edismax, but Hoss pushed back on edismax overtaking dismax as the one new dismax. I see this as very unfortunate, as having both complicates things and makes it harder to write them in books ;-) I'd love to simply say dismax without having to say edismax or wonder if when someone said dismax they meant edismax, etc. Does anyone see this changing / progressing?
~ David - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/dismax-vs-edismax-tp4022834.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505535#comment-13505535 ] Mark Miller commented on SOLR-4114: --- bq. Can I add such a feature to this SOLR-4114 and include it in a combined patch, or do you prefer another ticket for this change? My preference would be a new issue. If it has to be done as one piece, I would wait for this to go in before supplying the patch for that issue. Or supply a patch for that issue and note that it requires applying this patch first. Combining multiple issues into one patch just makes it more difficult to get it in generally.
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505539#comment-13505539 ] Mark Miller commented on SOLR-4114: --- bq. you add more machines, they will move to other machines. Personally, I'm not really sold on this auto-rebalancing idea. I'd prefer the user had to explicitly make these moves.
[jira] [Comment Edited] (SOLR-4117) IO error while trying to get the size of the Directory
[ https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505530#comment-13505530 ] Eks Dev edited comment on SOLR-4117 at 11/28/12 3:27 PM: - fwiw, we *think* we observed the following problem in a simple master slave setup with NRTCachingDirectory... I am not sure it has something to do with this issue, because we did not see this exception. Anyhow, on replication, the slave gets the index from the master and works fine, then on: 1. graceful restart, the world looks fine 2. kill -9 or such, solr does not start because the index gets corrupt (should actually not happen) We speculate that solr now does replication directly to the Directory implementation and does not ensure that replicated files get fsck-ed completely after replication. As far as I remember, replication was going to /temp (disk) and then moving files if all went ok, working under the assumption that everything is already persisted. Maybe this invariant does not hold any more and some explicit fsck is needed for caching directories? I might be completely wrong, we just observed symptoms in a not really debug-friendly environment. Here is the exception after a hard restart: Caused by: org.apache.solr.common.SolrException: Error opening new searcher at org.apache.solr.core.SolrCore.init(SolrCore.java:804) at org.apache.solr.core.SolrCore.init(SolrCore.java:618) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:973) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1003) ... 10 more Caused by: org.apache.solr.common.SolrException: Error opening new searcher at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1441) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1553) at org.apache.solr.core.SolrCore.init(SolrCore.java:779) ...
13 more Caused by: java.io.FileNotFoundException: ...\core0\data\index\segments_1 (The system cannot find the file specified) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.init(RandomAccessFile.java:233) at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:222) at org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:232) at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:281) at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:668) at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:87) at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:34) at org.apache.solr.search.SolrIndexSearcher.init(SolrIndexSearcher.java:120) at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1417)
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505541#comment-13505541 ] Jack Krupansky commented on SOLR-4114: -- I certainly think of replica as a copy of the ORIGINAL, which makes perfect sense in a master with n-slaves configuration, but in a fully distributed environment such as SolrCloud where the leader of a shard can vary over time and updates are distributed to all nodes all of the time, there is no longer the concept of an original copy of the data. If anything, the original data is the source data on the wire before it gets instantiated on each node. No node is truly the original. The terminology has this difficulty that it is only partially shared between the worlds of master/slave and the cloud. In master/slave, only the slaves are replicas and the master is the original, while in cloud ALL nodes are replicas since there are no originals. The leader is not a master copy of the data in the sense of master/slave. So, I guess I am semi-comfortable with replica referring to all instances of the data, but we do need to be careful to highlight the distinction between how the term replica is used in the world of master/slave vs. SolrCloud, especially since many Cloud users will be migrating from the world of master/slave. We also need to be careful not to refer to leader and replicas which implies that a leader is not a replica! 
[jira] [Commented] (SOLR-4117) IO error while trying to get the size of the Directory
[ https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505542#comment-13505542 ] Mark Miller commented on SOLR-4117: --- Do you mean fsync rather than fsck (isn't that a file system check?)? That did change in that we are now using the Directory's sync method - but it *should* still work the same as before... 2 should not happen though - so we should dig in. I'm guessing it's not related to this issue, but we will see.
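The fsync-after-replication invariant being discussed can be illustrated with plain NIO. This is a sketch of the general technique only (force each copied file to stable storage before treating replication as complete), not Solr's actual replication code; in Lucene 4 this durability guarantee is what handing the copied file names to Directory.sync(...) is meant to provide:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.*;
import java.util.Collections;

// Illustrative sketch: fsync each replicated file so a kill -9 right after
// replication cannot lose data sitting in OS page cache.
public class FsyncFiles {
    /** Force data and metadata of each file to disk. */
    public static void syncAll(Iterable<Path> files) throws IOException {
        for (Path p : files) {
            try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
                ch.force(true); // true = also flush file metadata
            }
        }
    }

    /** Small self-check: create a file, write, sync; true if no I/O error. */
    public static boolean demo() {
        try {
            Path f = Files.createTempFile("seg", ".bin");
            Files.write(f, new byte[]{1, 2, 3});
            syncAll(Collections.singletonList(f));
            Files.deleteIfExists(f);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Write-then-sync-then-publish (only referencing the new files from a commit point after the sync) is the ordering that makes the "kill -9 after replication" scenario safe.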
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505547#comment-13505547 ] Per Steffensen commented on SOLR-4114: -- bq. could not you do same thing as Elastic Search. Build index with number of shards (initial number is 5). If there is 1 machine in cluster, then all shards are on this machine. If you add more machines, they will move to other machines. It is way simple for administration. This moving of shards around as more Solr servers join the cluster is the easiest way to provide elasticity (as I mentioned above somewhere). That is one of the reasons that I want to be able to run multiple shards for a collection on the same Solr server. That way you will have shards ready to move to other Solrs that might join the cluster later. In Solr, right now, we don't have the ability to move shards from one server to another (ES has it), but in order to benefit from such a future feature, you need to be able to have multiple shards on one Solr server. The alternative is to split a shard, but that is much harder, and should only be needed if you did not foresee, when you created your collection, that you would add more servers later, and therefore did not create your collection with multiple shards per server.
[jira] [Commented] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document
[ https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505548#comment-13505548 ] Robert Muir commented on LUCENE-4574: - Right, there is more fixing needed for the other collectors and other situations. But I think Solr should still be fixed for the common sort-by-score case. I don't like the duplicate calls to score. I feel like the API should not support this. But I don't think caching is the correct solution. It already frustrates me that there are caches everywhere; for example, BooleanScorer2 has a super-secret score cache just like this. I have plans to hunt down and kill all such little caches in Lucene. It's not the right solution. The question for this one is: if the user adds relevance as a sort but then also asks to track doc scores/max scores, how should the collector work? I definitely don't like the idea of more specialized collectors: god knows there are already too many, but maybe we can avoid this. Also: can we speed up this particular query? Why is its score so costly? FunctionQuery ValueSource value computed twice per document --- Key: LUCENE-4574 URL: https://issues.apache.org/jira/browse/LUCENE-4574 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.0, 4.1 Reporter: David Smiley Attachments: LUCENE-4574.patch, Test_for_LUCENE-4574.patch I was working on a custom ValueSource and did some basic profiling and debugging to see if it was being used optimally. To my surprise, the value was being fetched twice per document in a row. This computation isn't exactly cheap, so this is a big problem. I was able to work around this problem trivially on my end by caching the last value with the corresponding docid in my FunctionValues implementation.
Here is an excerpt of the code path to the first execution: {noformat}
at org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291)
at org.apache.lucene.search.Scorer.score(Scorer.java:62)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
{noformat} And here is the 2nd call: {noformat}
at org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
at org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56)
at org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951)
at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312)
at org.apache.lucene.search.Scorer.score(Scorer.java:62)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
{noformat} The 2nd call appears to use some score caching mechanism, which is all well and good, but that same mechanism wasn't used in the first call so there's no cached value to retrieve.
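David's workaround - cache the last value together with its docid inside the FunctionValues implementation - can be sketched without any Lucene dependency as a small memoizing wrapper. The class and method names below are illustrative, not Lucene's actual FunctionValues API:

```java
import java.util.function.IntToDoubleFunction;

// Sketch of the last-docid/value caching workaround: a second call for the
// same document returns the remembered value instead of recomputing it.
class LastDocCachingValues {
    private final IntToDoubleFunction expensive; // stands in for the costly per-doc computation
    private int lastDoc = -1;   // no doc seen yet
    private double lastValue;
    int computations = 0;       // counter so the caching effect is observable

    LastDocCachingValues(IntToDoubleFunction expensive) {
        this.expensive = expensive;
    }

    double doubleVal(int doc) {
        if (doc != lastDoc) {   // recompute only when the docid changes
            lastValue = expensive.applyAsDouble(doc);
            lastDoc = doc;
            computations++;
        }
        return lastValue;
    }
}
```

This relies on the two calls arriving back-to-back for the same document, which matches the stack traces above, where both calls happen inside a single collect().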
[jira] [Commented] (SOLR-4055) Remove/Reload the collection has the thread safe issue.
[ https://issues.apache.org/jira/browse/SOLR-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505549#comment-13505549 ] Commit Tag Bot commented on SOLR-4055: -- [trunk commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revisionrevision=1414744 SOLR-4055: clone params for create calls Remove/Reload the collection has the thread safe issue. --- Key: SOLR-4055 URL: https://issues.apache.org/jira/browse/SOLR-4055 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0 Environment: Solr cloud Reporter: Raintung Li Assignee: Mark Miller Fix For: 4.1, 5.0 Attachments: patch-4055 The collectionCmd method of the OverseerCollectionProcessor class has a thread-safety issue. The ModifiableSolrParams instance is handed off to other threads (HttpShardHandler.submit), so modifying a parameter changes what those threads see. When collectionCmd calls params.set(CoreAdminParams.CORE, node.getStr(ZkStateReader.CORE_NAME_PROP)), the thread sending the HTTP request can pick up the wrong core name, with the result that the right core can't be deleted/reloaded. The easy fix is to clone the ModifiableSolrParams for every request.
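The fix described above - clone the mutable params before each request is handed to another thread - is plain defensive copying. A minimal sketch with java.util.Map standing in for ModifiableSolrParams (Solr's own classes are not used here; names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Defensive copy: each request gets its own snapshot of the shared params,
// so the loop mutating "core" for the next node cannot corrupt requests
// already submitted to other threads.
class ParamsCloneDemo {
    // convenience factory so this sketch is self-contained for callers
    static Map<String, String> newParams() {
        return new HashMap<>();
    }

    // the "clone the ModifiableSolrParams for every request" step
    static Map<String, String> requestParams(Map<String, String> shared) {
        return new HashMap<>(shared);
    }
}
```

Without the copy, every submitted request would observe whatever core name the loop set last.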
Re: Trying out the commit bot tagger at a larger scale
Well, it happened again. The HTMLStripCharFilter once again somehow showed up with local mods in the commit bot git repo…weird. I'm not sure how to address this yet - for the moment it's a manual process of discarding the change and then everything works again. blah… - Mark On Nov 26, 2012, at 4:51 PM, Mark Miller markrmil...@gmail.com wrote: I took a look at the local repo with a git client and it seemed to show local changes in an HTMLStripCharFilter…odd… Anyway, I discarded those changes and let the bot run again and it caught up with the missed tags. Not sure if it will happen again or not (this stuff is pretty isolated and untouched) - please let me know if anyone notices the tags are not being sent out. - Mark On Nov 26, 2012, at 4:32 PM, Mark Miller markrmil...@gmail.com wrote: Thanks for the note - it actually has not been firing lately - I just took a look and for some reason it was having trouble doing an update (I'm using jgit under the covers). It's claiming there is a conflict when I am trying to check out a branch. I'll solve this and get it kicking again. It's currently cron'd to run every 2 minutes. - Mark On Mon, Nov 26, 2012 at 4:04 PM, David Smiley (@MITRE.org) dsmi...@mitre.org wrote: Mark, Do I need to do anything for the bot to make its comment, aside from the commit? I just made a commit to both branches. How much delay is there / i.e. what's its schedule? ~ David - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- - Mark
[jira] [Commented] (SOLR-3377) eDismax: A fielded query wrapped by parens is not recognized
[ https://issues.apache.org/jira/browse/SOLR-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505554#comment-13505554 ] Leonhard Maylein commented on SOLR-3377: Ok, I understand. The phrase boost queries are separated from the normal query expansion via the qf parameter. But all terms are (equally) qualified by a field (field sw for the terms a and b, field ti for the terms c and d). Why does the eDismax handler only use the terms b and d to build the phrase boost query? Isn't that a bug? eDismax: A fielded query wrapped by parens is not recognized Key: SOLR-3377 URL: https://issues.apache.org/jira/browse/SOLR-3377 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 3.6 Reporter: Jan Høydahl Assignee: Yonik Seeley Priority: Critical Fix For: 4.0-BETA Attachments: SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch As reported by bernd on the user list, a query like this {{q=(name:test)}} will yield 0 hits in 3.6 while it worked in 3.5. It works without the parens.
[jira] [Created] (LUCENE-4577) Nuke TFIDFSim's cache
Robert Muir created LUCENE-4577: --- Summary: Nuke TFIDFSim's cache Key: LUCENE-4577 URL: https://issues.apache.org/jira/browse/LUCENE-4577 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir This is the old termscorer cache. This helps nothing, and maybe hurts: I removed it and here are the results: {noformat}
Chart saved to out.png... (wd: /home/rmuir/workspace/lucene-trunk/lucene/benchmark)
Task              QPS base  StdDev   QPS patch  StdDev   Pct diff
TermGroup1M          52.87  (2.2%)       52.62  (2.4%)      -0.5% (  -4% -   4%)
AndHighMed           34.82  (2.8%)       34.70  (3.6%)      -0.3% (  -6% -   6%)
SpanNear              6.28  (5.3%)        6.26  (3.9%)      -0.2% (  -8% -   9%)
IntNRQ               13.24 (11.0%)       13.24  (9.9%)       0.0% ( -18% -  23%)
Prefix3              42.19  (7.6%)       42.21  (7.0%)       0.1% ( -13% -  15%)
Wildcard             36.90  (6.8%)       37.02  (5.9%)       0.3% ( -11% -  13%)
AndHighHigh          25.68  (4.5%)       25.79  (3.2%)       0.5% (  -6% -   8%)
Phrase                9.28  (4.7%)        9.35  (4.4%)       0.7% (  -8% -  10%)
TermBGroup1M         45.76  (6.3%)       46.10  (3.2%)       0.7% (  -8% -  10%)
SloppyPhrase         10.25  (3.9%)       10.33  (4.4%)       0.8% (  -7% -   9%)
OrHighHigh            8.87  (6.4%)        8.97  (6.7%)       1.1% ( -11% -  15%)
Fuzzy1               70.28  (4.3%)       71.24  (7.1%)       1.4% (  -9% -  13%)
OrHighMed            10.70  (7.0%)       10.86  (6.4%)       1.5% ( -11% -  15%)
Fuzzy2               27.79  (6.1%)       28.31  (5.1%)       1.9% (  -8% -  13%)
Respell              71.72  (6.8%)       73.39  (3.7%)       2.3% (  -7% -  13%)
Term                209.49  (4.4%)      214.58  (3.7%)       2.4% (  -5% -  11%)
TermBGroup1M1P        7.10  (5.1%)        7.48  (7.8%)       5.3% (  -7% -  19%)
{noformat}
[jira] [Commented] (SOLR-4117) IO error while trying to get the size of the Directory
[ https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1350#comment-1350 ] Eks Dev commented on SOLR-4117: --- fsync of course, fsck was intended for my terminal window :) IO error while trying to get the size of the Directory -- Key: SOLR-4117 URL: https://issues.apache.org/jira/browse/SOLR-4117 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 5.0 Environment: 5.0.0.2012.11.28.10.42.06 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2. Reporter: Markus Jelsma Assignee: Mark Miller Priority: Minor Fix For: 5.0 With SOLR-4032 fixed we see other issues when randomly taking down nodes (nicely via tomcat restart) while indexing a few million web pages from Hadoop. We do make sure that at least one node is up for a shard, but due to recovery issues it may not be live. One node seems to work but generates IO errors in the log and a ZooKeeperException in the GUI. In the GUI we only see: {code}
SolrCore Initialization Failures
openindex_f: org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException: Please check your logs for more information
{code} and in the log we only see the following exception: {code}
2012-11-28 11:47:26,652 ERROR [solr.handler.ReplicationHandler] - [http-8080-exec-28] - : IO error while trying to get the size of the Directory:org.apache.lucene.store.NoSuchDirectoryException: directory '/opt/solr/cores/shard_f/data/index' does not exist
at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:217)
at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240)
at org.apache.lucene.store.NRTCachingDirectory.listAll(NRTCachingDirectory.java:132)
at org.apache.solr.core.DirectoryFactory.sizeOfDirectory(DirectoryFactory.java:146)
at org.apache.solr.handler.ReplicationHandler.getIndexSize(ReplicationHandler.java:472)
at org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:568)
at org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:213)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:476)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
{code} -- This message is automatically generated by JIRA.
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505556#comment-13505556 ] Per Steffensen commented on SOLR-4114: -- bq. Personally, I'm not really sold on this auto re balancing idea. I'd prefer the user had to explicitly make these moves. Me neither - and I can say that ES sometimes f's it up. At least it did when I was working with it, but that was mainly because of bad re-balancing algorithms. But I like moving shards manually from an admin-console!
[jira] [Updated] (LUCENE-4577) Nuke TFIDFSim's cache
[ https://issues.apache.org/jira/browse/LUCENE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4577: Attachment: LUCENE-4577.patch
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505558#comment-13505558 ] Mark Miller commented on SOLR-4114: --- I've committed the shared params issue under SOLR-4055 and added Per to the Changes entry.
[jira] [Commented] (SOLR-4055) Remove/Reload the collection has the thread safe issue.
[ https://issues.apache.org/jira/browse/SOLR-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505559#comment-13505559 ] Commit Tag Bot commented on SOLR-4055: -- [branch_4x commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revisionrevision=1414760 SOLR-4055: clone params for create calls
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505562#comment-13505562 ] Per Steffensen commented on SOLR-4114: -- bq. When grabbing the params fix I noticed you set the data dir to something like shardname+_data - that's not strictly necessary right? Since each core should have it's own instance dir Well, I use the same instance-dir for all shards but a different data-dir - this is just how we used to do it in my project, and it can be changed. As long as the code uses the same instance-dir, different data-dirs are necessary though.
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505564#comment-13505564 ] Yonik Seeley commented on SOLR-4114: bq. I would expect replication-factor to say something about how many times the data is REPLICATED. I would too, but we would still disagree on what that meant, since I would interpret the number of times the data is replicated to mean the total number of copies that exist after a write operation to the cluster. That seems to be the much more common interpretation in this context since there is no original... everyone has stored/indexed a copy. $ echo hello > file1.txt $ cp file1.txt file2.txt How many copies of the file are there? If you look at the state (and not the mechanism by which you arrived there) most would say there are 2 copies. In one interpretation, there is only one copy, but that's too literal and assigns some special category to the original. http://hadoop.apache.org/docs/r0.20.2/hdfs_design.html The number of copies of a file is called the replication factor of that file. http://www.datastax.com/docs/1.0/cluster_architecture/replication The total number of replicas across the cluster is referred to as the replication factor. A replication factor of 1 means that there is only one copy of each row on one node. Oracle NoSQL store: http://docs.oracle.com/cd/NOSQL/html/AdminGuide/introduction.html#replicationfactor http://docs.oracle.com/cd/NOSQL/html/AdminGuide/store-config.html A Replication Factor of 3 gives you shards with one master plus two replicas. Riak: http://wiki.basho.com/What-is-Riak%3F.html An n value of 3 (default) means that each object is replicated 3 times. When an object’s key is mapped onto a given partition, Riak won’t stop there – it automatically replicates the data onto the next two partitions as well.
Splunk: http://docs.splunk.com/Documentation/Splunk/latest/Indexer/Thereplicationfactor The number of data/bucket copies is called the cluster's replication factor. The cluster can tolerate a failure of (replication factor - 1) peer nodes. So, for example, to ensure that your system can tolerate a failure of two peers, you must configure a replication factor of 3, which means that the cluster stores three identical copies of each bucket on separate nodes. With a replication factor of 3, you can be certain that all your data will be available if no more than two peer nodes in the cluster fail. With two nodes down, you still have one complete copy of your data available on the remaining peer(s). It's clear that 3 copies means 3 total instances of the same data, not 4 (an original plus 3 more copies of it.)
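For capacity planning, the two readings debated in this thread differ only in whether the leader itself counts toward the factor. A quick illustration (the class and method names here are made up for this sketch):

```java
// Two interpretations of replicationFactor for a collection with a given
// number of slices (logical shards), expressed as total cores in the cluster.
class ReplicationMath {
    // "total copies" reading (HDFS/Cassandra/Splunk style):
    // the factor counts every copy, leader included.
    static int coresTotalCopies(int numSlices, int replicationFactor) {
        return numSlices * replicationFactor;
    }

    // "additional replicas" reading: one leader plus that many extra copies per slice.
    static int coresLeaderPlusReplicas(int numSlices, int replicationFactor) {
        return numSlices * (replicationFactor + 1);
    }
}
```

With 8 slices and a factor of 3, the first reading gives 24 cores across the cluster and the second gives 32 - exactly the "replicationFactor + 1 shards per slice" discrepancy raised earlier in this thread.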
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505565#comment-13505565 ] Per Steffensen commented on SOLR-4114: -- bq. I've committed the shared params issue under SOLR-4055 and added Per to the Changes entry. On which branch are you committing, Mark?
[jira] [Resolved] (SOLR-4055) Remove/Reload the collection has the thread safe issue.
[ https://issues.apache.org/jira/browse/SOLR-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved SOLR-4055. --- Resolution: Fixed
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505570#comment-13505570 ] Per Steffensen commented on SOLR-4114: -- bq. I would too, but we would still disagree on what that meant since I would interpret the number of times the data is replicated... I actually agree with you. I just don't like 'replica' being part of the name for it then. If we rename replication-factor to number-of-copies or something, I would be much happier changing the semantics of it :-) But really, this is another issue.
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505571#comment-13505571 ] Mark Miller commented on SOLR-4114: --- bq. On which branch are you committing, Mark? 5x and then merged to 4x - just that small fix though - have not had a chance to review this patch fully yet.
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505575#comment-13505575 ] Per Steffensen commented on SOLR-4114: -- Well, I'm off for today. Will probably (if my PO's head does not turn green) be making the spread-shards-according-to-provided-list feature tomorrow. If you commit the entire patch for SOLR-4114 it will be easier for me to provide a new patch for this new feature and attach it to a new issue :-)
[jira] [Created] (SOLR-4118) fix replicationFactor to align with industry usage
Yonik Seeley created SOLR-4118: -- Summary: fix replicationFactor to align with industry usage Key: SOLR-4118 URL: https://issues.apache.org/jira/browse/SOLR-4118 Project: Solr Issue Type: Bug Affects Versions: 4.0 Reporter: Yonik Seeley Priority: Minor Fix For: 4.1 replicationFactor should be the number of different nodes that have a document. See discussion in SOLR-4114
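The two readings of replicationFactor debated in this thread can be contrasted in a small sketch. This is a hypothetical illustration, not Solr code; the class and method names are assumptions introduced here for clarity:

```java
// Hypothetical illustration of the two replicationFactor interpretations
// debated above (not Solr code; names are assumptions). Under the request
// semantics described in the thread, the leader is created in addition to
// the requested replicas; under the proposed "industry usage" semantics,
// replicationFactor counts every node holding a copy of the data.
public class ReplicationFactorDemo {
    // Current behavior: copies per slice = requested replicas + the leader.
    public static int copiesCurrent(int requestedReplicationFactor) {
        return requestedReplicationFactor + 1;
    }

    // Proposed behavior: replicationFactor IS the total number of copies.
    public static int copiesProposed(int replicationFactor) {
        return replicationFactor;
    }

    public static void main(String[] args) {
        System.out.println(copiesCurrent(2));  // 3 nodes hold the data
        System.out.println(copiesProposed(3)); // 3 nodes hold the data
    }
}
```

So a request with replicationFactor=2 today and replicationFactor=3 under the proposal would describe the same cluster layout, which is the source of the confusion.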
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505578#comment-13505578 ] Per Steffensen commented on SOLR-4114: -- bq. 5x and then merged to 4x - just that small fix though - have not had a chance to review this patch fully yet. But is it also going to be backported to lucene_solr_4_0, which is actually the branch I am working on top of?
[jira] [Commented] (SOLR-4117) IO error while trying to get the size of the Directory
[ https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505581#comment-13505581 ] Mark Miller commented on SOLR-4117: --- Markus, I'm about to commit a fix to this issue - but I doubt it's the same as the issue you then mention in a comment. IO error while trying to get the size of the Directory -- Key: SOLR-4117 URL: https://issues.apache.org/jira/browse/SOLR-4117 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 5.0 Environment: 5.0.0.2012.11.28.10.42.06 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2. Reporter: Markus Jelsma Assignee: Mark Miller Priority: Minor Fix For: 5.0 With SOLR-4032 fixed we see other issues when randomly taking down nodes (nicely, via Tomcat restart) while indexing a few million web pages from Hadoop. We do make sure that at least one node is up for a shard, but due to recovery issues it may not be live. One node seems to work but generates IO errors in the log and a ZooKeeperException in the GUI.
In the GUI we only see: {code} SolrCore Initialization Failures openindex_f: org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException: Please check your logs for more information {code} and in the log we only see the following exception: {code} 2012-11-28 11:47:26,652 ERROR [solr.handler.ReplicationHandler] - [http-8080-exec-28] - : IO error while trying to get the size of the Directory:org.apache.lucene.store.NoSuchDirectoryException: directory '/opt/solr/cores/shard_f/data/index' does not exist at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:217) at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240) at org.apache.lucene.store.NRTCachingDirectory.listAll(NRTCachingDirectory.java:132) at org.apache.solr.core.DirectoryFactory.sizeOfDirectory(DirectoryFactory.java:146) at org.apache.solr.handler.ReplicationHandler.getIndexSize(ReplicationHandler.java:472) at org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:568) at org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:213) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:476) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) 
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889) at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {code}
[jira] [Commented] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document
[ https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505582#comment-13505582 ] David Smiley commented on LUCENE-4574: -- I don't have any conviction on what the right answer should be; this area of Lucene is not one I've explored before. If scorer.score() is cheap in general (is it?), then I can see your reservations. Perhaps the solution is to only cache specific Scorers that are or could be expensive. So for me this means adding the cache at FunctionQuery$AllScorer. This cache is as lightweight as a cache can possibly be, remember; no hashtable lookup, just a docid comparison with a branch. bq. Also: can we speed up this particular query? why is its score so costly? It's a FunctionQuery tied to a ValueSource doing spatial distance. Applying this very simple cache on my custom ValueSource cut my response time nearly in half! FunctionQuery ValueSource value computed twice per document --- Key: LUCENE-4574 URL: https://issues.apache.org/jira/browse/LUCENE-4574 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.0, 4.1 Reporter: David Smiley Attachments: LUCENE-4574.patch, Test_for_LUCENE-4574.patch I was working on a custom ValueSource and did some basic profiling and debugging to see if it was being used optimally. To my surprise, the value was being fetched twice per document in a row. This computation isn't exactly cheap to calculate, so this is a big problem. I was able to work around this problem trivially on my end by caching the last value with the corresponding docid in my FunctionValues implementation.
Here is an excerpt of the code path to the first execution: {noformat} at org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48) at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153) at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291) at org.apache.lucene.search.Scorer.score(Scorer.java:62) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280) {noformat} And here is the 2nd call: {noformat} at org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48) at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153) at org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56) at org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951) at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312) at org.apache.lucene.search.Scorer.score(Scorer.java:62) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280) {noformat} The 2nd call appears to use some score caching mechanism, which is all well and good, but that same mechanism wasn't used in the first call so there's no cached value to retrieve.
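The work-around described in this thread (remember the last docid and its computed value, recompute only when a different docid is requested) can be sketched as follows. This is an illustrative stand-in, not Lucene's actual FunctionValues API; the class name and the placeholder computation are assumptions:

```java
// A minimal sketch of the per-document value cache described above
// (illustrative only; not Lucene's FunctionValues API). The cache is
// "as lightweight as a cache can possibly be": no hashtable lookup,
// just a docid comparison with a branch.
public class LastDocValueCache {
    private int lastDoc = -1;   // no real docid is negative
    private double lastValue;

    // Stands in for a costly per-document computation, e.g. a spatial
    // distance in a custom ValueSource (placeholder formula).
    private double expensiveValue(int doc) {
        return doc * 2.5;
    }

    public double doubleVal(int doc) {
        if (doc != lastDoc) {               // only branch on a cache miss
            lastValue = expensiveValue(doc);
            lastDoc = doc;
        }
        return lastValue;                   // repeat call for same doc is free
    }
}
```

Because collectors in the traces above ask for the score of the same document twice in a row, this one-entry cache halves the number of expensive computations without any general-purpose caching machinery.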
[jira] [Commented] (SOLR-4117) IO error while trying to get the size of the Directory
[ https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505584#comment-13505584 ] Commit Tag Bot commented on SOLR-4117: -- [trunk commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1414773 SOLR-4117: Retrieving the size of the index may use the wrong index dir if you are replicating.
[jira] [Updated] (SOLR-4118) fix replicationFactor to align with industry usage
[ https://issues.apache.org/jira/browse/SOLR-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-4118: -- Fix Version/s: 5.0
[jira] [Comment Edited] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505570#comment-13505570 ] Per Steffensen edited comment on SOLR-4114 at 11/28/12 3:59 PM: bq. I would too, but we would still disagree on what that meant since I would interpret the number of times the data is replicated... I actually agree with you. I just don't like replication to be part of the name for it then. If we rename replication-factor to number-of-copies or even number-of-replicas or something I would be much happier changing the semantics of it :-) But really, this is another issue. was (Author: steff1193): bq. I would too, but we would still disagree on what that meant since I would interpret the number of times the data is replicated... I actually agree with you. I just don't like replica to be part of the name for it then. If we rename replication-factor to number-of-copies or something I would be much happier changing the semantics of it :-) But really, this is another issue.
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505590#comment-13505590 ] Mark Miller commented on SOLR-4114: --- bq. But is it also going to be backported to lucene_solr_4_0 Given past discussion, it's very unlikely that we will release a 4.0.1 (I was for it, FWIW) and we will just do a 4.1 - so no, generally nothing is being backported to the 4.0 branch. If we did end up deciding to do a 4.0.1, then we would select which issues should go in and do those backports later.
[jira] [Commented] (SOLR-4117) IO error while trying to get the size of the Directory
[ https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505594#comment-13505594 ] Commit Tag Bot commented on SOLR-4117: -- [branch_4x commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1414774 SOLR-4117: Retrieving the size of the index may use the wrong index dir if you are replicating.
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505597#comment-13505597 ] Per Steffensen commented on SOLR-4114: -- bq. so no, generally nothing is being back ported to the 4.0 branch Well, I guess the essence of my question is whether it is ok that I keep providing patches relative to lucene_solr_4_0? At least for this issue and the spread-shards-across-provided-list-of-solrs one?
[jira] [Commented] (SOLR-4117) IO error while trying to get the size of the Directory
[ https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505600#comment-13505600 ] Markus Jelsma commented on SOLR-4117: - Likely indeed. I'll check on this issue tomorrow and try to reproduce the other one, will open new issue if i can. Thanks IO error while trying to get the size of the Directory -- Key: SOLR-4117 URL: https://issues.apache.org/jira/browse/SOLR-4117 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 5.0 Environment: 5.0.0.2012.11.28.10.42.06 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2. Reporter: Markus Jelsma Assignee: Mark Miller Priority: Minor Fix For: 5.0 With SOLR-4032 fixed we see other issues when randomly taking down nodes (nicely via tomcat restart) while indexing a few million web pages from Hadoop. We do make sure that at least one node is up for a shard but due to recovery issues it may not be live. One node seems to work but generates IO errors in the log and ZookeeperExeption in the GUI. 
In the GUI we only see:
{code}
SolrCore Initialization Failures
openindex_f: org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException: Please check your logs for more information
{code}
and in the log we only see the following exception:
{code}
2012-11-28 11:47:26,652 ERROR [solr.handler.ReplicationHandler] - [http-8080-exec-28] - : IO error while trying to get the size of the Directory:org.apache.lucene.store.NoSuchDirectoryException: directory '/opt/solr/cores/shard_f/data/index' does not exist
	at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:217)
	at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240)
	at org.apache.lucene.store.NRTCachingDirectory.listAll(NRTCachingDirectory.java:132)
	at org.apache.solr.core.DirectoryFactory.sizeOfDirectory(DirectoryFactory.java:146)
	at org.apache.solr.handler.ReplicationHandler.getIndexSize(ReplicationHandler.java:472)
	at org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:568)
	at org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:213)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
	at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:476)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
	at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
	at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
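The exception above comes from computing the index size over a directory that may not exist yet on a recovering node: FSDirectory.listAll() throws NoSuchDirectoryException for a missing directory. A minimal, hypothetical sketch of a defensive size computation using plain java.io (this is illustrative, not Solr's actual DirectoryFactory code; the class and method names are invented):

```java
import java.io.File;

public class IndexSizeUtil {
    /** Returns the total size in bytes of the files directly inside dir,
     *  or 0 if the directory does not exist (e.g. before the first commit).
     *  Unlike FSDirectory.listAll(), File.listFiles() returns null instead
     *  of throwing when the directory is missing, so we can guard on it. */
    static long sizeOfDirectory(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            return 0L;  // directory missing or unreadable: report empty, don't throw
        }
        long total = 0;
        for (File f : files) {
            if (f.isFile()) {
                total += f.length();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // A path that should not exist; the guarded helper returns 0 instead of throwing.
        long size = sizeOfDirectory(new File("/nonexistent/solr/cores/shard_f/data/index"));
        System.out.println(size);
    }
}
```

The key design choice is that a size query is a read-only diagnostic, so a missing directory is reported as size 0 rather than surfacing as an error in the replication details handler.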
[jira] [Resolved] (LUCENE-3793) Use ReferenceManager in DirectoryTaxonomyReader
[ https://issues.apache.org/jira/browse/LUCENE-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-3793. Resolution: Implemented This issue was taken care of as part of LUCENE-3441. Use ReferenceManager in DirectoryTaxonomyReader --- Key: LUCENE-3793 URL: https://issues.apache.org/jira/browse/LUCENE-3793 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Shai Erera Assignee: Shai Erera Priority: Minor Fix For: 4.1 DirTaxoReader uses hairy code to protect its indexReader instance from being modified while threads use it. It maintains a ReentrantLock (indexReaderLock) which is obtained on every 'read' access, while refresh() locks it for 'write' operations (refreshing the IndexReader). Instead of all that, now that we have ReferenceManager in place, I think that we can write a ReaderManager<IndexReader> which will be used by DirTR. Every method that requires access to the indexReader will acquire/release (not too different from obtaining/releasing the read lock), and refresh() will call ReaderManager.maybeRefresh(). It will simplify the code and remove some rather long comments that go to great lengths explaining why the code looks the way it does. This ReaderManager cannot be used for every IndexReader, because DirTR's refresh() logic is special -- it reopens the indexReader, and then verifies that the createTime still matches on the reopened reader as well. Otherwise, it closes the reopened reader and fails with an exception. Therefore, this ReaderManager.refreshIfNeeded will need to take the createTime into consideration and fail if they do not match. And while we're at it ... I wonder if we should have a manager for an IndexReader/ParentArray pair? I think that it makes sense because we don't want DirTR to use a ParentArray that does not match the IndexReader. Today this can happen in refresh() if e.g.
after the indexReader instance has been replaced, parentArray.refresh(indexReader) fails. DirTR will be left with a newer IndexReader instance, but an old (or worse, corrupt?) ParentArray ... I think it'll be good if we introduce clone() on ParentArray, or a new ctor which takes an int[]. I'll work on a patch once I finish with LUCENE-3786.
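The acquire/release discipline proposed above can be sketched in a few lines. This is an illustrative stand-in, not Lucene's ReferenceManager: all names here (SimpleReferenceManager, Ref, refresher) are invented for the example, and the real class additionally handles the race between acquire() and a concurrent close with a tryIncRef retry loop.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Minimal sketch of the acquire/release pattern that replaces the
 *  explicit ReentrantLock described in the issue. */
class SimpleReferenceManager<T> {
    static class Ref<T> {
        final T value;
        final AtomicInteger refCount = new AtomicInteger(1); // manager's own reference
        Ref(T value) { this.value = value; }
    }

    private volatile Ref<T> current;
    private final Supplier<T> refresher;  // stands in for refreshIfNeeded()

    SimpleReferenceManager(T initial, Supplier<T> refresher) {
        this.current = new Ref<>(initial);
        this.refresher = refresher;
    }

    /** Every read operation brackets its use of the resource with acquire/release. */
    Ref<T> acquire() {
        Ref<T> ref = current;
        ref.refCount.incrementAndGet();
        return ref;
    }

    void release(Ref<T> ref) {
        ref.refCount.decrementAndGet(); // at 0, a real impl would close the resource
    }

    /** Swap in a fresh instance; in-flight readers keep using the old one. */
    void maybeRefresh() {
        Ref<T> old = current;
        current = new Ref<>(refresher.get());
        release(old); // drop the manager's reference to the old instance
    }

    public static void main(String[] args) {
        SimpleReferenceManager<String> mgr = new SimpleReferenceManager<>("v1", () -> "v2");
        Ref<String> r1 = mgr.acquire();
        mgr.maybeRefresh();  // in-flight reader r1 still sees "v1"
        System.out.println(r1.value + " " + mgr.acquire().value);
        mgr.release(r1);
    }
}
```

The point of the pattern is that refresh never blocks readers: a reader that acquired before the swap keeps a valid, ref-counted instance until it releases.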
[jira] [Resolved] (LUCENE-4157) Improve Spatial Testing
[ https://issues.apache.org/jira/browse/LUCENE-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley resolved LUCENE-4157. -- Resolution: Fixed Fix Version/s: (was: 4.1) 4.0 Marking as fixed. The title is a bit general, and there have indeed been testing improvements that made it into 4.0. If there's something in particular that needs testing, then an issue should be created for it, and there are already such issues. Improve Spatial Testing --- Key: LUCENE-4157 URL: https://issues.apache.org/jira/browse/LUCENE-4157 Project: Lucene - Core Issue Type: Improvement Components: modules/spatial Reporter: David Smiley Assignee: David Smiley Priority: Critical Fix For: 4.0 Attachments: LUCENE-4157_Improve_Lucene_Spatial_testing_p1.patch, LUCENE-4157_Improve_TermQueryPrefixTreeStrategy_and_move_makeQuery_impl_to_SpatialStrategy.patch Looking back at the tests for the Lucene Spatial Module, they seem half-baked. (At least Spatial4j is well tested.) I've started working on some improvements: * Some tests are in an abstract base class that has a subclass which provides a SpatialContext. The idea was that the same tests could test other contexts (such as geo vs. not, or different distance calculators (haversine vs. vincenty)), but this can be done using RandomizedTesting's nifty parameterized test feature, once there is a need to do this. * Port the complex geohash recursive prefix tree test that was developed on the Solr side to the Lucene side where it belongs. And some things are not tested or aren't well tested: * Distance order as the query score * Indexing shapes other than points (i.e. shapes with area / regions)
[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505612#comment-13505612 ] Mark Miller commented on SOLR-4114: --- Well, it makes things a little more painful in that I have to merge it to 4x/5x, but I can do that. It's probably not too difficult. Collection API: Allow multiple shards from one collection on the same Solr server - Key: SOLR-4114 URL: https://issues.apache.org/jira/browse/SOLR-4114 Project: Solr Issue Type: New Feature Components: multicore, SolrCloud Affects Versions: 4.0 Environment: Solr 4.0.0 release Reporter: Per Steffensen Assignee: Per Steffensen Labels: collection-api, multicore, shard, shard-allocation Attachments: SOLR-4114.patch, SOLR-4114.patch We should support running multiple shards from one collection on the same Solr server - e.g. run a collection with 8 shards on a 4-Solr-server cluster (each Solr server running 2 shards). Performance tests on our side have shown that this is a good idea, and it is also a good idea for easy elasticity later on - it is much easier to move an entire existing shard from one Solr server to another one that just joined the cluster than it is to split an existing shard between the Solr server that used to run it and the new one. See the dev mailing list discussion Multiple shards for one collection on the same Solr server
[jira] [Resolved] (SOLR-3942) Cannot use geodist() function with edismax
[ https://issues.apache.org/jira/browse/SOLR-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley resolved SOLR-3942. Resolution: Cannot Reproduce Cannot use geodist() function with edismax -- Key: SOLR-3942 URL: https://issues.apache.org/jira/browse/SOLR-3942 Project: Solr Issue Type: Bug Affects Versions: 4.0 Environment: Windows Server 2008 R2, Windows 7 Reporter: Shane Andrade Assignee: David Smiley Priority: Critical Using the spatial example from the wiki when boosting with edismax: http://localhost:8983/solr/select?defType=edismax&q.alt=*:*&fq={!geofilt}&sfield=store&pt=45.15,-93.85&d=50&boost=recip(geodist(),2,200,20)&sort=score%20desc Produces the following error: {code} <lst name="error"> <str name="msg">org.apache.lucene.queryparser.classic.ParseException: Spatial field must implement MultiValueSource:store{type=geohash,properties=indexed,stored,omitTermFreqAndPositions}</str> <int name="code">400</int> </lst> {code} When the defType is changed to dismax, the query works as expected.
[jira] [Commented] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document
[ https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505619#comment-13505619 ] Robert Muir commented on LUCENE-4574: - I think it's generally cheap. Like today it's already cached in BooleanScorer2 (which Solr always gets for a BooleanQuery), and for a term query it's typically like a multiply and so on. So I think caching in general would be useless and would hurt here. In these silly cases (sorting with relevance but also asking for filling scores etc), it may be cheaper to just call it twice rather than try to do something funkier in the collector: we would have to benchmark this. {quote} So for me this means adding the cache at FunctionQuery$AllScorer. {quote} I think I like this idea better than adding caching in general to these collectors. Is the score() method typically expensive for function queries? Yet another possibility is, instead of asking to track scores when sorting by relevance, to ask to fill sort fields (the default anyway, right?). It's sorta redundant to ask for both. If you do this, I don't think it calls score() twice. Finally, we could also consider something like your patch, except honed in on these particular silly situations. So that's something like: up-front, setting a boolean in these collectors' ctors if one of the comparators is relevance and it's also asked to track scores/max scores. Then in setScorer, we could do what your patch does only if this boolean is set. I feel like we wouldn't have to add 87 more specialized collectors to do this. I just haven't looked at the code to try to figure out what all the situations can be (all those booleans etc passed to IndexSearcher) where score() can currently be called twice.
FunctionQuery ValueSource value computed twice per document --- Key: LUCENE-4574 URL: https://issues.apache.org/jira/browse/LUCENE-4574 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.0, 4.1 Reporter: David Smiley Attachments: LUCENE-4574.patch, Test_for_LUCENE-4574.patch I was working on a custom ValueSource and did some basic profiling and debugging to see if it was being used optimally. To my surprise, the value was being fetched twice per document in a row. This computation isn't exactly cheap to calculate so this is a big problem. I was able to work-around this problem trivially on my end by caching the last value with corresponding docid in my FunctionValues implementation. Here is an excerpt of the code path to the first execution: {noformat} at org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48) at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153) at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291) at org.apache.lucene.search.Scorer.score(Scorer.java:62) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280) {noformat} And here is the 2nd call: {noformat} at org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48) at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153) at org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56) at org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951) at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312) at org.apache.lucene.search.Scorer.score(Scorer.java:62) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588) at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280) {noformat} The 2nd call appears to use some score caching mechanism, which is all well and good, but that same mechanism wasn't used in the first call so there's no cached value to retrieve. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
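The caching the second stack trace hints at (ScoreCachingWrappingScorer) boils down to remembering the last (docID, score) pair so a repeat score() call for the same document is free. A simplified sketch with stand-in interfaces (SimpleScorer and ScoreCachingScorer are invented for illustration, not Lucene's actual Scorer API):

```java
/** Stand-in for the relevant slice of Lucene's Scorer API. */
interface SimpleScorer {
    int docID();
    float score();
}

/** Caches the score for the current document so that collectors which call
 *  score() twice per doc (as in the issue's stack traces) recompute nothing. */
class ScoreCachingScorer implements SimpleScorer {
    private final SimpleScorer in;
    private int cachedDoc = -1;
    private float cachedScore;

    ScoreCachingScorer(SimpleScorer in) { this.in = in; }

    @Override public int docID() { return in.docID(); }

    @Override public float score() {
        int doc = in.docID();
        if (doc != cachedDoc) {      // first call for this doc: compute and remember
            cachedScore = in.score();
            cachedDoc = doc;
        }
        return cachedScore;          // repeat calls for the same doc hit the cache
    }

    public static void main(String[] args) {
        int[] calls = {0};
        SimpleScorer base = new SimpleScorer() {
            public int docID() { return 7; }
            public float score() { calls[0]++; return 3.5f; }
        };
        SimpleScorer s = new ScoreCachingScorer(base);
        s.score();
        s.score();                    // same doc: underlying score() ran only once
        System.out.println(calls[0]);
    }
}
```

Note the issue report's point: this wrapper only helps if *every* caller goes through it; in the traces above, the first score() call bypassed the caching wrapper, so the second call found nothing cached.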
[jira] [Resolved] (SOLR-3601) Reconsider Google Guava dependency
[ https://issues.apache.org/jira/browse/SOLR-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley resolved SOLR-3601. Resolution: Won't Fix Marking as Won't Fix, as Guava successfully made it into 4.0 for better or worse, and so that about settles it. Reconsider Google Guava dependency -- Key: SOLR-3601 URL: https://issues.apache.org/jira/browse/SOLR-3601 Project: Solr Issue Type: Improvement Reporter: David Smiley Assignee: Hoss Man Priority: Minor Google Guava is a cool Java library with lots of useful stuff in it. But note that the old version r05 that we have is 935kb in size and FWIW the latest v12 is 1.8MB. Despite its usefulness, Solr (core) is not actually using it aside for a trivial case in org.apache.solr.logging.jul to get a string from a Throwable. And I'm using it in my uncommitted patch for Solr adapters to the Lucene module. The Clustering contrib module definitely needs it. This dependency to Solr core seems half-hearted and I suspect it may have been inadvertent during improvements to the Clustering contrib module at some point. Shall we get rid of this dependency to Solr core, and push it back to the contrib module? I like Guava, I want to use it in my work, but the reality is that Solr core doesn't even touch 1% of it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3377) eDismax: A fielded query wrapped by parens is not recognized
[ https://issues.apache.org/jira/browse/SOLR-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505623#comment-13505623 ] Jack Krupansky commented on SOLR-3377: -- Yes, it looks like a bug, but distinct from this current Jira. Actually, two bugs: 1. Fielded terms should not be used in phrase boost except for the specified field. 2. Some terms appear to have been skipped for phrase boost. eDismax: A fielded query wrapped by parens is not recognized Key: SOLR-3377 URL: https://issues.apache.org/jira/browse/SOLR-3377 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 3.6 Reporter: Jan Høydahl Assignee: Yonik Seeley Priority: Critical Fix For: 4.0-BETA Attachments: SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch As reported by bernd on the user list, a query like this {{q=(name:test)}} will yield 0 hits in 3.6 while it worked in 3.5. It works without the parens. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (LUCENE-4197) Small improvements to Lucene Spatial Module for v4
[ https://issues.apache.org/jira/browse/LUCENE-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley closed LUCENE-4197. Resolution: Fixed Fix Version/s: (was: 4.1) 4.0 Assignee: David Smiley Closing against 4.0; this issue was for small improvements to 4.0 which already shipped. Small improvements to Lucene Spatial Module for v4 -- Key: LUCENE-4197 URL: https://issues.apache.org/jira/browse/LUCENE-4197 Project: Lucene - Core Issue Type: Improvement Components: modules/spatial Reporter: David Smiley Assignee: David Smiley Fix For: 4.0 Attachments: LUCENE-4197_rename_CachedDistanceValueSource.patch, LUCENE-4197_SpatialArgs_doesn_t_need_overloaded_toString()_with_a_ctx_param_.patch, LUCENE-4413_better_spatial_exception_handling.patch, SpatialArgs-_remove_unused_min_and_max_params.patch This issue is to capture small changes to the Lucene spatial module that don't deserve their own issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document
[ https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505650#comment-13505650 ] David Smiley commented on LUCENE-4574: -- Rob, FunctionQuery$AllScorer.score() is pretty simple and innocent enough so perhaps that is not the right place to add the cache either. Some ValueSources might have a trivial value e.g. a constant, some might be expensive. [~yo...@apache.org], your first comment was: bq. FunctionValues isn't the right place to solve this... that would cause caching/checking at every level of a function. Do you mean it's wrong for a custom ValueSource I wrote to have its FunctionValues, which I know to be expensive because I wrote it, cache its previous value? That's hard to believe so perhaps you don't mean that. Here's a proposal. Add a ValueSource method boolean nonTrivial(), defaulting to true to be safe but overriding in many subclasses to use false as appropriate. Then, FunctionQuery$AllScorer's constructor (called only per-segment) can check and wrap in a to-be-developed FunctionValues caching wrapper for floatVal(). Unlike my previous proposal in the collector, this proposal targets cases that self-declare themselves to have non-trivial implementations and so are worth caching. FunctionQuery ValueSource value computed twice per document --- Key: LUCENE-4574 URL: https://issues.apache.org/jira/browse/LUCENE-4574 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.0, 4.1 Reporter: David Smiley Attachments: LUCENE-4574.patch, Test_for_LUCENE-4574.patch I was working on a custom ValueSource and did some basic profiling and debugging to see if it was being used optimally. To my surprise, the value was being fetched twice per document in a row. This computation isn't exactly cheap to calculate so this is a big problem. 
I was able to work-around this problem trivially on my end by caching the last value with corresponding docid in my FunctionValues implementation. Here is an excerpt of the code path to the first execution: {noformat} at org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48) at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153) at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291) at org.apache.lucene.search.Scorer.score(Scorer.java:62) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280) {noformat} And here is the 2nd call: {noformat} at org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48) at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153) at org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56) at org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951) at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312) at org.apache.lucene.search.Scorer.score(Scorer.java:62) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280) {noformat} The 2nd call appears to use some score caching mechanism, which is all well and good, but that same mechanism wasn't used in the first call so there's no cached value to retrieve. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
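The nonTrivial() proposal above might look roughly like this: value sources self-declare whether they are expensive, and only the expensive ones get a last-document cache. Everything here is hypothetical illustration (FloatValues, cachedIfNeeded, and the default-method placement are invented; Lucene's real FunctionValues API differs):

```java
/** Hypothetical slimmed-down FunctionValues. */
interface FloatValues {
    float floatVal(int doc);

    /** Defaults to true (assume expensive, to be safe), as the proposal suggests;
     *  trivial sources like constants would override this to return false. */
    default boolean nonTrivial() { return true; }

    /** Wrap only the self-declared expensive sources in a per-doc cache. */
    static FloatValues cachedIfNeeded(FloatValues in) {
        if (!in.nonTrivial()) return in;   // cheap source: no wrapper overhead
        return new FloatValues() {
            int lastDoc = -1;
            float lastVal;
            @Override public float floatVal(int doc) {
                if (doc != lastDoc) {      // compute once per document
                    lastVal = in.floatVal(doc);
                    lastDoc = doc;
                }
                return lastVal;
            }
        };
    }
}

class Demo {
    public static void main(String[] args) {
        int[] calls = {0};
        FloatValues cached = FloatValues.cachedIfNeeded(doc -> { calls[0]++; return doc * 2f; });
        cached.floatVal(5);
        cached.floatVal(5);               // second call for doc 5 is served from the cache
        System.out.println(calls[0]);
    }
}
```

This targets exactly the case in the issue: collectors may legitimately call floatVal() twice per document, and the scorer pays the recomputation cost only for sources that declared themselves non-trivial.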
[jira] [Commented] (LUCENE-4569) Allow customization of column stride field and norms via indexing chain
[ https://issues.apache.org/jira/browse/LUCENE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505658#comment-13505658 ] Simon Willnauer commented on LUCENE-4569: - Sorry John, busy times over here... I will look into that later this week though. Seems pretty straightforward to me at first glance, i.e. it doesn't hurt anyone. Allow customization of column stride field and norms via indexing chain --- Key: LUCENE-4569 URL: https://issues.apache.org/jira/browse/LUCENE-4569 Project: Lucene - Core Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: John Wang Attachments: patch.diff We are building an in-memory indexing format and managing our own segments. We are doing this by implementing a custom IndexingChain. We would like to support column-stride fields and norms without having to wire in a codec (since we are managing our postings differently). The suggested change is consistent with the API support for passing in a custom InvertedDocConsumer.
[jira] [Assigned] (LUCENE-4569) Allow customization of column stride field and norms via indexing chain
[ https://issues.apache.org/jira/browse/LUCENE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned LUCENE-4569: --- Assignee: Simon Willnauer Allow customization of column stride field and norms via indexing chain --- Key: LUCENE-4569 URL: https://issues.apache.org/jira/browse/LUCENE-4569 Project: Lucene - Core Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: John Wang Assignee: Simon Willnauer Attachments: patch.diff We are building an in-memory indexing format and managing our own segments. We are doing this by implementing a custom IndexingChain. We would like to support column-stride-fields and norms without having to wire in a codec (since we are managing our postings differently) Suggested change is consistent with the api support for passing in a custom InvertedDocConsumer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-4419) Test RecursivePrefixTree indexing non-point data
[ https://issues.apache.org/jira/browse/LUCENE-4419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley reassigned LUCENE-4419: Assignee: David Smiley Test RecursivePrefixTree indexing non-point data Key: LUCENE-4419 URL: https://issues.apache.org/jira/browse/LUCENE-4419 Project: Lucene - Core Issue Type: Improvement Components: modules/spatial Reporter: David Smiley Assignee: David Smiley RecursivePrefixTreeFilter was modified in ~July 2011 to support spatial filtering of non-point indexed shapes. It seems to work when playing with the capability, but it isn't tested. It really needs to be, as this is a major feature. I imagine an approach in which some randomly generated rectangles are indexed and then a randomly generated rectangle is queried. The right answer can be calculated brute-force and then compared with the filter. In order to deal with shape imprecision, the randomly generated shapes could be generated to fit a coarse grid (e.g. round everything to a 1 degree interval).
Re: Active 4.x branches?
On Nov 27, 2012, at 8:39 PM, Mark Miller markrmil...@gmail.com wrote:
> 40 committers perhaps on paper but precious few are active
I'd like to toss some numbers against that statement after taking a look at Ohloh. According to Ohloh, the stats for Lucene and Solr are:

Lucene
- Committers active within the past month: 13
- Committers active within the past year: 29

Solr
- Committers active within the past month: 9
- Committers active within the past year: 29

For just under 40 committers and the volunteer nature of OpenSource, those numbers look pretty good! Not everyone that is active has the same amount of time to volunteer. - Mark
[jira] [Commented] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document
[ https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505676#comment-13505676 ] Adrien Grand commented on LUCENE-4574: -- bq. Add a ValueSource method boolean nonTrivial() Could we move this logic to an upper level and expect callers of {{FunctionQuery(ValueSource)}} to provide a ValueSource impl that returns FunctionValues impls that cache their values when the computation is expensive? Then Solr could wrap costly value sources when its function values get* methods are likely to be called several times per document? FunctionQuery ValueSource value computed twice per document --- Key: LUCENE-4574 URL: https://issues.apache.org/jira/browse/LUCENE-4574 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.0, 4.1 Reporter: David Smiley Attachments: LUCENE-4574.patch, Test_for_LUCENE-4574.patch I was working on a custom ValueSource and did some basic profiling and debugging to see if it was being used optimally. To my surprise, the value was being fetched twice per document in a row. This computation isn't exactly cheap to calculate so this is a big problem. I was able to work-around this problem trivially on my end by caching the last value with corresponding docid in my FunctionValues implementation.
Here is an excerpt of the code path to the first execution: {noformat} at org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48) at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153) at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291) at org.apache.lucene.search.Scorer.score(Scorer.java:62) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280) {noformat} And here is the 2nd call: {noformat} at org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48) at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153) at org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56) at org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951) at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312) at org.apache.lucene.search.Scorer.score(Scorer.java:62) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280) {noformat} The 2nd call appears to use some score caching mechanism, which is all well and good, but that same mechanism wasn't used in the first call so there's no cached value to retrieve. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-4117) IO error while trying to get the size of the Directory
[ https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved SOLR-4117. --- Resolution: Fixed

IO error while trying to get the size of the Directory
--
Key: SOLR-4117 URL: https://issues.apache.org/jira/browse/SOLR-4117 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 5.0 Environment: 5.0.0.2012.11.28.10.42.06 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2. Reporter: Markus Jelsma Assignee: Mark Miller Priority: Minor Fix For: 5.0

With SOLR-4032 fixed, we see other issues when randomly taking down nodes (nicely, via Tomcat restart) while indexing a few million web pages from Hadoop. We do make sure that at least one node is up for a shard, but due to recovery issues it may not be live. One node seems to work but generates IO errors in the log and a ZooKeeperException in the GUI. In the GUI we only see:

{code}
SolrCore Initialization Failures
openindex_f: org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException: Please check your logs for more information
{code}

and in the log we only see the following exception:

{code}
2012-11-28 11:47:26,652 ERROR [solr.handler.ReplicationHandler] - [http-8080-exec-28] - : IO error while trying to get the size of the Directory:org.apache.lucene.store.NoSuchDirectoryException: directory '/opt/solr/cores/shard_f/data/index' does not exist
	at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:217)
	at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240)
	at org.apache.lucene.store.NRTCachingDirectory.listAll(NRTCachingDirectory.java:132)
	at org.apache.solr.core.DirectoryFactory.sizeOfDirectory(DirectoryFactory.java:146)
	at org.apache.solr.handler.ReplicationHandler.getIndexSize(ReplicationHandler.java:472)
	at org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:568)
	at org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:213)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
	at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:476)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
	at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
	at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
{code}
-- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
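The NoSuchDirectoryException above comes from computing the index size of a directory that has already been removed during recovery. A stand-alone Java sketch (plain java.io, not the actual Lucene Directory or Solr DirectoryFactory API) of the defensive behaviour such a fix amounts to, returning 0 for a missing directory instead of throwing:

```java
import java.io.File;

public class DirSize {
    // Recursively sum file sizes; return 0 if the directory is missing,
    // mirroring the check a size computation needs before listing a
    // directory that may have been removed underneath it.
    public static long sizeOf(File dir) {
        if (!dir.exists() || !dir.isDirectory()) {
            return 0L; // directory vanished (e.g. during recovery): report empty
        }
        long total = 0L;
        File[] entries = dir.listFiles();
        if (entries == null) {
            return 0L; // raced with deletion between exists() and listFiles()
        }
        for (File f : entries) {
            total += f.isDirectory() ? sizeOf(f) : f.length();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sizeOf(new File("/nonexistent/path"))); // prints 0
    }
}
```

The second null check matters because the directory can disappear between exists() and listFiles(); that race is exactly what a node restarting mid-replication can hit.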
[jira] [Updated] (SOLR-4016) Deduplication is broken by partial update
[ https://issues.apache.org/jira/browse/SOLR-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-4016: -- Description: The SignatureUpdateProcessorFactory used (primarily?) for deduplication does not consider partial update semantics. The below uses the following solrconfig.xml excerpt:

{noformat}
<updateRequestProcessorChain name="text_hash">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">text_hash</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">text</str>
    <str name="signatureClass">solr.processor.TextProfileSignature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
{noformat}

Firstly, the processor treats {noformat}{"set": value}{noformat} as a string and hashes it, instead of the value alone:

{noformat}
$ curl '$URL/update?commit=true' -H 'Content-type:application/json' -d '{"add":{"doc":{"id": "abcde", "text": {"set": "hello world"}}}}'
{"responseHeader":{"status":0,"QTime":30}}
$ curl '$URL/select?q=id:abcde'
<?xml version="1.0" encoding="UTF-8"?>
<response><lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int><lst name="params"><str name="q">id:abcde</str></lst></lst><result name="response" numFound="1" start="0"><doc><str name="id">abcde</str><str name="text">hello world</str><str name="text_hash">ad48c7ad60ac22cc</str><long name="_version_">1417247434224959488</long></doc></result></response>
$ curl '$URL/update?commit=true' -H 'Content-type:application/json' -d '{"add":{"doc":{"id": "abcde", "text": "hello world"}}}'
{"responseHeader":{"status":0,"QTime":27}}
$ curl '$URL/select?q=id:abcde'
<?xml version="1.0" encoding="UTF-8"?>
<response><lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int><lst name="params"><str name="q">id:abcde</str></lst></lst><result name="response" numFound="1" start="0"><doc><str name="id">abcde</str><str name="text">hello world</str><str name="text_hash">b169c743d220da8d</str><long name="_version_">141724802221564</long></doc></result></response>
{noformat}

Note the different text_hash value. Secondly, when updating a field other than those used to create the signature (which I imagine is a more common use-case), the signature is recalculated from no values:

{noformat}
$ curl '$URL/update?commit=true' -H 'Content-type:application/json' -d '{"add":{"doc":{"id": "abcde", "title": {"set": "new title"}}}}'
{"responseHeader":{"status":0,"QTime":39}}
$ curl '$URL/select?q=id:abcde'
<?xml version="1.0" encoding="UTF-8"?>
<response><lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int><lst name="params"><str name="q">id:abcde</str></lst></lst><result name="response" numFound="1" start="0"><doc><str name="id">abcde</str><str name="text">hello world</str><str name="text_hash"></str><str name="title">new title</str><long name="_version_">1417248120480202752</long></doc></result></response>
{noformat}
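Both reported behaviours come down to which string the signature is computed over. A toy reproduction in plain Java (hex MD5 stands in for Solr's TextProfileSignature; `unwrapSet` is a hypothetical pre-hash normalisation for the first bug, not part of the attached patch):

```java
import java.security.MessageDigest;

public class DedupSketch {
    // Toy signature: hex MD5 of the field text (stand-in for TextProfileSignature).
    public static String signature(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(text.getBytes("UTF-8"))) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Hypothetical fix for the first bug: strip an atomic-update
    // {"set": ...} wrapper before hashing, so {"set": "hello world"}
    // and "hello world" produce the same signature.
    public static String unwrapSet(String fieldValue) {
        String trimmed = fieldValue.trim();
        if (trimmed.startsWith("{") && trimmed.contains("\"set\"")) {
            int colon = trimmed.indexOf(':');
            int end = trimmed.lastIndexOf('}');
            String inner = trimmed.substring(colon + 1, end).trim();
            if (inner.startsWith("\"") && inner.endsWith("\"")) {
                inner = inner.substring(1, inner.length() - 1);
            }
            return inner;
        }
        return fieldValue;
    }

    public static void main(String[] args) {
        String plain = "hello world";
        String wrapped = "{\"set\": \"hello world\"}";
        // Bug 1: hashing the wrapper yields a different signature.
        System.out.println(signature(plain).equals(signature(wrapped)));           // false
        // After unwrapping, the signatures agree.
        System.out.println(signature(plain).equals(signature(unwrapSet(wrapped)))); // true
    }
}
```

The second bug (empty text_hash after a partial update of an unrelated field) needs a different kind of fix: the processor would have to fetch the stored signature-field values for the existing document rather than hashing the absent ones.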
[jira] [Updated] (SOLR-4016) Deduplication is broken by partial update
[ https://issues.apache.org/jira/browse/SOLR-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-4016: -- Labels: 4.0.1_Candidate (was: )

Deduplication is broken by partial update
-
Key: SOLR-4016 URL: https://issues.apache.org/jira/browse/SOLR-4016 Project: Solr Issue Type: Bug Components: update Affects Versions: 4.0 Environment: Tomcat6 / Catalina on Ubuntu 12.04 LTS Reporter: Joel Nothman Labels: 4.0.1_Candidate Fix For: 4.1, 5.0

The SignatureUpdateProcessorFactory used (primarily?) for deduplication does not consider partial update semantics.
[jira] [Commented] (SOLR-4110) Configurable Content-Type headers for PHPResponseWriters and PHPSerializedResponseWriter
[ https://issues.apache.org/jira/browse/SOLR-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505695#comment-13505695 ] Mark Miller commented on SOLR-4110: --- What about back compat? How might this affect those upgrading from 4.0? Configurable Content-Type headers for PHPResponseWriters and PHPSerializedResponseWriter Key: SOLR-4110 URL: https://issues.apache.org/jira/browse/SOLR-4110 Project: Solr Issue Type: Improvement Components: Response Writers Affects Versions: 4.0 Reporter: Dominik Siebel Priority: Minor Labels: 4.0.1_Candidate Fix For: 4.1, 5.0 Attachments: SOLR-4110.patch The *PHPResponseWriter* and *PHPSerializedResponseWriter* currently send a hard coded Content-Type header of _text/plain; charset=UTF-8_ although there are constants defining _text/x-php;charset=UTF-8_ and _text/x-php-serialized;charset=UTF-8_ which remain unused. This makes content type guessing on the client side quite complicated. I already created a patch (from the branch_4x github branch) to use the respective constants and also added the possibility to configure the Content-Type header via solrconfig.xml (like in JSONResponseWriter).
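For reference, the configuration pattern the patch borrows from JSONResponseWriter looks roughly like this as a stand-alone sketch (a plain Map stands in for Solr's NamedList init args; the "content-type" init key mirrors the JSON writer's convention but is an assumption here):

```java
import java.util.HashMap;
import java.util.Map;

public class PhpWriterSketch {
    // Default mirrors the unused constant mentioned in the report.
    public static final String CONTENT_TYPE_PHP_UTF8 = "text/x-php;charset=UTF-8";

    private String contentType = CONTENT_TYPE_PHP_UTF8;

    // Mimics QueryResponseWriter.init(NamedList): an optional override
    // from solrconfig.xml wins over the compiled-in constant.
    public void init(Map<String, String> namedList) {
        String override = namedList.get("content-type");
        if (override != null) {
            contentType = override;
        }
    }

    public String getContentType() {
        return contentType;
    }

    public static void main(String[] args) {
        PhpWriterSketch w = new PhpWriterSketch();
        w.init(new HashMap<String, String>()); // no override configured
        System.out.println(w.getContentType()); // prints text/x-php;charset=UTF-8
    }
}
```

This keeps old clients working when an operator pins the header back to text/plain in solrconfig.xml, which is one answer to the back-compat question above.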
[jira] [Assigned] (SOLR-4110) Configurable Content-Type headers for PHPResponseWriters and PHPSerializedResponseWriter
[ https://issues.apache.org/jira/browse/SOLR-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reassigned SOLR-4110: - Assignee: Mark Miller

Configurable Content-Type headers for PHPResponseWriters and PHPSerializedResponseWriter
Key: SOLR-4110 URL: https://issues.apache.org/jira/browse/SOLR-4110 Project: Solr Issue Type: Improvement Components: Response Writers Affects Versions: 4.0 Reporter: Dominik Siebel Assignee: Mark Miller Priority: Minor Labels: 4.0.1_Candidate Fix For: 4.1, 5.0 Attachments: SOLR-4110.patch
[jira] [Updated] (SOLR-4087) MoreLikeThis missing MAX_DOC_FREQ option
[ https://issues.apache.org/jira/browse/SOLR-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-4087: -- Priority: Minor (was: Major) Fix Version/s: (was: 4.0.1) 5.0 MoreLikeThis missing MAX_DOC_FREQ option Key: SOLR-4087 URL: https://issues.apache.org/jira/browse/SOLR-4087 Project: Solr Issue Type: Improvement Affects Versions: 4.0 Reporter: Andrew Janowczyk Priority: Minor Fix For: 4.1, 5.0 Attachments: MorelikeThis-maxdocfreq.patch The MoreLikeThisHandler supports almost all of the underlying MoreLikeThis options except for MAX_DOC_FREQ, which seems important in preventing terms that are present in too many documents from being selected for the query.
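MAX_DOC_FREQ is a cap on a term's document frequency during query-term selection: very common terms are poor discriminators, so they are excluded from the "more like this" query. A minimal self-contained sketch of that filtering step (a toy term table, not the actual MoreLikeThis code):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MaxDocFreqSketch {
    // Keep only terms whose document frequency does not exceed maxDocFreq,
    // dropping terms that occur in too many documents to be discriminative.
    public static List<String> selectTerms(Map<String, Integer> docFreqs, int maxDocFreq) {
        List<String> selected = new ArrayList<String>();
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            if (e.getValue() <= maxDocFreq) {
                selected.add(e.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Integer> df = new LinkedHashMap<String, Integer>();
        df.put("the", 100000);       // near-stopword: present in almost every doc
        df.put("lucene", 120);
        df.put("deduplication", 7);
        System.out.println(selectTerms(df, 500)); // prints [lucene, deduplication]
    }
}
```

In the real handler this check happens per interesting term alongside the existing MIN_DOC_FREQ and MIN_TERM_FREQ thresholds.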
[jira] [Commented] (LUCENE-4569) Allow customization of column stride field and norms via indexing chain
[ https://issues.apache.org/jira/browse/LUCENE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505718#comment-13505718 ] John Wang commented on LUCENE-4569: --- Thanks Simon! No rush, I just wanted some feedback. Do you think we should do the same for stored fields? Chris: we are building a custom IndexingChain, which is at a higher level than Codecs. You are definitely right, and currently I am able to get what I need via a codec: our custom indexing chain handles indexed documents, and then we register a codec to intercept the code path for norms and CSF, but this ends up with two customization hooks for the same indexer. Thanks! -John

Allow customization of column stride field and norms via indexing chain
---
Key: LUCENE-4569 URL: https://issues.apache.org/jira/browse/LUCENE-4569 Project: Lucene - Core Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: John Wang Assignee: Simon Willnauer Attachments: patch.diff

We are building an in-memory indexing format and managing our own segments. We are doing this by implementing a custom IndexingChain. We would like to support column-stride fields and norms without having to wire in a codec (since we are managing our postings differently). The suggested change is consistent with the API support for passing in a custom InvertedDocConsumer.
[jira] [Commented] (SOLR-4087) MoreLikeThis missing MAX_DOC_FREQ option
[ https://issues.apache.org/jira/browse/SOLR-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505733#comment-13505733 ] Commit Tag Bot commented on SOLR-4087: -- [branch_4x commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revisionrevision=1414846 SOLR-4087: Add MAX_DOC_FREQ option to MoreLikeThis.
[jira] [Commented] (SOLR-4087) MoreLikeThis missing MAX_DOC_FREQ option
[ https://issues.apache.org/jira/browse/SOLR-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505734#comment-13505734 ] Commit Tag Bot commented on SOLR-4087: -- [trunk commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revisionrevision=1414841 SOLR-4087: Add MAX_DOC_FREQ option to MoreLikeThis.
[jira] [Resolved] (SOLR-4087) MoreLikeThis missing MAX_DOC_FREQ option
[ https://issues.apache.org/jira/browse/SOLR-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved SOLR-4087. --- Resolution: Fixed Thanks Andrew!
[jira] [Updated] (SOLR-2908) To push the terms.limit parameter from the master core to all the shard cores.
[ https://issues.apache.org/jira/browse/SOLR-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-2908: -- Fix Version/s: (was: 1.4.1) 5.0 4.1

To push the terms.limit parameter from the master core to all the shard cores.
--
Key: SOLR-2908 URL: https://issues.apache.org/jira/browse/SOLR-2908 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 1.4.1 Environment: Linux server. 64 bit processor and 16GB RAM. Reporter: sivaganesh Priority: Critical Labels: patch Fix For: 4.1, 5.0 Original Estimate: 168h Remaining Estimate: 168h

When we pass the terms.limit parameter to the master (which has many shard cores), it's not getting pushed down to the individual cores. Instead, the default value of -1 is assigned to the terms.limit parameter in the underlying shard cores. The issue is that the time taken by the master core to return the required limit of terms grows with the number of underlying shard cores. This affects the performance of the auto-suggest feature. We think we could have a parameter to explicitly override the -1 being set for terms.limit in the shard cores. We looked at the source code (TermsComponent.java) and confirmed this behaviour. Please help us in pushing the terms.limit parameter to the shard cores. PFB code snippet.

{code}
private ShardRequest createShardQuery(SolrParams params) {
  ShardRequest sreq = new ShardRequest();
  sreq.purpose = ShardRequest.PURPOSE_GET_TERMS;

  // base shard request on original parameters
  sreq.params = new ModifiableSolrParams(params);

  // remove any limits for shards, we want them to return all possible
  // responses; we want this so we can calculate the correct counts
  // don't sort by count to avoid that unnecessary overhead on the shards
  sreq.params.remove(TermsParams.TERMS_MAXCOUNT);
  sreq.params.remove(TermsParams.TERMS_MINCOUNT);
  sreq.params.set(TermsParams.TERMS_LIMIT, -1);
  sreq.params.set(TermsParams.TERMS_SORT, TermsParams.TERMS_SORT_INDEX);
  return sreq;
}
{code}

Solr Version: Solr Specification Version: 1.4.0.2010.01.13.08.09.44 Solr Implementation Version: 1.5-dev exported - yonik - 2010-01-13 08:09:44 Lucene Specification Version: 2.9.1-dev Lucene Implementation Version: 2.9.1-dev 888785 - 2009-12-09 18:03:31
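The requested change would make the hard-coded set(TERMS_LIMIT, -1) conditional. A stand-alone sketch of that idea, using a plain Map in place of ModifiableSolrParams and a hypothetical terms.limit.shards flag (not an existing Solr parameter) guarding the pushdown:

```java
import java.util.HashMap;
import java.util.Map;

public class ShardTermsParams {
    // Build shard-request params from the original request. By default the
    // client's limit is replaced with -1 so the master can merge counts
    // correctly; with the hypothetical terms.limit.shards=true flag the
    // client's limit is pushed down to each shard instead.
    public static Map<String, String> shardParams(Map<String, String> original) {
        Map<String, String> sreq = new HashMap<String, String>(original);
        sreq.remove("terms.maxcount");
        sreq.remove("terms.mincount");
        boolean pushLimit = "true".equals(original.get("terms.limit.shards"));
        if (!pushLimit) {
            sreq.put("terms.limit", "-1"); // existing behaviour: shards return everything
        }
        sreq.put("terms.sort", "index");
        return sreq;
    }

    public static void main(String[] args) {
        Map<String, String> req = new HashMap<String, String>();
        req.put("terms.limit", "10");
        req.put("terms.limit.shards", "true");
        System.out.println(shardParams(req).get("terms.limit")); // prints 10
    }
}
```

The trade-off the default encodes is real: pushing the limit down is faster, but the merged top-N counts can be wrong when a term's occurrences are spread unevenly across shards, so an opt-in flag is the safer shape for this change.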
[jira] [Updated] (SOLR-4112) Dataimporting with SolrCloud Fails
[ https://issues.apache.org/jira/browse/SOLR-4112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-4112: -- Fix Version/s: 5.0 4.1

Dataimporting with SolrCloud Fails
--
Key: SOLR-4112 URL: https://issues.apache.org/jira/browse/SOLR-4112 Project: Solr Issue Type: Bug Affects Versions: 5.0 Reporter: Deniz Durmus Fix For: 4.1, 5.0 Attachments: SOLR-4112.patch

While trying to import data from a db on cloud, it shows this in the logs:

{code}
SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to PropertyWriter implementation:ZKPropertiesWriter
	at org.apache.solr.handler.dataimport.DataImporter.createPropertyWriter(DataImporter.java:336)
	at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:418)
	at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
	at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)
Caused by: org.apache.solr.common.cloud.ZooKeeperException: ZkSolrResourceLoader does not support getConfigDir() - likely, what you are trying to do is not supported in ZooKeeper mode
	at org.apache.solr.cloud.ZkSolrResourceLoader.getConfigDir(ZkSolrResourceLoader.java:100)
	at org.apache.solr.handler.dataimport.SimplePropertiesWriter.init(SimplePropertiesWriter.java:91)
	at org.apache.solr.handler.dataimport.ZKPropertiesWriter.init(ZKPropertiesWriter.java:45)
	at org.apache.solr.handler.dataimport.DataImporter.createPropertyWriter(DataImporter.java:334)
	... 3 more
Exception in thread Thread-306 java.lang.NullPointerException
	at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:427)
	at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
	at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)
{code}
-- This message is automatically generated by JIRA.
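The root cause visible in the trace is that ZKPropertiesWriter.init calls into SimplePropertiesWriter.init, which asks the resource loader for a filesystem config dir that a ZooKeeper-backed loader cannot supply. A toy sketch of the kind of mode check that avoids the failing call (hypothetical stand-in types, not the real DIH classes):

```java
public class PropertyWriterChooser {
    // Stand-ins for the two loader kinds involved in the trace.
    interface ResourceLoader { String getConfigDir(); }

    static class FileLoader implements ResourceLoader {
        public String getConfigDir() { return "/opt/solr/conf"; }
    }

    static class ZkLoader implements ResourceLoader {
        public String getConfigDir() {
            throw new UnsupportedOperationException(
                "ZkSolrResourceLoader does not support getConfigDir()");
        }
    }

    // Pick the writer by loader type up front, instead of letting the
    // cloud-mode writer inherit init logic that calls getConfigDir().
    public static String choosePropertyWriter(ResourceLoader loader) {
        if (loader instanceof ZkLoader) {
            return "ZKPropertiesWriter";     // keep dataimport.properties in ZooKeeper
        }
        return "SimplePropertiesWriter";     // keep it next to the config dir on disk
    }

    public static void main(String[] args) {
        System.out.println(choosePropertyWriter(new ZkLoader()));   // prints ZKPropertiesWriter
        System.out.println(choosePropertyWriter(new FileLoader())); // prints SimplePropertiesWriter
    }
}
```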