[jira] [Commented] (LUCENE-5375) ToChildBlockJoinQuery becomes crazy on wrong subquery
[ https://issues.apache.org/jira/browse/LUCENE-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865200#comment-13865200 ]

Dr Oleg Savrasov commented on LUCENE-5375:
------------------------------------------

After some investigation I came to the conclusion that the existing assertions in ToChildBlockJoinScorer are not sufficient to guarantee the orthogonality of child and parent documents. To prove that, I've created a test which expects the appropriate AssertionError; please see the attached SOLR-5553-insufficient_assertions.patch. The test fails, for example, with -Dtests.seed=E8A0C61499EE8851:C5E7CB6721742C4F. I have not found any way to validate this correctly other than checking parentBits. It costs retrieving the appropriate bit from the FixedBitSet, but that seems not too expensive. The test has been reworked to be Lucene-level and to cover the cases where the existing assertions are not sufficient. Please see the attached SOLR-5553-1.patch.

ToChildBlockJoinQuery becomes crazy on wrong subquery
-----------------------------------------------------
Key: LUCENE-5375
URL: https://issues.apache.org/jira/browse/LUCENE-5375
Project: Lucene - Core
Issue Type: Bug
Components: modules/join
Affects Versions: 4.6
Reporter: Dr Oleg Savrasov
Labels: patch
Attachments: SOLR-5553.patch
Original Estimate: 24h
Remaining Estimate: 24h

If a user supplies a wrong subquery to ToParentBlockJoinQuery, it reasonably throws an IllegalStateException (http://lucene.apache.org/core/4_0_0/join/org/apache/lucene/search/join/ToParentBlockJoinQuery.html: 'The child documents must be orthogonal to the parent documents: the wrapped child query must never return a parent document.'). However, ToChildBlockJoinQuery just goes crazy silently. I want to provide a simple patch for ToChildBlockJoinQuery with an if-throw clause and a test.
See http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201311.mbox/%3cf415ce3a-ebe5-4d15-adf1-c5ead32a1...@sheffield.ac.uk%3E -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
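The parentBits check described in the comment can be sketched in plain Java. This is a simplified, hypothetical stand-in (java.util.BitSet plays the role of Lucene's FixedBitSet of parent documents; the real ToChildBlockJoinScorer applies this to doc IDs from the wrapped scorer):

```java
import java.util.BitSet;

// Sketch: validate that a doc returned by the child query is never
// itself a parent document. BitSet stands in for Lucene's FixedBitSet.
public class ChildOrthogonalityCheck {

    // Throws IllegalStateException if childDoc is flagged as a parent,
    // mirroring the if-throw clause proposed in the issue.
    static int validateChildDoc(int childDoc, BitSet parentBits) {
        if (parentBits.get(childDoc)) {
            throw new IllegalStateException(
                "Child query must not match parent docs, but got doc=" + childDoc);
        }
        return childDoc;
    }

    public static void main(String[] args) {
        BitSet parentBits = new BitSet();
        parentBits.set(3); // doc 3 is a parent (last doc of its block)

        System.out.println(validateChildDoc(1, parentBits)); // fine: 1 is a child

        try {
            validateChildDoc(3, parentBits); // a "wrong" subquery matched a parent
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The bit lookup is O(1) per candidate doc, which is why the comment argues the extra validation "seems not too expensive".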
[jira] [Updated] (LUCENE-5375) ToChildBlockJoinQuery becomes crazy on wrong subquery
[ https://issues.apache.org/jira/browse/LUCENE-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dr Oleg Savrasov updated LUCENE-5375:
-------------------------------------
Attachment: SOLR-5553-insufficient_assertions.patch

Test that shows the insufficient assertions in ToChildBlockJoinScorer. It expects an AssertionError and fails randomly, for example with -Dtests.seed=E8A0C61499EE8851:C5E7CB6721742C4F
[jira] [Updated] (LUCENE-5375) ToChildBlockJoinQuery becomes crazy on wrong subquery
[ https://issues.apache.org/jira/browse/LUCENE-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dr Oleg Savrasov updated LUCENE-5375:
-------------------------------------
Attachment: SOLR-5553-1.patch

Reworked patch
[jira] [Updated] (SOLR-5594) Enable using extended field types with prefix queries for non-default encoded strings
[ https://issues.apache.org/jira/browse/SOLR-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anshum Gupta updated SOLR-5594:
-------------------------------
Attachment: SOLR-5594.patch

Here's another patch.

Enable using extended field types with prefix queries for non-default encoded strings
-------------------------------------------------------------------------------------
Key: SOLR-5594
URL: https://issues.apache.org/jira/browse/SOLR-5594
Project: Solr
Issue Type: Improvement
Components: query parsers, Schema and Analysis
Affects Versions: 4.6
Reporter: Anshum Gupta
Assignee: Anshum Gupta
Priority: Minor
Attachments: SOLR-5594-branch_4x.patch, SOLR-5594.patch, SOLR-5594.patch

Enable users to use prefix queries with custom field types that have non-default encoding/decoding more easily, e.g. having a custom field work with base64-encoded query strings. Currently, the workaround is to override at the getRewriteMethod level. Perhaps having the prefix query also use the calling FieldType's readableToIndexed method would work better.
[jira] [Commented] (SOLR-5594) Enable using extended field types with prefix queries for non-default encoded strings
[ https://issues.apache.org/jira/browse/SOLR-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865215#comment-13865215 ]

Anshum Gupta commented on SOLR-5594:
------------------------------------

Thanks for the feedback, Hoss! I've tried to be less invasive this time around and have avoided using readableToIndexed in the base class. I've also fixed the javadocs. About using the BinaryTokenStream (extending BinaryField): I may be missing something as far as the reason is concerned, but it's tough to work with BinaryFields right now. They can't really be indexed, and the only way to index a binary field is through the hack I put in. I've removed all of that and made this more generic. I'll also open a separate JIRA to make BinaryFields better and more usable. Here's what still remains before this can be checked in:
* Fix/change dependent parser classes, e.g. PrefixQParserPlugin and SimpleQParserPlugin.
* A test that shows that things haven't changed for the existing field types as far as prefix queries are concerned. I need a couple of hours for that.
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865217#comment-13865217 ]

Michael McCandless commented on LUCENE-5388:
--------------------------------------------

+1, it's weird that the ctor takes a Reader and we also have a setReader. This is a relic from the pre-reuse days...

Eliminate construction over readers for Tokenizer
-------------------------------------------------
Key: LUCENE-5388
URL: https://issues.apache.org/jira/browse/LUCENE-5388
Project: Lucene - Core
Issue Type: Improvement
Components: core/other
Reporter: Benson Margulies

In the modern world, Tokenizers are intended to be reusable, with input supplied via #setReader. The constructors that take a Reader are a vestige. Worse yet, they invite people to make mistakes in handling the reader that tangle them up with the state machine in Tokenizer. The sensible thing is to eliminate these ctors and force setReader usage.
[jira] [Commented] (LUCENE-5387) Improve FacetConfig.build
[ https://issues.apache.org/jira/browse/LUCENE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865219#comment-13865219 ]

Michael McCandless commented on LUCENE-5387:
--------------------------------------------

+1 to fix FC.build to return Document, and improve the javadocs.

Improve FacetConfig.build
-------------------------
Key: LUCENE-5387
URL: https://issues.apache.org/jira/browse/LUCENE-5387
Project: Lucene - Core
Issue Type: Improvement
Components: modules/facet
Reporter: Shai Erera
Assignee: Shai Erera

FacetConfig.build() takes an IndexDocument and returns a new instance of IndexDocument. This forces you to write code like this:
{code}
Document doc = new Document();
doc.add(new StringField("id", someID, Store.NO));
doc.add(new FacetField("author", "john"));
IndexDocument facetDoc = facetConfig.build(doc);
indexWriter.addDocument(facetDoc);
{code}
Technically, you don't need to declare 'facetDoc', you could just {{indexWriter.addDocument(facetConfig.build(doc))}}, but it's weird:
* After you call facetConfig.build(), you cannot add any more fields to the document (since you get an IndexDoc), so you must call it last.
* Nothing suggests you *should* call facetConfig.build() at all - I can already see users trapped by the new API, thinking that adding a FacetField is enough. We should at least document on FacetField that you should call FacetConfig.build().
* Nothing suggests that you shouldn't ignore the returned IndexDoc from FC.build() - we should at least document that.
I think that if FacetConfig.build() took an IndexDocument but returned a Document, that would at least allow you to call it at whatever stage of the pipeline you want (after adding all FacetFields though)... I'll post a patch later.
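The proposed API shape (build() taking a document and returning the concrete Document type so callers can keep working with the result) can be sketched with stand-in classes. These are minimal hypothetical stubs for illustration, not the real Lucene facet API:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in stubs for the proposed shape: build() takes a Document and
// returns a Document, so the result stays usable after the call.
class Document {
    final List<String> fields = new ArrayList<>();
    Document add(String field) { fields.add(field); return this; }
}

class FacetConfig {
    // Hypothetical: rewrites facet fields into their "indexed" form,
    // copying ordinary fields through unchanged.
    Document build(Document doc) {
        Document out = new Document();
        for (String f : doc.fields) {
            out.add(f.startsWith("facet:") ? "indexed-" + f : f);
        }
        return out;
    }
}

public class FacetBuildSketch {
    public static void main(String[] args) {
        Document doc = new Document().add("id:1").add("facet:author/john");
        Document facetDoc = new FacetConfig().build(doc);
        System.out.println(facetDoc.fields); // [id:1, indexed-facet:author/john]
    }
}
```

Because build() returns a Document rather than a read-only IndexDocument, the call can be dropped inline (`indexWriter.addDocument(facetConfig.build(doc))` in the real API) without forcing it to be the very last step.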
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865222#comment-13865222 ]

Uwe Schindler commented on LUCENE-5388:
---------------------------------------

+1, but delay this until 5.0. Because there are many Tokenizers out there, we should keep backwards compatibility.
[jira] [Created] (SOLR-5619) Improve BinaryField to make it Sortable and Indexable
Anshum Gupta created SOLR-5619:
-------------------------------
Summary: Improve BinaryField to make it Sortable and Indexable
Key: SOLR-5619
URL: https://issues.apache.org/jira/browse/SOLR-5619
Project: Solr
Issue Type: Improvement
Affects Versions: 4.6
Reporter: Anshum Gupta
Assignee: Anshum Gupta

Currently, BinaryField can neither be indexed nor sorted on. Making it indexable and sortable should come in handy for a reasonable number of use cases, e.g. wanting to index binary data (which could come from anything non-text).
[jira] [Commented] (SOLR-5594) Enable using extended field types with prefix queries for non-default encoded strings
[ https://issues.apache.org/jira/browse/SOLR-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865255#comment-13865255 ]

Anshum Gupta commented on SOLR-5594:
------------------------------------

https://issues.apache.org/jira/browse/SOLR-5619
Re: Iterating BinaryDocValues
FWIW, a micro benchmark shows a 4% gain from reusing the incoming BytesRef.bytes for short binary doc values in Test2BBinaryDocValues.testVariableBinary() with mmap directory. I wonder why it doesn't read into the incoming bytes: https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401

On Wed, Jan 8, 2014 at 12:53 AM, Michael McCandless luc...@mikemccandless.com wrote:

Going sequentially should help, if the pages are not hot (in the OS's IO cache). You can also use a different DVFormat, e.g. Direct, but this holds all bytes in RAM.

Mike McCandless
http://blog.mikemccandless.com

On Tue, Jan 7, 2014 at 1:09 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:

Joel, I tried to hack it straightforwardly, but found no free gain there. The only attempt I can suggest is to try to reuse the bytes in https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401 - right now it allocates bytes every time, which, besides GC pressure, can also hurt memory access locality. Could you try fixing the memory waste and repeating the performance test? Have a good hack!

On Mon, Dec 23, 2013 at 9:51 PM, Joel Bernstein joels...@gmail.com wrote:

Hi, I'm looking for a faster way to perform large-scale docId -> BytesRef lookups for BinaryDocValues. I'm finding that I can't get the performance that I need from the random-access seek in the BinaryDocValues interface. I'm wondering if sequentially scanning the doc values would be a faster approach. I have a BitSet of matching docs, so if I sequentially moved through the doc values I could test each one against that bitset. Wondering if that approach would be faster for bulk extracts, and how tricky it would be to add an iterator to the BinaryDocValues interface?
Thanks, Joel

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
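The per-call allocation Mikhail points at, versus reusing the caller's buffer, can be illustrated with a self-contained sketch (no Lucene dependency; the stored value and method names are hypothetical stand-ins for the producer's lookup path):

```java
import java.util.Arrays;

// Sketch of the difference between allocating a fresh byte[] on every
// lookup and reusing the caller's buffer - the change the ~4% micro
// benchmark above measured. STORED stands in for one doc's binary value.
public class ReuseSketch {
    static final byte[] STORED = {1, 2, 3, 4};

    // Allocates on every call: extra GC pressure, worse memory locality.
    static byte[] readAllocating() {
        return Arrays.copyOf(STORED, STORED.length);
    }

    // Reuses the caller's buffer when it is large enough, growing it
    // only when needed (the BytesRef.bytes reuse idiom).
    static byte[] readReusing(byte[] buffer) {
        if (buffer.length < STORED.length) {
            buffer = new byte[STORED.length];
        }
        System.arraycopy(STORED, 0, buffer, 0, STORED.length);
        return buffer;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[8];
        // Same buffer instance comes back: no per-call allocation.
        System.out.println(readReusing(buf) == buf);              // true
        // A fresh array is allocated on every call.
        System.out.println(readAllocating() == readAllocating()); // false
    }
}
```

In a tight loop over millions of docs (the bulk-extract case Joel describes), avoiding one allocation per lookup is exactly where such small gains come from.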
[jira] [Commented] (LUCENE-5361) FVH throws away some boosts
[ https://issues.apache.org/jira/browse/LUCENE-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865270#comment-13865270 ]

ASF subversion and git services commented on LUCENE-5361:
---------------------------------------------------------

Commit 1556483 from [~jpountz] in branch 'dev/trunk' [ https://svn.apache.org/r1556483 ]
LUCENE-5361: Fixed handling of query boosts in FastVectorHighlighter.

FVH throws away some boosts
---------------------------
Key: LUCENE-5361
URL: https://issues.apache.org/jira/browse/LUCENE-5361
Project: Lucene - Core
Issue Type: Bug
Reporter: Nik Everett
Assignee: Adrien Grand
Priority: Minor
Attachments: LUCENE-5361.patch

The FVH's FieldQuery throws away some boosts when flattening queries, including DisjunctionMaxQuery and BooleanQuery queries. Fragments generated against queries containing boosted boolean queries don't end up sorted correctly.
[jira] [Commented] (LUCENE-5361) FVH throws away some boosts
[ https://issues.apache.org/jira/browse/LUCENE-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865274#comment-13865274 ]

ASF subversion and git services commented on LUCENE-5361:
---------------------------------------------------------

Commit 1556484 from [~jpountz] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1556484 ]
LUCENE-5361: Fixed handling of query boosts in FastVectorHighlighter.
[jira] [Comment Edited] (SOLR-5614) Boost documents using map and query functions
[ https://issues.apache.org/jira/browse/SOLR-5614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865278#comment-13865278 ]

Anca Kopetz edited comment on SOLR-5614 at 1/8/14 10:21 AM:
------------------------------------------------------------

Hi, Sorry for that, I really thought it was a bug. Thanks a lot for your answer.

was (Author: agh): Hi, Sorry for that, I really thought it was a bug. Thank a lot for your answer.

Boost documents using map and query functions
---------------------------------------------
Key: SOLR-5614
URL: https://issues.apache.org/jira/browse/SOLR-5614
Project: Solr
Issue Type: Bug
Affects Versions: 4.6
Reporter: Anca Kopetz

We want to boost documents that contain specific search terms in their fields. We tried the following simplified query:
http://localhost:8983/solr/collection1/select?q=ipod belkin&wt=xml&debugQuery=true&q.op=AND&defType=edismax&bf=map(query($qq),0,0,0,100.0)&qq={!edismax}power
And we get the following error:
org.apache.solr.search.SyntaxError: Infinite Recursion detected parsing query 'power'
And the stack trace:
ERROR - 2014-01-06 18:27:02.275; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Infinite Recursion detected parsing query 'power'
	at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:171)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
	at org.eclipse.jetty.server.Server.handle(Server.java:368)
	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
	at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
	at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
	at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.search.SyntaxError: Infinite Recursion detected parsing query 'power'
	at org.apache.solr.search.QParser.checkRecurse(QParser.java:178)
	at org.apache.solr.search.QParser.subQuery(QParser.java:200)
	at org.apache.solr.search.ExtendedDismaxQParser.getBoostFunctions(ExtendedDismaxQParser.java:437)
	at org.apache.solr.search.ExtendedDismaxQParser.parse(ExtendedDismaxQParser.java:175)
	at org.apache.solr.search.QParser.getQuery(QParser.java:142)
	at
[jira] [Commented] (SOLR-5614) Boost documents using map and query functions
[ https://issues.apache.org/jira/browse/SOLR-5614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865278#comment-13865278 ]

Anca Kopetz commented on SOLR-5614:
-----------------------------------

Hi, Sorry for that, I really thought it was a bug. Thanks a lot for your answer.
[jira] [Updated] (LUCENE-5361) FVH throws away some boosts
[ https://issues.apache.org/jira/browse/LUCENE-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-5361:
---------------------------------
Fix Version/s: 4.6.1
[jira] [Commented] (LUCENE-5361) FVH throws away some boosts
[ https://issues.apache.org/jira/browse/LUCENE-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865280#comment-13865280 ]

ASF subversion and git services commented on LUCENE-5361:
---------------------------------------------------------

Commit 1556485 from [~jpountz] in branch 'dev/branches/lucene_solr_4_6' [ https://svn.apache.org/r1556485 ]
LUCENE-5361: Fixed handling of query boosts in FastVectorHighlighter.
[jira] [Resolved] (LUCENE-5361) FVH throws away some boosts
[ https://issues.apache.org/jira/browse/LUCENE-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand resolved LUCENE-5361.
----------------------------------
Resolution: Fixed

While doing a final review, I noticed that you mistakenly modified the boost of the original query instead of the clone. I took the liberty of fixing it before committing, but please let me know if this looks wrong to you. Committed, thanks!
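The pitfall Adrien describes - mutating the original query's boost instead of the clone's while flattening - can be sketched in a self-contained way (BoostedQuery here is a hypothetical stand-in, not the Lucene Query class):

```java
// Sketch of the clone-then-modify pattern: during flattening the boost
// must be applied to a copy, never to the caller's original query.
// BoostedQuery is a hypothetical stand-in for illustration.
public class CloneBoostSketch {
    static class BoostedQuery {
        float boost;
        BoostedQuery(float boost) { this.boost = boost; }
        BoostedQuery copy() { return new BoostedQuery(boost); }
    }

    // Correct: boost the clone and leave the original untouched, so
    // re-running the flattening (or reusing the query) stays safe.
    static BoostedQuery flatten(BoostedQuery original, float parentBoost) {
        BoostedQuery clone = original.copy();
        clone.boost = original.boost * parentBoost;
        return clone;
    }

    public static void main(String[] args) {
        BoostedQuery q = new BoostedQuery(2.0f);
        BoostedQuery flat = flatten(q, 3.0f);
        System.out.println(flat.boost); // 6.0
        System.out.println(q.boost);    // 2.0 - original unchanged
    }
}
```

The buggy variant would assign `original.boost *= parentBoost`, silently corrupting the shared query object for later use - the kind of mistake the review caught.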
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865335#comment-13865335 ]

Markus Jelsma commented on SOLR-5379:
-------------------------------------

Nolan, Jan, both of you have extensive knowledge about the one you worked on, hosted on GitHub. How do the features compare? I've checked your issue list: there are no new issues coming in and a lot have already been resolved. It looks like that one is much more mature and flexible/configurable.

Query-time multi-word synonym expansion
---------------------------------------
Key: SOLR-5379
URL: https://issues.apache.org/jira/browse/SOLR-5379
Project: Solr
Issue Type: Improvement
Components: query parsers
Reporter: Tien Nguyen Manh
Labels: multi-word, queryparser, synonym
Fix For: 4.7
Attachments: quoted.patch, synonym-expander.patch

While dealing with synonyms at query time, Solr fails to work with multi-word synonyms for two reasons:
- First, the Lucene query parser tokenizes the user query by spaces, so it splits a multi-word term into separate terms before feeding them to the synonym filter; the synonym filter therefore can't recognize the multi-word term to expand it.
- Second, if the synonym filter expands into multiple terms containing a multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to handle synonyms, but MultiPhraseQuery doesn't work with terms that have different numbers of words.
For the first, we can quote all multi-word synonyms in the user query so that the Lucene query parser doesn't split them. There is a related JIRA task: https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery with an appropriate BooleanQuery of SHOULD clauses containing multiple PhraseQuery instances when the token stream has a multi-word synonym.
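The first workaround in the description - quoting multi-word synonyms before a whitespace-tokenizing query parser splits them - can be sketched without Lucene. The synonym set here is illustrative, not a real Solr configuration:

```java
import java.util.Set;

// Sketch: wrap known multi-word synonyms in quotes so a whitespace-
// tokenizing query parser keeps each one as a single phrase term.
// MULTI_WORD is an illustrative stand-in for a loaded synonym map.
public class QuoteMultiWordSynonyms {
    static final Set<String> MULTI_WORD = Set.of("new york", "hard disk");

    static String quote(String query) {
        String result = query;
        for (String syn : MULTI_WORD) {
            result = result.replace(syn, "\"" + syn + "\"");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(quote("cheap hard disk in new york"));
        // cheap "hard disk" in "new york"
    }
}
```

With the phrases quoted, the parser hands "hard disk" to the analysis chain as one unit, so the synonym filter can match and expand it (the second problem, MultiPhraseQuery vs. variable-length expansions, still needs the BooleanQuery-of-PhraseQuery change described above).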
[jira] [Commented] (LUCENE-5376) Add a demo search server
[ https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865352#comment-13865352 ] ASF subversion and git services commented on LUCENE-5376: - Commit 1556508 from [~mikemccand] in branch 'dev/branches/lucene5376' [ https://svn.apache.org/r1556508 ] LUCENE-5207, LUCENE-5376: add expressions support to lucene server, so you can define a virtual field from any JS expression and then sort by that field or retrieve its values for all hits Add a demo search server Key: LUCENE-5376 URL: https://issues.apache.org/jira/browse/LUCENE-5376 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: lucene-demo-server.tgz I think it'd be useful to have a demo search server for Lucene. Rather than being fully featured, like Solr, it would be minimal, just wrapping the existing Lucene modules to show how you can make use of these features in a server setting. The purpose is to demonstrate how one can build a minimal search server on top of APIs like SearcherManager, SearcherLifetimeManager, etc. This is also useful for finding rough edges / issues in Lucene's APIs that make building a server unnecessarily hard. I don't think it should have back compatibility promises (except Lucene's index back compatibility), so it's free to improve as Lucene's APIs change. As a starting point, I'll post what I built for the eating-your-own-dog-food search app for Lucene's & Solr's jira issues http://jirasearch.mikemccandless.com (blog: http://blog.mikemccandless.com/2013/05/eating-dog-food-with-lucene.html ). It uses Netty to expose basic indexing & searching APIs via JSON, but it's very rough (lots of nocommits).
[jira] [Commented] (LUCENE-5207) lucene expressions module
[ https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865351#comment-13865351 ] ASF subversion and git services commented on LUCENE-5207: - Commit 1556508 from [~mikemccand] in branch 'dev/branches/lucene5376' [ https://svn.apache.org/r1556508 ] LUCENE-5207, LUCENE-5376: add expressions support to lucene server, so you can define a virtual field from any JS expression and then sort by that field or retrieve its values for all hits lucene expressions module - Key: LUCENE-5207 URL: https://issues.apache.org/jira/browse/LUCENE-5207 Project: Lucene - Core Issue Type: New Feature Reporter: Ryan Ernst Fix For: 4.6, 5.0 Attachments: LUCENE-5207.patch, LUCENE-5207.patch, LUCENE-5207.patch Expressions are geared at defining an alternative ranking function (e.g. incorporating the text relevance score and other field values/ranking signals), so they are conceptually much closer to ElasticSearch's scripting support (http://www.elasticsearch.org/guide/reference/modules/scripting/) than to Solr's function queries. Some additional notes: * In addition to referring to other fields, they can also refer to other expressions, so they can be used as computed fields. * You can rank documents easily by multiple expressions (it's a SortField in the end), e.g. sort by year descending, then by some function of score, price and time ascending. * The provided javascript expression syntax is much more efficient than using a scripting engine, because it does not have dynamic typing (it compiles to .class files that work on doubles). Performance is similar to writing a custom FieldComparator yourself, but much easier to do. * We have Solr integration to contribute in the future, but this is just the standalone Lucene part as a start. Since Lucene has no schema, it includes an implementation of Bindings (SimpleBindings) that maps variable names to SortFields or other expressions.
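The module described above can be exercised with a short sketch against the Lucene 4.6 expressions API: compile a JavaScript expression, bind its variables to sort fields, and sort hits by the computed value. The "popularity" field and the formula are illustrative, not from the issue:

```java
import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

// Sketch: a "virtual field" computed from the relevance score and a numeric
// doc-value field, used as the sort key.
public class ExpressionSortSketch {
    public static Sort popularityAdjustedSort() throws Exception {
        Expression expr = JavascriptCompiler.compile("_score + ln(popularity)");
        SimpleBindings bindings = new SimpleBindings();
        bindings.add(new SortField("_score", SortField.Type.SCORE));    // relevance score
        bindings.add(new SortField("popularity", SortField.Type.LONG)); // numeric doc value
        return new Sort(expr.getSortField(bindings, true));             // true = descending
    }
}
```

Passing true to getSortField requests descending order, so higher computed values rank first; the resulting Sort is passed to IndexSearcher.search like any other.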
lucene-solr pull request: Make ZkStateReader.aliases volatile, this can be ...
GitHub user andyetitmoves opened a pull request: https://github.com/apache/lucene-solr/pull/15 Make ZkStateReader.aliases volatile, this can be updated async from a watcher You can merge this pull request into a Git repository by running: $ git pull https://github.com/andyetitmoves/lucene-solr volatile-aliases Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/15.patch commit 80d3a0f18700968f8f11854ccf07b2e59a73a594 Author: Ramkumar Aiyengar raiyen...@bloomberg.net Date: 2014-01-08T12:13:29Z Make ZkStateReader.aliases volatile, this can be updated async from a watcher Test Plan: Staged rollout Reviewers: dcollins, cpoersch, mjustice Differential Revision: https://all.phab.dev.bloomberg.com/D41601
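The fix in that pull request is a one-word change, but the reasoning generalizes: a field written by a ZooKeeper watcher thread and read by request threads needs volatile (or equivalent synchronization) for the update to be visible. A minimal stand-alone sketch; the class and method names are illustrative, not Solr's actual ZkStateReader code:

```java
// Illustrative only: a field updated asynchronously by a watcher thread must be
// volatile so request threads are guaranteed to observe the new reference.
public class AliasHolder {
    private volatile String aliases = "{}"; // volatile: safe cross-thread publication

    public void onWatchEvent(String fresh) { // called from the watcher thread
        aliases = fresh;
    }

    public String getAliases() { // called from request threads
        return aliases;
    }
}
```

Without volatile, the Java memory model permits a request thread to keep reading the stale reference indefinitely; volatile establishes the happens-before edge between the watcher's write and subsequent reads.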
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865385#comment-13865385 ] Benson Margulies commented on LUCENE-5388: -- Uwe, what's that mean practically? No PR yet? A PR just in trunk? Merging my recent doc to a 4.x branch? Eliminate construction over readers for Tokenizer - Key: LUCENE-5388 URL: https://issues.apache.org/jira/browse/LUCENE-5388 Project: Lucene - Core Issue Type: Improvement Components: core/other Reporter: Benson Margulies In the modern world, Tokenizers are intended to be reusable, with input supplied via #setReader. The constructors that take Reader are a vestige. Worse yet, they invite people to make mistakes in handling the reader that tangle them up with the state machine in Tokenizer. The sensible thing is to eliminate these ctors, and force setReader usage.
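The reuse pattern the issue advocates looks roughly like this. A hedged sketch against the Lucene 4.6 analysis API, where WhitespaceTokenizer's constructor still takes a Reader (exactly the vestige the issue wants removed):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// One Tokenizer instance, re-targeted per document via setReader(); the Reader
// passed to the constructor is dead weight once reuse is the norm.
public class TokenizerReuseSketch {
    public static List<String> tokenize(String[] docs) throws Exception {
        Tokenizer tok = new WhitespaceTokenizer(Version.LUCENE_46, new StringReader(""));
        CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
        List<String> out = new ArrayList<String>();
        for (String doc : docs) {
            tok.setReader(new StringReader(doc)); // re-target instead of re-construct
            tok.reset();
            while (tok.incrementToken()) {
                out.add(term.toString());
            }
            tok.end();
            tok.close(); // closes the current reader; the instance stays reusable
        }
        return out;
    }
}
```

The reset/incrementToken/end/close sequence is the TokenStream contract; the mistakes the issue mentions come from mixing a constructor-supplied Reader into this state machine.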
[jira] [Comment Edited] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865385#comment-13865385 ] Benson Margulies edited comment on LUCENE-5388 at 1/8/14 12:59 PM: --- Uwe, what's that mean practically? No PR yet? A PR just in trunk? Merging my recent doc to a 4.x branch? A feature branch where this goes to be merged in when the time is ripe? was (Author: bmargulies): Uwe, what's that mean practically? No PR yet? A PR just in trunk? Merging my recent doc to a 4.x branch?
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865401#comment-13865401 ] Robert Muir commented on LUCENE-5388: - Benson, he just means the patch would only be committed to trunk. I agree with this...
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865399#comment-13865399 ] Uwe Schindler commented on LUCENE-5388: --- Commit to trunk only, no backport to branch_4x.
[jira] [Commented] (SOLR-5619) Improve BinaryField to make it Sortable and Indexable
[ https://issues.apache.org/jira/browse/SOLR-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865447#comment-13865447 ] Erick Erickson commented on SOLR-5619: -- Anshum: Could you flesh this out a little more, including use-cases? I'm guessing this would be fairly limited; it makes no sense to match on, say, the entire contents of a movie that you'd indexed. Or at least it would be very difficult. Include what limitations you envision. How would queries be parsed? Binary data is just binary data that could happen to be query syntax. Base64-encode all the field values on the client side? What would the rules be for tokenizing input both at index and query time? Etc... I suspect that there are a bunch of details that make this less useful than one might think, but that's only a guess. Improve BinaryField to make it Sortable and Indexable - Key: SOLR-5619 URL: https://issues.apache.org/jira/browse/SOLR-5619 Project: Solr Issue Type: Improvement Affects Versions: 4.6 Reporter: Anshum Gupta Assignee: Anshum Gupta Currently, BinaryField can neither be indexed nor sorted on. Having it indexable and sortable should come in handy for a reasonable number of use cases, e.g. wanting to index binary data (which could come from anything non-text).
[jira] [Updated] (SOLR-5610) Support cluster-wide properties with an API called CLUSTERPROP
[ https://issues.apache.org/jira/browse/SOLR-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-5610: - Attachment: SOLR-5610.patch Support cluster-wide properties with an API called CLUSTERPROP -- Key: SOLR-5610 URL: https://issues.apache.org/jira/browse/SOLR-5610 Project: Solr Issue Type: Bug Reporter: Noble Paul Attachments: SOLR-5610.patch Add a collection admin API for cluster-wide property management. The new API would create an entry in the root as /cluster-props.json {code:javascript} { prop:val } {code} The API would work as /command=clusterprop&name=propName&value=propVal There will be a set of well-known properties which can be set or unset with this command
[jira] [Assigned] (SOLR-5610) Support cluster-wide properties with an API called CLUSTERPROP
[ https://issues.apache.org/jira/browse/SOLR-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul reassigned SOLR-5610: Assignee: Noble Paul
[jira] [Commented] (SOLR-5610) Support cluster-wide properties with an API called CLUSTERPROP
[ https://issues.apache.org/jira/browse/SOLR-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865457#comment-13865457 ] Noble Paul commented on SOLR-5610: -- plan to commit this soon
[jira] [Commented] (SOLR-5488) Fix up test failures for Analytics Component
[ https://issues.apache.org/jira/browse/SOLR-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865470#comment-13865470 ] Erick Erickson commented on SOLR-5488: -- What's our status here? There are still occasional test failures on trunk, all relating to returning null values. I'm reluctant to merge this back into 4x until they're resolved. I've been out of the country for the last month, so I'm catching back up. It's reasonable to put some temporary logging in to see if we can track this down, since the failures don't seem to be reproducible at will. If you do, please flag them with something like "remove me" so we can find them all. Do _not_ flag them with //nocommit since that'll cause problems. Let me know, Erick Fix up test failures for Analytics Component Key: SOLR-5488 URL: https://issues.apache.org/jira/browse/SOLR-5488 Project: Solr Issue Type: Bug Affects Versions: 5.0, 4.7 Reporter: Erick Erickson Assignee: Erick Erickson Attachments: SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch The analytics component has a few test failures, perhaps environment-dependent. This is just to collect the test fixes in one place for convenience when we merge back into 4.x
[jira] [Commented] (LUCENE-5376) Add a demo search server
[ https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865473#comment-13865473 ] ASF subversion and git services commented on LUCENE-5376: - Commit 1556546 from [~mikemccand] in branch 'dev/branches/lucene5376' [ https://svn.apache.org/r1556546 ] LUCENE-5376: turn on scoring when sorting by field if any of the sort fields or retrieved fields require scores, e.g. when they are an expression field that uses _score
[jira] [Commented] (SOLR-5488) Fix up test failures for Analytics Component
[ https://issues.apache.org/jira/browse/SOLR-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865483#comment-13865483 ] Mark Miller commented on SOLR-5488: --- It's not reproducible at will, but it's pretty easy to reproduce. I see it fairly frequently in my local runs.
[jira] [Commented] (LUCENE-5376) Add a demo search server
[ https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865511#comment-13865511 ] ASF subversion and git services commented on LUCENE-5376: - Commit 1556555 from [~mikemccand] in branch 'dev/branches/lucene5376' [ https://svn.apache.org/r1556555 ] LUCENE-5376: don't need to make ScoreValueSource public
[jira] [Commented] (SOLR-5488) Fix up test failures for Analytics Component
[ https://issues.apache.org/jira/browse/SOLR-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865520#comment-13865520 ] Steven Bower commented on SOLR-5488: [~markrmil...@gmail.com] what environment are you running in where you see the failures? (I've not been able to reproduce on my end)
[jira] [Commented] (SOLR-4414) MoreLikeThis on a shard finds no interesting terms if the document queried is not in that shard
[ https://issues.apache.org/jira/browse/SOLR-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865538#comment-13865538 ] Nimrod Gliksman commented on SOLR-4414: --- Hi, Does anyone know of any progress in this matter, or any workaround? Thanks! MoreLikeThis on a shard finds no interesting terms if the document queried is not in that shard --- Key: SOLR-4414 URL: https://issues.apache.org/jira/browse/SOLR-4414 Project: Solr Issue Type: Bug Components: MoreLikeThis, SolrCloud Affects Versions: 4.1 Reporter: Colin Bartolome Running a MoreLikeThis query in a cloud works only when the document being queried exists in whatever shard serves the request. If the document is not present in the shard, no interesting terms are found and, consequently, no matches are found. h5. Steps to reproduce * Edit example/solr/collection1/conf/solrconfig.xml and add this line, with the rest of the request handlers: {code:xml} <requestHandler name="/mlt" class="solr.MoreLikeThisHandler" /> {code} * Follow the [simplest SolrCloud example|http://wiki.apache.org/solr/SolrCloud#Example_A:_Simple_two_shard_cluster] to get two shards running. * Hit this URL: [http://localhost:8983/solr/collection1/mlt?mlt.fl=includes&q=id:3007WFP&mlt.match.include=false&mlt.interestingTerms=list&mlt.mindf=1&mlt.mintf=1] * Compare that output to that of this URL: [http://localhost:7574/solr/collection1/mlt?mlt.fl=includes&q=id:3007WFP&mlt.match.include=false&mlt.interestingTerms=list&mlt.mindf=1&mlt.mintf=1] The former URL will return a result and list some interesting terms. The latter URL will return no results and list no interesting terms. It will also show this odd XML element: {code:xml} <null name="response"/> {code}
[jira] [Updated] (SOLR-5541) Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters
[ https://issues.apache.org/jira/browse/SOLR-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5541: - Attachment: SOLR-5541.patch Added a new patch. In the initial patches I was piggybacking on a convenience method that was used by the test cases to manually set the elevateIds and excludeIds. This method was using an in-memory cache that would hold all the queries that were in the elevation xml file. This approach would cause a memory leak when doing large-scale query elevation, which this patch is designed to support. This patch stops piggybacking on that convenience method and adds a new method that doesn't interact with the elevation cache. Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters Key: SOLR-5541 URL: https://issues.apache.org/jira/browse/SOLR-5541 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 4.6 Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 4.7 Attachments: SOLR-5541.patch, SOLR-5541.patch, SOLR-5541.patch, SOLR-5541.patch The QueryElevationComponent currently uses an xml file to map query strings to elevateIds and excludeIds. This ticket adds the ability to pass in elevateIds and excludeIds through two new http parameters, elevateIds and excludeIds. This will allow more sophisticated business logic to be used in selecting which ids to elevate/exclude. Proposed syntax: http://localhost:8983/solr/elevate?q=*:*&elevateIds=3,4&excludeIds=6,8 The elevateIds and excludeIds point to the unique document Id.
[jira] [Comment Edited] (SOLR-5541) Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters
[ https://issues.apache.org/jira/browse/SOLR-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865557#comment-13865557 ] Joel Bernstein edited comment on SOLR-5541 at 1/8/14 3:33 PM: -- Added a new patch. In the initial patches I was piggy backing a convenience method that was used by the test cases to manually set the elevateIds and excludeIds. This method was adding to an in memory cache that holds all the queries that were in the elevation xml file. This approach would cause a memory leak when doing large scale query elavation, which this patch is designed to support. This patch stops piggy backing that convenience method and adds a new method that doesn't interact with the elevation cache. was (Author: joel.bernstein): Added a new patch. In the initial patches I was piggy backing a convenience method that was used by the test cases to manually set the elevateIds and excludeIds. This method was using an in memory cache would hold all the queries that were in the elevation xml file. This approach would cause a memory leak when doing large scale query elavation, which this patch is designed to support. This patch stops piggy backing that convenience method and adds a new method that doesn't interact with the elevation cache.
[jira] [Commented] (LUCENE-5361) FVH throws away some boosts
[ https://issues.apache.org/jira/browse/LUCENE-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865566#comment-13865566 ] Nik Everett commented on LUCENE-5361: - Wonderful! Thanks.
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865573#comment-13865573 ] Benson Margulies commented on LUCENE-5388: -- How about we start by adding ctors that don't require a reader, and do treat them as 4.x fodder?
[jira] [Comment Edited] (SOLR-4414) MoreLikeThis on a shard finds no interesting terms if the document queried is not in that shard
[ https://issues.apache.org/jira/browse/SOLR-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865538#comment-13865538 ] Nimrod Gliksman edited comment on SOLR-4414 at 1/8/14 4:04 PM: --- Hi, Does anyone knows of any progress in this matter, or any workaround? Thanks! was (Author: nim...@modusp.com): Hi, Does anyone know of any progress in this matter, or any workaround? Thanks!
[jira] [Commented] (LUCENE-5376) Add a demo search server
[ https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865579#comment-13865579 ] ASF subversion and git services commented on LUCENE-5376:
-
Commit 1556564 from [~mikemccand] in branch 'dev/branches/lucene5376' [ https://svn.apache.org/r1556564 ]
LUCENE-5376: add another expression test case; add nocommit for bcp47 cutover

Add a demo search server
Key: LUCENE-5376
URL: https://issues.apache.org/jira/browse/LUCENE-5376
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Attachments: lucene-demo-server.tgz

I think it'd be useful to have a demo search server for Lucene. Rather than being fully featured, like Solr, it would be minimal, just wrapping the existing Lucene modules to show how you can make use of these features in a server setting. The purpose is to demonstrate how one can build a minimal search server on top of APIs like SearcherManager, SearcherLifetimeManager, etc. This is also useful for finding rough edges / issues in Lucene's APIs that make building a server unnecessarily hard. I don't think it should have back compatibility promises (except Lucene's index back compatibility), so it's free to improve as Lucene's APIs change. As a starting point, I'll post what I built for the "eating your own dog food" search app for Lucene's and Solr's jira issues, http://jirasearch.mikemccandless.com (blog: http://blog.mikemccandless.com/2013/05/eating-dog-food-with-lucene.html ). It uses Netty to expose basic indexing and searching APIs via JSON, but it's very rough (lots of nocommits).
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865580#comment-13865580 ] Benson Margulies commented on LUCENE-5388:
--
setReader throws IOException, but the existing constructors don't. Analyzer 'createComponents' doesn't. How to sort this out?
[jira] [Updated] (SOLR-5480) Make MoreLikeThisHandler distributable
[ https://issues.apache.org/jira/browse/SOLR-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Molloy updated SOLR-5480:
---
Attachment: SOLR-5480.patch

(Hopefully) the last 4.6 patch version; includes more bug fixes, normalization of output, etc. Working on migrating to trunk now.

Make MoreLikeThisHandler distributable
--
Key: SOLR-5480
URL: https://issues.apache.org/jira/browse/SOLR-5480
Project: Solr
Issue Type: Improvement
Reporter: Steve Molloy
Attachments: SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch

The MoreLikeThis component, when used in the standard search handler, supports distributed searches. But the MoreLikeThisHandler itself doesn't, which prevents, say, passing in text to perform the query. I'll start looking into adapting the SearchHandler logic to the MoreLikeThisHandler. If anyone has some work done already and wants to share, or wants to contribute, any help will be welcomed.
[jira] [Commented] (SOLR-4414) MoreLikeThis on a shard finds no interesting terms if the document queried is not in that shard
[ https://issues.apache.org/jira/browse/SOLR-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865603#comment-13865603 ] Steve Molloy commented on SOLR-4414:
You may want to look at SOLR-5480.
[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry
[ https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865604#comment-13865604 ] ASF subversion and git services commented on SOLR-5615:
---
Commit 1556572 from [~markrmil...@gmail.com] in branch 'dev/trunk' [ https://svn.apache.org/r1556572 ]
SOLR-5615: Deadlock while trying to recover after a ZK session expiration.

Deadlock while trying to recover after a ZK session expiry
--
Key: SOLR-5615
URL: https://issues.apache.org/jira/browse/SOLR-5615
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller
Fix For: 5.0, 4.7, 4.6.1
Attachments: SOLR-5615.patch, SOLR-5615.patch, SOLR-5615.patch

The sequence of events which might trigger this is as follows:
- The leader of a shard, say OL, has a ZK session expiry
- The new leader, NL, starts the election process
- NL, through Overseer, clears the current leader (OL) for the shard from the cluster state
- OL reconnects to ZK and calls onReconnect from the event thread (main-EventThread)
- OL marks itself down
- OL sets up watches for cluster state, and then retrieves it (with no leader for this shard)
- NL, through Overseer, updates cluster state to mark itself leader for the shard
- OL tries to register itself as a replica, and waits on the event thread until the cluster state is updated with the new leader
- ZK sends a watch update to OL, but OL is blocked on the event thread waiting for it. Oops.

This finally breaks out after the attempt to register as a replica times out after 20 mins.
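The sequence above boils down to a classic single-event-thread deadlock: the thread waits for a notification that only a task queued behind it on the same thread can deliver. A minimal, self-contained sketch (illustrative only; this is not Solr or ZooKeeper code, and the class name and 200 ms timeout are invented for the demonstration):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class EventThreadDeadlock {
    // Returns true if the "cluster state updated" signal ever fired.
    public static boolean demonstrate() throws InterruptedException {
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        CountDownLatch stateUpdated = new CountDownLatch(1);

        // Like onReconnect: runs on the event thread and waits for the
        // new leader to show up in cluster state...
        eventThread.submit(() -> {
            try {
                stateUpdated.await(); // blocks the only event thread
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
        });

        // Like the ZK watch update: it would satisfy the wait, but it is
        // queued behind the blocked task on the same single thread.
        eventThread.submit(stateUpdated::countDown);

        // Observe from outside: the signal never fires.
        boolean fired = stateUpdated.await(200, TimeUnit.MILLISECONDS);
        eventThread.shutdownNow();
        return fired;
    }
}
```

Running onReconnect off the event thread (as the linked pull request does) breaks the cycle, because the watch delivery no longer queues behind the waiting task.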
[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry
[ https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865606#comment-13865606 ] ASF subversion and git services commented on SOLR-5615:
---
Commit 1556573 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1556573 ]
SOLR-5615: Deadlock while trying to recover after a ZK session expiration.
[jira] [Commented] (SOLR-5619) Improve BinaryField to make it Sortable and Indexable
[ https://issues.apache.org/jira/browse/SOLR-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865611#comment-13865611 ] Hoss Man commented on SOLR-5619:
-
This came out of an idea I mentioned to Anshum offline when talking about his test for SOLR-5594, and some confusion I had based on what I remembered/imagined about some test code sarowe added in SOLR-5354. My suggestion was that some of the things that existed in the test-only FieldType in SOLR-5354, and in Anshum's SOLR-5594 patch, could/should probably just be improvements to BinaryField directly. SOLR-5354 added a sortable subclass of BinaryField via docvalues -- Anshum and I were just discussing the idea of using something like the existing BinaryTokenStream functionality from the lucene core test utils to promote the indexable/sortable logic up into BinaryField itself. BinaryField already requires that external systems deal with it (the stored field part) via base64 encoded strings -- so from a query standpoint, yes -- you'd do term queries against it via base64, but the sorting of the indexed terms would be just like in SOLR-5354.

bq. I suspect that there are a bunch of details that make this less useful than one might think, but that's only a guess.

It probably wouldn't be super useful to a lot of people, but it would be nice to have a FieldType that gives you some more direct access to doing things directly with BytesRef in lucene -- if nothing else it would help act as a proof of concept/methodology for how people can write custom FieldTypes that do things with their specialized binary data. If it doesn't make sense, it doesn't make sense -- but it seemed like it might be worthwhile to investigate given the instances where it's made sense to create weird little subclasses in various tests.

Improve BinaryField to make it Sortable and Indexable
-
Key: SOLR-5619
URL: https://issues.apache.org/jira/browse/SOLR-5619
Project: Solr
Issue Type: Improvement
Affects Versions: 4.6
Reporter: Anshum Gupta
Assignee: Anshum Gupta

Currently, BinaryField can neither be indexed nor sorted on. Having it indexable and sortable should come in handy for a reasonable number of use cases, e.g. wanting to index binary data (which could come from anything non-text).
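The base64 contract Hoss describes can be sketched concretely. This is an illustrative helper, not a real Solr API; the method names and the `payload` field name are invented. It only shows what exchanging binary terms as base64 strings on the client side would look like:

```java
import java.util.Base64;

// Invented helper sketching the client side of a base64-encoded binary field.
public class BinaryTermSketch {
    // Encode raw bytes the way a client would before indexing or querying.
    public static String toBase64(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // A hypothetical term-query string against a binary field.
    public static String termQuery(String field, byte[] raw) {
        return field + ":" + toBase64(raw);
    }

    // Decode a stored value handed back as base64.
    public static byte[] fromBase64(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }
}
```

Sorting of the indexed terms would then be plain byte-wise BytesRef order, as in SOLR-5354; the base64 round-trip only affects how clients exchange the values.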
[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry
[ https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865612#comment-13865612 ] Ramkumar Aiyengar commented on SOLR-5615:
-
Had noticed a separate race while initially investigating running onReconnect in a separate thread; see https://github.com/apache/lucene-solr/pull/15 for that.
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865620#comment-13865620 ] Robert Muir commented on LUCENE-5388:
-
{quote} How about we start by adding ctors that don't require a reader, and do treat them as 4.x fodder? {quote}
I'd prefer not, because then there needs to be very sophisticated backwards compat to know which one to call, and subclassing gets complicated. I would really prefer we just choose to fix the API, either 1) 5.0-only or 2) break it in a 4.x release. From my perspective, Benson would probably be the one most impacted by such a 4.x break. So if he really wants to do this, I have no problem.
{quote} setReader throws IOException, but the existing constructors don't. Analyzer 'createComponents' doesn't. How to sort this out? {quote}
I don't see the problem. I think createComponents doesn't need to throw an exception: instead the logic of Analyzer.tokenStream changes slightly, to call components.setReader(r) in both cases of the if-then-else. Make sense?
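The flow Robert describes can be sketched without Lucene itself. The classes below are invented stand-ins (SketchTokenizer, SketchAnalyzer), not the real Tokenizer/Analyzer API; they only illustrate the shape: no Reader in any constructor, createComponents takes no Reader, and tokenStream calls setReader in both the fresh-components and reused-components cases.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Stand-in for Tokenizer: input arrives only via setReader, never a ctor.
abstract class SketchTokenizer {
    protected Reader input;

    public final void setReader(Reader reader) throws IOException {
        this.input = reader;
        reset(); // ready the state machine for the new input
    }

    protected abstract void reset() throws IOException;
    public abstract String next() throws IOException;
}

// Trivial tokenizer that emits the whole input as one token.
class WholeInputTokenizer extends SketchTokenizer {
    private boolean done;

    @Override protected void reset() { done = false; }

    @Override public String next() throws IOException {
        if (done || input == null) return null;
        done = true;
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) sb.append((char) c);
        return sb.toString();
    }
}

// Stand-in for Analyzer: components are cached and reused across calls.
class SketchAnalyzer {
    private SketchTokenizer cached;

    // createComponents no longer receives a Reader at all.
    protected SketchTokenizer createComponents() {
        return new WholeInputTokenizer();
    }

    // setReader is called in both branches of the reuse check, so it is the
    // single place the (possibly IOException-throwing) wiring happens.
    public SketchTokenizer tokenStream(String text) throws IOException {
        if (cached == null) {
            cached = createComponents();
        }
        cached.setReader(new StringReader(text));
        return cached;
    }
}
```

Calling tokenStream twice on the same analyzer returns the same tokenizer instance with fresh input, which is exactly the reuse the real API is after.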
[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry
[ https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865621#comment-13865621 ] Mark Miller commented on SOLR-5615:
---
Thanks, we should open a new JIRA issue for that.
[jira] [Created] (SOLR-5620) Race condition while setting ZkStateReader.aliases
Ramkumar Aiyengar created SOLR-5620:
---
Summary: Race condition while setting ZkStateReader.aliases
Key: SOLR-5620
URL: https://issues.apache.org/jira/browse/SOLR-5620
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 4.6
Reporter: Ramkumar Aiyengar
Priority: Minor

Noticed while working on SOLR-5615; see https://github.com/apache/lucene-solr/pull/15 for a patch.
[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry
[ https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865627#comment-13865627 ] Ramkumar Aiyengar commented on SOLR-5615:
-
SOLR-5620
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865626#comment-13865626 ] Benson Margulies commented on LUCENE-5388:
--
OK, I see. If we don't do compatibility, then no one calls setReader in createComponents, and all is well. OK, I'm proceeding.
[jira] [Commented] (SOLR-5620) Race condition while setting ZkStateReader.aliases
[ https://issues.apache.org/jira/browse/SOLR-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865625#comment-13865625 ] Ramkumar Aiyengar commented on SOLR-5620:
-
I added a synchronized section around updateAliases in my second commit, just for consistency with the Watcher code, though I am unsure why it's even present in the Watcher code in the first place.
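The general hazard being guarded against can be illustrated with a tiny holder class. This is an invented sketch, not ZkStateReader's actual fields or logic (in particular, the version check is an assumption about how one might make stale writes harmless): when both a watcher thread and an updateAliases-style caller publish a shared snapshot, synchronization plus an ordering check keeps an older snapshot from clobbering a newer one.

```java
// Invented sketch of guarding a shared snapshot against stale overwrites.
public class AliasesHolder {
    private int version = -1;
    private String aliases = null;

    // Both the watcher thread and direct updaters go through this
    // synchronized method; the version check makes stale writes no-ops.
    public synchronized void setIfNewer(int newVersion, String newAliases) {
        if (newVersion > version) {
            version = newVersion;
            aliases = newAliases;
        }
    }

    public synchronized String getAliases() {
        return aliases;
    }
}
```

Without the synchronized block and the ordering check, two racing writers could interleave so that the field ends up holding whichever snapshot happened to be written last, regardless of which was fetched most recently.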
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865634#comment-13865634 ] Benson Margulies commented on LUCENE-5388:
--
Why does the reader get passed to createComponents in this model? Should that param go away?
[jira] [Commented] (SOLR-5619) Improve BinaryField to make it Sortable and Indexable
[ https://issues.apache.org/jira/browse/SOLR-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865640#comment-13865640 ] Erick Erickson commented on SOLR-5619:
--
Right, I'm not arguing that it never makes sense, just that we should be clear during implementation what the expectations/use-cases are. For instance, I can see defining a tokenizer that splits the binary input into tokens based on a base64 encoding of some special characters that have meaning only to the specific app trying to index/search binary values. I suppose the real work will be coaching people on what searching binary values really means. It wouldn't be, for instance, something you'd do face recognition with :)
[jira] [Commented] (SOLR-4414) MoreLikeThis on a shard finds no interesting terms if the document queried is not in that shard
[ https://issues.apache.org/jira/browse/SOLR-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865651#comment-13865651 ] Nimrod Gliksman commented on SOLR-4414:
---
Thanks
[jira] [Commented] (SOLR-5488) Fix up test failures for Analytics Component
[ https://issues.apache.org/jira/browse/SOLR-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865657#comment-13865657 ] Mark Miller commented on SOLR-5488:
---
Linux. They occur pretty frequently in the jenkins runs too, but I have not paid attention to whether it's just a specific OS. I assumed it was a timing issue. Those can be hard to duplicate sometimes, either due to raw hardware speed differences or other differences leading to different tests running at the same time, etc.

Fix up test failures for Analytics Component
Key: SOLR-5488
URL: https://issues.apache.org/jira/browse/SOLR-5488
Project: Solr
Issue Type: Bug
Affects Versions: 5.0, 4.7
Reporter: Erick Erickson
Assignee: Erick Erickson
Attachments: SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch

The analytics component has a few test failures, perhaps environment-dependent. This is just to collect the test fixes in one place for convenience when we merge back into 4.x.
lucene-solr pull request: LUCENE-5388: code and highlighting changes to rem...
GitHub user benson-basis opened a pull request: https://github.com/apache/lucene-solr/pull/16

LUCENE-5388: code and highlighting changes to remove Reader from Tokenizers

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/benson-basis/lucene-solr lucene-5388-tokenizer-ctor-changes
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/lucene-solr/pull/16.patch

commit 9c0bc03d6d233ce5b7595bdca8acd52cf40c6337
Author: Benson Margulies ben...@basistech.com
Date: 2014-01-08T17:17:18Z
LUCENE-5388: code and highlighting changes to remove Reader from Tokenizer ctors.
Nested Grouping / Field Collapsing
Has anyone got the latest updates for https://issues.apache.org/jira/browse/SOLR-2553 ? I am trying to take a look at the implementation and see how complex this is to achieve. If someone else has looked into it earlier, could you please share your thoughts/comments?

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865661#comment-13865661 ] Benson Margulies commented on LUCENE-5388:
--
https://github.com/apache/lucene-solr/pull/16 is available for your reading pleasure, to see what these changes look like.
[jira] [Commented] (LUCENE-5354) Blended score in AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865659#comment-13865659 ] Michael McCandless commented on LUCENE-5354: New patch looks great, thanks Remi! I'm worried about how costly iterating over term vectors is going to be ... are you planning to run the performance test? If not, I can. It might be better to open up a protected method to convert the smallest position to the coefficient? The default impl can do the switch based on the BlenderType enum... but apps may want to control how the score is boosted by position. Blended score in AnalyzingInfixSuggester Key: LUCENE-5354 URL: https://issues.apache.org/jira/browse/LUCENE-5354 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Affects Versions: 4.4 Reporter: Remi Melisson Priority: Minor Labels: suggester Attachments: LUCENE-5354.patch, LUCENE-5354_2.patch I'm working on a custom suggester derived from the AnalyzingInfix. I require what is called a blended score (//TODO ln.399 in AnalyzingInfixSuggester) to transform the suggestion weights depending on the position of the searched term(s) in the text. Right now, I'm using an easy solution : If I want 10 suggestions, then I search against the current ordered index for the 100 first results and transform the weight : bq. a) by using the term position in the text (found with TermVector and DocsAndPositionsEnum) or bq. b) by multiplying the weight by the score of a SpanQuery that I add when searching and return the updated 10 most weighted suggestions. Since we usually don't need to suggest so many things, the bigger search + rescoring overhead is not so significant but I agree that this is not the most elegant solution. We could include this factor (here the position of the term) directly into the index. So, I can contribute to this if you think it's worth adding it. 
Do you think I should tweak AnalyzingInfixSuggester, subclass it, or create a dedicated class?
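The protected-method idea above can be sketched roughly like this. Everything here is a hypothetical stand-in, not the actual AnalyzingInfixSuggester API: the BlenderType values, the method names, and the decay constants are all assumptions made for illustration.

```java
// Sketch only: a pluggable hook that converts the smallest term position
// into a coefficient used to blend the suggestion weight. BlenderType and
// all names here are hypothetical, not the real Lucene API.
public class BlendedCoefficient {
    enum BlenderType { POSITION_LINEAR, POSITION_RECIPROCAL }

    // Smaller positions (the term occurs earlier) yield larger coefficients.
    static double coefficient(BlenderType type, int position) {
        switch (type) {
            case POSITION_LINEAR:
                // decay 10% per position, never below zero (arbitrary choice)
                return Math.max(0.0, 1.0 - 0.10 * position);
            case POSITION_RECIPROCAL:
                return 1.0 / (position + 1);
            default:
                return 1.0;
        }
    }

    // Apply the coefficient to a raw suggestion weight.
    static long blend(long weight, BlenderType type, int position) {
        return (long) (weight * coefficient(type, position));
    }
}
```

An app could then override the coefficient method to implement its own boost curve without touching the term-vector iteration at all.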
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865667#comment-13865667 ] Benson Margulies commented on LUCENE-5388: -- [~rcmuir] Next frontier is TokenizerFactory. Do we change #create to not take a reader, or do we add 'throws IOException'? Based on the comments above, I'd think we take out the reader. [~mikemccand] I would love help. If you tell me your github id, I'll add you to my repo, and you can take up some of the ton of editing.
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865675#comment-13865675 ] Robert Muir commented on LUCENE-5388: - {quote} Why does the reader get passed to createComponents in this model? Should that param go away? {quote} Yes: please nuke it!
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865677#comment-13865677 ] Robert Muir commented on LUCENE-5388: - {quote} Do we change #create to not take a reader, or do we add 'throws IOException'? Based on comments above, I'd think we take out the reader. {quote} Yes, please. It's not like a factory can do anything fancy with it anyway: it's only called the first time, and subsequent readers are supplied via setReader. So this is just more of the same: please nuke it!
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865678#comment-13865678 ] Uwe Schindler commented on LUCENE-5388: --- Yes, createComponents should no longer get a Reader, too! Same for factories. The factory just creates the instance; setting the reader is up to the consumer.
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865681#comment-13865681 ] Uwe Schindler commented on LUCENE-5388: --- The cool thing is: in Analyzer we may simplify things like the initReader() protected method. We should also look at those APIs. Most of the code in Analyzer is there to work around the ctor / setReader stuff.
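The construct-once, setReader-per-input lifecycle the thread converges on can be illustrated with a plain-Java stand-in. This is not the actual org.apache.lucene.analysis.Tokenizer (which also has reset/close state-machine rules); it only shows the shape of the contract being argued for:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Illustrative stand-in for the proposed lifecycle: no Reader in the
// constructor, each input supplied later via setReader(). Not the real
// Lucene Tokenizer class.
public class ReusableTokenizerSketch {
    private Reader input;

    public ReusableTokenizerSketch() {
        // Nothing to do: there is no reader to mishandle at construction time.
    }

    public void setReader(Reader reader) {
        this.input = reader;
    }

    // Returns the next whitespace-delimited token, or null at end of input.
    public String nextToken() {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = input.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (sb.length() > 0) break; // token complete
                } else {
                    sb.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.length() == 0 ? null : sb.toString();
    }
}
```

The same instance is then reused for every document: call setReader with a fresh Reader, consume tokens, repeat — which is exactly why a factory's create method no longer needs a Reader parameter.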
[jira] [Updated] (SOLR-5618) Reproducible failure from TestFiltering.testRandomFiltering
[ https://issues.apache.org/jira/browse/SOLR-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-5618: --- Attachment: SOLR-5618.patch I've recreated the failure conditions in a non-randomized test. See testHoSanity in the updated patch for the details, but the bottom line is: after building up two sets of SolrParams (match_1 and match_0), we have a situation where the following test code fails on the last line (match_1 gets a numFound==0)... {code} // 1 then 0 assertJQ(req(match_1), "/response/numFound==1"); assertJQ(req(match_0), "/response/numFound==0"); // clear caches assertU(commit()); // 0 then 1 assertJQ(req(match_0), "/response/numFound==0"); assertJQ(req(match_1), "/response/numFound==1"); {code} ...which definitely smells like a caching bug. Perhaps this is a Query.equals() problem with one of the query classes used in the test? I'll investigate a bit more later today -- starting with trying to simplify the params to the barest case that still fails. Reproducible failure from TestFiltering.testRandomFiltering --- Key: SOLR-5618 URL: https://issues.apache.org/jira/browse/SOLR-5618 Project: Solr Issue Type: Bug Reporter: Hoss Man Attachments: SOLR-5618.patch, SOLR-5618.patch uwe's jenkins found this in java8... http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9004/consoleText {noformat} [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestFiltering -Dtests.method=testRandomFiltering -Dtests.seed=C22042E80957AE3E -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=ar_LY -Dtests.timezone=Asia/Katmandu -Dtests.file.encoding=UTF-8 [junit4] FAILURE 16.9s J1 | TestFiltering.testRandomFiltering [junit4] Throwable #1: java.lang.AssertionError: FAILURE: iiter=11 qiter=336 request=[q, {!frange v=val_i l=0 u=1 cost=139 tag=t}, fq, {!frange v=val_i l=0 u=1}, fq, {! cost=92}-_query_:{!frange v=val_i l=1 u=1}, fq, {!frange v=val_i l=0 u=1 cache=true tag=t}, fq, {! cache=true tag=t}-_query_:{!frange v=val_i l=1 u=1}] [junit4] at __randomizedtesting.SeedInfo.seed([C22042E80957AE3E:DD43E12DEC70EE37]:0) [junit4] at org.apache.solr.search.TestFiltering.testRandomFiltering(TestFiltering.java:327) {noformat} The seed fails consistently for me on trunk using java7, and on 4x using both java7 and java6 - details to follow in comment.
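A Query.equals() bug would explain the symptom because Solr's filter cache is keyed on the query object itself. A minimal self-contained demonstration of that failure mode follows; RangeQuery here is a hypothetical stand-in written for illustration, not the Solr class under suspicion:

```java
import java.util.HashMap;
import java.util.Map;

// Demo: a cache keyed on equals()/hashCode(). If equals() is too broad
// (here it ignores the upper bound), a semantically different query hits
// the wrong cache entry and returns a stale result.
public class BrokenEqualsDemo {
    static class RangeQuery {
        final int lower, upper;
        RangeQuery(int lower, int upper) { this.lower = lower; this.upper = upper; }
        @Override public boolean equals(Object o) {
            // BUG: 'upper' is never compared
            return o instanceof RangeQuery && ((RangeQuery) o).lower == lower;
        }
        @Override public int hashCode() { return Integer.hashCode(lower); }
    }

    static final Map<RangeQuery, Integer> filterCache = new HashMap<>();

    // Pretends to execute the query, consulting the cache first.
    static int numFound(RangeQuery q, int uncachedResult) {
        return filterCache.computeIfAbsent(q, k -> uncachedResult);
    }
}
```

With this bug, a first query over [0,1] caches its count, and a later query over [0,5] silently returns that cached count instead of its own, which matches the "works until the cache is involved" pattern in the test.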
Re: The Old Git Discussion
+1 David and Mark. I too, like Lajos, have - very sadly - not contributed as much as I'd want to Lucene, but having followed this thread with interest for a while, I hope my contribution is well received. I do sympathize with all the problems which have been highlighted about Git, as I had the same impression 3 years ago when all our projects (Hibernate) were moved to Git, and I was the skeptical one back then. I suffered from it for a couple of weeks, while I was pointlessly trying to map my previous SVN workflow onto Git... until I realized that that was the main crux of my pain with it. I really do have to admit I was just stubborn and had grown into bad habits; I'm extremely happy we moved now... and yes - no offence - but from an outsider's view you all look like you're carving code on a stone wall with stone axes. Sparing you all the details of what I did wrong and how exactly it should be used, the point is really the huge flexibility and the better model for the problem it solves. On this thread I've seen several problems being pointed out about git, but while I'd be happy to chat about each single one, for the sake of brevity my impression is just confusion from people who are trying to use it as if it were an alias for svn. To put it boldly, you're missing the point :-) If you need details, feel free to ask here or contact me on IRC: I'm afraid my email is too long already. Would be good to see some negative points from someone who actually used it for a significant time. For my part, for example, I don't like the complexity of handling merges; but then again we also use fast-forward only; considering that, maybe I've never actually understood how a merge should be done - as I've never practiced it. Please take it as an example of how you don't need to learn all its details and still get huge benefits from it: in 47 releases, over 3 years, ~100 contributors have been happily collaborating; we developed a workflow which suits us best and never ever needed to do a merge.
And yes, I confirm it feels very odd to an occasional contributor that you guys still work by attaching patch files to JIRA. - Sanne On 8 January 2014 00:45, David Smiley (@MITRE.org) dsmi...@mitre.org wrote: +1, Mark. Git isn't perfect; I sympathize with the annoyances pointed out by Rob et al. But I think we would be better off for it -- a net win considering the upsides. In the end I'd love to track changes via branches (which includes forks people make to add changes), not with attaching patch files to an issue tracker. The way we do things here sucks for collaboration and it's a higher bar for people to get involved than it can and should be. ~ David Mark Miller-3 wrote I don't really buy the fad argument, but as I've said, I'm willing to wait a little longer for others to catch on. I try and follow the stats and reports and articles on this pretty closely. As I mentioned early in the thread, by all appearances, the shift from SVN to GIT looks much like the shift from CVS to SVN. This was not a fad change, nor is the next mass movement likely to be. Just like no one starts a project on CVS anymore, we are almost already to the point where new projects start exclusively on GIT - especially open source. I'm happy to sit back and watch the trend continue though. The number of GIT users in the committee and among the committers only grows every time the discussion comes up. If this was 2009, 2010, 2011 ... who knows, perhaps I would buy some fad argument. But it just doesn't jive in 2014. - Mark - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/The-Old-Git-Discussion-tp4109193p4110109.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
Re: The Old Git Discussion
On Wed, Jan 8, 2014 at 1:25 PM, Sanne Grinovero sanne.grinov...@gmail.com wrote: sake of brevity my impression is just confusion from people who are trying to use it as if it were an alias for svn. To put it boldly, you're missing the point :-) Would be good to see some negative points from someone who actually used it for a significant time. And these are two typical quotes from git fanatics: they assume that the person complaining about their shitty tool knows nothing about distributed version control and is an idiot, etc, etc. Again, I've been using git on my day job for over a year. I've also used mercurial, which was extremely intuitive and usable (I think I came up to speed on it almost immediately). The problem is not that I haven't used git, and it's not that I'm missing the point of distributed VC. Git is just done really badly.
[jira] [Updated] (SOLR-4260) Inconsistent numDocs between leader and replica
[ https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Potter updated SOLR-4260: - Attachment: demo_shard1_replicas_out_of_sync.tgz While doing some other testing of SolrCloud (branch4x - 4.7-SNAPSHOT rev. 1556055), I hit this issue and here's the kicker ... there were no errors in my replica's log, the tlogs are identical, and there was no significant GC activity during the time when the replica got out of sync with the leader. I'm attaching the data directories (index + tlog) for both replicas (demo_shard1_replica1 [leader], and demo_shard1_replica2) and their log files. When I do a doc-by-doc comparison of the two indexes, here's the result: finished querying replica1, found 33537 documents (33537) finished querying replica2, found 33528 documents
Doc [82995] not found in replica2: <doc boost="1.0"><field name="id">82995</field><field name="string_s">test</field><field name="int_i">-274468088</field><field name="float_f">0.90338105</field><field name="double_d">0.6949391474539932</field><field name="text_en">this is a test</field><field name="_version_">1456683668206518274</field></doc>
Doc [82997] not found in replica2: <doc boost="1.0"><field name="id">82997</field><field name="string_s">test</field><field name="int_i">301737117</field><field name="float_f">0.6746266</field><field name="double_d">0.26034065188918565</field><field name="text_en">this is a test</field><field name="_version_">1456683668206518276</field></doc>
Doc [82996] not found in replica2: <doc boost="1.0"><field name="id">82996</field><field name="string_s">test</field><field name="int_i">-1768315588</field><field name="float_f">0.6641093</field><field name="double_d">0.23708033183534993</field><field name="text_en">this is a test</field><field name="_version_">1456683668206518275</field></doc>
Doc [82991] not found in replica2: <doc boost="1.0"><field name="id">82991</field><field name="string_s">test</field><field name="int_i">-2057280061</field><field name="float_f">0.27617514</field><field name="double_d">0.7885214691953506</field><field name="text_en">this is a test</field><field name="_version_">1456683668206518273</field></doc>
Doc [82987] not found in replica2: <doc boost="1.0"><field name="id">82987</field><field name="string_s">test</field><field name="int_i">1051456320</field><field name="float_f">0.51863414</field><field name="double_d">0.7881255443862878</field><field name="text_en">this is a test</field><field name="_version_">1456683668206518272</field></doc>
Doc [82986] not found in replica2: <doc boost="1.0"><field name="id">82986</field><field name="string_s">test</field><field name="int_i">-1356807889</field><field name="float_f">0.2762279</field><field name="double_d">0.003657816979820372</field><field name="text_en">this is a test</field><field name="_version_">1456683668205469699</field></doc>
Doc [82984] not found in replica2: <doc boost="1.0"><field name="id">82984</field><field name="string_s">test</field><field name="int_i">732678870</field><field name="float_f">0.31199205</field><field name="double_d">0.9848865821766198</field><field name="text_en">this is a test</field><field name="_version_">1456683668205469698</field></doc>
Doc [82970] not found in replica2: <doc boost="1.0"><field name="id">82970</field><field name="string_s">test</field><field name="int_i">283693979</field><field name="float_f">0.6119651</field><field name="double_d">0.04142006867388914</field><field name="text_en">this is a test</field><field name="_version_">1456683668205469696</field></doc>
Doc [82973] not found in replica2: <doc boost="1.0"><field name="id">82973</field><field name="string_s">test</field><field name="int_i">1343103920</field><field name="float_f">0.5855809</field><field name="double_d">0.6575904716584224</field><field name="text_en">this is a test</field><field name="_version_">1456683668205469697</field></doc>
No amount of committing or reloading of these cores helps. Also, restarting replica2 doesn't lead to it being in-sync either, most likely because the tlog is identical to the leader's? Here are the log messages on replica2 after restarting it.
2014-01-08 13:28:20,112 [searcherExecutor-5-thread-1] INFO solr.core.SolrCore - [demo_shard1_replica2] Registered new searcher Searcher@4345de8a main{StandardDirectoryReader(segments_e:38:nrt _d(4.7):C26791 _e(4.7):C3356 _f(4.7):C3381)} 2014-01-08 13:28:21,298 [RecoveryThread] INFO solr.cloud.RecoveryStrategy - Attempting to PeerSync from http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/ core=demo_shard1_replica2 - recoveringAfterStartup=true 2014-01-08 13:28:21,302 [RecoveryThread] INFO solr.update.PeerSync - PeerSync: core=demo_shard1_replica2 url=http://ec2-54-209-97-145.compute-1.amazonaws.com:8984/solr START replicas=[http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/] nUpdates=100 2014-01-08 13:28:21,330 [RecoveryThread] INFO solr.update.PeerSync - PeerSync: core=demo_shard1_replica2 url=http://ec2-54-209-97-145.compute-1.amazonaws.com:8984/solr Received 99 versions from ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/ 2014-01-08 13:28:21,331 [RecoveryThread] INFO solr.update.PeerSync - PeerSync:
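The doc-by-doc comparison described above amounts to diffing the id -> _version_ maps of the two replicas. A minimal sketch of that check follows; the in-memory maps stand in for actually querying each core for every id and its _version_ field:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch: report ids present on the leader but missing from the replica.
// In the real check the two maps would be built by querying each core.
public class ReplicaDiff {
    static Set<String> missingFromReplica(Map<String, Long> leader, Map<String, Long> replica) {
        Set<String> missing = new TreeSet<>(); // sorted for stable reporting
        for (String id : leader.keySet()) {
            if (!replica.containsKey(id)) {
                missing.add(id);
            }
        }
        return missing;
    }
}
```

Comparing versions as well (not just presence) would additionally catch the case where both replicas have a doc but one holds a stale update.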
[jira] [Commented] (SOLR-4260) Inconsistent numDocs between leader and replica
[ https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865742#comment-13865742 ] Mark Miller commented on SOLR-4260: --- I've noticed something like this too - but nothing I could reproduce easily. I imagine it's likely an issue in SolrCmdDistributor. Inconsistent numDocs between leader and replica --- Key: SOLR-4260 URL: https://issues.apache.org/jira/browse/SOLR-4260 Project: Solr Issue Type: Bug Components: SolrCloud Environment: 5.0.0.2013.01.04.15.31.51 Reporter: Markus Jelsma Assignee: Mark Miller Priority: Critical Fix For: 5.0, 4.7 Attachments: 192.168.20.102-replica1.png, 192.168.20.104-replica2.png, clusterstate.png, demo_shard1_replicas_out_of_sync.tgz After wiping all cores and reindexing some 3.3 million docs from Nutch using CloudSolrServer we see inconsistencies between the leader and replica for some shards. Each core holds about 3.3k documents. For some reason 5 out of 10 shards have a small deviation in the number of documents. The leader and slave deviate by roughly 10-20 documents, not more. Results hopping ranks in the result set for identical queries got my attention; there were small IDF differences for exactly the same record, causing a record to shift positions in the result set. During those tests no records were indexed. Consecutive catch-all queries also return different numbers for numDocs. We're running a 10 node test cluster with 10 shards and a replication factor of two, and frequently reindex using a fresh build from trunk. I've not seen this issue for quite some time until a few days ago.
[jira] [Commented] (SOLR-4260) Inconsistent numDocs between leader and replica
[ https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865744#comment-13865744 ] Mark Miller commented on SOLR-4260: --- Although that doesn't really jive with the transaction logs being identical... hmm...
[jira] [Commented] (SOLR-5604) Remove deprecations caused by httpclient 4.3.x upgrade
[ https://issues.apache.org/jira/browse/SOLR-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865753#comment-13865753 ] Shawn Heisey commented on SOLR-5604: [~olegk] replied to my httpclient-users post. Thank you! Some general thoughts regarding how SolrServer implementations use HttpClient: Currently we have a number of methods that change HttpClient settings after a SolrServer is created. I think there are two choices for dealing with this, and the actual solution might be a blend of both: 1) Deprecate those methods and require users to create their own HttpClient object if they want to change those settings. 2) Have those methods change fields in the class which are then used to change settings when HC request objects are built. I think the httpclient changes for Lucene need to be split off into a separate issue under the LUCENE project. Remove deprecations caused by httpclient 4.3.x upgrade -- Key: SOLR-5604 URL: https://issues.apache.org/jira/browse/SOLR-5604 Project: Solr Issue Type: Improvement Affects Versions: 4.7 Reporter: Shawn Heisey Fix For: 5.0, 4.7 Attachments: SOLR-5604-4x-just-lucene.patch SOLR-5590 upgraded httpclient in Solr and Lucene to version 4.3.x. This version deprecates a LOT of classes and methods, recommending that they all be replaced with various methods from the HttpClientBuilder class.
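Choice 2 above can be sketched like this. The class and method names are illustrative stand-ins for the SolrJ and HttpClient 4.3 types, not the real API; the point is only that setters record values which are applied per request, so no shared client object is ever mutated:

```java
// Sketch of choice 2: setters store values in fields, and the stored values
// are applied only when each request's config is built, so the shared
// HttpClient itself is never mutated. Names are illustrative stand-ins.
public class PerRequestConfigSketch {
    private volatile int socketTimeoutMillis;
    private volatile int connectTimeoutMillis;

    // Keeps the familiar setter API, but only records the value.
    public void setSoTimeout(int millis) { this.socketTimeoutMillis = millis; }
    public void setConnectionTimeout(int millis) { this.connectTimeoutMillis = millis; }

    // Called once per request, analogous to building a per-request
    // RequestConfig in HttpClient 4.3 instead of reconfiguring the client.
    public String buildRequestConfig() {
        return "socketTimeout=" + socketTimeoutMillis
             + ",connectTimeout=" + connectTimeoutMillis;
    }
}
```

The fields are volatile so a setter call from one thread is visible to request-building threads, mirroring the thread-safety expectation on the existing setters.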
[jira] [Commented] (SOLR-4260) Inconsistent numDocs between leader and replica
[ https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865769#comment-13865769 ] Mark Miller commented on SOLR-4260: --- No, wait, it could jive. We only check the last 99 docs on peer sync - if a bunch of docs just didn't show up well before that, it wouldn't be detected by peer sync. I still think SolrCmdDistributor is the first place to look.
[jira] [Commented] (SOLR-5619) Improve BinaryField to make it Sortable and Indexable
[ https://issues.apache.org/jira/browse/SOLR-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865787#comment-13865787 ] Anshum Gupta commented on SOLR-5619: Now that Hoss has already mentioned the initial reason why I opened this issue, I'll just add a bit more to it. I think this would come in handy for people wanting to do exactly that, i.e. split the binary input into tokens based on a base64 encoding of some special characters, thereby also enabling users to run things like prefix/range queries against the field. Improve BinaryField to make it Sortable and Indexable - Key: SOLR-5619 URL: https://issues.apache.org/jira/browse/SOLR-5619 Project: Solr Issue Type: Improvement Affects Versions: 4.6 Reporter: Anshum Gupta Assignee: Anshum Gupta Currently, BinaryField can neither be indexed nor sorted on. Making it indexable and sortable should come in handy for a reasonable number of use cases, e.g. wanting to index binary data (which could come from anything non-text).
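The base64 idea can be sketched as follows. This is not an existing Solr analyzer; the helper name and the fixed chunk width are assumptions made purely for illustration:

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

// Sketch: base64-encode binary content and cut the encoding into fixed-width
// tokens, so the field becomes indexable and prefix/range-queryable on the
// encoded form. The chunk width is arbitrary here.
public class Base64TokensSketch {
    static List<String> tokenize(byte[] data, int width) {
        String encoded = Base64.getEncoder().withoutPadding().encodeToString(data);
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < encoded.length(); i += width) {
            tokens.add(encoded.substring(i, Math.min(encoded.length(), i + width)));
        }
        return tokens;
    }
}
```

Because base64 is order-preserving on the raw bytes only within a chunk boundary, the chunk width effectively becomes part of the field's index format and would need to be fixed in the schema.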
[jira] [Commented] (LUCENE-5369) Add an UpperCaseFilter
[ https://issues.apache.org/jira/browse/LUCENE-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865803#comment-13865803 ] ASF subversion and git services commented on LUCENE-5369: - Commit 1556617 from [~ryantxu] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1556617 ] LUCENE-5369: Added an UpperCaseFilter to make UPPERCASE tokens Add an UpperCaseFilter -- Key: LUCENE-5369 URL: https://issues.apache.org/jira/browse/LUCENE-5369 Project: Lucene - Core Issue Type: New Feature Reporter: Ryan McKinley Assignee: Ryan McKinley Priority: Minor Attachments: LUCENE-5369-uppercase-filter.patch We should offer a standard way to force upper-case tokens. I understand that lowercase is safer for general search quality because some uppercase characters can represent multiple lowercase ones. However, having upper-case tokens is often nice for faceting (consider normalizing to standard acronyms)
[jira] [Commented] (LUCENE-5369) Add an UpperCaseFilter
[ https://issues.apache.org/jira/browse/LUCENE-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865807#comment-13865807 ] ASF subversion and git services commented on LUCENE-5369: - Commit 1556618 from [~ryantxu] in branch 'dev/trunk' [ https://svn.apache.org/r1556618 ] LUCENE-5369: Added an UpperCaseFilter to make UPPERCASE tokens (merge from 4x)
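The caveat in the issue description, that case mapping is not one-to-one, is easy to demonstrate with plain java.lang.String; nothing Lucene-specific is needed (the wrapper class below is just a demo harness):

```java
import java.util.Locale;

// Demo of why uppercasing can lose information: the German 'ß' uppercases
// to the two-letter "SS", and capital sigma lowercases to either σ or the
// final form ς depending on position, so round-tripping is not guaranteed.
public class CaseFoldingDemo {
    static String upper(String s) { return s.toUpperCase(Locale.ROOT); }
    static String lower(String s) { return s.toLowerCase(Locale.ROOT); }
}
```

This is exactly why lowercasing is the conventional default for search normalization while an UpperCaseFilter is best reserved for constrained inputs like acronym facets.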
RE: The Old Git Discussion
Guess it's time I jump in as well, although I'm not really a committer (only a couple of small patches submitted through Jira). Before anything, no, I haven't used GIT. I don't think this is reason enough to just disregard my comments though. :) Basically, I wouldn't mind one or the other, even if it meant adapting my tools and knowledge from SVN to GIT; the important thing to me is the project itself, not where the code is stored. The one thing that is really keeping me away from GIT is its all-or-nothing nature. This is why I'm a Maven + SVN fan more than a GIT + ant one. Concretely, when I need to work on a module for a project at work, I simply go into eclipse, browse to the specific module, right-click the pom file and check it out as a maven project. I can then start working on the specific module right away, all dependencies configured properly. If I need to look at a dependency's code, I do the same on it and my workspace gets updated appropriately. I don't need the whole codebase, which can take forever to compile. The fact that I need to get the whole repo to work on a single file just doesn't seem right. The single project containing everything that you get after running 'ant eclipse' in Solr also causes me headaches. You can have code working perfectly in eclipse that will not compile with ant, because of project dependencies hidden while you work. But no matter what happens, I will adapt and continue using Solr and contributing when I can, because the project itself is far from looking like it's been carved on a stone wall using stone axes... :P Steve From: Robert Muir [rcm...@gmail.com] Sent: January 8, 2014 1:30 PM To: dev@lucene.apache.org Subject: Re: The Old Git Discussion On Wed, Jan 8, 2014 at 1:25 PM, Sanne Grinovero sanne.grinov...@gmail.com wrote: sake of brevity my impression is just confusion from people who are trying to use it as if it were an alias for svn.
To put it boldly, you're missing the point :-) Would be good to see some negative points from someone who actually used it for a significant time.

And these are two typical quotes from git fanatics: they assume that the person complaining about their shitty tool knows nothing about distributed version control and is an idiot, etc, etc. Again, I've been using git on my day job for over a year. I've also used Mercurial, which was extremely intuitive and usable (I think I came up to speed on this almost immediately). The problem is not that I haven't used git, and it's not that I'm missing the point of distributed VC. Git is just really done badly.
[jira] [Commented] (LUCENE-5376) Add a demo search server
[ https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865810#comment-13865810 ]

ASF subversion and git services commented on LUCENE-5376:

Commit 1556620 from [~mikemccand] in branch 'dev/branches/lucene5376' [ https://svn.apache.org/r1556620 ]
LUCENE-5376: also allow dynamic expression per-request

Add a demo search server

Key: LUCENE-5376
URL: https://issues.apache.org/jira/browse/LUCENE-5376
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Attachments: lucene-demo-server.tgz

I think it'd be useful to have a demo search server for Lucene. Rather than being fully featured, like Solr, it would be minimal, just wrapping the existing Lucene modules to show how you can make use of these features in a server setting. The purpose is to demonstrate how one can build a minimal search server on top of APIs like SearcherManager, SearcherLifetimeManager, etc. This is also useful for finding rough edges / issues in Lucene's APIs that make building a server unnecessarily hard. I don't think it should have back compatibility promises (except Lucene's index back compatibility), so it's free to improve as Lucene's APIs change. As a starting point, I'll post what I built for the eating-your-own-dog-food search app for Lucene's and Solr's jira issues, http://jirasearch.mikemccandless.com (blog: http://blog.mikemccandless.com/2013/05/eating-dog-food-with-lucene.html ). It uses Netty to expose basic indexing and searching APIs via JSON, but it's very rough (lots of nocommits).
[jira] [Updated] (SOLR-5480) Make MoreLikeThisHandler distributable
[ https://issues.apache.org/jira/browse/SOLR-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Molloy updated SOLR-5480:

Attachment: SOLR-5480.patch

Attempt at a patch for trunk (rev 1556570). Got it to compile, but I'm not currently set up to test with trunk.

Make MoreLikeThisHandler distributable

Key: SOLR-5480
URL: https://issues.apache.org/jira/browse/SOLR-5480
Project: Solr
Issue Type: Improvement
Reporter: Steve Molloy
Attachments: SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch

The MoreLikeThis component, when used in the standard search handler, supports distributed searches. But the MoreLikeThisHandler itself doesn't, which prevents, say, passing in text to perform the query. I'll start looking into adapting the SearchHandler logic to the MoreLikeThisHandler. If anyone has some work done already and wants to share, or wants to contribute, any help will be welcomed.
[jira] [Commented] (SOLR-5230) Call DelegatingCollector.finish() during grouping
[ https://issues.apache.org/jira/browse/SOLR-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865854#comment-13865854 ]

Steve Rowe commented on SOLR-5230:

[~joel.bernstein], is this committable?

Call DelegatingCollector.finish() during grouping

Key: SOLR-5230
URL: https://issues.apache.org/jira/browse/SOLR-5230
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 4.4
Reporter: Joel Bernstein
Priority: Minor
Fix For: 4.7
Attachments: SOLR-5230.patch

This is an add-on to SOLR-5020 to call the new DelegatingCollector.finish() method from inside the grouping flow.
[jira] [Commented] (LUCENE-5376) Add a demo search server
[ https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865861#comment-13865861 ]

ASF subversion and git services commented on LUCENE-5376:

Commit 1556627 from [~mikemccand] in branch 'dev/branches/lucene5376' [ https://svn.apache.org/r1556627 ]
LUCENE-5376: remove recency blending hack: just use expressions instead
[jira] [Commented] (SOLR-5230) Call DelegatingCollector.finish() during grouping
[ https://issues.apache.org/jira/browse/SOLR-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865874#comment-13865874 ]

Joel Bernstein commented on SOLR-5230:

Steve, I think this is done properly, but I haven't added any test cases for it yet. Because the CollapsingQParserPlugin is currently the only PostFilter that relies on finish(), it would be better to mock up a simple test PostFilter that uses finish() for the tests.

Joel
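To make the need for finish() concrete: a post filter like CollapsingQParserPlugin buffers hits during collection and only forwards its final choices when finish() is called, so any search path that skips finish() silently drops those hits. The sketch below is a stdlib-only analog of that contract; it is not Solr's actual org.apache.solr.search.DelegatingCollector API, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Stdlib-only analog of a delegating post-filter collector that buffers
// hits and only forwards them when finish() is called. If a search path
// (e.g. grouping, before this issue) never calls finish(), every
// buffered hit is lost. Names are illustrative, not Solr's API.
public class FinishingCollectorSketch {
    // The downstream "collector": here, just the list of forwarded doc ids.
    final List<Integer> delegate = new ArrayList<>();
    // Docs held back until finish(), the way a collapsing filter holds
    // candidates while picking one representative per group.
    private final List<Integer> buffered = new ArrayList<>();

    void collect(int doc) {
        buffered.add(doc); // defer the decision; forward nothing yet
    }

    void finish() {
        // Only now do buffered docs reach the delegate.
        delegate.addAll(buffered);
        buffered.clear();
    }

    public static void main(String[] args) {
        FinishingCollectorSketch c = new FinishingCollectorSketch();
        c.collect(3);
        c.collect(7);
        System.out.println(c.delegate.size()); // 0 -- nothing forwarded yet
        c.finish();
        System.out.println(c.delegate);        // [3, 7]
    }
}
```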
[jira] [Commented] (SOLR-5230) Call DelegatingCollector.finish() during grouping
[ https://issues.apache.org/jira/browse/SOLR-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865882#comment-13865882 ]

Steve Rowe commented on SOLR-5230:

Thanks Joel, I'll see if I can whip up a test. - Steve
[jira] [Commented] (LUCENE-5369) Add an UpperCaseFilter
[ https://issues.apache.org/jira/browse/LUCENE-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865902#comment-13865902 ]

Shawn Heisey commented on LUCENE-5369:

[~ryantxu], this fails precommit because the new files are missing svn:eol-style. I actually ran precommit because I was worried that it would fail the forbidden-apis check. It looks like that only fails on String#toUpperCase if you don't include a locale. The javadocs for Character say that Character#toUpperCase uses Unicode information, so I guess it's OK -- and precommit passed just fine after I added svn:eol-style native to the new files.
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865907#comment-13865907 ]

Benson Margulies commented on LUCENE-5388:

[~rcmuir] or [~mikemccand] I could really use some help here with TestRandomChains.

Eliminate construction over readers for Tokenizer

Key: LUCENE-5388
URL: https://issues.apache.org/jira/browse/LUCENE-5388
Project: Lucene - Core
Issue Type: Improvement
Components: core/other
Reporter: Benson Margulies

In the modern world, Tokenizers are intended to be reusable, with input supplied via #setReader. The constructors that take Reader are a vestige. Worse yet, they invite people to make mistakes in handling the reader that tangle them up with the state machine in Tokenizer. The sensible thing is to eliminate these ctors, and force setReader usage.
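The reuse pattern the issue argues for can be illustrated without Lucene at all: construct the tokenizer once with no Reader, and feed each input through setReader() before use. The sketch below is a hypothetical, stdlib-only whitespace tokenizer mirroring that lifecycle; it is not Lucene's actual Tokenizer API, and every name in it is illustrative.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Stdlib-only sketch of a reusable tokenizer whose ONLY input path is
// setReader() -- the design this issue wants to force by removing the
// Reader-taking constructors. Illustrative names, not Lucene's API.
public class ReusableTokenizerSketch {
    private Reader input;

    // Deliberately no Reader parameter in the constructor.
    void setReader(Reader reader) { this.input = reader; }

    // Split the current input on whitespace.
    List<String> tokenize() throws IOException {
        List<String> tokens = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (cur.length() > 0) { tokens.add(cur.toString()); cur.setLength(0); }
            } else {
                cur.append((char) c);
            }
        }
        if (cur.length() > 0) tokens.add(cur.toString());
        input.close();
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        ReusableTokenizerSketch t = new ReusableTokenizerSketch();
        t.setReader(new StringReader("hello world"));     // first use
        System.out.println(t.tokenize());                 // [hello, world]
        t.setReader(new StringReader("reused instance")); // reuse, no new object
        System.out.println(t.tokenize());                 // [reused, instance]
    }
}
```

Because the instance is reused across inputs, any per-use state must be reset through setReader(), which is exactly the state-machine handling the Reader-taking constructors let callers get wrong.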
[jira] [Commented] (LUCENE-5369) Add an UpperCaseFilter
[ https://issues.apache.org/jira/browse/LUCENE-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865911#comment-13865911 ]

Uwe Schindler commented on LUCENE-5369:

Yes, Character.toUpperCase is fine and locale invariant.
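A small JDK-only demonstration of the distinction discussed above: String.toUpperCase applies locale-specific rules (the classic Turkish dotted/dotless i), which is why the forbidden-apis check flags it when no Locale is given, while Character.toUpperCase uses only Unicode case data and behaves the same everywhere.

```java
import java.util.Locale;

// Why String#toUpperCase without a Locale is forbidden, and
// Character#toUpperCase is safe: under a Turkish locale, 'i' uppercases
// to dotted capital I (U+0130); the per-char method always yields 'I'.
public class CaseLocaleDemo {
    public static void main(String[] args) {
        String s = "i";
        System.out.println(s.toUpperCase(Locale.ROOT));      // I
        System.out.println(s.toUpperCase(new Locale("tr"))); // İ (U+0130)
        System.out.println(Character.toUpperCase('i'));      // I
    }
}
```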
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865913#comment-13865913 ]

Robert Muir commented on LUCENE-5388:

It's a monster...
Re: [jira] [Commented] (LUCENE-5369) Add an UpperCaseFilter
fixing now... sorry

On Wed, Jan 8, 2014 at 1:28 PM, Uwe Schindler (JIRA) j...@apache.org wrote:
> [ https://issues.apache.org/jira/browse/LUCENE-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865911#comment-13865911 ]
> Uwe Schindler commented on LUCENE-5369: --- Yes Character.toUpperCase is fine and locale invariant.
[jira] [Commented] (LUCENE-5369) Add an UpperCaseFilter
[ https://issues.apache.org/jira/browse/LUCENE-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865916#comment-13865916 ]

ASF subversion and git services commented on LUCENE-5369:

Commit 1556643 from [~ryantxu] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1556643 ]
LUCENE-5369: missing eol:style
[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.6.0_45) - Build # 8916 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/8916/
Java: 32bit/jdk1.6.0_45 -client -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 35915 lines...]
BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:459: The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:398: The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/extra-targets.xml:87: The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/extra-targets.xml:185: The following files are missing svn:eol-style (or binary svn:mime-type):
* ./lucene/analysis/common/src/java/org/apache/lucene/analysis/core/UpperCaseFilter.java
* ./lucene/analysis/common/src/java/org/apache/lucene/analysis/core/UpperCaseFilterFactory.java

Total time: 54 minutes 4 seconds
Build step 'Invoke Ant' marked build as failure
Description set: Java: 32bit/jdk1.6.0_45 -client -XX:+UseConcMarkSweepGC
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure
[jira] [Commented] (LUCENE-5369) Add an UpperCaseFilter
[ https://issues.apache.org/jira/browse/LUCENE-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865917#comment-13865917 ]

ASF subversion and git services commented on LUCENE-5369:

Commit 1556644 from [~ryantxu] in branch 'dev/trunk' [ https://svn.apache.org/r1556644 ]
LUCENE-5369: missing eol:style (merge from 4x)
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865925#comment-13865925 ]

Benson Margulies commented on LUCENE-5388:

It does something complex with the input reader in createComponents; the challenge is to move all of that to initReader so that it works. I think I'm too fried from 1000 other edits. I'll look in after dinner, but anyone who wants to grab my branch from github and pitch in is more than welcome.
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865930#comment-13865930 ]

Robert Muir commented on LUCENE-5388:

Yes: well, the CheckThatYouDidntReadAnythingReaderWrapper can likely be removed. You are removing that possibility entirely, so it should get simpler... I'll have a look at your branch tonight and try to help with some of this stuff; it's hairy.
[jira] [Commented] (SOLR-4260) Inconsistent numDocs between leader and replica
[ https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865963#comment-13865963 ]

Timothy Potter commented on SOLR-4260:

Still digging into it ... I'm curious why a batch of 34 adds on the leader gets processed as several sub-batches on the replica? Here's what I'm seeing in the logs around the documents that are missing from the replica. Basically, there are 34 docs on the leader and only 25 processed, in 4 separate batches (from my counting of the logs), on the replica. Why wouldn't it just be one for one? The docs are all roughly the same size ... and what's breaking it up? Having trouble seeing that in the logs ;-)

On the leader:

2014-01-08 12:23:21,501 [qtp604104855-17] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica1] webapp=/solr path=/update params={wt=javabin&version=2} {add=[82900 (1456683668174012416), 82901 (1456683668181352448), 82903 (1456683668181352449), 82904 (1456683668181352450), 82912 (1456683668187643904), 82913 (1456683668188692480), 82914 (1456683668188692481), 82916 (1456683668188692482), 82917 (1456683668188692483), 82918 (1456683668188692484), ... (34 adds)]} 0 34

2014-01-08 12:23:21,600 [qtp604104855-17] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica1] webapp=/solr path=/update params={wt=javabin&version=2} {add=[83002 (1456683668280967168), 83005 (1456683668286210048), 83008 (1456683668286210049), 83011 (1456683668286210050), 83012 (1456683668286210051), 83013 (1456683668287258624), 83018 (1456683668287258625), 83019 (1456683668289355776), 83023 (1456683668289355777), 83024 (1456683668289355778), ... (43 adds)]} 0 32

On the replica:

2014-01-08 12:23:21,126 [qtp604104855-22] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica2] webapp=/solr path=/update params={distrib.from=http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/update.distrib=FROMLEADER&wt=javabin&version=2} {add=[82900 (1456683668174012416), 82901 (1456683668181352448), 82903 (1456683668181352449)]} 0 1

2014-01-08 12:23:21,134 [qtp604104855-22] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica2] webapp=/solr path=/update params={distrib.from=http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/update.distrib=FROMLEADER&wt=javabin&version=2} {add=[82904 (1456683668181352450), 82912 (1456683668187643904), 82913 (1456683668188692480), 82914 (1456683668188692481), 82916 (1456683668188692482), 82917 (1456683668188692483), 82918 (1456683668188692484), 82919 (1456683668188692485), 82922 (1456683668188692486)]} 0 2

2014-01-08 12:23:21,139 [qtp604104855-22] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica2] webapp=/solr path=/update params={distrib.from=http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/update.distrib=FROMLEADER&wt=javabin&version=2} {add=[82923 (1456683668188692487), 82926 (1456683668190789632), 82928 (1456683668190789633), 82932 (1456683668190789634), 82939 (1456683668192886784), 82945 (1456683668192886785), 82946 (1456683668192886786), 82947 (1456683668193935360), 82952 (1456683668193935361), 82962 (1456683668193935362), ... (12 adds)]} 0 3

2014-01-08 12:23:21,144 [qtp604104855-22] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica2] webapp=/solr path=/update params={distrib.from=http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/update.distrib=FROMLEADER&wt=javabin&version=2} {add=[82967 (1456683668199178240)]} 0 0

9 Docs Missing here

2014-01-08 12:23:21,227 [qtp604104855-22] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica2] webapp=/solr path=/update params={distrib.from=http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/update.distrib=FROMLEADER&wt=javabin&version=2} {add=[83002 (1456683668280967168), 83005 (1456683668286210048), 83008 (1456683668286210049), 83011 (1456683668286210050), 83012 (1456683668286210051), 83013 (1456683668287258624)]} 0 2

Inconsistent numDocs between leader and replica

Key: SOLR-4260
URL: https://issues.apache.org/jira/browse/SOLR-4260
Project: Solr
Issue Type: Bug
Components: SolrCloud
Environment: 5.0.0.2013.01.04.15.31.51
Reporter: Markus Jelsma
Assignee: Mark Miller
Priority: Critical
Fix For: 5.0, 4.7
Attachments: 192.168.20.102-replica1.png, 192.168.20.104-replica2.png, clusterstate.png, demo_shard1_replicas_out_of_sync.tgz

After wiping all cores and reindexing some 3.3 million docs from Nutch using CloudSolrServer we see inconsistencies between the leader and replica for some shards. Each core holds about 3.3k documents. For some
[jira] [Comment Edited] (SOLR-4260) Inconsistent numDocs between leader and replica
[ https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865963#comment-13865963 ]

Timothy Potter edited comment on SOLR-4260 at 1/8/14 10:08 PM, appending:

Note the add log message starting with doc ID 83002 is just included here for context to show where the leader / replica got out of sync.
[jira] [Comment Edited] (SOLR-4260) Inconsistent numDocs between leader and replica
[ https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865963#comment-13865963 ] Timothy Potter edited comment on SOLR-4260 at 1/8/14 10:09 PM:
---
Still digging into it ... I'm curious why a batch of 34 adds on the leader gets processed as several sub-batches on the replica? Here's what I'm seeing in the logs around the documents that are missing from the replica. Basically, there are 34 docs on the leader and only 25 processed on the replica, in 4 separate batches (from my counting of the logs). Why wouldn't it just be one for one? The docs are all roughly the same size ... so what's breaking it up? Having trouble seeing that in the logs / code ;-)

On the leader:

2014-01-08 12:23:21,501 [qtp604104855-17] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica1] webapp=/solr path=/update params={wt=javabin&version=2} {add=[82900 (1456683668174012416), 82901 (1456683668181352448), 82903 (1456683668181352449), 82904 (1456683668181352450), 82912 (1456683668187643904), 82913 (1456683668188692480), 82914 (1456683668188692481), 82916 (1456683668188692482), 82917 (1456683668188692483), 82918 (1456683668188692484), ... (34 adds)]} 0 34

NOT ALL OF THE 34 DOCS MENTIONED ABOVE MAKE IT TO THE REPLICA

2014-01-08 12:23:21,600 [qtp604104855-17] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica1] webapp=/solr path=/update params={wt=javabin&version=2} {add=[83002 (1456683668280967168), 83005 (1456683668286210048), 83008 (1456683668286210049), 83011 (1456683668286210050), 83012 (1456683668286210051), 83013 (1456683668287258624), 83018 (1456683668287258625), 83019 (1456683668289355776), 83023 (1456683668289355777), 83024 (1456683668289355778), ... (43 adds)]} 0 32

On the replica:

2014-01-08 12:23:21,126 [qtp604104855-22] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica2] webapp=/solr path=/update params={distrib.from=http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/update&update.distrib=FROMLEADER&wt=javabin&version=2} {add=[82900 (1456683668174012416), 82901 (1456683668181352448), 82903 (1456683668181352449)]} 0 1

2014-01-08 12:23:21,134 [qtp604104855-22] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica2] webapp=/solr path=/update params={distrib.from=http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/update&update.distrib=FROMLEADER&wt=javabin&version=2} {add=[82904 (1456683668181352450), 82912 (1456683668187643904), 82913 (1456683668188692480), 82914 (1456683668188692481), 82916 (1456683668188692482), 82917 (1456683668188692483), 82918 (1456683668188692484), 82919 (1456683668188692485), 82922 (1456683668188692486)]} 0 2

2014-01-08 12:23:21,139 [qtp604104855-22] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica2] webapp=/solr path=/update params={distrib.from=http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/update&update.distrib=FROMLEADER&wt=javabin&version=2} {add=[82923 (1456683668188692487), 82926 (1456683668190789632), 82928 (1456683668190789633), 82932 (1456683668190789634), 82939 (1456683668192886784), 82945 (1456683668192886785), 82946 (1456683668192886786), 82947 (1456683668193935360), 82952 (1456683668193935361), 82962 (1456683668193935362), ... (12 adds)]} 0 3

2014-01-08 12:23:21,144 [qtp604104855-22] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica2] webapp=/solr path=/update params={distrib.from=http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/update&update.distrib=FROMLEADER&wt=javabin&version=2} {add=[82967 (1456683668199178240)]} 0 0

9 Docs Missing here

2014-01-08 12:23:21,227 [qtp604104855-22] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica2] webapp=/solr path=/update params={distrib.from=http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/update&update.distrib=FROMLEADER&wt=javabin&version=2} {add=[83002 (1456683668280967168), 83005 (1456683668286210048), 83008 (1456683668286210049), 83011 (1456683668286210050), 83012 (1456683668286210051), 83013 (1456683668287258624)]} 0 2

Note the add log message starting with doc ID 83002 is just included here for context to show where the leader / replica got out of sync.
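On the sub-batching question itself: if the leader forwards updates to replicas through a streaming client whose background runner drains whatever documents have been queued since its last poll (as SolrJ's ConcurrentUpdateSolrServer does), then one incoming batch of 34 naturally fans out into several smaller replica-side batches depending on arrival timing. The sub-batching alone would be expected behavior; only the 9 docs that never arrive at all indicate the bug. The sketch below is a hypothetical simulation of that queue-draining pattern, not Solr's actual forwarding code; the burst sizes and starting doc ID are illustrative values chosen to echo the logs above.

```python
from collections import deque

def drain_in_sub_batches(bursts):
    """Simulate a runner thread that drains the forwarding queue after
    each burst of documents arrives, the way a streaming update client
    sends whatever is queued at poll time rather than the original batch."""
    queue = deque()
    sub_batches = []
    doc_id = 82900  # illustrative sequential IDs; the real logs have gaps
    for burst in bursts:
        for _ in range(burst):
            queue.append(doc_id)
            doc_id += 1
        batch = []
        while queue:              # drain everything queued so far
            batch.append(queue.popleft())
        if batch:
            sub_batches.append(batch)
    return sub_batches

# Hypothetical arrival bursts: one leader batch of 34 docs delivered unevenly.
batches = drain_in_sub_batches([3, 9, 12, 1, 9])
print(len(batches), sum(len(b) for b in batches))  # prints: 5 34
```

The point of the sketch is that all 34 docs still arrive, just in multiple sub-batches; so the replica seeing 4 batches is not itself the problem, but the 9 missing docs are.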