[jira] [Commented] (LUCENE-5375) ToChildBlockJoinQuery becomes crazy on wrong subquery
[ https://issues.apache.org/jira/browse/LUCENE-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865200#comment-13865200 ]

Dr Oleg Savrasov commented on LUCENE-5375:
------------------------------------------

After some investigation I came to the conclusion that the existing assertions in ToChildBlockJoinScorer are not sufficient to guarantee the orthogonality of child and parent documents. To prove that, I've created a test which expects the appropriate AssertionError; please see the attached SOLR-5553-insufficient_assertions.patch. The test fails, for example, with -Dtests.seed=E8A0C61499EE8851:C5E7CB6721742C4F. I have not found any way to validate this correctly other than checking parentBits. It costs retrieving the appropriate bit from the FixedBitSet, but that seems not too expensive. The test has been reworked to be Lucene-level and to cover the cases where the existing assertions are not sufficient. Please see the attached SOLR-5553-1.patch.

ToChildBlockJoinQuery becomes crazy on wrong subquery
-----------------------------------------------------
Key: LUCENE-5375
URL: https://issues.apache.org/jira/browse/LUCENE-5375
Project: Lucene - Core
Issue Type: Bug
Components: modules/join
Affects Versions: 4.6
Reporter: Dr Oleg Savrasov
Labels: patch
Attachments: SOLR-5553.patch
Original Estimate: 24h
Remaining Estimate: 24h

If a user supplies a wrong subquery to ToParentBlockJoinQuery, it reasonably throws an IllegalStateException (http://lucene.apache.org/core/4_0_0/join/org/apache/lucene/search/join/ToParentBlockJoinQuery.html: 'The child documents must be orthogonal to the parent documents: the wrapped child query must never return a parent document.'). However, ToChildBlockJoinQuery just goes crazy silently. I want to provide a simple patch for ToChildBlockJoinQuery with an if-throw clause and a test.
See http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201311.mbox/%3cf415ce3a-ebe5-4d15-adf1-c5ead32a1...@sheffield.ac.uk%3E -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
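The parentBits check described in the comment can be sketched in plain Java. This is a simplified, hypothetical stand-in (java.util.BitSet plays the role of Lucene's FixedBitSet of parent documents; the real ToChildBlockJoinScorer applies this to doc IDs from the wrapped scorer):

```java
import java.util.BitSet;

// Sketch: validate that a doc returned by the child query is never
// itself a parent document. BitSet stands in for Lucene's FixedBitSet.
public class ChildOrthogonalityCheck {

    // Throws IllegalStateException if childDoc is flagged as a parent,
    // mirroring the if-throw clause proposed in the issue.
    static int validateChildDoc(int childDoc, BitSet parentBits) {
        if (parentBits.get(childDoc)) {
            throw new IllegalStateException(
                "Child query must not match parent docs, but got doc=" + childDoc);
        }
        return childDoc;
    }

    public static void main(String[] args) {
        BitSet parentBits = new BitSet();
        parentBits.set(3); // doc 3 is a parent (last doc of its block)

        System.out.println(validateChildDoc(1, parentBits)); // fine: 1 is a child

        try {
            validateChildDoc(3, parentBits); // a "wrong" subquery matched a parent
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The bit lookup is O(1) per candidate doc, which is why the comment argues the extra validation "seems not too expensive".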
[jira] [Updated] (LUCENE-5375) ToChildBlockJoinQuery becomes crazy on wrong subquery
[ https://issues.apache.org/jira/browse/LUCENE-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dr Oleg Savrasov updated LUCENE-5375:
-------------------------------------
Attachment: SOLR-5553-insufficient_assertions.patch

Test that shows the insufficient assertions in ToChildBlockJoinScorer. It expects an AssertionError and fails randomly, for example with -Dtests.seed=E8A0C61499EE8851:C5E7CB6721742C4F
[jira] [Updated] (LUCENE-5375) ToChildBlockJoinQuery becomes crazy on wrong subquery
[ https://issues.apache.org/jira/browse/LUCENE-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dr Oleg Savrasov updated LUCENE-5375:
-------------------------------------
Attachment: SOLR-5553-1.patch

Reworked patch
[jira] [Updated] (SOLR-5594) Enable using extended field types with prefix queries for non-default encoded strings
[ https://issues.apache.org/jira/browse/SOLR-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anshum Gupta updated SOLR-5594:
-------------------------------
Attachment: SOLR-5594.patch

Here's another patch.

Enable using extended field types with prefix queries for non-default encoded strings
-------------------------------------------------------------------------------------
Key: SOLR-5594
URL: https://issues.apache.org/jira/browse/SOLR-5594
Project: Solr
Issue Type: Improvement
Components: query parsers, Schema and Analysis
Affects Versions: 4.6
Reporter: Anshum Gupta
Assignee: Anshum Gupta
Priority: Minor
Attachments: SOLR-5594-branch_4x.patch, SOLR-5594.patch, SOLR-5594.patch

Enable users to use prefix queries with custom field types that have non-default encoding/decoding more easily, e.g. having a custom field work with base64-encoded query strings. Currently, the workaround is to override at the getRewriteMethod level. Perhaps having the prefix query also use the calling FieldType's readableToIndexed method would work better.
[jira] [Commented] (SOLR-5594) Enable using extended field types with prefix queries for non-default encoded strings
[ https://issues.apache.org/jira/browse/SOLR-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865215#comment-13865215 ]

Anshum Gupta commented on SOLR-5594:
------------------------------------

Thanks for the feedback, Hoss! I've tried to be less invasive this time around and have avoided using readableToIndexed in the base class. I've also fixed the javadocs. About using the BinaryTokenStream (extending BinaryField): I may be missing something as far as the reason is concerned, but it's tough to work with BinaryFields right now. They can't really be indexed, and the only way to index a binary field is through the hack I put in. I've removed all of that and made this more generic. I'll also open a separate JIRA to make BinaryFields better and more usable. Here's what still remains before this can be checked in:
* Fix/change dependent parser classes, e.g. PrefixQParserPlugin and SimpleQParserPlugin.
* A test that shows that things haven't changed for the existing field types as far as prefix queries are concerned. I need a couple of hours for that.
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865217#comment-13865217 ]

Michael McCandless commented on LUCENE-5388:
--------------------------------------------

+1, it's weird that the ctor takes a Reader and we also have a setReader. This is a relic from the pre-reuse days...

Eliminate construction over readers for Tokenizer
-------------------------------------------------
Key: LUCENE-5388
URL: https://issues.apache.org/jira/browse/LUCENE-5388
Project: Lucene - Core
Issue Type: Improvement
Components: core/other
Reporter: Benson Margulies

In the modern world, Tokenizers are intended to be reusable, with input supplied via #setReader. The constructors that take a Reader are a vestige. Worse yet, they invite people to make mistakes in handling the reader that tangle them up with the state machine in Tokenizer. The sensible thing is to eliminate these ctors and force setReader usage.
[jira] [Commented] (LUCENE-5387) Improve FacetConfig.build
[ https://issues.apache.org/jira/browse/LUCENE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865219#comment-13865219 ]

Michael McCandless commented on LUCENE-5387:
--------------------------------------------

+1 to fix FC.build to return Document, and improve the javadocs.

Improve FacetConfig.build
-------------------------
Key: LUCENE-5387
URL: https://issues.apache.org/jira/browse/LUCENE-5387
Project: Lucene - Core
Issue Type: Improvement
Components: modules/facet
Reporter: Shai Erera
Assignee: Shai Erera

FacetConfig.build() takes an IndexDocument and returns a new instance of IndexDocument. This forces you to write code like this:
{code}
Document doc = new Document();
doc.add(new StringField("id", someID, Store.NO));
doc.add(new FacetField("author", "john"));
IndexDocument facetDoc = facetConfig.build(doc);
indexWriter.addDocument(facetDoc);
{code}
Technically, you don't need to declare 'facetDoc', you could just {{indexWriter.addDocument(facetConfig.build(doc))}}, but it's weird:
* After you call facetConfig.build(), you cannot add any more fields to the document (since you get an IndexDoc), so you must call it last.
* Nothing suggests you *should* call facetConfig.build() at all - I can already see users trapped by the new API, thinking that adding a FacetField is enough. We should at least document on FacetField that you should call FacetConfig.build().
* Nothing suggests that you shouldn't ignore the returned IndexDoc from FC.build() - we should at least document that.
I think that if FacetConfig.build() took an IndexDocument but returned a Document, that would at least allow you to call it at whatever stage of the pipeline you want (after adding all FacetFields though)... I'll post a patch later.
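The proposed API shape (build() taking a document and returning the concrete Document type so callers can keep working with the result) can be sketched with stand-in classes. These are minimal hypothetical stubs for illustration, not the real Lucene facet API:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in stubs for the proposed shape: build() takes a Document and
// returns a Document, so the result stays usable after the call.
class Document {
    final List<String> fields = new ArrayList<>();
    Document add(String field) { fields.add(field); return this; }
}

class FacetConfig {
    // Hypothetical: rewrites facet fields into their "indexed" form,
    // copying ordinary fields through unchanged.
    Document build(Document doc) {
        Document out = new Document();
        for (String f : doc.fields) {
            out.add(f.startsWith("facet:") ? "indexed-" + f : f);
        }
        return out;
    }
}

public class FacetBuildSketch {
    public static void main(String[] args) {
        Document doc = new Document().add("id:1").add("facet:author/john");
        Document facetDoc = new FacetConfig().build(doc);
        System.out.println(facetDoc.fields); // [id:1, indexed-facet:author/john]
    }
}
```

Because build() returns a Document rather than a read-only IndexDocument, the call can be dropped inline (`indexWriter.addDocument(facetConfig.build(doc))` in the real API) without forcing it to be the very last step.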
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865222#comment-13865222 ]

Uwe Schindler commented on LUCENE-5388:
---------------------------------------

+1, but delay this until 5.0. Because there are many Tokenizers out there, we should keep backwards compatibility.
[jira] [Created] (SOLR-5619) Improve BinaryField to make it Sortable and Indexable
Anshum Gupta created SOLR-5619:
-------------------------------
Summary: Improve BinaryField to make it Sortable and Indexable
Key: SOLR-5619
URL: https://issues.apache.org/jira/browse/SOLR-5619
Project: Solr
Issue Type: Improvement
Affects Versions: 4.6
Reporter: Anshum Gupta
Assignee: Anshum Gupta

Currently, BinaryField can neither be indexed nor sorted on. Making it indexable and sortable should come in handy for a reasonable number of use cases, e.g. wanting to index binary data (which could come from anything non-text).
[jira] [Commented] (SOLR-5594) Enable using extended field types with prefix queries for non-default encoded strings
[ https://issues.apache.org/jira/browse/SOLR-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865255#comment-13865255 ]

Anshum Gupta commented on SOLR-5594:
------------------------------------

https://issues.apache.org/jira/browse/SOLR-5619
Re: Iterating BinaryDocValues
FWIW, a micro benchmark shows a 4% gain from reusing the incoming BytesRef.bytes for short binary doc values in Test2BBinaryDocValues.testVariableBinary() with mmap directory. I wonder why it doesn't read into the incoming bytes: https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401

On Wed, Jan 8, 2014 at 12:53 AM, Michael McCandless luc...@mikemccandless.com wrote:

Going sequentially should help, if the pages are not hot (in the OS's IO cache). You can also use a different DVFormat, e.g. Direct, but this holds all bytes in RAM.

Mike McCandless
http://blog.mikemccandless.com

On Tue, Jan 7, 2014 at 1:09 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:

Joel, I tried to hack it straightforwardly, but found no free gain there. The only attempt I can suggest is to try to reuse the bytes in https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401 - right now it allocates bytes every time, which, besides GC pressure, can also hurt memory access locality. Could you try fixing the memory waste and repeating the performance test? Have a good hack!

On Mon, Dec 23, 2013 at 9:51 PM, Joel Bernstein joels...@gmail.com wrote:

Hi, I'm looking for a faster way to perform large-scale docId -> BytesRef lookups for BinaryDocValues. I'm finding that I can't get the performance that I need from the random-access seek in the BinaryDocValues interface. I'm wondering if sequentially scanning the doc values would be a faster approach. I have a BitSet of matching docs, so if I sequentially moved through the doc values I could test each one against that bitset. Wondering if that approach would be faster for bulk extracts, and how tricky it would be to add an iterator to the BinaryDocValues interface?
Thanks, Joel

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
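The per-call allocation Mikhail points at, versus reusing the caller's buffer, can be illustrated with a self-contained sketch (no Lucene dependency; the stored value and method names are hypothetical stand-ins for the producer's lookup path):

```java
import java.util.Arrays;

// Sketch of the difference between allocating a fresh byte[] on every
// lookup and reusing the caller's buffer - the change the ~4% micro
// benchmark above measured. STORED stands in for one doc's binary value.
public class ReuseSketch {
    static final byte[] STORED = {1, 2, 3, 4};

    // Allocates on every call: extra GC pressure, worse memory locality.
    static byte[] readAllocating() {
        return Arrays.copyOf(STORED, STORED.length);
    }

    // Reuses the caller's buffer when it is large enough, growing it
    // only when needed (the BytesRef.bytes reuse idiom).
    static byte[] readReusing(byte[] buffer) {
        if (buffer.length < STORED.length) {
            buffer = new byte[STORED.length];
        }
        System.arraycopy(STORED, 0, buffer, 0, STORED.length);
        return buffer;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[8];
        // Same buffer instance comes back: no per-call allocation.
        System.out.println(readReusing(buf) == buf);              // true
        // A fresh array is allocated on every call.
        System.out.println(readAllocating() == readAllocating()); // false
    }
}
```

In a tight loop over millions of docs (the bulk-extract case Joel describes), avoiding one allocation per lookup is exactly where such small gains come from.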
[jira] [Commented] (LUCENE-5361) FVH throws away some boosts
[ https://issues.apache.org/jira/browse/LUCENE-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865270#comment-13865270 ]

ASF subversion and git services commented on LUCENE-5361:
---------------------------------------------------------

Commit 1556483 from [~jpountz] in branch 'dev/trunk' [ https://svn.apache.org/r1556483 ]
LUCENE-5361: Fixed handling of query boosts in FastVectorHighlighter.

FVH throws away some boosts
---------------------------
Key: LUCENE-5361
URL: https://issues.apache.org/jira/browse/LUCENE-5361
Project: Lucene - Core
Issue Type: Bug
Reporter: Nik Everett
Assignee: Adrien Grand
Priority: Minor
Attachments: LUCENE-5361.patch

The FVH's FieldQuery throws away some boosts when flattening queries, including DisjunctionMaxQuery and BooleanQuery queries. Fragments generated against queries containing boosted boolean queries don't end up sorted correctly.
[jira] [Commented] (LUCENE-5361) FVH throws away some boosts
[ https://issues.apache.org/jira/browse/LUCENE-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865274#comment-13865274 ]

ASF subversion and git services commented on LUCENE-5361:
---------------------------------------------------------

Commit 1556484 from [~jpountz] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1556484 ]
LUCENE-5361: Fixed handling of query boosts in FastVectorHighlighter.
[jira] [Comment Edited] (SOLR-5614) Boost documents using map and query functions
[ https://issues.apache.org/jira/browse/SOLR-5614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865278#comment-13865278 ]

Anca Kopetz edited comment on SOLR-5614 at 1/8/14 10:21 AM:
------------------------------------------------------------

Hi, Sorry for that, I really thought it was a bug. Thanks a lot for your answer.

was (Author: agh): Hi, Sorry for that, I really thought it was a bug. Thank a lot for your answer.

Boost documents using map and query functions
---------------------------------------------
Key: SOLR-5614
URL: https://issues.apache.org/jira/browse/SOLR-5614
Project: Solr
Issue Type: Bug
Affects Versions: 4.6
Reporter: Anca Kopetz

We want to boost documents that contain specific search terms in their fields. We tried the following simplified query:
http://localhost:8983/solr/collection1/select?q=ipod belkin&wt=xml&debugQuery=true&q.op=AND&defType=edismax&bf=map(query($qq),0,0,0,100.0)&qq={!edismax}power
And we get the following error:
org.apache.solr.search.SyntaxError: Infinite Recursion detected parsing query 'power'
And the stack trace:
ERROR - 2014-01-06 18:27:02.275; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Infinite Recursion detected parsing query 'power'
	at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:171)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
	at org.eclipse.jetty.server.Server.handle(Server.java:368)
	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
	at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
	at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
	at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.search.SyntaxError: Infinite Recursion detected parsing query 'power'
	at org.apache.solr.search.QParser.checkRecurse(QParser.java:178)
	at org.apache.solr.search.QParser.subQuery(QParser.java:200)
	at org.apache.solr.search.ExtendedDismaxQParser.getBoostFunctions(ExtendedDismaxQParser.java:437)
	at org.apache.solr.search.ExtendedDismaxQParser.parse(ExtendedDismaxQParser.java:175)
	at org.apache.solr.search.QParser.getQuery(QParser.java:142)
	at
[jira] [Commented] (SOLR-5614) Boost documents using map and query functions
[ https://issues.apache.org/jira/browse/SOLR-5614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865278#comment-13865278 ]

Anca Kopetz commented on SOLR-5614:
-----------------------------------

Hi, Sorry for that, I really thought it was a bug. Thanks a lot for your answer.
[jira] [Updated] (LUCENE-5361) FVH throws away some boosts
[ https://issues.apache.org/jira/browse/LUCENE-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-5361:
---------------------------------
Fix Version/s: 4.6.1
[jira] [Commented] (LUCENE-5361) FVH throws away some boosts
[ https://issues.apache.org/jira/browse/LUCENE-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865280#comment-13865280 ]

ASF subversion and git services commented on LUCENE-5361:
---------------------------------------------------------

Commit 1556485 from [~jpountz] in branch 'dev/branches/lucene_solr_4_6' [ https://svn.apache.org/r1556485 ]
LUCENE-5361: Fixed handling of query boosts in FastVectorHighlighter.
[jira] [Resolved] (LUCENE-5361) FVH throws away some boosts
[ https://issues.apache.org/jira/browse/LUCENE-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand resolved LUCENE-5361.
----------------------------------
Resolution: Fixed

While doing a final review, I noticed that you mistakenly modified the boost of the original query instead of the clone. I took the liberty of fixing it before committing, but please let me know if this looks wrong to you. Committed, thanks!
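The pitfall Adrien describes - mutating the original query's boost instead of the clone's while flattening - can be sketched in a self-contained way (BoostedQuery here is a hypothetical stand-in, not the Lucene Query class):

```java
// Sketch of the clone-then-modify pattern: during flattening the boost
// must be applied to a copy, never to the caller's original query.
// BoostedQuery is a hypothetical stand-in for illustration.
public class CloneBoostSketch {
    static class BoostedQuery {
        float boost;
        BoostedQuery(float boost) { this.boost = boost; }
        BoostedQuery copy() { return new BoostedQuery(boost); }
    }

    // Correct: boost the clone and leave the original untouched, so
    // re-running the flattening (or reusing the query) stays safe.
    static BoostedQuery flatten(BoostedQuery original, float parentBoost) {
        BoostedQuery clone = original.copy();
        clone.boost = original.boost * parentBoost;
        return clone;
    }

    public static void main(String[] args) {
        BoostedQuery q = new BoostedQuery(2.0f);
        BoostedQuery flat = flatten(q, 3.0f);
        System.out.println(flat.boost); // 6.0
        System.out.println(q.boost);    // 2.0 - original unchanged
    }
}
```

The buggy variant would assign `original.boost *= parentBoost`, silently corrupting the shared query object for later use - the kind of mistake the review caught.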
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865335#comment-13865335 ]

Markus Jelsma commented on SOLR-5379:
-------------------------------------

Nolan, Jan, both of you have extensive knowledge about the one you worked on, hosted on GitHub. How do the features compare? I've checked your issue list: there are no new issues coming in and a lot have already been resolved. It looks like that one is much more mature and flexible/configurable.

Query-time multi-word synonym expansion
---------------------------------------
Key: SOLR-5379
URL: https://issues.apache.org/jira/browse/SOLR-5379
Project: Solr
Issue Type: Improvement
Components: query parsers
Reporter: Tien Nguyen Manh
Labels: multi-word, queryparser, synonym
Fix For: 4.7
Attachments: quoted.patch, synonym-expander.patch

While dealing with synonyms at query time, Solr fails to work with multi-word synonyms for two reasons:
- First, the Lucene query parser tokenizes the user query by spaces, so it splits a multi-word term into separate terms before feeding them to the synonym filter; the synonym filter therefore can't recognize the multi-word term to expand it.
- Second, if the synonym filter expands into multiple terms containing a multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to handle synonyms, but MultiPhraseQuery doesn't work with terms that have different numbers of words.
For the first, we can quote all multi-word synonyms in the user query so that the Lucene query parser doesn't split them. There is a related JIRA task: https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery with an appropriate BooleanQuery of SHOULD clauses containing multiple PhraseQuery instances when the token stream has a multi-word synonym.
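The first workaround in the description - quoting multi-word synonyms before a whitespace-tokenizing query parser splits them - can be sketched without Lucene. The synonym set here is illustrative, not a real Solr configuration:

```java
import java.util.Set;

// Sketch: wrap known multi-word synonyms in quotes so a whitespace-
// tokenizing query parser keeps each one as a single phrase term.
// MULTI_WORD is an illustrative stand-in for a loaded synonym map.
public class QuoteMultiWordSynonyms {
    static final Set<String> MULTI_WORD = Set.of("new york", "hard disk");

    static String quote(String query) {
        String result = query;
        for (String syn : MULTI_WORD) {
            result = result.replace(syn, "\"" + syn + "\"");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(quote("cheap hard disk in new york"));
        // cheap "hard disk" in "new york"
    }
}
```

With the phrases quoted, the parser hands "hard disk" to the analysis chain as one unit, so the synonym filter can match and expand it (the second problem, MultiPhraseQuery vs. variable-length expansions, still needs the BooleanQuery-of-PhraseQuery change described above).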
[jira] [Commented] (LUCENE-5376) Add a demo search server
[ https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865352#comment-13865352 ] ASF subversion and git services commented on LUCENE-5376: - Commit 1556508 from [~mikemccand] in branch 'dev/branches/lucene5376' [ https://svn.apache.org/r1556508 ] LUCENE-5207, LUCENE-5376: add expressions support to lucene server, so you can define a virtual field from any JS expression and then sort by that field or retrieve its values for all hits Add a demo search server Key: LUCENE-5376 URL: https://issues.apache.org/jira/browse/LUCENE-5376 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: lucene-demo-server.tgz I think it'd be useful to have a demo search server for Lucene. Rather than being fully featured, like Solr, it would be minimal, just wrapping the existing Lucene modules to show how you can make use of these features in a server setting. The purpose is to demonstrate how one can build a minimal search server on top of APIs like SearcherManager, SearcherLifetimeManager, etc. This is also useful for finding rough edges / issues in Lucene's APIs that make building a server unnecessarily hard. I don't think it should have back compatibility promises (except Lucene's index back compatibility), so it's free to improve as Lucene's APIs change. As a starting point, I'll post what I built for the eating-your-own-dog-food search app for Lucene's & Solr's jira issues http://jirasearch.mikemccandless.com (blog: http://blog.mikemccandless.com/2013/05/eating-dog-food-with-lucene.html ). It uses Netty to expose basic indexing & searching APIs via JSON, but it's very rough (lots of nocommits).
[jira] [Commented] (LUCENE-5207) lucene expressions module
[ https://issues.apache.org/jira/browse/LUCENE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865351#comment-13865351 ] ASF subversion and git services commented on LUCENE-5207: - Commit 1556508 from [~mikemccand] in branch 'dev/branches/lucene5376' [ https://svn.apache.org/r1556508 ] LUCENE-5207, LUCENE-5376: add expressions support to lucene server, so you can define a virtual field from any JS expression and then sort by that field or retrieve its values for all hits lucene expressions module - Key: LUCENE-5207 URL: https://issues.apache.org/jira/browse/LUCENE-5207 Project: Lucene - Core Issue Type: New Feature Reporter: Ryan Ernst Fix For: 4.6, 5.0 Attachments: LUCENE-5207.patch, LUCENE-5207.patch, LUCENE-5207.patch Expressions are geared at defining an alternative ranking function (e.g. incorporating the text relevance score and other field values/ranking signals), so they are conceptually much closer to ElasticSearch's scripting support (http://www.elasticsearch.org/guide/reference/modules/scripting/) than to Solr's function queries. Some additional notes: * In addition to referring to other fields, they can also refer to other expressions, so they can be used as computed fields. * You can rank documents easily by multiple expressions (it's a SortField in the end), e.g. sort by year descending, then by some function of score, price and time ascending. * The provided javascript expression syntax is much more efficient than using a scripting engine, because it does not have dynamic typing (it compiles to .class files that work on doubles). Performance is similar to writing a custom FieldComparator yourself, but much easier to do. * We have Solr integration to contribute in the future, but this is just the standalone Lucene part as a start. Since Lucene has no schema, it includes an implementation of Bindings (SimpleBindings) that maps variable names to SortFields or other expressions.
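The module described above can be exercised with a short sketch against the Lucene 4.6 expressions API: compile a JavaScript expression, bind its variables to sort fields, and sort hits by the computed value. The "popularity" field and the formula are illustrative, not from the issue:

```java
import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

// Sketch: a "virtual field" computed from the relevance score and a numeric
// doc-value field, used as the sort key.
public class ExpressionSortSketch {
    public static Sort popularityAdjustedSort() throws Exception {
        Expression expr = JavascriptCompiler.compile("_score + ln(popularity)");
        SimpleBindings bindings = new SimpleBindings();
        bindings.add(new SortField("_score", SortField.Type.SCORE));    // relevance score
        bindings.add(new SortField("popularity", SortField.Type.LONG)); // numeric doc value
        return new Sort(expr.getSortField(bindings, true));             // true = descending
    }
}
```

Passing true to getSortField requests descending order, so higher computed values rank first; the resulting Sort is passed to IndexSearcher.search like any other.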
lucene-solr pull request: Make ZkStateReader.aliases volatile, this can be ...
GitHub user andyetitmoves opened a pull request: https://github.com/apache/lucene-solr/pull/15 Make ZkStateReader.aliases volatile, this can be updated async from a watcher You can merge this pull request into a Git repository by running: $ git pull https://github.com/andyetitmoves/lucene-solr volatile-aliases Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/15.patch commit 80d3a0f18700968f8f11854ccf07b2e59a73a594 Author: Ramkumar Aiyengar raiyen...@bloomberg.net Date: 2014-01-08T12:13:29Z Make ZkStateReader.aliases volatile, this can be updated async from a watcher Test Plan: Staged rollout Reviewers: dcollins, cpoersch, mjustice Differential Revision: https://all.phab.dev.bloomberg.com/D41601
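The fix in that pull request is a one-word change, but the reasoning generalizes: a field written by a ZooKeeper watcher thread and read by request threads needs volatile (or equivalent synchronization) for the update to be visible. A minimal stand-alone sketch; the class and method names are illustrative, not Solr's actual ZkStateReader code:

```java
// Illustrative only: a field updated asynchronously by a watcher thread must be
// volatile so request threads are guaranteed to observe the new reference.
public class AliasHolder {
    private volatile String aliases = "{}"; // volatile: safe cross-thread publication

    public void onWatchEvent(String fresh) { // called from the watcher thread
        aliases = fresh;
    }

    public String getAliases() { // called from request threads
        return aliases;
    }
}
```

Without volatile, the Java memory model permits a request thread to keep reading the stale reference indefinitely; volatile establishes the happens-before edge between the watcher's write and subsequent reads.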
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865385#comment-13865385 ] Benson Margulies commented on LUCENE-5388: -- Uwe, what's that mean practically? No PR yet? A PR just in trunk? Merging my recent doc to a 4.x branch? Eliminate construction over readers for Tokenizer - Key: LUCENE-5388 URL: https://issues.apache.org/jira/browse/LUCENE-5388 Project: Lucene - Core Issue Type: Improvement Components: core/other Reporter: Benson Margulies In the modern world, Tokenizers are intended to be reusable, with input supplied via #setReader. The constructors that take Reader are a vestige. Worse yet, they invite people to make mistakes in handling the reader that tangle them up with the state machine in Tokenizer. The sensible thing is to eliminate these ctors, and force setReader usage.
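The reuse pattern the issue advocates looks roughly like this. A hedged sketch against the Lucene 4.6 analysis API, where WhitespaceTokenizer's constructor still takes a Reader (exactly the vestige the issue wants removed):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// One Tokenizer instance, re-targeted per document via setReader(); the Reader
// passed to the constructor is dead weight once reuse is the norm.
public class TokenizerReuseSketch {
    public static List<String> tokenize(String[] docs) throws Exception {
        Tokenizer tok = new WhitespaceTokenizer(Version.LUCENE_46, new StringReader(""));
        CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
        List<String> out = new ArrayList<String>();
        for (String doc : docs) {
            tok.setReader(new StringReader(doc)); // re-target instead of re-construct
            tok.reset();
            while (tok.incrementToken()) {
                out.add(term.toString());
            }
            tok.end();
            tok.close(); // closes the current reader; the instance stays reusable
        }
        return out;
    }
}
```

The reset/incrementToken/end/close sequence is the TokenStream contract; the mistakes the issue mentions come from mixing a constructor-supplied Reader into this state machine.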
[jira] [Comment Edited] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865385#comment-13865385 ] Benson Margulies edited comment on LUCENE-5388 at 1/8/14 12:59 PM: --- Uwe, what's that mean practically? No PR yet? A PR just in trunk? Merging my recent doc to a 4.x branch? A feature branch where this goes to be merged in when the time is ripe? was (Author: bmargulies): Uwe, what's that mean practically? No PR yet? A PR just in trunk? Merging my recent doc to a 4.x branch?
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865401#comment-13865401 ] Robert Muir commented on LUCENE-5388: - Benson, he just means the patch would only be committed to trunk. I agree with this...
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865399#comment-13865399 ] Uwe Schindler commented on LUCENE-5388: --- Commit to trunk only, no backport to branch_4x.
[jira] [Commented] (SOLR-5619) Improve BinaryField to make it Sortable and Indexable
[ https://issues.apache.org/jira/browse/SOLR-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865447#comment-13865447 ] Erick Erickson commented on SOLR-5619: -- Anshum: Could you flesh this out a little more, including use-cases? I'm guessing this would be fairly limited; it makes no sense to match on, say, the entire contents of a movie that you'd indexed. Or at least it would be very difficult. Include what limitations you envision. How would queries be parsed? Binary data is just binary data that could happen to be query syntax. Base64-encode all the field values on the client side? What would the rules be for tokenizing input both at index and query time? Etc... I suspect that there are a bunch of details that make this less useful than one might think, but that's only a guess. Improve BinaryField to make it Sortable and Indexable - Key: SOLR-5619 URL: https://issues.apache.org/jira/browse/SOLR-5619 Project: Solr Issue Type: Improvement Affects Versions: 4.6 Reporter: Anshum Gupta Assignee: Anshum Gupta Currently, BinaryField can neither be indexed nor sorted on. Having it indexable and sortable should come in handy for a reasonable number of use cases, e.g. wanting to index binary data (which could come from anything non-text).
[jira] [Updated] (SOLR-5610) Support cluster-wide properties with an API called CLUSTERPROP
[ https://issues.apache.org/jira/browse/SOLR-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-5610: - Attachment: SOLR-5610.patch Support cluster-wide properties with an API called CLUSTERPROP -- Key: SOLR-5610 URL: https://issues.apache.org/jira/browse/SOLR-5610 Project: Solr Issue Type: Bug Reporter: Noble Paul Attachments: SOLR-5610.patch Add a collection admin API for cluster-wide property management. The new API would create an entry in the root as /cluster-props.json {code:javascript} { prop:val } {code} The API would work as /command=clusterprop&name=propName&value=propVal There will be a set of well-known properties which can be set or unset with this command
[jira] [Assigned] (SOLR-5610) Support cluster-wide properties with an API called CLUSTERPROP
[ https://issues.apache.org/jira/browse/SOLR-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul reassigned SOLR-5610: Assignee: Noble Paul
[jira] [Commented] (SOLR-5610) Support cluster-wide properties with an API called CLUSTERPROP
[ https://issues.apache.org/jira/browse/SOLR-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865457#comment-13865457 ] Noble Paul commented on SOLR-5610: -- plan to commit this soon
[jira] [Commented] (SOLR-5488) Fix up test failures for Analytics Component
[ https://issues.apache.org/jira/browse/SOLR-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865470#comment-13865470 ] Erick Erickson commented on SOLR-5488: -- What's our status here? There are still occasional test failures on trunk, all relating to returning null values. I'm reluctant to merge this back into 4x until they're resolved. I've been out of the country for the last month, so I'm catching back up. It's reasonable to put some temporary logging in to see if we can track this down, since the failures don't seem to be reproducible at will. If you do, please flag them with something like "remove me" so we can find them all. Do _not_ flag them with //nocommit since that'll cause problems. Let me know, Erick Fix up test failures for Analytics Component Key: SOLR-5488 URL: https://issues.apache.org/jira/browse/SOLR-5488 Project: Solr Issue Type: Bug Affects Versions: 5.0, 4.7 Reporter: Erick Erickson Assignee: Erick Erickson Attachments: SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch The analytics component has a few test failures, perhaps environment-dependent. This is just to collect the test fixes in one place for convenience when we merge back into 4.x
[jira] [Commented] (LUCENE-5376) Add a demo search server
[ https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865473#comment-13865473 ] ASF subversion and git services commented on LUCENE-5376: - Commit 1556546 from [~mikemccand] in branch 'dev/branches/lucene5376' [ https://svn.apache.org/r1556546 ] LUCENE-5376: turn on scoring when sorting by field if any of the sort fields or retrieved fields require scores, e.g. when they are an expression field that uses _score
[jira] [Commented] (SOLR-5488) Fix up test failures for Analytics Component
[ https://issues.apache.org/jira/browse/SOLR-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865483#comment-13865483 ] Mark Miller commented on SOLR-5488: --- It's not reproducible at will, but it's pretty easy to reproduce. I see it fairly frequently in my local runs.
[jira] [Commented] (LUCENE-5376) Add a demo search server
[ https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865511#comment-13865511 ] ASF subversion and git services commented on LUCENE-5376: - Commit 1556555 from [~mikemccand] in branch 'dev/branches/lucene5376' [ https://svn.apache.org/r1556555 ] LUCENE-5376: don't need to make ScoreValueSource public
[jira] [Commented] (SOLR-5488) Fix up test failures for Analytics Component
[ https://issues.apache.org/jira/browse/SOLR-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865520#comment-13865520 ] Steven Bower commented on SOLR-5488: [~markrmil...@gmail.com] what environment are you running in where you see the failures? (I've not been able to reproduce on my end)
[jira] [Commented] (SOLR-4414) MoreLikeThis on a shard finds no interesting terms if the document queried is not in that shard
[ https://issues.apache.org/jira/browse/SOLR-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865538#comment-13865538 ] Nimrod Gliksman commented on SOLR-4414: --- Hi, Does anyone know of any progress in this matter, or any workaround? Thanks! MoreLikeThis on a shard finds no interesting terms if the document queried is not in that shard --- Key: SOLR-4414 URL: https://issues.apache.org/jira/browse/SOLR-4414 Project: Solr Issue Type: Bug Components: MoreLikeThis, SolrCloud Affects Versions: 4.1 Reporter: Colin Bartolome Running a MoreLikeThis query in a cloud works only when the document being queried exists in whatever shard serves the request. If the document is not present in the shard, no interesting terms are found and, consequently, no matches are found. h5. Steps to reproduce * Edit example/solr/collection1/conf/solrconfig.xml and add this line, with the rest of the request handlers: {code:xml} <requestHandler name="/mlt" class="solr.MoreLikeThisHandler" /> {code} * Follow the [simplest SolrCloud example|http://wiki.apache.org/solr/SolrCloud#Example_A:_Simple_two_shard_cluster] to get two shards running. * Hit this URL: [http://localhost:8983/solr/collection1/mlt?mlt.fl=includes&q=id:3007WFP&mlt.match.include=false&mlt.interestingTerms=list&mlt.mindf=1&mlt.mintf=1] * Compare that output to that of this URL: [http://localhost:7574/solr/collection1/mlt?mlt.fl=includes&q=id:3007WFP&mlt.match.include=false&mlt.interestingTerms=list&mlt.mindf=1&mlt.mintf=1] The former URL will return a result and list some interesting terms. The latter URL will return no results and list no interesting terms. It will also show this odd XML element: {code:xml} <null name="response"/> {code}
[jira] [Updated] (SOLR-5541) Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters
[ https://issues.apache.org/jira/browse/SOLR-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5541: - Attachment: SOLR-5541.patch Added a new patch. In the initial patches I was piggybacking on a convenience method that was used by the test cases to manually set the elevateIds and excludeIds. This method was using an in-memory cache that would hold all the queries that were in the elevation xml file. This approach would cause a memory leak when doing large-scale query elevation, which this patch is designed to support. This patch stops piggybacking on that convenience method and adds a new method that doesn't interact with the elevation cache. Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters Key: SOLR-5541 URL: https://issues.apache.org/jira/browse/SOLR-5541 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 4.6 Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 4.7 Attachments: SOLR-5541.patch, SOLR-5541.patch, SOLR-5541.patch, SOLR-5541.patch The QueryElevationComponent currently uses an xml file to map query strings to elevateIds and excludeIds. This ticket adds the ability to pass in elevateIds and excludeIds through two new http parameters, elevateIds and excludeIds. This will allow more sophisticated business logic to be used in selecting which ids to elevate/exclude. Proposed syntax: http://localhost:8983/solr/elevate?q=*:*&elevateIds=3,4&excludeIds=6,8 The elevateIds and excludeIds point to the unique document Id.
[jira] [Comment Edited] (SOLR-5541) Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters
[ https://issues.apache.org/jira/browse/SOLR-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865557#comment-13865557 ] Joel Bernstein edited comment on SOLR-5541 at 1/8/14 3:33 PM: -- Added a new patch. In the initial patches I was piggy backing a convenience method that was used by the test cases to manually set the elevateIds and excludeIds. This method was adding to an in memory cache that holds all the queries that were in the elevation xml file. This approach would cause a memory leak when doing large scale query elavation, which this patch is designed to support. This patch stops piggy backing that convenience method and adds a new method that doesn't interact with the elevation cache. was (Author: joel.bernstein): Added a new patch. In the initial patches I was piggy backing a convenience method that was used by the test cases to manually set the elevateIds and excludeIds. This method was using an in memory cache would hold all the queries that were in the elevation xml file. This approach would cause a memory leak when doing large scale query elavation, which this patch is designed to support. This patch stops piggy backing that convenience method and adds a new method that doesn't interact with the elevation cache.
[jira] [Commented] (LUCENE-5361) FVH throws away some boosts
[ https://issues.apache.org/jira/browse/LUCENE-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865566#comment-13865566 ] Nik Everett commented on LUCENE-5361: - Wonderful! Thanks.
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865573#comment-13865573 ] Benson Margulies commented on LUCENE-5388: -- How about we start by adding ctors that don't require a reader, and do treat them as 4.x fodder?
[jira] [Comment Edited] (SOLR-4414) MoreLikeThis on a shard finds no interesting terms if the document queried is not in that shard
[ https://issues.apache.org/jira/browse/SOLR-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865538#comment-13865538 ] Nimrod Gliksman edited comment on SOLR-4414 at 1/8/14 4:04 PM: --- Hi, Does anyone knows of any progress in this matter, or any workaround? Thanks! was (Author: nim...@modusp.com): Hi, Does anyone know of any progress in this matter, or any workaround? Thanks!
[jira] [Commented] (LUCENE-5376) Add a demo search server
[ https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865579#comment-13865579 ] ASF subversion and git services commented on LUCENE-5376:
-
Commit 1556564 from [~mikemccand] in branch 'dev/branches/lucene5376' [ https://svn.apache.org/r1556564 ]
LUCENE-5376: add another expression test case; add nocommit for bcp47 cutover

Add a demo search server
Key: LUCENE-5376
URL: https://issues.apache.org/jira/browse/LUCENE-5376
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Attachments: lucene-demo-server.tgz

I think it'd be useful to have a demo search server for Lucene. Rather than being fully featured, like Solr, it would be minimal, just wrapping the existing Lucene modules to show how you can make use of these features in a server setting. The purpose is to demonstrate how one can build a minimal search server on top of APIs like SearcherManager, SearcherLifetimeManager, etc. This is also useful for finding rough edges / issues in Lucene's APIs that make building a server unnecessarily hard. I don't think it should have back compatibility promises (except Lucene's index back compatibility), so it's free to improve as Lucene's APIs change. As a starting point, I'll post what I built for the "eating your own dog food" search app for Lucene's and Solr's jira issues, http://jirasearch.mikemccandless.com (blog: http://blog.mikemccandless.com/2013/05/eating-dog-food-with-lucene.html ). It uses Netty to expose basic indexing and searching APIs via JSON, but it's very rough (lots of nocommits).
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865580#comment-13865580 ] Benson Margulies commented on LUCENE-5388:
--
setReader throws IOException, but the existing constructors don't. Analyzer 'createComponents' doesn't. How to sort this out?
[jira] [Updated] (SOLR-5480) Make MoreLikeThisHandler distributable
[ https://issues.apache.org/jira/browse/SOLR-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Molloy updated SOLR-5480:
---
Attachment: SOLR-5480.patch

(Hopefully) the last 4.6 patch version; includes more bug fixes, normalization of output, etc. Working on migrating to trunk now.

Make MoreLikeThisHandler distributable
--
Key: SOLR-5480
URL: https://issues.apache.org/jira/browse/SOLR-5480
Project: Solr
Issue Type: Improvement
Reporter: Steve Molloy
Attachments: SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch

The MoreLikeThis component, when used in the standard search handler, supports distributed searches. But the MoreLikeThisHandler itself doesn't, which prevents, say, passing in text to perform the query. I'll start looking into adapting the SearchHandler logic to the MoreLikeThisHandler. If anyone has some work done already and wants to share, or wants to contribute, any help will be welcomed.
[jira] [Commented] (SOLR-4414) MoreLikeThis on a shard finds no interesting terms if the document queried is not in that shard
[ https://issues.apache.org/jira/browse/SOLR-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865603#comment-13865603 ] Steve Molloy commented on SOLR-4414:
You may want to look at SOLR-5480.
[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry
[ https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865604#comment-13865604 ] ASF subversion and git services commented on SOLR-5615:
---
Commit 1556572 from [~markrmil...@gmail.com] in branch 'dev/trunk' [ https://svn.apache.org/r1556572 ]
SOLR-5615: Deadlock while trying to recover after a ZK session expiration.

Deadlock while trying to recover after a ZK session expiry
--
Key: SOLR-5615
URL: https://issues.apache.org/jira/browse/SOLR-5615
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 4.4, 4.5, 4.6
Reporter: Ramkumar Aiyengar
Assignee: Mark Miller
Fix For: 5.0, 4.7, 4.6.1
Attachments: SOLR-5615.patch, SOLR-5615.patch, SOLR-5615.patch

The sequence of events which might trigger this is as follows:
- The leader of a shard, say OL, has a ZK session expiry
- The new leader, NL, starts the election process
- NL, through Overseer, clears the current leader (OL) for the shard from the cluster state
- OL reconnects to ZK and calls onReconnect from the event thread (main-EventThread)
- OL marks itself down
- OL sets up watches for cluster state, and then retrieves it (with no leader for this shard)
- NL, through Overseer, updates cluster state to mark itself leader for the shard
- OL tries to register itself as a replica, and waits on the event thread until the cluster state is updated with the new leader
- ZK sends a watch update to OL, but OL is blocked on the event thread waiting for it. Oops.

This finally breaks out after the attempt to register as a replica times out after 20 mins.
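The sequence above boils down to a classic single-event-thread deadlock: the thread waits for a notification that only a task queued behind it on the same thread can deliver. A minimal, self-contained sketch (illustrative only; this is not Solr or ZooKeeper code, and the class name and 200 ms timeout are invented for the demonstration):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class EventThreadDeadlock {
    // Returns true if the "cluster state updated" signal ever fired.
    public static boolean demonstrate() throws InterruptedException {
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        CountDownLatch stateUpdated = new CountDownLatch(1);

        // Like onReconnect: runs on the event thread and waits for the
        // new leader to show up in cluster state...
        eventThread.submit(() -> {
            try {
                stateUpdated.await(); // blocks the only event thread
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
        });

        // Like the ZK watch update: it would satisfy the wait, but it is
        // queued behind the blocked task on the same single thread.
        eventThread.submit(stateUpdated::countDown);

        // Observe from outside: the signal never fires.
        boolean fired = stateUpdated.await(200, TimeUnit.MILLISECONDS);
        eventThread.shutdownNow();
        return fired;
    }
}
```

Running onReconnect off the event thread (as the linked pull request does) breaks the cycle, because the watch delivery no longer queues behind the waiting task.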
[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry
[ https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865606#comment-13865606 ] ASF subversion and git services commented on SOLR-5615:
---
Commit 1556573 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1556573 ]
SOLR-5615: Deadlock while trying to recover after a ZK session expiration.
[jira] [Commented] (SOLR-5619) Improve BinaryField to make it Sortable and Indexable
[ https://issues.apache.org/jira/browse/SOLR-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865611#comment-13865611 ] Hoss Man commented on SOLR-5619:
-
This came out of an idea I mentioned to Anshum offline when talking about his test for SOLR-5594, and some confusion I had based on what I remembered/imagined about some test code sarowe added in SOLR-5354. My suggestion was that some of the things that existed in the test-only FieldType in SOLR-5354, and in Anshum's SOLR-5594 patch, could/should probably just be improvements to BinaryField directly. SOLR-5354 added a sortable subclass of BinaryField via docvalues -- Anshum and I were just discussing the idea of using something like the existing BinaryTokenStream functionality from the lucene core test utils to promote the indexable/sortable logic up into BinaryField itself. BinaryField already requires that external systems deal with it (the stored field part) via base64 encoded strings -- so from a query standpoint, yes -- you'd do term queries against it via base64, but the sorting of the indexed terms would be just like in SOLR-5354.

bq. I suspect that there are a bunch of details that make this less useful than one might think, but that's only a guess.

It probably wouldn't be super useful to a lot of people, but it would be nice to have a FieldType that gives you some more direct access to doing things directly with BytesRef in lucene -- if nothing else it would help act as a proof of concept/methodology for how people can write custom FieldTypes that do things with their specialized binary data. If it doesn't make sense, it doesn't make sense -- but it seemed like it might be worthwhile to investigate given the instances where it's made sense to create weird little subclasses in various tests.

Improve BinaryField to make it Sortable and Indexable
-
Key: SOLR-5619
URL: https://issues.apache.org/jira/browse/SOLR-5619
Project: Solr
Issue Type: Improvement
Affects Versions: 4.6
Reporter: Anshum Gupta
Assignee: Anshum Gupta

Currently, BinaryField can neither be indexed nor sorted on. Having it indexable and sortable should come in handy for a reasonable number of use cases, e.g. wanting to index binary data (which could come from anything non-text).
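The base64 contract Hoss describes can be sketched concretely. This is an illustrative helper, not a real Solr API; the method names and the `payload` field name are invented. It only shows what exchanging binary terms as base64 strings on the client side would look like:

```java
import java.util.Base64;

// Invented helper sketching the client side of a base64-encoded binary field.
public class BinaryTermSketch {
    // Encode raw bytes the way a client would before indexing or querying.
    public static String toBase64(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // A hypothetical term-query string against a binary field.
    public static String termQuery(String field, byte[] raw) {
        return field + ":" + toBase64(raw);
    }

    // Decode a stored value handed back as base64.
    public static byte[] fromBase64(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }
}
```

Sorting of the indexed terms would then be plain byte-wise BytesRef order, as in SOLR-5354; the base64 round-trip only affects how clients exchange the values.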
[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry
[ https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865612#comment-13865612 ] Ramkumar Aiyengar commented on SOLR-5615:
-
Had noticed a separate race while initially investigating running onReconnect in a separate thread; see https://github.com/apache/lucene-solr/pull/15 for that.
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865620#comment-13865620 ] Robert Muir commented on LUCENE-5388:
-
{quote} How about we start by adding ctors that don't require a reader, and do treat them as 4.x fodder? {quote}
I'd prefer not, because then there needs to be very sophisticated backwards compat to know which one to call, and subclassing gets complicated. I would really prefer we just choose to fix the API, either 1) 5.0-only or 2) break it in a 4.x release. From my perspective, Benson would probably be the one most impacted by such a 4.x break. So if he really wants to do this, I have no problem.
{quote} setReader throws IOException, but the existing constructors don't. Analyzer 'createComponents' doesn't. How to sort this out? {quote}
I don't see the problem. I think createComponents doesn't need to throw an exception: instead the logic of Analyzer.tokenStream changes slightly, to call components.setReader(r) in both cases of the if-then-else. Make sense?
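The flow Robert describes can be sketched without Lucene itself. The classes below are invented stand-ins (SketchTokenizer, SketchAnalyzer), not the real Tokenizer/Analyzer API; they only illustrate the shape: no Reader in any constructor, createComponents takes no Reader, and tokenStream calls setReader in both the fresh-components and reused-components cases.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Stand-in for Tokenizer: input arrives only via setReader, never a ctor.
abstract class SketchTokenizer {
    protected Reader input;

    public final void setReader(Reader reader) throws IOException {
        this.input = reader;
        reset(); // ready the state machine for the new input
    }

    protected abstract void reset() throws IOException;
    public abstract String next() throws IOException;
}

// Trivial tokenizer that emits the whole input as one token.
class WholeInputTokenizer extends SketchTokenizer {
    private boolean done;

    @Override protected void reset() { done = false; }

    @Override public String next() throws IOException {
        if (done || input == null) return null;
        done = true;
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) sb.append((char) c);
        return sb.toString();
    }
}

// Stand-in for Analyzer: components are cached and reused across calls.
class SketchAnalyzer {
    private SketchTokenizer cached;

    // createComponents no longer receives a Reader at all.
    protected SketchTokenizer createComponents() {
        return new WholeInputTokenizer();
    }

    // setReader is called in both branches of the reuse check, so it is the
    // single place the (possibly IOException-throwing) wiring happens.
    public SketchTokenizer tokenStream(String text) throws IOException {
        if (cached == null) {
            cached = createComponents();
        }
        cached.setReader(new StringReader(text));
        return cached;
    }
}
```

Calling tokenStream twice on the same analyzer returns the same tokenizer instance with fresh input, which is exactly the reuse the real API is after.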
[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry
[ https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865621#comment-13865621 ] Mark Miller commented on SOLR-5615:
---
Thanks, we should open a new JIRA issue for that.
[jira] [Created] (SOLR-5620) Race condition while setting ZkStateReader.aliases
Ramkumar Aiyengar created SOLR-5620:
---
Summary: Race condition while setting ZkStateReader.aliases
Key: SOLR-5620
URL: https://issues.apache.org/jira/browse/SOLR-5620
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 4.6
Reporter: Ramkumar Aiyengar
Priority: Minor

Noticed while working on SOLR-5615; see https://github.com/apache/lucene-solr/pull/15 for a patch.
[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry
[ https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865627#comment-13865627 ] Ramkumar Aiyengar commented on SOLR-5615:
-
SOLR-5620
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865626#comment-13865626 ] Benson Margulies commented on LUCENE-5388:
--
OK, I see. If we don't do compatibility, then no one calls setReader in createComponents, and all is well. OK, I'm proceeding.
[jira] [Commented] (SOLR-5620) Race condition while setting ZkStateReader.aliases
[ https://issues.apache.org/jira/browse/SOLR-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865625#comment-13865625 ] Ramkumar Aiyengar commented on SOLR-5620:
-
I added a synchronized section around updateAliases in my second commit, just for consistency with the Watcher code, though I am unsure why it's even present in the Watcher code in the first place.
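The general hazard being guarded against can be illustrated with a tiny holder class. This is an invented sketch, not ZkStateReader's actual fields or logic (in particular, the version check is an assumption about how one might make stale writes harmless): when both a watcher thread and an updateAliases-style caller publish a shared snapshot, synchronization plus an ordering check keeps an older snapshot from clobbering a newer one.

```java
// Invented sketch of guarding a shared snapshot against stale overwrites.
public class AliasesHolder {
    private int version = -1;
    private String aliases = null;

    // Both the watcher thread and direct updaters go through this
    // synchronized method; the version check makes stale writes no-ops.
    public synchronized void setIfNewer(int newVersion, String newAliases) {
        if (newVersion > version) {
            version = newVersion;
            aliases = newAliases;
        }
    }

    public synchronized String getAliases() {
        return aliases;
    }
}
```

Without the synchronized block and the ordering check, two racing writers could interleave so that the field ends up holding whichever snapshot happened to be written last, regardless of which was fetched most recently.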
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865634#comment-13865634 ] Benson Margulies commented on LUCENE-5388:
--
Why does the reader get passed to createComponents in this model? Should that param go away?
[jira] [Commented] (SOLR-5619) Improve BinaryField to make it Sortable and Indexable
[ https://issues.apache.org/jira/browse/SOLR-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865640#comment-13865640 ] Erick Erickson commented on SOLR-5619:
--
Right, I'm not arguing that it never makes sense, just that we should be clear during implementation what the expectations/use-cases are. For instance, I can see defining a tokenizer that splits the binary input into tokens based on a base64 encoding of some special characters that have meaning only to the specific app trying to index/search binary values. I suppose the real work will be coaching people on what searching binary values really means. It wouldn't be, for instance, something you'd do face recognition with :)
[jira] [Commented] (SOLR-4414) MoreLikeThis on a shard finds no interesting terms if the document queried is not in that shard
[ https://issues.apache.org/jira/browse/SOLR-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865651#comment-13865651 ] Nimrod Gliksman commented on SOLR-4414:
---
Thanks
[jira] [Commented] (SOLR-5488) Fix up test failures for Analytics Component
[ https://issues.apache.org/jira/browse/SOLR-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865657#comment-13865657 ] Mark Miller commented on SOLR-5488:
---
Linux. They occur pretty frequently in the jenkins runs too, but I have not paid attention to whether it's just a specific OS. I assumed it was a timing issue. Those can be hard to duplicate sometimes, either due to raw hardware speed differences or other differences leading to different tests running at the same time, etc.

Fix up test failures for Analytics Component
Key: SOLR-5488
URL: https://issues.apache.org/jira/browse/SOLR-5488
Project: Solr
Issue Type: Bug
Affects Versions: 5.0, 4.7
Reporter: Erick Erickson
Assignee: Erick Erickson
Attachments: SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch

The analytics component has a few test failures, perhaps environment-dependent. This is just to collect the test fixes in one place for convenience when we merge back into 4.x.
lucene-solr pull request: LUCENE-5388: code and highlighting changes to rem...
GitHub user benson-basis opened a pull request: https://github.com/apache/lucene-solr/pull/16

LUCENE-5388: code and highlighting changes to remove Reader from Tokenizers

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/benson-basis/lucene-solr lucene-5388-tokenizer-ctor-changes
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/lucene-solr/pull/16.patch

commit 9c0bc03d6d233ce5b7595bdca8acd52cf40c6337
Author: Benson Margulies ben...@basistech.com
Date: 2014-01-08T17:17:18Z
LUCENE-5388: code and highlighting changes to remove Reader from Tokenizer ctors.
Nested Grouping / Field Collapsing
Has anyone got the latest updates for https://issues.apache.org/jira/browse/SOLR-2553 ? I am trying to take a look at the implementation and see how complex this is to achieve. If someone else has looked into it earlier, could you please share your thoughts/comments?

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865661#comment-13865661 ] Benson Margulies commented on LUCENE-5388:
--
https://github.com/apache/lucene-solr/pull/16 is available for your reading pleasure, to see what these changes look like.
[jira] [Commented] (LUCENE-5354) Blended score in AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865659#comment-13865659 ] Michael McCandless commented on LUCENE-5354: New patch looks great, thanks Remi! I'm worried about how costly iterating over term vectors is going to be ... are you planning to run the performance test? If not, I can. It might be better to open up a protected method to convert the smallest position to the coefficient? The default impl can do the switch based on the BlenderType enum... but apps may want to control how the score is boosted by position. Blended score in AnalyzingInfixSuggester Key: LUCENE-5354 URL: https://issues.apache.org/jira/browse/LUCENE-5354 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Affects Versions: 4.4 Reporter: Remi Melisson Priority: Minor Labels: suggester Attachments: LUCENE-5354.patch, LUCENE-5354_2.patch I'm working on a custom suggester derived from the AnalyzingInfix. I require what is called a blended score (//TODO ln.399 in AnalyzingInfixSuggester) to transform the suggestion weights depending on the position of the searched term(s) in the text. Right now, I'm using an easy solution : If I want 10 suggestions, then I search against the current ordered index for the 100 first results and transform the weight : bq. a) by using the term position in the text (found with TermVector and DocsAndPositionsEnum) or bq. b) by multiplying the weight by the score of a SpanQuery that I add when searching and return the updated 10 most weighted suggestions. Since we usually don't need to suggest so many things, the bigger search + rescoring overhead is not so significant but I agree that this is not the most elegant solution. We could include this factor (here the position of the term) directly into the index. So, I can contribute to this if you think it's worth adding it. 
Do you think I should tweak AnalyzingInfixSuggester, subclass it, or create a dedicated class?
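The protected-method idea above can be sketched roughly like this. Everything here is a hypothetical stand-in, not the actual AnalyzingInfixSuggester API: the BlenderType values, the method names, and the decay constants are all assumptions made for illustration.

```java
// Sketch only: a pluggable hook that converts the smallest term position
// into a coefficient used to blend the suggestion weight. BlenderType and
// all names here are hypothetical, not the real Lucene API.
public class BlendedCoefficient {
    enum BlenderType { POSITION_LINEAR, POSITION_RECIPROCAL }

    // Smaller positions (the term occurs earlier) yield larger coefficients.
    static double coefficient(BlenderType type, int position) {
        switch (type) {
            case POSITION_LINEAR:
                // decay 10% per position, never below zero (arbitrary choice)
                return Math.max(0.0, 1.0 - 0.10 * position);
            case POSITION_RECIPROCAL:
                return 1.0 / (position + 1);
            default:
                return 1.0;
        }
    }

    // Apply the coefficient to a raw suggestion weight.
    static long blend(long weight, BlenderType type, int position) {
        return (long) (weight * coefficient(type, position));
    }
}
```

An app could then override the coefficient method to implement its own boost curve without touching the term-vector iteration at all.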
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865667#comment-13865667 ] Benson Margulies commented on LUCENE-5388: -- [~rcmuir] Next frontier is TokenizerFactory. Do we change #create to not take a reader, or do we add 'throws IOException'? Based on the comments above, I'd think we take out the reader. [~mikemccand] I would love help. If you tell me your github id, I'll add you to my repo, and you can take up some of the ton of editing.
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865675#comment-13865675 ] Robert Muir commented on LUCENE-5388: - {quote} Why does the reader get passed to createComponents in this model? Should that param go away? {quote} Yes: please nuke it!
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865677#comment-13865677 ] Robert Muir commented on LUCENE-5388: - {quote} Do we change #create to not take a reader, or do we add 'throws IOException'? Based on comments above, I'd think we take out the reader. {quote} Yes, please. It's not like a factory can do anything fancy with it anyway: it's only called the first time, and subsequent readers are supplied via setReader. So this is just more of the same: please nuke it!
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865678#comment-13865678 ] Uwe Schindler commented on LUCENE-5388: --- Yes, createComponents should no longer get a Reader, too! Same for factories. The factory just creates the instance; setting the reader is up to the consumer.
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865681#comment-13865681 ] Uwe Schindler commented on LUCENE-5388: --- The cool thing is: in Analyzer we may simplify things like the initReader() protected method. We should also look at those APIs. Most of the code in Analyzer is there to work around the ctor / setReader stuff.
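The construct-once, setReader-per-input lifecycle the thread converges on can be illustrated with a plain-Java stand-in. This is not the actual org.apache.lucene.analysis.Tokenizer (which also has reset/close state-machine rules); it only shows the shape of the contract being argued for:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Illustrative stand-in for the proposed lifecycle: no Reader in the
// constructor, each input supplied later via setReader(). Not the real
// Lucene Tokenizer class.
public class ReusableTokenizerSketch {
    private Reader input;

    public ReusableTokenizerSketch() {
        // Nothing to do: there is no reader to mishandle at construction time.
    }

    public void setReader(Reader reader) {
        this.input = reader;
    }

    // Returns the next whitespace-delimited token, or null at end of input.
    public String nextToken() {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = input.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (sb.length() > 0) break; // token complete
                } else {
                    sb.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.length() == 0 ? null : sb.toString();
    }
}
```

The same instance is then reused for every document: call setReader with a fresh Reader, consume tokens, repeat — which is exactly why a factory's create method no longer needs a Reader parameter.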
[jira] [Updated] (SOLR-5618) Reproducible failure from TestFiltering.testRandomFiltering
[ https://issues.apache.org/jira/browse/SOLR-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-5618: --- Attachment: SOLR-5618.patch I've recreated the failure conditions in a non-randomized test. See testHoSanity in the updated patch for the details, but the bottom line is: after building up two sets of SolrParams (match_1 and match_0), we have a situation where the following test code fails on the last line (match_1 gets a numFound==0)... {code} // 1 then 0 assertJQ(req(match_1), "/response/numFound==1"); assertJQ(req(match_0), "/response/numFound==0"); // clear caches assertU(commit()); // 0 then 1 assertJQ(req(match_0), "/response/numFound==0"); assertJQ(req(match_1), "/response/numFound==1"); {code} ...which definitely smells like a caching bug. Perhaps this is a Query.equals() problem with one of the query classes used in the test? I'll investigate a bit more later today -- starting with trying to simplify the params to the barest case that still fails. Reproducible failure from TestFiltering.testRandomFiltering --- Key: SOLR-5618 URL: https://issues.apache.org/jira/browse/SOLR-5618 Project: Solr Issue Type: Bug Reporter: Hoss Man Attachments: SOLR-5618.patch, SOLR-5618.patch uwe's jenkins found this in java8... http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9004/consoleText {noformat} [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestFiltering -Dtests.method=testRandomFiltering -Dtests.seed=C22042E80957AE3E -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=ar_LY -Dtests.timezone=Asia/Katmandu -Dtests.file.encoding=UTF-8 [junit4] FAILURE 16.9s J1 | TestFiltering.testRandomFiltering [junit4] Throwable #1: java.lang.AssertionError: FAILURE: iiter=11 qiter=336 request=[q, {!frange v=val_i l=0 u=1 cost=139 tag=t}, fq, {!frange v=val_i l=0 u=1}, fq, {! cost=92}-_query_:{!frange v=val_i l=1 u=1}, fq, {!frange v=val_i l=0 u=1 cache=true tag=t}, fq, {! cache=true tag=t}-_query_:{!frange v=val_i l=1 u=1}] [junit4] at __randomizedtesting.SeedInfo.seed([C22042E80957AE3E:DD43E12DEC70EE37]:0) [junit4] at org.apache.solr.search.TestFiltering.testRandomFiltering(TestFiltering.java:327) {noformat} The seed fails consistently for me on trunk using java7, and on 4x using both java7 and java6 - details to follow in comment.
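A Query.equals() bug would explain the symptom because Solr's filter cache is keyed on the query object itself. A minimal self-contained demonstration of that failure mode follows; RangeQuery here is a hypothetical stand-in written for illustration, not the Solr class under suspicion:

```java
import java.util.HashMap;
import java.util.Map;

// Demo: a cache keyed on equals()/hashCode(). If equals() is too broad
// (here it ignores the upper bound), a semantically different query hits
// the wrong cache entry and returns a stale result.
public class BrokenEqualsDemo {
    static class RangeQuery {
        final int lower, upper;
        RangeQuery(int lower, int upper) { this.lower = lower; this.upper = upper; }
        @Override public boolean equals(Object o) {
            // BUG: 'upper' is never compared
            return o instanceof RangeQuery && ((RangeQuery) o).lower == lower;
        }
        @Override public int hashCode() { return Integer.hashCode(lower); }
    }

    static final Map<RangeQuery, Integer> filterCache = new HashMap<>();

    // Pretends to execute the query, consulting the cache first.
    static int numFound(RangeQuery q, int uncachedResult) {
        return filterCache.computeIfAbsent(q, k -> uncachedResult);
    }
}
```

With this bug, a first query over [0,1] caches its count, and a later query over [0,5] silently returns that cached count instead of its own, which matches the "works until the cache is involved" pattern in the test.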
Re: The Old Git Discussion
+1 David and Mark. I too, like Lajos, have - very sadly - not contributed as much as I'd want to Lucene, but having followed this thread with interest for a while, I hope my contribution is well received. I do sympathize with all the problems which have been highlighted about Git, as I had the same impression 3 years ago when all our projects (Hibernate) were moved to Git, and I was the skeptical one back then. I suffered from it for a couple of weeks, while I was pointlessly trying to map my previous SVN workflow onto Git... until I realized that that was the main crux of my pain with it. I really do have to admit I was just stubborn and had grown into bad habits; I'm extremely happy we moved now... and yes - no offence - but from an outsider's view you all look like you're carving code on a stone wall with stone axes. Sparing you all the details of what I did wrong and how exactly it should be used, the point is really the huge flexibility and the better model for the problem it solves. On this thread I've seen several problems being pointed out about git, but while I'd be happy to chat about each single one, for the sake of brevity my impression is just confusion from people who are trying to use it as if it were an alias for svn. To put it boldly, you're missing the point :-) If you need details, feel free to ask here or contact me on IRC: I'm afraid my email is too long already. Would be good to see some negative points from someone who actually used it for a significant time. For my part, for example, I don't like the complexity of handling merges; but then again we also use fast-forward only; considering that, maybe I've never actually understood how a merge should be done - as I've never practiced it. Please take it as an example of how you don't need to learn all its details and still get huge benefits from it: in 47 releases, over 3 years, ~100 contributors have been happily collaborating; we developed a workflow which suits us best and never ever needed to do a merge.
And yes, I confirm it feels very odd to an occasional contributor that you guys still work by attaching patch files to JIRA. - Sanne On 8 January 2014 00:45, David Smiley (@MITRE.org) dsmi...@mitre.org wrote: +1, Mark. Git isn't perfect; I sympathize with the annoyances pointed out by Rob et al. But I think we would be better off for it -- a net win considering the upsides. In the end I'd love to track changes via branches (which includes forks people make to add changes), not with attaching patch files to an issue tracker. The way we do things here sucks for collaboration and it's a higher bar for people to get involved than it can and should be. ~ David Mark Miller-3 wrote I don't really buy the fad argument, but as I've said, I'm willing to wait a little longer for others to catch on. I try and follow the stats and reports and articles on this pretty closely. As I mentioned early in the thread, by all appearances, the shift from SVN to GIT looks much like the shift from CVS to SVN. This was not a fad change, nor is the next mass movement likely to be. Just like no one starts a project on CVS anymore, we are almost already to the point where new projects start exclusively on GIT - especially open source. I'm happy to sit back and watch the trend continue though. The number of GIT users in the committee and among the committers only grows every time the discussion comes up. If this was 2009, 2010, 2011 ... who knows, perhaps I would buy some fad argument. But it just doesn't jive in 2014. - Mark - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/The-Old-Git-Discussion-tp4109193p4110109.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
Re: The Old Git Discussion
On Wed, Jan 8, 2014 at 1:25 PM, Sanne Grinovero sanne.grinov...@gmail.com wrote: sake of brevity my impression is just confusion from people who are trying to use it as if it were an alias for svn. To put it boldly, you're missing the point :-) Would be good to see some negative points from someone who actually used it for a significant time. And these are two typical quotes from git fanatics: they assume that the person complaining about their shitty tool knows nothing about distributed version control and is an idiot, etc, etc. Again, I've been using git on my day job for over a year. I've also used mercurial, which was extremely intuitive and usable (I think I came up to speed on it almost immediately). The problem is not that I haven't used git, and it's not that I'm missing the point of distributed VC. Git is just done really badly.
[jira] [Updated] (SOLR-4260) Inconsistent numDocs between leader and replica
[ https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Potter updated SOLR-4260: - Attachment: demo_shard1_replicas_out_of_sync.tgz While doing some other testing of SolrCloud (branch4x - 4.7-SNAPSHOT rev. 1556055), I hit this issue and here's the kicker ... there were no errors in my replica's log, the tlogs are identical, and there was no significant GC activity during the time when the replica got out of sync with the leader. I'm attaching the data directories (index + tlog) for both replicas (demo_shard1_replica1 [leader], and demo_shard1_replica2) and their log files. When I do a doc-by-doc comparison of the two indexes, here's the result: finished querying replica1, found 33537 documents (33537) finished querying replica2, found 33528 documents
Doc [82995] not found in replica2: <doc boost="1.0"><field name="id">82995</field><field name="string_s">test</field><field name="int_i">-274468088</field><field name="float_f">0.90338105</field><field name="double_d">0.6949391474539932</field><field name="text_en">this is a test</field><field name="_version_">1456683668206518274</field></doc>
Doc [82997] not found in replica2: <doc boost="1.0"><field name="id">82997</field><field name="string_s">test</field><field name="int_i">301737117</field><field name="float_f">0.6746266</field><field name="double_d">0.26034065188918565</field><field name="text_en">this is a test</field><field name="_version_">1456683668206518276</field></doc>
Doc [82996] not found in replica2: <doc boost="1.0"><field name="id">82996</field><field name="string_s">test</field><field name="int_i">-1768315588</field><field name="float_f">0.6641093</field><field name="double_d">0.23708033183534993</field><field name="text_en">this is a test</field><field name="_version_">1456683668206518275</field></doc>
Doc [82991] not found in replica2: <doc boost="1.0"><field name="id">82991</field><field name="string_s">test</field><field name="int_i">-2057280061</field><field name="float_f">0.27617514</field><field name="double_d">0.7885214691953506</field><field name="text_en">this is a test</field><field name="_version_">1456683668206518273</field></doc>
Doc [82987] not found in replica2: <doc boost="1.0"><field name="id">82987</field><field name="string_s">test</field><field name="int_i">1051456320</field><field name="float_f">0.51863414</field><field name="double_d">0.7881255443862878</field><field name="text_en">this is a test</field><field name="_version_">1456683668206518272</field></doc>
Doc [82986] not found in replica2: <doc boost="1.0"><field name="id">82986</field><field name="string_s">test</field><field name="int_i">-1356807889</field><field name="float_f">0.2762279</field><field name="double_d">0.003657816979820372</field><field name="text_en">this is a test</field><field name="_version_">1456683668205469699</field></doc>
Doc [82984] not found in replica2: <doc boost="1.0"><field name="id">82984</field><field name="string_s">test</field><field name="int_i">732678870</field><field name="float_f">0.31199205</field><field name="double_d">0.9848865821766198</field><field name="text_en">this is a test</field><field name="_version_">1456683668205469698</field></doc>
Doc [82970] not found in replica2: <doc boost="1.0"><field name="id">82970</field><field name="string_s">test</field><field name="int_i">283693979</field><field name="float_f">0.6119651</field><field name="double_d">0.04142006867388914</field><field name="text_en">this is a test</field><field name="_version_">1456683668205469696</field></doc>
Doc [82973] not found in replica2: <doc boost="1.0"><field name="id">82973</field><field name="string_s">test</field><field name="int_i">1343103920</field><field name="float_f">0.5855809</field><field name="double_d">0.6575904716584224</field><field name="text_en">this is a test</field><field name="_version_">1456683668205469697</field></doc>
No amount of committing or reloading of these cores helps. Also, restarting replica2 doesn't lead to it being in-sync either, most likely because the tlog is identical to the leader's? Here are the log messages on replica2 after restarting it.
2014-01-08 13:28:20,112 [searcherExecutor-5-thread-1] INFO solr.core.SolrCore - [demo_shard1_replica2] Registered new searcher Searcher@4345de8a main{StandardDirectoryReader(segments_e:38:nrt _d(4.7):C26791 _e(4.7):C3356 _f(4.7):C3381)} 2014-01-08 13:28:21,298 [RecoveryThread] INFO solr.cloud.RecoveryStrategy - Attempting to PeerSync from http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/ core=demo_shard1_replica2 - recoveringAfterStartup=true 2014-01-08 13:28:21,302 [RecoveryThread] INFO solr.update.PeerSync - PeerSync: core=demo_shard1_replica2 url=http://ec2-54-209-97-145.compute-1.amazonaws.com:8984/solr START replicas=[http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/] nUpdates=100 2014-01-08 13:28:21,330 [RecoveryThread] INFO solr.update.PeerSync - PeerSync: core=demo_shard1_replica2 url=http://ec2-54-209-97-145.compute-1.amazonaws.com:8984/solr Received 99 versions from ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/ 2014-01-08 13:28:21,331 [RecoveryThread] INFO solr.update.PeerSync - PeerSync:
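The doc-by-doc comparison described above amounts to diffing the id -> _version_ maps of the two replicas. A minimal sketch of that check follows; the in-memory maps stand in for actually querying each core for every id and its _version_ field:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch: report ids present on the leader but missing from the replica.
// In the real check the two maps would be built by querying each core.
public class ReplicaDiff {
    static Set<String> missingFromReplica(Map<String, Long> leader, Map<String, Long> replica) {
        Set<String> missing = new TreeSet<>(); // sorted for stable reporting
        for (String id : leader.keySet()) {
            if (!replica.containsKey(id)) {
                missing.add(id);
            }
        }
        return missing;
    }
}
```

Comparing versions as well (not just presence) would additionally catch the case where both replicas have a doc but one holds a stale update.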
[jira] [Commented] (SOLR-4260) Inconsistent numDocs between leader and replica
[ https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865742#comment-13865742 ] Mark Miller commented on SOLR-4260: --- I've noticed something like this too - but nothing I could reproduce easily. I imagine it's likely an issue in SolrCmdDistributor. Inconsistent numDocs between leader and replica --- Key: SOLR-4260 URL: https://issues.apache.org/jira/browse/SOLR-4260 Project: Solr Issue Type: Bug Components: SolrCloud Environment: 5.0.0.2013.01.04.15.31.51 Reporter: Markus Jelsma Assignee: Mark Miller Priority: Critical Fix For: 5.0, 4.7 Attachments: 192.168.20.102-replica1.png, 192.168.20.104-replica2.png, clusterstate.png, demo_shard1_replicas_out_of_sync.tgz After wiping all cores and reindexing some 3.3 million docs from Nutch using CloudSolrServer we see inconsistencies between the leader and replica for some shards. Each core holds about 3.3k documents. For some reason 5 out of 10 shards have a small deviation in the number of documents. The leader and slave deviate by roughly 10-20 documents, not more. Results hopping ranks in the result set for identical queries got my attention; there were small IDF differences for exactly the same record, causing a record to shift positions in the result set. During those tests no records were indexed. Consecutive catch-all queries also return different numbers for numDocs. We're running a 10 node test cluster with 10 shards and a replication factor of two, and frequently reindex using a fresh build from trunk. I've not seen this issue for quite some time until a few days ago.
[jira] [Commented] (SOLR-4260) Inconsistent numDocs between leader and replica
[ https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865744#comment-13865744 ] Mark Miller commented on SOLR-4260: --- Although that doesn't really jive with the transaction logs being identical... hmm...
[jira] [Commented] (SOLR-5604) Remove deprecations caused by httpclient 4.3.x upgrade
[ https://issues.apache.org/jira/browse/SOLR-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865753#comment-13865753 ] Shawn Heisey commented on SOLR-5604: [~olegk] replied to my httpclient-users post. Thank you! Some general thoughts regarding how SolrServer implementations use HttpClient: Currently we have a number of methods that change HttpClient settings after a SolrServer is created. I think there are two choices for dealing with this, and the actual solution might be a blend of both: 1) Deprecate those methods and require users to create their own HttpClient object if they want to change those settings. 2) Have those methods change fields in the class which are then used to change settings when HC request objects are built. I think the httpclient changes for Lucene need to be split off into a separate issue under the LUCENE project. Remove deprecations caused by httpclient 4.3.x upgrade -- Key: SOLR-5604 URL: https://issues.apache.org/jira/browse/SOLR-5604 Project: Solr Issue Type: Improvement Affects Versions: 4.7 Reporter: Shawn Heisey Fix For: 5.0, 4.7 Attachments: SOLR-5604-4x-just-lucene.patch SOLR-5590 upgraded httpclient in Solr and Lucene to version 4.3.x. This version deprecates a LOT of classes and methods, recommending that they all be replaced with various methods from the HttpClientBuilder class.
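Choice 2 above can be sketched like this. The class and method names are illustrative stand-ins for the SolrJ and HttpClient 4.3 types, not the real API; the point is only that setters record values which are applied per request, so no shared client object is ever mutated:

```java
// Sketch of choice 2: setters store values in fields, and the stored values
// are applied only when each request's config is built, so the shared
// HttpClient itself is never mutated. Names are illustrative stand-ins.
public class PerRequestConfigSketch {
    private volatile int socketTimeoutMillis;
    private volatile int connectTimeoutMillis;

    // Keeps the familiar setter API, but only records the value.
    public void setSoTimeout(int millis) { this.socketTimeoutMillis = millis; }
    public void setConnectionTimeout(int millis) { this.connectTimeoutMillis = millis; }

    // Called once per request, analogous to building a per-request
    // RequestConfig in HttpClient 4.3 instead of reconfiguring the client.
    public String buildRequestConfig() {
        return "socketTimeout=" + socketTimeoutMillis
             + ",connectTimeout=" + connectTimeoutMillis;
    }
}
```

The fields are volatile so a setter call from one thread is visible to request-building threads, mirroring the thread-safety expectation on the existing setters.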
[jira] [Commented] (SOLR-4260) Inconsistent numDocs between leader and replica
[ https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865769#comment-13865769 ] Mark Miller commented on SOLR-4260: --- No, wait, it could jive. We only check the last 99 docs on peer sync - if a bunch of docs just didn't show up well before that, it wouldn't be detected by peer sync. I still think SolrCmdDistributor is the first place to look.
[jira] [Commented] (SOLR-5619) Improve BinaryField to make it Sortable and Indexable
[ https://issues.apache.org/jira/browse/SOLR-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865787#comment-13865787 ] Anshum Gupta commented on SOLR-5619: Now that Hoss has already mentioned the initial reason why I opened this issue, I'll just add a bit more to it. I think this would come in handy for people wanting to do exactly that, i.e. split the binary input into tokens based on a base64 encoding of some special characters, thereby also enabling users to run things like prefix/range queries against the field. Improve BinaryField to make it Sortable and Indexable - Key: SOLR-5619 URL: https://issues.apache.org/jira/browse/SOLR-5619 Project: Solr Issue Type: Improvement Affects Versions: 4.6 Reporter: Anshum Gupta Assignee: Anshum Gupta Currently, BinaryField can neither be indexed nor sorted on. Making it indexable and sortable should come in handy for a reasonable number of use cases, e.g. wanting to index binary data (which could come from anything non-text).
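The base64 idea can be sketched as follows. This is not an existing Solr analyzer; the helper name and the fixed chunk width are assumptions made purely for illustration:

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

// Sketch: base64-encode binary content and cut the encoding into fixed-width
// tokens, so the field becomes indexable and prefix/range-queryable on the
// encoded form. The chunk width is arbitrary here.
public class Base64TokensSketch {
    static List<String> tokenize(byte[] data, int width) {
        String encoded = Base64.getEncoder().withoutPadding().encodeToString(data);
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < encoded.length(); i += width) {
            tokens.add(encoded.substring(i, Math.min(encoded.length(), i + width)));
        }
        return tokens;
    }
}
```

Because base64 is order-preserving on the raw bytes only within a chunk boundary, the chunk width effectively becomes part of the field's index format and would need to be fixed in the schema.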
[jira] [Commented] (LUCENE-5369) Add an UpperCaseFilter
[ https://issues.apache.org/jira/browse/LUCENE-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865803#comment-13865803 ] ASF subversion and git services commented on LUCENE-5369: - Commit 1556617 from [~ryantxu] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1556617 ] LUCENE-5369: Added an UpperCaseFilter to make UPPERCASE tokens Add an UpperCaseFilter -- Key: LUCENE-5369 URL: https://issues.apache.org/jira/browse/LUCENE-5369 Project: Lucene - Core Issue Type: New Feature Reporter: Ryan McKinley Assignee: Ryan McKinley Priority: Minor Attachments: LUCENE-5369-uppercase-filter.patch We should offer a standard way to force upper-case tokens. I understand that lowercase is safer for general search quality because some uppercase characters can represent multiple lowercase ones. However, having upper-case tokens is often nice for faceting (consider normalizing to standard acronyms)
[jira] [Commented] (LUCENE-5369) Add an UpperCaseFilter
[ https://issues.apache.org/jira/browse/LUCENE-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865807#comment-13865807 ] ASF subversion and git services commented on LUCENE-5369: - Commit 1556618 from [~ryantxu] in branch 'dev/trunk' [ https://svn.apache.org/r1556618 ] LUCENE-5369: Added an UpperCaseFilter to make UPPERCASE tokens (merge from 4x)
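The caveat in the issue description, that case mapping is not one-to-one, is easy to demonstrate with plain java.lang.String; nothing Lucene-specific is needed (the wrapper class below is just a demo harness):

```java
import java.util.Locale;

// Demo of why uppercasing can lose information: the German 'ß' uppercases
// to the two-letter "SS", and capital sigma lowercases to either σ or the
// final form ς depending on position, so round-tripping is not guaranteed.
public class CaseFoldingDemo {
    static String upper(String s) { return s.toUpperCase(Locale.ROOT); }
    static String lower(String s) { return s.toLowerCase(Locale.ROOT); }
}
```

This is exactly why lowercasing is the conventional default for search normalization while an UpperCaseFilter is best reserved for constrained inputs like acronym facets.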
RE: The Old Git Discussion
Guess it's time I jump in as well, although I'm not really a committer (only a couple of small patches submitted through Jira). Before anything, no, I haven't used GIT. I don't think this is reason enough to just disregard my comments though. :) Basically, I wouldn't mind one or the other, even if it meant adapting my tools and knowledge from SVN to GIT; the important thing to me is the project itself, not where the code is stored. The one thing that is really keeping me away from GIT is its all-or-nothing nature. This is why I'm a Maven + SVN fan more than a GIT + ant one. Concretely, when I need to work on a module for a project at work, I simply go into eclipse, browse to the specific module, right-click the pom file and check it out as a maven project. I can then start working on the specific module right away, all dependencies configured properly. If I need to look at a dependency's code, I do the same on it and my workspace gets updated appropriately. I don't need the whole codebase, which can take forever to compile. The fact that I need to get the whole repo to work on a single file just doesn't seem right. The single project containing everything that you get after running 'ant eclipse' in Solr also causes me headaches. You can have code working perfectly in eclipse that will not compile with ant, because of project dependencies hidden while you work. But no matter what happens, I will adapt and continue using Solr and contributing when I can, because the project itself is far from looking like it's been carved on a stone wall using stone axes... :P Steve From: Robert Muir [rcm...@gmail.com] Sent: January 8, 2014 1:30 PM To: dev@lucene.apache.org Subject: Re: The Old Git Discussion On Wed, Jan 8, 2014 at 1:25 PM, Sanne Grinovero sanne.grinov...@gmail.com wrote: sake of brevity my impression is just confusion from people who are trying to use it as if it were an alias for svn.
To put it boldly, you're missing the point :-) Would be good to see some negative points from someone who actually used it for a significant time.

And these are two typical quotes from git fanatics: they assume that the person complaining about their shitty tool knows nothing about distributed version control and is an idiot, etc, etc. Again, I've been using git on my day job for over a year. I've also used Mercurial, which was extremely intuitive and usable (I think I came up to speed on this almost immediately). The problem is not that I haven't used git, and it's not that I'm missing the point of distributed VC. Git is just really done badly.
[jira] [Commented] (LUCENE-5376) Add a demo search server
[ https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865810#comment-13865810 ]

ASF subversion and git services commented on LUCENE-5376:

Commit 1556620 from [~mikemccand] in branch 'dev/branches/lucene5376' [ https://svn.apache.org/r1556620 ]
LUCENE-5376: also allow dynamic expression per-request

Add a demo search server

Key: LUCENE-5376
URL: https://issues.apache.org/jira/browse/LUCENE-5376
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Attachments: lucene-demo-server.tgz

I think it'd be useful to have a demo search server for Lucene. Rather than being fully featured, like Solr, it would be minimal, just wrapping the existing Lucene modules to show how you can make use of these features in a server setting. The purpose is to demonstrate how one can build a minimal search server on top of APIs like SearcherManager, SearcherLifetimeManager, etc. This is also useful for finding rough edges / issues in Lucene's APIs that make building a server unnecessarily hard. I don't think it should have back compatibility promises (except Lucene's index back compatibility), so it's free to improve as Lucene's APIs change. As a starting point, I'll post what I built for the eating-your-own-dog-food search app for Lucene's and Solr's jira issues, http://jirasearch.mikemccandless.com (blog: http://blog.mikemccandless.com/2013/05/eating-dog-food-with-lucene.html ). It uses Netty to expose basic indexing and searching APIs via JSON, but it's very rough (lots of nocommits).
[jira] [Updated] (SOLR-5480) Make MoreLikeThisHandler distributable
[ https://issues.apache.org/jira/browse/SOLR-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Molloy updated SOLR-5480:

Attachment: SOLR-5480.patch

Attempt at a patch for trunk (rev 1556570). Got it to compile, but I'm not currently set up to test with trunk.

Make MoreLikeThisHandler distributable

Key: SOLR-5480
URL: https://issues.apache.org/jira/browse/SOLR-5480
Project: Solr
Issue Type: Improvement
Reporter: Steve Molloy
Attachments: SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch

The MoreLikeThis component, when used in the standard search handler, supports distributed searches. But the MoreLikeThisHandler itself doesn't, which prevents, say, passing in text to perform the query. I'll start looking into adapting the SearchHandler logic to the MoreLikeThisHandler. If anyone has some work done already and wants to share, or wants to contribute, any help will be welcomed.
[jira] [Commented] (SOLR-5230) Call DelegatingCollector.finish() during grouping
[ https://issues.apache.org/jira/browse/SOLR-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865854#comment-13865854 ]

Steve Rowe commented on SOLR-5230:

[~joel.bernstein], is this committable?

Call DelegatingCollector.finish() during grouping

Key: SOLR-5230
URL: https://issues.apache.org/jira/browse/SOLR-5230
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 4.4
Reporter: Joel Bernstein
Priority: Minor
Fix For: 4.7
Attachments: SOLR-5230.patch

This is an add-on to SOLR-5020 to call the new DelegatingCollector.finish() method from inside the grouping flow.
[jira] [Commented] (LUCENE-5376) Add a demo search server
[ https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865861#comment-13865861 ]

ASF subversion and git services commented on LUCENE-5376:

Commit 1556627 from [~mikemccand] in branch 'dev/branches/lucene5376' [ https://svn.apache.org/r1556627 ]
LUCENE-5376: remove recency blending hack: just use expressions instead
[jira] [Commented] (SOLR-5230) Call DelegatingCollector.finish() during grouping
[ https://issues.apache.org/jira/browse/SOLR-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865874#comment-13865874 ]

Joel Bernstein commented on SOLR-5230:

Steve, I think this is done properly, but I haven't added any test cases for it yet. Because the CollapsingQParserPlugin is currently the only PostFilter that relies on finish(), it would be better to mock up a simple test PostFilter that uses finish() for the tests.

Joel
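To make the need for finish() concrete: a post filter like CollapsingQParserPlugin buffers hits during collection and only forwards its final choices when finish() is called, so any search path that skips finish() silently drops those hits. The sketch below is a stdlib-only analog of that contract; it is not Solr's actual org.apache.solr.search.DelegatingCollector API, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Stdlib-only analog of a delegating post-filter collector that buffers
// hits and only forwards them when finish() is called. If a search path
// (e.g. grouping, before this issue) never calls finish(), every
// buffered hit is lost. Names are illustrative, not Solr's API.
public class FinishingCollectorSketch {
    // The downstream "collector": here, just the list of forwarded doc ids.
    final List<Integer> delegate = new ArrayList<>();
    // Docs held back until finish(), the way a collapsing filter holds
    // candidates while picking one representative per group.
    private final List<Integer> buffered = new ArrayList<>();

    void collect(int doc) {
        buffered.add(doc); // defer the decision; forward nothing yet
    }

    void finish() {
        // Only now do buffered docs reach the delegate.
        delegate.addAll(buffered);
        buffered.clear();
    }

    public static void main(String[] args) {
        FinishingCollectorSketch c = new FinishingCollectorSketch();
        c.collect(3);
        c.collect(7);
        System.out.println(c.delegate.size()); // 0 -- nothing forwarded yet
        c.finish();
        System.out.println(c.delegate);        // [3, 7]
    }
}
```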
[jira] [Commented] (SOLR-5230) Call DelegatingCollector.finish() during grouping
[ https://issues.apache.org/jira/browse/SOLR-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865882#comment-13865882 ]

Steve Rowe commented on SOLR-5230:

Thanks Joel, I'll see if I can whip up a test. - Steve
[jira] [Commented] (LUCENE-5369) Add an UpperCaseFilter
[ https://issues.apache.org/jira/browse/LUCENE-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865902#comment-13865902 ]

Shawn Heisey commented on LUCENE-5369:

[~ryantxu], this fails precommit because the new files are missing svn:eol-style. I actually ran precommit because I was worried that it would fail the forbidden-apis check. It looks like that only fails on String#toUpperCase if you don't include a locale. The javadocs for Character say that Character#toUpperCase uses Unicode information, so I guess it's OK -- and precommit passed just fine after I added svn:eol-style native to the new files.
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865907#comment-13865907 ]

Benson Margulies commented on LUCENE-5388:

[~rcmuir] or [~mikemccand] I could really use some help here with TestRandomChains.

Eliminate construction over readers for Tokenizer

Key: LUCENE-5388
URL: https://issues.apache.org/jira/browse/LUCENE-5388
Project: Lucene - Core
Issue Type: Improvement
Components: core/other
Reporter: Benson Margulies

In the modern world, Tokenizers are intended to be reusable, with input supplied via #setReader. The constructors that take Reader are a vestige. Worse yet, they invite people to make mistakes in handling the reader that tangle them up with the state machine in Tokenizer. The sensible thing is to eliminate these ctors, and force setReader usage.
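The reuse pattern the issue argues for can be illustrated without Lucene at all: construct the tokenizer once with no Reader, and feed each input through setReader() before use. The sketch below is a hypothetical, stdlib-only whitespace tokenizer mirroring that lifecycle; it is not Lucene's actual Tokenizer API, and every name in it is illustrative.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Stdlib-only sketch of a reusable tokenizer whose ONLY input path is
// setReader() -- the design this issue wants to force by removing the
// Reader-taking constructors. Illustrative names, not Lucene's API.
public class ReusableTokenizerSketch {
    private Reader input;

    // Deliberately no Reader parameter in the constructor.
    void setReader(Reader reader) { this.input = reader; }

    // Split the current input on whitespace.
    List<String> tokenize() throws IOException {
        List<String> tokens = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (cur.length() > 0) { tokens.add(cur.toString()); cur.setLength(0); }
            } else {
                cur.append((char) c);
            }
        }
        if (cur.length() > 0) tokens.add(cur.toString());
        input.close();
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        ReusableTokenizerSketch t = new ReusableTokenizerSketch();
        t.setReader(new StringReader("hello world"));     // first use
        System.out.println(t.tokenize());                 // [hello, world]
        t.setReader(new StringReader("reused instance")); // reuse, no new object
        System.out.println(t.tokenize());                 // [reused, instance]
    }
}
```

Because the instance is reused across inputs, any per-use state must be reset through setReader(), which is exactly the state-machine handling the Reader-taking constructors let callers get wrong.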
[jira] [Commented] (LUCENE-5369) Add an UpperCaseFilter
[ https://issues.apache.org/jira/browse/LUCENE-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865911#comment-13865911 ]

Uwe Schindler commented on LUCENE-5369:

Yes, Character.toUpperCase is fine and locale invariant.
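A small JDK-only demonstration of the distinction discussed above: String.toUpperCase applies locale-specific rules (the classic Turkish dotted/dotless i), which is why the forbidden-apis check flags it when no Locale is given, while Character.toUpperCase uses only Unicode case data and behaves the same everywhere.

```java
import java.util.Locale;

// Why String#toUpperCase without a Locale is forbidden, and
// Character#toUpperCase is safe: under a Turkish locale, 'i' uppercases
// to dotted capital I (U+0130); the per-char method always yields 'I'.
public class CaseLocaleDemo {
    public static void main(String[] args) {
        String s = "i";
        System.out.println(s.toUpperCase(Locale.ROOT));      // I
        System.out.println(s.toUpperCase(new Locale("tr"))); // İ (U+0130)
        System.out.println(Character.toUpperCase('i'));      // I
    }
}
```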
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865913#comment-13865913 ]

Robert Muir commented on LUCENE-5388:

It's a monster...
Re: [jira] [Commented] (LUCENE-5369) Add an UpperCaseFilter
fixing now... sorry

On Wed, Jan 8, 2014 at 1:28 PM, Uwe Schindler (JIRA) j...@apache.org wrote:
> [ https://issues.apache.org/jira/browse/LUCENE-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865911#comment-13865911 ]
> Uwe Schindler commented on LUCENE-5369: --- Yes Character.toUpperCase is fine and locale invariant.
[jira] [Commented] (LUCENE-5369) Add an UpperCaseFilter
[ https://issues.apache.org/jira/browse/LUCENE-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865916#comment-13865916 ]

ASF subversion and git services commented on LUCENE-5369:

Commit 1556643 from [~ryantxu] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1556643 ]
LUCENE-5369: missing eol:style
[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.6.0_45) - Build # 8916 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/8916/
Java: 32bit/jdk1.6.0_45 -client -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 35915 lines...]
BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:459: The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:398: The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/extra-targets.xml:87: The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/extra-targets.xml:185: The following files are missing svn:eol-style (or binary svn:mime-type):
* ./lucene/analysis/common/src/java/org/apache/lucene/analysis/core/UpperCaseFilter.java
* ./lucene/analysis/common/src/java/org/apache/lucene/analysis/core/UpperCaseFilterFactory.java

Total time: 54 minutes 4 seconds
Build step 'Invoke Ant' marked build as failure
Description set: Java: 32bit/jdk1.6.0_45 -client -XX:+UseConcMarkSweepGC
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure
[jira] [Commented] (LUCENE-5369) Add an UpperCaseFilter
[ https://issues.apache.org/jira/browse/LUCENE-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865917#comment-13865917 ]

ASF subversion and git services commented on LUCENE-5369:

Commit 1556644 from [~ryantxu] in branch 'dev/trunk' [ https://svn.apache.org/r1556644 ]
LUCENE-5369: missing eol:style (merge from 4x)
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865925#comment-13865925 ]

Benson Margulies commented on LUCENE-5388:

It does something complex with the input reader in createComponents; the challenge is to move all of that to initReader so that it works. I think I'm too fried from 1000 other edits. I'll look in after dinner, but anyone who wants to grab my branch from github and pitch in is more than welcome.
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865930#comment-13865930 ]

Robert Muir commented on LUCENE-5388:

Yes: well, the CheckThatYouDidntReadAnythingReaderWrapper can likely be removed. You are removing that possibility entirely, so it should get simpler... I'll have a look at your branch tonight and try to help with some of this stuff; it's hairy.
[jira] [Commented] (SOLR-4260) Inconsistent numDocs between leader and replica
[ https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865963#comment-13865963 ]

Timothy Potter commented on SOLR-4260:

Still digging into it ... I'm curious why a batch of 34 adds on the leader gets processed as several sub-batches on the replica? Here's what I'm seeing in the logs around the documents that are missing from the replica. Basically, there are 34 docs on the leader and only 25 processed, in 4 separate batches (from my counting of the logs), on the replica. Why wouldn't it just be one for one? The docs are all roughly the same size ... and what's breaking it up? Having trouble seeing that in the logs ;-)

On the leader:

2014-01-08 12:23:21,501 [qtp604104855-17] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica1] webapp=/solr path=/update params={wt=javabin&version=2} {add=[82900 (1456683668174012416), 82901 (1456683668181352448), 82903 (1456683668181352449), 82904 (1456683668181352450), 82912 (1456683668187643904), 82913 (1456683668188692480), 82914 (1456683668188692481), 82916 (1456683668188692482), 82917 (1456683668188692483), 82918 (1456683668188692484), ... (34 adds)]} 0 34

2014-01-08 12:23:21,600 [qtp604104855-17] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica1] webapp=/solr path=/update params={wt=javabin&version=2} {add=[83002 (1456683668280967168), 83005 (1456683668286210048), 83008 (1456683668286210049), 83011 (1456683668286210050), 83012 (1456683668286210051), 83013 (1456683668287258624), 83018 (1456683668287258625), 83019 (1456683668289355776), 83023 (1456683668289355777), 83024 (1456683668289355778), ... (43 adds)]} 0 32

On the replica:

2014-01-08 12:23:21,126 [qtp604104855-22] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica2] webapp=/solr path=/update params={distrib.from=http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/update.distrib=FROMLEADER&wt=javabin&version=2} {add=[82900 (1456683668174012416), 82901 (1456683668181352448), 82903 (1456683668181352449)]} 0 1

2014-01-08 12:23:21,134 [qtp604104855-22] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica2] webapp=/solr path=/update params={distrib.from=http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/update.distrib=FROMLEADER&wt=javabin&version=2} {add=[82904 (1456683668181352450), 82912 (1456683668187643904), 82913 (1456683668188692480), 82914 (1456683668188692481), 82916 (1456683668188692482), 82917 (1456683668188692483), 82918 (1456683668188692484), 82919 (1456683668188692485), 82922 (1456683668188692486)]} 0 2

2014-01-08 12:23:21,139 [qtp604104855-22] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica2] webapp=/solr path=/update params={distrib.from=http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/update.distrib=FROMLEADER&wt=javabin&version=2} {add=[82923 (1456683668188692487), 82926 (1456683668190789632), 82928 (1456683668190789633), 82932 (1456683668190789634), 82939 (1456683668192886784), 82945 (1456683668192886785), 82946 (1456683668192886786), 82947 (1456683668193935360), 82952 (1456683668193935361), 82962 (1456683668193935362), ... (12 adds)]} 0 3

2014-01-08 12:23:21,144 [qtp604104855-22] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica2] webapp=/solr path=/update params={distrib.from=http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/update.distrib=FROMLEADER&wt=javabin&version=2} {add=[82967 (1456683668199178240)]} 0 0

9 Docs Missing here

2014-01-08 12:23:21,227 [qtp604104855-22] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica2] webapp=/solr path=/update params={distrib.from=http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/update.distrib=FROMLEADER&wt=javabin&version=2} {add=[83002 (1456683668280967168), 83005 (1456683668286210048), 83008 (1456683668286210049), 83011 (1456683668286210050), 83012 (1456683668286210051), 83013 (1456683668287258624)]} 0 2

Inconsistent numDocs between leader and replica

Key: SOLR-4260
URL: https://issues.apache.org/jira/browse/SOLR-4260
Project: Solr
Issue Type: Bug
Components: SolrCloud
Environment: 5.0.0.2013.01.04.15.31.51
Reporter: Markus Jelsma
Assignee: Mark Miller
Priority: Critical
Fix For: 5.0, 4.7
Attachments: 192.168.20.102-replica1.png, 192.168.20.104-replica2.png, clusterstate.png, demo_shard1_replicas_out_of_sync.tgz

After wiping all cores and reindexing some 3.3 million docs from Nutch using CloudSolrServer we see inconsistencies between the leader and replica for some shards. Each core holds about 3.3k documents. For some
[jira] [Comment Edited] (SOLR-4260) Inconsistent numDocs between leader and replica
[ https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865963#comment-13865963 ]

Timothy Potter edited comment on SOLR-4260 at 1/8/14 10:08 PM, appending:

Note the add log message starting with doc ID 83002 is just included here for context to show where the leader / replica got out of sync.
[jira] [Comment Edited] (SOLR-4260) Inconsistent numDocs between leader and replica
[ https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865963#comment-13865963 ] Timothy Potter edited comment on SOLR-4260 at 1/8/14 10:09 PM:
---
Still digging into it ... I'm curious why a batch of 34 adds on the leader gets processed as several sub-batches on the replica? Here's what I'm seeing in the logs around the documents that are missing from the replica. Basically, there are 34 docs on the leader and only 25 processed on the replica, in 4 separate batches (from my counting of the logs). Why wouldn't it just be one for one? The docs are all roughly the same size ... so what's breaking it up? Having trouble seeing that in the logs / code ;-)

On the leader:

2014-01-08 12:23:21,501 [qtp604104855-17] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica1] webapp=/solr path=/update params={wt=javabin&version=2} {add=[82900 (1456683668174012416), 82901 (1456683668181352448), 82903 (1456683668181352449), 82904 (1456683668181352450), 82912 (1456683668187643904), 82913 (1456683668188692480), 82914 (1456683668188692481), 82916 (1456683668188692482), 82917 (1456683668188692483), 82918 (1456683668188692484), ... (34 adds)]} 0 34

NOT ALL OF THE 34 DOCS MENTIONED ABOVE MAKE IT TO THE REPLICA

2014-01-08 12:23:21,600 [qtp604104855-17] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica1] webapp=/solr path=/update params={wt=javabin&version=2} {add=[83002 (1456683668280967168), 83005 (1456683668286210048), 83008 (1456683668286210049), 83011 (1456683668286210050), 83012 (1456683668286210051), 83013 (1456683668287258624), 83018 (1456683668287258625), 83019 (1456683668289355776), 83023 (1456683668289355777), 83024 (1456683668289355778), ... (43 adds)]} 0 32

On the replica:

2014-01-08 12:23:21,126 [qtp604104855-22] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica2] webapp=/solr path=/update params={distrib.from=http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/update&update.distrib=FROMLEADER&wt=javabin&version=2} {add=[82900 (1456683668174012416), 82901 (1456683668181352448), 82903 (1456683668181352449)]} 0 1

2014-01-08 12:23:21,134 [qtp604104855-22] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica2] webapp=/solr path=/update params={distrib.from=http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/update&update.distrib=FROMLEADER&wt=javabin&version=2} {add=[82904 (1456683668181352450), 82912 (1456683668187643904), 82913 (1456683668188692480), 82914 (1456683668188692481), 82916 (1456683668188692482), 82917 (1456683668188692483), 82918 (1456683668188692484), 82919 (1456683668188692485), 82922 (1456683668188692486)]} 0 2

2014-01-08 12:23:21,139 [qtp604104855-22] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica2] webapp=/solr path=/update params={distrib.from=http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/update&update.distrib=FROMLEADER&wt=javabin&version=2} {add=[82923 (1456683668188692487), 82926 (1456683668190789632), 82928 (1456683668190789633), 82932 (1456683668190789634), 82939 (1456683668192886784), 82945 (1456683668192886785), 82946 (1456683668192886786), 82947 (1456683668193935360), 82952 (1456683668193935361), 82962 (1456683668193935362), ... (12 adds)]} 0 3

2014-01-08 12:23:21,144 [qtp604104855-22] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica2] webapp=/solr path=/update params={distrib.from=http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/update&update.distrib=FROMLEADER&wt=javabin&version=2} {add=[82967 (1456683668199178240)]} 0 0

9 Docs Missing here

2014-01-08 12:23:21,227 [qtp604104855-22] INFO update.processor.LogUpdateProcessor - [demo_shard1_replica2] webapp=/solr path=/update params={distrib.from=http://ec2-54-209-223-12.compute-1.amazonaws.com:8984/solr/demo_shard1_replica1/update&update.distrib=FROMLEADER&wt=javabin&version=2} {add=[83002 (1456683668280967168), 83005 (1456683668286210048), 83008 (1456683668286210049), 83011 (1456683668286210050), 83012 (1456683668286210051), 83013 (1456683668287258624)]} 0 2

Note the add log message starting with doc ID 83002 is just included here for context to show where the leader / replica got out of sync.
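On the sub-batching question itself: if the leader forwards updates to replicas through a streaming client whose background runner drains whatever documents have been queued since its last poll (as SolrJ's ConcurrentUpdateSolrServer does), then one incoming batch of 34 naturally fans out into several smaller replica-side batches depending on arrival timing. The sub-batching alone would be expected behavior; only the 9 docs that never arrive at all indicate the bug. The sketch below is a hypothetical simulation of that queue-draining pattern, not Solr's actual forwarding code; the burst sizes and starting doc ID are illustrative values chosen to echo the logs above.

```python
from collections import deque

def drain_in_sub_batches(bursts):
    """Simulate a runner thread that drains the forwarding queue after
    each burst of documents arrives, the way a streaming update client
    sends whatever is queued at poll time rather than the original batch."""
    queue = deque()
    sub_batches = []
    doc_id = 82900  # illustrative sequential IDs; the real logs have gaps
    for burst in bursts:
        for _ in range(burst):
            queue.append(doc_id)
            doc_id += 1
        batch = []
        while queue:              # drain everything queued so far
            batch.append(queue.popleft())
        if batch:
            sub_batches.append(batch)
    return sub_batches

# Hypothetical arrival bursts: one leader batch of 34 docs delivered unevenly.
batches = drain_in_sub_batches([3, 9, 12, 1, 9])
print(len(batches), sum(len(b) for b in batches))  # prints: 5 34
```

The point of the sketch is that all 34 docs still arrive, just in multiple sub-batches; so the replica seeing 4 batches is not itself the problem, but the 9 missing docs are.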