[GitHub] [lucene-solr] anshumg commented on issue #1335: SOLR-14316 Remove unchecked type conversion warning in JavaBinCodec's readMapEntry's equals() method
anshumg commented on issue #1335: SOLR-14316 Remove unchecked type conversion warning in JavaBinCodec's readMapEntry's equals() method URL: https://github.com/apache/lucene-solr/pull/1335#issuecomment-597473846 LGTM. Thanks for adding the test. Can you please also add a CHANGELOG entry? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14316) Remove unchecked type conversion warning in JavaBinCodec's readMapEntry's equals() method
[ https://issues.apache.org/jira/browse/SOLR-14316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aroop updated SOLR-14316: - Description: There is an unchecked type conversion warning in JavaBinCodec's readMapEntry's equals() method. This change removes that warning by handling a checked conversion and also adds tests to a previously untested API. was:There is an unchecked type conversion warning in JavaBinCodec's readMapEntry's equals() method. > Remove unchecked type conversion warning in JavaBinCodec's readMapEntry's > equals() method > - > > Key: SOLR-14316 > URL: https://issues.apache.org/jira/browse/SOLR-14316 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrJ >Affects Versions: 7.7.2, 8.4.1 >Reporter: Aroop >Priority: Minor > Labels: patch > Time Spent: 0.5h > Remaining Estimate: 0h > > There is an unchecked type conversion warning in JavaBinCodec's > readMapEntry's equals() method. > This change removes that warning by handling a checked conversion and also > adds tests to a previously untested API. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
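The kind of fix described in this issue can be sketched as follows. This is a hedged illustration, not the actual JavaBinCodec code: the class name `MapEntry` and its fields are simplified stand-ins. The point is that replacing a raw-type cast in `equals()` with an `instanceof` check and a wildcard-typed cast (`Map.Entry<?, ?>`) removes the "unchecked" compiler warning without changing behavior.

```java
import java.util.Map;

// Simplified stand-in for the kind of entry class discussed in the issue
// (hypothetical names; not the actual JavaBinCodec implementation).
final class MapEntry implements Map.Entry<Object, Object> {
  private final Object key;
  private final Object value;

  MapEntry(Object key, Object value) {
    this.key = key;
    this.value = value;
  }

  @Override public Object getKey() { return key; }
  @Override public Object getValue() { return value; }
  @Override public Object setValue(Object value) {
    throw new UnsupportedOperationException();
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) return true;
    // Checked conversion: the instanceof test plus a cast to the wildcard
    // type Map.Entry<?, ?> is safe, so javac emits no "unchecked" warning.
    if (!(obj instanceof Map.Entry)) return false;
    Map.Entry<?, ?> entry = (Map.Entry<?, ?>) obj;
    return (key == null ? entry.getKey() == null : key.equals(entry.getKey()))
        && (value == null ? entry.getValue() == null : value.equals(entry.getValue()));
  }

  @Override
  public int hashCode() {
    return (key == null ? 0 : key.hashCode())
        ^ (value == null ? 0 : value.hashCode());
  }
}
```

The wildcard cast works because `equals()` only reads the other entry's key and value as `Object`; no element type needs to be assumed.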
[GitHub] [lucene-solr] aroopganguly commented on issue #1335: SOLR-14316 Remove unchecked type conversion warning in JavaBinCodec's readMapEntry's equals() method
aroopganguly commented on issue #1335: SOLR-14316 Remove unchecked type conversion warning in JavaBinCodec's readMapEntry's equals() method URL: https://github.com/apache/lucene-solr/pull/1335#issuecomment-597466952 @noblepaul tests added, and some points to ponder as well, in the Tests section of the PR description above.
[jira] [Commented] (SOLR-13199) NPE due to unexpected null return value from QueryBitSetProducer.getBitSet
[ https://issues.apache.org/jira/browse/SOLR-13199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056691#comment-17056691 ] Munendra S N commented on SOLR-13199: - Thanks [~mkhl] for the review. The reason I thought of that scenario is that there are no enforcements or checks on what can be a parentFilter. As stated earlier, it is a bit of a stretch, and I agree it is not a valid case. [^SOLR-13199.patch] modified to throw exceptions. > NPE due to unexpected null return value from QueryBitSetProducer.getBitSet > -- > > Key: SOLR-13199 > URL: https://issues.apache.org/jira/browse/SOLR-13199 > Project: Solr > Issue Type: Bug > Components: search >Affects Versions: master (9.0) > Environment: h1. Steps to reproduce > * Use a Linux machine. > * Build commit {{ea2c8ba}} of Solr as described in the section below. > * Build the films collection as described below. > * Start the server using the command {{./bin/solr start -f -p 8983 -s > /tmp/home}} > * Request the URL given in the bug description. > h1. Compiling the server > {noformat} > git clone https://github.com/apache/lucene-solr > cd lucene-solr > git checkout ea2c8ba > ant compile > cd solr > ant server > {noformat} > h1. Building the collection > We followed [Exercise > 2|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-2] from > the [Solr > Tutorial|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html]. 
The > attached file ({{home.zip}}) gives the contents of folder {{/tmp/home}} that > you will obtain by following the steps below: > {noformat} > mkdir -p /tmp/home > echo '' > > /tmp/home/solr.xml > {noformat} > In one terminal start a Solr instance in foreground: > {noformat} > ./bin/solr start -f -p 8983 -s /tmp/home > {noformat} > In another terminal, create a collection of movies, with no shards and no > replication, and initialize it: > {noformat} > bin/solr create -c films > curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": > {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' > http://localhost:8983/solr/films/schema > curl -X POST -H 'Content-type:application/json' --data-binary > '{"add-copy-field" : {"source":"*","dest":"_text_"}}' > http://localhost:8983/solr/films/schema > ./bin/post -c films example/films/films.json > {noformat} >Reporter: Johannes Kloos >Assignee: Munendra S N >Priority: Minor > Labels: diffblue, newdev > Attachments: SOLR-13199.patch, SOLR-13199.patch, home.zip > > > Requesting the following URL causes Solr to return an HTTP 500 error response: > {noformat} > http://localhost:8983/solr/films/select?fl=[child%20parentFilter=ge]&q=*:* > {noformat} > The error response seems to be caused by the following uncaught exception: > {noformat} > java.lang.NullPointerException > at > org.apache.solr.response.transform.ChildDocTransformer.transform(ChildDocTransformer.java:92) > at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:103) > at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:1) > at > org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:184) > at > org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:136) > at > org.apache.solr.common.util.JsonTextWriter.writeNamedListAsMapWithDups(JsonTextWriter.java:386) > at > org.apache.solr.common.util.JsonTextWriter.writeNamedList(JsonTextWriter.java:292) > at 
org.apache.solr.response.JSONWriter.writeResponse(JSONWriter.java:73) > {noformat} > In ChildDocTransformer.transform, we have the following lines: > {noformat} > final BitSet segParentsBitSet = parentsFilter.getBitSet(leafReaderContext); > final int segPrevRootId = segRootId==0? -1: > segParentsBitSet.prevSetBit(segRootId - 1); // can return -1 and that's okay > {noformat} > But getBitSet can return null if the set of DocIds is empty: > {noformat} > return docIdSet == DocIdSet.EMPTY ? null : ((BitDocIdSet) docIdSet).bits(); > {noformat} > We found this bug using [Diffblue Microservices > Testing|https://www.diffblue.com/labs/?utm_source=solr-br]. Find more > information on this [fuzz testing > campaign|https://www.diffblue.com/blog/2018/12/19/diffblue-microservice-testing-a-sneak-peek-at-our-early-product-and-results?utm_source=solr-br].
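The fix direction discussed in this thread (throw a clear exception instead of letting the NPE escape as an HTTP 500) can be sketched as below. This is a hedged illustration, not the actual ChildDocTransformer code or the attached patch: names are simplified, and `java.util.BitSet` (with its `previousSetBit` method) stands in for Lucene's `org.apache.lucene.util.BitSet`.

```java
import java.util.BitSet;

// Hypothetical null guard for the getBitSet() result: when the parent
// filter matches no documents in a segment, fail fast with a descriptive
// exception rather than dereferencing null later.
final class ParentBitSetGuard {
  static BitSet requireParents(BitSet segParentsBitSet) {
    if (segParentsBitSet == null) {
      throw new IllegalStateException(
          "Parent filter matched no documents in this segment; "
              + "check that parentFilter identifies only root documents");
    }
    return segParentsBitSet;
  }

  static int prevRootId(BitSet segParentsBitSet, int segRootId) {
    // Mirrors the original expression, now guarded against null.
    // Can still return -1, and that's okay (as the original comment notes).
    return segRootId == 0
        ? -1
        : requireParents(segParentsBitSet).previousSetBit(segRootId - 1);
  }
}
```

Whether to throw here or to treat a null bit set as "no parents" is exactly the design question debated later in the thread; this sketch follows the throw-an-exception approach mentioned in the comment above.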
[jira] [Updated] (SOLR-13199) NPE due to unexpected null return value from QueryBitSetProducer.getBitSet
[ https://issues.apache.org/jira/browse/SOLR-13199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Munendra S N updated SOLR-13199: Attachment: SOLR-13199.patch
[jira] [Commented] (SOLR-13199) NPE due to unexpected null return value from QueryBitSetProducer.getBitSet
[ https://issues.apache.org/jira/browse/SOLR-13199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056668#comment-17056668 ] Mikhail Khludnev commented on SOLR-13199: - {quote}{{cat_s:fantasy}} is the parentFilter {quote} [~munendrasn], here I would disagree. Such a query can never be a parent filter. {quote}bq. Solr flattens the nest then supplies the list to Lucene which +guarantees+ this. {quote} Thanks for the reminder, [~dsmiley]. I agree that having even two distinct parent types like {{type:ParentA}} and {{type:ParentB}} is not a valid case and shouldn't be considered. But couldn't a segment have all docs marked for deletion? Could it happen due to NRT or some other esoteric cases?
[jira] [Commented] (LUCENE-8103) QueryValueSource should use TwoPhaseIterator
[ https://issues.apache.org/jira/browse/LUCENE-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056624#comment-17056624 ] David Smiley commented on LUCENE-8103: -- {{ant test -Dtestcase=TestValueSources -Dtests.method=testQueryWrapedFuncWrapedQuery -Dtests.seed=625CF512BDD7BD01 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=fr-CA -Dtests.timezone=America/Phoenix -Dtests.asserts=true -Dtests.file.encoding=UTF-8}} {{java.lang.AssertionError: ITERATING}} {{ at __randomizedtesting.SeedInfo.seed([625CF512BDD7BD01:36E49DF7D50086EB]:0) at org.apache.lucene.search.AssertingScorer$3.matches(AssertingScorer.java:235) at org.apache.lucene.queries.function.valuesource.QueryDocValues.exists(QueryValueSource.java:156) at org.apache.lucene.queries.function.valuesource.QueryDocValues.floatVal(QueryValueSource.java:129) at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:120) at org.apache.lucene.search.AssertingScorer.score(AssertingScorer.java:102)}} > QueryValueSource should use TwoPhaseIterator > > > Key: LUCENE-8103 > URL: https://issues.apache.org/jira/browse/LUCENE-8103 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/other >Reporter: David Smiley >Priority: Minor > Attachments: LUCENE-8103.patch > > > QueryValueSource (in "queries" module) is a ValueSource representation of a > Query; the score is the value. It ought to try to use a TwoPhaseIterator > from the query if it can be offered. This will prevent possibly expensive > advancing beyond documents that we aren't interested in.
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1332: SOLR-14254: Docs for text tagger: FST50 trade-off
dsmiley commented on a change in pull request #1332: SOLR-14254: Docs for text tagger: FST50 trade-off URL: https://github.com/apache/lucene-solr/pull/1332#discussion_r390721357 ## File path: solr/solr-ref-guide/src/the-tagger-handler.adoc ## @@ -271,11 +271,12 @@ The response should be this (the QTime may vary): }} -== Tagger Tips +== Tagger Performance Tips -Performance Tips: - -* Follow the recommended configuration field settings, especially `postingsFormat=FST50`. +* Follow the recommended configuration field settings above. +Additionally, for the best tagger performance, set `postingsFormat=FST50`. +However, non-default postings formats have no backwards-compatibility guarantees, and so if you upgrade Solr then you may find a nasty exception on startup as it fails to read the older index. +If the input text to be tagged is small (e.g. you are tagging queries or tweets) then the postings format choice isn't as important. Review comment: FYI the SolrTextTagger was benchmarked a couple years ago to compare the old "Memory" PF and FST50 -- https://github.com/OpenSextant/SolrTextTagger/issues/38#issuecomment-385597248 we never tried the default (blocktree). I believe the input data in that experiment were whole articles, and thus would be impacted by the postings format choice.
[jira] [Comment Edited] (SOLR-10112) Prevent DBQs from getting reordered
[ https://issues.apache.org/jira/browse/SOLR-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056576#comment-17056576 ] Eugene Tenkaev edited comment on SOLR-10112 at 3/11/20, 1:44 AM: - [~ichattopadhyaya], [~rwhaddad] please help! We need to figure out the root problem in our case. I see you are pretty deep in this question, so I decided to ask for your help. We intensively send updates that contain partial modifications of document fields using *set*. We also send delete-by-query (DBQ) requests fairly intensively; we use them for garbage collection of documents that do not have the field *enabled* with value *true*. So if we want a doc to be removed, we set *enabled:false*. Our delete by query (DBQ) looks like: {code} -enabled:true {code} Documents represent, for example, a phone model. After some time a document can be added back to the search index, and it will contain: {code} enabled:true {code} In parallel, we enrich documents with additional info (enriching concurrently; the info comes from different sources), but we always make sure that the update with *enabled:true* is sent before the *other* updates. We have started to observe that we lose some of the *other* updates (updates that come after the first initial update with *enabled:true*). They seem not to be applied. Can DBQs create such problems? Should we send DBQs that mention a date and delete only documents that have not been updated for a day, for example? Thank you, I would greatly appreciate any help! > Prevent DBQs from getting reordered > --- > > Key: SOLR-10112 > URL: https://issues.apache.org/jira/browse/SOLR-10112 > Project: Solr > Issue Type: Bug >Reporter: Ishan Chattopadhyaya >Priority: Major > > Reordered DBQs are problematic for various reasons. We might be able to > prevent DBQs from getting re-ordered by making sure, at the leader, that all > updates before a DBQ have been written successfully on the replicas, and > block all updates after the DBQ until the DBQ is written successfully at the > replicas.
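The date-bounded DBQ idea raised in the comment above can be sketched as a query-string builder. This is a hedged illustration: the field name `last_modified` and the `NOW-1DAY` window are assumptions of this example (any date field the schema actually updates on each write would do), not something specified in the issue.

```java
// Hypothetical helper building a delete-by-query that only garbage-collects
// disabled documents which have also been stale for a while, leaving
// recently re-enabled docs (whose follow-up partial updates may still be
// in flight) untouched. The resulting string would be passed to SolrJ's
// SolrClient.deleteByQuery(...).
public class StaleDisabledQuery {
  /**
   * @param dateField a date field updated on every write (assumed to exist)
   * @param window    a Solr date-math window, e.g. "1DAY"
   */
  static String build(String dateField, String window) {
    return "-enabled:true AND " + dateField + ":[* TO NOW-" + window + "]";
  }
}
```

Whether this avoids the reordering problem entirely depends on the root cause discussed in this issue; it only narrows the window in which a reordered DBQ can clobber a newer update.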
[GitHub] [lucene-solr] noblepaul commented on issue #1335: SOLR-14316 Remove unchecked type conversion warning in JavaBinCodec's readMapEntry's equals() method
noblepaul commented on issue #1335: SOLR-14316 Remove unchecked type conversion warning in JavaBinCodec's readMapEntry's equals() method URL: https://github.com/apache/lucene-solr/pull/1335#issuecomment-597392672 LGTM; a test can help.
[GitHub] [lucene-solr] dsmiley commented on issue #1191: SOLR-14197 Reduce API of SolrResourceLoader
dsmiley commented on issue #1191: SOLR-14197 Reduce API of SolrResourceLoader URL: https://github.com/apache/lucene-solr/pull/1191#issuecomment-597376064 I propose that in 8.x, I leave the methods that were moved but mark them as deprecated. Otherwise, the rest of the changes are rather internal and can be applied to 8x, thus helping reduce possible merge conflicts for our future work. I'll put an entry in Other Changes for 9.x: "SolrResourceLoader remove deprecated methods (numerous)." Simple and succinct. The commit message will have lots of info. On 8x I'll say "SolrResourceLoader: marked many methods as deprecated, and in some cases rerouted existing logic to avoid them". Since this is very internal stuff, people can go see for themselves.
[jira] [Commented] (SOLR-14310) Expose solr logs with basic filters via HTTP
[ https://issues.apache.org/jira/browse/SOLR-14310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056519#comment-17056519 ] Noble Paul commented on SOLR-14310: --- My idea is to just read the last n lines from the file and dump them in the API response. > Expose solr logs with basic filters via HTTP > > > Key: SOLR-14310 > URL: https://issues.apache.org/jira/browse/SOLR-14310 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Noble Paul >Priority: Major > > path {{/api/node/tools/logs}} > params > * lines : (default 100) no. of lines to be shown; the most recent lines are > shown > * collection : (multivalued) filter log lines by collection > * shard : (multivalued) filter log lines by shard name > * core : (multivalued) filter log lines by core > * startTime : timestamp start > * endTime : timestamp end > * className : (multivalued) name of the class in logs > * logLevel : (multivalued) INFO/DEBUG etc. > * threadNamePrefix : (multivalued) e.g. qtp, searchExecutor, > solrHandlerExecutor etc. > The output will be in plain text format
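The "read the last n lines" idea from the comment above can be sketched as below. This is a hedged illustration, not Solr code: the naive version reads the whole file and keeps the final n lines, which is fine for modest log files, while a production version would read backwards from the end to avoid loading the entire file (and, as noted later in the thread, containerized deployments may not have a log file on disk at all).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical tail helper: return at most the last n lines of a file.
public class LogTail {
  public static List<String> lastLines(Path logFile, int n) throws IOException {
    List<String> all = Files.readAllLines(logFile);
    // subList keeps only the trailing n lines (or everything, if shorter).
    return all.subList(Math.max(0, all.size() - n), all.size());
  }
}
```

Applying the proposed filters (collection, shard, logLevel, etc.) would then be a matter of filtering this list before returning it as plain text.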
[GitHub] [lucene-solr] janhoy merged pull request #1322: Remove some unused lines from addBackcompatIndexes.py related to svn
janhoy merged pull request #1322: Remove some unused lines from addBackcompatIndexes.py related to svn URL: https://github.com/apache/lucene-solr/pull/1322
[jira] [Commented] (SOLR-14310) Expose solr logs with basic filters via HTTP
[ https://issues.apache.org/jira/browse/SOLR-14310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056517#comment-17056517 ] Jan Høydahl commented on SOLR-14310: I agree that anything but simple show last-N lines is overkill. Did you plan to serve this from the actual file system or from some memory buffer? Remember that for some deployments like containers, Solr may very well log to stdout and not store any log files on disk locally.
[GitHub] [lucene-solr] msokolov commented on issue #1316: LUCENE-8929 parallel early termination in TopFieldCollector using minmin score
msokolov commented on issue #1316: LUCENE-8929 parallel early termination in TopFieldCollector using minmin score URL: https://github.com/apache/lucene-solr/pull/1316#issuecomment-597359816 I've done pretty extensive performance testing, results are good, and unit tests are passing, but I would really appreciate some eyeballs if anyone has the time: this is a pretty sensitive area, and the implementation I've tested extensively in the wild is not exactly the same as this one, although it implements the same strategy. I'm also particularly interested, @jimczi, if you can comment on whether the `MaxScoreAccumulator` still provides additional benefit alongside this optimization. I haven't tried removing it, but I wonder if it might be doing something redundant now; I'm not totally clear what impact setMinCompetitiveScore will have.
[GitHub] [lucene-solr] sarowe commented on issue #1322: Remove some unused lines from addBackcompatIndexes.py related to svn
sarowe commented on issue #1322: Remove some unused lines from addBackcompatIndexes.py related to svn URL: https://github.com/apache/lucene-solr/pull/1322#issuecomment-597354343 +1 LGTM
[jira] [Commented] (SOLR-14306) Refactor coordination code into separate module and evaluate using Curator
[ https://issues.apache.org/jira/browse/SOLR-14306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056457#comment-17056457 ] David Smiley commented on SOLR-14306: - Overall I'm really encouraged to finally see some momentum in this direction. Thanks so much Tomas!

> Refactor coordination code into separate module and evaluate using Curator
> Key: SOLR-14306
> URL: https://issues.apache.org/jira/browse/SOLR-14306
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: SolrCloud
> Reporter: Tomas Eduardo Fernandez Lobbe
> Priority: Major
>
> This Jira issue is to discuss two changes that unfortunately are difficult to address separately:
> # Separate all ZooKeeper coordination logic into its own module, that can be tested in isolation
> # Evaluate using Apache Curator for coordination instead of our own logic.
> I drafted a [SIP|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=148640472], but this is very much WIP; I'd like to hear opinions before I spend too much time on something people hate.
> From the initial draft of the SIP:
> {quote}The main goal of this change is to allow better testing of the different ZooKeeper interactions related to coordination (leader election, queues, etc). There are already some abstractions in place for lower-level operations (set-data, get-data, etc; see DistribStateManager), so the idea is to have a new, related abstraction named CoordinationManager, where we could have some higher-level coordination-related classes, like LeaderRunner (Overseer), LeaderLatch (for shard leaders), etc. Curator comes into place because, in order to refactor the existing code into these new abstractions, we'd have to rework much of it, so we could instead consider using Curator, a library that was mentioned in the past many times.
> While I don't think this is required, it would make this transition and our code simpler (from what I could see; however, input from people with more Curator experience would be greatly appreciated).
> While it would be out of the scope of this change, if the abstractions/interfaces are correctly designed, this could lead to, in the future, being able to use something other than ZooKeeper for coordination, either etcd or maybe even some in-memory replacement for tests.
> {quote}
> There are still many open questions, and many questions I still don't know we'll have, but please let me know if you have any early feedback, especially if you've worked with Curator in the past.
[jira] [Commented] (SOLR-14310) Expose solr logs with basic filters via HTTP
[ https://issues.apache.org/jira/browse/SOLR-14310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056435#comment-17056435 ] David Smiley commented on SOLR-14310: - I agree with Ishan; I think this is scope-creep bloat.
[jira] [Commented] (SOLR-14173) Ref Guide Redesign
[ https://issues.apache.org/jira/browse/SOLR-14173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056433#comment-17056433 ] David Smiley commented on SOLR-14173: - I like the gray-themed one better than the blue/orange combo.

> Ref Guide Redesign
> Key: SOLR-14173
> URL: https://issues.apache.org/jira/browse/SOLR-14173
> Project: Solr
> Issue Type: Improvement
> Components: documentation
> Reporter: Cassandra Targett
> Assignee: Cassandra Targett
> Priority: Major
> Attachments: SOLR-14173.patch, blue-left-nav.png, gray-left-nav.png
>
> The current design of the Ref Guide was essentially copied from a Jekyll-based documentation theme (https://idratherbewriting.com/documentation-theme-jekyll/), which had a couple important benefits for that time:
> * It was well-documented, and since I had little experience with Jekyll and its Liquid templates and since I was the one doing it, I wanted to make it as easy on myself as possible
> * It was designed for documentation specifically, so took care of all the things like inter-page navigation, etc.
> * It helped us get from Confluence to our current system quickly
> It had some drawbacks, though:
> * It wasted a lot of space on the page
> * The theme was built for Markdown files, so did not take advantage of the features of the {{jekyll-asciidoc}} plugin we use (the in-page TOC being one big example - the plugin could create it at build time, but the theme included JS to do it as the page loads, so we use the JS)
> * It had a lot of JS and overlapping CSS files. While it used Bootstrap, it used a customized CSS on top of it for theming that made modifications complex (it was hard to figure out how exactly a change would behave)
> * With all the stuff I'd changed in my bumbling way just to get things to work back then, I broke a lot of the stuff Bootstrap is supposed to give us in terms of responsiveness and making the Guide usable even on smaller screen sizes.
> After upgrading the Asciidoctor components in SOLR-12786 and stopping the PDF (SOLR-13782), I wanted to try to set us up for a more flexible system. We need it for things like Joel's work on the visual guide for streaming expressions (SOLR-13105), and in order to implement other ideas we might have on how to present information in the future.
> I view this issue as phase 1 of an overall redesign that I've already started in a local branch. I'll explain in a comment the changes I've already made, and will use this issue to create and push a branch where we can discuss in more detail.
> Phase 1 here will be under-the-hood CSS/JS changes + overall page layout changes.
> Phase 2 (issue TBD) will be a wholesale re-organization of all the pages of the Guide.
> Phase 3 (issue TBD) will explore moving us from Jekyll to another static site generator that is better suited for our content format, file types, and build conventions.
[jira] [Resolved] (SOLR-14318) Missing dependency on commons-lang in solr-cell 8.4.1
[ https://issues.apache.org/jira/browse/SOLR-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Risden resolved SOLR-14318. - Assignee: Kevin Risden Resolution: Information Provided Marking this as information provided. Thanks for the feedback [~markus.guenther]; let us know if there are any other concerns.

> Missing dependency on commons-lang in solr-cell 8.4.1
> Key: SOLR-14318
> URL: https://issues.apache.org/jira/browse/SOLR-14318
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: contrib - Solr Cell (Tika extraction)
> Affects Versions: 8.4.1
> Reporter: Markus Günther
> Assignee: Kevin Risden
> Priority: Minor
>
> During a migration from Solr 7.x to Solr 8.4.1 we noticed that the commons-lang:commons-lang:2.6 dependency has been removed and thus is no longer part of org.apache.solr:solr-cell. solr-cell, however, comes bundled with Apache Tika Parsers (org.apache.tika:tika-parsers) in version 1.19.1, which - although it is not an explicit dependency - does require commons-lang:commons-lang:2.6.
> This raises an issue when trying to extract the content from Microsoft Access database files using Tika. See the stacktrace below.
>
> {code:java}
> java.lang.NoClassDefFoundError: org/apache/commons/lang/ObjectUtils
>   at com.healthmarketscience.jackcess.util.SimpleColumnMatcher.equals(SimpleColumnMatcher.java:74)
>   at com.healthmarketscience.jackcess.util.SimpleColumnMatcher.matches(SimpleColumnMatcher.java:46)
>   at com.healthmarketscience.jackcess.util.CaseInsensitiveColumnMatcher.matches(CaseInsensitiveColumnMatcher.java:49)
>   at com.healthmarketscience.jackcess.impl.CursorImpl.currentRowMatchesImpl(CursorImpl.java:571)
>   at com.healthmarketscience.jackcess.impl.CursorImpl.findAnotherRowImpl(CursorImpl.java:627)
>   at com.healthmarketscience.jackcess.impl.CursorImpl.findAnotherRow(CursorImpl.java:517)
>   at com.healthmarketscience.jackcess.impl.CursorImpl.findFirstRow(CursorImpl.java:494)
>   at com.healthmarketscience.jackcess.impl.DatabaseImpl$FallbackTableFinder.findRow(DatabaseImpl.java:2376)
>   at com.healthmarketscience.jackcess.impl.DatabaseImpl$TableFinder.findObjectId(DatabaseImpl.java:2176)
>   at com.healthmarketscience.jackcess.impl.DatabaseImpl.readSystemCatalog(DatabaseImpl.java:879)
>   at com.healthmarketscience.jackcess.impl.DatabaseImpl.<init>(DatabaseImpl.java:534)
>   at com.healthmarketscience.jackcess.impl.DatabaseImpl.open(DatabaseImpl.java:401)
>   at com.healthmarketscience.jackcess.DatabaseBuilder.open(DatabaseBuilder.java:252)
>   at org.apache.tika.parser.microsoft.JackcessParser.parse(JackcessParser.java:94)
>   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>   at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>   at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
>   at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102)
>   at org.apache.tika.parser.pkg.PackageParser.parseEntry(PackageParser.java:350)
>   at org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:287)
>   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>   at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>   at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
>   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)
>   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)
>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
>   at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
>   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
>   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
>   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>   at org.eclipse.j
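For an application that embeds solr-cell and hits the NoClassDefFoundError above (rather than following the recommendation in this thread to run Tika outside Solr), one hedged stop-gap is to declare the missing artifact itself, using the exact coordinates named in the issue. This is a sketch of a pom.xml fragment, not an officially recommended fix:

```xml
<!-- Workaround sketch: restore the commons-lang 2.6 classes that
     tika-parsers 1.19.1 (via jackcess) expects on the classpath.
     The cleaner fix is moving to a Solr/Tika combination that no
     longer needs commons-lang (Tika 1.23+, see SOLR-14054). -->
<dependency>
  <groupId>commons-lang</groupId>
  <artifactId>commons-lang</artifactId>
  <version>2.6</version>
</dependency>
```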
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056413#comment-17056413 ] Tim Allison commented on SOLR-14054: Would something like this be acceptable? https://stackoverflow.com/a/24497206

> Upgrade Tika to 1.23
> Key: SOLR-14054
> URL: https://issues.apache.org/jira/browse/SOLR-14054
> Project: Solr
> Issue Type: Task
> Components: contrib - DataImportHandler
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Minor
> Fix For: 8.5
> Attachments: test-documents.7z, tika-integration-example-9.0.0-SNAPSHOT.tgz
>
> We just released 1.23. Let's upgrade Tika.
[jira] [Comment Edited] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056403#comment-17056403 ] Tim Allison edited comment on SOLR-14054 at 3/10/20, 8:54 PM: -- We use xerces 2.12.0 which brings in xml-apis 1.4.01, which is needed by Java 8...see above. In master, we get rid of xml-apis because we don't need it with Java > 8. Any recommendations for a fix in 8.x when building with Java > 8? Is there an ant/ivy version of maven's profiles, activated by Java > 8, e.g.: https://github.com/apache/pdfbox/blob/trunk/parent/pom.xml#L176 ?
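Ant/Ivy has no direct analogue of Maven's JDK-activated profiles, but the question above can be approximated with Ant's `<condition>` task gating an inline Ivy resolution. This is a hedged sketch, not taken from the actual Lucene/Solr build; the property, target, and path names are invented:

```xml
<!-- Hedged sketch: gate the xml-apis dependency on the JVM running the
     build. ant.java.version reports "1.8" on Java 8 and "9", "10", ...
     on later releases, so equality with "1.8" approximates "Java 8 only". -->
<condition property="needs.xml.apis">
  <equals arg1="${ant.java.version}" arg2="1.8"/>
</condition>

<target name="resolve-xml-apis" if="needs.xml.apis">
  <!-- inline Ivy resolution of xml-apis 1.4.01, only when building on Java 8 -->
  <ivy:cachepath organisation="xml-apis" module="xml-apis"
                 revision="1.4.01" inline="true" pathid="xml.apis.classpath"/>
</target>
```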
[jira] [Comment Edited] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056403#comment-17056403 ] Tim Allison edited comment on SOLR-14054 at 3/10/20, 8:45 PM: -- We use xerces 2.12.0 which brings in xml-apis 1.4.01, which is needed by Java 8...see above. In master, we get rid of xml-apis because we don't need it with Java > 8. Any recommendations for a fix in 8.x when building with Java > 8?
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056403#comment-17056403 ] Tim Allison commented on SOLR-14054: We use xerces 2.12.0 which brings in xml-apis 1.4.01, which is needed by Java 8...see above. In master, we get rid of xml-apis because we don't need it with Java > 8. Any recommendations for a fix in 8.x?
[jira] [Commented] (SOLR-14173) Ref Guide Redesign
[ https://issues.apache.org/jira/browse/SOLR-14173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056396#comment-17056396 ] Cassandra Targett commented on SOLR-14173: -- New branch created and pushed (without the color changes I mentioned yesterday...still waiting for some feedback there): named {{jira/solr-14173-2}}, in GH at: https://github.com/apache/lucene-solr/tree/jira/solr-14173-2
[jira] [Commented] (SOLR-14319) Add ability to select replicatype to admin ui collection creation
[ https://issues.apache.org/jira/browse/SOLR-14319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056392#comment-17056392 ] Lucene/Solr QA commented on SOLR-14319: ---

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s {color} | {color:red} SOLR-14319 does not apply to master. Rebase required? Wrong branch? See https://wiki.apache.org/solr/HowToContribute#Creating_the_patch_file for help. {color} |

|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-14319 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12996319/SOLR-14319.patch |
| Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/708/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |

This message was automatically generated.

> Add ability to select replicatype to admin ui collection creation
> Key: SOLR-14319
> URL: https://issues.apache.org/jira/browse/SOLR-14319
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: Admin UI
> Affects Versions: 7.7.2
> Reporter: Richard Goodman
> Priority: Minor
> Attachments: SOLR-14319.patch, Screenshot 2020-03-10 at 16.26.28.png, Screenshot 2020-03-10 at 16.33.52.png
>
> This is just a small patch that allows you to select the replica type when creating a collection. I'm aware that a possible strategy for replica types of a collection can be {{'tlog + pull'}}; because of this, I'm open to feedback on a different way to display this feature. Currently I have a drop-down box defining the types of replicas, defaulting to nrt, and it will take the replication factor specified and create that many replicas of the given type.
[jira] [Commented] (SOLR-14318) Missing dependency on commons-lang in solr-cell 8.4.1
[ https://issues.apache.org/jira/browse/SOLR-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056344#comment-17056344 ] Tim Allison commented on SOLR-14318: Y. Confirmed we removed commons-lang from Tika in 1.23, so 8.5.
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1332: SOLR-14254: Docs for text tagger: FST50 trade-off
dsmiley commented on a change in pull request #1332: SOLR-14254: Docs for text tagger: FST50 trade-off URL: https://github.com/apache/lucene-solr/pull/1332#discussion_r390581813 ## File path: solr/solr-ref-guide/src/the-tagger-handler.adoc ## @@ -271,11 +271,12 @@ The response should be this (the QTime may vary): }} -== Tagger Tips +== Tagger Performance Tips -Performance Tips: - -* Follow the recommended configuration field settings, especially `postingsFormat=FST50`. +* Follow the recommended configuration field settings above. +Additionally, for the best tagger performance, set `postingsFormat=FST50`. +However, non-default postings formats have no backwards-compatibility guarantees, and so if you upgrade Solr then you may find a nasty exception on startup as it fails to read the older index. +If the input text to be tagged is small (e.g. you are tagging queries or tweets) then the postings format choice isn't as important. Review comment: > I didn't realize that the FST50 vs default performance decreased the smaller the individual document size was The tagger works by looping over each token from the input and doing a term dictionary lookup on the local index. Logically, if your input text is small then there is less work to do than for large input text. Knowing this requires tagger knowledge but not how any particular postings format works. See? No I didn't benchmark this ;-). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
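The loop dsmiley describes above can be sketched in a few lines. This is an illustrative sketch only (not Solr's actual TaggerRequestHandler code, and the class and method names here are hypothetical): the tagger iterates over each token of the input and performs one term-dictionary lookup per token, so the amount of work scales with the input size, whatever the postings format. A plain `Set` stands in for the FST-backed term dictionary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the tagger's core loop: one dictionary lookup per
// input token, so short inputs (queries, tweets) mean very few lookups.
class TaggerSketch {
    static List<String> tag(String input, Set<String> dictionary) {
        List<String> tags = new ArrayList<>();
        for (String token : input.toLowerCase().split("\\s+")) {
            if (dictionary.contains(token)) { // term-dictionary lookup
                tags.add(token);
            }
        }
        return tags;
    }
}
```

This is why the postings-format choice matters less for small inputs: a request tagging a tweet triggers only a handful of lookups, regardless of how fast each lookup is.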
[jira] [Commented] (SOLR-14318) Missing dependency on commons-lang in solr-cell 8.4.1
[ https://issues.apache.org/jira/browse/SOLR-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056321#comment-17056321 ] Kevin Risden commented on SOLR-14318: - I'm guessing this was caused by SOLR-9079 in 8.1. Either way for the most part it makes sense to not run Tika as part of Solr itself and instead use Tika server or some other place to run Tika. Running Tika inside Solr can cause all sorts of issues for stability. > Missing dependency on commons-lang in solr-cell 8.4.1 > - > > Key: SOLR-14318 > URL: https://issues.apache.org/jira/browse/SOLR-14318 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - Solr Cell (Tika extraction) >Affects Versions: 8.4.1 >Reporter: Markus Günther >Priority: Minor > > During a migration from Solr 7.x to Solr 8.4.1 we noticed that the > commons-lang:commons-lang:2.6 dependency has been removed, and thus, no > longer is part of org.apache.solr:solr-cell. solr-cell however comes bundled > with Apache Tika Parsers (org.apache.tika:tika-parsers) in version 1.19.1 > which - although it is not an explicit dependency - does require > commons-lang:commons-lang:2.6. > This raises an issue when trying to extract the content from Microsoft Access > database files using Tika. See the stacktrace below. 
> {code:java} > java.lang.NoClassDefFoundError: > org/apache/commons/lang/ObjectUtilsjava.lang.NoClassDefFoundError: > org/apache/commons/lang/ObjectUtils at > com.healthmarketscience.jackcess.util.SimpleColumnMatcher.equals(SimpleColumnMatcher.java:74) > at > com.healthmarketscience.jackcess.util.SimpleColumnMatcher.matches(SimpleColumnMatcher.java:46) > at > com.healthmarketscience.jackcess.util.CaseInsensitiveColumnMatcher.matches(CaseInsensitiveColumnMatcher.java:49) > at > com.healthmarketscience.jackcess.impl.CursorImpl.currentRowMatchesImpl(CursorImpl.java:571) > at > com.healthmarketscience.jackcess.impl.CursorImpl.findAnotherRowImpl(CursorImpl.java:627) > at > com.healthmarketscience.jackcess.impl.CursorImpl.findAnotherRow(CursorImpl.java:517) > at > com.healthmarketscience.jackcess.impl.CursorImpl.findFirstRow(CursorImpl.java:494) > at > com.healthmarketscience.jackcess.impl.DatabaseImpl$FallbackTableFinder.findRow(DatabaseImpl.java:2376) > at > com.healthmarketscience.jackcess.impl.DatabaseImpl$TableFinder.findObjectId(DatabaseImpl.java:2176) > at > com.healthmarketscience.jackcess.impl.DatabaseImpl.readSystemCatalog(DatabaseImpl.java:879) > at > com.healthmarketscience.jackcess.impl.DatabaseImpl.(DatabaseImpl.java:534) > at > com.healthmarketscience.jackcess.impl.DatabaseImpl.open(DatabaseImpl.java:401) > at > com.healthmarketscience.jackcess.DatabaseBuilder.open(DatabaseBuilder.java:252) > at > org.apache.tika.parser.microsoft.JackcessParser.parse(JackcessParser.java:94) > at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) at > org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) at > org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) at > org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72) at > org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102) > at > 
org.apache.tika.parser.pkg.PackageParser.parseEntry(PackageParser.java:350) > at org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:287) at > org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) at > org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) at > org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) at > org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228) > at > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596) at > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799) at > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578) at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540) at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > at > org.eclip
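The root cause of the stack trace above is that jackcess calls `org.apache.commons.lang.ObjectUtils` from commons-lang 2.x, which is no longer on the solr-cell classpath. A minimal probe for the condition (a hypothetical helper, not Solr code) is a `Class.forName` check:

```java
// Hypothetical probe for the missing dependency behind the stack trace above:
// loading org.apache.commons.lang.ObjectUtils fails unless commons-lang 2.x
// (e.g. commons-lang-2.6.jar) is on the classpath.
class CommonsLangProbe {
    static boolean isCommonsLangPresent() {
        try {
            Class.forName("org.apache.commons.lang.ObjectUtils");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```

On a bare JDK, as on a solr-cell 8.4.1 classpath per this report, the probe returns false. A plausible local workaround is adding commons-lang-2.6.jar to the server's lib directory until the Tika upgrade discussed in SOLR-14054 (which moves to commons-lang3) is released.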
[jira] [Commented] (LUCENE-9164) Should not consider ACE a tragedy if IW is closed
[ https://issues.apache.org/jira/browse/LUCENE-9164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056320#comment-17056320 ] ASF subversion and git services commented on LUCENE-9164: - Commit 845ee75e28b5bb73bd8ec5a8b1d79a46cff7737c in lucene-solr's branch refs/heads/branch_8x from Simon Willnauer [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=845ee75 ] LUCENE-9164: process all events before closing gracefully (#1319) IndexWriter must process all pending events before closing the writer during rollback to prevent AlreadyClosedExceptions from being thrown during event processing which can cause the writer to be closed with a tragic event. > Should not consider ACE a tragedy if IW is closed > - > > Key: LUCENE-9164 > URL: https://issues.apache.org/jira/browse/LUCENE-9164 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: master (9.0), 8.5, 8.4.2 >Reporter: Nhat Nguyen >Assignee: Nhat Nguyen >Priority: Major > Fix For: master (9.0), 8.5 > > Attachments: LUCENE-9164.patch, LUCENE-9164.patch > > Time Spent: 7h 20m > Remaining Estimate: 0h > > If IndexWriter is closed or being closed, AlreadyClosedException is expected. > We should not consider it a tragic event in this case. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
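The ordering described in the commit message above can be illustrated with a simplified sketch. This is not IndexWriter's actual code (the names below are illustrative, and the real rollback path is far more involved); it only shows the fix's invariant: drain all pending events before marking the writer closed, so no event ever observes an already-closed writer.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified illustration of the LUCENE-9164 ordering: process every pending
// event *before* flipping the closed flag, so event processing cannot hit an
// AlreadyClosedException and escalate it into a tragic event.
class WriterSketch {
    private final Queue<Runnable> pendingEvents = new ArrayDeque<>();
    private boolean closed = false;
    private int processed = 0;

    void enqueue(Runnable event) {
        pendingEvents.add(event);
    }

    void rollback() {
        Runnable event;
        while ((event = pendingEvents.poll()) != null) {
            event.run();   // events still see an open writer here
            processed++;
        }
        closed = true;     // close only once the queue is empty
    }

    boolean isClosed() { return closed; }
    int processedCount() { return processed; }
}
```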
[jira] [Updated] (LUCENE-9164) Should not consider ACE a tragedy if IW is closed
[ https://issues.apache.org/jira/browse/LUCENE-9164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-9164: Fix Version/s: 8.5 master (9.0) Resolution: Fixed Status: Resolved (was: Patch Available) > Should not consider ACE a tragedy if IW is closed > - > > Key: LUCENE-9164 > URL: https://issues.apache.org/jira/browse/LUCENE-9164 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: master (9.0), 8.5, 8.4.2 >Reporter: Nhat Nguyen >Assignee: Nhat Nguyen >Priority: Major > Fix For: master (9.0), 8.5 > > Attachments: LUCENE-9164.patch, LUCENE-9164.patch > > Time Spent: 7h 20m > Remaining Estimate: 0h > > If IndexWriter is closed or being closed, AlreadyClosedException is expected. > We should not consider it a tragic event in this case. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14318) Missing dependency on commons-lang in solr-cell 8.4.1
[ https://issues.apache.org/jira/browse/SOLR-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056318#comment-17056318 ] Kevin Risden commented on SOLR-14318: - [~markus.guenther] I'm pretty sure this won't be addressed in 8.4.1 since Tika was upgraded in SOLR-14054 for 8.5 which should get underway for release soon. Tika upgrade as far as I can tell doesn't use commons-lang anymore and moves to commons-lang3. [~tallison] can you confirm/deny? > Missing dependency on commons-lang in solr-cell 8.4.1 > - > > Key: SOLR-14318 > URL: https://issues.apache.org/jira/browse/SOLR-14318 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - Solr Cell (Tika extraction) >Affects Versions: 8.4.1 >Reporter: Markus Günther >Priority: Minor
[jira] [Commented] (LUCENE-9164) Should not consider ACE a tragedy if IW is closed
[ https://issues.apache.org/jira/browse/LUCENE-9164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056316#comment-17056316 ] ASF subversion and git services commented on LUCENE-9164: - Commit 79feb93bd962aa65ede05ecf7cc86e9f5cec84a1 in lucene-solr's branch refs/heads/master from Simon Willnauer [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=79feb93 ] LUCENE-9164: process all events before closing gracefully (#1319) IndexWriter must process all pending events before closing the writer during rollback to prevent AlreadyClosedExceptions from being thrown during event processing which can cause the writer to be closed with a tragic event. > Should not consider ACE a tragedy if IW is closed > - > > Key: LUCENE-9164 > URL: https://issues.apache.org/jira/browse/LUCENE-9164 > Project: Lucene - Core > Issue Type: Bug > Components: core/index >Affects Versions: master (9.0), 8.5, 8.4.2 >Reporter: Nhat Nguyen >Assignee: Nhat Nguyen >Priority: Major > Attachments: LUCENE-9164.patch, LUCENE-9164.patch > > Time Spent: 7h 20m > Remaining Estimate: 0h > > If IndexWriter is closed or being closed, AlreadyClosedException is expected. > We should not consider it a tragic event in this case. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] s1monw merged pull request #1319: LUCENE-9164: process all events before closing gracefully
s1monw merged pull request #1319: LUCENE-9164: process all events before closing gracefully URL: https://github.com/apache/lucene-solr/pull/1319
[jira] [Commented] (SOLR-14254) Index backcompat break between 8.3.1 and 8.4.1
[ https://issues.apache.org/jira/browse/SOLR-14254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056306#comment-17056306 ] Jason Gerlowski commented on SOLR-14254: I agree: short of doing something to handle the old index format in Solr itself the best we can do here is a doc fix. Thanks for proposing docs, they look good to me. > Index backcompat break between 8.3.1 and 8.4.1 > -- > > Key: SOLR-14254 > URL: https://issues.apache.org/jira/browse/SOLR-14254 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Jason Gerlowski >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > I believe I found a backcompat break between 8.4.1 and 8.3.1. > I encountered this when a Solr 8.3.1 cluster was upgraded to 8.4.1. On 8.4. > nodes, several collections had cores fail to come up with > {{CorruptIndexException}}: > {code} > 2020-02-10 20:58:26.136 ERROR > (coreContainerWorkExecutor-2-thread-1-processing-n:192.168.1.194:8983_solr) [ > ] o.a.s.c.CoreContainer Error waiting for SolrCore to be loaded on startup > => org.apache.sol > r.common.SolrException: Unable to create core > [testbackcompat_shard1_replica_n1] > at > org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1313) > org.apache.solr.common.SolrException: Unable to create core > [testbackcompat_shard1_replica_n1] > at > org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1313) > ~[?:?] > at > org.apache.solr.core.CoreContainer.lambda$load$13(CoreContainer.java:788) > ~[?:?] > at > com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:202) > ~[metrics-core-4.0.5.jar:4.0.5] > at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210) > ~[?:?] 
> at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > ~[?:?] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > ~[?:?] > at java.lang.Thread.run(Thread.java:834) [?:?] > Caused by: org.apache.solr.common.SolrException: Error opening new searcher > at org.apache.solr.core.SolrCore.(SolrCore.java:1072) ~[?:?] > at org.apache.solr.core.SolrCore.(SolrCore.java:901) ~[?:?] > at > org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1292) > ~[?:?] > ... 7 more > Caused by: org.apache.solr.common.SolrException: Error opening new searcher > at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2182) > ~[?:?] > at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2302) > ~[?:?] > at org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:1132) > ~[?:?] > at org.apache.solr.core.SolrCore.(SolrCore.java:1013) ~[?:?] > at org.apache.solr.core.SolrCore.(SolrCore.java:901) ~[?:?] > at > org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1292) > ~[?:?] > ... 7 more > Caused by: org.apache.lucene.index.CorruptIndexException: codec mismatch: > actual codec=Lucene50PostingsWriterDoc vs expected > codec=Lucene84PostingsWriterDoc > (resource=MMapIndexInput(path="/Users/jasongerlowski/run/solrdata/data/testbackcompat_shard1_replica_n1/data/index/_0_FST50_0.doc")) > at > org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:208) > ~[?:?] > at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:198) > ~[?:?] > at > org.apache.lucene.codecs.CodecUtil.checkIndexHeader(CodecUtil.java:255) ~[?:?] > at > org.apache.lucene.codecs.lucene84.Lucene84PostingsReader.(Lucene84PostingsReader.java:82) > ~[?:?] > at > org.apache.lucene.codecs.memory.FSTPostingsFormat.fieldsProducer(FSTPostingsFormat.java:66) > ~[?:?] > at > org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.(PerFieldPostingsFormat.java:315) > ~[?:?] 
> at > org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:395) > ~[?:?] > at > org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:114) > ~[?:?] > at > org.apache.lucene.index.SegmentReader.(SegmentReader.java:84) ~[?:?] > at > org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:177) > ~[?:?] > at > org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:219) > ~[?:?] > at > org.apache.lucene.index.StandardDirectoryReader.open(Standa
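The {{CorruptIndexException}} above ({{Lucene50PostingsWriterDoc}} vs {{Lucene84PostingsWriterDoc}} on an {{_0_FST50_0.doc}} file) is exactly the back-compat hazard the doc change in PR #1332 warns about: the field was indexed with the non-default FST50 postings format, which carries no backwards-compatibility guarantee. A hedged illustration of the kind of schema setting involved (the fieldType name here is hypothetical, not taken from this report):

```xml
<!-- Illustrative schema fragment; the fieldType name is hypothetical.
     postingsFormat="FST50" opts this field's terms into a non-default
     postings format. Non-default formats have no back-compat guarantee,
     so an index written this way on one version may fail to open after
     an upgrade (as in the stack trace above) until it is reindexed. -->
<fieldType name="tag" class="solr.TextField" postingsFormat="FST50"
           omitNorms="true" omitTermFreqAndPositions="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
```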
[GitHub] [lucene-solr] gerlowskija commented on a change in pull request #1332: SOLR-14254: Docs for text tagger: FST50 trade-off
gerlowskija commented on a change in pull request #1332: SOLR-14254: Docs for text tagger: FST50 trade-off URL: https://github.com/apache/lucene-solr/pull/1332#discussion_r390556194 ## File path: solr/solr-ref-guide/src/the-tagger-handler.adoc ## @@ -271,11 +271,12 @@ The response should be this (the QTime may vary): }} -== Tagger Tips +== Tagger Performance Tips -Performance Tips: - -* Follow the recommended configuration field settings, especially `postingsFormat=FST50`. +* Follow the recommended configuration field settings above. +Additionally, for the best tagger performance, set `postingsFormat=FST50`. +However, non-default postings formats have no backwards-compatibility guarantees, and so if you upgrade Solr then you may find a nasty exception on startup as it fails to read the older index. +If the input text to be tagged is small (e.g. you are tagging queries or tweets) then the postings format choice isn't as important. Review comment: [Q] Interesting. I didn't realize that the FST50 vs default performance decreased the smaller the individual document size was. Did you do a particular performance test to bear this out, or are you just intuiting that behavior from knowing how postingsFormats work? Is the performance comparable even if numTweets or whatever gets large and the posting-lists grow due to the sheer number of tiny docs? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13199) NPE due to unexpected null return value from QueryBitSetProducer.getBitSet
[ https://issues.apache.org/jira/browse/SOLR-13199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056298#comment-17056298 ] Munendra S N commented on SOLR-13199: - {quote} I suspect you are not familiar with how nested documents work and/or this particular DocTransformer. The use-case for a child doc transformer is when the master query matches only parent documents but you want to see their children in the response, attached to it. What the user pages through are parent documents; they only see parent documents. {quote} I think I didn't explain the case properly. Let's consider http://yonik.com/solr-nested-objects/ Suppose the initial query is {{author_s:yonik}}, which matches a parent document, and {{cat_s:fantasy}} is the parentFilter used in childDocTransformer (the child documents are reviews). This works fine. Now say we add a new book by {{author_s:yonik}} which belongs to {{cat_s:biography}}. Once this is added, if I make the same query {{author_s:yonik}} with {{parentFilter=cat_s:fantasy}}, the parentFilter obviously won't match the new {{cat_s:biography}} parent, so getBitSet will return {{null}} and we get an NPE. This is what I was trying to explain, but I made it complicated by introducing pagination etc. I understand that in this case a better parentFilter would be type_s:book instead of cat_s, but in the beginning cat_s is also constant across the parent set, until a new category is introduced. After some more thought, and trying out scenarios like a childless parent (which works fine as long as the parentFilter is proper), it is better to fail the request instead of not returning any children, since the parent does contain children but we are unable to return them because of an improper parentFilter. I will make changes to throw an error in both cases. 
[~dsmiley] Thanks a lot for your patience and detailed explanation > NPE due to unexpected null return value from QueryBitSetProducer.getBitSet > -- > > Key: SOLR-13199 > URL: https://issues.apache.org/jira/browse/SOLR-13199 > Project: Solr > Issue Type: Bug > Components: search >Affects Versions: master (9.0) > Environment: h1. Steps to reproduce > * Use a Linux machine. > * Build commit {{ea2c8ba}} of Solr as described in the section below. > * Build the films collection as described below. > * Start the server using the command {{./bin/solr start -f -p 8983 -s > /tmp/home}} > * Request the URL given in the bug description. > h1. Compiling the server > {noformat} > git clone https://github.com/apache/lucene-solr > cd lucene-solr > git checkout ea2c8ba > ant compile > cd solr > ant server > {noformat} > h1. Building the collection > We followed [Exercise > 2|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-2] from > the [Solr > Tutorial|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html]. 
The > attached file ({{home.zip}}) gives the contents of folder {{/tmp/home}} that > you will obtain by following the steps below: > {noformat} > mkdir -p /tmp/home > echo '' > > /tmp/home/solr.xml > {noformat} > In one terminal start a Solr instance in foreground: > {noformat} > ./bin/solr start -f -p 8983 -s /tmp/home > {noformat} > In another terminal, create a collection of movies, with no shards and no > replication, and initialize it: > {noformat} > bin/solr create -c films > curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": > {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' > http://localhost:8983/solr/films/schema > curl -X POST -H 'Content-type:application/json' --data-binary > '{"add-copy-field" : {"source":"*","dest":"_text_"}}' > http://localhost:8983/solr/films/schema > ./bin/post -c films example/films/films.json > {noformat} >Reporter: Johannes Kloos >Assignee: Munendra S N >Priority: Minor > Labels: diffblue, newdev > Attachments: SOLR-13199.patch, home.zip > > > Requesting the following URL causes Solr to return an HTTP 500 error response: > {noformat} > http://localhost:8983/solr/films/select?fl=[child%20parentFilter=ge]&q=*:* > {noformat} > The error response seems to be caused by the following uncaught exception: > {noformat} > java.lang.NullPointerException > at > org.apache.solr.response.transform.ChildDocTransformer.transform(ChildDocTransformer.java:92) > at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:103) > at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:1) > at > org.apache.solr.response.TextResponseWriter.writeDocuments(TextResponseWriter.java:184) > at > org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:136) > at > org.apache.solr.common.util.JsonTextWriter.writeNamedListAsMapWithDups(JsonTextWriter.java:386) > at > org.apache.s
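The direction settled on in the comment above, failing the request when the parentFilter matches nothing instead of letting the NPE escape as an HTTP 500, can be sketched as follows. This is illustrative only: the real code is ChildDocTransformer working with Lucene bitsets from QueryBitSetProducer.getBitSet, not {{java.util.BitSet}}, and the class and message here are hypothetical.

```java
import java.util.BitSet;

// Hypothetical sketch: when the parent-filter bitset is null (the filter
// matched no documents), throw a clear error rather than dereferencing it
// and surfacing a NullPointerException as an HTTP 500.
class ParentBitSetCheck {
    static int countParents(BitSet parentBits) {
        if (parentBits == null) {
            throw new IllegalArgumentException(
                "Parent filter matched no documents; check the parentFilter parameter");
        }
        return parentBits.cardinality();
    }
}
```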
[jira] [Commented] (SOLR-14265) Move to admin API to v2 completely
[ https://issues.apache.org/jira/browse/SOLR-14265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056283#comment-17056283 ] Cassandra Targett commented on SOLR-14265: -- The only thing I have is a doc from 2017 that Noble (and Steve also IIRC) wrote as the first documentation on the v2 APIs: https://docs.google.com/document/d/18n9IL6y82C8gnBred6lzG0GLaT3OsZZsBvJQ2YAt72I. At the end there is a mapping of v1 to v2 which was reasonably correct when it was written. There have been lots of changes since then; whole new APIs have been added, some with and some without v2 support. In some cases v2 support has been added for things that were missing back in 2017. Some of my comments in SOLR-11646 also point out a couple of additional gaps that may not be listed in the 2017 docs Noble wrote. > Move to admin API to v2 completely > --- > > Key: SOLR-14265 > URL: https://issues.apache.org/jira/browse/SOLR-14265 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Anshum Gupta >Assignee: Anshum Gupta >Priority: Major > > V2 admin API has been available in Solr for a very long time, making it > difficult for both users and developers to remember and understand which > format to use when. We should move to v2 API completely for all Solr Admin > calls for the following reasons: > # converge code - there are multiple ways of doing the same thing, there's > unwanted back-compat code, and we should get rid of that > # POJO all the way - no more NamedList. I know this would have split > opinions, but I strongly think we should move in this direction. I created > Jira about this specific task in the past and went half way but I think we > should just close this one out now. > # Automatic documentation > # Others > This is just an umbrella Jira for the task. 
Let's create sub-tasks and split > this up as it would require a bunch of rewriting of the code and it makes a > lot of sense to get this out with 9.0 so we don't have to support v1 forever! > There have been some conversations going on about this and it feels like most > folks are happy to go this route. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] janhoy merged pull request #1326: Remove unused scripts in dev-tools folder
janhoy merged pull request #1326: Remove unused scripts in dev-tools folder URL: https://github.com/apache/lucene-solr/pull/1326
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056259#comment-17056259 ] Tim Allison commented on SOLR-14054: Looking... > Upgrade Tika to 1.23 > > > Key: SOLR-14054 > URL: https://issues.apache.org/jira/browse/SOLR-14054 > Project: Solr > Issue Type: Task > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > Fix For: 8.5 > > Attachments: test-documents.7z, > tika-integration-example-9.0.0-SNAPSHOT.tgz > > Time Spent: 20m > Remaining Estimate: 0h > > We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14054) Upgrade Tika to 1.23
[ https://issues.apache.org/jira/browse/SOLR-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056257#comment-17056257 ] Adrien Grand commented on SOLR-14054: - It looks like this issue is responsible for the smoketest build failures, see e.g. https://builds.apache.org/job/Lucene-Solr-SmokeRelease-8.x/369/console. > Upgrade Tika to 1.23 > > > Key: SOLR-14054 > URL: https://issues.apache.org/jira/browse/SOLR-14054 > Project: Solr > Issue Type: Task > Components: contrib - DataImportHandler >Reporter: Tim Allison >Assignee: Tim Allison >Priority: Minor > Fix For: 8.5 > > Attachments: test-documents.7z, > tika-integration-example-9.0.0-SNAPSHOT.tgz > > Time Spent: 20m > Remaining Estimate: 0h > > We just released 1.23. Let's upgrade Tika. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches
jpountz commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches URL: https://github.com/apache/lucene-solr/pull/1294#discussion_r390523437

## File path: lucene/core/src/java/org/apache/lucene/search/SliceExecutor.java
## @@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.search;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.Executor;
+import java.util.concurrent.Future;
+import java.util.concurrent.FutureTask;
+import java.util.concurrent.RejectedExecutionException;
+
+/**
+ * Executor which is responsible
+ * for execution of slices based on the current status
+ * of the system and current system load
+ */
+class SliceExecutor {
+  private final Executor executor;
+
+  public SliceExecutor(Executor executor) {
+    this.executor = executor;
+  }
+
+  public List<Future> invokeAll(Collection<FutureTask> tasks) {
Review comment: I wonder whether this is the right API. We could change the return type to `void` and use `Runnable` instead of `FutureTask` and that would still work, right?
The return value isn't really useful since it has the same content as the input collection? So what about making it just: `public void invokeAll(Collection<Runnable> tasks)`?
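The simplification jpountz suggests could look roughly like the following sketch. This is illustrative only — the class name SliceRunner and the caller-thread fallback details are assumptions, not the code that was merged:

```java
import java.util.Collection;
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;

// Sketch of the reviewer's suggested API: accept plain Runnables and return
// nothing, since the caller already holds the tasks it passed in.
class SliceRunner {
  private final Executor executor;

  SliceRunner(Executor executor) {
    this.executor = executor;
  }

  public void invokeAll(Collection<? extends Runnable> tasks) {
    int i = 0;
    for (Runnable task : tasks) {
      if (i == tasks.size() - 1) {
        task.run(); // run the last slice on the caller thread
      } else {
        try {
          executor.execute(task);
        } catch (RejectedExecutionException e) {
          task.run(); // pool saturated: fall back to the caller thread
        }
      }
      ++i;
    }
  }
}
```

A caller that needs the results can still pass `FutureTask`s (a `FutureTask` is a `Runnable`) and read them back from the collection it already holds, which is the reviewer's point.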
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches
jpountz commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches URL: https://github.com/apache/lucene-solr/pull/1294#discussion_r390522773

## File path: lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java
## @@ -933,6 +932,13 @@ public Executor getExecutor() {
     return executor;
   }

+  /**
+   * Returns this searcher's slice execution control plane or null if no executor was provided
+   */
+  public SliceExecutor getSliceExecutor() {
Review comment: we shouldn't make this method public if it returns a pkg-private class, let's make the method pkg-private too? Or even remove it entirely as I'm not seeing any call site for it?
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches
jpountz commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches URL: https://github.com/apache/lucene-solr/pull/1294#discussion_r390515024

## File path: lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java
## @@ -211,6 +213,18 @@ public IndexSearcher(IndexReaderContext context, Executor executor) {
     assert context.isTopLevel: "IndexSearcher's ReaderContext must be topLevel for reader" + context.reader();
     reader = context.reader();
     this.executor = executor;
+    this.sliceExecutor = executor == null ? null : getSliceExecutionControlPlane(executor);
+    this.readerContext = context;
+    leafContexts = context.leaves();
+    this.leafSlices = executor == null ? null : slices(leafContexts);
Review comment: maybe this should delegate to the below constructor?
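The delegation the reviewer suggests is the standard `this(...)` constructor-chaining idiom. A self-contained analogue (class and field names are illustrative, not the actual IndexSearcher):

```java
import java.util.concurrent.Executor;

// Constructor delegation: the two-argument constructor forwards to a fuller
// one via this(...), so the field assignments live in exactly one place.
class SearcherSketch {
  final Executor executor;
  final Object sliceExecutor;

  SearcherSketch(Executor executor) {
    // delegate instead of duplicating the field assignments
    this(executor, executor == null ? null : defaultSliceExecutor(executor));
  }

  SearcherSketch(Executor executor, Object sliceExecutor) {
    this.executor = executor;
    this.sliceExecutor = sliceExecutor;
  }

  static Object defaultSliceExecutor(Executor e) {
    return new Object(); // stand-in for getSliceExecutionControlPlane(e)
  }
}
```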
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches
jpountz commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches URL: https://github.com/apache/lucene-solr/pull/1294#discussion_r390521799

## File path: lucene/core/src/java/org/apache/lucene/search/SliceExecutor.java
## @@ -0,0 +1,105 @@
+package org.apache.lucene.search;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.Executor;
+import java.util.concurrent.Future;
+import java.util.concurrent.FutureTask;
+import java.util.concurrent.RejectedExecutionException;
+
+/**
+ * Executor which is responsible
+ * for execution of slices based on the current status
+ * of the system and current system load
+ */
+class SliceExecutor {
+  private final Executor executor;
+
+  public SliceExecutor(Executor executor) {
+    this.executor = executor;
+  }
+
+  public List<Future> invokeAll(Collection<FutureTask> tasks) {
+
+    if (tasks == null) {
+      throw new IllegalArgumentException("Tasks is null");
+    }
+
+    if (executor == null) {
+      throw new IllegalArgumentException("Executor is null");
+    }
+
+    List<Future> futures = new ArrayList();
+
+    int i = 0;
+
+    for (FutureTask task : tasks) {
+      boolean shouldExecuteOnCallerThread = false;
+
+      // Execute last task on caller thread
+      if (i == tasks.size() - 1) {
+        shouldExecuteOnCallerThread = true;
+      }
+
+      processTask(task, futures, shouldExecuteOnCallerThread);
+      ++i;
+    }
+
+    return futures;
+  }
+
+  // Helper method to execute a single task
+  protected void processTask(final FutureTask task, final List<Future> futures,
+      final boolean shouldExecuteOnCallerThread) {
+    if (task == null) {
+      throw new IllegalArgumentException("Input is null");
+    }
+
+    if (!shouldExecuteOnCallerThread) {
+      try {
+        executor.execute(task);
+        futures.add(task);
+
+        return;
+      } catch (RejectedExecutionException e) {
+        // Execute on caller thread
+      }
+    }
+
+    runTaskOnCallerThread(task);
+
+    try {
+      futures.add(CompletableFuture.completedFuture(task.get()));
+    } catch (Exception e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  // Private helper method to run a task on the caller thread
+  private void runTaskOnCallerThread(FutureTask task) {
+    try {
+      task.run();
+    } catch (Exception e) {
+      throw new RuntimeException(e);
+    }
Review comment: we don't need this catch block as task.run() doesn't declare any non-runtime exception?
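The reviewer's point holds even beyond declared exceptions: `FutureTask.run()` never propagates the task's exception at all. It captures any `Throwable` thrown by the wrapped `Callable` and rethrows it, wrapped in an `ExecutionException`, from `get()`:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

// Demonstrates that FutureTask.run() completes normally even when the task
// throws; the exception surfaces only from get(), wrapped in an
// ExecutionException. The try/catch around task.run() above is dead weight.
public class FutureTaskRunDemo {
  public static boolean runCapturesException() {
    FutureTask<Integer> failing = new FutureTask<>(() -> {
      throw new IllegalStateException("boom");
    });
    failing.run(); // returns normally; the exception is stored, not thrown
    try {
      failing.get();
      return false; // unreachable: get() must throw
    } catch (ExecutionException e) {
      return e.getCause() instanceof IllegalStateException;
    } catch (InterruptedException e) {
      return false; // cannot happen for an already-completed task
    }
  }
}
```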
[jira] [Commented] (SOLR-13199) NPE due to unexpected null return value from QueryBitSetProducer.getBitSet
[ https://issues.apache.org/jira/browse/SOLR-13199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056243#comment-17056243 ] David Smiley commented on SOLR-13199: - {quote}what about a segment with no hits? Presumably it may occurs between regular ones {quote} No, a nested document set is committed atomically; it is not split to other segments. Solr flattens the nest then supplies the list to Lucene which +guarantees+ this. {quote}One case where query could be null even if parentFilter is specified filter is defined on text field and value is stopword {quote} The dev user is then not using an appropriate query. It's mandatory that at least one document in a non-empty index be a parent; otherwise we don't actually have a nested index which is also the dev user's fault if true. {quote}Suppose, user is also using pagination. Fist page returns properly, there is one such parent product which fits the bill and we throw an exception. {quote} I suspect you are not familiar with how nested documents work and/or this particular DocTransformer. The use-case for a child doc transformer is when the master query matches only parent documents but you want to see their children in the response, attached to it. What the user pages through are parent documents; they only see parent documents. If somehow there is some use case where throwing an exception would inhibit a use case we have never thought of, I insist we wait until such a use-case actually presents itself. Otherwise, we are failing to inform the user that they are very probably making a mistake. {quote}Also, I have question if someone uses nestPathField approach(defined in the schema) but doesn't have any children for parents what does childTransformer return? Does it fail the request with valid error or return just the parent products? {quote} What is the "it" in "Does it fail ..." refer to? You probably refer to ChildDocTransformer. 
It attaches no child documents to the parent because there aren't any (perfectly valid!). It would not "return parents" but I think you maybe mean would the master query return parents. What the master query returns is whatever your q/fq match and is not something the transformer affects. > NPE due to unexpected null return value from QueryBitSetProducer.getBitSet > -- > > Key: SOLR-13199 > URL: https://issues.apache.org/jira/browse/SOLR-13199 > Project: Solr > Issue Type: Bug > Components: search >Affects Versions: master (9.0) > Environment: h1. Steps to reproduce > * Use a Linux machine. > * Build commit {{ea2c8ba}} of Solr as described in the section below. > * Build the films collection as described below. > * Start the server using the command {{./bin/solr start -f -p 8983 -s > /tmp/home}} > * Request the URL given in the bug description. > h1. Compiling the server > {noformat} > git clone https://github.com/apache/lucene-solr > cd lucene-solr > git checkout ea2c8ba > ant compile > cd solr > ant server > {noformat} > h1. Building the collection > We followed [Exercise > 2|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-2] from > the [Solr > Tutorial|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html]. 
The > attached file ({{home.zip}}) gives the contents of folder {{/tmp/home}} that > you will obtain by following the steps below: > {noformat} > mkdir -p /tmp/home > echo '' > > /tmp/home/solr.xml > {noformat} > In one terminal start a Solr instance in foreground: > {noformat} > ./bin/solr start -f -p 8983 -s /tmp/home > {noformat} > In another terminal, create a collection of movies, with no shards and no > replication, and initialize it: > {noformat} > bin/solr create -c films > curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": > {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' > http://localhost:8983/solr/films/schema > curl -X POST -H 'Content-type:application/json' --data-binary > '{"add-copy-field" : {"source":"*","dest":"_text_"}}' > http://localhost:8983/solr/films/schema > ./bin/post -c films example/films/films.json > {noformat} >Reporter: Johannes Kloos >Assignee: Munendra S N >Priority: Minor > Labels: diffblue, newdev > Attachments: SOLR-13199.patch, home.zip > > > Requesting the following URL causes Solr to return an HTTP 500 error response: > {noformat} > http://localhost:8983/solr/films/select?fl=[child%20parentFilter=ge]&q=*:* > {noformat} > The error response seems to be caused by the following uncaught exception: > {noformat} > java.lang.NullPointerException > at > org.apache.solr.response.transform.ChildDocTransformer.transform(ChildDocTransformer.java:92) > at org.apache.solr.response.DocsStrea
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches
jpountz commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches URL: https://github.com/apache/lucene-solr/pull/1294#discussion_r390523963

## File path: lucene/core/src/java/org/apache/lucene/search/SliceExecutor.java
## @@ -0,0 +1,105 @@
+  public List<Future> invokeAll(Collection<FutureTask> tasks) {
+
+    if (tasks == null) {
+      throw new IllegalArgumentException("Tasks is null");
+    }
+
+    if (executor == null) {
+      throw new IllegalArgumentException("Executor is null");
+    }
+
+    List<Future> futures = new ArrayList();
+
+    int i = 0;
+
+    for (FutureTask task : tasks) {
Review comment: we should never use generic types without type parameters, can you address all these compilation warnings?
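One way to address the raw-type warnings the reviewer flags — a sketch with an illustrative type parameter `T`, not the actual fix applied in the PR — is to parameterize the tasks end to end:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

// Minimal sketch of parameterizing the raw FutureTask/Future/List types so
// the unchecked warnings disappear. TypedSliceInvoker is a made-up name.
class TypedSliceInvoker {
  static <T> List<Future<T>> invokeAll(Collection<FutureTask<T>> tasks) {
    List<Future<T>> futures = new ArrayList<>();
    for (FutureTask<T> task : tasks) {
      task.run();        // run on the caller thread for this sketch
      futures.add(task); // a completed FutureTask<T> is itself a Future<T>
    }
    return futures;
  }
}
```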
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches
jpountz commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches URL: https://github.com/apache/lucene-solr/pull/1294#discussion_r390525903

## File path: lucene/core/src/java/org/apache/lucene/search/SliceExecutor.java
## @@ -0,0 +1,105 @@
+    if (!shouldExecuteOnCallerThread) {
+      try {
+        executor.execute(task);
+        futures.add(task);
+
+        return;
+      } catch (RejectedExecutionException e) {
+        // Execute on caller thread
+      }
+    }
+
+    runTaskOnCallerThread(task);
+
+    try {
+      futures.add(CompletableFuture.completedFuture(task.get()));
Review comment: this has the same effect as `futures.add(task)` unless I'm missing something
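The equivalence the reviewer points out can be checked directly: once `task.run()` has completed, the `FutureTask` itself already behaves as a completed `Future`, so wrapping its result in `CompletableFuture.completedFuture(...)` changes nothing observable for callers of `get()`:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

// Shows that adding the completed task directly and wrapping its result in a
// completed CompletableFuture yield futures with the same observable state.
public class CompletedFutureEquivalence {
  public static boolean equivalent() {
    FutureTask<Integer> task = new FutureTask<>(() -> 7);
    task.run();
    Future<Integer> direct = task; // already done after run()
    try {
      Future<Integer> wrapped = CompletableFuture.completedFuture(task.get());
      return direct.isDone() && wrapped.isDone() && direct.get().equals(wrapped.get());
    } catch (Exception e) {
      return false;
    }
  }
}
```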
[jira] [Commented] (SOLR-14265) Move to admin API to v2 completely
[ https://issues.apache.org/jira/browse/SOLR-14265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056226#comment-17056226 ] Anshum Gupta commented on SOLR-14265:
-
Great idea, Cassandra! I was kind of doing that in addition to working through and understanding how the new API is structured. There is more than one way the v2 stuff is written, so I'm also trying to converge on the best option there. Having a list of all the endpoints that we want to support with v2 would be great, and then I plan on moving those APIs over one after the other. If you already have a document or writeup on that mapping/accounting, please feel free to share; that would be a great anchor for this.
> Move to admin API to v2 completely
> ---
>
> Key: SOLR-14265
> URL: https://issues.apache.org/jira/browse/SOLR-14265
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Anshum Gupta
> Assignee: Anshum Gupta
> Priority: Major
>
> V2 admin API has been available in Solr for a very long time, making it difficult for both users and developers to remember and understand which format to use when. We should move to v2 API completely for all Solr Admin calls for the following reasons:
> # converge code - there are multiple ways of doing the same thing, there's unwanted back-compat code, and we should get rid of that
> # POJO all the way - no more NamedList. I know this would have split opinions, but I strongly think we should move in this direction. I created Jira about this specific task in the past and went half way but I think we should just close this one out now.
> # Automatic documentation
> # Others
> This is just an umbrella Jira for the task. Let's create sub-tasks and split this up as it would require a bunch of rewriting of the code and it makes a lot of sense to get this out with 9.0 so we don't have to support v1 forever!
> There have been some conversations going on about this and it feels like most folks are happy to go this route.
[GitHub] [lucene-solr] danmuzi opened a new pull request #1336: LUCENE-9270: Update Javadoc about normalizeEntry in the Kuromoji DictionaryBuilder
danmuzi opened a new pull request #1336: LUCENE-9270: Update Javadoc about normalizeEntry in the Kuromoji DictionaryBuilder URL: https://github.com/apache/lucene-solr/pull/1336 The normalizeEntry option is missing from the Javadoc of the Kuromoji DictionaryBuilder. Without this explanation, users don't know what it means until they see the code. Also, if a user follows the usage in the Javadoc, it will not build. Please check the following JIRA issue: [LUCENE-9270](https://issues.apache.org/jira/browse/LUCENE-9270)
[jira] [Created] (LUCENE-9270) Update Javadoc about normalizeEntry in Kuromoji DictionaryBuilder
Namgyu Kim created LUCENE-9270:
--
Summary: Update Javadoc about normalizeEntry in Kuromoji DictionaryBuilder
Key: LUCENE-9270
URL: https://issues.apache.org/jira/browse/LUCENE-9270
Project: Lucene - Core
Issue Type: Improvement
Reporter: Namgyu Kim
Assignee: Namgyu Kim

The normalizeEntry option is missing from the Javadoc of the Kuromoji DictionaryBuilder. Without this explanation, users don't know what it means until they see the code. Also, if a user follows the usage in the Javadoc, it will not build. So the following changes need to be applied:

1) Change usage
before:
java -cp [lucene classpath] org.apache.lucene.analysis.ja.util.DictionaryBuilder \ ${inputDir} ${outputDir} ${encoding}
after:
java -cp [lucene classpath] org.apache.lucene.analysis.ja.util.DictionaryBuilder \ ${inputDir} ${outputDir} ${encoding} *${normalizeEntry}*

2) Add description about normalizeEntry
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches
jpountz commented on a change in pull request #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches URL: https://github.com/apache/lucene-solr/pull/1294#discussion_r390514257

## File path: lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java
## @@ -211,6 +213,18 @@ public IndexSearcher(IndexReaderContext context, Executor executor) {
     assert context.isTopLevel: "IndexSearcher's ReaderContext must be topLevel for reader" + context.reader();
     reader = context.reader();
     this.executor = executor;
+    this.sliceExecutionControlPlane = executor == null ? null : getSliceExecutionControlPlane(executor);
+    this.readerContext = context;
+    leafContexts = context.leaves();
+    this.leafSlices = executor == null ? null : slices(leafContexts);
+  }
+
+  // Package private for testing
+  IndexSearcher(IndexReaderContext context, Executor executor, SliceExecutionControlPlane sliceExecutionControlPlane) {
+    assert context.isTopLevel: "IndexSearcher's ReaderContext must be topLevel for reader" + context.reader();
+    reader = context.reader();
+    this.executor = executor;
+    this.sliceExecutionControlPlane = executor == null ? null : sliceExecutionControlPlane;
Review comment: My point was that it sounds like a bug on the caller of this constructor to pass a null executor and a non-null sliceExecutionControlPlane? So I'd rather have validation around it rather than be lenient and ignore the provided sliceExecutionControlPlane if the executor is null?
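The stricter validation the reviewer prefers could be sketched as follows (names mirror the review thread and are illustrative only): reject the inconsistent argument combination up front rather than silently discarding the control plane.

```java
import java.util.concurrent.Executor;

// Sketch of fail-fast argument validation: a null executor combined with a
// non-null control plane is treated as a caller bug, per the review comment.
final class SearcherArgs {
  static void validate(Executor executor, Object sliceExecutionControlPlane) {
    if (executor == null && sliceExecutionControlPlane != null) {
      throw new IllegalArgumentException(
          "sliceExecutionControlPlane must be null when executor is null");
    }
  }
}
```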
[jira] [Commented] (LUCENE-9269) Blended queries with boolean rewrite can result in inconsistent scores
[ https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056205#comment-17056205 ] Adrien Grand commented on LUCENE-9269:
--
For this particular issue, I think that the right fix would be to fix {{TermQuery#equals}} and {{hashCode}} to take {{perReaderTermState}} into account. Queries shouldn't be considered equal if they might return different scores. I don't think that this would have bad side-effects, as boolean rewrites are generally used for scoring queries, which are not cached (caching being the other typical call-site for Query#equals/hashCode).
> Blended queries with boolean rewrite can result in inconsistent scores
> --
>
> Key: LUCENE-9269
> URL: https://issues.apache.org/jira/browse/LUCENE-9269
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/search
> Affects Versions: 8.4
> Reporter: Michele Palmia
> Priority: Minor
> Attachments: LUCENE-9269-test.patch
>
> If two blended queries are should clauses of a boolean query and are built so that
> * some of their terms are the same
> * their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE
> the docFreq for the overlapping terms used for scoring is picked as follows:
> # if the overlapping terms are not boosted, the df of the term in the first blended query is used
> # if any of the overlapping terms is boosted, the df is picked at (what looks like) random.
> A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3).
{code:java}
a)
Blended(f:a f:b) Blended (f:a)
df: 3            df: 2
gets rewritten to:
(f:a)^2.0 (f:b)
df: 3     df: 2

b)
Blended(f:a) Blended(f:a f:b)
df: 2        df: 3
gets rewritten to:
(f:a)^2.0 (f:b)
df: 2     df: 2

c)
Blended(f:a f:b^0.66) Blended (f:a^0.75)
df: 3                 df: 2
gets rewritten to:
(f:a)^1.75 (f:b)^0.66
df: ?      df: 2
{code}
> with ? either 2 or 3, depending on the run.
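A self-contained analogue of the fix Adrien describes (the names PinnedTermQuery and pinnedDocFreq are illustrative stand-ins, not Lucene's actual TermQuery/TermStates): once equality includes the pinned per-reader statistics, two term queries carrying different statistics no longer compare equal, so a boolean rewrite cannot merge them into one clause.

```java
import java.util.Objects;

// Query-like value object whose equality includes its cached per-reader
// statistics. Instances pinned to different stats are never deduplicated.
final class PinnedTermQuery {
  final String term;
  final Integer pinnedDocFreq; // stand-in for TermStates; null = not pinned

  PinnedTermQuery(String term, Integer pinnedDocFreq) {
    this.term = term;
    this.pinnedDocFreq = pinnedDocFreq;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof PinnedTermQuery)) return false;
    PinnedTermQuery that = (PinnedTermQuery) o;
    return term.equals(that.term) && Objects.equals(pinnedDocFreq, that.pinnedDocFreq);
  }

  @Override
  public int hashCode() {
    return Objects.hash(term, pinnedDocFreq);
  }
}
```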
[jira] [Commented] (SOLR-14265) Move to admin API to v2 completely
[ https://issues.apache.org/jira/browse/SOLR-14265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056179#comment-17056179 ] Cassandra Targett commented on SOLR-14265:
--
I was thinking about this today as I had a moment to pick up SOLR-11646 again, and the first thing I noticed was a Collections API command (action=CLUSTERSTATUS) which does not have a v2 counterpart. It made me think that the first thing to do here is possibly an accounting of what does & doesn't have v2 coverage, get those added, and then work on removing v1.
[jira] [Commented] (SOLR-14007) Difference response format for percentile aggregation
[ https://issues.apache.org/jira/browse/SOLR-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056132#comment-17056132 ] Munendra S N commented on SOLR-14007: - Thanks [~mkhl] for the review. Thanks [~jbernste] for sharing your thoughts; the idea of a table view looks exciting. My idea was to provide a similar (key-value) view to the normal facet response for percentile. Thank you [~ysee...@gmail.com] for the detailed review. {code:java} Consistency should not be a goal since the Stats component should be deprecated {code} Huge +1 to deprecating the stats component. Having multiple components for the same functionality is unnecessary and a maintenance hassle. I have been working on this for the past few months (at a slower pace than preferred). I have created some tasks for it, though they are not exhaustive; the idea is not to fix all of them but only those which make sense. For example, returning distinctValues could lead to a potential OOM, and there are other ways to achieve the same (say, the terms component with a limit), so it need not be supported in JSON facets. Similarly, for avg on dates, I am not sure of the use case for finding the average of a date; if there is no such case, maybe failing rather than returning some date or double value makes more sense. {code:java} Regardless, what the Stats component currently does should really shouldn't have much bearing on what solution we chose here. {code} Completely agree with this. Even with the current patch, when there are no values the response does not return null for each percentile specified, unlike the stats component. {code:java} For percentile(), if the norm was a single argument, then representing the response as a single value would be natural and multiple values would be an extension (but an exception. {code} My understanding was always the other way around: I always thought of median as the case that could be supported via percentiles. {code:java} I also do question if this change actually makes anyones lives easier. The vast majority of clients would know what they are asking for and hence the form of answer they will get back? {code} I still think having a consistent response format, irrespective of the number of values specified, makes response processing cleaner, without if-else checks. The reason for going with NamedList (initially I built the patch with a list in mind, and I still have it locally) is to make the response as self-contained as possible. I have shared the reasoning behind the approach; irrespective of NamedList (which I would prefer for self-containedness) or list, I would prefer a consistent value type in the response. Let me know if there are any suggestions. > Difference response format for percentile aggregation > - > > Key: SOLR-14007 > URL: https://issues.apache.org/jira/browse/SOLR-14007 > Project: Solr > Issue Type: Sub-task > Components: Facet Module >Reporter: Munendra S N >Assignee: Munendra S N >Priority: Major > Attachments: SOLR-14007.patch > > > For percentile, > in the Stats component, the response format for percentile is {{NamedList}}, but > in JSON facet, the format is either an array or a single value depending on the number > of percentiles specified. > Even if JSON percentile doesn't use NamedList, the response format shouldn't > change based on the number of percentiles -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
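The if-else burden described in the comment above can be made concrete from the client's side: with the current JSON Facet behavior, a percentile value may arrive as a bare number (one percentile requested) or as an array (several requested). Below is a minimal, hypothetical client-side sketch of the resulting type-branching in plain Java; `PercentileResponse` and `normalize` are illustrative names, not SolrJ API.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper: normalize a percentile aggregation value that may be
// either a single number (one percentile asked) or a list (several asked).
class PercentileResponse {
  @SuppressWarnings("unchecked")
  static List<Double> normalize(Object value) {
    if (value instanceof Number) {
      // single percentile requested: response is a bare number
      return Collections.singletonList(((Number) value).doubleValue());
    }
    // multiple percentiles requested: response is already a list
    return (List<Double>) value;
  }
}
```

A consistent response format (always a list, or always a key-value structure per percentile) would make this kind of normalization unnecessary.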
[jira] [Updated] (SOLR-14319) Add ability to select replicatype to admin ui collection creation
[ https://issues.apache.org/jira/browse/SOLR-14319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Goodman updated SOLR-14319: --- Status: Patch Available (was: Open) > Add ability to select replicatype to admin ui collection creation > - > > Key: SOLR-14319 > URL: https://issues.apache.org/jira/browse/SOLR-14319 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Affects Versions: 7.7.2 >Reporter: Richard Goodman >Priority: Minor > Attachments: SOLR-14319.patch, Screenshot 2020-03-10 at 16.26.28.png, > Screenshot 2020-03-10 at 16.33.52.png > > > This is just a small patch that allows you to select the replica type when > creating a collection. I'm aware that a possible strategy for replica types > of a collection can be {{'tlog + pull'}}; because of this, I'm open to > feedback on a different way to display this feature. Currently I have a drop > down box defining the types of replicas, with it defaulting to nrt, and it > will take the replication factor specified and create that many replicas of a > given type.
[jira] [Updated] (SOLR-14319) Add ability to select replicatype to admin ui collection creation
[ https://issues.apache.org/jira/browse/SOLR-14319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Goodman updated SOLR-14319: --- Attachment: Screenshot 2020-03-10 at 16.26.28.png Screenshot 2020-03-10 at 16.33.52.png
[jira] [Created] (SOLR-14319) Add ability to select replicatype to admin ui collection creation
Richard Goodman created SOLR-14319: -- Summary: Add ability to select replicatype to admin ui collection creation Key: SOLR-14319 URL: https://issues.apache.org/jira/browse/SOLR-14319 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: Admin UI Affects Versions: 7.7.2 Reporter: Richard Goodman Attachments: SOLR-14319.patch, Screenshot 2020-03-10 at 16.26.28.png, Screenshot 2020-03-10 at 16.33.52.png This is just a small patch that allows you to select the replica type when creating a collection. I'm aware that a possible strategy for replica types of a collection can be {{'tlog + pull'}}; because of this, I'm open to feedback on a different way to display this feature. Currently I have a drop-down box defining the types of replicas, defaulting to nrt, and it will take the replication factor specified and create that many replicas of a given type.
[jira] [Updated] (SOLR-14319) Add ability to select replicatype to admin ui collection creation
[ https://issues.apache.org/jira/browse/SOLR-14319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Goodman updated SOLR-14319: --- Attachment: SOLR-14319.patch
[jira] [Comment Edited] (SOLR-13199) NPE due to unexpected null return value from QueryBitSetProducer.getBitSet
[ https://issues.apache.org/jira/browse/SOLR-13199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056103#comment-17056103 ] Munendra S N edited comment on SOLR-13199 at 3/10/20, 4:19 PM: --- [~dsmiley] Thanks for the review. This is in addition to what Mikhail has shared. Initially I was thinking to raise/throw an exception, but then I considered a few cases. {code:java} Likewise if we parse the query and get null, the query is in error. {code} One case where the query could be null even if parentFilter is specified: the filter is defined on a text field and the value is a stopword. I have seen the query resolve to null in many cases, but currently this is the one I can think of. Using a text field for parentFilter is not the right choice, but I don't think we can control usage. So, when the user has specified a perfectly fine filter which resolves to null, should we throw an exception? {code:java} If parentsFilter.getBitSet returns null, then we should throw an error that the user didn't supply a parentFilter matching parent documents {code} parentFilter could be something that matches a smaller parent set rather than the whole parent set. The suggestion to throw an error is good if there is an enforcement that a unique parent condition be part of each document. Suppose the user is also using pagination: the first page returns properly, there is one such parent product which fits the bill, and we throw an exception. The same query then throws an exception depending on the limit and start parameters; not sure if that would be the right choice. I understand both cases are either a bit of a stretch or corner cases, but I'm sharing my reasoning behind the above approach. Let me know if these corner cases don't make much sense and it's okay to fail the request; then I will modify the patch accordingly. Also, I have a question: if someone uses the nestPathField approach (defined in the schema) but doesn't have any children for parents, what does childTransformer return? Does it fail the request with a valid error or return just the parent products? I haven't yet tried nestPathField for indexing parent-children, so just curious. > NPE due to unexpected null return value from QueryBitSetProducer.getBitSet > -- > > Key: SOLR-13199 > URL: https://issues.apache.org/jira/browse/SOLR-13199 > Project: Solr > Issue Type: Bug > Components: search >Affects Versions: master (9.0) > Environment: h1. Steps to reproduce > * Use a Linux machine. > * Build commit {{ea2c8ba}} of Solr as described in the section below. > * Build the films collection as described below. > * Start the server using the command {{./bin/solr start -f -p 8983 -s > /tmp/home}} > * Request the URL given in the bug description. > h1. Compiling the server > {noformat} > git clone https://github.com/apache/lucene-solr > cd lucene-solr > git
[jira] [Commented] (SOLR-13199) NPE due to unexpected null return value from QueryBitSetProducer.getBitSet
[ https://issues.apache.org/jira/browse/SOLR-13199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056103#comment-17056103 ] Munendra S N commented on SOLR-13199: - [~dsmiley] Thanks for the review. This is in addition to what Mikhail has shared. Initially I was thinking to raise/throw an exception, but then I considered a few cases. {code:java} Likewise if we parse the query and get null, the query is in error. {code} One case where the query could be null even if parentFilter is specified: the filter is defined on a text field and the value is a stopword. I have seen the query resolve to null in many cases, but currently this is the one I can think of. Using a text field for parentFilter is not the right choice, but I don't think we can control usage. So, when the user has specified a perfectly fine filter which resolves to null, should we throw an exception? {code:java} If parentsFilter.getBitSet returns null, then we should throw an error that the user didn't supply a parentFilter matching parent documents {code} parentFilter could be something that matches a smaller parent set rather than the whole parent set. The suggestion to throw an error is good if there is an enforcement that a unique parent condition be part of each document. Suppose the user is also using pagination: the first page returns properly, there is one such parent product which fits the bill, and we throw an exception. The same query then throws an exception depending on the limit and start parameters; not sure if that would be the right choice. I understand both cases are either a bit of a stretch or corner cases, but I'm sharing my reasoning behind the above approach. Let me know if these corner cases don't make much sense and it's okay to fail the request; then I will modify the patch accordingly. Also, I have a question: if someone uses the nestPathField approach (defined in the schema) but doesn't have any children for parents, what does childTransformer return? 
Does it fail the request with valid error or return just the parent products? I haven't yet tried nestPathField for indexing parent-children. So, just curious. > NPE due to unexpected null return value from QueryBitSetProducer.getBitSet > -- > > Key: SOLR-13199 > URL: https://issues.apache.org/jira/browse/SOLR-13199 > Project: Solr > Issue Type: Bug > Components: search >Affects Versions: master (9.0) > Environment: h1. Steps to reproduce > * Use a Linux machine. > * Build commit {{ea2c8ba}} of Solr as described in the section below. > * Build the films collection as described below. > * Start the server using the command {{./bin/solr start -f -p 8983 -s > /tmp/home}} > * Request the URL given in the bug description. > h1. Compiling the server > {noformat} > git clone https://github.com/apache/lucene-solr > cd lucene-solr > git checkout ea2c8ba > ant compile > cd solr > ant server > {noformat} > h1. Building the collection > We followed [Exercise > 2|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-2] from > the [Solr > Tutorial|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html]. 
The > attached file ({{home.zip}}) gives the contents of folder {{/tmp/home}} that > you will obtain by following the steps below: > {noformat} > mkdir -p /tmp/home > echo '' > > /tmp/home/solr.xml > {noformat} > In one terminal start a Solr instance in foreground: > {noformat} > ./bin/solr start -f -p 8983 -s /tmp/home > {noformat} > In another terminal, create a collection of movies, with no shards and no > replication, and initialize it: > {noformat} > bin/solr create -c films > curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": > {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' > http://localhost:8983/solr/films/schema > curl -X POST -H 'Content-type:application/json' --data-binary > '{"add-copy-field" : {"source":"*","dest":"_text_"}}' > http://localhost:8983/solr/films/schema > ./bin/post -c films example/films/films.json > {noformat} >Reporter: Johannes Kloos >Assignee: Munendra S N >Priority: Minor > Labels: diffblue, newdev > Attachments: SOLR-13199.patch, home.zip > > > Requesting the following URL causes Solr to return an HTTP 500 error response: > {noformat} > http://localhost:8983/solr/films/select?fl=[child%20parentFilter=ge]&q=*:* > {noformat} > The error response seems to be caused by the following uncaught exception: > {noformat} > java.lang.NullPointerException > at > org.apache.solr.response.transform.ChildDocTransformer.transform(ChildDocTransformer.java:92) > at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:103) > at org.apache.solr.response.DocsStreamer.next(DocsStreamer.java:1) > at > org.apache.solr.response.TextResponseWrit
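The two null cases debated in this thread — a parentFilter that parses to null (e.g. a stopword-only value on a text field) and a bitset producer that returns null because no parent matched — can be sketched in isolation. This is a hypothetical standalone analogue of the fail-fast option under discussion, not Solr's actual code; `ParentBitSetLookup` and the plain `Function`/`BitSet` types stand in for `QueryBitSetProducer` and Lucene's bitsets.

```java
import java.util.BitSet;
import java.util.function.Function;

// Hypothetical sketch: fail fast with clear messages instead of letting a
// null query or null bitset surface later as a NullPointerException.
class ParentBitSetLookup {
  private final Function<String, BitSet> producer; // stand-in for QueryBitSetProducer

  ParentBitSetLookup(Function<String, BitSet> producer) {
    this.producer = producer;
  }

  BitSet getParents(String parentFilterQuery) {
    if (parentFilterQuery == null) {
      // the filter parsed to null, e.g. a stopword-only value on a text field
      throw new IllegalArgumentException("parentFilter resolved to no query");
    }
    BitSet bits = producer.apply(parentFilterQuery);
    if (bits == null) {
      // the producer found no parent documents for this filter
      throw new IllegalStateException(
          "parentFilter matched no parent documents: " + parentFilterQuery);
    }
    return bits;
  }
}
```

Whether throwing here is right for paginated requests (where an early page may succeed and a later one fail) is exactly the trade-off the comment raises.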
[jira] [Commented] (SOLR-13807) Caching for term facet counts
[ https://issues.apache.org/jira/browse/SOLR-13807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056100#comment-17056100 ] Michael Gibney commented on SOLR-13807: --- Thanks for responding on these points, [~hossman]! Apologies for my delay in responding, but it's taken me a while to dig into actually addressing some of the issues uncovered by testing (just pushed to [PR #751|https://github.com/apache/lucene-solr/pull/751]). Before embarking on a potential major refactor of PR code that is I believe essentially sound, I first wanted to address the test failures in the existing PR and then see where we are with things. The changes required were not large in terms of number of lines of code. Aside from some trivial bug fixes, the substantive issues addressed fell into three categories, broadly speaking: # UIF caching was an afterthought in the initial patch. I knew this at the time I opened the PR (and should have called it out more explicitly) but although I had roughed in some of the cache-entry-building logic as a POC, nothing was ever actually getting inserted in the cache \(!) and not all code branches were covered. It was fairly straightforward to bring UIF into line (and I re-enabled the UIF cases from your initial test). # Cache entry compatibility across different methods of facet processing. I had to clarify that term counts are only eligible for caching when {{prefix==null}} (or {{prefix.isEmpty()}}). (It would be possible to use no-prefix cached term counts to process prefixed facet requests, but I think it makes sense to leave that for later, if at all). Aside from that, missing buckets are collected _inline_ and cached for {{FacetFieldProcessorByArrayDV}}, but are _not_ collected (nor cached) for {{FacetFieldProcessorByArrayUIF}} (or legacy {{DocValuesFacets}}) processing. 
In practice, it's unlikely that the same field would be processed both as UIF (no cached "missing" count) _and_ as DV (cached "missing" count), but the case did come up in testing, and I addressed it by detecting and re-processing with {{*ByArrayDV}}, and replacing the cache entry with the new one that includes "missing" count. The resulting "missing"-inclusive cache-entry is backward-compatible with (may be used by) {{*ByArrayUIF}} and legacy {{DocValuesFacets}} processing implementations. Incidentally, I wonder whether this "inline" collection of "missing" counts is something like what you had in mind with the comment "{{TODO: it would be more efficient to build up a missing DocSet if we need it here anyway.}}"? # Cache key compatibility across blockJoin domain changes. The extant "nested facet" implementation only passes the {{base}} DocSet domain down from parent to child. One of the things this PR had to do was to also track corresponding changes to the {{baseFilters}} – the queries used to generate the {{base}} DocSet domain – because these queries are required for use in facet cache keys. The initial PR punted on the question of blockJoin domain changes, and simply set {{baseFilters = null}}, with a comment in code: "{{unusual case; TODO: can we make a cache key for this base domain?}}". Well I meant "unusual _for me, at the moment_" :); I just had to put the effort into building proper ({{baseFilter}} query) cache keys for these domain changes. In the process, I also realized that tracking {{baseFilters}} down the nested facet tree should probably address "{{TODO: somehow remove responsebuilder dependency}}" – I put a {{nocommit}} comment to that effect (and temporarily throw an {{AssertionError}} to highlight what I think can now be dead code following). I also found myself wondering how exclusion of ancestor tagged filters would affect descendent join/graph/blockjoin domain changes ... but that's a separate issue. 
> Caching for term facet counts > - > > Key: SOLR-13807 > URL: https://issues.apache.org/jira/browse/SOLR-13807 > Project: Solr > Issue Type: New Feature > Components: Facet Module >Affects Versions: master (9.0), 8.2 >Reporter: Michael Gibney >Priority: Minor > Attachments: SOLR-13807__SOLR-13132_test_stub.patch > > > Solr does not have a facet count cache; so for _every_ request, term facets > are recalculated for _every_ (facet) field, by iterating over _every_ field > value for _every_ doc in the result domain, and incrementing the associated > count. > As a result, subsequent requests end up redoing a lot of the same work, > including all associated object allocation, GC, etc. This situation could > benefit from integrated caching. > Because of the domain-based, serial/iterative nature of term facet > calculation, latency is proportional to the size of the result domain. > Consequently, one common/clear manifestation of this issue is
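The cache-key requirements described above — counts are reusable only when both the facet field and the queries defining the base domain match, and DV-built entries additionally carry a "missing" count — can be sketched with a small standalone cache. All names here (`FacetCountCache`, `Key`) are illustrative, not Solr's internal API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: term counts cached per facet field *and* per the
// filter queries that produced the base domain, so a blockJoin or other
// domain change yields a distinct key instead of a stale hit.
class FacetCountCache {
  static final class Key {
    final String field;
    final List<String> baseFilters;   // queries defining the base DocSet domain
    final boolean includesMissing;    // DV-built entries also carry "missing"
    Key(String field, List<String> baseFilters, boolean includesMissing) {
      this.field = field;
      this.baseFilters = baseFilters;
      this.includesMissing = includesMissing;
    }
    @Override public boolean equals(Object o) {
      if (!(o instanceof Key)) return false;
      Key k = (Key) o;
      return field.equals(k.field) && baseFilters.equals(k.baseFilters)
          && includesMissing == k.includesMissing;
    }
    @Override public int hashCode() {
      return field.hashCode() * 31 + baseFilters.hashCode() + (includesMissing ? 1 : 0);
    }
  }

  private final Map<Key, int[]> cache = new HashMap<>();

  // compute on miss; later requests with the same field + domain reuse counts
  int[] getOrCompute(Key key, Supplier<int[]> compute) {
    return cache.computeIfAbsent(key, k -> compute.get());
  }
}
```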
[GitHub] [lucene-solr] aroopganguly opened a new pull request #1335: SOLR-14316 Remove unchecked type conversion warning in JavaBinCodec's readMapEntry's equals() method
aroopganguly opened a new pull request #1335: SOLR-14316 Remove unchecked type conversion warning in JavaBinCodec's readMapEntry's equals() method URL: https://github.com/apache/lucene-solr/pull/1335 # Description There was an unchecked type conversion warning in JavaBinCodec's readMapEntry's equals() method. # Solution Fixed the unchecked type conversion warning in JavaBinCodec's readMapEntry's equals() method. # Tests No new tests added, but the entire existing test suite succeeds with the warning now gone. # Checklist Please review the following and check all that apply: - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [x] I have created a Jira issue and added the issue ID to my pull request title. - [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [x] I have developed this patch against the `master` branch. - [x] I have run `ant precommit` and the appropriate test suite. - [x] I have run `gradlew precommit` and the appropriate test suite. - [ ] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
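The kind of fix the PR describes can be sketched in isolation: an `equals()` implementation that replaces an unchecked cast to a parameterized `Map.Entry<K,V>` with an `instanceof` check and wildcard types, which compiles without warnings. This is a hypothetical standalone analogue, not the actual JavaBinCodec code.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical read-only entry: equals() uses an instanceof check and a
// wildcard cast (Map.Entry<?, ?>) instead of an unchecked cast, then
// compares key and value per the Map.Entry contract.
class ReadOnlyEntry implements Map.Entry<Object, Object> {
  private final Object key;
  private final Object value;

  ReadOnlyEntry(Object key, Object value) {
    this.key = key;
    this.value = value;
  }

  public Object getKey() { return key; }
  public Object getValue() { return value; }
  public Object setValue(Object v) { throw new UnsupportedOperationException(); }

  @Override
  public boolean equals(Object obj) {
    if (!(obj instanceof Map.Entry)) return false;
    Map.Entry<?, ?> other = (Map.Entry<?, ?>) obj;  // wildcard cast: no warning
    return Objects.equals(key, other.getKey())
        && Objects.equals(value, other.getValue());
  }

  @Override
  public int hashCode() {
    // matches the Map.Entry contract: key hash XOR value hash
    return Objects.hashCode(key) ^ Objects.hashCode(value);
  }
}
```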
[jira] [Comment Edited] (LUCENE-8103) QueryValueSource should use TwoPhaseIterator
[ https://issues.apache.org/jira/browse/LUCENE-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056067#comment-17056067 ] Michele Palmia edited comment on LUCENE-8103 at 3/10/20, 3:37 PM: -- Thanks a lot - I had not fully grasped the approximation mechanism and the {{TPI.asDocIdSetIterator(tpi)}} implementation. I uploaded an updated patch. was (Author: micpalmia): Thanks a lot - I had not grasped the approximation mechanism and the {{TPI.asDocIdSetIterator(tpi)}} implementation. I uploaded an updated patch. > QueryValueSource should use TwoPhaseIterator > > > Key: LUCENE-8103 > URL: https://issues.apache.org/jira/browse/LUCENE-8103 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/other >Reporter: David Smiley >Priority: Minor > Attachments: LUCENE-8103.patch > > > QueryValueSource (in "queries" module) is a ValueSource representation of a > Query; the score is the value. It ought to try to use a TwoPhaseIterator > from the query if it can be offered. This will prevent possibly expensive > advancing beyond documents that we aren't interested in.
[jira] [Commented] (LUCENE-8103) QueryValueSource should use TwoPhaseIterator
[ https://issues.apache.org/jira/browse/LUCENE-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056067#comment-17056067 ] Michele Palmia commented on LUCENE-8103: Thanks a lot - I had not grasped the approximation mechanism and the {{TPI.asDocIdSetIterator(tpi)}} implementation. I uploaded an updated patch. > QueryValueSource should use TwoPhaseIterator > > > Key: LUCENE-8103 > URL: https://issues.apache.org/jira/browse/LUCENE-8103 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/other >Reporter: David Smiley >Priority: Minor > Attachments: LUCENE-8103.patch > > > QueryValueSource (in "queries" module) is a ValueSource representation of a > Query; the score is the value. It ought to try to use a TwoPhaseIterator > from the query if it can be offered. This will prevent possibly expensive > advancing beyond documents that we aren't interested in.
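The approximation mechanism mentioned in this thread can be sketched outside Lucene: a two-phase iterator exposes a cheap approximation (a superset of matching docs) and runs the expensive `matches()` confirmation only for candidates the caller actually inspects, so advancing past uninteresting docs skips the expensive check. The class below is a hypothetical standalone analogue, not Lucene's `TwoPhaseIterator` API.

```java
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical two-phase sketch: `approximation` is a cheap ascending list of
// candidate doc ids; `matches` is the expensive per-doc confirmation.
class TwoPhaseSketch {
  final List<Integer> approximation;
  final IntPredicate matches;

  TwoPhaseSketch(List<Integer> approximation, IntPredicate matches) {
    this.approximation = approximation;
    this.matches = matches;
  }

  // Advance to the first confirmed match at or beyond target. Candidates
  // before target are skipped without ever running the expensive check.
  int advanceConfirmed(int target) {
    for (int doc : approximation) {
      if (doc >= target && matches.test(doc)) {
        return doc;
      }
    }
    return Integer.MAX_VALUE; // exhausted, like DocIdSetIterator.NO_MORE_DOCS
  }
}
```

A value source backed by a plain iterator would instead confirm every candidate it crosses, which is the cost the issue aims to avoid.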
[jira] [Updated] (LUCENE-8103) QueryValueSource should use TwoPhaseIterator
[ https://issues.apache.org/jira/browse/LUCENE-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michele Palmia updated LUCENE-8103: --- Attachment: (was: LUCENE-8103.patch)
[jira] [Updated] (LUCENE-8103) QueryValueSource should use TwoPhaseIterator
[ https://issues.apache.org/jira/browse/LUCENE-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michele Palmia updated LUCENE-8103: --- Attachment: LUCENE-8103.patch
[GitHub] [lucene-solr] atris commented on issue #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches
atris commented on issue #1294: LUCENE-9074: Slice Allocation Control Plane For Concurrent Searches URL: https://github.com/apache/lucene-solr/pull/1294#issuecomment-597139106 @jpountz Any thoughts on this one?
[jira] [Commented] (LUCENE-9266) ant nightly-smoke fails due to presence of build.gradle
[ https://issues.apache.org/jira/browse/LUCENE-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056023#comment-17056023 ] Mike Drob commented on LUCENE-9266: --- Gradle itself is master only, but there are other failures on 8x nightly currently ([https://builds.apache.org/job/Lucene-Solr-SmokeRelease-8.x/372/]) that I'm sure I will discover on master once I get through whatever else is failing. > ant nightly-smoke fails due to presence of build.gradle > --- > > Key: LUCENE-9266 > URL: https://issues.apache.org/jira/browse/LUCENE-9266 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Mike Drob >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Seen on Jenkins - > [https://builds.apache.org/job/Lucene-Solr-SmokeRelease-master/1617/console] > > Reproduced locally.
[jira] [Commented] (LUCENE-9236) Having a modular Doc Values format
[ https://issues.apache.org/jira/browse/LUCENE-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056013#comment-17056013 ] juan camilo rodriguez duran commented on LUCENE-9236: - [~jpountz] This is step 2 of the Jira issue; I want to know what you think about step one: only splitting the big classes and then making the reader and writer parts more symmetric. > Having a modular Doc Values format > -- > > Key: LUCENE-9236 > URL: https://issues.apache.org/jira/browse/LUCENE-9236 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: juan camilo rodriguez duran >Priority: Minor > Labels: docValues > > Today, DocValues Consumer/Producer require overriding 5 different methods, even > if you only want to use one, given that a field can only support > one doc values type at a time. > > In the attached PR I’ve implemented a new modular version of those classes > (consumer/producer), each one having a single responsibility and writing to > the same unique file. > This is mainly a refactor of the existing format, opening the possibility to > override or implement the sub-format you need. > > I’ll do it in 3 steps: > # Create a CompositeDocValuesFormat, moving the code of > Lucene80DocValuesFormat into separate classes without modifying the inner > code. At the same time I created a Lucene85CompositeDocValuesFormat based on > these changes. > # I’ll introduce some basic components for writing doc values in general, > such as: > ## DocumentIdSetIterator Serializer: used in each type of field, based on an > IndexedDISI. > ## Document Ordinals Serializer: used in Sorted and SortedSet to > deduplicate values using a dictionary. > ## Document Boundaries Serializer (optional, used only for multivalued > fields: SortedNumeric and SortedSet) > ## TermsEnum Serializer: used to write and read the terms dictionary for > Sorted and SortedSet doc values. 
> # I’ll create the new Sub-DocValues format using the previous components. > > PR: [https://github.com/apache/lucene-solr/pull/1282]
[jira] [Comment Edited] (LUCENE-8674) UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal
[ https://issues.apache.org/jira/browse/LUCENE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055990#comment-17055990 ] Michele Palmia edited comment on LUCENE-8674 at 3/10/20, 2:20 PM: -- The problematic query ( {{?fq=\{!frange l=10 u=100}or_version_s,directed_by}} ) specifies two value sources separated by a comma ({{or_version_s,directed_by}}). These are parsed as a {{VectorValueSource}} embedding the two individual ValueSources corresponding to the two fields (see [FunctionQParser.java:115|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/FunctionQParser.java#L115]). was (Author: micpalmia): The problematic query ( {{?fq={!frange%20l=10%20u=100}or_version_s,directed_by}} ) specifies two value sources separated by a comma ({{or_version_s,directed_by}}). These are parsed as a {{VectorValueSource}} embedding the two individual ValueSources corresponding to the two fields (see [FunctionQParser.java:115|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/FunctionQParser.java#L115]). > UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal > -- > > Key: LUCENE-8674 > URL: https://issues.apache.org/jira/browse/LUCENE-8674 > Project: Lucene - Core > Issue Type: Bug > Components: core/query/scoring >Affects Versions: master (9.0) > Environment: h1. Steps to reproduce > * Use a Linux machine. > * Build commit {{ea2c8ba}} of Solr as described in the section below. > * Build the films collection as described below. > * Start the server using the command {{./bin/solr start -f -p 8983 -s > /tmp/home}} > * Request the URL given in the bug description. > h1. Compiling the server > {noformat} > git clone https://github.com/apache/lucene-solr > cd lucene-solr > git checkout ea2c8ba > ant compile > cd solr > ant server > {noformat} > h1. 
Building the collection and reproducing the bug > We followed [Exercise > 2|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-2] from > the [Solr > Tutorial|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html]. > {noformat} > mkdir -p /tmp/home > echo '<solr></solr>' > > /tmp/home/solr.xml > {noformat} > In one terminal start a Solr instance in foreground: > {noformat} > ./bin/solr start -f -p 8983 -s /tmp/home > {noformat} > In another terminal, create a collection of movies, with no shards and no > replication, and initialize it: > {noformat} > bin/solr create -c films > curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": > {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' > http://localhost:8983/solr/films/schema > curl -X POST -H 'Content-type:application/json' --data-binary > '{"add-copy-field" : {"source":"*","dest":"_text_"}}' > http://localhost:8983/solr/films/schema > ./bin/post -c films example/films/films.json > curl -v “URL_BUG” > {noformat} > Please check the issue description below to find the “URL_BUG” that will > allow you to reproduce the issue reported. 
>Reporter: Johannes Kloos >Priority: Minor > Labels: diffblue, newdev > > Requesting the following URL causes Solr to return an HTTP 500 error response: > {noformat} > http://localhost:8983/solr/films/select?fq={!frange%20l=10%20u=100}or_version_s,directed_by > {noformat} > The error response seems to be caused by the following uncaught exception: > {noformat} > java.lang.UnsupportedOperationException > at > org.apache.lucene.queries.function.FunctionValues.floatVal(FunctionValues.java:47) > at > org.apache.lucene.queries.function.FunctionValues$3.matches(FunctionValues.java:188) > at > org.apache.lucene.queries.function.ValueSourceScorer$1.matches(ValueSourceScorer.java:53) > at > org.apache.lucene.search.TwoPhaseIterator$TwoPhaseIteratorAsDocIdSetIterator.doNext(TwoPhaseIterator.java:89) > at > org.apache.lucene.search.TwoPhaseIterator$TwoPhaseIteratorAsDocIdSetIterator.nextDoc(TwoPhaseIterator.java:77) > at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:261) > at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:214) > at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39) > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:652) > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:443) > at org.apache.solr.search.DocSetUtil.createDocSetGeneric(DocSetUtil.java:151) > at org.apache.solr.search.DocSetUtil.createDocSet(DocSetUtil.java:140) > at > org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1177) > at > org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:817) > at > org.apach
[jira] [Commented] (LUCENE-8674) UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal
[ https://issues.apache.org/jira/browse/LUCENE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055990#comment-17055990 ] Michele Palmia commented on LUCENE-8674: The problematic query ( {{?fq={!frange%20l=10%20u=100}or_version_s,directed_by}} ) specifies two value sources separated by a comma ({{or_version_s,directed_by}}). These are parsed as a {{VectorValueSource}} embedding the two individual ValueSources corresponding to the two fields (see [FunctionQParser.java:115|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/FunctionQParser.java#L115]). > UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal > -- > > Key: LUCENE-8674 > URL: https://issues.apache.org/jira/browse/LUCENE-8674 > Project: Lucene - Core > Issue Type: Bug > Components: core/query/scoring >Affects Versions: master (9.0) > Environment: h1. Steps to reproduce > * Use a Linux machine. > * Build commit {{ea2c8ba}} of Solr as described in the section below. > * Build the films collection as described below. > * Start the server using the command {{./bin/solr start -f -p 8983 -s > /tmp/home}} > * Request the URL given in the bug description. > h1. Compiling the server > {noformat} > git clone https://github.com/apache/lucene-solr > cd lucene-solr > git checkout ea2c8ba > ant compile > cd solr > ant server > {noformat} > h1. Building the collection and reproducing the bug > We followed [Exercise > 2|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-2] from > the [Solr > Tutorial|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html]. 
> {noformat} > mkdir -p /tmp/home > echo '' > > /tmp/home/solr.xml > {noformat} > In one terminal start a Solr instance in foreground: > {noformat} > ./bin/solr start -f -p 8983 -s /tmp/home > {noformat} > In another terminal, create a collection of movies, with no shards and no > replication, and initialize it: > {noformat} > bin/solr create -c films > curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": > {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' > http://localhost:8983/solr/films/schema > curl -X POST -H 'Content-type:application/json' --data-binary > '{"add-copy-field" : {"source":"*","dest":"_text_"}}' > http://localhost:8983/solr/films/schema > ./bin/post -c films example/films/films.json > curl -v “URL_BUG” > {noformat} > Please check the issue description below to find the “URL_BUG” that will > allow you to reproduce the issue reported. >Reporter: Johannes Kloos >Priority: Minor > Labels: diffblue, newdev > > Requesting the following URL causes Solr to return an HTTP 500 error response: > {noformat} > http://localhost:8983/solr/films/select?fq={!frange%20l=10%20u=100}or_version_s,directed_by > {noformat} > The error response seems to be caused by the following uncaught exception: > {noformat} > java.lang.UnsupportedOperationException > at > org.apache.lucene.queries.function.FunctionValues.floatVal(FunctionValues.java:47) > at > org.apache.lucene.queries.function.FunctionValues$3.matches(FunctionValues.java:188) > at > org.apache.lucene.queries.function.ValueSourceScorer$1.matches(ValueSourceScorer.java:53) > at > org.apache.lucene.search.TwoPhaseIterator$TwoPhaseIteratorAsDocIdSetIterator.doNext(TwoPhaseIterator.java:89) > at > org.apache.lucene.search.TwoPhaseIterator$TwoPhaseIteratorAsDocIdSetIterator.nextDoc(TwoPhaseIterator.java:77) > at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:261) > at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:214) > at 
org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39) > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:652) > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:443) > at org.apache.solr.search.DocSetUtil.createDocSetGeneric(DocSetUtil.java:151) > at org.apache.solr.search.DocSetUtil.createDocSet(DocSetUtil.java:140) > at > org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1177) > at > org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:817) > at > org.apache.solr.search.SolrIndexSearcher.getProcessedFilter(SolrIndexSearcher.java:1025) > at > org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1540) > at > org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1420) > at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:567) > at > org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1434) > {noformat} > Sadly, I can't understand the logic of this code
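The failure mode can be sketched without Lucene on the classpath. The classes below are illustrative stand-ins, not the real Lucene classes: the abstract base's {{floatVal}} throws by default, and a vector-of-sources value source never overrides it (it has no single scalar per document), so anything that asks it for one float — as frange does — hits {{UnsupportedOperationException}}.

```java
// Illustrative sketch (NOT the real Lucene classes) of why frange over
// "or_version_s,directed_by" fails: the base implementation throws, and a
// vector source has no scalar value to return, so it never overrides floatVal.
abstract class FunctionValuesSketch {
    float floatVal(int doc) {
        throw new UnsupportedOperationException();
    }
}

class SingleFieldValues extends FunctionValuesSketch {
    @Override
    float floatVal(int doc) {
        return 42.0f; // a single field can yield one float per document
    }
}

class VectorValues extends FunctionValuesSketch {
    // Intentionally no floatVal override: a pair of fields has no scalar value,
    // so the inherited throwing implementation is what callers get.
    float[] vectorVal(int doc) {
        return new float[] {1.0f, 2.0f};
    }
}
```

In this model, frange's per-document match check corresponds to calling {{floatVal}}, which works for a single-field source and throws for the vector one.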
[jira] [Created] (SOLR-14318) Missing dependency on commons-lang in solr-cell 8.4.1
Markus Günther created SOLR-14318: - Summary: Missing dependency on commons-lang in solr-cell 8.4.1 Key: SOLR-14318 URL: https://issues.apache.org/jira/browse/SOLR-14318 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: contrib - Solr Cell (Tika extraction) Affects Versions: 8.4.1 Reporter: Markus Günther During a migration from Solr 7.x to Solr 8.4.1 we noticed that the commons-lang:commons-lang:2.6 dependency has been removed and thus is no longer part of org.apache.solr:solr-cell. solr-cell, however, comes bundled with Apache Tika Parsers (org.apache.tika:tika-parsers) in version 1.19.1, which - although it is not an explicit dependency - does require commons-lang:commons-lang:2.6. This raises an issue when trying to extract the content from Microsoft Access database files using Tika. See the stacktrace below. {code:java} java.lang.NoClassDefFoundError: org/apache/commons/lang/ObjectUtils at com.healthmarketscience.jackcess.util.SimpleColumnMatcher.equals(SimpleColumnMatcher.java:74) at com.healthmarketscience.jackcess.util.SimpleColumnMatcher.matches(SimpleColumnMatcher.java:46) at com.healthmarketscience.jackcess.util.CaseInsensitiveColumnMatcher.matches(CaseInsensitiveColumnMatcher.java:49) at com.healthmarketscience.jackcess.impl.CursorImpl.currentRowMatchesImpl(CursorImpl.java:571) at com.healthmarketscience.jackcess.impl.CursorImpl.findAnotherRowImpl(CursorImpl.java:627) at com.healthmarketscience.jackcess.impl.CursorImpl.findAnotherRow(CursorImpl.java:517) at com.healthmarketscience.jackcess.impl.CursorImpl.findFirstRow(CursorImpl.java:494) at com.healthmarketscience.jackcess.impl.DatabaseImpl$FallbackTableFinder.findRow(DatabaseImpl.java:2376) at com.healthmarketscience.jackcess.impl.DatabaseImpl$TableFinder.findObjectId(DatabaseImpl.java:2176) at 
com.healthmarketscience.jackcess.impl.DatabaseImpl.readSystemCatalog(DatabaseImpl.java:879) at com.healthmarketscience.jackcess.impl.DatabaseImpl.&lt;init&gt;(DatabaseImpl.java:534) at com.healthmarketscience.jackcess.impl.DatabaseImpl.open(DatabaseImpl.java:401) at com.healthmarketscience.jackcess.DatabaseBuilder.open(DatabaseBuilder.java:252) at org.apache.tika.parser.microsoft.JackcessParser.parse(JackcessParser.java:94) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72) at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102) at org.apache.tika.parser.pkg.PackageParser.parseEntry(PackageParser.java:350) at org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:287) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596) at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602) at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.ja
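A workaround until the packaging is fixed is to declare the missing artifact explicitly in the consuming build. This assumes a Maven build; the coordinates and version are the ones the report names as what Tika 1.19.1 expects.

```xml
<!-- Work around the missing transitive dependency by declaring it directly
     next to solr-cell. Version 2.6 is the one required per the report. -->
<dependency>
  <groupId>commons-lang</groupId>
  <artifactId>commons-lang</artifactId>
  <version>2.6</version>
</dependency>
```

Gradle users would add the equivalent `commons-lang:commons-lang:2.6` coordinate to their runtime dependencies.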
[jira] [Commented] (SOLR-8306) Enhance ExpandComponent to allow expand.hits=0
[ https://issues.apache.org/jira/browse/SOLR-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055977#comment-17055977 ] Amelia Henderson commented on SOLR-8306: Added a Github PR that includes Marshall's work and some tests. > Enhance ExpandComponent to allow expand.hits=0 > -- > > Key: SOLR-8306 > URL: https://issues.apache.org/jira/browse/SOLR-8306 > Project: Solr > Issue Type: Improvement >Affects Versions: 5.3.1 >Reporter: Marshall Sanders >Priority: Minor > Labels: expand > Fix For: 5.5 > > Attachments: SOLR-8306.patch, SOLR-8306.patch, > SOLR-8306_branch_5x@1715230.patch > > Time Spent: 10m > Remaining Estimate: 0h > > This enhancement allows the ExpandComponent to allow expand.hits=0 for those > who don't want an expanded document returned and only want the numFound from > the expand section. > This is useful for "See 54 more like this" use cases, but without the > performance hit of gathering an entire expanded document. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] ameliahenderson opened a new pull request #1334: SOLR-8306: Enhance ExpandComponent to allow expand.hits=0
ameliahenderson opened a new pull request #1334: SOLR-8306: Enhance ExpandComponent to allow expand.hits=0 URL: https://github.com/apache/lucene-solr/pull/1334 # Description Please provide a short description of the changes you're making with this pull request. # Solution Please provide a short description of the approach taken to implement your solution. # Tests Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem. # Checklist Please review the following and check all that apply: - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [x] I have created a Jira issue and added the issue ID to my pull request title. - [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [x] I have developed this patch against the `master` branch. - [x] I have run `ant precommit` and the appropriate test suite. - [x] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9269) Blended queries with boolean rewrite can result in inconsistent scores
[ https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055927#comment-17055927 ] Michele Palmia edited comment on LUCENE-9269 at 3/10/20, 1:15 PM: -- I was actually just looking at a [user report|https://mail-archives.apache.org/mod_mbox/lucene-dev/202003.mbox/%3CCALyzSEn%2BQFoT3MpNYkxw-dEK9jc59mSTvXqccuUVMMDAgOMMmA%40mail.gmail.com%3E] that came to lucene-dev and looked interesting. In their use case, they were using fuzzy queries, that in turn generate blended queries that are affected by this issue. Maybe users of BlendedQuery/FuzzyQuery should be able to find some form of warning in the docs (while [LUCENE-8840|https://issues.apache.org/jira/browse/LUCENE-8840] is not fixed)? was (Author: micpalmia): I was actually just looking at a [user report|https://mail-archives.apache.org/mod_mbox/lucene-dev/202003.mbox/%3CCALyzSEn%2BQFoT3MpNYkxw-dEK9jc59mSTvXqccuUVMMDAgOMMmA%40mail.gmail.com%3E] that came to lucene-dev and looked interesting. In their use case, they were using fuzzy queries, that in turn generate blended queries that are affected by this issue. Maybe users of BlendedQuery/FuzzyQuery should be able to find some form of warning in the docs? 
> Blended queries with boolean rewrite can result in inconstitent scores > -- > > Key: LUCENE-9269 > URL: https://issues.apache.org/jira/browse/LUCENE-9269 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 8.4 >Reporter: Michele Palmia >Priority: Minor > Attachments: LUCENE-9269-test.patch > > > If two blended queries are should clauses of a boolean query and are built so > that > * some of their terms are the same > * their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE > the docFreq for the overlapping terms used for scoring is picked as follow: > # if the overlapping terms are not boosted, the df of the term in the first > blended query is used > # if any of the overlapping terms is boosted, the df is picked at (what > looks like) random. > A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3). > {code:java} > a) > Blended(f:a f:b) Blended (f:a) > df: 3 df: 2 > gets rewritten to: > (f:a)^2.0 (f:b) > df: 3 df:2 > b) > Blended(f:a) Blended(f:a f:b) > df: 2df: 3 > gets rewritten to: > (f:a)^2.0 (f:b) > df: 2 df:2 > c) > Blended(f:a f:b^0.66) Blended (f:a^0.75) > df: 3 df: 2 > gets rewritten to: > (f:a)^1.75 (f:b)^0.66 > df:? df:2 > {code} > with ? either 2 or 3, depending on the run. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
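The order dependence in cases a) and b) above can be modeled without Lucene: if deduplication keeps the stats of the first occurrence of a term, the surviving df depends purely on which blended query contributed the term first. The class below is an illustrative model of that keep-first behavior, not the actual BlendedTermQuery/BooleanQuery code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative dedup model (NOT the actual BlendedTermQuery code): when two
// clauses carry the same term with different df stats, keep-first semantics
// make the surviving df depend on clause order, matching examples a) and b).
// When clause order is itself arbitrary (e.g. served from a hash-based set,
// as with boosted overlapping terms in case c), the df becomes arbitrary too.
class TermStats {
    static Map<String, Integer> dedupeKeepFirst(String[] terms, int[] dfs) {
        Map<String, Integer> kept = new LinkedHashMap<>();
        for (int i = 0; i < terms.length; i++) {
            kept.putIfAbsent(terms[i], dfs[i]); // first occurrence wins
        }
        return kept;
    }
}
```

With ordered input this is deterministic, which is why cases a) and b) give stable but different df values for f:a; the randomness in case c) only appears once the clause iteration order stops being stable.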
[jira] [Resolved] (SOLR-14139) Support backtick phrase queries in Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-14139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein resolved SOLR-14139. --- Fix Version/s: 8.5 Resolution: Resolved > Support backtick phrase queries in Streaming Expressions > > > Key: SOLR-14139 > URL: https://issues.apache.org/jira/browse/SOLR-14139 > Project: Solr > Issue Type: Improvement >Reporter: Joel Bernstein >Priority: Major > Fix For: 8.5 > > Attachments: SOLR-14139.patch, SOLR-14139.patch > > > Currently in order to make phrase queries in Streaming Expressions you must > escape the quotes as follows: > {code:java} > search(collection1, q="fieldA:\"hello world\" AND fieldB:two"){code} > This ticket will allow phrase queries to be entered with back ticks as > follows: > {code:java} > search(collection1, q="fieldA:`hello world` AND fieldB:two") {code} > Back ticks are nice because they are infrequently searched on and people in > the SQL world are used to back ticks meaning "take the literal value of this > string". > Under the covers back ticks will be translated to double quotes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
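The described translation ("back ticks will be translated to double quotes") is, at its core, a character substitution applied to the expression's query string before it reaches the query parser. The helper below is an illustrative sketch of that behavior, not the actual Streaming Expressions parser code.

```java
// Illustrative sketch (not the actual Solr parser code): backtick-delimited
// phrases in a streaming expression's q parameter are rewritten to
// double-quoted phrases before the string is handed to the query parser.
class BacktickPhrases {
    static String translate(String q) {
        return q.replace('`', '"');
    }
}
```

So `search(collection1, q="fieldA:`hello world` AND fieldB:two")` would carry the same query to the parser as the escaped-quote form shown above.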
[jira] [Commented] (SOLR-14139) Support backtick phrase queries in Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-14139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055939#comment-17055939 ] ASF subversion and git services commented on SOLR-14139: Commit d3c2afec4fbf9501279034e8d6aca4c5af797616 in lucene-solr's branch refs/heads/branch_8_5 from Joel Bernstein [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d3c2afe ] SOLR-14139: Update CHANGE.txt > Support backtick phrase queries in Streaming Expressions > > > Key: SOLR-14139 > URL: https://issues.apache.org/jira/browse/SOLR-14139 > Project: Solr > Issue Type: Improvement >Reporter: Joel Bernstein >Priority: Major > Attachments: SOLR-14139.patch, SOLR-14139.patch > > > Currently in order to make phrase queries in Streaming Expressions you must > escape the quotes as follows: > {code:java} > search(collection1, q="fieldA:\"hello world\" AND fieldB:two"){code} > This ticket will allow phrase queries to be entered with back ticks as > follows: > {code:java} > search(collection1, q="fieldA:`hello world` AND fieldB:two") {code} > Back ticks are nice because they are infrequently searched on and people in > the SQL world are used to back ticks meaning "take the literal value of this > string". > Under the covers back ticks will be translated to double quotes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14139) Support backtick phrase queries in Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-14139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055938#comment-17055938 ] ASF subversion and git services commented on SOLR-14139: Commit c179ab66e4facd9d342c33c6fda021f27165941a in lucene-solr's branch refs/heads/branch_8x from Joel Bernstein [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c179ab6 ] SOLR-14139: Update CHANGE.txt > Support backtick phrase queries in Streaming Expressions > > > Key: SOLR-14139 > URL: https://issues.apache.org/jira/browse/SOLR-14139 > Project: Solr > Issue Type: Improvement >Reporter: Joel Bernstein >Priority: Major > Attachments: SOLR-14139.patch, SOLR-14139.patch > > > Currently in order to make phrase queries in Streaming Expressions you must > escape the quotes as follows: > {code:java} > search(collection1, q="fieldA:\"hello world\" AND fieldB:two"){code} > This ticket will allow phrase queries to be entered with back ticks as > follows: > {code:java} > search(collection1, q="fieldA:`hello world` AND fieldB:two") {code} > Back ticks are nice because they are infrequently searched on and people in > the SQL world are used to back ticks meaning "take the literal value of this > string". > Under the covers back ticks will be translated to double quotes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14139) Support backtick phrase queries in Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-14139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055936#comment-17055936 ] ASF subversion and git services commented on SOLR-14139: Commit 193e4a64234b2f76036d8f018a7478d61e5a0fab in lucene-solr's branch refs/heads/master from Joel Bernstein [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=193e4a6 ] SOLR-14139: Update CHANGE.txt > Support backtick phrase queries in Streaming Expressions > > > Key: SOLR-14139 > URL: https://issues.apache.org/jira/browse/SOLR-14139 > Project: Solr > Issue Type: Improvement >Reporter: Joel Bernstein >Priority: Major > Attachments: SOLR-14139.patch, SOLR-14139.patch > > > Currently in order to make phrase queries in Streaming Expressions you must > escape the quotes as follows: > {code:java} > search(collection1, q="fieldA:\"hello world\" AND fieldB:two"){code} > This ticket will allow phrase queries to be entered with back ticks as > follows: > {code:java} > search(collection1, q="fieldA:`hello world` AND fieldB:two") {code} > Back ticks are nice because they are infrequently searched on and people in > the SQL world are used to back ticks meaning "take the literal value of this > string". > Under the covers back ticks will be translated to double quotes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9269) Blended queries with boolean rewrite can result in inconsistent scores
[ https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055927#comment-17055927 ] Michele Palmia edited comment on LUCENE-9269 at 3/10/20, 1:07 PM: -- I was actually just looking at a [user report|https://mail-archives.apache.org/mod_mbox/lucene-dev/202003.mbox/%3CCALyzSEn%2BQFoT3MpNYkxw-dEK9jc59mSTvXqccuUVMMDAgOMMmA%40mail.gmail.com%3E] that came to lucene-dev and looked interesting. In their use case, they were using fuzzy queries, that in turn generate blended queries that are affected by this issue. Maybe users of BlendedQuery/FuzzyQuery should be able to find some form of warning in the docs? was (Author: micpalmia): I was actually just looking at a [user report|https://mail-archives.apache.org/mod_mbox/lucene-dev/202003.mbox/browser] that came to lucene-dev and looked interesting. In their use case, they were using fuzzy queries, that in turn generate blended queries that are affected by this issue. Maybe users of BlendedQuery/FuzzyQuery should be able to find some form of warning in the docs? > Blended queries with boolean rewrite can result in inconstitent scores > -- > > Key: LUCENE-9269 > URL: https://issues.apache.org/jira/browse/LUCENE-9269 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 8.4 >Reporter: Michele Palmia >Priority: Minor > Attachments: LUCENE-9269-test.patch > > > If two blended queries are should clauses of a boolean query and are built so > that > * some of their terms are the same > * their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE > the docFreq for the overlapping terms used for scoring is picked as follow: > # if the overlapping terms are not boosted, the df of the term in the first > blended query is used > # if any of the overlapping terms is boosted, the df is picked at (what > looks like) random. > A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3). 
> {code:java} > a) > Blended(f:a f:b) Blended (f:a) > df: 3 df: 2 > gets rewritten to: > (f:a)^2.0 (f:b) > df: 3 df:2 > b) > Blended(f:a) Blended(f:a f:b) > df: 2df: 3 > gets rewritten to: > (f:a)^2.0 (f:b) > df: 2 df:2 > c) > Blended(f:a f:b^0.66) Blended (f:a^0.75) > df: 3 df: 2 > gets rewritten to: > (f:a)^1.75 (f:b)^0.66 > df:? df:2 > {code} > with ? either 2 or 3, depending on the run. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9269) Blended queries with boolean rewrite can result in inconsistent scores
[ https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055927#comment-17055927 ] Michele Palmia edited comment on LUCENE-9269 at 3/10/20, 1:05 PM: -- I was actually just looking at a [user report|https://mail-archives.apache.org/mod_mbox/lucene-dev/202003.mbox/browser] that came to lucene-dev and looked interesting. In their use case, they were using fuzzy queries, that in turn generate blended queries that are affected by this issue. Maybe users of BlendedQuery/FuzzyQuery should be able to find some form of warning in the docs? was (Author: micpalmia): I was actually just looking at a [user report|[https://mail-archives.apache.org/mod_mbox/lucene-dev/202003.mbox/browser]] that came to lucene-dev and looked interesting. In their use case, they were using fuzzy queries, that in turn generate blended queries that are affected by this issue. Maybe users of BlendedQuery/FuzzyQuery should be able to find some form of warning in the docs? > Blended queries with boolean rewrite can result in inconstitent scores > -- > > Key: LUCENE-9269 > URL: https://issues.apache.org/jira/browse/LUCENE-9269 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 8.4 >Reporter: Michele Palmia >Priority: Minor > Attachments: LUCENE-9269-test.patch > > > If two blended queries are should clauses of a boolean query and are built so > that > * some of their terms are the same > * their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE > the docFreq for the overlapping terms used for scoring is picked as follow: > # if the overlapping terms are not boosted, the df of the term in the first > blended query is used > # if any of the overlapping terms is boosted, the df is picked at (what > looks like) random. > A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3). 
> {code:java} > a) > Blended(f:a f:b) Blended (f:a) > df: 3 df: 2 > gets rewritten to: > (f:a)^2.0 (f:b) > df: 3 df:2 > b) > Blended(f:a) Blended(f:a f:b) > df: 2df: 3 > gets rewritten to: > (f:a)^2.0 (f:b) > df: 2 df:2 > c) > Blended(f:a f:b^0.66) Blended (f:a^0.75) > df: 3 df: 2 > gets rewritten to: > (f:a)^1.75 (f:b)^0.66 > df:? df:2 > {code} > with ? either 2 or 3, depending on the run. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9269) Blended queries with boolean rewrite can result in inconsistent scores
[ https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055927#comment-17055927 ] Michele Palmia commented on LUCENE-9269: I was actually just looking at a [user report|[https://mail-archives.apache.org/mod_mbox/lucene-dev/202003.mbox/browser]] that came to lucene-dev and looked interesting. In their use case, they were using fuzzy queries, that in turn generate blended queries that are affected by this issue. Maybe users of BlendedQuery/FuzzyQuery should be able to find some form of warning in the docs? > Blended queries with boolean rewrite can result in inconstitent scores > -- > > Key: LUCENE-9269 > URL: https://issues.apache.org/jira/browse/LUCENE-9269 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 8.4 >Reporter: Michele Palmia >Priority: Minor > Attachments: LUCENE-9269-test.patch > > > If two blended queries are should clauses of a boolean query and are built so > that > * some of their terms are the same > * their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE > the docFreq for the overlapping terms used for scoring is picked as follow: > # if the overlapping terms are not boosted, the df of the term in the first > blended query is used > # if any of the overlapping terms is boosted, the df is picked at (what > looks like) random. > A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3). > {code:java} > a) > Blended(f:a f:b) Blended (f:a) > df: 3 df: 2 > gets rewritten to: > (f:a)^2.0 (f:b) > df: 3 df:2 > b) > Blended(f:a) Blended(f:a f:b) > df: 2df: 3 > gets rewritten to: > (f:a)^2.0 (f:b) > df: 2 df:2 > c) > Blended(f:a f:b^0.66) Blended (f:a^0.75) > df: 3 df: 2 > gets rewritten to: > (f:a)^1.75 (f:b)^0.66 > df:? df:2 > {code} > with ? either 2 or 3, depending on the run. 
[jira] [Commented] (LUCENE-9266) ant nightly-smoke fails due to presence of build.gradle
[ https://issues.apache.org/jira/browse/LUCENE-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055922#comment-17055922 ]

Dawid Weiss commented on LUCENE-9266:
-------------------------------------

I don't think the plan is to backport the gradle build to 8x - at least I don't plan to invest time in doing this (and it's hard for me to say how much work it'd be).

> ant nightly-smoke fails due to presence of build.gradle
> -------------------------------------------------------
>
>                 Key: LUCENE-9266
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9266
>             Project: Lucene - Core
>          Issue Type: Sub-task
>            Reporter: Mike Drob
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Seen on Jenkins - https://builds.apache.org/job/Lucene-Solr-SmokeRelease-master/1617/console
>
> Reproduced locally.
[jira] [Comment Edited] (LUCENE-9269) Blended queries with boolean rewrite can result in inconsistent scores
[ https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055891#comment-17055891 ]

Michele Palmia edited comment on LUCENE-9269 at 3/10/20, 12:57 PM:
-------------------------------------------------------------------

I added a very simple test (with my very limited Lucene testing skills) that emulates example c) above and checks the score of the top document. As there is no "right" score, I just check for one of the two possible scores and have the test fail on the other. I'm having a hard time wrapping my head around what the right behavior should be in this case (and thus coming up with a more sensible test and fix).

In case that's useful, I should probably add that the randomness in the scoring behavior is due to the HashMap underlying MultiSet: when should clauses are processed for deduplication, they're served in an arbitrary order (see [BooleanQuery.java:370|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java#L370]).
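The deduplication behavior Michele describes (clauses served in an arbitrary, hash-based order, with the first-served occurrence deciding which df is kept) can be modeled with a small, self-contained Java sketch. The Clause and DedupOrder names are hypothetical, not Lucene's actual classes:

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (not Lucene code): each clause carries a term, the docFreq its
// blended TermStates holds, and a boost. Deduplication keeps the first-served
// clause's df per term and sums the boosts. In Lucene, the serving order comes
// from a hash-based multiset, not from the order the clauses were added in,
// which is why the retained df can look random.
class Clause {
    final String term; final int df; final float boost;
    Clause(String term, int df, float boost) { this.term = term; this.df = df; this.boost = boost; }
}

public class DedupOrder {
    // Dedup clauses in whatever order the collection serves them.
    static Map<String, Clause> dedup(Collection<Clause> served) {
        Map<String, Clause> byTerm = new HashMap<>();
        for (Clause c : served) {
            Clause kept = byTerm.get(c.term);
            if (kept == null) {
                byTerm.put(c.term, c);                   // first occurrence wins the df
            } else {                                     // later duplicates only add boost
                byTerm.put(c.term, new Clause(c.term, kept.df, kept.boost + c.boost));
            }
        }
        return byTerm;
    }

    public static void main(String[] args) {
        // f:a appears with df 3 (from one blended query) and df 2 (from the other).
        Map<String, Clause> merged =
            dedup(List.of(new Clause("f:a", 3, 1.0f), new Clause("f:a", 2, 0.75f)));
        Clause a = merged.get("f:a");
        // The merged boost is always 1.75; the retained df is whichever
        // occurrence happened to be served first (here deterministically the df-3 one,
        // because a List preserves order; a hash-based collection would not).
        System.out.println("boost=" + a.boost + " df=" + a.df);
    }
}
```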
[jira] [Commented] (LUCENE-9236) Having a modular Doc Values format
[ https://issues.apache.org/jira/browse/LUCENE-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055907#comment-17055907 ]

Adrien Grand commented on LUCENE-9236:
--------------------------------------

[~juan.duran] In my opinion it introduces complexity because it introduces more abstractions: CompositeFieldMetadata, DocValuesConsumerSupplier, and so on.

> Having a modular Doc Values format
> ----------------------------------
>
>                 Key: LUCENE-9236
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9236
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: juan camilo rodriguez duran
>            Priority: Minor
>              Labels: docValues
>
> Today the DocValues Consumer/Producer require overriding 5 different methods, even if you only want to use one, and a given field can only support one doc values type at a time.
>
> In the attached PR I've implemented a new modular version of those classes (consumer/producer), each one having a single responsibility and writing to the same unique file.
> This is mainly a refactor of the existing format, opening the possibility to override or implement only the sub-format you need.
>
> I'll do it in 3 steps:
> # Create a CompositeDocValuesFormat and move the code of Lucene80DocValuesFormat into separate classes, without modifying the inner code. At the same time I created a Lucene85CompositeDocValuesFormat based on these changes.
> ## DocumentIdSetIterator Serializer: used in each type of field, based on an IndexedDISI.
> ## Document Ordinals Serializer: used in Sorted and SortedSet to deduplicate values using a dictionary.
> ## Document Boundaries Serializer (optional, used only for multivalued fields: SortedNumeric and SortedSet)
> ## TermsEnum Serializer: useful to write and read the terms dictionary for sorted and sorted set doc values.
> # I'll create the new sub-DocValues formats using the previous components.
> PR: [https://github.com/apache/lucene-solr/pull/1282]
[jira] [Commented] (LUCENE-9269) Blended queries with boolean rewrite can result in inconsistent scores
[ https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055901#comment-17055901 ]

Adrien Grand commented on LUCENE-9269:
--------------------------------------

We should remove BlendedTermQuery eventually. It tries to solve cross-field search and synonym search at the same time, which introduces complications... Since you seem to be using it for the synonym case, you can look at SynonymQuery, which can already deal with multiple synonyms that have different boosts. For cross-field search, we have a BM25FQuery, though I hope we'll find ways to make it easier to use in the future, e.g. by moving the scoring logic to Similarity.
[jira] [Commented] (LUCENE-9269) Blended queries with boolean rewrite can result in inconsistent scores
[ https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055891#comment-17055891 ]

Michele Palmia commented on LUCENE-9269:
----------------------------------------

I added a very simple test (with my very limited Lucene testing skills) that simply emulates example c) above and checks the score of the top document. As there is no "right" score, I just check for one of the two possible scores and have the test fail on the other. I'm having a hard time wrapping my head around what the right behavior should be in this case (and thus coming up with a more sensible test and fix).
[jira] [Updated] (LUCENE-9269) Blended queries with boolean rewrite can result in inconsistent scores
[ https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michele Palmia updated LUCENE-9269:
-----------------------------------
    Attachment: LUCENE-9269-test.patch
[jira] [Updated] (LUCENE-9269) Blended queries with boolean rewrite can result in inconsistent scores
[ https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michele Palmia updated LUCENE-9269:
-----------------------------------
    Description: 
If two blended queries are should clauses of a boolean query and are built so that
* some of their terms are the same
* their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE

the docFreq for the overlapping terms used for scoring is picked as follows:
* if the overlapping terms are not boosted, the df of the term in the first blended query is used
* if any of the overlapping terms is boosted, the df is picked at (what looks like) random.

A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3).
{code:java}
1.
Blended(f:a f:b) Blended (f:a)
df: 3 df: 2
gets rewritten to:
(f:a)^2.0 (f:b)
df: 3 df:2

Blended(f:a) Blended(f:a f:b)
df: 2 df: 3
gets rewritten to:
(f:a)^2.0 (f:b)
df: 2 df:2

Blended(f:a f:b^0.66) Blended (f:a^0.75)
df: 3 df: 2
gets rewritten to:
(f:a)^1.75 (f:b)^0.66
df:? df:2
{code}
with ? either 2 or 3, depending on the run.

  was:
If two blended queries are built so that
* some of their terms are the same
* their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE

the docFreq for the overlapping terms used for scoring is picked as follows:
* if the overlapping terms are not boosted, the df of the term in the first blended query is used
* if any of the overlapping terms is boosted, the df is picked at (what looks like) random.

A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3).
{code:java}
1.
Blended(f:a f:b) Blended (f:a)
df: 3 df: 2
gets rewritten to:
(f:a)^2.0 (f:b)
df: 3 df:2

Blended(f:a) Blended(f:a f:b)
df: 2 df: 3
gets rewritten to:
(f:a)^2.0 (f:b)
df: 2 df:2

Blended(f:a f:b^0.66) Blended (f:a^0.75)
df: 3 df: 2
gets rewritten to:
(f:a)^1.75 (f:b)^0.66
df:? df:2
{code}
with ? either 2 or 3, depending on the run.
[jira] [Updated] (LUCENE-9269) Blended queries with boolean rewrite can result in inconsistent scores
[ https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michele Palmia updated LUCENE-9269:
-----------------------------------
    Description: 
If two blended queries are should clauses of a boolean query and are built so that
* some of their terms are the same
* their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE

the docFreq for the overlapping terms used for scoring is picked as follows:
# if the overlapping terms are not boosted, the df of the term in the first blended query is used
# if any of the overlapping terms is boosted, the df is picked at (what looks like) random.

A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3).
{code:java}
a)
Blended(f:a f:b) Blended (f:a)
df: 3 df: 2
gets rewritten to:
(f:a)^2.0 (f:b)
df: 3 df:2

b)
Blended(f:a) Blended(f:a f:b)
df: 2 df: 3
gets rewritten to:
(f:a)^2.0 (f:b)
df: 2 df:2

c)
Blended(f:a f:b^0.66) Blended (f:a^0.75)
df: 3 df: 2
gets rewritten to:
(f:a)^1.75 (f:b)^0.66
df:? df:2
{code}
with ? either 2 or 3, depending on the run.

  was:
If two blended queries are should clauses of a boolean query and are built so that
* some of their terms are the same
* their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE

the docFreq for the overlapping terms used for scoring is picked as follows:
* if the overlapping terms are not boosted, the df of the term in the first blended query is used
* if any of the overlapping terms is boosted, the df is picked at (what looks like) random.

A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3).
{code:java}
1.
Blended(f:a f:b) Blended (f:a)
df: 3 df: 2
gets rewritten to:
(f:a)^2.0 (f:b)
df: 3 df:2

Blended(f:a) Blended(f:a f:b)
df: 2 df: 3
gets rewritten to:
(f:a)^2.0 (f:b)
df: 2 df:2

Blended(f:a f:b^0.66) Blended (f:a^0.75)
df: 3 df: 2
gets rewritten to:
(f:a)^1.75 (f:b)^0.66
df:? df:2
{code}
with ? either 2 or 3, depending on the run.
[jira] [Created] (LUCENE-9269) Blended queries with boolean rewrite can result in inconsistent scores
Michele Palmia created LUCENE-9269:
--------------------------------------

             Summary: Blended queries with boolean rewrite can result in inconsistent scores
                 Key: LUCENE-9269
                 URL: https://issues.apache.org/jira/browse/LUCENE-9269
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/search
    Affects Versions: 8.4
            Reporter: Michele Palmia

If two blended queries are built so that
* some of their terms are the same
* their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE

the docFreq for the overlapping terms used for scoring is picked as follows:
* if the overlapping terms are not boosted, the df of the term in the first blended query is used
* if any of the overlapping terms is boosted, the df is picked at (what looks like) random.

A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3).
{code:java}
1.
Blended(f:a f:b) Blended (f:a)
df: 3 df: 2
gets rewritten to:
(f:a)^2.0 (f:b)
df: 3 df:2

Blended(f:a) Blended(f:a f:b)
df: 2 df: 3
gets rewritten to:
(f:a)^2.0 (f:b)
df: 2 df:2

Blended(f:a f:b^0.66) Blended (f:a^0.75)
df: 3 df: 2
gets rewritten to:
(f:a)^1.75 (f:b)^0.66
df:? df:2
{code}
with ? either 2 or 3, depending on the run.
[jira] [Commented] (LUCENE-9263) Geo3D distance query computes the radius wrongly
[ https://issues.apache.org/jira/browse/LUCENE-9263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055745#comment-17055745 ]

ASF subversion and git services commented on LUCENE-9263:
---------------------------------------------------------

Commit f4737e5974d75decf14f8217b99176431dfa055c in lucene-solr's branch refs/heads/branch_8_5 from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f4737e5 ]

LUCENE-9263: Fix wrong transformation of distance in meters to radians in Geo3DPoint (#1318)

> Geo3D distance query computes the radius wrongly
> ------------------------------------------------
>
>                 Key: LUCENE-9263
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9263
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Ignacio Vera
>            Assignee: Ignacio Vera
>            Priority: Major
>             Fix For: 8.6
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This is a side effect of LUCENE-9150: the transformation of a radius in meters to radians is totally wrong, as it does not take into account the mean radius of the earth.
[jira] [Commented] (LUCENE-9266) ant nightly-smoke fails due to presence of build.gradle
[ https://issues.apache.org/jira/browse/LUCENE-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055714#comment-17055714 ]

Alan Woodward commented on LUCENE-9266:
---------------------------------------

I think gradle is master only at the moment?
[GitHub] [lucene-solr] dweiss commented on a change in pull request #1333: LUCENE-9266 Update smoke test for gradle
dweiss commented on a change in pull request #1333: LUCENE-9266 Update smoke test for gradle
URL: https://github.com/apache/lucene-solr/pull/1333#discussion_r390134272

## File path: lucene/common-build.xml
@@ -598,9 +599,13 @@ (diff hunk content not preserved in this archive)

Review comment:
Why is this needed? Is this related to gradle?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (SOLR-14317) HttpClusterStateProvider throws exception when only one node down
[ https://issues.apache.org/jira/browse/SOLR-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055662#comment-17055662 ]

Ishan Chattopadhyaya commented on SOLR-14317:
---------------------------------------------

Feel free to submit a patch, with tests if possible.

> HttpClusterStateProvider throws exception when only one node down
> -----------------------------------------------------------------
>
>                 Key: SOLR-14317
>                 URL: https://issues.apache.org/jira/browse/SOLR-14317
>             Project: Solr
>          Issue Type: Bug
>   Security Level: Public(Default Security Level. Issues are Public)
>          Components: SolrJ
>    Affects Versions: 7.7.1, 7.7.2
>            Reporter: Lyle
>            Assignee: Ishan Chattopadhyaya
>            Priority: Major
>
> When creating a CloudSolrClient with solrUrls, if the first url in the solrUrls list is invalid or its server is down, the client throws an exception directly rather than trying the remaining urls.
> In [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpClusterStateProvider.java#L65], if fetchLiveNodes(initialClient) hits any IOException, then in [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrClient.java#L648] the exception is caught and a SolrServerException is thrown to the upper caller, while no IOException is caught in HttpClusterStateProvider.fetchLiveNodes(HttpClusterStateProvider.java:200).
> The SolrServerException should be caught as well in [https://github.com/apache/lucene-solr/blob/branch_7_7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpClusterStateProvider.java#L69], so that if the first node provided in solrUrls is down, we can try the second to fetch live nodes.
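The fix suggested in the description amounts to a try-each-URL loop that swallows per-URL failures (including the wrapping SolrServerException) and only gives up once every configured URL has failed. A hypothetical, stdlib-only Java sketch; fetchFirstReachable and the fetch callback are illustrative, not the actual SolrJ API:

```java
import java.util.List;
import java.util.function.Function;

// Sketch of the proposed failover: try each configured Solr URL in turn and
// fall through to the next one when fetching live nodes fails, instead of
// letting the first failure escape to the caller.
public class LiveNodesFailover {
    static List<String> fetchFirstReachable(List<String> solrUrls,
                                            Function<String, List<String>> fetchLiveNodes) {
        Exception last = null;
        for (String url : solrUrls) {
            try {
                return fetchLiveNodes.apply(url);  // success: stop at the first reachable node
            } catch (RuntimeException e) {         // stands in for IOException / SolrServerException
                last = e;                          // remember the failure, keep trying
            }
        }
        throw new IllegalStateException("No live node could be reached from " + solrUrls, last);
    }

    public static void main(String[] args) {
        // The first URL fails, so the second is tried and its node list is returned.
        List<String> nodes = fetchFirstReachable(
            List.of("http://down:8983/solr", "http://up:8983/solr"),
            url -> {
                if (url.contains("down")) throw new RuntimeException("connection refused");
                return List.of("up:8983_solr");
            });
        System.out.println(nodes);
    }
}
```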
[jira] [Assigned] (SOLR-14317) HttpClusterStateProvider throws exception when only one node down
[ https://issues.apache.org/jira/browse/SOLR-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ishan Chattopadhyaya reassigned SOLR-14317:
-------------------------------------------
    Assignee: Ishan Chattopadhyaya
[jira] [Commented] (SOLR-14317) HttpClusterStateProvider throws exception when only one node down
[ https://issues.apache.org/jira/browse/SOLR-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055661#comment-17055661 ]

Ishan Chattopadhyaya commented on SOLR-14317:
---------------------------------------------

Thanks for reporting, I'll take a look.