[GitHub] [lucene-solr] muse-dev[bot] commented on a change in pull request #2097: LUCENE-9537

2021-01-11 Thread GitBox


muse-dev[bot] commented on a change in pull request #2097:
URL: https://github.com/apache/lucene-solr/pull/2097#discussion_r05138



##
File path: lucene/core/src/java/org/apache/lucene/search/IndriAndWeight.java
##
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+import org.apache.lucene.index.LeafReaderContext;
+
+/**
+ * The Weight for IndriAndQuery, used to normalize, score and explain these
+ * queries.
+ */
+public class IndriAndWeight extends Weight {
+
+  private final IndriAndQuery query;
+  private final ArrayList<Weight> weights;
+  private final ScoreMode scoreMode;
+  private final float boost;
+
+  public IndriAndWeight(IndriAndQuery query, IndexSearcher searcher,
+      ScoreMode scoreMode, float boost) throws IOException {
+    super(query);
+    this.query = query;
+    this.boost = boost;
+    this.scoreMode = scoreMode;
+    weights = new ArrayList<>();
+    for (BooleanClause c : query) {
+      Weight w = searcher.createWeight(c.getQuery(), scoreMode, 1.0f);
+      weights.add(w);
+    }
+  }
+
+  private Scorer getScorer(LeafReaderContext context) throws IOException {
+    List<Scorer> subScorers = new ArrayList<>();
+
+    for (Weight w : weights) {
+      Scorer scorer = w.scorer(context);
+      if (scorer != null) {
+        subScorers.add(scorer);
+      }
+    }
+
+    if (subScorers.isEmpty()) {
+      return null;
+    }
+    Scorer scorer = subScorers.get(0);
+    if (subScorers.size() > 1) {
+      scorer = new IndriAndScorer(this, subScorers, scoreMode, boost);
+    }
+    return scorer;
+  }
+
+  @Override
+  public Scorer scorer(LeafReaderContext context) throws IOException {
+    return getScorer(context);
+  }
+
+  @Override
+  public BulkScorer bulkScorer(LeafReaderContext context) throws IOException {
+    Scorer scorer = getScorer(context);
+    if (scorer != null) {
+      BulkScorer bulkScorer = new DefaultBulkScorer(scorer);
+      return bulkScorer;
+    }
+    return null;
+  }
+
+  @Override
+  public boolean isCacheable(LeafReaderContext ctx) {
+    for (Weight w : weights) {
+      if (w.isCacheable(ctx) == false) return false;
+    }
+    return true;
+  }
+
+  @Override
+  public Explanation explain(LeafReaderContext context, int doc)
+      throws IOException {
+    List<Explanation> subs = new ArrayList<>();
+    boolean fail = false;
+    Iterator<BooleanClause> cIter = query.iterator();
+    for (Iterator<Weight> wIter = weights.iterator(); wIter.hasNext();) {
+      Weight w = wIter.next();
+      BooleanClause c = cIter.next();
+      Explanation e = w.explain(context, doc);
+      if (e.isMatch()) {
+        subs.add(e);
+      } else if (c.isRequired()) {
+        subs.add(Explanation.noMatch(
+            "no match on required clause (" + c.getQuery().toString() + ")",
+            e));
+        fail = true;
+      }
+    }
+    if (fail) {
+      return Explanation.noMatch(
+          "Failure to meet condition(s) of required/prohibited clause(s)",
+          subs);
+    } else {
+      Scorer scorer = scorer(context);
+      int advanced = scorer.iterator().advance(doc);
Review comment:
   *NULLPTR_DEREFERENCE:*  accessing memory that is the null pointer on 
line 117 indirectly during the call to `IndriAndWeight.scorer(...)`.
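
   A minimal sketch of one possible guard, assuming the `explain` code from
the diff above (in this class, `scorer(context)` returns null when no clause
has a scorer for the segment):

       // Hypothetical fix sketch: guard the null return before dereferencing.
       Scorer scorer = scorer(context);
       if (scorer == null) {
         return Explanation.noMatch("no matching clauses in this segment");
       }
       int advanced = scorer.iterator().advance(doc);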








[jira] [Commented] (SOLR-15052) Reducing overseer bottlenecks using per-replica states

2021-01-11 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263063#comment-17263063
 ] 

Ishan Chattopadhyaya commented on SOLR-15052:
-

Thanks for bringing up performance testing, Mike. I'll do another round before 
the release, and if this issue causes any regression, we should back it out. It 
is refreshing to see someone emphasizing the performance of critical changes in 
non-optional code paths.

> Reducing overseer bottlenecks using per-replica states
> --
>
> Key: SOLR-15052
> URL: https://issues.apache.org/jira/browse/SOLR-15052
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Priority: Major
> Attachments: per-replica-states-gcp.pdf
>
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> This work has the same goal as SOLR-13951, that is to reduce overseer 
> bottlenecks by avoiding replica state updates from going to the state.json 
> via the overseer. However, the approach taken here is different from 
> SOLR-13951 and hence this work supercedes that work.
> The design proposed is here: 
> https://docs.google.com/document/d/1xdxpzUNmTZbk0vTMZqfen9R3ArdHokLITdiISBxCFUg/edit
> Briefly,
> # Every replica's state will be in a separate znode nested under the 
> state.json. It has the name that encodes the replica name, state, leadership 
> status.
> # An additional children watcher to be set on state.json for state changes.
> # Upon a state change, a ZK multi-op to delete the previous znode and add a 
> new znode with new state.
> Differences between this and SOLR-13951,
> # In SOLR-13951, we planned to leverage shard terms for per shard states.
> # As a consequence, the code changes required for SOLR-13951 were massive (we 
> needed a shard state provider abstraction and introduce it everywhere in the 
> codebase).
> # This approach is a drastically simpler change and design.
> Credits for this design and the PR is due to [~noble.paul]. 
> [~markrmil...@gmail.com], [~noble.paul] and I have collaborated on this 
> effort. The reference branch takes a conceptually similar (but not identical) 
> approach.
> I shall attach a PR and performance benchmarks shortly.






[GitHub] [lucene-solr] dsmiley commented on pull request #2196: SOLR-15071: Fix ArrayIndexOutOfBoundsException in contrib/ltr SolrFeatureScorer

2021-01-11 Thread GitBox


dsmiley commented on pull request #2196:
URL: https://github.com/apache/lucene-solr/pull/2196#issuecomment-758372861


   Can you please include a simple test that exposes the problem?
   It doesn't feel right for this Filter implementation to not delegate when 
the other methods do.  Note that a plausible alternative is to simply not 
specify it and thus inherit the null-returning implementation from the Scorer.  
Still...
   Admittedly I don't quite "get" the problem yet, but once a failing test is 
in front of me, and with your analysis already done, I'm sure it won't take 
long.
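
   For illustration, the null-returning default referred to above lives on
Scorer itself, so the "don't specify it" alternative amounts to this sketch
(an assumption about the shape, not the proposed patch):

       // Scorer's default behavior: returning null advertises that two-phase
       // iteration is not supported, so callers fall back to iterator().
       @Override
       public TwoPhaseIterator twoPhaseIterator() {
         return null;
       }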






[GitHub] [lucene-solr] cammiemw commented on a change in pull request #2097: LUCENE-9537

2021-01-11 Thread GitBox


cammiemw commented on a change in pull request #2097:
URL: https://github.com/apache/lucene-solr/pull/2097#discussion_r555482893



##
File path: lucene/core/src/java/org/apache/lucene/search/ScoreAndDoc.java
##
@@ -32,4 +33,9 @@ public int docID() {
   public float score() {
 return score;
   }
+
+  @Override
+  public float smoothingScore(int docId) throws IOException {

Review comment:
   I agree that this makes more sense :-)  I have added the default 
implementation of smoothingScore in Scorable and reverted my changes to add the 
smoothingScore method to all the unnecessary extending classes.
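
   For reference, a default of that kind in Scorable would presumably look
roughly like this (the zero return value is an assumption, not a quote of the
patch):

       // Hypothetical default in Scorable: scorers that do not support
       // smoothing contribute no smoothing mass for unmatched documents.
       public float smoothingScore(int docId) throws IOException {
         return 0f;
       }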











[GitHub] [lucene-solr] cammiemw commented on a change in pull request #2097: LUCENE-9537

2021-01-11 Thread GitBox


cammiemw commented on a change in pull request #2097:
URL: https://github.com/apache/lucene-solr/pull/2097#discussion_r555482275



##
File path: lucene/core/src/java/org/apache/lucene/search/IndriQuery.java
##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search;
+
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Objects;
+
+/**
+ * A Basic abstract query that all IndriQueries can extend to implement
+ * toString, equals, getClauses, and iterator.
+ *
+ */
+public abstract class IndriQuery extends Query

Review comment:
   Currently, IndriQuery does not take advantage of the Block-Max WAND (Weak 
AND) optimization.  We iterate through all documents that have a posting for at 
least one of the search terms.  I would be interested in expanding the 
smoothing score functionality to more parts of Lucene in the future.
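
   To illustrate the exhaustive iteration described above, a rough standalone
sketch (a hypothetical helper, not the PR's actual scorer code):

       import java.io.IOException;
       import java.util.List;
       import org.apache.lucene.search.DocIdSetIterator;

       // The next candidate is the minimum current docID across the
       // sub-iterators, so every doc with a posting for at least one term is
       // visited; nothing is skipped the way Block-Max WAND would skip
       // non-competitive documents. Assumes each sub-iterator has already
       // been positioned with an initial nextDoc().
       static int nextCandidate(List<DocIdSetIterator> subs) throws IOException {
         int doc = DocIdSetIterator.NO_MORE_DOCS;
         for (DocIdSetIterator it : subs) {
           doc = Math.min(doc, it.docID());
         }
         if (doc == DocIdSetIterator.NO_MORE_DOCS) {
           return doc; // all sub-iterators exhausted
         }
         for (DocIdSetIterator it : subs) {
           if (it.docID() == doc) {
             it.nextDoc(); // advance every iterator positioned on the match
           }
         }
         return doc;
       }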








[jira] [Commented] (SOLR-15056) CPU circuit breaker needs to use CPU utilization, not Unix load average

2021-01-11 Thread Walter Underwood (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263017#comment-17263017
 ] 

Walter Underwood commented on SOLR-15056:
-

I've added a patch.

Next, I'm setting up some functional testing. I'll fire up a server, configure 
a circuit breaker, then start using up system CPU with "yes | wc &". That will 
come close to maxing out two processors. At some point, searches should start 
returning 503.
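
A minimal sketch of the metric distinction at issue, assuming a plain JMX 
client (the cast to the com.sun.management interface, available on HotSpot 
JVMs, is what exposes the bounded CPU usage value):

    import java.lang.management.ManagementFactory;

    public class CpuMetrics {
      public static void main(String[] args) {
        // Load average: count of runnable/waiting processes; effectively
        // unbounded, so not comparable against a 50%-95% threshold.
        double loadAvg = ManagementFactory.getOperatingSystemMXBean()
            .getSystemLoadAverage();

        // CPU utilization: bounded in [0.0, 1.0]; negative when unavailable.
        com.sun.management.OperatingSystemMXBean osBean =
            (com.sun.management.OperatingSystemMXBean)
                ManagementFactory.getOperatingSystemMXBean();
        double cpuLoad = osBean.getSystemCpuLoad();

        System.out.println("loadAvg=" + loadAvg + " cpuLoad=" + cpuLoad);
      }
    }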

> CPU circuit breaker needs to use CPU utilization, not Unix load average
> ---
>
> Key: SOLR-15056
> URL: https://issues.apache.org/jira/browse/SOLR-15056
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 8.7
>Reporter: Walter Underwood
>Priority: Major
>  Labels: Metrics
> Attachments: SOLR-15056.patch
>
>
> The config range, 50% to 95%, assumes that the circuit breaker is triggered 
> by a CPU utilization metric that goes from 0% to 100%. But the code uses the 
> metric OperatingSystemMXBean.getSystemLoadAverage(). That is an average of 
> the count of processes waiting to run. It is effectively unbounded. I've seen 
> it as high as 50 to 100. It is not bound by 1.0 (100%).
> A good limit for load average would need to be aware of the number of CPUs 
> available to the JVM. A load average of 8 is no problem for a 32 CPU host. It 
> is a critical situation for a 2 CPU host.
> Also, load average is a Unix OS metric. I don't know if it is even available 
> on Windows.
> Instead, use a CPU utilization metric that goes from 0.0 to 1.0. A good 
> choice is OperatingSystemMXBean.getSystemCPULoad(). This name also uses 
> "load", but it is a usage metric.
> From the Javadoc:
> > Returns the "recent cpu usage" for the whole system. This value is a double 
> >in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle 
> >during the recent period of time observed, while a value of 1.0 means that 
> >all CPUs were actively running 100% of the time during the recent period 
> >being observed. All values betweens 0.0 and 1.0 are possible depending of 
> >the activities going on in the system. If the system recent cpu usage is not 
> >available, the method returns a negative value.
> https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getSystemCpuLoad()
> Also update the documentation to explain which JMX metrics are used for the 
> memory and CPU circuit breakers.
>  






[jira] [Updated] (SOLR-15056) CPU circuit breaker needs to use CPU utilization, not Unix load average

2021-01-11 Thread Walter Underwood (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Underwood updated SOLR-15056:

Attachment: SOLR-15056.patch
Labels: Metrics  (was: )
Status: Open  (was: Open)

Mostly passes "./gradlew check -Pvalidation.git.failOnModified=false" on 
master; some cloud tests fail, but they are probably unrelated.

> CPU circuit breaker needs to use CPU utilization, not Unix load average
> ---
>
> Key: SOLR-15056
> URL: https://issues.apache.org/jira/browse/SOLR-15056
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 8.7
>Reporter: Walter Underwood
>Priority: Major
>  Labels: Metrics
> Attachments: SOLR-15056.patch
>
>
> The config range, 50% to 95%, assumes that the circuit breaker is triggered 
> by a CPU utilization metric that goes from 0% to 100%. But the code uses the 
> metric OperatingSystemMXBean.getSystemLoadAverage(). That is an average of 
> the count of processes waiting to run. It is effectively unbounded. I've seen 
> it as high as 50 to 100. It is not bound by 1.0 (100%).
> A good limit for load average would need to be aware of the number of CPUs 
> available to the JVM. A load average of 8 is no problem for a 32 CPU host. It 
> is a critical situation for a 2 CPU host.
> Also, load average is a Unix OS metric. I don't know if it is even available 
> on Windows.
> Instead, use a CPU utilization metric that goes from 0.0 to 1.0. A good 
> choice is OperatingSystemMXBean.getSystemCPULoad(). This name also uses 
> "load", but it is a usage metric.
> From the Javadoc:
> > Returns the "recent cpu usage" for the whole system. This value is a double 
> >in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle 
> >during the recent period of time observed, while a value of 1.0 means that 
> >all CPUs were actively running 100% of the time during the recent period 
> >being observed. All values betweens 0.0 and 1.0 are possible depending of 
> >the activities going on in the system. If the system recent cpu usage is not 
> >available, the method returns a negative value.
> https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getSystemCpuLoad()
> Also update the documentation to explain which JMX metrics are used for the 
> memory and CPU circuit breakers.
>  






[GitHub] [lucene-solr] tizianodeg commented on pull request #2136: SOLR-15037: fix prevent config change listener to reload core while s…

2021-01-11 Thread GitBox


tizianodeg commented on pull request #2136:
URL: https://github.com/apache/lucene-solr/pull/2136#issuecomment-758328108


   @madrob I modified TestManagedSchemaAPI to reproduce the issue; feel free to 
propose a better location or approach. 
   
   It's hard to reproduce this issue with a small schema. My current approach 
is to add 100 fields, 100 times. Adding more complex field types (with a 
normalizer loading files) may fail faster, but that also results in a more 
complex test.
   Somehow it is also related to multi-field schema updates, as I could not 
reproduce it with single-field changes. I suppose there are more frequent 
writes to ZooKeeper, producing more change notifications while the schema is 
updated. 
   
   I have tested it with Java 8 and 11; both will produce an "Unknown field" 
error around field 5000 on an Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz with 
SSD drives. If it doesn't fail on your machine, try increasing the number of 
fields in one batch.
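   
   A rough SolrJ-shaped sketch of that reproduction (hypothetical field names
and method, not the actual TestManagedSchemaAPI change):

       import java.util.ArrayList;
       import java.util.HashMap;
       import java.util.List;
       import java.util.Map;
       import org.apache.solr.client.solrj.SolrClient;
       import org.apache.solr.client.solrj.request.schema.SchemaRequest;

       // Sends 100 batches of 100 new fields through the Schema API, so the
       // managed schema is rewritten in ZooKeeper many times in quick
       // succession, producing many change notifications.
       static void addFieldsInBatches(SolrClient client, String collection)
           throws Exception {
         for (int batch = 0; batch < 100; batch++) {
           List<SchemaRequest.Update> updates = new ArrayList<>();
           for (int i = 0; i < 100; i++) {
             Map<String, Object> field = new HashMap<>();
             field.put("name", "bulk_field_" + batch + "_" + i); // hypothetical
             field.put("type", "string");
             updates.add(new SchemaRequest.AddField(field));
           }
           new SchemaRequest.MultiUpdate(updates).process(client, collection);
         }
       }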
   






[jira] [Updated] (SOLR-15079) Block Collapse (faster collapse code when groups are co-located via Block Join style nested doc indexing)

2021-01-11 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-15079:
--
Attachment: SOLR-15079.patch
Status: Open  (was: Open)

The main difference between this new approach and the existing collapse 
approach is that the existing collapse PostFilter maintains a big in-memory 
data structure of every "group key" (values from the collapse field) it sees in 
the matching docs, and the "best" matching doc of each group (ie: the current 
"group head"), along with the selector values corresponding to each of those 
group head docs that are needed to determine whether they are better/worse than 
any other candidate doc for that group that might come along (this might be the 
'score' of each doc w/default collapsing, or some field values if one of the 
min/max/sort group head selectors is used). Once the PostFilter is done 
collecting all matching docs, it then does another pass over these data 
structures to delegate collection of just the (final) best "group heads".

In the new logic, since we know our grouping field is unique per "block" of 
indexed documents, no large in-memory data structures are needed to track 
_all_ groups at once – we can simply record the single best doc / group head 
selector values for the _current_ group, and once we encounter a doc with a new 
value in the collapse field (ie: a new "group key"), we can immediately 
delegate collection of the "previous" group's best matching doc and throw away 
its metadata.

This means the new impl uses a *LOT* less RAM than the old impl.

I did some benchmarking using an index built from some ecommerce style data 
containing ~50,000 (Parent) Products and ~8.5 Million (Child) SKUs, in 
collections that had 6 shards, 1 replica each, with each replica hosted on its 
own Solr node. Test clients issued randomized queries designed to match 
different permutations of docs, w/varying numbers of matches per group.
 * Long running query tests against the collection built using nested docs and 
using block collapse had (cumulative) query times ~45% to 65% lower than a 
"typical" collection
 ** the relative perf gains of the new impl were higher as the query load (ie: 
num concurrent clients) increased
 ** the relative perf gains were consistent regardless of how many docs matched 
the test query, how many unique groups those docs were in, or how many docs in 
those groups were matched by those queries
 ** there was some notable diff in relative perf based on the number of 
segments – but that was because the existing impl does significantly better 
when there are fewer segments (probably due to ordinal mapping?) while the new 
impl has largely consistent behavior regardless of the number of segments
 * A lot of the "overall gains" probably come from reduced GC/memory contention 
(which system monitoring demonstrated was notably reduced with the new impl), 
but even in micro load testing the new implementation is faster on individual 
requests – which makes sense because it only has to do a single pass over the 
matching documents (as opposed to "one pass over matching docs + one pass over 
matching groups to sort the group head doc ids + one pass over the final 
docids")
 ** so the more unique groups matched by a query, the faster the new impl is 
(relatively speaking) compared to the existing impl


The attached patch includes this new logic/approach and uses it by default when 
the collapse field is {{_root_}}, but it also supports a new {{hint=block}} 
option users can specify if they want this logic for other fields when they 
know their groups are co-located. This is necessary if you have "deeply nested" 
documents and you want to group on something that isn't consistent for all 
descendants of the same {{_root_}} doc, but is consistent for all descendants 
of particular ancestor docs.

Example: each root (level-0) product doc may have multiple (level-1) SKU 
"child" docs, and each SKU doc may have its own (level-2) "variant" child docs 
(ie: grandchildren of 'product') that include a "sku_s" field which is 
guaranteed to be consistent in every "variant" doc (and guaranteed to be unique 
across all unique SKU-level documents). You could use {{"hint=block 
field=sku_s"}} when searching against variant docs to collapse down to the 
"best" variant for each SKU.

NOTE: This approach is only valid for {{nullPolicy=expand}} or 
{{nullPolicy=ignore}} (the default). It would not be possible to implement 
{{nullPolicy=collapse}} with this type of "one pass" approach.

I feel like the current patch is really solid and ready to commit & backport to 
8x, but I welcome any questions/concerns.
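
As a self-contained sketch of the one-pass idea described above (a 
hypothetical standalone shape, not the patch's DocValues-based collector):

    import java.util.ArrayList;
    import java.util.List;

    /** docs[i] = {docId, groupKey, score}, in index order with equal
     *  groupKeys contiguous; returns the highest-scoring doc per group while
     *  only remembering the current group's best candidate. */
    static List<Integer> collapseOnePass(int[][] docs) {
      List<Integer> heads = new ArrayList<>();
      int currentKey = 0;
      int bestDoc = -1;
      int bestScore = Integer.MIN_VALUE;
      for (int[] d : docs) {
        if (bestDoc != -1 && d[1] != currentKey) {
          heads.add(bestDoc);            // delegate the previous group head
          bestScore = Integer.MIN_VALUE; // reset for the new group
        }
        currentKey = d[1];
        if (d[2] > bestScore) {
          bestScore = d[2];
          bestDoc = d[0];
        }
      }
      if (bestDoc != -1) {
        heads.add(bestDoc);              // flush the final group
      }
      return heads;
    }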

> Block Collapse (faster collapse code when groups are co-located via Block 
> Join style nested doc indexing)
> -
>
> 

[jira] [Created] (SOLR-15079) Block Collapse (faster collapse code when groups are co-located via Block Join style nested doc indexing)

2021-01-11 Thread Chris M. Hostetter (Jira)
Chris M. Hostetter created SOLR-15079:
-

 Summary: Block Collapse (faster collapse code when groups are 
co-located via Block Join style nested doc indexing)
 Key: SOLR-15079
 URL: https://issues.apache.org/jira/browse/SOLR-15079
 Project: Solr
  Issue Type: New Feature
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Chris M. Hostetter
Assignee: Chris M. Hostetter


A while back, JoelB had an idea for an optimized version of the logic in the 
CollapsingQParserPlugin to take advantage of collapsing on fields where the 
user knows that every doc with the same collapseKey is contiguous in the index 
- for example, collapsing on the {{_root_}} field.

Joel whipped up an initial PoC patch internally at Lucidworks that only dealt 
with some limited cases (string field collapsing w/o any nulls, using default 
group head selection) to explain the idea, but other priorities prevented him 
from doing thorough benchmarking or fleshing it out into "production ready" 
code.

I took Joel's original PoC and fleshed it out with unit tests, fixed some bugs, 
and did some benchmarking against large indexes - the results look really good.

I've since then beefed the code up more to include collapsing on numeric 
fields, and added support for all group head selector types, as well as adding 
support for {{nullPolicy=expand}}.






[jira] [Resolved] (SOLR-15010) Missing jstack warning is alarming, when using bin/solr as client interface to solr

2021-01-11 Thread David Eric Pugh (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Eric Pugh resolved SOLR-15010.

Fix Version/s: 8.8
   Resolution: Fixed

> Missing jstack warning is alarming, when using bin/solr as client interface 
> to solr
> ---
>
> Key: SOLR-15010
> URL: https://issues.apache.org/jira/browse/SOLR-15010
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.7
>Reporter: David Eric Pugh
>Assignee: David Eric Pugh
>Priority: Minor
> Fix For: 8.8
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In SOLR-14442 we added a warning if jstack wasn't found.   I notice that I 
> use the bin/solr command a lot as a client, so bin solr zk or bin solr 
> healthcheck. 
> For example:
> {{docker exec solr1 solr zk cp /security.json zk:security.json -z zoo1:2181}}
> All of these emit the message:
> The currently defined JAVA_HOME (/usr/local/openjdk-11) refers to a location
> where java was found but jstack was not found. Continuing.
> This is somewhat alarming, and then becomes annoying.   Thoughts on maybe 
> only conducting this check if you are running {{bin/solr start}} or one of 
> the other commands that is actually starting Solr as a process?






[GitHub] [lucene-solr] epugh merged pull request #2192: SOLR-15010 Try to use jattach for threaddump if jstack is missing

2021-01-11 Thread GitBox


epugh merged pull request #2192:
URL: https://github.com/apache/lucene-solr/pull/2192


   






[jira] [Comment Edited] (SOLR-15071) Bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)

2021-01-11 Thread Christine Poerschke (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262947#comment-17262947
 ] 

Christine Poerschke edited comment on SOLR-15071 at 1/11/21, 9:53 PM:
--

* The {{freq >= minShouldMatch}} assertion in the {{updateFreq}} method fails.
 ** 
[https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/lucene/core/src/java/org/apache/lucene/search/MinShouldMatchSumScorer.java#L303]

 * This is because in the {{setDocAndFreq}} method the {{freq}} is re-computed 
and somehow that doesn't quite work out.
 ** 
[https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/lucene/core/src/java/org/apache/lucene/search/MinShouldMatchSumScorer.java#L263]

 * {{setDocAndFreq}} is called by the {{TwoPhaseIterator}} methods.
 ** 
[https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/lucene/core/src/java/org/apache/lucene/search/MinShouldMatchSumScorer.java#L135]

 * {{twoPhaseIterator()}} is an optional {{Scorer}} method, it defaults to 
{{null}} i.e. two-phase iteration is not supported.
 ** 
[https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/lucene/core/src/java/org/apache/lucene/search/Scorer.java#L74-L91]

 * With the SOLR-14378 and SOLR-14364 changes the {{SolrFeatureScorer}} 
inadvertently claimed to potentially have two-phase iteration support when it 
does not have it.
 ** 
[https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/SolrFeature.java#L233]

Proposed fix: [https://github.com/apache/lucene-solr/pull/2196]


was (Author: cpoerschke):
* The {{freq >= minShouldMatch}} assertion in the {{updateFreq}} method fails.
 ** 
[https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/lucene/core/src/java/org/apache/lucene/search/MinShouldMatchSumScorer.java#L303]

 * This is because in the {{setDocAndFreq}} method the {{freq}} is re-computed 
and somehow that doesn't quite work out.
 ** 
[https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/lucene/core/src/java/org/apache/lucene/search/MinShouldMatchSumScorer.java#L263]

 * {{setDocAndFreq}} is called by the {{TwoPhaseIterator}} methods.
 ** 
[https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/lucene/core/src/java/org/apache/lucene/search/MinShouldMatchSumScorer.java#L135]

 * {{twoPhaseIterator()}} is an optional {{Scorer}} method, it defaults to 
{{null}} i.e. two-phase iteration is not supported.
 ** 
[https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/lucene/core/src/java/org/apache/lucene/search/Scorer.java#L74-L91]

 * With the SOLR-14378 and SOLR-14364 changes the {{SolrFeatureScorer}} 
inadvertently claimed to have two-phase iteration support when it does not have 
it.
 ** 
[https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/SolrFeature.java#L233]

Proposed fix: [https://github.com/apache/lucene-solr/pull/2196]

> Bug on LTR when using solr 8.6.3 - index out of bounds 
> DisiPriorityQueue.add(DisiPriorityQueue.java:102)
> 
>
> Key: SOLR-15071
> URL: https://issues.apache.org/jira/browse/SOLR-15071
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LTR
>Affects Versions: 8.6, 8.7
>Reporter: Florin Babes
>Assignee: Christine Poerschke
>Priority: Major
>  Labels: ltr
> Fix For: 8.8
>
> Attachments: featurestore+model+sample_documents.zip
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hello,
> We are trying to update Solr from 8.3.1 to 8.6.3. On Solr 8.3.1 we are
> using LTR in production using a MultipleAdditiveTrees model. On Solr 8.6.3
> we receive an error when we try to compute some SolrFeatures. We didn't
> find any pattern of the queries that fail.
> Example:
> We have the following query raw parameters:
> q=lg cx 4k oled 120 hz -> just of many examples
> term_dq=lg cx 4k oled 120 hz
> rq={!ltr model=model reRankDocs=1000 store=feature_store
> efi.term=${term_dq}}
> defType=edismax,
> mm=2<75%
> The features are something like this:
> {
>  "name":"similarity_query_fileld_1",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":\{"q":"{!dismax qf=query_field_1 mm=1}${term}"},
>  "store":"feature_store"
> },
> {
>  "name":"similarity_query_field_2",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":\{"q":"{!dismax qf=query_field_2 mm=5}${term}"},
>  "store":"feature_store"
> }
> We are testing ~6300 production queries and for about 1% of them we receive
> that following error messag

[jira] [Comment Edited] (SOLR-15071) Bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)

2021-01-11 Thread Christine Poerschke (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262947#comment-17262947
 ] 

Christine Poerschke edited comment on SOLR-15071 at 1/11/21, 9:50 PM:
--

* The {{freq >= minShouldMatch}} assertion in the {{updateFreq}} method fails.
 ** 
[https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/lucene/core/src/java/org/apache/lucene/search/MinShouldMatchSumScorer.java#L303]

 * This is because in the {{setDocAndFreq}} method the {{freq}} is re-computed 
and somehow that doesn't quite work out.
 ** 
[https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/lucene/core/src/java/org/apache/lucene/search/MinShouldMatchSumScorer.java#L263]

 * {{setDocAndFreq}} is called by the {{TwoPhaseIterator}} methods.
 ** 
[https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/lucene/core/src/java/org/apache/lucene/search/MinShouldMatchSumScorer.java#L135]

 * {{twoPhaseIterator()}} is an optional {{Scorer}} method, it defaults to 
{{null}} i.e. two-phase iteration is not supported.
 ** 
[https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/lucene/core/src/java/org/apache/lucene/search/Scorer.java#L74-L91]

 * With the SOLR-14378 and SOLR-14364 changes the {{SolrFeatureScorer}} 
inadvertently claimed to have two-phase iteration support when it does not have 
it.
 ** 
[https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/SolrFeature.java#L233]

Proposed fix: [https://github.com/apache/lucene-solr/pull/2196]


was (Author: cpoerschke):
* The {{freq >= minShouldMatch}} assertion in the {{updateFreq}} method fails.
** 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/lucene/core/src/java/org/apache/lucene/search/MinShouldMatchSumScorer.java#L303

* This is because in the {{setDocAndFreq}} method the {{freq}} is re-computed 
and somehow that doesn't quite work out.
** 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/lucene/core/src/java/org/apache/lucene/search/MinShouldMatchSumScorer.java#L303

* {{setDocAndFreq}} is called by the {{TwoPhaseIterator}} methods.
** 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/lucene/core/src/java/org/apache/lucene/search/MinShouldMatchSumScorer.java#L135

* {{twoPhaseIterator()}} is an optional {{Scorer}} method, it defaults to 
{{null}} i.e. two-pharse iteration is not supported.
** 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/lucene/core/src/java/org/apache/lucene/search/Scorer.java#L74-L91

* With the SOLR-14378 and SOLR-14364 changes the {{SolrFeatureScorer}} 
inadvertently claimed to have two-phase iteration support when it does not have 
it.
** 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/SolrFeature.java#L233

Proposed fix: https://github.com/apache/lucene-solr/pull/2196

> Bug on LTR when using solr 8.6.3 - index out of bounds 
> DisiPriorityQueue.add(DisiPriorityQueue.java:102)
> 
>
> Key: SOLR-15071
> URL: https://issues.apache.org/jira/browse/SOLR-15071
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LTR
>Affects Versions: 8.6, 8.7
>Reporter: Florin Babes
>Assignee: Christine Poerschke
>Priority: Major
>  Labels: ltr
> Fix For: 8.8
>
> Attachments: featurestore+model+sample_documents.zip
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hello,
> We are trying to update Solr from 8.3.1 to 8.6.3. On Solr 8.3.1 we are
> using LTR in production using a MultipleAdditiveTrees model. On Solr 8.6.3
> we receive an error when we try to compute some SolrFeatures. We didn't
> find any pattern of the queries that fail.
> Example:
> We have the following query raw parameters:
> q=lg cx 4k oled 120 hz -> just of many examples
> term_dq=lg cx 4k oled 120 hz
> rq={!ltr model=model reRankDocs=1000 store=feature_store
> efi.term=${term_dq}}
> defType=edismax,
> mm=2<75%
> The features are something like this:
> {
>  "name":"similarity_query_fileld_1",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":\{"q":"{!dismax qf=query_field_1 mm=1}${term}"},
>  "store":"feature_store"
> },
> {
>  "name":"similarity_query_field_2",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":\{"q":"{!dismax qf=query_field_2 mm=5}${term}"},
>  "store":"feature_store"
> }
> We are testing ~6300 production queries and for about 1% of them we receive
> that following error message:
> "metadata":[
>  "error-clas

[jira] [Updated] (SOLR-15071) Bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)

2021-01-11 Thread Christine Poerschke (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke updated SOLR-15071:
---
Fix Version/s: 8.8

> Bug on LTR when using solr 8.6.3 - index out of bounds 
> DisiPriorityQueue.add(DisiPriorityQueue.java:102)
> 
>
> Key: SOLR-15071
> URL: https://issues.apache.org/jira/browse/SOLR-15071
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LTR
>Affects Versions: 8.6, 8.7
>Reporter: Florin Babes
>Assignee: Christine Poerschke
>Priority: Major
>  Labels: ltr
> Fix For: 8.8
>
> Attachments: featurestore+model+sample_documents.zip
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hello,
> We are trying to update Solr from 8.3.1 to 8.6.3. On Solr 8.3.1 we are
> using LTR in production using a MultipleAdditiveTrees model. On Solr 8.6.3
> we receive an error when we try to compute some SolrFeatures. We didn't
> find any pattern of the queries that fail.
> Example:
> We have the following query raw parameters:
> q=lg cx 4k oled 120 hz -> just of many examples
> term_dq=lg cx 4k oled 120 hz
> rq={!ltr model=model reRankDocs=1000 store=feature_store
> efi.term=${term_dq}}
> defType=edismax,
> mm=2<75%
> The features are something like this:
> {
>  "name":"similarity_query_fileld_1",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":\{"q":"{!dismax qf=query_field_1 mm=1}${term}"},
>  "store":"feature_store"
> },
> {
>  "name":"similarity_query_field_2",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":\{"q":"{!dismax qf=query_field_2 mm=5}${term}"},
>  "store":"feature_store"
> }
> We are testing ~6300 production queries and for about 1% of them we receive
> that following error message:
> "metadata":[
>  "error-class","org.apache.solr.common.SolrException",
>  "root-error-class","java.lang.ArrayIndexOutOfBoundsException"],
>  "msg":"java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds
> for length 2",
> The stacktrace is :
> org.apache.solr.common.SolrException:
> java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
> at org.apache.solr.search.ReRankCollector.topDocs(ReRankCollector.java:154)
> at
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:159
> 9)
> at
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1413
> )
> at
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:596)
> at
> org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryC
> omponent.java:1513)
> at
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:403
> )
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.
> java:360)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java
> :214)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2627)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:795)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:568)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.jav
> a:1596)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235
> )
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:161
> 0)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233
> )
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:130
> 0)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580
> )
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215
> )
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(Conte

[jira] [Assigned] (SOLR-15071) Bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)

2021-01-11 Thread Christine Poerschke (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke reassigned SOLR-15071:
--

Assignee: Christine Poerschke

> Bug on LTR when using solr 8.6.3 - index out of bounds 
> DisiPriorityQueue.add(DisiPriorityQueue.java:102)
> 
>
> Key: SOLR-15071
> URL: https://issues.apache.org/jira/browse/SOLR-15071
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LTR
>Affects Versions: 8.6, 8.7
>Reporter: Florin Babes
>Assignee: Christine Poerschke
>Priority: Major
>  Labels: ltr
> Attachments: featurestore+model+sample_documents.zip
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hello,
> We are trying to update Solr from 8.3.1 to 8.6.3. On Solr 8.3.1 we are
> using LTR in production using a MultipleAdditiveTrees model. On Solr 8.6.3
> we receive an error when we try to compute some SolrFeatures. We didn't
> find any pattern of the queries that fail.
> Example:
> We have the following query raw parameters:
> q=lg cx 4k oled 120 hz -> just of many examples
> term_dq=lg cx 4k oled 120 hz
> rq={!ltr model=model reRankDocs=1000 store=feature_store
> efi.term=${term_dq}}
> defType=edismax,
> mm=2<75%
> The features are something like this:
> {
>  "name":"similarity_query_fileld_1",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":\{"q":"{!dismax qf=query_field_1 mm=1}${term}"},
>  "store":"feature_store"
> },
> {
>  "name":"similarity_query_field_2",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":\{"q":"{!dismax qf=query_field_2 mm=5}${term}"},
>  "store":"feature_store"
> }
> We are testing ~6300 production queries and for about 1% of them we receive
> that following error message:
> "metadata":[
>  "error-class","org.apache.solr.common.SolrException",
>  "root-error-class","java.lang.ArrayIndexOutOfBoundsException"],
>  "msg":"java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds
> for length 2",
> The stacktrace is :
> org.apache.solr.common.SolrException:
> java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
> at org.apache.solr.search.ReRankCollector.topDocs(ReRankCollector.java:154)
> at
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:159
> 9)
> at
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1413
> )
> at
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:596)
> at
> org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryC
> omponent.java:1513)
> at
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:403
> )
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.
> java:360)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java
> :214)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2627)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:795)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:568)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.jav
> a:1596)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235
> )
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:161
> 0)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233
> )
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:130
> 0)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580
> )
> at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215
> )
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerC

[jira] [Commented] (SOLR-15071) Bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)

2021-01-11 Thread Christine Poerschke (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262947#comment-17262947
 ] 

Christine Poerschke commented on SOLR-15071:


* The {{freq >= minShouldMatch}} assertion in the {{updateFreq}} method fails.
** 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/lucene/core/src/java/org/apache/lucene/search/MinShouldMatchSumScorer.java#L303

* This is because in the {{setDocAndFreq}} method the {{freq}} is re-computed 
and somehow that doesn't quite work out.
** 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/lucene/core/src/java/org/apache/lucene/search/MinShouldMatchSumScorer.java#L303

* {{setDocAndFreq}} is called by the {{TwoPhaseIterator}} methods.
** 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/lucene/core/src/java/org/apache/lucene/search/MinShouldMatchSumScorer.java#L135

* {{twoPhaseIterator()}} is an optional {{Scorer}} method, it defaults to 
{{null}} i.e. two-phase iteration is not supported.
** 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/lucene/core/src/java/org/apache/lucene/search/Scorer.java#L74-L91

* With the SOLR-14378 and SOLR-14364 changes the {{SolrFeatureScorer}} 
inadvertently claimed to have two-phase iteration support when it does not have 
it.
** 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/SolrFeature.java#L233

Proposed fix: https://github.com/apache/lucene-solr/pull/2196

> Bug on LTR when using solr 8.6.3 - index out of bounds 
> DisiPriorityQueue.add(DisiPriorityQueue.java:102)
> 
>
> Key: SOLR-15071
> URL: https://issues.apache.org/jira/browse/SOLR-15071
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LTR
>Affects Versions: 8.6, 8.7
>Reporter: Florin Babes
>Priority: Major
>  Labels: ltr
> Attachments: featurestore+model+sample_documents.zip
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hello,
> We are trying to update Solr from 8.3.1 to 8.6.3. On Solr 8.3.1 we are
> using LTR in production using a MultipleAdditiveTrees model. On Solr 8.6.3
> we receive an error when we try to compute some SolrFeatures. We didn't
> find any pattern of the queries that fail.
> Example:
> We have the following query raw parameters:
> q=lg cx 4k oled 120 hz -> just of many examples
> term_dq=lg cx 4k oled 120 hz
> rq={!ltr model=model reRankDocs=1000 store=feature_store
> efi.term=${term_dq}}
> defType=edismax,
> mm=2<75%
> The features are something like this:
> {
>  "name":"similarity_query_fileld_1",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":\{"q":"{!dismax qf=query_field_1 mm=1}${term}"},
>  "store":"feature_store"
> },
> {
>  "name":"similarity_query_field_2",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":\{"q":"{!dismax qf=query_field_2 mm=5}${term}"},
>  "store":"feature_store"
> }
> We are testing ~6300 production queries and for about 1% of them we receive
> that following error message:
> "metadata":[
>  "error-class","org.apache.solr.common.SolrException",
>  "root-error-class","java.lang.ArrayIndexOutOfBoundsException"],
>  "msg":"java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds
> for length 2",
> The stacktrace is :
> org.apache.solr.common.SolrException:
> java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
> at org.apache.solr.search.ReRankCollector.topDocs(ReRankCollector.java:154)
> at
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:159
> 9)
> at
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1413
> )
> at
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:596)
> at
> org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryC
> omponent.java:1513)
> at
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:403
> )
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.
> java:360)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java
> :214)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2627)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:795)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:568)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(Serv

[GitHub] [lucene-solr] cpoerschke opened a new pull request #2196: SOLR-15071: Fix ArrayIndexOutOfBoundsException in contrib/ltr SolrFeatureScorer

2021-01-11 Thread GitBox


cpoerschke opened a new pull request #2196:
URL: https://github.com/apache/lucene-solr/pull/2196


   https://issues.apache.org/jira/browse/SOLR-15071






[jira] [Comment Edited] (SOLR-15052) Reducing overseer bottlenecks using per-replica states

2021-01-11 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262918#comment-17262918
 ] 

Mike Drob edited comment on SOLR-15052 at 1/11/21, 8:15 PM:


[~noble.paul] I see that you already merged this into 8x.

Do you have any performance numbers that you can share about this? I did a lot 
of reviewing of the code for structure and clarity, etc, but I haven't had the 
opportunity to run this in a cluster.

Edit: I know that Ishan posted numbers at the beginning of this journey, but 
wanted to check if you had something run on the final version of this.


was (Author: mdrob):
[~noble.paul] I see that you already merged this into 8x.

Do you have any performance numbers that you can share about this? I did a lot 
of reviewing of the code for structure and clarity, etc, but I haven't had the 
opportunity to run this in a cluster.

> Reducing overseer bottlenecks using per-replica states
> --
>
> Key: SOLR-15052
> URL: https://issues.apache.org/jira/browse/SOLR-15052
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Priority: Major
> Attachments: per-replica-states-gcp.pdf
>
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> This work has the same goal as SOLR-13951, that is to reduce overseer 
> bottlenecks by avoiding replica state updates from going to the state.json 
> via the overseer. However, the approach taken here is different from 
> SOLR-13951 and hence this work supercedes that work.
> The design proposed is here: 
> https://docs.google.com/document/d/1xdxpzUNmTZbk0vTMZqfen9R3ArdHokLITdiISBxCFUg/edit
> Briefly,
> # Every replica's state will be in a separate znode nested under the 
> state.json. It has the name that encodes the replica name, state, leadership 
> status.
> # An additional children watcher to be set on state.json for state changes.
> # Upon a state change, a ZK multi-op to delete the previous znode and add a 
> new znode with new state.
> Differences between this and SOLR-13951,
> # In SOLR-13951, we planned to leverage shard terms for per shard states.
> # As a consequence, the code changes required for SOLR-13951 were massive (we 
> needed a shard state provider abstraction and introduce it everywhere in the 
> codebase).
> # This approach is a drastically simpler change and design.
> Credits for this design and the PR is due to [~noble.paul]. 
> [~markrmil...@gmail.com], [~noble.paul] and I have collaborated on this 
> effort. The reference branch takes a conceptually similar (but not identical) 
> approach.
> I shall attach a PR and performance benchmarks shortly.






[jira] [Commented] (SOLR-15052) Reducing overseer bottlenecks using per-replica states

2021-01-11 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262918#comment-17262918
 ] 

Mike Drob commented on SOLR-15052:
--

[~noble.paul] I see that you already merged this into 8x.

Do you have any performance numbers that you can share about this? I did a lot 
of reviewing of the code for structure and clarity, etc, but I haven't had the 
opportunity to run this in a cluster.

> Reducing overseer bottlenecks using per-replica states
> --
>
> Key: SOLR-15052
> URL: https://issues.apache.org/jira/browse/SOLR-15052
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Priority: Major
> Attachments: per-replica-states-gcp.pdf
>
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> This work has the same goal as SOLR-13951, that is to reduce overseer 
> bottlenecks by avoiding replica state updates from going to the state.json 
> via the overseer. However, the approach taken here is different from 
> SOLR-13951 and hence this work supersedes that work.
> The design proposed is here: 
> https://docs.google.com/document/d/1xdxpzUNmTZbk0vTMZqfen9R3ArdHokLITdiISBxCFUg/edit
> Briefly,
> # Every replica's state will be in a separate znode nested under 
> state.json. Its name encodes the replica name, state, and leadership status.
> # An additional children watcher is set on state.json to observe state changes.
> # Upon a state change, a ZK multi-op deletes the previous znode and adds a 
> new znode with the new state.
> Differences between this and SOLR-13951,
> # In SOLR-13951, we planned to leverage shard terms for per shard states.
> # As a consequence, the code changes required for SOLR-13951 were massive (we 
> needed a shard state provider abstraction and to introduce it everywhere in 
> the codebase).
> # This approach is a drastically simpler change and design.
> Credits for this design and the PR are due to [~noble.paul]. 
> [~markrmil...@gmail.com], [~noble.paul] and I have collaborated on this 
> effort. The reference branch takes a conceptually similar (but not identical) 
> approach.
> I shall attach a PR and performance benchmarks shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15071) Bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)

2021-01-11 Thread Christine Poerschke (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262916#comment-17262916
 ] 

Christine Poerschke commented on SOLR-15071:


Well, this is a truly interesting bug! I think I've figured out what the issue 
is, how it got broken, and how to fix it. It's an easy fix but needs a long 
explanation to go with it; I will continue after dinner :-)

> Bug on LTR when using solr 8.6.3 - index out of bounds 
> DisiPriorityQueue.add(DisiPriorityQueue.java:102)
> 
>
> Key: SOLR-15071
> URL: https://issues.apache.org/jira/browse/SOLR-15071
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LTR
>Affects Versions: 8.6, 8.7
>Reporter: Florin Babes
>Priority: Major
>  Labels: ltr
> Attachments: featurestore+model+sample_documents.zip
>
>
> Hello,
> We are trying to update Solr from 8.3.1 to 8.6.3. On Solr 8.3.1 we are
> using LTR in production using a MultipleAdditiveTrees model. On Solr 8.6.3
> we receive an error when we try to compute some SolrFeatures. We didn't
> find any pattern of the queries that fail.
> Example:
> We have the following query raw parameters:
> q=lg cx 4k oled 120 hz -> just one of many examples
> term_dq=lg cx 4k oled 120 hz
> rq={!ltr model=model reRankDocs=1000 store=feature_store
> efi.term=${term_dq}}
> defType=edismax,
> mm=2<75%
> The features are something like this:
> {
>  "name":"similarity_query_field_1",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":{"q":"{!dismax qf=query_field_1 mm=1}${term}"},
>  "store":"feature_store"
> },
> {
>  "name":"similarity_query_field_2",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":{"q":"{!dismax qf=query_field_2 mm=5}${term}"},
>  "store":"feature_store"
> }
> We are testing ~6300 production queries and for about 1% of them we receive 
> the following error message:
> "metadata":[
>  "error-class","org.apache.solr.common.SolrException",
>  "root-error-class","java.lang.ArrayIndexOutOfBoundsException"],
>  "msg":"java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds
> for length 2",
> The stacktrace is:
> org.apache.solr.common.SolrException: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
> at org.apache.solr.search.ReRankCollector.topDocs(ReRankCollector.java:154)
> at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1599)
> at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1413)
> at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:596)
> at org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1513)
> at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:403)
> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:360)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2627)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:795)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:568)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215)
> at org.eclipse.jetty.server.handler.Scope

[jira] [Commented] (SOLR-15010) Missing jstack warning is alarming, when using bin/solr as client interface to solr

2021-01-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262913#comment-17262913
 ] 

ASF subversion and git services commented on SOLR-15010:


Commit 3e2fb59272f5b4d8106b3d8edf847f50bacd7a61 in lucene-solr's branch 
refs/heads/master from Eric Pugh
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3e2fb59 ]

SOLR-15010 Try to use jattach for threaddump if jstack is missing (#2192)

* introduce jattach check if jstack is missing.  jattach ships in the Solr 
docker image instead of jstack.
* get the full path to the jattach command

Co-authored-by: Christine Poerschke 


> Missing jstack warning is alarming, when using bin/solr as client interface 
> to solr
> ---
>
> Key: SOLR-15010
> URL: https://issues.apache.org/jira/browse/SOLR-15010
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.7
>Reporter: David Eric Pugh
>Assignee: David Eric Pugh
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In SOLR-14442 we added a warning if jstack wasn't found.  I notice that I 
> use the bin/solr command a lot as a client, e.g. {{bin/solr zk}} or 
> {{bin/solr healthcheck}}.
> For example:
> {{docker exec solr1 solr zk cp /security.json zk:security.json -z zoo1:2181}}
> All of these emit the message:
> The currently defined JAVA_HOME (/usr/local/openjdk-11) refers to a location
> where java was found but jstack was not found. Continuing.
> This is somewhat alarming, and then becomes annoying.  Thoughts on maybe 
> only conducting this check if you are running {{bin/solr start}} or one of 
> the other commands that actually starts Solr as a process?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] thelabdude merged pull request #2132: SOLR-15036: auto- select / rollup / sort / plist over facet expression when using a collection alias with multiple collections

2021-01-11 Thread GitBox


thelabdude merged pull request #2132:
URL: https://github.com/apache/lucene-solr/pull/2132


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] msokolov commented on pull request #2176: Initial rewrite of MMapDirectory for JDK-16 preview (incubating) PANAMA APIs

2021-01-11 Thread GitBox


msokolov commented on pull request #2176:
URL: https://github.com/apache/lucene-solr/pull/2176#issuecomment-758175143


   @uschindler I pulled latest from this branch 
(ba61072221905c202450034c0c90409f7073ebb3), re-ran against updated master 
(6711eb7571727552aad3ace53c52c9a8fe07dc40), and see similar results:
   
   ```
Task                         QPS baseline  StdDev   QPS candidate  StdDev               Pct diff  p-value
BrowseMonthTaxoFacets                2.15  (7.0%)            1.20  (5.6%)  -44.2% ( -53% -  -33%)   0.000
BrowseDayOfYearTaxoFacets            2.04  (7.1%)            1.16  (5.8%)  -42.9% ( -52% -  -32%)   0.000
BrowseDateTaxoFacets                 2.04  (7.1%)            1.17  (5.8%)  -42.8% ( -51% -  -32%)   0.000
LowTerm                           1099.09  (4.8%)          936.16  (3.2%)  -14.8% ( -21% -   -7%)   0.000
OrNotHighLow                       523.44  (7.0%)          450.50  (3.8%)  -13.9% ( -23% -   -3%)   0.000
Respell                             42.23  (2.2%)           36.71  (2.5%)  -13.1% ( -17% -   -8%)   0.000
AndHighMedVector                   893.77  (2.9%)          786.53  (3.0%)  -12.0% ( -17% -   -6%)   0.000
AndHighLow                         662.65  (2.9%)          584.68  (2.0%)  -11.8% ( -16% -   -7%)   0.000
PKLookup                           136.02  (1.5%)          120.03  (1.5%)  -11.8% ( -14% -   -8%)   0.000
Fuzzy1                              53.41  (6.4%)           47.18  (5.4%)  -11.7% ( -22% -    0%)   0.000
MedTermVector                     1067.80  (3.2%)          946.10  (2.3%)  -11.4% ( -16% -   -6%)   0.000
AndHighLowVector                   941.02  (3.8%)          834.30  (3.1%)  -11.3% ( -17% -   -4%)   0.000
AndHighHighVector                  908.27  (2.9%)          808.05  (2.1%)  -11.0% ( -15% -   -6%)   0.000
LowTermVector                     1010.16  (2.2%)          899.95  (1.9%)  -10.9% ( -14% -   -6%)   0.000
MedTerm                           1167.78  (3.9%)         1043.79  (3.1%)  -10.6% ( -17% -   -3%)   0.000
BrowseMonthSSDVFacets               11.99 (13.4%)           10.73  (7.4%)  -10.5% ( -27% -   11%)   0.002
AndHighMed                         155.89  (3.5%)          139.90  (2.8%)  -10.3% ( -16% -   -4%)   0.000
OrNotHighMed                       464.05  (4.1%)          416.91  (4.1%)  -10.2% ( -17% -   -2%)   0.000
HighTermVector                    1222.14  (3.5%)         1105.22  (2.2%)   -9.6% ( -14% -   -4%)   0.000
Wildcard                            91.66  (2.2%)           83.55  (2.4%)   -8.9% ( -13% -   -4%)   0.000
LowSloppyPhrase                     46.92  (2.1%)           42.77  (1.5%)   -8.8% ( -12% -   -5%)   0.000
LowSpanNear                         87.56  (2.7%)           80.02  (2.0%)   -8.6% ( -13% -   -3%)   0.000
OrHighNotMed                       470.32  (5.4%)          429.87  (3.3%)   -8.6% ( -16% -    0%)   0.000
HighTerm                          1269.54  (7.1%)         1163.40  (5.8%)   -8.4% ( -19% -    4%)   0.000
OrHighNotHigh                      430.03  (4.1%)          395.13  (4.3%)   -8.1% ( -15% -    0%)   0.000
OrHighNotLow                       585.18  (5.0%)          537.96  (5.5%)   -8.1% ( -17% -    2%)   0.000
AndHighHigh                         37.46  (3.2%)           34.45  (2.6%)   -8.0% ( -13% -   -2%)   0.000
OrHighLow                          576.99  (7.4%)          533.08  (6.4%)   -7.6% ( -19% -    6%)   0.001
HighPhrase                         261.45  (2.1%)          241.99  (1.8%)   -7.4% ( -11% -   -3%)   0.000
LowPhrase                          375.57  (3.3%)          347.95  (3.7%)   -7.4% ( -13% -    0%)   0.000
HighSpanNear                        20.85  (2.8%)           19.36  (2.0%)   -7.2% ( -11% -   -2%)   0.000
MedPhrase                           22.28  (1.8%)           20.77  (1.2%)   -6.8% (  -9% -   -3%)   0.000
MedSloppyPhrase                     15.96  (3.0%)           14.92  (2.5%)   -6.5% ( -11% -    0%)   0.000
Prefix3                            204.87  (2.0%)          193.03  (2.2%)   -5.8% (  -9% -   -1%)   0.000
OrNotHighHigh                      496.09  (6.6%)          467.47  (4.1%)   -5.8% ( -15% -    5%)   0.001
HighSloppyPhrase                    22.22  (4.0%)           21.02  (3.0%)   -5.4% ( -11% -    1%)   0.000
HighTermMonthSort                  132.66 (11.4%)          126.18 (13.2%)   -4.9% ( -26% -   22%)   0.211
TermDTSort                          68.55 (13.3%)           65.22 (15.1%)   -4.9% ( -29% -   27%)   0.279
OrHighMed                           78.75  (3.8%)           74.95  (2.6%)   -4.8% ( -10% -    1%)   0.000
MedSpanNear                         29.19  (2.3%)           27.83  (1.9%)   -4.7% (  -8% -    0%)   0.000
HighTermDayOfYearSort              105.52 (12.0%)          101.55 (10.1%)   -3.8% ( -23% -   20%)   0.284
IntNRQ                              76.00 (12.3%)           73.29 (11.9%)   -3.6% ( -24% -   23%)   0.351
HighIntervalsOrdered                21.82  (1.5%)           21
   ```

[jira] [Assigned] (SOLR-14792) Remove VelocityResponseWriter from Solr 9

2021-01-11 Thread Erik Hatcher (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher reassigned SOLR-14792:
---

Assignee: Erik Hatcher

> Remove VelocityResponseWriter from Solr 9
> -
>
> Key: SOLR-14792
> URL: https://issues.apache.org/jira/browse/SOLR-14792
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: master (9.0)
>Reporter: Erik Hatcher
>Assignee: Erik Hatcher
>Priority: Blocker
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> VelocityResponseWriter was deprecated in SOLR-14065.   It can now be removed 
> from 9's code branch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9537) Add Indri Search Engine Functionality to Lucene

2021-01-11 Thread Cameron VandenBerg (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262861#comment-17262861
 ] 

Cameron VandenBerg commented on LUCENE-9537:


Hi [~mikemccand], I am so sorry for my delay.  I did see your comments, and I 
have actually already made the changes.  Unfortunately, I seem to have messed 
up reverting my changes in some of the files in my git workspace, which 
prevented me from pushing them before break.  Now that I am back from winter 
break, I hope to have the latest changes checked in today or tomorrow.  Thanks 
so much for working with me on this!

> Add Indri Search Engine Functionality to Lucene
> ---
>
> Key: LUCENE-9537
> URL: https://issues.apache.org/jira/browse/LUCENE-9537
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Cameron VandenBerg
>Priority: Major
>  Labels: patch
> Attachments: LUCENE-9537.patch, LUCENE-INDRI.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Indri ([http://lemurproject.org/indri.php]) is an academic search engine 
> developed by The University of Massachusetts and Carnegie Mellon University.  
> The major difference between Lucene and Indri is that Indri will give a 
> "smoothing score" to a document that does not contain the search term, 
> which has improved the search ranking accuracy in our experiments.  I 
> have created an Indri patch, which adds the search code needed to implement 
> the Indri AND logic as well as Indri's implementation of Dirichlet Smoothing.
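> As a rough illustration of the Dirichlet smoothing idea (this is the 
> standard language-model formula; the names and the mu default below are 
> assumptions, not necessarily what the patch implements):
> {code:java}
> // Dirichlet-smoothed log-probability of a term given a document:
> //   log((tf + mu * P(term|collection)) / (docLen + mu))
> // When tf == 0 the score is still nonzero - this is the "smoothing score"
> // a document receives even when it does not contain the search term.
> static double dirichletScore(long tf, long docLen, long collectionFreq,
>     long totalTermCount, double mu) { // mu is commonly defaulted to ~2000
>   double collectionProb = (double) collectionFreq / totalTermCount;
>   return Math.log((tf + mu * collectionProb) / (docLen + mu));
> }
> {code}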



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15025) MiniSolrCloudCluster.waitForAllNodes ignores passed timeout value

2021-01-11 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262856#comment-17262856
 ] 

Mike Drob commented on SOLR-15025:
--

One more thought here - I looked at the usages of this method, and there are a 
few places calling {{MiniSolrCloudCluster.waitForAllNodes(timeout)}} with a 
value that is clearly meant to be millis instead of seconds (i.e. 1). Can 
you take care of those as well?
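
For illustration, the cleanup would look something like this (hypothetical 
call site; the timeout values are made up):

{code:java}
// Before: the value reads as if the unit were milliseconds
cluster.waitForAllNodes(15000);
// After: the parameter is a number of seconds
cluster.waitForAllNodes(15);
{code}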

> MiniSolrCloudCluster.waitForAllNodes ignores passed timeout value
> -
>
> Key: SOLR-15025
> URL: https://issues.apache.org/jira/browse/SOLR-15025
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mike Drob
>Priority: Major
>  Labels: beginner, newdev
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The API could also be expanded to take a time unit?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-14413) allow timeAllowed and cursorMark parameters

2021-01-11 Thread Mike Drob (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob resolved SOLR-14413.
--
Resolution: Fixed

> allow timeAllowed and cursorMark parameters
> ---
>
> Key: SOLR-14413
> URL: https://issues.apache.org/jira/browse/SOLR-14413
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: John Gallagher
>Assignee: Mike Drob
>Priority: Minor
> Fix For: 8.8, master (9.0)
>
> Attachments: SOLR-14413-bram.patch, SOLR-14413-jg-update1.patch, 
> SOLR-14413-jg-update2.patch, SOLR-14413-jg-update3.patch, SOLR-14413.patch, 
> SOLR-14413.testfix.patch, Screen Shot 2020-10-23 at 10.08.26 PM.png, Screen 
> Shot 2020-10-23 at 10.09.11 PM.png, image-2020-08-18-16-56-41-736.png, 
> image-2020-08-18-16-56-59-178.png, image-2020-08-21-14-18-36-229.png, 
> timeallowed_cursormarks_results.txt
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Ever since cursorMarks were introduced in SOLR-5463 in 2014, cursorMark and 
> timeAllowed parameters were not allowed in combination ("Can not search using 
> both cursorMark and timeAllowed")
> , from [QueryComponent.java|#L359]:
>  
> {code:java}
> if (null != rb.getCursorMark() && 0 < timeAllowed) {
>   // fundamentally incompatible
>   throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
>       "Can not search using both " + CursorMarkParams.CURSOR_MARK_PARAM
>           + " and " + CommonParams.TIME_ALLOWED);
> } {code}
> While theoretically impure to use them in combination, it is often desirable 
> to support cursormarks-style deep paging and attempt to protect Solr nodes 
> from runaway queries using timeAllowed, in the hopes that most of the time, 
> the query completes in the allotted time, and there is no conflict.
>  
> However if the query takes too long, it may be preferable to end the query 
> and protect the Solr node and provide the user with a somewhat inaccurate 
> sorted list. As noted in SOLR-6930, SOLR-5986 and others, timeAllowed is 
> frequently used to prevent runaway load.  In fact, cursorMark and 
> shards.tolerant are allowed in combination, so any argument in favor of 
> purity would be a bit muddied in my opinion.
>  
> This was discussed once in the mailing list that I can find: 
> [https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201506.mbox/%3c5591740b.4080...@elyograg.org%3E]
>  It did not look like there was strong support for preventing the combination.
>  
> I have tested cursorMark and timeAllowed combination together, and even when 
> partial results are returned because the timeAllowed is exceeded, the 
> cursorMark response value is still valid and reasonable.
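> For reference, a minimal SolrJ sketch of the now-allowed combination (the 
> sort field and timeout value here are arbitrary):
> {code:java}
> SolrQuery q = new SolrQuery("*:*");
> q.setSort(SolrQuery.SortClause.asc("id"));
> q.set(CursorMarkParams.CURSOR_MARK_PARAM, CursorMarkParams.CURSOR_MARK_START);
> q.setTimeAllowed(1000); // ms; the response may contain partial results
> QueryResponse rsp = client.query(q);
> // the cursor remains usable even when results are partial
> String nextCursorMark = rsp.getNextCursorMark();
> {code}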



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-15070) SolrJ ClassCastException on XML /suggest requests

2021-01-11 Thread Jason Gerlowski (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski resolved SOLR-15070.

Fix Version/s: master (9.0)
   8.8
   Resolution: Fixed

I've committed fixes for both 8.8 and 9.0.

To maintain back-compatibility and avoid changing the API, the 8.8 fix changes 
the response parsing code in QueryResponse to just check for both types 
(NamedList and Map), and then massage the NamedList version as necessary to 
avoid exceptions when building the SuggesterResponse.

The fix in 9.0/master changes Solr's server-side suggest component to avoid use 
of the HashMap type, which avoids the discrepancy altogether.
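
In rough outline, the 8.8-era check looks like this (a sketch of the approach, 
not the exact committed code):

{code:java}
Object suggest = namedList.get("suggest");
Map<String, NamedList<Object>> normalized = new HashMap<>();
if (suggest instanceof NamedList) {
  // XML responses deserialize this section as a NamedList (SimpleOrderedMap)...
  for (Map.Entry<String, Object> e : (NamedList<Object>) suggest) {
    normalized.put(e.getKey(), (NamedList<Object>) e.getValue());
  }
} else if (suggest instanceof Map) {
  // ...while javabin responses deserialize it as a plain Map.
  normalized.putAll((Map<String, NamedList<Object>>) suggest);
}
// Either way, the SuggesterResponse can now be built from the same Map shape.
{code}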

> SolrJ ClassCastException on XML /suggest requests
> -
>
> Key: SOLR-15070
> URL: https://issues.apache.org/jira/browse/SOLR-15070
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ, Suggester
>Affects Versions: 8.6.3, master (9.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Minor
> Fix For: 8.8, master (9.0)
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> SolrJ throws a ClassCastException when parsing XML {{/suggest}} responses.
> The following code snippet (run against the techproduct example) produces the 
> stack trace that follows:
> {code}
>   public void testSuggestFailure() throws Exception {
> HttpSolrClient client = new HttpSolrClient.Builder()
> .withBaseSolrUrl("http://localhost:8983/solr/techproducts";)
> .withResponseParser(new XMLResponseParser())
> .build();
> Map queryParamMap = new HashMap<>();
> queryParamMap.put("qt", "/suggest");
> queryParamMap.put("suggest.q", "elec");
> queryParamMap.put("suggest.build", "true");
> queryParamMap.put("suggest", "true");
> queryParamMap.put("suggest.dictionary", "mySuggester");
> queryParamMap.put("wt", "xml");
> QueryResponse resp = client.query(new MapSolrParams(queryParamMap));
>   }
> {code}
> {code}
> java.lang.ClassCastException: class org.apache.solr.common.util.SimpleOrderedMap cannot be cast to class java.util.Map (org.apache.solr.common.util.SimpleOrderedMap is in unnamed module of loader 'app'; java.util.Map is in module java.base of loader 'bootstrap')
> at org.apache.solr.client.solrj.response.QueryResponse.setResponse(QueryResponse.java:170) ~[solr-solrj-8.4.1.jar:8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:40:30]
> at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211) ~[solr-solrj-8.4.1.jar:8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:40:30]
> at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1003) ~[solr-solrj-8.4.1.jar:8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:40:30]
> at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1018) ~[solr-solrj-8.4.1.jar:8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:40:30]
> {code}
> SolrJ's {{QueryResponse}} expects the "suggest" section in the NamedList 
> response to be a Map - and for requests that use javabin (the default 
> ResponseParser), it is.  But when parsing XML responses, the "suggest" section 
> is deserialized as a SimpleOrderedMap (which despite the name doesn't implement 
> Map).
> The root cause afaict is that SuggestComponent [uses a 
> type|https://github.com/apache/lucene-solr/blob/43b1a2fdc7a4bf8e5c8409013d07858dec6d0c35/solr/core/src/java/org/apache/solr/handler/component/SuggestComponent.java#L261]
>  (HashMap) that serializes/deserializes differently based on the codec/wt 
> used on the wire.  JavaBinCodec has special handling for maps that our XML 
> serialization doesn't have, so the two produce different response structures 
> on the client side.
> The "right" fix here is to change SuggestComponent's response to only use 
> types that serialize/deserialize identically in all SolrJ's ResponseParser's. 
>  This is a breaking change though - a SolrJ user making /suggest requests, 
> getting the responses via javabin, and inspecting the resulting NamedList 
> directly would get a different object tree after this fix than they would've 
> before.  So an 8.x fix would need to take a different approach.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections

2021-01-11 Thread Timothy Potter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Potter updated SOLR-15036:
--
Fix Version/s: master (9.0)
   8.8
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Use plist automatically for executing a facet expression against a collection 
> alias backed by multiple collections
> --
>
> Key: SOLR-15036
> URL: https://issues.apache.org/jira/browse/SOLR-15036
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: Timothy Potter
>Assignee: Timothy Potter
>Priority: Major
> Fix For: 8.8, master (9.0)
>
> Attachments: relay-approach.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> For analytics use cases, streaming expressions make it possible to compute 
> basic aggregations (count, min, max, sum, and avg) over massive data sets. 
> Moreover, with massive data sets, it is common to use collection aliases over 
> many underlying collections, for instance time-partitioned aliases backed by 
> a set of collections, each covering a specific time range. In some cases, we 
> can end up with many collections (think 50-60) each with 100's of shards. 
> Aliases help insulate client applications from complex collection topologies 
> on the server side.
> Let's take a basic facet expression that computes some useful aggregation 
> metrics:
> {code:java}
> facet(
>   some_alias, 
>   q="*:*", 
>   fl="a_i", 
>   sort="a_i asc", 
>   buckets="a_i", 
>   bucketSorts="count(*) asc", 
>   bucketSizeLimit=1, 
>   sum(a_d), avg(a_d), min(a_d), max(a_d), count(*)
> )
> {code}
> Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr 
> which then expands the alias to a list of collections. For each collection, 
> the top-level distributed query controller gathers a candidate set of 
> replicas to query and then scatters {{distrib=false}} queries to each replica 
> in the list. For instance, if we have 60 collections with 200 shards each, 
> then this results in 12,000 shard requests from the query controller node to 
> the other nodes in the cluster. The requests are sent in an async manner (see 
> {{SearchHandler}} and {{HttpShardHandler}}). In my testing, we’ve seen cases 
> where we hit 18,000 replicas and these queries don’t always come back in a 
> timely manner. Put simply, this also puts a lot of load on the top-level 
> query controller node in terms of open connections and new object creation.
> Instead, we can use {{plist}} to send the JSON facet query to each collection 
> in the alias in parallel, which reduces the overhead of each top-level 
> distributed query from 12,000 to 200 in my example above. With this approach, 
> you’ll then need to sort the tuples back from each collection and do a 
> rollup, something like:
> {code:java}
> select(
>   rollup(
> sort(
>   plist(
> select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", 
> bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), 
> min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, 
> min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt),
> select(facet(coll2,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", 
> bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), 
> min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, 
> min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt)
>   ),
>   by="a_i asc"
> ),
> over="a_i",
> sum(the_sum), avg(the_avg), min(the_min), max(the_max), sum(cnt)
>   ),
>   a_i, sum(the_sum) as the_sum, avg(the_avg) as the_avg, min(the_min) as 
> the_min, max(the_max) as the_max, sum(cnt) as cnt
> )
> {code}
> One thing to point out is that you can’t just avg. the averages back from 
> each collection in the rollup. It needs to be a *weighted avg.* when rolling 
> up the avg. from each facet expression in the plist. However, we have the 
> count per collection, so this is doable but will require some changes to the 
> rollup expression to support weighted average.
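> As a small sketch of that weighted-average rollup (illustrative names only):
> {code:java}
> // Roll up per-collection (avg, count) pairs into one global average:
> //   globalAvg = sum(avg_i * cnt_i) / sum(cnt_i)
> static double weightedAvg(double[] avgs, long[] counts) {
>   double weightedSum = 0;
>   long total = 0;
>   for (int i = 0; i < avgs.length; i++) {
>     weightedSum += avgs[i] * counts[i];
>     total += counts[i];
>   }
>   return total == 0 ? 0 : weightedSum / total;
> }
> {code}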
> While this plist approach is doable, it’s a pain for users to have to create 
> the rollup / sort over plist expression for collection aliases. After all, 
> aliases are supposed to hide these types of complexities from client 
> applications!
> The point of this ticket is to investigate the feasibility of auto-wrapping 
> the facet expression with a rollup / sort / plist when the collection 
> argument is an alias with multiple collections; other stream sources will be 
> considered after facet

[jira] [Commented] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections

2021-01-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262848#comment-17262848
 ] 

ASF subversion and git services commented on SOLR-15036:


Commit 7933c9004e79acb9a05c9ad92cd316ea7d868454 in lucene-solr's branch 
refs/heads/branch_8x from Timothy Potter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7933c90 ]

SOLR-15036: Automatically wrap a facet expression with a select / rollup / sort 
/ plist when using a collection alias with multiple collections and 
count|sum|min|max|avg metrics. (backport of #2132) (#2195)



> Use plist automatically for executing a facet expression against a collection 
> alias backed by multiple collections
> --
>
> Key: SOLR-15036
> URL: https://issues.apache.org/jira/browse/SOLR-15036
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: Timothy Potter
>Assignee: Timothy Potter
>Priority: Major
> Attachments: relay-approach.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> For analytics use cases, streaming expressions make it possible to compute 
> basic aggregations (count, min, max, sum, and avg) over massive data sets. 
> Moreover, with massive data sets, it is common to use collection aliases over 
> many underlying collections, for instance time-partitioned aliases backed by 
> a set of collections, each covering a specific time range. In some cases, we 
> can end up with many collections (think 50-60) each with 100's of shards. 
> Aliases help insulate client applications from complex collection topologies 
> on the server side.
> Let's take a basic facet expression that computes some useful aggregation 
> metrics:
> {code:java}
> facet(
>   some_alias, 
>   q="*:*", 
>   fl="a_i", 
>   sort="a_i asc", 
>   buckets="a_i", 
>   bucketSorts="count(*) asc", 
>   bucketSizeLimit=1, 
>   sum(a_d), avg(a_d), min(a_d), max(a_d), count(*)
> )
> {code}
> Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr 
> which then expands the alias to a list of collections. For each collection, 
> the top-level distributed query controller gathers a candidate set of 
> replicas to query and then scatters {{distrib=false}} queries to each replica 
> in the list. For instance, if we have 60 collections with 200 shards each, 
> then this results in 12,000 shard requests from the query controller node to 
> the other nodes in the cluster. The requests are sent in an async manner (see 
> {{SearchHandler}} and {{HttpShardHandler}}). In my testing, we’ve seen cases 
> where we hit 18,000 replicas and these queries don’t always come back in a 
> timely manner. Put simply, this also puts a lot of load on the top-level 
> query controller node in terms of open connections and new object creation.
> Instead, we can use {{plist}} to send the JSON facet query to each collection 
> in the alias in parallel, which reduces the overhead of each top-level 
> distributed query from 12,000 to 200 in my example above. With this approach, 
> you’ll then need to sort the tuples back from each collection and do a 
> rollup, something like:
> {code:java}
> select(
>   rollup(
> sort(
>   plist(
> select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", 
> bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), 
> min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, 
> min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt),
> select(facet(coll2,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", 
> bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), 
> min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, 
> min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt)
>   ),
>   by="a_i asc"
> ),
> over="a_i",
> sum(the_sum), avg(the_avg), min(the_min), max(the_max), sum(cnt)
>   ),
>   a_i, sum(the_sum) as the_sum, avg(the_avg) as the_avg, min(the_min) as 
> the_min, max(the_max) as the_max, sum(cnt) as cnt
> )
> {code}
> One thing to point out is that you can’t just avg. the averages back from 
> each collection in the rollup. It needs to be a *weighted avg.* when rolling 
> up the avg. from each facet expression in the plist. However, we have the 
> count per collection, so this is doable but will require some changes to the 
> rollup expression to support weighted average.
> While this plist approach is doable, it’s a pain for users to have to create 
> the rollup / sort over plist expression for collection aliases. After all, 
> aliases are supposed to hide th

[jira] [Commented] (SOLR-14413) allow timeAllowed and cursorMark parameters

2021-01-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262847#comment-17262847
 ] 

ASF subversion and git services commented on SOLR-14413:


Commit 87ed3439e88b664fe9ee935152fef700a47182de in lucene-solr's branch 
refs/heads/branch_8x from Mike Drob
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=87ed343 ]

SOLR-14413 fix unit test to use delayed handler (#2189)


> allow timeAllowed and cursorMark parameters
> ---
>
> Key: SOLR-14413
> URL: https://issues.apache.org/jira/browse/SOLR-14413
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: John Gallagher
>Assignee: Mike Drob
>Priority: Minor
> Fix For: 8.8, master (9.0)
>
> Attachments: SOLR-14413-bram.patch, SOLR-14413-jg-update1.patch, 
> SOLR-14413-jg-update2.patch, SOLR-14413-jg-update3.patch, SOLR-14413.patch, 
> SOLR-14413.testfix.patch, Screen Shot 2020-10-23 at 10.08.26 PM.png, Screen 
> Shot 2020-10-23 at 10.09.11 PM.png, image-2020-08-18-16-56-41-736.png, 
> image-2020-08-18-16-56-59-178.png, image-2020-08-21-14-18-36-229.png, 
> timeallowed_cursormarks_results.txt
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Ever since cursorMarks were introduced in SOLR-5463 in 2014, cursorMark and 
> timeAllowed parameters were not allowed in combination ("Can not search using 
> both cursorMark and timeAllowed")
> , from [QueryComponent.java|#L359]:
>  
> {code:java}
> if (null != rb.getCursorMark() && 0 < timeAllowed) {
>   // fundamentally incompatible
>   throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
>       "Can not search using both " + CursorMarkParams.CURSOR_MARK_PARAM
>           + " and " + CommonParams.TIME_ALLOWED);
> } {code}
> While theoretically impure to use them in combination, it is often desirable 
> to support cursormarks-style deep paging and attempt to protect Solr nodes 
> from runaway queries using timeAllowed, in the hopes that most of the time, 
> the query completes in the allotted time, and there is no conflict.
>  
> However if the query takes too long, it may be preferable to end the query 
> and protect the Solr node and provide the user with a somewhat inaccurate 
> sorted list. As noted in SOLR-6930, SOLR-5986 and others, timeAllowed is 
> frequently used to prevent runaway load.  In fact, cursorMark and 
> shards.tolerant are allowed in combination, so any argument in favor of 
> purity would be a bit muddied in my opinion.
>  
> This was discussed once in the mailing list that I can find: 
> [https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201506.mbox/%3c5591740b.4080...@elyograg.org%3E]
>  It did not look like there was strong support for preventing the combination.
>  
> I have tested cursorMark and timeAllowed combination together, and even when 
> partial results are returned because the timeAllowed is exceeded, the 
> cursorMark response value is still valid and reasonable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] thelabdude merged pull request #2195: SOLR-15036: Automatically wrap a facet expression with a select / rollup / sort / plist when using a collection alias with multiple collect

2021-01-11 Thread GitBox


thelabdude merged pull request #2195:
URL: https://github.com/apache/lucene-solr/pull/2195


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15070) SolrJ ClassCastException on XML /suggest requests

2021-01-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262846#comment-17262846
 ] 

ASF subversion and git services commented on SOLR-15070:


Commit d44ca858f5e10035de956094dd624e7ca3ee79f7 in lucene-solr's branch 
refs/heads/branch_8x from Jason Gerlowski
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d44ca85 ]

SOLR-15070: Fix suggest SolrJ requests w/ XML

Prior to this commit, /suggest requests made using SolrJ's
XMLResponseParser failed with a ClassCastException.  The problem:
javabin and xml requests were serializing/deserializing Solr's response
object differently.  Use of the default format (javabin) resulted in a
Map being used to wrap the 'suggest' response on the client side, while use of
XML resulted in the response being wrapped in a SimpleOrderedMap.

The root cause of this problem was the use of a type on the server-side
that our XML serde code doesn't correctly "round-trip".  This has been
changed in 9.0, but in a backwards-incompatible way that's inappropriate
for the 8.x codeline.  This commit papers over the problem in 8x by
checking for both types in SolrJ's QueryResponse and massaging the types
as necessary.


> SolrJ ClassCastException on XML /suggest requests
> -
>
> Key: SOLR-15070
> URL: https://issues.apache.org/jira/browse/SOLR-15070
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ, Suggester
>Affects Versions: 8.6.3, master (9.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> SolrJ throws a ClassCastException when parsing XML {{/suggest}} responses.
> The following code snippet (run against the techproduct example) produces the 
> stack trace that follows:
> {code}
>   public void testSuggestFailure() throws Exception {
> HttpSolrClient client = new HttpSolrClient.Builder()
> .withBaseSolrUrl("http://localhost:8983/solr/techproducts";)
> .withResponseParser(new XMLResponseParser())
> .build();
> Map queryParamMap = new HashMap<>();
> queryParamMap.put("qt", "/suggest");
> queryParamMap.put("suggest.q", "elec");
> queryParamMap.put("suggest.build", "true");
> queryParamMap.put("suggest", "true");
> queryParamMap.put("suggest.dictionary", "mySuggester");
> queryParamMap.put("wt", "xml");
> QueryResponse resp = client.query(new MapSolrParams(queryParamMap));
>   }
> {code}
> {code}
> java.lang.ClassCastException: class org.apache.solr.common.util.SimpleOrderedMap cannot be cast to class java.util.Map (org.apache.solr.common.util.SimpleOrderedMap is in unnamed module of loader 'app'; java.util.Map is in module java.base of loader 'bootstrap')
> at org.apache.solr.client.solrj.response.QueryResponse.setResponse(QueryResponse.java:170) ~[solr-solrj-8.4.1.jar:8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:40:30]
> at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211) ~[solr-solrj-8.4.1.jar:8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:40:30]
> at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1003) ~[solr-solrj-8.4.1.jar:8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:40:30]
> at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1018) ~[solr-solrj-8.4.1.jar:8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan - 2020-01-10 13:40:30]
> {code}
> SolrJ's {{QueryResponse}} expects the "suggest" section in the NamedList 
> response to be a Map - and for requests that use javabin (the default 
> ResponseParser), it is.  But when parsing XML responses, the "suggest" section 
> is deserialized as a SimpleOrderedMap (which despite the name doesn't implement 
> Map).
> The root cause afaict is that SuggestComponent [uses a 
> type|https://github.com/apache/lucene-solr/blob/43b1a2fdc7a4bf8e5c8409013d07858dec6d0c35/solr/core/src/java/org/apache/solr/handler/component/SuggestComponent.java#L261]
>  (HashMap) that serializes/deserializes differently based on the codec/wt 
> used on the wire.  JavaBinCodec has special handling for maps that our XML 
> serialization doesn't have, so the two produce different response structures 
> on the client side.
> The "right" fix here is to change SuggestComponent's response to only use 
> types that serialize/deserialize identically in all SolrJ's ResponseParser's. 
>  This is a breaking change though - a SolrJ user making /suggest requests, 
> getting the responses via javabin, and inspecting the resulting NamedList 
> directly would get a different object tree after this fix

[GitHub] [lucene-solr] thelabdude opened a new pull request #2195: SOLR-15036: Automatically wrap a facet expression with a select / rollup / sort / plist when using a collection alias with multiple c

2021-01-11 Thread GitBox


thelabdude opened a new pull request #2195:
URL: https://github.com/apache/lucene-solr/pull/2195


   Back-port to 8x, see original PR: 
https://github.com/apache/lucene-solr/pull/2132



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14413) allow timeAllowed and cursorMark parameters

2021-01-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262841#comment-17262841
 ] 

ASF subversion and git services commented on SOLR-14413:


Commit a429b969d87090c9cc1ec787553528bece0d809e in lucene-solr's branch 
refs/heads/master from Mike Drob
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a429b96 ]

SOLR-14413 fix unit test to use delayed handler (#2189)



> allow timeAllowed and cursorMark parameters
> ---
>
> Key: SOLR-14413
> URL: https://issues.apache.org/jira/browse/SOLR-14413
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: John Gallagher
>Assignee: Mike Drob
>Priority: Minor
> Fix For: 8.8, master (9.0)
>
> Attachments: SOLR-14413-bram.patch, SOLR-14413-jg-update1.patch, 
> SOLR-14413-jg-update2.patch, SOLR-14413-jg-update3.patch, SOLR-14413.patch, 
> SOLR-14413.testfix.patch, Screen Shot 2020-10-23 at 10.08.26 PM.png, Screen 
> Shot 2020-10-23 at 10.09.11 PM.png, image-2020-08-18-16-56-41-736.png, 
> image-2020-08-18-16-56-59-178.png, image-2020-08-21-14-18-36-229.png, 
> timeallowed_cursormarks_results.txt
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Ever since cursorMarks were introduced in SOLR-5463 in 2014, cursorMark and 
> timeAllowed parameters were not allowed in combination ("Can not search using 
> both cursorMark and timeAllowed")
> , from [QueryComponent.java|#L359]:
>  
> {code:java}
> if (null != rb.getCursorMark() && 0 < timeAllowed) {
>   // fundamentally incompatible
>   throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
>       "Can not search using both " + CursorMarkParams.CURSOR_MARK_PARAM
>           + " and " + CommonParams.TIME_ALLOWED);
> } {code}
> While theoretically impure to use them in combination, it is often desirable 
> to support cursormarks-style deep paging and attempt to protect Solr nodes 
> from runaway queries using timeAllowed, in the hopes that most of the time, 
> the query completes in the allotted time, and there is no conflict.
>  
> However if the query takes too long, it may be preferable to end the query 
> and protect the Solr node and provide the user with a somewhat inaccurate 
> sorted list. As noted in SOLR-6930, SOLR-5986 and others, timeAllowed is 
> frequently used to prevent runaway load.  In fact, cursorMark and 
> shards.tolerant are allowed in combination, so any argument in favor of 
> purity would be a bit muddied in my opinion.
>  
> This was discussed once in the mailing list that I can find: 
> [https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201506.mbox/%3c5591740b.4080...@elyograg.org%3E]
>  It did not look like there was strong support for preventing the combination.
>  
> I have tested cursorMark and timeAllowed combination together, and even when 
> partial results are returned because the timeAllowed is exceeded, the 
> cursorMark response value is still valid and reasonable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] madrob merged pull request #2189: SOLR-14413 fix unit test to use delayed handler

2021-01-11 Thread GitBox


madrob merged pull request #2189:
URL: https://github.com/apache/lucene-solr/pull/2189


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections

2021-01-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262823#comment-17262823
 ] 

ASF subversion and git services commented on SOLR-15036:


Commit 6711eb7571727552aad3ace53c52c9a8fe07dc40 in lucene-solr's branch 
refs/heads/master from Timothy Potter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6711eb7 ]

SOLR-15036: auto- select / rollup / sort / plist over facet expression when 
using a collection alias with multiple collections (#2132)



> Use plist automatically for executing a facet expression against a collection 
> alias backed by multiple collections
> --
>
> Key: SOLR-15036
> URL: https://issues.apache.org/jira/browse/SOLR-15036
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: Timothy Potter
>Assignee: Timothy Potter
>Priority: Major
> Attachments: relay-approach.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> For analytics use cases, streaming expressions make it possible to compute 
> basic aggregations (count, min, max, sum, and avg) over massive data sets. 
> Moreover, with massive data sets, it is common to use collection aliases over 
> many underlying collections, for instance time-partitioned aliases backed by 
> a set of collections, each covering a specific time range. In some cases, we 
> can end up with many collections (think 50-60) each with 100's of shards. 
> Aliases help insulate client applications from complex collection topologies 
> on the server side.
> Let's take a basic facet expression that computes some useful aggregation 
> metrics:
> {code:java}
> facet(
>   some_alias, 
>   q="*:*", 
>   fl="a_i", 
>   sort="a_i asc", 
>   buckets="a_i", 
>   bucketSorts="count(*) asc", 
>   bucketSizeLimit=1, 
>   sum(a_d), avg(a_d), min(a_d), max(a_d), count(*)
> )
> {code}
> Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr 
> which then expands the alias to a list of collections. For each collection, 
> the top-level distributed query controller gathers a candidate set of 
> replicas to query and then scatters {{distrib=false}} queries to each replica 
> in the list. For instance, if we have 60 collections with 200 shards each, 
> then this results in 12,000 shard requests from the query controller node to 
> the other nodes in the cluster. The requests are sent in an async manner (see 
> {{SearchHandler}} and {{HttpShardHandler}}). In my testing, we’ve seen cases 
> where we hit 18,000 replicas and these queries don’t always come back in a 
> timely manner. Put simply, this also puts a lot of load on the top-level 
> query controller node in terms of open connections and new object creation.
> Instead, we can use {{plist}} to send the JSON facet query to each collection 
> in the alias in parallel, which reduces the overhead of each top-level 
> distributed query from 12,000 to 200 in my example above. With this approach, 
> you’ll then need to sort the tuples back from each collection and do a 
> rollup, something like:
> {code:java}
> select(
>   rollup(
> sort(
>   plist(
> select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", 
> bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), 
> min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, 
> min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt),
> select(facet(coll2,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", 
> bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), 
> min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, 
> min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt)
>   ),
>   by="a_i asc"
> ),
> over="a_i",
> sum(the_sum), avg(the_avg), min(the_min), max(the_max), sum(cnt)
>   ),
>   a_i, sum(the_sum) as the_sum, avg(the_avg) as the_avg, min(the_min) as 
> the_min, max(the_max) as the_max, sum(cnt) as cnt
> )
> {code}
> One thing to point out is that you can’t just avg. the averages back from 
> each collection in the rollup. It needs to be a *weighted avg.* when rolling 
> up the avg. from each facet expression in the plist. However, we have the 
> count per collection, so this is doable but will require some changes to the 
> rollup expression to support weighted average.
> While this plist approach is doable, it’s a pain for users to have to create 
> the rollup / sort over plist expression for collection aliases. After all, 
> aliases are supposed to hide these types of complexities from client 
> applications!
> The point of th

[jira] [Commented] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections

2021-01-11 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262822#comment-17262822
 ] 

Joel Bernstein commented on SOLR-15036:
---

It is indeed missing from the docs. I will add it.

> Use plist automatically for executing a facet expression against a collection 
> alias backed by multiple collections
> --
>
> Key: SOLR-15036
> URL: https://issues.apache.org/jira/browse/SOLR-15036
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: Timothy Potter
>Assignee: Timothy Potter
>Priority: Major
> Attachments: relay-approach.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> For analytics use cases, streaming expressions make it possible to compute 
> basic aggregations (count, min, max, sum, and avg) over massive data sets. 
> Moreover, with massive data sets, it is common to use collection aliases over 
> many underlying collections, for instance time-partitioned aliases backed by 
> a set of collections, each covering a specific time range. In some cases, we 
> can end up with many collections (think 50-60) each with 100's of shards. 
> Aliases help insulate client applications from complex collection topologies 
> on the server side.
> Let's take a basic facet expression that computes some useful aggregation 
> metrics:
> {code:java}
> facet(
>   some_alias, 
>   q="*:*", 
>   fl="a_i", 
>   sort="a_i asc", 
>   buckets="a_i", 
>   bucketSorts="count(*) asc", 
>   bucketSizeLimit=1, 
>   sum(a_d), avg(a_d), min(a_d), max(a_d), count(*)
> )
> {code}
> Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr 
> which then expands the alias to a list of collections. For each collection, 
> the top-level distributed query controller gathers a candidate set of 
> replicas to query and then scatters {{distrib=false}} queries to each replica 
> in the list. For instance, if we have 60 collections with 200 shards each, 
> then this results in 12,000 shard requests from the query controller node to 
> the other nodes in the cluster. The requests are sent in an async manner (see 
> {{SearchHandler}} and {{HttpShardHandler}}). In my testing, we’ve seen cases 
> where we hit 18,000 replicas and these queries don’t always come back in a 
> timely manner. Put simply, this also puts a lot of load on the top-level 
> query controller node in terms of open connections and new object creation.
> Instead, we can use {{plist}} to send the JSON facet query to each collection 
> in the alias in parallel, which reduces the overhead of each top-level 
> distributed query from 12,000 to 200 in my example above. With this approach, 
> you’ll then need to sort the tuples back from each collection and do a 
> rollup, something like:
> {code:java}
> select(
>   rollup(
> sort(
>   plist(
> select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", 
> bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), 
> min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, 
> min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt),
> select(facet(coll2,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", 
> bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), 
> min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, 
> min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt)
>   ),
>   by="a_i asc"
> ),
> over="a_i",
> sum(the_sum), avg(the_avg), min(the_min), max(the_max), sum(cnt)
>   ),
>   a_i, sum(the_sum) as the_sum, avg(the_avg) as the_avg, min(the_min) as 
> the_min, max(the_max) as the_max, sum(cnt) as cnt
> )
> {code}
> One thing to point out is that you can’t just avg. the averages back from 
> each collection in the rollup. It needs to be a *weighted avg.* when rolling 
> up the avg. from each facet expression in the plist. However, we have the 
> count per collection, so this is doable but will require some changes to the 
> rollup expression to support weighted average.
> While this plist approach is doable, it’s a pain for users to have to create 
> the rollup / sort over plist expression for collection aliases. After all, 
> aliases are supposed to hide these types of complexities from client 
> applications!
> The point of this ticket is to investigate the feasibility of auto-wrapping 
> the facet expression with a rollup / sort / plist when the collection 
> argument is an alias with multiple collections; other stream sources will be 
> considered after facet is proven out.
> Lastly, I also considered an alternative approach ...
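To make the weighted-average point above concrete: given per-collection pairs (avg_i, cnt_i), the rolled-up average is sum(avg_i * cnt_i) / sum(cnt_i), not the mean of the avg_i values. A minimal, self-contained Java sketch (illustrative only, not Solr code):

{code:java}
import java.util.List;

public class WeightedAvgDemo {
  /** Each pair: {average over one collection, tuple count for that collection}. */
  static double weightedAvg(List<double[]> perCollection) {
    double weightedSum = 0, totalCount = 0;
    for (double[] p : perCollection) {
      weightedSum += p[0] * p[1]; // avg * count recovers the per-collection sum
      totalCount += p[1];
    }
    return weightedSum / totalCount;
  }

  public static void main(String[] args) {
    // avg=2.0 over 10 tuples and avg=10.0 over 90 tuples:
    // a naive mean of means gives 6.0; the correct weighted average is 9.2.
    System.out.println(weightedAvg(List.of(new double[] {2.0, 10}, new double[] {10.0, 90})));
  }
}
{code}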

[jira] [Comment Edited] (SOLR-14265) Move collections admin API to v2 completely

2021-01-11 Thread David Eric Pugh (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262793#comment-17262793
 ] 

David Eric Pugh edited comment on SOLR-14265 at 1/11/21, 5:21 PM:
--

I just read through the history, and now I realize that SOLR-11646 is about 
that, in terms of documenting those v2 APIs for the Collections API, which is 
what I'm interested in!


was (Author: epugh):
I just read through the history, and now I realize that SOLR-11646 is about 
that, in terms of documenting those v2 APIs for ConfigSets, which is what I'm 
interested in!

> Move collections admin API to v2 completely 
> 
>
> Key: SOLR-14265
> URL: https://issues.apache.org/jira/browse/SOLR-14265
> Project: Solr
>  Issue Type: Improvement
>Reporter: Anshum Gupta
>Assignee: Anshum Gupta
>Priority: Major
>
> The v2 admin API has been available in Solr for a very long time alongside 
> v1, making it difficult for both users and developers to remember and 
> understand which format to use when. We should move to the v2 API completely 
> for all Solr admin calls, for the following reasons:
>  # converge code - there are multiple ways of doing the same thing, there's 
> unwanted back-compat code, and we should get rid of that
>  # POJO all the way - no more NamedList. I know this will split opinions, 
> but I strongly think we should move in this direction. I created a Jira 
> about this specific task in the past and went halfway, but I think we 
> should just close this one out now.
>  # Automatic documentation
>  # Others
> This is just an umbrella Jira for the task. Let's create sub-tasks and split 
> this up, as it will require a bunch of rewriting of the code, and it makes a 
> lot of sense to get this out with 9.0 so we don't have to support v1 forever! 
> There have been some conversations going on about this, and it feels like most 
> folks are happy to go this route.






[jira] [Commented] (SOLR-14265) Move collections admin API to v2 completely

2021-01-11 Thread David Eric Pugh (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262793#comment-17262793
 ] 

David Eric Pugh commented on SOLR-14265:


I just read through the history, and now I realize that SOLR-11646 is about 
that, in terms of documenting those v2 APIs for ConfigSets, which is what I'm 
interested in!







[jira] [Commented] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections

2021-01-11 Thread Timothy Potter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262786#comment-17262786
 ] 

Timothy Potter commented on SOLR-15036:
---

Thanks. Updated the impl. to use {{HashRollupStream}} (which doesn't seem to be 
documented, btw) ... definitely much cleaner ;-)
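For readers following along, the rewritten plan would look roughly like this (a sketch only: it assumes {{hashRollup}} is the expression name that maps to {{HashRollupStream}}, reuses the field names from the issue description, and elides the weighted-average handling for {{the_avg}}):

{code:java}
hashRollup(
  plist(
    select(facet(coll1, q="*:*", fl="a_i", sort="a_i asc", buckets="a_i",
        bucketSorts="count(*) asc", bucketSizeLimit=1,
        sum(a_d), min(a_d), max(a_d), count(*)),
      a_i, sum(a_d) as the_sum, min(a_d) as the_min,
      max(a_d) as the_max, count(*) as cnt),
    select(facet(coll2, q="*:*", fl="a_i", sort="a_i asc", buckets="a_i",
        bucketSorts="count(*) asc", bucketSizeLimit=1,
        sum(a_d), min(a_d), max(a_d), count(*)),
      a_i, sum(a_d) as the_sum, min(a_d) as the_min,
      max(a_d) as the_max, count(*) as cnt)
  ),
  over="a_i",
  sum(the_sum), min(the_min), max(the_max), sum(cnt)
)
{code}

Because {{hashRollup}} buckets tuples in a hash map, the intermediate {{sort}} by "a_i asc" from the original plan is no longer needed, which is where the speedup comes from.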


[jira] [Commented] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections

2021-01-11 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262773#comment-17262773
 ] 

Joel Bernstein commented on SOLR-15036:
---

One thing I noticed in the implementation that could be improved is to switch 
to using the HashRollupStream, which avoids the need to sort and should be 
significantly faster in this scenario.


[jira] [Commented] (SOLR-14265) Move collections admin API to v2 completely

2021-01-11 Thread Jason Gerlowski (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262772#comment-17262772
 ] 

Jason Gerlowski commented on SOLR-14265:


I intended to do that sort of thing - expand the spreadsheet to cover other 
sets of APIs. It wouldn't directly help Anshum (or whoever) with this ticket, 
but I'd hoped it would be a good reference. I ran out of time over the holiday 
break, though, so I removed what were mostly blank tabs.

Anyway, I'd love to see other APIs covered there so we have an up-to-date 
reference, but realistically I probably can't move that much forward myself at 
the moment. I did open up the edit permissions on the doc, so anyone who wants 
to can fill it out. It would be awesome if we could start generating some 
momentum on moving towards v2.







[jira] [Commented] (SOLR-15071) Bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)

2021-01-11 Thread Ovidiu Mihalcea (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262764#comment-17262764
 ] 

Ovidiu Mihalcea commented on SOLR-15071:


At first I thought I'd managed to reproduce it with even a single document:

{{*curl -XGET 'http://localhost:8983/solr/techproducts/select?fl=id%2C%5Bfeatures%20store%3Ddev_2020_08_26%20efi.term%3D%22husa%20cu%20tastatura%20tableta%2010%20inch%22%5D&q=husa%20cu%20tastatura%20tableta%2010%20inch&fq=id:934521240'*}}

And it worked, even after a restart (so no caches involved, I believe).

-To sum up, the problem occurs when trying to extract a feature with the 
following params:-

-*{{"q": "\{!dismax qf=name mm=3}${term}"}}*-

-from the document with these fields:-

-*{{{ "id":"934521240", "name":"tableta magnetica abc quercetti" }}}*-

-where $term is:-

-*{{husa cu tastatura tableta 10 inch}}*-

But if I try adding only this document, or this document plus another subset 
of the sample documents (_and this is the weird part_), the problem no longer 
occurs. So the problem seems to occur even with only one document returned in 
the request (fq=id:X), but only if certain other (_which ones?!_) documents 
are present, which logically makes me believe the request also touches some 
other documents, even though it shouldn't.

 

> Bug on LTR when using solr 8.6.3 - index out of bounds 
> DisiPriorityQueue.add(DisiPriorityQueue.java:102)
> 
>
> Key: SOLR-15071
> URL: https://issues.apache.org/jira/browse/SOLR-15071
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LTR
>Affects Versions: 8.6, 8.7
>Reporter: Florin Babes
>Priority: Major
>  Labels: ltr
> Attachments: featurestore+model+sample_documents.zip
>
>
> Hello,
> We are trying to update Solr from 8.3.1 to 8.6.3. On Solr 8.3.1 we are
> using LTR in production using a MultipleAdditiveTrees model. On Solr 8.6.3
> we receive an error when we try to compute some SolrFeatures. We didn't
> find any pattern of the queries that fail.
> Example:
> We have the following query raw parameters:
> q=lg cx 4k oled 120 hz -> just one of many examples
> term_dq=lg cx 4k oled 120 hz
> rq={!ltr model=model reRankDocs=1000 store=feature_store
> efi.term=${term_dq}}
> defType=edismax,
> mm=2<75%
> The features are something like this:
> {
>  "name":"similarity_query_fileld_1",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":{"q":"{!dismax qf=query_field_1 mm=1}${term}"},
>  "store":"feature_store"
> },
> {
>  "name":"similarity_query_field_2",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":{"q":"{!dismax qf=query_field_2 mm=5}${term}"},
>  "store":"feature_store"
> }
> We are testing ~6300 production queries, and for about 1% of them we receive
> the following error message:
> "metadata":[
>  "error-class","org.apache.solr.common.SolrException",
>  "root-error-class","java.lang.ArrayIndexOutOfBoundsException"],
>  "msg":"java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds
> for length 2",
> The stacktrace is:
> org.apache.solr.common.SolrException: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
> at org.apache.solr.search.ReRankCollector.topDocs(ReRankCollector.java:154)
> at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1599)
> at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1413)
> at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:596)
> at org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1513)
> at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:403)
> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:360)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2627)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:795)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:568)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)

[jira] [Commented] (SOLR-14265) Move collections admin API to v2 completely

2021-01-11 Thread David Eric Pugh (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262757#comment-17262757
 ] 

David Eric Pugh commented on SOLR-14265:


[~gerlowskija] I was looking for the v2 version of the configset UPLOAD 
command, and looked at your spreadsheet. Do you want to have a tab with the 
ConfigSet API's as well?  Is that useful?







[GitHub] [lucene-solr] jpountz merged pull request #2141: LUCENE-9346: Support minimumNumberShouldMatch in WANDScorer

2021-01-11 Thread GitBox


jpountz merged pull request #2141:
URL: https://github.com/apache/lucene-solr/pull/2141


   






[jira] [Resolved] (SOLR-14999) Add built-in option to advertise Solr with a different port than Jetty listens on.

2021-01-11 Thread Houston Putman (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Houston Putman resolved SOLR-14999.
---
Resolution: Fixed

> Add built-in option to advertise Solr with a different port than Jetty 
> listens on.
> --
>
> Key: SOLR-14999
> URL: https://issues.apache.org/jira/browse/SOLR-14999
> Project: Solr
>  Issue Type: Improvement
>Reporter: Houston Putman
>Assignee: Houston Putman
>Priority: Major
> Fix For: 8.8
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently the default settings in {{solr.xml}} allow the specification of one 
> port, {{jetty.port}}, which the bin/solr script provides from the 
> {{SOLR_PORT}} environment variable. This port is used twice. Jetty uses it to 
> listen for requests, and the clusterState uses the port to advertise the 
> address of the Solr Node.
> In cloud environments, it's sometimes crucial to be able to listen on one 
> port and advertise yourself as listening on another. This is because there is 
> a proxy that listens on the advertised port and forwards the request to the 
> server, which is listening on the Jetty port.
> Solr already supports having a separate Jetty port and Live Nodes port 
> (examples provided in the dev-list discussion linked below). I suggest that 
> we add this to the default solr config so that users can use the default 
> solr.xml in cloud configurations, and the solr/bin script will enable easy 
> use of this feature.
> There has been [discussion on this exact 
> problem|https://mail-archives.apache.org/mod_mbox/lucene-dev/201910.mbox/%3CCABEwPvGFEggt9Htn%3DA5%3DtoawuimSJ%2BZcz0FvsaYod7v%2B4wHKog%40mail.gmail.com%3E]
>  on the dev list already.
> I propose a new system property to be used for {{hostPort}} in the solr.xml: 
> {{-Dsolr.port.advertise}}, with {{SOLR_PORT_ADVERTISE}} checked as an env var 
> in bin/solr. I am open to changing the name, but to me it is more descriptive 
> than {{hostPort}}.
> The {{hostPort}} field in the XML would not be changed, just the system 
> property that is used to fill its value in the default {{solr.xml}}.
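For illustration, the relevant wiring in the default {{solr.xml}} might then look something like this (a sketch; the exact element layout and fallback default shown here are assumptions based on the proposal above, not the committed change):

{code:xml}
<solrcloud>
  <!-- Jetty still listens on jetty.port / SOLR_PORT; the node advertises
       the solr.port.advertise value in clusterState when it is set. -->
  <int name="hostPort">${solr.port.advertise:8983}</int>
</solrcloud>
{code}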






[GitHub] [lucene-solr] mikemccand commented on pull request #2161: LUCENE-9646: Set BM25Similarity discountOverlaps via the constructor

2021-01-11 Thread GitBox


mikemccand commented on pull request #2161:
URL: https://github.com/apache/lucene-solr/pull/2161#issuecomment-757993718


   +1
   
   This change looks great, but it is an API break, so maybe we push it for 9.0 
only, and add an entry into `MIGRATE.txt`?






[jira] [Commented] (SOLR-15071) Bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)

2021-01-11 Thread Christine Poerschke (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262684#comment-17262684
 ] 

Christine Poerschke commented on SOLR-15071:


Great to hear the issue is reproducible with the techproducts example and the 
addition of some of your documents.

Thanks [~dizzu333] for sharing the exact steps and the files to work with!

I see that there are 10,000 documents and would be curious whether reducing 
that somehow would lead us to further insights: either indexing fewer documents, 
or indexing all 10,000 documents but excluding most of them from the search, 
e.g. via {{fq=id:(123 OR 456 OR 789)}} clauses. Perhaps you've already tried 
that; I see there's {{rows=500}} on the query rather than the default 
{{rows=10}}?


[GitHub] [lucene-solr] mikemccand commented on pull request #2052: LUCENE-8982: Make NativeUnixDirectory pure java with FileChannel direct IO flag, and rename to DirectIODirectory

2021-01-11 Thread GitBox


mikemccand commented on pull request #2052:
URL: https://github.com/apache/lucene-solr/pull/2052#issuecomment-757990579


   Is this close now?  Hard to tell from all the back-and-forth!  Was writing 
checkpoints really broken!?  How could tests fail to catch this :)
   
   @uschindler anything else to do here?






[GitHub] [lucene-solr] gerlowskija merged pull request #2183: SOLR-15070: Remove HashMap usage in SuggestComponent rsp

2021-01-11 Thread GitBox


gerlowskija merged pull request #2183:
URL: https://github.com/apache/lucene-solr/pull/2183


   






[GitHub] [lucene-solr] jpountz commented on pull request #2141: LUCENE-9346: Support minimumNumberShouldMatch in WANDScorer

2021-01-11 Thread GitBox


jpountz commented on pull request #2141:
URL: https://github.com/apache/lucene-solr/pull/2141#issuecomment-757979678


   @zacharymorn I merged your PR because it was good progress already, but I'm 
also +1 on your idea of replacing MinShouldMatchSumScorer with WANDScorer since 
they share very similar logic. Let's give it a try in a follow-up PR?






[jira] [Commented] (LUCENE-9346) WANDScorer should support minimumNumberShouldMatch

2021-01-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262665#comment-17262665
 ] 

ASF subversion and git services commented on LUCENE-9346:
-

Commit f0d6fd84bb8442e9caea6f4282cb45c5e25ddb0f in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f0d6fd8 ]

LUCENE-9346: Add CHANGES entry.


> WANDScorer should support minimumNumberShouldMatch
> --
>
> Key: LUCENE-9346
> URL: https://issues.apache.org/jira/browse/LUCENE-9346
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Currently we deoptimize when a minimumNumberShouldMatch is provided and fall 
> back to a scorer that doesn't dynamically prune hits based on scores.
> Given how similar WANDScorer and MinShouldMatchSumScorer are, I wonder if we 
> could remove MinShouldMatchSumScorer once WANDScorer supports 
> minimumNumberShouldMatch. Then any improvements we bring to WANDScorer, like 
> two-phase support (LUCENE-8806), would automatically cover more queries.
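For context, this is the query shape affected by the change (a minimal sketch using the public BooleanQuery API; the field and terms are made up):

{code:java}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class MinShouldMatchDemo {
  public static void main(String[] args) {
    // A disjunction that requires at least 2 of the 3 SHOULD clauses to match.
    // With this change, top-hits collection over such a query can use
    // WANDScorer's dynamic pruning instead of a non-pruning fallback.
    BooleanQuery.Builder b = new BooleanQuery.Builder();
    b.add(new TermQuery(new Term("body", "lucene")), BooleanClause.Occur.SHOULD);
    b.add(new TermQuery(new Term("body", "solr")), BooleanClause.Occur.SHOULD);
    b.add(new TermQuery(new Term("body", "search")), BooleanClause.Occur.SHOULD);
    b.setMinimumNumberShouldMatch(2);
    System.out.println(b.build()); // (body:lucene body:solr body:search)~2
  }
}
{code}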






[jira] [Resolved] (LUCENE-9346) WANDScorer should support minimumNumberShouldMatch

2021-01-11 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-9346.
--
Fix Version/s: 8.8
   Resolution: Fixed

Thank you [~zacharymorn]!







[jira] [Commented] (LUCENE-9346) WANDScorer should support minimumNumberShouldMatch

2021-01-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262667#comment-17262667
 ] 

ASF subversion and git services commented on LUCENE-9346:
-

Commit 40cd50a584e860c40de77ef604f2843345613c61 in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=40cd50a ]

LUCENE-9346: Add CHANGES entry.








[jira] [Commented] (LUCENE-9346) WANDScorer should support minimumNumberShouldMatch

2021-01-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262666#comment-17262666
 ] 

ASF subversion and git services commented on LUCENE-9346:
-

Commit 74c0a978148c404d54b71a0835274276b5decd1c in lucene-solr's branch 
refs/heads/branch_8x from zacharymorn
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=74c0a97 ]

LUCENE-9346: Support minimumNumberShouldMatch in WANDScorer (#2141)

Co-authored-by: Adrien Grand 








[GitHub] [lucene-solr] jpountz commented on a change in pull request #2141: LUCENE-9346: Support minimumNumberShouldMatch in WANDScorer

2021-01-11 Thread GitBox


jpountz commented on a change in pull request #2141:
URL: https://github.com/apache/lucene-solr/pull/2141#discussion_r555074892



##
File path: 
lucene/core/src/java/org/apache/lucene/search/Boolean2ScorerSupplier.java
##
@@ -230,10 +232,13 @@ private Scorer opt(
   for (ScorerSupplier scorer : optional) {
 optionalScorers.add(scorer.get(leadCost));
   }
-  if (minShouldMatch > 1) {
+
+  if (scoreMode == ScoreMode.TOP_SCORES) {
+return new WANDScorer(weight, optionalScorers, minShouldMatch);
+  } else if (minShouldMatch > 1) {
+// nocommit minShouldMatch > 1 && scoreMode != ScoreMode.TOP_SCORES still requires MinShouldMatchSumScorer.
+// Do we want to deprecate this entirely now?

Review comment:
   > Is there a reason WANDScorer is only used for TOP_DOCS before?
   
   Mostly because it's a bit slower, since it needs to track two priority queues 
and one linked list, whereas the simple DisjunctionScorer only needs to track a 
single priority queue.
   
   > since WANDScorer can already handle the case where minShouldMatch == 0, 
then I can see MinShouldMatchSumScorer be merged into WANDScorer mostly to 
handle non-scoring mode
   
   +1 to give it a try in a follow-up PR!








[jira] [Commented] (LUCENE-9652) DataInput.readFloats to be used by Lucene90VectorReader

2021-01-11 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262663#comment-17262663
 ] 

Michael McCandless commented on LUCENE-9652:


[~sokolov] it looks like it was the 2021-01-07 data point – I'll add an annotation.

> DataInput.readFloats to be used by Lucene90VectorReader
> ---
>
> Key: LUCENE-9652
> URL: https://issues.apache.org/jira/browse/LUCENE-9652
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Benchmarking shows a substantial performance gain can be realized by avoiding 
> the additional memory copy we must do today when converting from {{byte[]}} 
> read using {{IndexInput}} into {{float[]}} returned by 
> {{Lucene90VectorReader}}. We have a model for how to handle the various 
> alignments, and buffer underflow when a value spans buffers, in 
> {{readLELongs}}.
> I think we should only support little-endian floats from the beginning here. 
> We're planning to move towards switching the whole IndexInput to that 
> endianness, right?
> Lucene90VectorWriter relies on {{VectorValues.binaryValue()}} to return bytes 
> in the format expected by the reader, and its javadocs don't currently 
> specify their endianness. In fact the order has been the default supplied by 
> {{ByteBuffer.allocate(int)}}, which I now realize is big-endian, so this 
> issue also proposes to change the index format. That would mean a 
> backwards-incompatible index change, but I think if we're still unreleased 
> and in an experimental class that should be OK?
> Also, we don't need a corresponding {{DataOutput.writeFloats}} to support the 
> current usage for vectors, since there we rely on {{VectorValues}} to do the 
> conversion, so I don't plan to implement that.
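For intuition, bulk-decoding little-endian floats from raw bytes without a per-value copy loop can be done with a FloatBuffer view (a standalone java.nio sketch, not the actual Lucene implementation):

{code:java}
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class ReadFloatsDemo {
  /** Bulk-convert little-endian bytes into floats via a FloatBuffer view. */
  static float[] readFloats(byte[] bytes, int count) {
    FloatBuffer view = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer();
    float[] out = new float[count];
    view.get(out, 0, count);
    return out;
  }

  public static void main(String[] args) {
    ByteBuffer bb = ByteBuffer.allocate(2 * Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
    bb.putFloat(1.5f).putFloat(-2.25f);
    float[] f = readFloats(bb.array(), 2);
    System.out.println(f[0] + " " + f[1]); // 1.5 -2.25
  }
}
{code}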






[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #2192: SOLR-15010 Try to use jattach for threaddump if jstack is missing

2021-01-11 Thread GitBox


cpoerschke commented on a change in pull request #2192:
URL: https://github.com/apache/lucene-solr/pull/2192#discussion_r555071923



##
File path: solr/bin/solr
##
@@ -110,9 +110,10 @@ elif [ -n "$JAVA_HOME" ]; then
   JAVA="$java/java"
   if [ -x "$java/jstack" ]; then
 JSTACK="$java/jstack"
+  elif [ -x "$(command -v jattach)" ]; then
+JATTACH="jattach"

Review comment:
   minor/maybe suggestion:
   ```suggestion
   JATTACH="$(command -v jattach)"
   ```








[jira] [Commented] (LUCENE-9537) Add Indri Search Engine Functionality to Lucene

2021-01-11 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262659#comment-17262659
 ] 

Michael McCandless commented on LUCENE-9537:


[~cvandenberg] did you see my latest round of comments?  I think the PR is 
really close, but I thought you could use a default method on the {{Scorable}} 
interface to return 0 instead of replicating that default in many 
implementations?
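That is, something along these lines (an illustrative sketch of the suggestion only, not the final PR code):

{code:java}
public abstract class Scorable {
  /** Returns the score of the current document. */
  public abstract float score() throws java.io.IOException;

  /**
   * Smoothing score used by the Indri scorers for documents that do not
   * contain the query term; defaulting it here avoids replicating the
   * trivial 0 in every implementation.
   */
  public float smoothingScore(int docId) throws java.io.IOException {
    return 0;
  }
}
{code}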

> Add Indri Search Engine Functionality to Lucene
> ---
>
> Key: LUCENE-9537
> URL: https://issues.apache.org/jira/browse/LUCENE-9537
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Cameron VandenBerg
>Priority: Major
>  Labels: patch
> Attachments: LUCENE-9537.patch, LUCENE-INDRI.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Indri ([http://lemurproject.org/indri.php]) is an academic search engine 
> developed by The University of Massachusetts and Carnegie Mellon University.  
> The major difference between Lucene and Indri is that Indri will give a 
> document a "smoothing score" to a document that does not contain the search 
> term, which has improved the search ranking accuracy in our experiments.  I 
> have created an Indri patch, which adds the search code needed to implement 
> the Indri AND logic as well as Indri's implementation of Dirichlet Smoothing.






[jira] [Comment Edited] (SOLR-15071) Bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)

2021-01-11 Thread Ovidiu Mihalcea (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262657#comment-17262657
 ] 

Ovidiu Mihalcea edited comment on SOLR-15071 at 1/11/21, 2:05 PM:
--

Maybe someone who recently went through the code related to this could help us 
identify the issue?

[~dsmiley], if you have the time, could you also take a look at this bug and 
maybe help us identify the issue? It would be very appreciated. Thank you!

 

Here are the exact reproduction steps we are using to trigger the bug (files: 
[^featurestore+model+sample_documents.zip] ):
 * start and index techproducts

{{*$ /opt/solr/bin/solr -e techproducts -Dsolr.ltr.enabled=true*}}
 * upload feature store

*{{$ curl -XPUT 'http://localhost:8983/solr/techproducts/schema/feature-store' 
--data-binary "@./feature-store.minimal.json" -H 
'Content-type:application/json'}}*
 * upload model

*{{$ curl -XPUT 'http://localhost:8983/solr/techproducts/schema/model-store' 
--data-binary "@./model.minimal.json" -H 'Content-type:application/json'}}*
 * index sample documents

*{{$ curl -XPOST 
'http://localhost:8983/solr/techproducts/update?wt=json&commitWithin=1&overwrite=true'
 --data-binary "@./sample_documents.json" -H 'Content-type:application/json'}}*
 * query for ‘husa cu tastatura tableta 10 inch’

*{{$ curl -XGET 
'http://localhost:8983/solr/techproducts/select?fl=id%2C%5Bfeatures%20store%3Ddev_2020_08_26%20efi.term%3D"husa%20cu%20tastatura%20tableta%2010%20inch"%5D&q=husa%20cu%20tastatura%20tableta%2010%20inch&rows=500}}*


was (Author: dizzu333):
Hello,

Maybe someone who recently went through the code related to this could help us 
identify the issue?

[~dsmiley], if you have the time, could you also take a look at this bug and 
maybe help us identify the issue? It would be very appreciated. Thank you!

 

Here are the exact reproduction steps we are using to trigger the bug (files: 
[^featurestore+model+sample_documents.zip] ):
 * start and index techproducts

{{*$ /opt/solr/bin/solr -e techproducts -Dsolr.ltr.enabled=true*}}
 * upload feature store

*{{$ curl -XPUT 'http://localhost:8983/solr/techproducts/schema/feature-store' 
--data-binary "@./feature-store.minimal.json" -H 
'Content-type:application/json'}}*
 * upload model

*{{$ curl -XPUT 'http://localhost:8983/solr/techproducts/schema/model-store' 
--data-binary "@./model.minimal.json" -H 'Content-type:application/json'}}*

* index sample documents

*{{$ curl -XPOST 
'http://localhost:8983/solr/techproducts/update?wt=json&commitWithin=1&overwrite=true'
 --data-binary "@./sample_documents.json" -H 'Content-type:application/json'}}*

* query for ‘husa cu tastatura tableta 10 inch’

*{{$ curl -XGET 
'http://localhost:8983/solr/techproducts/select?fl=id%2C%5Bfeatures%20store%3Ddev_2020_08_26%20efi.term%3D"husa%20cu%20tastatura%20tableta%2010%20inch"%5D&q=husa%20cu%20tastatura%20tableta%2010%20inch&rows=500}}*


[jira] [Comment Edited] (SOLR-15071) Bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)

2021-01-11 Thread Ovidiu Mihalcea (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262657#comment-17262657
 ] 

Ovidiu Mihalcea edited comment on SOLR-15071 at 1/11/21, 2:05 PM:
--

Hello,

Maybe someone who recently went through the code related to this could help us 
identify the issue?

[~dsmiley], if you have the time, could you also take a look at this bug and 
maybe help us identify the issue? It would be very appreciated. Thank you!

 

Here are the exact reproduction steps we are using to trigger the bug (files: 
[^featurestore+model+sample_documents.zip] ):
 * start and index techproducts

{{*$ /opt/solr/bin/solr -e techproducts -Dsolr.ltr.enabled=true*}}
 * upload feature store

*{{$ curl -XPUT 'http://localhost:8983/solr/techproducts/schema/feature-store' 
--data-binary "@./feature-store.minimal.json" -H 
'Content-type:application/json'}}*
 * upload model

*{{$ curl -XPUT 'http://localhost:8983/solr/techproducts/schema/model-store' 
--data-binary "@./model.minimal.json" -H 'Content-type:application/json'}}*

* index sample documents

*{{$ curl -XPOST 
'http://localhost:8983/solr/techproducts/update?wt=json&commitWithin=1&overwrite=true'
 --data-binary "@./sample_documents.json" -H 'Content-type:application/json'}}*

* query for ‘husa cu tastatura tableta 10 inch’

*{{$ curl -XGET 
'http://localhost:8983/solr/techproducts/select?fl=id%2C%5Bfeatures%20store%3Ddev_2020_08_26%20efi.term%3D"husa%20cu%20tastatura%20tableta%2010%20inch"%5D&q=husa%20cu%20tastatura%20tableta%2010%20inch&rows=500}}*


was (Author: dizzu333):
Hello,

Maybe someone who recently went through the code related to this could help us 
identify the issue?

[~dsmiley], if you have the time, could you also take a look at this bug and 
maybe help us identify the issue? It would be very appreciated. Thank you!

 

Here are the exact reproduction steps we are using to trigger the bug (files: 
[^featurestore+model+sample_documents.zip] ):
 * start and index techproducts

{{*$ /opt/solr/bin/solr -e techproducts -Dsolr.ltr.enabled=true*}}
 * upload feature store

*{{$ curl -XPUT 'http://localhost:8983/solr/techproducts/schema/feature-store' 
--data-binary "@./feature-store.minimal.json" -H 
'Content-type:application/json'}}*

* upload model

*{{$ curl -XPUT 'http://localhost:8983/solr/techproducts/schema/model-store' 
--data-binary "@./model.minimal.json" -H 'Content-type:application/json'}}*

 * index sample documents

*{{$ curl -XPOST 
'http://localhost:8983/solr/techproducts/update?wt=json&commitWithin=1&overwrite=true'
 --data-binary "@./sample_documents.json" -H 'Content-type:application/json'}}*

 * query for ‘husa cu tastatura tableta 10 inch’

*{{$ curl -XGET 
'http://localhost:8983/solr/techproducts/select?fl=id%2C%5Bfeatures%20store%3Ddev_2020_08_26%20efi.term%3D"husa%20cu%20tastatura%20tableta%2010%20inch"%5D&q=husa%20cu%20tastatura%20tableta%2010%20inch&rows=500}}*

> Bug on LTR when using solr 8.6.3 - index out of bounds 
> DisiPriorityQueue.add(DisiPriorityQueue.java:102)
> 
>
> Key: SOLR-15071
> URL: https://issues.apache.org/jira/browse/SOLR-15071
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LTR
>Affects Versions: 8.6, 8.7
>Reporter: Florin Babes
>Priority: Major
>  Labels: ltr
> Attachments: featurestore+model+sample_documents.zip
>
>
> Hello,
> We are trying to update Solr from 8.3.1 to 8.6.3. On Solr 8.3.1 we are
> using LTR in production using a MultipleAdditiveTrees model. On Solr 8.6.3
> we receive an error when we try to compute some SolrFeatures. We didn't
> find any pattern of the queries that fail.
> Example:
> We have the following query raw parameters:
> q=lg cx 4k oled 120 hz -> just one of many examples
> term_dq=lg cx 4k oled 120 hz
> rq={!ltr model=model reRankDocs=1000 store=feature_store
> efi.term=${term_dq}}
> defType=edismax,
> mm=2<75%
> The features are something like this:
> {
>  "name":"similarity_query_fileld_1",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":\{"q":"{!dismax qf=query_field_1 mm=1}${term}"},
>  "store":"feature_store"
> },
> {
>  "name":"similarity_query_field_2",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":\{"q":"{!dismax qf=query_field_2 mm=5}${term}"},
>  "store":"feature_store"
> }
> We are testing ~6300 production queries and for about 1% of them we receive
> the following error message:
> "metadata":[
>  "error-class","org.apache.solr.common.SolrException",
>  "root-error-class","java.lang.ArrayIndexOutOfBoundsException"],
>  "msg":"java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds
> for length 2",
> Th

[jira] [Comment Edited] (SOLR-15071) Bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)

2021-01-11 Thread Ovidiu Mihalcea (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262657#comment-17262657
 ] 

Ovidiu Mihalcea edited comment on SOLR-15071 at 1/11/21, 2:05 PM:
--

Hello,

Maybe someone who recently went through the code related to this could help us 
identify the issue?

[~dsmiley], if you have the time, could you also take a look at this bug and 
maybe help us identify the issue? It would be much appreciated. Thank you!

 

Here are the exact reproduction steps we are using to trigger the bug (files: 
[^featurestore+model+sample_documents.zip] ):
 * start and index techproducts

{{*$ /opt/solr/bin/solr -e techproducts -Dsolr.ltr.enabled=true*}}
 * upload feature store

*{{$ curl -XPUT 'http://localhost:8983/solr/techproducts/schema/feature-store' 
--data-binary "@./feature-store.minimal.json" -H 
'Content-type:application/json'}}*

* upload model

*{{$ curl -XPUT 'http://localhost:8983/solr/techproducts/schema/model-store' 
--data-binary "@./model.minimal.json" -H 'Content-type:application/json'}}*

 * index sample documents

*{{$ curl -XPOST 
'http://localhost:8983/solr/techproducts/update?wt=json&commitWithin=1&overwrite=true'
 --data-binary "@./sample_documents.json" -H 'Content-type:application/json'}}*

 * query for ‘husa cu tastatura tableta 10 inch’

*{{$ curl -XGET 
'http://localhost:8983/solr/techproducts/select?fl=id%2C%5Bfeatures%20store%3Ddev_2020_08_26%20efi.term%3D"husa%20cu%20tastatura%20tableta%2010%20inch"%5D&q=husa%20cu%20tastatura%20tableta%2010%20inch&rows=500}}*


was (Author: dizzu333):
Hello,

Maybe someone who recently went through the code related to this could help us 
identify the issue?

[~dsmiley], if you have the time, could you also take a look at this bug and 
maybe help us identify the issue? It would be much appreciated. Thank you!

 

Here are the exact reproduction steps we are using to trigger the bug (files: 
[^featurestore+model+sample_documents.zip] ):

* start and index techproducts

{{*$ /opt/solr/bin/solr -e techproducts -Dsolr.ltr.enabled=true*}}

* upload feature store

*{{$ curl -XPUT 'http://localhost:8983/solr/techproducts/schema/feature-store' 
--data-binary "@./feature-store.minimal.json" -H 
'Content-type:application/json'}}*

 # upload model

*{{$ curl -XPUT 'http://localhost:8983/solr/techproducts/schema/model-store' 
--data-binary "@./model.minimal.json" -H 'Content-type:application/json'}}*

 # index sample documents

*{{$ curl -XPOST 
'http://localhost:8983/solr/techproducts/update?wt=json&commitWithin=1&overwrite=true'
 --data-binary "@./sample_documents.json" -H 'Content-type:application/json'}}*

 # query for ‘husa cu tastatura tableta 10 inch’

*{{$ curl -XGET 
'http://localhost:8983/solr/techproducts/select?fl=id%2C%5Bfeatures%20store%3Ddev_2020_08_26%20efi.term%3D"husa%20cu%20tastatura%20tableta%2010%20inch"%5D&q=husa%20cu%20tastatura%20tableta%2010%20inch&rows=500}}*

> Bug on LTR when using solr 8.6.3 - index out of bounds 
> DisiPriorityQueue.add(DisiPriorityQueue.java:102)
> 
>
> Key: SOLR-15071
> URL: https://issues.apache.org/jira/browse/SOLR-15071
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LTR
>Affects Versions: 8.6, 8.7
>Reporter: Florin Babes
>Priority: Major
>  Labels: ltr
> Attachments: featurestore+model+sample_documents.zip
>
>
> Hello,
> We are trying to update Solr from 8.3.1 to 8.6.3. On Solr 8.3.1 we are
> using LTR in production using a MultipleAdditiveTrees model. On Solr 8.6.3
> we receive an error when we try to compute some SolrFeatures. We didn't
> find any pattern of the queries that fail.
> Example:
> We have the following query raw parameters:
> q=lg cx 4k oled 120 hz -> just one of many examples
> term_dq=lg cx 4k oled 120 hz
> rq={!ltr model=model reRankDocs=1000 store=feature_store
> efi.term=${term_dq}}
> defType=edismax,
> mm=2<75%
> The features are something like this:
> {
>  "name":"similarity_query_fileld_1",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":\{"q":"{!dismax qf=query_field_1 mm=1}${term}"},
>  "store":"feature_store"
> },
> {
>  "name":"similarity_query_field_2",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":\{"q":"{!dismax qf=query_field_2 mm=5}${term}"},
>  "store":"feature_store"
> }
> We are testing ~6300 production queries and for about 1% of them we receive
> the following error message:
> "metadata":[
>  "error-class","org.apache.solr.common.SolrException",
>  "root-error-class","java.lang.ArrayIndexOutOfBoundsException"],
>  "msg":"java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds
> for length 2",
>

[jira] [Comment Edited] (SOLR-15071) Bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)

2021-01-11 Thread Ovidiu Mihalcea (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262657#comment-17262657
 ] 

Ovidiu Mihalcea edited comment on SOLR-15071 at 1/11/21, 2:04 PM:
--

Hello,

Maybe someone who recently went through the code related to this could help us 
identify the issue?

[~dsmiley], if you have the time, could you also take a look at this bug and 
maybe help us identify the issue? It would be much appreciated. Thank you!

 

Here are the exact reproduction steps we are using to trigger the bug (files: 
[^featurestore+model+sample_documents.zip] ):

* start and index techproducts

{{*$ /opt/solr/bin/solr -e techproducts -Dsolr.ltr.enabled=true*}}

* upload feature store

*{{$ curl -XPUT 'http://localhost:8983/solr/techproducts/schema/feature-store' 
--data-binary "@./feature-store.minimal.json" -H 
'Content-type:application/json'}}*

 # upload model

*{{$ curl -XPUT 'http://localhost:8983/solr/techproducts/schema/model-store' 
--data-binary "@./model.minimal.json" -H 'Content-type:application/json'}}*

 # index sample documents

*{{$ curl -XPOST 
'http://localhost:8983/solr/techproducts/update?wt=json&commitWithin=1&overwrite=true'
 --data-binary "@./sample_documents.json" -H 'Content-type:application/json'}}*

 # query for ‘husa cu tastatura tableta 10 inch’

*{{$ curl -XGET 
'http://localhost:8983/solr/techproducts/select?fl=id%2C%5Bfeatures%20store%3Ddev_2020_08_26%20efi.term%3D"husa%20cu%20tastatura%20tableta%2010%20inch"%5D&q=husa%20cu%20tastatura%20tableta%2010%20inch&rows=500}}*


was (Author: dizzu333):
Hello,

Maybe someone who recently went through the code related to this could help us 
identify the issue?

[~dsmiley], if you have the time, could you also take a look at this bug and 
maybe help us identify the issue? It would be much appreciated. Thank you!

 

Here are the exact reproduction steps we are using to trigger the bug (files: 
[^featurestore+model+sample_documents.zip] ):

# start and index techproducts

{{*$ /opt/solr/bin/solr -e techproducts -Dsolr.ltr.enabled=true*}}

# upload feature store

*{{$ curl -XPUT 'http://localhost:8983/solr/techproducts/schema/feature-store' 
--data-binary "@./feature-store.minimal.json" -H 
'Content-type:application/json'}}*

 # upload model

*{{$ curl -XPUT 'http://localhost:8983/solr/techproducts/schema/model-store' 
--data-binary "@./model.minimal.json" -H 'Content-type:application/json'}}*

 # index sample documents

*{{$ curl -XPOST 
'http://localhost:8983/solr/techproducts/update?wt=json&commitWithin=1&overwrite=true'
 --data-binary "@./sample_documents.json" -H 'Content-type:application/json'}}*

 # query for ‘husa cu tastatura tableta 10 inch’

*{{$ curl -XGET 
'http://localhost:8983/solr/techproducts/select?fl=id%2C%5Bfeatures%20store%3Ddev_2020_08_26%20efi.term%3D"husa%20cu%20tastatura%20tableta%2010%20inch"%5D&q=husa%20cu%20tastatura%20tableta%2010%20inch&rows=500}}*

> Bug on LTR when using solr 8.6.3 - index out of bounds 
> DisiPriorityQueue.add(DisiPriorityQueue.java:102)
> 
>
> Key: SOLR-15071
> URL: https://issues.apache.org/jira/browse/SOLR-15071
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LTR
>Affects Versions: 8.6, 8.7
>Reporter: Florin Babes
>Priority: Major
>  Labels: ltr
> Attachments: featurestore+model+sample_documents.zip
>
>
> Hello,
> We are trying to update Solr from 8.3.1 to 8.6.3. On Solr 8.3.1 we are
> using LTR in production using a MultipleAdditiveTrees model. On Solr 8.6.3
> we receive an error when we try to compute some SolrFeatures. We didn't
> find any pattern of the queries that fail.
> Example:
> We have the following query raw parameters:
> q=lg cx 4k oled 120 hz -> just one of many examples
> term_dq=lg cx 4k oled 120 hz
> rq={!ltr model=model reRankDocs=1000 store=feature_store
> efi.term=${term_dq}}
> defType=edismax,
> mm=2<75%
> The features are something like this:
> {
>  "name":"similarity_query_fileld_1",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":\{"q":"{!dismax qf=query_field_1 mm=1}${term}"},
>  "store":"feature_store"
> },
> {
>  "name":"similarity_query_field_2",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":\{"q":"{!dismax qf=query_field_2 mm=5}${term}"},
>  "store":"feature_store"
> }
> We are testing ~6300 production queries and for about 1% of them we receive
> the following error message:
> "metadata":[
>  "error-class","org.apache.solr.common.SolrException",
>  "root-error-class","java.lang.ArrayIndexOutOfBoundsException"],
>  "msg":"java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds
> for length 2",

[jira] [Commented] (SOLR-15071) Bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)

2021-01-11 Thread Ovidiu Mihalcea (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262657#comment-17262657
 ] 

Ovidiu Mihalcea commented on SOLR-15071:


Hello,

Maybe someone who recently went through the code related to this could help us 
identify the issue?

[~dsmiley], if you have the time, could you also take a look at this bug and 
maybe help us identify the issue? It would be much appreciated. Thank you!

 

Here are the exact reproduction steps we are using to trigger the bug (files: 
[^featurestore+model+sample_documents.zip] ):

# start and index techproducts

{{*$ /opt/solr/bin/solr -e techproducts -Dsolr.ltr.enabled=true*}}

# upload feature store

*{{$ curl -XPUT 'http://localhost:8983/solr/techproducts/schema/feature-store' 
--data-binary "@./feature-store.minimal.json" -H 
'Content-type:application/json'}}*

 # upload model

*{{$ curl -XPUT 'http://localhost:8983/solr/techproducts/schema/model-store' 
--data-binary "@./model.minimal.json" -H 'Content-type:application/json'}}*

 # index sample documents

*{{$ curl -XPOST 
'http://localhost:8983/solr/techproducts/update?wt=json&commitWithin=1&overwrite=true'
 --data-binary "@./sample_documents.json" -H 'Content-type:application/json'}}*

 # query for ‘husa cu tastatura tableta 10 inch’

*{{$ curl -XGET 
'http://localhost:8983/solr/techproducts/select?fl=id%2C%5Bfeatures%20store%3Ddev_2020_08_26%20efi.term%3D"husa%20cu%20tastatura%20tableta%2010%20inch"%5D&q=husa%20cu%20tastatura%20tableta%2010%20inch&rows=500}}*

> Bug on LTR when using solr 8.6.3 - index out of bounds 
> DisiPriorityQueue.add(DisiPriorityQueue.java:102)
> 
>
> Key: SOLR-15071
> URL: https://issues.apache.org/jira/browse/SOLR-15071
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LTR
>Affects Versions: 8.6, 8.7
>Reporter: Florin Babes
>Priority: Major
>  Labels: ltr
> Attachments: featurestore+model+sample_documents.zip
>
>
> Hello,
> We are trying to update Solr from 8.3.1 to 8.6.3. On Solr 8.3.1 we are
> using LTR in production using a MultipleAdditiveTrees model. On Solr 8.6.3
> we receive an error when we try to compute some SolrFeatures. We didn't
> find any pattern of the queries that fail.
> Example:
> We have the following query raw parameters:
> q=lg cx 4k oled 120 hz -> just one of many examples
> term_dq=lg cx 4k oled 120 hz
> rq={!ltr model=model reRankDocs=1000 store=feature_store
> efi.term=${term_dq}}
> defType=edismax,
> mm=2<75%
> The features are something like this:
> {
>  "name":"similarity_query_fileld_1",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":\{"q":"{!dismax qf=query_field_1 mm=1}${term}"},
>  "store":"feature_store"
> },
> {
>  "name":"similarity_query_field_2",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":\{"q":"{!dismax qf=query_field_2 mm=5}${term}"},
>  "store":"feature_store"
> }
> We are testing ~6300 production queries and for about 1% of them we receive
> the following error message:
> "metadata":[
>  "error-class","org.apache.solr.common.SolrException",
>  "root-error-class","java.lang.ArrayIndexOutOfBoundsException"],
>  "msg":"java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds
> for length 2",
> The stacktrace is:
> org.apache.solr.common.SolrException: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
> at org.apache.solr.search.ReRankCollector.topDocs(ReRankCollector.java:154)
> at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1599)
> at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1413)
> at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:596)
> at org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1513)
> at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:403)
> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:360)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2627)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:795)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:568)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(Se

[jira] [Commented] (LUCENE-9346) WANDScorer should support minimumNumberShouldMatch

2021-01-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262658#comment-17262658
 ] 

ASF subversion and git services commented on LUCENE-9346:
-

Commit c2493283a58ea19a13887a732328c1eaf970d371 in lucene-solr's branch 
refs/heads/master from zacharymorn
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c249328 ]

LUCENE-9346: Support minimumNumberShouldMatch in WANDScorer (#2141)

Co-authored-by: Adrien Grand 
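For reference, a minimal sketch of the query shape this change covers; the field and term names are made up for illustration:

{code}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// A disjunction where at least 2 of the 3 SHOULD clauses must match.
// Before this commit such a query fell back to MinShouldMatchSumScorer;
// now WANDScorer can also skip non-competitive hits for it.
Query q = new BooleanQuery.Builder()
    .add(new TermQuery(new Term("body", "lucene")), BooleanClause.Occur.SHOULD)
    .add(new TermQuery(new Term("body", "solr")), BooleanClause.Occur.SHOULD)
    .add(new TermQuery(new Term("body", "search")), BooleanClause.Occur.SHOULD)
    .setMinimumNumberShouldMatch(2)
    .build();
{code}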

> WANDScorer should support minimumNumberShouldMatch
> --
>
> Key: LUCENE-9346
> URL: https://issues.apache.org/jira/browse/LUCENE-9346
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Currently we deoptimize when a minimumNumberShouldMatch is provided and fall 
> back to a scorer that doesn't dynamically prune hits based on scores.
> Given how similar WANDScorer and MinShouldMatchSumScorer are, I wonder if we 
> could remove MinShouldMatchSumScorer once WANDScorer supports 
> minimumNumberShouldMatch. Then any improvements we bring to WANDScorer, like 
> two-phase support (LUCENE-8806), would automatically cover more queries.






[jira] [Updated] (SOLR-15071) Bug on LTR when using solr 8.6.3 - index out of bounds DisiPriorityQueue.add(DisiPriorityQueue.java:102)

2021-01-11 Thread Ovidiu Mihalcea (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ovidiu Mihalcea updated SOLR-15071:
---
Attachment: featurestore+model+sample_documents.zip

> Bug on LTR when using solr 8.6.3 - index out of bounds 
> DisiPriorityQueue.add(DisiPriorityQueue.java:102)
> 
>
> Key: SOLR-15071
> URL: https://issues.apache.org/jira/browse/SOLR-15071
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LTR
>Affects Versions: 8.6, 8.7
>Reporter: Florin Babes
>Priority: Major
>  Labels: ltr
> Attachments: featurestore+model+sample_documents.zip
>
>
> Hello,
> We are trying to update Solr from 8.3.1 to 8.6.3. On Solr 8.3.1 we are
> using LTR in production using a MultipleAdditiveTrees model. On Solr 8.6.3
> we receive an error when we try to compute some SolrFeatures. We didn't
> find any pattern of the queries that fail.
> Example:
> We have the following query raw parameters:
> q=lg cx 4k oled 120 hz -> just one of many examples
> term_dq=lg cx 4k oled 120 hz
> rq={!ltr model=model reRankDocs=1000 store=feature_store
> efi.term=${term_dq}}
> defType=edismax,
> mm=2<75%
> The features are something like this:
> {
>  "name":"similarity_query_fileld_1",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":\{"q":"{!dismax qf=query_field_1 mm=1}${term}"},
>  "store":"feature_store"
> },
> {
>  "name":"similarity_query_field_2",
>  "class":"org.apache.solr.ltr.feature.SolrFeature",
>  "params":\{"q":"{!dismax qf=query_field_2 mm=5}${term}"},
>  "store":"feature_store"
> }
> We are testing ~6300 production queries and for about 1% of them we receive
> the following error message:
> "metadata":[
>  "error-class","org.apache.solr.common.SolrException",
>  "root-error-class","java.lang.ArrayIndexOutOfBoundsException"],
>  "msg":"java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds
> for length 2",
> The stacktrace is:
> org.apache.solr.common.SolrException: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
> at org.apache.solr.search.ReRankCollector.topDocs(ReRankCollector.java:154)
> at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1599)
> at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1413)
> at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:596)
> at org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1513)
> at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:403)
> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:360)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2627)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:795)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:568)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)
> at org.eclipse.jetty.server.handler.InetAccessHandler.handle(In

[jira] [Commented] (SOLR-15055) Re-implement 'withCollection' and 'maxShardsPerNode'

2021-01-11 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262624#comment-17262624
 ] 

Andrzej Bialecki commented on SOLR-15055:
-

Requirement #1 from the list above can be implemented just in the placement 
plugin (filter candidate nodes to ensure that they contain at least one 
secondary replica); a sketch of that filter follows below. However, the other 
requirements need modifications to {{DeleteReplicaCmd}}.
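A minimal, API-agnostic sketch of that filtering step, using plain string node names rather than the actual placement-plugin types (the method name and failure message below are illustrative assumptions, not the real API):

{code}
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: keep only candidate nodes that already host at least
// one replica of the secondary (join) collection, and fail the placement
// outright if none qualify instead of silently degrading query latency.
static Set<String> filterByCoLocation(Set<String> candidateNodes,
    Set<String> nodesWithSecondaryReplica) {
  Set<String> allowed = candidateNodes.stream()
      .filter(nodesWithSecondaryReplica::contains)
      .collect(Collectors.toSet());
  if (allowed.isEmpty()) {
    throw new IllegalStateException(
        "withCollection constraint cannot be satisfied by the current layout");
  }
  return allowed;
}
{code}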

> Re-implement 'withCollection' and 'maxShardsPerNode'
> 
>
> Key: SOLR-15055
> URL: https://issues.apache.org/jira/browse/SOLR-15055
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Solr 8x replica placement provided two settings that are very useful in 
> certain scenarios:
> * {{withCollection}} constraint specified that replicas should be placed on 
> the same nodes where replicas of another collection are located. In the 8x 
> implementation this was limited in practice to co-locating single-shard 
> secondary collections used for joins or other lookups from the main 
> collection (which could be multi-sharded).
> * {{maxShardsPerNode}} - this constraint specified the maximum number of 
> replicas per shard that can be placed on the same node. In most scenarios 
> this was set to 1 in order to ensure fault-tolerance (ie. at most 1 replica 
> of any given shard would be placed on any given node). Changing this 
> constraint to values > 1 would reduce fault-tolerance but may be desired in 
> test setups or as a temporary relief measure.
>  
> Both these constraints are collection-specific so they should be configured 
> e.g. as collection properties.
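For context, a hedged sketch of how both constraints were expressed in the 8.x Collections API at creation time (the collection names here are made up; {{withCollection}} pointed at a single-shard secondary collection):

{code}
$ curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=products&numShards=2&replicationFactor=2&maxShardsPerNode=1&withCollection=vendors'
{code}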






[jira] [Comment Edited] (SOLR-15055) Re-implement 'withCollection' and 'maxShardsPerNode'

2021-01-11 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262615#comment-17262615
 ] 

Andrzej Bialecki edited comment on SOLR-15055 at 1/11/21, 12:47 PM:


So, let's take a step back, and start with use-cases and requirements.

Cross-collection joins are the main use case because they benefit greatly from 
co-location of replicas for the primary (search) and secondary (join) 
collections.

 1. when creating the primary collection (or adding new replicas to it) the 
placement plugin must take this into account and enforce the co-location of new 
primary replicas if possible.

* in 8x the framework would automatically create secondary replicas as 
necessary to satisfy this requirement (at the cost of code complexity).
* now at minimum the placement plugin should fail if it's not possible to 
satisfy this constraint with the current cluster layout, because then some 
primary replicas would have to use secondary replicas from other nodes and this 
would greatly (and unevenly) increase the query latency.
* if the placement of the primary collection fails because this constraint 
cannot be satisfied (there are too few secondary replicas) then the operator 
must manually add as many secondary replicas as needed. This is the weakness of 
this minimal approach - it would be better to somehow automate it to eliminate 
the manual intervention.

 2. removal of secondary replicas should be prevented if they are actively used 
under this constraint by any primary replicas on a specific node - allowing 
this would cause performance issues as described above.
 3. removal of primary replicas should cause the secondary replicas to be 
removed as well, if they are no longer in use on a specific node - but ensuring 
that at least N replicas of the secondary collection remain (to prevent data 
loss).


was (Author: ab):
So, let's take a step back, and start with use-cases and requirements:
* cross-collection joins are the main use case because they benefit greatly from 
co-location of replicas for the primary (search) and secondary (join) 
collections.
* when creating the primary collection (or adding new replicas to it) the 
placement plugin must take this into account and enforce the co-location of new 
primary replicas if possible.
** in 8x the framework would automatically create secondary replicas as 
necessary to satisfy this requirement (at the cost of code complexity).
** now at minimum the placement plugin should fail if it's not possible to 
satisfy this constraint with the current cluster layout, because then some 
primary replicas would have to use secondary replicas from other nodes and this 
would greatly (and unevenly) increase the query latency.
** if the placement of the primary collection fails because this constraint 
cannot be satisfied (there are too few secondary replicas) then the operator 
must manually add as many secondary replicas as needed. This is the weakness of 
this minimal approach - it would be better to somehow automate it to eliminate 
the manual intervention. 
* removal of secondary replicas should be prevented if they are actively used 
under this constraint by any primary replicas on a specific node - allowing 
this would cause performance issues as described above.
* removal of primary replicas should cause the secondary replicas to be removed 
as well, if they are no longer in use on a specific node - but ensuring that at 
least N replicas of the secondary collection remain (to prevent data loss).

> Re-implement 'withCollection' and 'maxShardsPerNode'
> 
>
> Key: SOLR-15055
> URL: https://issues.apache.org/jira/browse/SOLR-15055
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Solr 8x replica placement provided two settings that are very useful in 
> certain scenarios:
> * {{withCollection}} constraint specified that replicas should be placed on 
> the same nodes where replicas of another collection are located. In the 8x 
> implementation this was limited in practice to co-locating single-shard 
> secondary collections used for joins or other lookups from the main 
> collection (which could be multi-sharded).
> * {{maxShardsPerNode}} - this constraint specified the maximum number of 
> replicas per shard that can be placed on the same node. In most scenarios 
> this was set to 1 in order to ensure fault-tolerance (ie. at most 1 replica 
> of any given shard would be placed on any given node). Changing this 
> constraint to values > 1 would reduce fault-tolerance but may be desired in 
> test setups 

[jira] [Commented] (SOLR-15055) Re-implement 'withCollection' and 'maxShardsPerNode'

2021-01-11 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262615#comment-17262615
 ] 

Andrzej Bialecki commented on SOLR-15055:
-

So, let's take a step back, and start with use-cases and requirements:
* cross-collection joins are the main use case because they benefit greatly from 
co-location of replicas for the primary (search) and secondary (join) 
collections.
* when creating the primary collection (or adding new replicas to it) the 
placement plugin must take this into account and enforce the co-location of new 
primary replicas if possible.
** in 8x the framework would automatically create secondary replicas as 
necessary to satisfy this requirement (at the cost of code complexity).
** now at minimum the placement plugin should fail if it's not possible to 
satisfy this constraint with the current cluster layout, because then some 
primary replicas would have to use secondary replicas from other nodes and this 
would greatly (and unevenly) increase the query latency.
** if the placement of the primary collection fails because this constraint 
cannot be satisfied (there are too few secondary replicas) then the operator 
must manually add as many secondary replicas as needed. This is the weakness of 
this minimal approach - it would be better to somehow automate it to eliminate 
the manual intervention. 
* removal of secondary replicas should be prevented if they are actively used 
under this constraint by any primary replicas on a specific node - allowing 
this would cause performance issues as described above.
* removal of primary replicas should cause the secondary replicas to be removed 
as well, if they are no longer in use on a specific node - but ensuring that at 
least N replicas of the secondary collection remain (to prevent data loss).

> Re-implement 'withCollection' and 'maxShardsPerNode'
> 
>
> Key: SOLR-15055
> URL: https://issues.apache.org/jira/browse/SOLR-15055
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Solr 8x replica placement provided two settings that are very useful in 
> certain scenarios:
> * {{withCollection}} constraint specified that replicas should be placed on 
> the same nodes where replicas of another collection are located. In the 8x 
> implementation this was limited in practice to co-locating single-shard 
> secondary collections used for joins or other lookups from the main 
> collection (which could be multi-sharded).
> * {{maxShardsPerNode}} - this constraint specified the maximum number of 
> replicas per shard that can be placed on the same node. In most scenarios 
> this was set to 1 in order to ensure fault-tolerance (ie. at most 1 replica 
> of any given shard would be placed on any given node). Changing this 
> constraint to values > 1 would reduce fault-tolerance but may be desired in 
> test setups or as a temporary relief measure.
>  
> Both these constraints are collection-specific so they should be configured 
> e.g. as collection properties.






[jira] [Commented] (SOLR-15070) SolrJ ClassCastException on XML /suggest requests

2021-01-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262613#comment-17262613
 ] 

ASF subversion and git services commented on SOLR-15070:


Commit 98c51ca34b0609c8042a6caa31814d95b788f19a in lucene-solr's branch 
refs/heads/master from Jason Gerlowski
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=98c51ca ]

SOLR-15070: Remove HashMap usage in SuggestComponent rsp (#2183)

Prior to this commit, SuggestComponent used a HashMap as part of the
response it built on the server side.  This class is serialized/
deserialized differently depending on the SolrJ ResponseParser used:
a LinkedHashMap when javabin was used, and a SimpleOrderedMap when XML
was used.  This discrepancy led to ClassCastException's in downstream
SolrJ code.

This commit fixes the issue by changing SuggestComponent to avoid these
types that are serialized differently.  "suggest" response sections now
deserialize as a NamedList in SolrJ, and the SuggesterResponse POJO has
been updated accordingly.
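A hedged sketch of the client-side view after this change, based on the failing snippet quoted below (the suggester name comes from that snippet; the typed accessors are expected to behave the same for javabin and XML once the response uses NamedList):

{code}
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.SuggesterResponse;
import org.apache.solr.common.params.MapSolrParams;

// Same request as in the issue, but read through the typed POJO.
HttpSolrClient client = new HttpSolrClient.Builder()
    .withBaseSolrUrl("http://localhost:8983/solr/techproducts")
    .withResponseParser(new XMLResponseParser())
    .build();
Map<String, String> params = new HashMap<>();
params.put("qt", "/suggest");
params.put("suggest", "true");
params.put("suggest.q", "elec");
params.put("suggest.dictionary", "mySuggester");
QueryResponse resp = client.query(new MapSolrParams(params));
SuggesterResponse suggestions = resp.getSuggesterResponse();
List<String> terms = suggestions.getSuggestedTerms().get("mySuggester");
{code}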

> SolrJ ClassCastException on XML /suggest requests
> -
>
> Key: SOLR-15070
> URL: https://issues.apache.org/jira/browse/SOLR-15070
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ, Suggester
>Affects Versions: 8.6.3, master (9.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> SolrJ throws a ClassCastException when parsing XML {{/suggest}} responses.
> The following code snippet (run against the techproduct example) produces the 
> stack trace that follows:
> {code}
>   public void testSuggestFailure() throws Exception {
> HttpSolrClient client = new HttpSolrClient.Builder()
> .withBaseSolrUrl("http://localhost:8983/solr/techproducts")
> .withResponseParser(new XMLResponseParser())
> .build();
> Map<String, String> queryParamMap = new HashMap<>();
> queryParamMap.put("qt", "/suggest");
> queryParamMap.put("suggest.q", "elec");
> queryParamMap.put("suggest.build", "true");
> queryParamMap.put("suggest", "true");
> queryParamMap.put("suggest.dictionary", "mySuggester");
> queryParamMap.put("wt", "xml");
> QueryResponse resp = client.query(new MapSolrParams(queryParamMap));
>   }
> {code}
> {code}
> java.lang.ClassCastException: class 
> org.apache.solr.common.util.SimpleOrderedMap cannot be cast to class 
> java.util.Map (org.apache.solr.common.util.SimpleOrderedMap is in unnamed 
> module of loader 'app'; java.util.Map is in module java.base of loader 
> 'bootstrap')
> at 
> org.apache.solr.client.solrj.response.QueryResponse.setResponse(QueryResponse.java:170)
>  ~[solr-solrj-8.4.1.jar:8.4.1 832bf13dd9187095831caf69783179d41059d013 - 
> ishan - 2020-01-10 13:40:30]
> at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211) 
> ~[solr-solrj-8.4.1.jar:8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan 
> - 2020-01-10 13:40:30]
> at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1003) 
> ~[solr-solrj-8.4.1.jar:8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan 
> - 2020-01-10 13:40:30]
> at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1018) 
> ~[solr-solrj-8.4.1.jar:8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan 
> - 2020-01-10 13:40:30]
> {code}
> SolrJ's {{QueryResponse}} expects the "suggest" section in the NamedList 
> response to be a Map - and for requests that use javabin (the default 
> ResponseParser), it is.  But when parsing XML responses, the "suggest" section 
> is deserialized as a SimpleOrderedMap (which, despite the name, doesn't 
> implement Map).
> The root cause afaict is that SuggestComponent [uses a 
> type|https://github.com/apache/lucene-solr/blob/43b1a2fdc7a4bf8e5c8409013d07858dec6d0c35/solr/core/src/java/org/apache/solr/handler/component/SuggestComponent.java#L261]
>  (HashMap) that serializes/deserializes differently based on the codec/wt 
> used on the wire.  JavaBinCodec has special handling for maps that our XML 
> serialization doesn't have, so the two produce different response structures 
> on the client side.
> The "right" fix here is to change SuggestComponent's response to only use 
> types that serialize/deserialize identically in all SolrJ's ResponseParser's. 
>  This is a breaking change though - a SolrJ user making /suggest requests, 
> getting the responses via javavbin, and inspecting the resulting NamedList 
> directly would get a different object tree after this fix than they would've 
> before.  So an 8.x fix would need to take a different approach.




[jira] [Commented] (SOLR-13055) Introduce check to determine "liveliness" of a Solr node

2021-01-11 Thread Torsten Werner (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262561#comment-17262561
 ] 

Torsten Werner commented on SOLR-13055:
---

I have just found this Jira. I could not find anything matching the word "health" 
in the documentation. I have tried calling the URL /admin/health against the 
instance in the Docker container, but it returns a 404. I would appreciate any 
hint on how to set up the health check.

> Introduce check to determine "liveliness" of a Solr node
> 
>
> Key: SOLR-13055
> URL: https://issues.apache.org/jira/browse/SOLR-13055
> Project: Solr
>  Issue Type: Improvement
>Reporter: Amrit Sarkar
>Priority: Minor
>  Labels: cloud, native
>
> As applications are becoming cloud-friendly, there are multiple probes 
> required to verify the availability of a node.
> In Kubernetes, for example, we need the 'liveness' and 'readiness' probes explained in 
> https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
>  to determine if a node is live and ready to serve live traffic.
> Solr should also support such probes out of the box, as an API or otherwise, to 
> make things easier. In this JIRA, we are tracking the necessary checks 
> needed to determine the 'liveliness' of a node, in all modes, standalone and 
> cloud.
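For illustration, a probe of this kind boils down to a cheap HTTP check; a hedged sketch against the per-core ping handler that exists today (the core name is an assumption, and the issue asks for a node-level equivalent):

{code}
$ curl -f 'http://localhost:8983/solr/techproducts/admin/ping?wt=json'
{code}

curl's -f flag makes the command exit non-zero on an HTTP error, which is the contract a Kubernetes probe needs.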






[jira] [Comment Edited] (SOLR-15056) CPU circuit breaker needs to use CPU utilization, not Unix load average

2021-01-11 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262521#comment-17262521
 ] 

Andrzej Bialecki edited comment on SOLR-15056 at 1/11/21, 9:36 AM:
---

The equivalent Java code would be something like this:

{code}
coreContainer
  .getSolrMetricManager()
  .registry("solr.jvm")
  .getMetrics()
  .get("os.systemCpuLoad")
{code}


was (Author: ab):
The equivalent Java code would be something like this:

{code}
corerContainer
  .getSolrMetricManager()
  .registry("solr.jvm")
  .getMetrics()
  .get("os.systemCpuLoad")
{code}

> CPU circuit breaker needs to use CPU utilization, not Unix load average
> ---
>
> Key: SOLR-15056
> URL: https://issues.apache.org/jira/browse/SOLR-15056
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 8.7
>Reporter: Walter Underwood
>Priority: Major
>
> The config range, 50% to 95%, assumes that the circuit breaker is triggered 
> by a CPU utilization metric that goes from 0% to 100%. But the code uses the 
> metric OperatingSystemMXBean.getSystemLoadAverage(). That is an average of 
> the count of processes waiting to run. It is effectively unbounded. I've seen 
> it as high as 50 to 100. It is not bound by 1.0 (100%).
> A good limit for load average would need to be aware of the number of CPUs 
> available to the JVM. A load average of 8 is no problem for a 32 CPU host. It 
> is a critical situation for a 2 CPU host.
> Also, load average is a Unix OS metric. I don't know if it is even available 
> on Windows.
> Instead, use a CPU utilization metric that goes from 0.0 to 1.0. A good 
> choice is OperatingSystemMXBean.getSystemCpuLoad(). This name also uses 
> "load", but it is a usage metric.
> From the Javadoc:
> > Returns the "recent cpu usage" for the whole system. This value is a double 
> >in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle 
> >during the recent period of time observed, while a value of 1.0 means that 
> >all CPUs were actively running 100% of the time during the recent period 
> >being observed. All values betweens 0.0 and 1.0 are possible depending of 
> >the activities going on in the system. If the system recent cpu usage is not 
> >available, the method returns a negative value.
> https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getSystemCpuLoad()
> Also update the documentation to explain which JMX metrics are used for the 
> memory and CPU circuit breakers.
>  
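For reference, a minimal sketch of reading both metrics straight off the MXBean discussed above (assumes the com.sun.management extension is available, as on HotSpot JVMs):

{code}
import java.lang.management.ManagementFactory;

// The com.sun.management variant of OperatingSystemMXBean exposes the
// bounded CPU usage metric quoted above; the unbounded load average that
// the circuit breaker currently uses lives on the same bean.
com.sun.management.OperatingSystemMXBean os =
    (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
double cpu = os.getSystemCpuLoad();       // in [0.0, 1.0]; negative if unavailable
double load = os.getSystemLoadAverage();  // run-queue average, effectively unbounded
{code}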






[jira] [Commented] (SOLR-15056) CPU circuit breaker needs to use CPU utilization, not Unix load average

2021-01-11 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262521#comment-17262521
 ] 

Andrzej Bialecki commented on SOLR-15056:
-

The equivalent Java code would be something like this:

{code}
corerContainer
  .getSolrMetricManager()
  .registry("solr.jvm")
  .getMetrics()
  .get("os.systemCpuLoad")
{code}

> CPU circuit breaker needs to use CPU utilization, not Unix load average
> ---
>
> Key: SOLR-15056
> URL: https://issues.apache.org/jira/browse/SOLR-15056
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 8.7
>Reporter: Walter Underwood
>Priority: Major
>
> The config range, 50% to 95%, assumes that the circuit breaker is triggered 
> by a CPU utilization metric that goes from 0% to 100%. But the code uses the 
> metric OperatingSystemMXBean.getSystemLoadAverage(). That is an average of 
> the count of processes waiting to run. It is effectively unbounded. I've seen 
> it as high as 50 to 100. It is not bound by 1.0 (100%).
> A good limit for load average would need to be aware of the number of CPUs 
> available to the JVM. A load average of 8 is no problem for a 32 CPU host. It 
> is a critical situation for a 2 CPU host.
> Also, load average is a Unix OS metric. I don't know if it is even available 
> on Windows.
> Instead, use a CPU utilization metric that goes from 0.0 to 1.0. A good 
> choice is OperatingSystemMXBean.getSystemCpuLoad(). This name also uses 
> "load", but it is a usage metric.
> From the Javadoc:
> > Returns the "recent cpu usage" for the whole system. This value is a double 
> >in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle 
> >during the recent period of time observed, while a value of 1.0 means that 
> >all CPUs were actively running 100% of the time during the recent period 
> >being observed. All values betweens 0.0 and 1.0 are possible depending of 
> >the activities going on in the system. If the system recent cpu usage is not 
> >available, the method returns a negative value.
> https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getSystemCpuLoad()
> Also update the documentation to explain which JMX metrics are used for the 
> memory and CPU circuit breakers.
>  






[jira] [Comment Edited] (SOLR-15056) CPU circuit breaker needs to use CPU utilization, not Unix load average

2021-01-11 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262468#comment-17262468
 ] 

Andrzej Bialecki edited comment on SOLR-15056 at 1/11/21, 8:12 AM:
---

{{systemCpuLoad}} is already supported and returned as one of the metrics. This 
comes from the (somewhat convoluted) code in {{MetricUtils.addMxBeanMetrics}} 
where it tries to use all known implementations and accumulates any unique bean 
properties that they expose.

For example:
{code}
http://localhost:8983/solr/admin/metrics?group=jvm&prefix=os

{
  "responseHeader": {
    "status": 0,
    "QTime": 1
  },
  "metrics": {
    "solr.jvm": {
      "os.arch": "x86_64",
      "os.availableProcessors": 12,
      "os.committedVirtualMemorySize": 8402419712,
      "os.freePhysicalMemorySize": 41504768,
      "os.freeSwapSpaceSize": 804519936,
      "os.maxFileDescriptorCount": 8192,
      "os.name": "Mac OS X",
      "os.openFileDescriptorCount": 195,
      "os.processCpuLoad": 0.0017402379609634876,
      "os.processCpuTime": 1049201,
      "os.systemCpuLoad": 0.1268950796343933,
      "os.systemLoadAverage": 4.00439453125,
      "os.totalPhysicalMemorySize": 34359738368,
      "os.totalSwapSpaceSize": 7516192768,
      "os.version": "10.16"
    }
  }
}
{code}


was (Author: ab):
{{systtemCpuLoad}} is already supported and returned as one of the metrics. 
This comes from the (somewhat convoluted) code in 
{{MetricUtils.addMxBeanMetrics}} where it tries to use all known 
implementations and accumulates any unique bean properties that they expose.

For example:
{code}
http://localhost:8983/solr/admin/metrics?group=jvm&prefix=os

{
  "responseHeader": {
    "status": 0,
    "QTime": 1
  },
  "metrics": {
    "solr.jvm": {
      "os.arch": "x86_64",
      "os.availableProcessors": 12,
      "os.committedVirtualMemorySize": 8402419712,
      "os.freePhysicalMemorySize": 41504768,
      "os.freeSwapSpaceSize": 804519936,
      "os.maxFileDescriptorCount": 8192,
      "os.name": "Mac OS X",
      "os.openFileDescriptorCount": 195,
      "os.processCpuLoad": 0.0017402379609634876,
      "os.processCpuTime": 1049201,
      "os.systemCpuLoad": 0.1268950796343933,
      "os.systemLoadAverage": 4.00439453125,
      "os.totalPhysicalMemorySize": 34359738368,
      "os.totalSwapSpaceSize": 7516192768,
      "os.version": "10.16"
    }
  }
}
{code}

> CPU circuit breaker needs to use CPU utilization, not Unix load average
> ---
>
> Key: SOLR-15056
> URL: https://issues.apache.org/jira/browse/SOLR-15056
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 8.7
>Reporter: Walter Underwood
>Priority: Major
>
> The config range, 50% to 95%, assumes that the circuit breaker is triggered 
> by a CPU utilization metric that goes from 0% to 100%. But the code uses the 
> metric OperatingSystemMXBean.getSystemLoadAverage(). That is an average of 
> the count of processes waiting to run. It is effectively unbounded. I've seen 
> it as high as 50 to 100. It is not bound by 1.0 (100%).
> A good limit for load average would need to be aware of the number of CPUs 
> available to the JVM. A load average of 8 is no problem for a 32 CPU host. It 
> is a critical situation for a 2 CPU host.
> Also, load average is a Unix OS metric. I don't know if it is even available 
> on Windows.
> Instead, use a CPU utilization metric that goes from 0.0 to 1.0. A good 
> choice is OperatingSystemMXBean.getSystemCpuLoad(). This name also uses 
> "load", but it is a usage metric.
> From the Javadoc:
> > Returns the "recent cpu usage" for the whole system. This value is a double 
> >in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle 
> >during the recent period of time observed, while a value of 1.0 means that 
> >all CPUs were actively running 100% of the time during the recent period 
> >being observed. All values betweens 0.0 and 1.0 are possible depending of 
> >the activities going on in the system. If the system recent cpu usage is not 
> >available, the method returns a negative value.
> https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getSystemCpuLoad()
> Also update the documentation to explain which JMX metrics are used for the 
> memory and CPU circuit breakers.
>  




[jira] [Commented] (SOLR-15056) CPU circuit breaker needs to use CPU utilization, not Unix load average

2021-01-11 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262468#comment-17262468
 ] 

Andrzej Bialecki commented on SOLR-15056:
-

{{systtemCpuLoad}} is already supported and returned as one of the metrics. 
This comes from the (somewhat convoluted) code in 
{{MetricUtils.addMxBeanMetrics}} where it tries to use all known 
implementations and accumulates any unique bean properties that they expose.

For example:
{code}
http://localhost:8983/solr/admin/metrics?group=jvm&prefix=os

{
  "responseHeader": {
    "status": 0,
    "QTime": 1
  },
  "metrics": {
    "solr.jvm": {
      "os.arch": "x86_64",
      "os.availableProcessors": 12,
      "os.committedVirtualMemorySize": 8402419712,
      "os.freePhysicalMemorySize": 41504768,
      "os.freeSwapSpaceSize": 804519936,
      "os.maxFileDescriptorCount": 8192,
      "os.name": "Mac OS X",
      "os.openFileDescriptorCount": 195,
      "os.processCpuLoad": 0.0017402379609634876,
      "os.processCpuTime": 1049201,
      "os.systemCpuLoad": 0.1268950796343933,
      "os.systemLoadAverage": 4.00439453125,
      "os.totalPhysicalMemorySize": 34359738368,
      "os.totalSwapSpaceSize": 7516192768,
      "os.version": "10.16"
    }
  }
}
{code}

> CPU circuit breaker needs to use CPU utilization, not Unix load average
> ---
>
> Key: SOLR-15056
> URL: https://issues.apache.org/jira/browse/SOLR-15056
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 8.7
>Reporter: Walter Underwood
>Priority: Major
>
> The config range, 50% to 95%, assumes that the circuit breaker is triggered 
> by a CPU utilization metric that goes from 0% to 100%. But the code uses the 
> metric OperatingSystemMXBean.getSystemLoadAverage(). That is an average of 
> the count of processes waiting to run. It is effectively unbounded. I've seen 
> it as high as 50 to 100. It is not bound by 1.0 (100%).
> A good limit for load average would need to be aware of the number of CPUs 
> available to the JVM. A load average of 8 is no problem for a 32 CPU host. It 
> is a critical situation for a 2 CPU host.
> Also, load average is a Unix OS metric. I don't know if it is even available 
> on Windows.
> Instead, use a CPU utilization metric that goes from 0.0 to 1.0. A good 
> choice is OperatingSystemMXBean.getSystemCpuLoad(). This name also uses 
> "load", but it is a usage metric.
> From the Javadoc:
> > Returns the "recent cpu usage" for the whole system. This value is a double 
> >in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle 
> >during the recent period of time observed, while a value of 1.0 means that 
> >all CPUs were actively running 100% of the time during the recent period 
> >being observed. All values betweens 0.0 and 1.0 are possible depending of 
> >the activities going on in the system. If the system recent cpu usage is not 
> >available, the method returns a negative value.
> https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getSystemCpuLoad()
> Also update the documentation to explain which JMX metrics are used for the 
> memory and CPU circuit breakers.
>  


