[ https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051469#comment-17051469 ]

Chris M. Hostetter commented on SOLR-13132:
-------------------------------------------

Hey Michael,

I spent a little time looking at the PR – not an in-depth review, just poking 
around, to try to get an overall sense of the idea...

*TL;DR:* Definitely some bugs, needs tests. I'd like to help, but it's a bit 
overwhelming to try and review as a monolithic PR.
----
The first thing that jumped out at me is that there are no tests or test 
modifications – that seems concerning.

With something like this, where we're trying to optimize the behavior through 
caching, it's tempting to assume that as long as the change doesn't break 
existing tests that's "good enough" because we're not adding new 
functionality/APIs – but there are a few problems with that assumption:
 # There is actually a functionality/API change, in the form of the new 
config/heuristics/params controlling when/if to take advantage of the cache 
(and in the specific case of this PR: heuristics on when/if to do a single 
sweep). Those options/heuristics need to be tested to ensure they behave as 
expected (which can typically be done using white box inspection of cache 
metrics, using things like the /metrics API – a rough sketch of what such a 
test might look like follows this list).
 # Existing tests that were written w/o any idea that caching might be involved 
in the underlying code rarely follow usage patterns that would typically 
trigger cache hits – ie: a test that does the exact same query (or facet) twice 
in a row isn't likely to be written unless/until you know there may/should be 
a cache hit involved.
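
To be concrete, here's a rough sketch of the kind of whitebox test I have in 
mind. The test config name, field names, and the assumption that a request 
like this would actually trigger the cache (given whatever threshold/heuristics 
the PR settles on) are all placeholders on my part – I haven't run this against 
the branch...
{noformat}
import org.apache.solr.SolrTestCaseJ4;
import org.apache.solr.search.SolrCache;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.RefCounted;
import org.junit.BeforeClass;

public class TermFacetCacheWhiteboxTest extends SolrTestCaseJ4 {

  @BeforeClass
  public static void beforeTests() throws Exception {
    // hypothetical test config that declares a "termFacetCache" entry
    initCore("solrconfig-termfacetcache.xml", "schema.xml");
  }

  public void testFacetRequestPopulatesCache() throws Exception {
    assertU(adoc("id", "1", "cat_s", "A"));
    assertU(adoc("id", "2", "cat_s", "B"));
    assertU(commit());

    // a terms facet sorted by relatedness, which (presumably) should be cacheable
    final String json = "{cats:{type:terms, field:cat_s, sort:'r1 desc'," +
        " facet:{r1:'relatedness($fore,$back)'}}}";
    assertJQ(req("q", "*:*", "fore", "cat_s:A", "back", "*:*", "json.facet", json),
             "response/numFound==2");

    // whitebox: the request above should have resulted in at least one cache insert
    final RefCounted<SolrIndexSearcher> searcherRef = h.getCore().getSearcher();
    try {
      final SolrCache<?,?> cache = searcherRef.get().getCache("termFacetCache");
      assertNotNull("termFacetCache not configured in the test solrconfig?", cache);
      assertTrue("expected at least one insert into the termFacetCache", 0 < cache.size());
    } finally {
      searcherRef.decref();
    }
  }
}
{noformat}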

Because all these changes are essentially a no-op (w/o an explicit 
"termFacetCache" configured and/or an explicit query option), even if existing 
tests happened to trigger the cache (or sweep) logic, that wouldn't matter w/o 
test configuration changes.
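
(For reference: I'd expect a test solrconfig.xml to declare this the same way 
any other generic user cache is declared in the {{<query>}} section – something 
like the sketch below, which just mirrors the hack further down; the exact 
attribute set / sizes are guesses on my part...)
{noformat}
<!-- sketch: generic user cache entry for the PR's termFacetCache;
     sizes are arbitrary, the regenerator class comes from the PR -->
<cache name="termFacetCache"
       class="solr.CaffeineCache"
       size="200"
       initialSize="200"
       autowarmCount="200"
       regenerator="org.apache.solr.request.TermFacetCacheRegenerator"/>
{noformat}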

I did a quick and dirty hack, demonstrated by the little patch below, to try 
and *force* the "termFacetCache" to be enabled for all tests (and all term 
faceting, regardless of how small the count was) and was surprised to see a 
*LOT* of test errors/failures (I got 63 just in solr core)...
{noformat}
diff --git a/solr/core/src/java/org/apache/solr/request/TermFacetCache.java b/solr/core/src/java/org/apache/solr/request/TermFacetCache.java
index 3f22449b380..b818d2723d5 100644
--- a/solr/core/src/java/org/apache/solr/request/TermFacetCache.java
+++ b/solr/core/src/java/org/apache/solr/request/TermFacetCache.java
@@ -28,7 +28,7 @@ import org.apache.solr.search.QueryResultKey;
 public class TermFacetCache {
 
   public static final String NAME = "termFacetCache";
-  public static final int DEFAULT_THRESHOLD = 5000;
+  public static final int DEFAULT_THRESHOLD = 1; // nocommit: HACK to force all tests to use cache // 5000;
 
 
   public static final class FacetCacheKey {
diff --git a/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java b/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java
index ae47aed273b..8767a8f4db2 100644
--- a/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java
+++ b/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java
@@ -282,7 +282,8 @@ public class SolrIndexSearcher extends IndexSearcher implements Closeable, SolrI
       if (documentCache != null) clist.add(documentCache);
 
       if (solrConfig.userCacheConfigs.isEmpty()) {
-        cacheMap = NO_GENERIC_CACHES;
+        // nocommit: HACK to force TermFacetCache in all testing...
+        cacheMap = new HashMap<>(); // nocommit: cacheMap = NO_GENERIC_CACHES;
       } else {
         cacheMap = new HashMap<>(solrConfig.userCacheConfigs.size());
         for (Map.Entry<String,CacheConfig> e : solrConfig.userCacheConfigs.entrySet()) {
@@ -294,6 +295,22 @@ public class SolrIndexSearcher extends IndexSearcher implements Closeable, SolrI
         }
       }
 
+      { // nocommit: HACK to force TermFacetCache in all testing...
+        final Map<String,String> nocommitArgs = new HashMap<>();
+        nocommitArgs.put("name",org.apache.solr.request.TermFacetCache.NAME);
+        nocommitArgs.put("size","200");
+        nocommitArgs.put("initialSize","200");
+        nocommitArgs.put("autowarmCount","200");
+        
+        final SolrCache nocommitCache =
+          (new CacheConfig(CaffeineCache.class,
+                           nocommitArgs,
+                           new org.apache.solr.request.TermFacetCacheRegenerator())).newInstance();
+        
+        cacheMap.put(nocommitCache.name(), nocommitCache);
+        clist.add(nocommitCache);
+      }
+
       cacheList = clist.toArray(new SolrCache[clist.size()]);
     } else {
       this.filterCache = null;
{noformat}
I didn't analyze the results in depth, but at a glance the failures seem to 
fall into a few categories:
 * term facet constraints out of order and/or unexpected counts
 ** suggesting that the cache is returning incorrect results or results for the 
wrong key?
 * ArrayIndexOutOfBoundsException in various faceting classes
 * NullPointerException from FacetCacheKey.hashCode()
 * "UnsupportedOperationException: reset not supported for cached count" in the 
new CacheUpdateCountSlotAcc

While that patch above is a _HUGE_ hack and obviously not committable, applying 
it should make it easy to spot some problems and help inspire some fixes / new 
tests.
----
While the above tests were running, before I realized how many failures there 
were, I also tried building the solr server and doing some manual testing of 
relatedness aggregations (using the example docs/request from the ref-guide: 
[https://lucene.apache.org/solr/guide/8_4/json-facet-api.html#semantic-knowledge-graph-example])
 and inspecting the cache metrics.

The first thing that jumped out at me was that while I kept seeing cache 
_lookups_ I wasn't seeing any cache _inserts_ (the hitratio was always 0).
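
(In case it's useful to anyone else poking at this: I was just watching the 
cache's entry in the metrics API between requests, roughly like the request 
below – this assumes the cache gets registered under the usual 
{{CACHE.searcher.<name>}} prefix like the built-in searcher caches.)
{noformat}
curl 'http://localhost:8983/solr/admin/metrics?group=core&prefix=CACHE.searcher.termFacetCache'
{noformat}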

Playing around with the requests/options led me to realize that the problem 
seemed to be that the fields I was faceting on didn't have docValues, leading 
me to deduce that the problem is apparently {{FacetFieldProcessorByArrayUIF}} ... 
{{FacetFieldProcessor}} has modifications to support the use of the cache in 
subclasses and allow lookups by the collectors, and 
{{FacetFieldProcessorByArrayDV}} has code to insert collected results into the 
cache, but {{FacetFieldProcessorByArrayUIF}} has no modifications at all (so 
there's no reason to expect that faceting using that processor would ever 
result in cache insertions).

Which just goes to show the importance of new (whitebox) tests asserting cache 
hit/miss metrics when expected. :)
----
From a pure code spelunking standpoint, poking around here and there, I was 
surprised to see {{QueryResultKey}} modified by this PR, since I couldn't think 
of any reason why this new facet caching should involve/affect the 
{{queryResultsCache}}.

As best I can tell, this seems to just be an attempt at code reuse? ... 
{{TermFacetCache}} needs a key that involves an (unordered) list of "filters", 
so the PR modifies {{QueryResultKey}} to try and compose it inside 
{{FacetCacheKey}}.

But in the process these changes to {{QueryResultKey}} break its primary usage 
by treating the main "q" param as just another "filter". For its _intended_ 
purpose (the {{queryResultsCache}}) the resulting sort order of the documents 
matters, so {{q=foo&fq=bar}} must *NOT* result in a cache hit for 
{{q=bar&fq=foo}} – but with the changes in the current PR this is no longer 
true, causing a test like the one below (which would currently pass on master) 
to fail...
{noformat}
diff --git a/solr/core/src/test/org/apache/solr/core/QueryResultKeyTest.java b/solr/core/src/test/org/apache/solr/core/QueryResultKeyTest.java
index dc1e48b17b3..9d627146595 100644
--- a/solr/core/src/test/org/apache/solr/core/QueryResultKeyTest.java
+++ b/solr/core/src/test/org/apache/solr/core/QueryResultKeyTest.java
@@ -106,6 +106,17 @@ public class QueryResultKeyTest extends SolrTestCaseJ4 {
 
     assertKeyNotEquals(key1, key2);
   }
+  
+  public void testSwappingQueryAndFilterIsNotEquals() {
+    final Query qx = new FlatHashTermQuery("x");
+    final Query qy = new FlatHashTermQuery("y");
+    
+    final QueryResultKey x = new QueryResultKey(qx, Arrays.asList(qy), null, 0);
+    final QueryResultKey y = new QueryResultKey(qy, Arrays.asList(qx), null, 0);
+    
+    assertKeyNotEquals(x, y);
+  }
+    
 
   public void testRandomQueryKeyEquality() {

{noformat}
I would strongly suggest reverting your changes to {{QueryResultKey}}, and 
instead refactoring the {{unorderedCompare}} logic into a helper class where it 
can be re-used by {{TermFacetCache}} (wrapping a simple {{List<Query>}} 
instead of a hacked {{QueryResultKey}}).
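
Something like the sketch below is what I have in mind – to be clear: this 
class doesn't exist anywhere, the name and shape are made up, it's just meant 
to show that an order-insensitive key over a {{List<Query>}} can live in its 
own small class that {{FacetCacheKey}} composes, without touching 
{{QueryResultKey}} at all...
{noformat}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

import org.apache.lucene.search.Query;

/** Sketch of a (hypothetical) order-insensitive cache key over a list of queries. */
public final class UnorderedQueryListKey {
  private final List<Query> queries;
  private final int hash;

  public UnorderedQueryListKey(List<Query> queries) {
    this.queries = Collections.unmodifiableList(new ArrayList<>(queries));
    int h = 0;
    for (Query q : this.queries) {
      h += q.hashCode(); // addition is commutative, so element order doesn't affect the hash
    }
    this.hash = h;
  }

  @Override
  public int hashCode() {
    return hash;
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) return true;
    if (!(other instanceof UnorderedQueryListKey)) return false;
    final UnorderedQueryListKey that = (UnorderedQueryListKey) other;
    if (this.hash != that.hash || this.queries.size() != that.queries.size()) return false;
    // multiset comparison: every query must occur the same number of times in both lists
    for (Query q : this.queries) {
      if (count(this.queries, q) != count(that.queries, q)) return false;
    }
    return true;
  }

  private static int count(List<Query> list, Query q) {
    int n = 0;
    for (Query candidate : list) {
      if (Objects.equals(candidate, q)) n++;
    }
    return n;
  }
}
{noformat}
{{FacetCacheKey}} could then hold one of these (plus whatever else it needs), 
and the existing {{queryResultsCache}} semantics stay exactly as they are.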
----
In general, to me at least, the changes in this PR feel a bit overwhelming and 
I'm having a hard time wrapping my head around them all at once.

I fully understand how tackling the "do relatedness() calculations in a single 
sweep" implementation naturally led you to "implement a termFacetCache" at the 
same time to gain more speed ups, and that you wound up doing the work together 
and it all got commingled – but from the standpoint of encouraging/facilitating 
review, feedback, and adoption (from me or from others) I would suggest 
splitting these out into distinct PRs (even if one is a dependent fork/branch 
of the other) so that it's easier for reviewers to mentally digest. As things 
stand right now I find myself getting confused as to when/why various changes 
in various classes exist: to support caching? to support sweeping? to support 
caching while sweeping?

> Improve JSON "terms" facet performance when sorted by relatedness 
> ------------------------------------------------------------------
>
>                 Key: SOLR-13132
>                 URL: https://issues.apache.org/jira/browse/SOLR-13132
>             Project: Solr
>          Issue Type: Improvement
>          Components: Facet Module
>    Affects Versions: 7.4, master (9.0)
>            Reporter: Michael Gibney
>            Priority: Major
>         Attachments: SOLR-13132-with-cache-01.patch, 
> SOLR-13132-with-cache.patch, SOLR-13132.patch
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate 
> {{relatedness}} for every term. 
> The current implementation uses a standard uninverted approach (either 
> {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain 
> base docSet, and then uses that initial pass as a pre-filter for a 
> second-pass, inverted approach of fetching docSets for each relevant term 
> (i.e., {{count > minCount}}?) and calculating intersection size of those sets 
> with the domain base docSet.
> Over high-cardinality fields, the overhead of per-term docSet creation and 
> set intersection operations increases request latency to the point where 
> relatedness sort may not be usable in practice (for my use case, even after 
> applying the patch for SOLR-13108, for a field with ~220k unique terms per 
> core, QTime for high-cardinality domain docSets were, e.g.: cardinality 
> 1816684=9000ms, cardinality 5032902=18000ms).
> The attached patch brings the above example QTimes down to a manageable 
> ~300ms and ~250ms respectively. The approach calculates uninverted facet 
> counts over domain base, foreground, and background docSets in parallel in a 
> single pass. This allows us to take advantage of the efficiencies built into 
> the standard uninverted {{FacetFieldProcessorByArray[DV|UIF]}}, and avoids 
> the per-term docSet creation and set intersection overhead.


