mdmarshmallow commented on a change in pull request #747:
URL: https://github.com/apache/lucene/pull/747#discussion_r830444294



##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -114,7 +116,7 @@ public FacetResult getTopChildren(int topN, String dim, 
String... path) throws I
         return null;
       }
       DimTree dimTree = state.getDimTree(dim);
-      return getPathResult(dimConfig, dim, path, pathOrd, 
dimTree.iterator(pathOrd), topN);
+      return getPathResult(dimConfig, dim, path, pathOrd, 
dimTree.iterator(pathOrd), topN, null);

Review comment:
       Small nit, but could we have an overloaded method signature here? I 
think it will make the code a bit cleaner than just passing `null`.

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -133,7 +135,7 @@ public FacetResult getTopChildren(int topN, String dim, 
String... path) throws I
         // child:
         childIt.next();
       }
-      return getPathResult(dimConfig, dim, null, dimOrd, childIt, topN);
+      return getPathResult(dimConfig, dim, null, dimOrd, childIt, topN, null);

Review comment:
       Same thing here. Actually there is a few places where `null` is just 
passed into various methods. I think new signatures excluding the 
`cacheChildOrdsResult` with comments explaining what they are would make the 
code clearer.

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -411,7 +484,109 @@ public int compare(FacetResult a, FacetResult b) {
             }
           }
         });
-
     return results;
   }
+
+  @Override
+  public List<FacetResult> getTopDims(int topNDims, int topNChildren) throws 
IOException {
+    // Creates priority queue to store top dimensions and sort by their 
aggregated values/hits and
+    // string values.
+    PriorityQueue<DimValueResult> pq =
+        new PriorityQueue<>(topNDims) {
+          @Override
+          protected boolean lessThan(DimValueResult a, DimValueResult b) {
+            if (a.value > b.value) {
+              return false;
+            } else if (a.value < b.value) {
+              return true;
+            } else {
+              return a.dim.compareTo(b.dim) > 0;
+            }
+          }
+        };
+
+    HashMap<String, ChildOrdsResult> cacheChildOrdsResult = new HashMap<>();
+    int dimCount;
+
+    for (String dim : state.getDims()) {
+      FacetsConfig.DimConfig dimConfig = stateConfig.getDimConfig(dim);
+      if (dimConfig.hierarchical) {
+        DimTree dimTree = state.getDimTree(dim);
+        int dimOrd = dimTree.dimStartOrd;
+        // get dim value
+        dimCount =
+            getDimValue(
+                dimConfig, dim, dimOrd, dimTree.iterator(), topNChildren, 
cacheChildOrdsResult);
+      } else {
+        OrdRange ordRange = state.getOrdRange(dim);
+        int dimOrd = ordRange.start;
+        PrimitiveIterator.OfInt childIt = ordRange.iterator();
+        if (dimConfig.multiValued && dimConfig.requireDimCount) {
+          // If the dim is multi-valued and requires dim counts, we know we've 
explicitly indexed
+          // the dimension and we need to skip past it so the iterator is 
positioned on the first
+          // child:
+          childIt.next();
+        }
+        dimCount = getDimValue(dimConfig, dim, dimOrd, childIt, topNChildren, 
cacheChildOrdsResult);
+      }
+
+      if (dimCount != 0) {
+        // use priority queue to store DimValueResult for topNDims
+        if (pq.size() < topNDims) {
+          pq.insertWithOverflow(new DimValueResult(dim, dimCount));
+        } else {
+          if (dimCount > pq.top().value
+              || (dimCount == pq.top().value && dim.compareTo(pq.top().dim) < 
0)) {
+            DimValueResult bottomDim = pq.pop();
+            bottomDim.dim = dim;
+            bottomDim.value = dimCount;
+            pq.insertWithOverflow(bottomDim);
+          }
+        }
+      }
+    }
+
+    // get FacetResult for topNDims
+    int resultSize = pq.size();
+    FacetResult[] results = new FacetResult[resultSize];
+
+    while (pq.size() > 0) {
+      DimValueResult dimValueResult = pq.pop();
+      FacetResult facetResult =
+          getFacetResultForDim(
+              dimValueResult.dim, topNChildren, 
cacheChildOrdsResult.get(dimValueResult.dim));
+      results[--resultSize] = facetResult;
+    }
+    return Arrays.asList(results);
+  }
+
+  /**
+   * Creates ChildOrdsResult to store dimCount, childCount, and 
TopOrdAndIntQueue q for
+   * getPathResult.

Review comment:
       The `TopOrdAndIntQueue` is a queue for the dimension's top children 
right? Maybe just include that in the comments to make it a bit more clear 
about what it is.

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -411,7 +484,109 @@ public int compare(FacetResult a, FacetResult b) {
             }
           }
         });
-
     return results;
   }
+
+  @Override
+  public List<FacetResult> getTopDims(int topNDims, int topNChildren) throws 
IOException {
+    // Creates priority queue to store top dimensions and sort by their 
aggregated values/hits and
+    // string values.
+    PriorityQueue<DimValueResult> pq =
+        new PriorityQueue<>(topNDims) {
+          @Override
+          protected boolean lessThan(DimValueResult a, DimValueResult b) {
+            if (a.value > b.value) {
+              return false;
+            } else if (a.value < b.value) {
+              return true;
+            } else {
+              return a.dim.compareTo(b.dim) > 0;
+            }
+          }
+        };
+
+    HashMap<String, ChildOrdsResult> cacheChildOrdsResult = new HashMap<>();
+    int dimCount;
+
+    for (String dim : state.getDims()) {
+      FacetsConfig.DimConfig dimConfig = stateConfig.getDimConfig(dim);
+      if (dimConfig.hierarchical) {
+        DimTree dimTree = state.getDimTree(dim);
+        int dimOrd = dimTree.dimStartOrd;
+        // get dim value
+        dimCount =
+            getDimValue(
+                dimConfig, dim, dimOrd, dimTree.iterator(), topNChildren, 
cacheChildOrdsResult);
+      } else {
+        OrdRange ordRange = state.getOrdRange(dim);
+        int dimOrd = ordRange.start;
+        PrimitiveIterator.OfInt childIt = ordRange.iterator();
+        if (dimConfig.multiValued && dimConfig.requireDimCount) {
+          // If the dim is multi-valued and requires dim counts, we know we've 
explicitly indexed
+          // the dimension and we need to skip past it so the iterator is 
positioned on the first
+          // child:
+          childIt.next();
+        }
+        dimCount = getDimValue(dimConfig, dim, dimOrd, childIt, topNChildren, 
cacheChildOrdsResult);
+      }
+
+      if (dimCount != 0) {
+        // use priority queue to store DimValueResult for topNDims
+        if (pq.size() < topNDims) {
+          pq.insertWithOverflow(new DimValueResult(dim, dimCount));
+        } else {
+          if (dimCount > pq.top().value
+              || (dimCount == pq.top().value && dim.compareTo(pq.top().dim) < 
0)) {
+            DimValueResult bottomDim = pq.pop();
+            bottomDim.dim = dim;
+            bottomDim.value = dimCount;
+            pq.insertWithOverflow(bottomDim);
+          }
+        }
+      }
+    }
+
+    // get FacetResult for topNDims
+    int resultSize = pq.size();
+    FacetResult[] results = new FacetResult[resultSize];
+
+    while (pq.size() > 0) {
+      DimValueResult dimValueResult = pq.pop();
+      FacetResult facetResult =
+          getFacetResultForDim(
+              dimValueResult.dim, topNChildren, 
cacheChildOrdsResult.get(dimValueResult.dim));
+      results[--resultSize] = facetResult;

Review comment:
       Can we put `resultSize--` in the previous line and then just pass in 
`resultSize` here. I've gotten some comments before about inline pre-increments 
like this not being the most readable, so figured I'd just let you know.

##########
File path: 
lucene/facet/src/test/org/apache/lucene/facet/sortedset/TestSortedSetDocValuesFacets.java
##########
@@ -1221,6 +1326,12 @@ public void testRandomHierarchicalFlatMix() throws 
Exception {
 
               assertEquals(expectedAllDims, actualAllDims);
 
+              // test getTopDims(1, 10)
+              if (actualAllDims.size() > 0) {
+                List<FacetResult> topDimsResults1 = facets.getTopDims(1, 10);

Review comment:
       Since this unit test is pretty thorough with what it checks, I think you 
should just `facets.getTopDims(n, 10)` where `n` goes from 1 to 
`actualAllDims.size()`.

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -411,7 +484,109 @@ public int compare(FacetResult a, FacetResult b) {
             }
           }
         });
-
     return results;
   }
+
+  @Override
+  public List<FacetResult> getTopDims(int topNDims, int topNChildren) throws 
IOException {
+    // Creates priority queue to store top dimensions and sort by their 
aggregated values/hits and
+    // string values.
+    PriorityQueue<DimValueResult> pq =
+        new PriorityQueue<>(topNDims) {
+          @Override
+          protected boolean lessThan(DimValueResult a, DimValueResult b) {
+            if (a.value > b.value) {
+              return false;
+            } else if (a.value < b.value) {
+              return true;
+            } else {
+              return a.dim.compareTo(b.dim) > 0;
+            }
+          }
+        };
+
+    HashMap<String, ChildOrdsResult> cacheChildOrdsResult = new HashMap<>();

Review comment:
       Another nit: Maybe change this name to `dimToChildOrdsResult` or 
something like that? Calling it a cache was a bit confusing to me. Ignore if 
you think the name makes sense though.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to