[GitHub] [lucene] mdmarshmallow commented on a change in pull request #509: LUCENE-10250: Add support for arbitrary length hierarchical SSDV facets

GitBox Fri, 07 Jan 2022 16:18:29 -0800


mdmarshmallow commented on a change in pull request #509:
URL: https://github.com/apache/lucene/pull/509#discussion_r780593368




##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetField.java
##########
@@ -40,20 +41,33 @@
   /** Dimension. */
   public final String dim;
 
-  /** Label. */
-  public final String label;
+  /** Path. */
+  public final String[] path;
+
+  /** String form of path. */

Review comment:
       I'll just use this comment.

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/FacetsConfig.java
##########
@@ -471,19 +471,36 @@ private void indexDrillDownTerms(
 
   private void processSSDVFacetFields(
       Map<String, List<SortedSetDocValuesFacetField>> byField, Document doc) {
+
     for (Map.Entry<String, List<SortedSetDocValuesFacetField>> ent : 
byField.entrySet()) {
 
       String indexFieldName = ent.getKey();
 
       for (SortedSetDocValuesFacetField facetField : ent.getValue()) {
-        FacetLabel facetLabel = new FacetLabel(facetField.dim, 
facetField.label);
-        String fullPath = pathToString(facetLabel.components, 
facetLabel.length);
-
-        // For facet counts:
-        doc.add(new SortedSetDocValuesField(indexFieldName, new 
BytesRef(fullPath)));
-
+        FacetLabel facetLabel = new FacetLabel(facetField.dim, 
facetField.path);
+        DimConfig dimConfig = getDimConfig(facetField.dim);
+        if (dimConfig.hierarchical) {
+          for (int i = 0; i < facetLabel.length; i++) {
+            String fullPath = pathToString(facetLabel.components, i + 1);
+            // For facet counts:
+            doc.add(new SortedSetDocValuesField(indexFieldName, new 
BytesRef(fullPath)));
+          }
+          // For drill-down:

Review comment:
       Removed

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -317,10 +345,19 @@ public Number getSpecificValue(String dim, String... 
path) throws IOException {
   public List<FacetResult> getAllDims(int topN) throws IOException {
 
     List<FacetResult> results = new ArrayList<>();
-    for (Map.Entry<String, OrdRange> ent : 
state.getPrefixToOrdRange().entrySet()) {
-      FacetResult fr = getDim(ent.getKey(), ent.getValue(), topN);
-      if (fr != null) {
-        results.add(fr);
+    for (String dim : state.getDims()) {
+      if (stateConfig.getDimConfig(dim).hierarchical) {
+        DimTree dimTree = state.getDimTree(dim);
+        FacetResult fr = getDim(dim, emptyPath, dimTree.dimStartOrd, 
dimTree.iterator(), topN);

Review comment:
       Hmm my reasoning with `dimStartOrd` was that it is the first ("start") 
ord in the dimension represented by the `DimTree`. Though I can also see your 
reasoning for calling it `dimOrd` in that it is ord of the dimension. Maybe 
just `firstOrd` or `startOrd` might be more appropriate? 

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesReaderState.java
##########
@@ -62,15 +166,28 @@ protected SortedSetDocValuesReaderState() {}
   /** Indexed field we are reading. */
   public abstract String getField();
 
+  /** Returns top-level index reader. */
+  public abstract IndexReader getReader();
+
+  /** Number of unique labels. */
+  public abstract int getSize();
+
+  /** Returns the associated facet config. */
+  public abstract FacetsConfig getFacetConfig();

Review comment:
       No worries. In fact, thanks for reviewing this so thoroughly :)

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/DefaultSortedSetDocValuesReaderState.java
##########
@@ -51,20 +55,42 @@
 
   private final Map<String, OrdinalMap> cachedOrdMaps = new HashMap<>();
 
+  private final FacetsConfig config;
+
+  /** Used for hierarchical dims. */
+  private final Map<String, DimTree> prefixToDimTree = new HashMap<>();
+
+  /** Used for flat dims. */
   private final Map<String, OrdRange> prefixToOrdRange = new HashMap<>();
 
   /**
-   * Creates this, pulling doc values from the default {@link
+   * Creates this with a config, pulling doc values from the default {@link
+   * FacetsConfig#DEFAULT_INDEX_FIELD_NAME}.
+   */
+  public DefaultSortedSetDocValuesReaderState(IndexReader reader, FacetsConfig 
config)
+      throws IOException {
+    this(reader, FacetsConfig.DEFAULT_INDEX_FIELD_NAME, config);
+  }
+
+  /**
+   * Creates this without a config, pulling doc values from the default {@link
    * FacetsConfig#DEFAULT_INDEX_FIELD_NAME}.
    */
   public DefaultSortedSetDocValuesReaderState(IndexReader reader) throws 
IOException {
-    this(reader, FacetsConfig.DEFAULT_INDEX_FIELD_NAME);
+    this(reader, FacetsConfig.DEFAULT_INDEX_FIELD_NAME, null);
   }
 
-  /** Creates this, pulling doc values from the specified field. */
+  /** Creates this without a config, pulling doc values from the specified 
field. */
   public DefaultSortedSetDocValuesReaderState(IndexReader reader, String 
field) throws IOException {
+    this(reader, field, null);
+  }
+
+  /** Creates this, pulling doc values from the specified field. */
+  public DefaultSortedSetDocValuesReaderState(IndexReader reader, String 
field, FacetsConfig config)
+      throws IOException {
     this.field = field;
     this.reader = reader;
+    this.config = Objects.requireNonNullElse(config, new FacetsConfig());

Review comment:
       Yeah sure, in `SSDVFacetCounts` and `ConcurrentSSDVFacetCounts`, I have 
this line which just stores the facet config of the state:
   
   ```
   this.stateConfig = state.getFacetConfig();
   ```
   I think replacing a null with the default here would be better, because then 
we can still check if the `state` has a null config (which would mean we can 
still check if a config was provided if needed). I'll move this to the 
FacetCount constructors then.
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] mdmarshmallow commented on a change in pull request #509: LUCENE-10250: Add support for arbitrary length hierarchical SSDV facets

Reply via email to