Re: [PR] Multi stage stats [pinot]

via GitHub Mon, 29 Apr 2024 14:44:05 -0700


Jackie-Jiang commented on code in PR #12704:
URL: https://github.com/apache/pinot/pull/12704#discussion_r1583818070



##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/LeafStageTransferableBlockOperator.java:
##########
@@ -419,4 +550,122 @@ public void send(BaseResultsBlock block)
       addResultsBlock(block);
     }
   }
+
+  public enum StatKey implements StatMap.Key {
+    TABLE(StatMap.Type.STRING),
+    EXECUTION_TIME_MS(StatMap.Type.LONG, null, 
DataTable.MetadataKey.TIME_USED_MS) {
+      @Override
+      public boolean includeDefaultInJson() {
+        return true;
+      }
+    },
+    EMITTED_ROWS(StatMap.Type.LONG, null, DataTable.MetadataKey.NUM_ROWS) {
+      @Override
+      public boolean includeDefaultInJson() {
+        return true;
+      }
+    },
+    NUM_DOCS_SCANNED(StatMap.Type.LONG),
+    NUM_ENTRIES_SCANNED_IN_FILTER(StatMap.Type.LONG),
+    NUM_ENTRIES_SCANNED_POST_FILTER(StatMap.Type.LONG),
+    NUM_SEGMENTS_QUERIED(StatMap.Type.INT),
+    NUM_SEGMENTS_PROCESSED(StatMap.Type.INT),
+    NUM_SEGMENTS_MATCHED(StatMap.Type.INT),
+    NUM_CONSUMING_SEGMENTS_QUERIED(StatMap.Type.INT),
+    // the timestamp indicating the freshness of the data queried in consuming 
segments.
+    // This can be ingestion timestamp if provided by the stream, or the last 
index time
+    MIN_CONSUMING_FRESHNESS_TIME_MS(StatMap.Type.LONG) {
+      @Override
+      public long merge(long value1, long value2) {
+        return StatMap.Key.minPositive(value1, value2);
+      }
+    },
+    TOTAL_DOCS(StatMap.Type.LONG),
+    NUM_GROUPS_LIMIT_REACHED(StatMap.Type.BOOLEAN),
+    //TRACE_INFO(StatMap.Type.STRING),
+    //REQUEST_ID(StatMap.Type.LONG),
+    NUM_RESIZES(StatMap.Type.INT),
+    RESIZE_TIME_MS(StatMap.Type.LONG),
+    THREAD_CPU_TIME_NS(StatMap.Type.LONG),
+    SYSTEM_ACTIVITIES_CPU_TIME_NS(StatMap.Type.LONG),
+    RESPONSE_SER_CPU_TIME_NS(StatMap.Type.LONG, 
"responseSerializationCpuTimeNs"),
+    NUM_SEGMENTS_PRUNED_BY_SERVER(StatMap.Type.INT),
+    NUM_SEGMENTS_PRUNED_INVALID(StatMap.Type.INT),
+    NUM_SEGMENTS_PRUNED_BY_LIMIT(StatMap.Type.INT),
+    NUM_SEGMENTS_PRUNED_BY_VALUE(StatMap.Type.INT),
+    //EXPLAIN_PLAN_NUM_EMPTY_FILTER_SEGMENTS(StatMap.Type.INT),
+    //EXPLAIN_PLAN_NUM_MATCH_ALL_FILTER_SEGMENTS(StatMap.Type.INT),
+    NUM_CONSUMING_SEGMENTS_PROCESSED(StatMap.Type.INT),
+    NUM_CONSUMING_SEGMENTS_MATCHED(StatMap.Type.INT),
+    NUM_BLOCKS(StatMap.Type.INT),
+    OPERATOR_EXECUTION_TIME_MS(StatMap.Type.LONG),
+    OPERATOR_ID(StatMap.Type.STRING),
+    OPERATOR_EXEC_START_TIME_MS(StatMap.Type.LONG) {
+      @Override
+      public long merge(long value1, long value2) {
+        return StatMap.Key.minPositive(value1, value2);
+      }
+    },
+    OPERATOR_EXEC_END_TIME_MS(StatMap.Type.LONG) {
+      @Override
+      public long merge(long value1, long value2) {
+        return Math.max(value1, value2);
+      }
+    },;
+    private final StatMap.Type _type;
+    @Nullable
+    private final DataTable.MetadataKey _v1Key;
+    private final String _statName;
+
+    StatKey(StatMap.Type type) {
+      this(type, null);
+    }
+
+    StatKey(StatMap.Type type, @Nullable String statName) {
+      _type = type;
+      _statName = statName == null ? StatMap.getDefaultStatName(this) : 
statName;
+      _v1Key = DataTable.MetadataKey.getByName(getStatName());
+    }
+
+    StatKey(StatMap.Type type, @Nullable String statName, @Nullable 
DataTable.MetadataKey v1Key) {
+      _type = type;
+      _statName = statName == null ? StatMap.getDefaultStatName(this) : 
statName;
+      _v1Key = v1Key == null ? DataTable.MetadataKey.getByName(getStatName()) 
: v1Key;
+    }
+
+    @Override
+    public String getStatName() {
+      return _statName;
+    }
+
+    @Override
+    public StatMap.Type getType() {
+      return _type;
+    }
+
+    public void updateV1Metadata(StatMap<DataTable.MetadataKey> oldMetadata, 
StatMap<StatKey> stats) {

Review Comment:
   Let me try to explain why I don't like having `MetadataKey` implementing 
`StatMap.Key`. To me `MetadataKey` is the stats key for `v1` engine, and some 
stats don't apply to `v2` engine. Thus I would model `StatMap.Key` as the stats 
key for `v2` engine, which is logically separated from `v1` engine stats. 
Making `MetadataKey` implementing `StatMap.Key` is essentially declaring `v1` 
engine stats part of `v2` stats, which is a coupling of `v1` and `v2`.
   From implementation perspective, based on my code reading (correct me if I 
missed some important steps), for multi-stage engine, on the broker side, we 
only get the stats for leaf stage operator. Then we convert the v2 stats into 
v1 stats (`MultiStageOperator.Type.LEAF.mergeInto()`), then finally call 
`BrokerResponseNativeV2.addServerStats()` to add v1 stats into v2 response. I 
don't see other usage of `MetadataKey` within v2 engine flow, and I don't 
really see the benefits of this back and forth conversion. If all the v1 stats 
are from v2 leaf stage stats, we should be able to directly use leaf stage 
stats to serve the stats within v2 response.
   Let's bring this offline to ensure we are on the same page and I'm not 
missing important point.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Re: [PR] Multi stage stats [pinot]

Reply via email to