Re: [PR] Multi stage stats [pinot]

via GitHub Tue, 30 Apr 2024 02:04:17 -0700


gortiz commented on code in PR #12704:
URL: https://github.com/apache/pinot/pull/12704#discussion_r1584406372



##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/LeafStageTransferableBlockOperator.java:
##########
@@ -419,4 +550,122 @@ public void send(BaseResultsBlock block)
       addResultsBlock(block);
     }
   }
+
+  public enum StatKey implements StatMap.Key {
+    TABLE(StatMap.Type.STRING),
+    EXECUTION_TIME_MS(StatMap.Type.LONG, null, 
DataTable.MetadataKey.TIME_USED_MS) {
+      @Override
+      public boolean includeDefaultInJson() {
+        return true;
+      }
+    },
+    EMITTED_ROWS(StatMap.Type.LONG, null, DataTable.MetadataKey.NUM_ROWS) {
+      @Override
+      public boolean includeDefaultInJson() {
+        return true;
+      }
+    },
+    NUM_DOCS_SCANNED(StatMap.Type.LONG),
+    NUM_ENTRIES_SCANNED_IN_FILTER(StatMap.Type.LONG),
+    NUM_ENTRIES_SCANNED_POST_FILTER(StatMap.Type.LONG),
+    NUM_SEGMENTS_QUERIED(StatMap.Type.INT),
+    NUM_SEGMENTS_PROCESSED(StatMap.Type.INT),
+    NUM_SEGMENTS_MATCHED(StatMap.Type.INT),
+    NUM_CONSUMING_SEGMENTS_QUERIED(StatMap.Type.INT),
+    // the timestamp indicating the freshness of the data queried in consuming 
segments.
+    // This can be ingestion timestamp if provided by the stream, or the last 
index time
+    MIN_CONSUMING_FRESHNESS_TIME_MS(StatMap.Type.LONG) {
+      @Override
+      public long merge(long value1, long value2) {
+        return StatMap.Key.minPositive(value1, value2);
+      }
+    },
+    TOTAL_DOCS(StatMap.Type.LONG),
+    NUM_GROUPS_LIMIT_REACHED(StatMap.Type.BOOLEAN),
+    //TRACE_INFO(StatMap.Type.STRING),
+    //REQUEST_ID(StatMap.Type.LONG),
+    NUM_RESIZES(StatMap.Type.INT),
+    RESIZE_TIME_MS(StatMap.Type.LONG),
+    THREAD_CPU_TIME_NS(StatMap.Type.LONG),
+    SYSTEM_ACTIVITIES_CPU_TIME_NS(StatMap.Type.LONG),
+    RESPONSE_SER_CPU_TIME_NS(StatMap.Type.LONG, 
"responseSerializationCpuTimeNs"),
+    NUM_SEGMENTS_PRUNED_BY_SERVER(StatMap.Type.INT),
+    NUM_SEGMENTS_PRUNED_INVALID(StatMap.Type.INT),
+    NUM_SEGMENTS_PRUNED_BY_LIMIT(StatMap.Type.INT),
+    NUM_SEGMENTS_PRUNED_BY_VALUE(StatMap.Type.INT),
+    //EXPLAIN_PLAN_NUM_EMPTY_FILTER_SEGMENTS(StatMap.Type.INT),
+    //EXPLAIN_PLAN_NUM_MATCH_ALL_FILTER_SEGMENTS(StatMap.Type.INT),
+    NUM_CONSUMING_SEGMENTS_PROCESSED(StatMap.Type.INT),
+    NUM_CONSUMING_SEGMENTS_MATCHED(StatMap.Type.INT),
+    NUM_BLOCKS(StatMap.Type.INT),
+    OPERATOR_EXECUTION_TIME_MS(StatMap.Type.LONG),
+    OPERATOR_ID(StatMap.Type.STRING),
+    OPERATOR_EXEC_START_TIME_MS(StatMap.Type.LONG) {
+      @Override
+      public long merge(long value1, long value2) {
+        return StatMap.Key.minPositive(value1, value2);
+      }
+    },
+    OPERATOR_EXEC_END_TIME_MS(StatMap.Type.LONG) {
+      @Override
+      public long merge(long value1, long value2) {
+        return Math.max(value1, value2);
+      }
+    },;
+    private final StatMap.Type _type;
+    @Nullable
+    private final DataTable.MetadataKey _v1Key;
+    private final String _statName;
+
+    StatKey(StatMap.Type type) {
+      this(type, null);
+    }
+
+    StatKey(StatMap.Type type, @Nullable String statName) {
+      _type = type;
+      _statName = statName == null ? StatMap.getDefaultStatName(this) : 
statName;
+      _v1Key = DataTable.MetadataKey.getByName(getStatName());
+    }
+
+    StatKey(StatMap.Type type, @Nullable String statName, @Nullable 
DataTable.MetadataKey v1Key) {
+      _type = type;
+      _statName = statName == null ? StatMap.getDefaultStatName(this) : 
statName;
+      _v1Key = v1Key == null ? DataTable.MetadataKey.getByName(getStatName()) 
: v1Key;
+    }
+
+    @Override
+    public String getStatName() {
+      return _statName;
+    }
+
+    @Override
+    public StatMap.Type getType() {
+      return _type;
+    }
+
+    public void updateV1Metadata(StatMap<DataTable.MetadataKey> oldMetadata, 
StatMap<StatKey> stats) {

Review Comment:
   > Thus I would model StatMap.Key as the stats key for v2 engine, which is 
logically separated from v1 engine stats. 
   
   I think that is a difference we have in our models. In my model 
`StatMap.Key` is not related to V2 engine. It is just an efficient map from 
enums to primitives that we designed to be used in V2, but could be used in 
other places.
   
   > From implementation perspective, based on my code reading (correct me if I 
missed some important steps), for multi-stage engine, on the broker side, we 
only get the stats for leaf stage operator. 
   
   I think the correct formulation would be *for multi-stage engine, on the 
broker side, we **mostly** get the stats for leaf stage operator*. There are 
some metrics, like `maxRowsInOperator`, `maxRowsInJoinReached` or 
`numGroupsLimitReached` that are collected from other multi-stage operators 
different than leaf stage. For example `BrokerNative.numGroupsLimitReached` is 
true if any aggregation in the tree has reached the group limits. In V1 that is 
the only info we have. In V2 we can iterate over `statsMap` and get which 
specific group by is the one that reached the limit.
   
   Anyway, yesterday I modified the code so MetadataKey does not implement 
`StatMap.Key`. Instead I created a new enum with broker stats.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Re: [PR] Multi stage stats [pinot]

Reply via email to