[GitHub] [spark] HeartSaVioR commented on a change in pull request #30336: [SPARK-33287][SS][UI]Expose state custom metrics information on SS UI

GitBox Mon, 16 Nov 2020 18:42:22 -0800


HeartSaVioR commented on a change in pull request #30336:
URL: https://github.com/apache/spark/pull/30336#discussion_r524844943




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -199,49 +201,99 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
           "records")
       graphUIDataForNumRowsDroppedByWatermark.generateDataJs(jsCollector)
 
-      // scalastyle:off
-      <tr>
-        <td style="vertical-align: middle;">
-          <div style="width: 160px;">
-            <div><strong>Aggregated Number Of Total State Rows {SparkUIUtils.tooltip("Aggregated number of total state rows.", "right")}</strong></div>
-          </div>
-        </td>
-        <td class={"aggregated-num-total-state-rows-timeline"}>{graphUIDataForNumberTotalRows.generateTimelineHtml(jsCollector)}</td>
-        <td class={"aggregated-num-total-state-rows-histogram"}>{graphUIDataForNumberTotalRows.generateHistogramHtml(jsCollector)}</td>
-      </tr>
-        <tr>
-          <td style="vertical-align: middle;">
-            <div style="width: 160px;">
-              <div><strong>Aggregated Number Of Updated State Rows {SparkUIUtils.tooltip("Aggregated number of updated state rows.", "right")}</strong></div>
-            </div>
-          </td>
-          <td class={"aggregated-num-updated-state-rows-timeline"}>{graphUIDataForNumberUpdatedRows.generateTimelineHtml(jsCollector)}</td>
-          <td class={"aggregated-num-updated-state-rows-histogram"}>{graphUIDataForNumberUpdatedRows.generateHistogramHtml(jsCollector)}</td>
-        </tr>
-        <tr>
-          <td style="vertical-align: middle;">
-            <div style="width: 160px;">
-              <div><strong>Aggregated State Memory Used In Bytes {SparkUIUtils.tooltip("Aggregated state memory used in bytes.", "right")}</strong></div>
-            </div>
-          </td>
-          <td class={"aggregated-state-memory-used-bytes-timeline"}>{graphUIDataForMemoryUsedBytes.generateTimelineHtml(jsCollector)}</td>
-          <td class={"aggregated-state-memory-used-bytes-histogram"}>{graphUIDataForMemoryUsedBytes.generateHistogramHtml(jsCollector)}</td>
-        </tr>
+      val result =
+        // scalastyle:off
         <tr>
           <td style="vertical-align: middle;">
             <div style="width: 160px;">
-              <div><strong>Aggregated Number Of State Rows Dropped By Watermark {SparkUIUtils.tooltip("Aggregated number of state rows dropped by watermark.", "right")}</strong></div>
+              <div><strong>Aggregated Number Of Total State Rows {SparkUIUtils.tooltip("Aggregated number of total state rows.", "right")}</strong></div>
             </div>
           </td>
-          <td class={"aggregated-num-state-rows-dropped-by-watermark-timeline"}>{graphUIDataForNumRowsDroppedByWatermark.generateTimelineHtml(jsCollector)}</td>
-          <td class={"aggregated-num-state-rows-dropped-by-watermark-histogram"}>{graphUIDataForNumRowsDroppedByWatermark.generateHistogramHtml(jsCollector)}</td>
+          <td class={"aggregated-num-total-state-rows-timeline"}>{graphUIDataForNumberTotalRows.generateTimelineHtml(jsCollector)}</td>
+          <td class={"aggregated-num-total-state-rows-histogram"}>{graphUIDataForNumberTotalRows.generateHistogramHtml(jsCollector)}</td>
         </tr>
-      // scalastyle:on
+          <tr>
+            <td style="vertical-align: middle;">
+              <div style="width: 160px;">
+                <div><strong>Aggregated Number Of Updated State Rows {SparkUIUtils.tooltip("Aggregated number of updated state rows.", "right")}</strong></div>
+              </div>
+            </td>
+            <td class={"aggregated-num-updated-state-rows-timeline"}>{graphUIDataForNumberUpdatedRows.generateTimelineHtml(jsCollector)}</td>
+            <td class={"aggregated-num-updated-state-rows-histogram"}>{graphUIDataForNumberUpdatedRows.generateHistogramHtml(jsCollector)}</td>
+          </tr>
+          <tr>
+            <td style="vertical-align: middle;">
+              <div style="width: 160px;">
+                <div><strong>Aggregated State Memory Used In Bytes {SparkUIUtils.tooltip("Aggregated state memory used in bytes.", "right")}</strong></div>
+              </div>
+            </td>
+            <td class={"aggregated-state-memory-used-bytes-timeline"}>{graphUIDataForMemoryUsedBytes.generateTimelineHtml(jsCollector)}</td>
+            <td class={"aggregated-state-memory-used-bytes-histogram"}>{graphUIDataForMemoryUsedBytes.generateHistogramHtml(jsCollector)}</td>
+          </tr>
+          <tr>
+            <td style="vertical-align: middle;">
+              <div style="width: 160px;">
+                <div><strong>Aggregated Number Of State Rows Dropped By Watermark {SparkUIUtils.tooltip("Aggregated number of state rows dropped by watermark.", "right")}</strong></div>
+              </div>
+            </td>
+            <td class={"aggregated-num-state-rows-dropped-by-watermark-timeline"}>{graphUIDataForNumRowsDroppedByWatermark.generateTimelineHtml(jsCollector)}</td>
+            <td class={"aggregated-num-state-rows-dropped-by-watermark-histogram"}>{graphUIDataForNumRowsDroppedByWatermark.generateHistogramHtml(jsCollector)}</td>
+          </tr>
+        // scalastyle:on
+
+      result ++= generateAggregatedCustomMetrics(query, minBatchTime, maxBatchTime, jsCollector)
+      result
     } else {
       new NodeBuffer()
     }
   }
 
+  def generateAggregatedCustomMetrics(
+     query: StreamingQueryUIData,
+     minBatchTime: Long,
+     maxBatchTime: Long,
+     jsCollector: JsCollector): NodeBuffer = {
+    val result: NodeBuffer = new NodeBuffer
+
+    // This is made sure on caller side but put it here to be defensive
+    require(query.lastProgress.stateOperators.nonEmpty)
+    val enabledCustomMetrics = parent.parent.conf.get(ENABLED_STREAMING_UI_CUSTOM_METRIC_LIST)
+    logDebug(s"Enabled custom metrics: $enabledCustomMetrics")
+    query.lastProgress.stateOperators.head.customMetrics.keySet().asScala

Review comment:
       OK, you found a good point I was missing.
   
   Regarding subclassing, I don't think it's intended. Spark doesn't have any actual implementation that overrides the method, and we don't expect user code to implement `StateStoreWriter` and plug it into a physical execution node. So, as you said, it's technically possible, but it looks to be beyond the intention. (A guard seems to be missing, but I'm not sure we have been strict about adding one.)
   
   And regarding the possibility of having different custom metrics among state store provider instances, as I mentioned:
   
   
https://github.com/apache/spark/blob/5af5aa146ecbff38b809127b5eb9805441627ed2/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala#L127-L137
   
   Here the list of custom metrics is picked up, and it is populated from a dummy StateStoreProvider instance. That said, Spark "expects" the same value of `supportedCustomMetrics` across StateStoreProvider instances created from the same provider class.
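
   To illustrate that expectation, here is a minimal, self-contained sketch. It is not Spark's code: `CustomMetric`, `Provider`, `CacheCountingProvider`, the metric names, and both helper methods are hypothetical stand-ins. It only models the idea that each stateful operator asks a throwaway ("dummy") instance of the configured provider class which metrics it supports, so every operator should end up with the same key set as long as the provider returns the same list for every instance.

```scala
object ProviderMetricsSketch {
  // Hypothetical, simplified stand-in for a custom metric definition.
  final case class CustomMetric(name: String, desc: String)

  // Hypothetical stand-in for a state store provider.
  trait Provider {
    // Assumed contract: the same value for every instance of a given class.
    def supportedCustomMetrics: Seq[CustomMetric]
  }

  // Illustrative provider; the metric names are made up for this sketch.
  class CacheCountingProvider extends Provider {
    override def supportedCustomMetrics: Seq[CustomMetric] = Seq(
      CustomMetric("cacheHitCount", "count of cache hits"),
      CustomMetric("cacheMissCount", "count of cache misses"))
  }

  // Create a dummy instance only to read the list of supported metrics.
  def metricNamesFor(providerClass: Class[_ <: Provider]): Set[String] = {
    val dummy: Provider = providerClass.getDeclaredConstructor().newInstance()
    dummy.supportedCustomMetrics.map(_.name).toSet
  }

  // Each operator derives its metric names independently from its own dummy
  // instance; the first operator is representative only if all of them agree.
  def headIsRepresentative(
      providerClass: Class[_ <: Provider],
      numOperators: Int): Boolean = {
    val perOperator = Seq.fill(numOperators)(metricNamesFor(providerClass))
    perOperator.forall(_ == perOperator.head)
  }
}
```

   Under this assumption, reading the key set from the first state operator, as the diff does with `stateOperators.head`, gives the same names as any other operator. A provider whose `supportedCustomMetrics` varied per instance would break that, which is why the same value is "expected".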




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



