Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21721
  
    I spent a few more hours looking at how the SQL UI can update metrics 
information before a task ends, and I now think I understand @cloud-fan's 
concern here.
    
    This is different from how we allow custom metrics in StateStore. Every SQL 
metric, including the custom metrics in StateStore, is an accumulator, so it is 
propagated by the executor heartbeat (honestly, I hadn't noticed that; my bad) 
and the UI updates from those values. A StateStore's custom metrics are only 
updated when the state operation finishes for each partition, but they are 
exposed to the SQL UI anyway, and the values update dynamically in the UI (that 
is, they can change even while a batch is still running).
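    To make the StateStore case concrete, here is roughly the shape in which a 
state operator's custom metrics surface in a StreamingQueryProgress payload. 
This is an illustrative sketch only; the custom-metric names below are 
hypothetical examples, not an exact payload emitted by Spark:

```python
import json

# Illustrative shape only: the custom-metric names below are hypothetical
# examples, not an exact payload emitted by Spark.
progress = {
    "batchId": 42,
    "stateOperators": [
        {
            # Built-in state metrics reported per state operator.
            "numRowsTotal": 1000,
            "numRowsUpdated": 50,
            "memoryUsedBytes": 65536,
            # Custom metrics registered by the StateStore provider land here.
            "customMetrics": {
                "loadedMapCacheHitCount": 12,
                "snapshotTimeMs": 7,
            },
        }
    ],
}
print(json.dumps(progress, indent=2))
```

    The same values are also accumulators under the hood, which is why they 
appear in the SQL UI as well, not only in this per-batch progress report.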
    
    With StreamingQueryProgress, we also expose information that is only 
calculated when needed, which currently means when finishTrigger is called, 
i.e., mostly at batch end. The custom metrics in this patch sit there: they are 
additional information for StreamingQueryProgress, and hence intentionally 
updated per batch. They are not actually SQL metrics, but the name may lead 
someone to wonder why they don't behave like SQL metrics. Maybe the name 
matters?
    
    So there are two kinds of custom information we might want to add:
    
    1. Metrics updated on every heartbeat: these are exposed to the SQL UI, and 
can also be collected and added to StreamingQueryProgress, like the custom 
metrics in StateStore.
    2. Information updated once per batch: this is exposed only through 
StreamingQueryProgress.
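    The distinction above can be sketched with a toy model (plain Python, not 
Spark APIs): an accumulator-style metric exposes partial values whenever a 
heartbeat samples it, while progress-style information is derived exactly once 
at batch end (the analogue of finishTrigger):

```python
class Accumulator:
    """Additive metric; a heartbeat can read its value at any time."""
    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n

def run_batch(rows_metric, heartbeat_snapshots, rows=5):
    # Simulate a task processing rows; a "heartbeat" samples the
    # accumulator after every row, before the task has finished.
    for _ in range(rows):
        rows_metric.add(1)
        heartbeat_snapshots.append(rows_metric.value)
    # Progress-style information is derived only once, at batch end.
    return {"numRowsProcessed": rows_metric.value}

acc = Accumulator()
snapshots = []
progress = run_batch(acc, snapshots)
print(snapshots)   # partial values visible mid-batch: [1, 2, 3, 4, 5]
print(progress)    # computed once per batch: {'numRowsProcessed': 5}
```

    The first kind is what the SQL UI can render while a batch runs; the second 
kind only exists once the batch (or trigger) completes.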
    
    The target of this patch is the latter.
    
    But we know that 2 only applies to micro-batch mode, and the current 
StreamingQueryProgress is not suitable for continuous mode, for these reasons: 
1. Unless we stop processing, or snapshot the metrics when an epoch ends, the 
metrics can't be correct for a specific epoch. 2. Showing the information for 
the latest epoch in which all partitions finished no longer represents the most 
recent state. 3. Some metrics are expected to be reset per batch, which doesn't 
happen in continuous mode; and if we reset metrics per epoch, the metrics in 
the SQL tab of the UI will look really odd (because it just shows the current 
state of the metrics, not values bound to an epoch).
    
    So IMHO it's likely that StreamingQueryProgress will not be available for 
continuous mode even later (and not only for custom metrics), and we may want 
to rely on running SQL metrics instead. That is in fact how other streaming 
frameworks provide metrics today, but they also show those metrics aggregated 
over time windows, or even as time series. Spark doesn't need such a feature 
for batch and micro-batch, but in continuous mode, without it these SQL metrics 
will be very hard to read after a long run (say, one month). That's the hard 
part of making the modes transparent: the metrics requirements for 
batch/micro-batch and continuous mode are just different, and metrics may not 
be the only issue.

