[jira] [Work logged] (BEAM-5355) Create GroupByKey load test for Java SDK

ASF GitHub Bot (JIRA) Thu, 08 Nov 2018 09:53:14 -0800


     [ 
https://issues.apache.org/jira/browse/BEAM-5355?focusedWorklogId=163978&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-163978
 ]


ASF GitHub Bot logged work on BEAM-5355:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Nov/18 17:08
            Start Date: 08/Nov/18 17:08
    Worklog Time Spent: 10m 
      Work Description: swegner commented on a change in pull request #6987: 
[BEAM-5355] Prevent creating metrics of the same name multiple times
URL: https://github.com/apache/beam/pull/6987#discussion_r231982588
 
 

 ##########
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/GroupByKeyLoadTest.java
 ##########
 @@ -83,15 +83,14 @@ private GroupByKeyLoadTest(String[] args) throws 
IOException {
   void loadTest() throws IOException {
     Optional<SyntheticStep> syntheticStep = 
createStep(options.getStepOptions());
 
-    PCollection<KV<byte[], byte[]>> input =
-        pipeline.apply(SyntheticBoundedIO.readFrom(sourceOptions));
+    PCollection<KV<byte[], byte[]>> input = pipeline
+        .apply(SyntheticBoundedIO.readFrom(sourceOptions))
+        .apply(ParDo.of(new MetricsMonitor(METRICS_NAMESPACE)));
 
 Review comment:
   The issue appears to be that the pipeline was creating many `MetricsMonitor` 
steps based on the `options.getFanout()` for-loop. According to the 
[javadoc](https://beam.apache.org/releases/javadoc/2.8.0/org/apache/beam/sdk/metrics/Metrics.html),
 metrics are scoped to the applied transform they are executed in:
   
   > Reported metrics are implicitly scoped to the transform within the 
pipeline that reported them. This allows reporting the same metric name in 
multiple places and identifying the value each transform reported, as well as 
aggregating the metric across
   
   I'm not familiar with this pipeline or what it's testing. If moving the 
`MetricsMonitor` earlier in the pipeline retains correctness, this seems like 
the simplest solution. If you need the `MetricMonitor` after the fanout logic 
and aggregated, other options are:
   
   1. Query the set of `MetricResult`s for each fork of the graph and aggregate 
them together locally.
   1. Flatten the result of each flattened branch together and apply a single 
`MetricsMonitor` step to the flattened result.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 163978)
    Time Spent: 2h 40m  (was: 2.5h)

> Create GroupByKey load test for Java SDK
> ----------------------------------------
>
>                 Key: BEAM-5355
>                 URL: https://issues.apache.org/jira/browse/BEAM-5355
>             Project: Beam
>          Issue Type: Sub-task
>          Components: testing
>            Reporter: Lukasz Gajowy
>            Assignee: Lukasz Gajowy
>            Priority: Minor
>             Fix For: Not applicable
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> This is more thoroughly described in this proposal: 
> [https://docs.google.com/document/d/1PuIQv4v06eosKKwT76u7S6IP88AnXhTf870Rcj1AHt4/edit?usp=sharing]
>  
> In short: this ticket is about implementing the GroupByKeyLoadIT that uses 
> SyntheticStep and Synthetic source to create load on the pipeline. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Work logged] (BEAM-5355) Create GroupByKey load test for Java SDK

Reply via email to