[ 
https://issues.apache.org/jira/browse/BEAM-5355?focusedWorklogId=164319&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-164319
 ]

ASF GitHub Bot logged work on BEAM-5355:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Nov/18 11:44
            Start Date: 09/Nov/18 11:44
    Worklog Time Spent: 10m 
      Work Description: lgajowy commented on a change in pull request #6987: 
[BEAM-5355] Prevent creating metrics of the same name multiple times
URL: https://github.com/apache/beam/pull/6987#discussion_r232226908
 
 

 ##########
 File path: 
sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/GroupByKeyLoadTest.java
 ##########
 @@ -83,15 +83,14 @@ private GroupByKeyLoadTest(String[] args) throws IOException {
   void loadTest() throws IOException {
     Optional<SyntheticStep> syntheticStep = createStep(options.getStepOptions());
 
-    PCollection<KV<byte[], byte[]>> input =
-        pipeline.apply(SyntheticBoundedIO.readFrom(sourceOptions));
+    PCollection<KV<byte[], byte[]>> input = pipeline
+        .apply(SyntheticBoundedIO.readFrom(sourceOptions))
+        .apply(ParDo.of(new MetricsMonitor(METRICS_NAMESPACE)));
 
 Review comment:
   Thanks for the ideas. I had considered them before but wasn't sure it was 
necessary to do it this way. It seems it is. 
   
   I think that for collecting the total pipeline run_time (which is a 
distribution metric) it's enough to flatten the results of all distributions 
and take the min and max to calculate it. The measurement should preferably be 
placed at the end of the pipeline so that all the processing time is captured. 
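
   The min/max calculation described above can be sketched in plain Java. 
This is only an illustration under assumptions: `TimeRange` and `RunTime` are 
hypothetical names, not Beam classes, and each `TimeRange` stands in for the 
min and max timestamps reported by one distribution result.

```java
import java.util.List;

/** Hypothetical stand-in for one distribution result: earliest and latest timestamp seen. */
final class TimeRange {
  final long minMillis;
  final long maxMillis;

  TimeRange(long minMillis, long maxMillis) {
    this.minMillis = minMillis;
    this.maxMillis = maxMillis;
  }
}

/** Hypothetical helper that flattens all ranges into one overall run time. */
final class RunTime {
  /** Returns (largest max) - (smallest min) over all ranges, in milliseconds. */
  static long totalMillis(List<TimeRange> ranges) {
    if (ranges.isEmpty()) {
      throw new IllegalArgumentException("at least one range is required");
    }
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    for (TimeRange range : ranges) {
      min = Math.min(min, range.minMillis);
      max = Math.max(max, range.maxMillis);
    }
    return max - min;
  }
}
```

   Because only the global min and max matter, the order of the ranges and 
the number of monitors contributing to them do not change the result.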
   
   For counting total bytes, the result depends on where in the pipeline I 
measure it. The sizes at the beginning and at the end of the pipeline can 
differ, and it may be desirable to measure both. 
   
   This will probably require splitting `MetricsMonitor` into `TimeMonitor` 
and `BytesMonitor`. A `TimeMonitor` can be applied anywhere in the pipeline 
(it makes little difference, because we are looking for the max and min time 
across the whole pipeline). Separate `BytesMonitor`s will report different 
results depending on where in the pipeline they are attached.
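
   To illustrate why the position of a bytes monitor matters, here is a 
minimal plain-Java sketch. It does not use the Beam API: `BytesMonitorSketch`, 
`BytesMonitorPlacement` and the `halve` step are hypothetical names invented 
for this example, and a real `BytesMonitor` would presumably be a pass-through 
`DoFn` instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical pass-through monitor: sums the byte sizes of the elements it sees. */
final class BytesMonitorSketch {
  private long totalBytes = 0;

  /** Records each element's size and returns the elements unchanged. */
  List<byte[]> observe(List<byte[]> elements) {
    for (byte[] element : elements) {
      totalBytes += element.length;
    }
    return elements;
  }

  long totalBytes() {
    return totalBytes;
  }
}

final class BytesMonitorPlacement {
  /** Hypothetical processing step that keeps only the first half of every element. */
  static List<byte[]> halve(List<byte[]> elements) {
    List<byte[]> out = new ArrayList<>();
    for (byte[] element : elements) {
      out.add(Arrays.copyOf(element, element.length / 2));
    }
    return out;
  }

  /** Runs input -> monitor -> halve -> monitor and returns {bytesAtStart, bytesAtEnd}. */
  static long[] measure(List<byte[]> input) {
    BytesMonitorSketch atStart = new BytesMonitorSketch();
    BytesMonitorSketch atEnd = new BytesMonitorSketch();
    atEnd.observe(halve(atStart.observe(input)));
    return new long[] {atStart.totalBytes(), atEnd.totalBytes()};
  }
}
```

   The two monitors see the same elements at different stages, so they 
report different totals whenever a step in between changes the element sizes.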
   
   I will change this in a later contribution; for now I just wanted to share 
my thoughts. If you see flaws, feel free to protest. :) 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 164319)
    Time Spent: 3h 10m  (was: 3h)

> Create GroupByKey load test for Java SDK
> ----------------------------------------
>
>                 Key: BEAM-5355
>                 URL: https://issues.apache.org/jira/browse/BEAM-5355
>             Project: Beam
>          Issue Type: Sub-task
>          Components: testing
>            Reporter: Lukasz Gajowy
>            Assignee: Lukasz Gajowy
>            Priority: Minor
>             Fix For: Not applicable
>
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> This is more thoroughly described in this proposal: 
> [https://docs.google.com/document/d/1PuIQv4v06eosKKwT76u7S6IP88AnXhTf870Rcj1AHt4/edit?usp=sharing]
>  
> In short: this ticket is about implementing the GroupByKeyLoadIT that uses 
> SyntheticStep and Synthetic source to create load on the pipeline. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
