[jira] [Commented] (FLINK-10320) Introduce JobMaster schedule micro-benchmark

Piotr Nowojski (JIRA) Fri, 21 Sep 2018 05:09:18 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16623481#comment-16623481
 ]


Piotr Nowojski commented on FLINK-10320:
----------------------------------------

[~till.rohrmann] might have a good point. [~Tison] could you provide profiler 
logs for both the JobManager and the TaskManager during this 10,000 parallelism 
scheduling issue? Maybe we could even narrow down the problematic component and 
write benchmarks targeting it specifically instead of the whole JobManager?

> Introduce JobMaster schedule micro-benchmark
> --------------------------------------------
>
>                 Key: FLINK-10320
>                 URL: https://issues.apache.org/jira/browse/FLINK-10320
>             Project: Flink
>          Issue Type: Improvement
>          Components: Tests
>            Reporter: tison
>            Assignee: tison
>            Priority: Major
>
> Based on the {{org.apache.flink.streaming.runtime.io.benchmark}} code and the 
> repo [flink-benchmark|https://github.com/dataArtisans/flink-benchmarks], I 
> propose to introduce another micro-benchmark which focuses on {{JobMaster}} 
> scheduling performance.
> h3. Target
> Benchmark how long it takes from {{JobMaster}} startup (receiving the 
> {{JobGraph}} and initializing) until all tasks are RUNNING. Technically we use 
> a bounded stream and the TM finishes tasks as soon as they arrive, so the 
> interval we actually measure ends when all tasks are FINISHED.
> h3. Case
> 1. JobGraph that covers EAGER + PIPELINED edges
> 2. JobGraph that covers LAZY_FROM_SOURCES + PIPELINED edges
> 3. JobGraph that covers LAZY_FROM_SOURCES + BLOCKING edges
> PS: maybe also benchmark the case where the source gets its input from 
> {{InputSplit}}?
> h3. Implement
> Based on the flink-benchmark repo, we ultimately run the benchmark using JMH, 
> so the whole test suite is split across two repos. The testing environment 
> could be located in the main repo, maybe under 
> flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/benchmark.
> To measure the performance of {{JobMaster}} scheduling, we need to simulate 
> an environment that:
> 1. has a real {{JobMaster}}
> 2. has a mock/testing {{ResourceManager}} that has infinite resources and 
> reacts immediately.
> 3. has a (many?) mock/testing {{TaskExecutor}} that deploys and finishes tasks 
> immediately.
> [~trohrm...@apache.org] [~GJL] [~pnowojski] could you please review this 
> proposal to help clarify the goal and concrete details? Thanks in advance.
> Any suggestions are welcome.
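
For illustration only, the simulated environment described in the quoted 
proposal could be sketched roughly as below. This is a toy model under stated 
assumptions, not Flink code and not the proposed implementation: 
ScheduleBenchSketch, SlotProvider, TaskGateway and runOnce are hypothetical 
names, the slot provider stands in for a ResourceManager with infinite 
resources, the task gateway stands in for a TaskExecutor that finishes tasks 
immediately, and plain System.nanoTime() timing is used where a real benchmark 
would rely on JMH.

```java
import java.util.Arrays;

// Toy model only: none of these types are Flink classes.
class ScheduleBenchSketch {

    enum TaskState { CREATED, DEPLOYING, FINISHED }

    /** Hypothetical stand-in for a ResourceManager with unlimited slots. */
    interface SlotProvider {
        int allocateSlot();
    }

    /** Hypothetical stand-in for a TaskExecutor that finishes tasks on arrival. */
    interface TaskGateway {
        TaskState deploy(int taskId, int slotId);
    }

    /** Outcome of one measured scheduling run. */
    static final class Result {
        final int finished;
        final long elapsedNanos;

        Result(int finished, long elapsedNanos) {
            this.finished = finished;
            this.elapsedNanos = elapsedNanos;
        }
    }

    /** Schedules `parallelism` tasks and measures the time until all are FINISHED. */
    static Result runOnce(int parallelism) {
        int[] nextSlot = {0};
        // Never runs out of slots and answers synchronously.
        SlotProvider slots = () -> nextSlot[0]++;
        // Reports FINISHED as soon as the task is deployed.
        TaskGateway gateway = (taskId, slotId) -> TaskState.FINISHED;

        TaskState[] states = new TaskState[parallelism];
        Arrays.fill(states, TaskState.CREATED);

        long start = System.nanoTime();
        int finished = 0;
        for (int task = 0; task < parallelism; task++) {
            states[task] = TaskState.DEPLOYING;
            int slot = slots.allocateSlot();
            states[task] = gateway.deploy(task, slot);
            if (states[task] == TaskState.FINISHED) {
                finished++;
            }
        }
        long elapsed = System.nanoTime() - start;
        return new Result(finished, elapsed);
    }

    public static void main(String[] args) {
        Result r = runOnce(10_000);
        System.out.println(r.finished + " tasks FINISHED in "
                + (r.elapsedNanos / 1_000) + " us");
    }
}
```

Because the stubs respond synchronously, whatever time is measured comes from 
the scheduling loop itself, which is the idea behind pairing a real 
{{JobMaster}} with mock collaborators. The real scheduling path is 
asynchronous, so an actual benchmark would need to wait for the final state 
transition rather than count completions inline.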



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
