[jira] [Comment Edited] (FLINK-23593) Performance regression on 15.07.2021

2021-08-03 Thread Timo Walther (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392337#comment-17392337
 ] 

Timo Walther edited comment on FLINK-23593 at 8/3/21, 2:35 PM:
---

I performed a couple of benchmarks locally for the previously mentioned flags. 
I don't think that FLINK-23372 caused a major regression. However, it seems we 
have recently introduced some regression that slows down this benchmark:

{code}
bb175622e3 (1.13 cut)

Benchmark                                                                    Mode  Cnt     Score    Error   Units
SortingBoundedInputBenchmarks.sortedTwoInputNoSlotSharingBlocking           thrpt   30  1753.489 ± 15.902  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputNoSlotSharingForwardPipelined   thrpt   30  1782.957 ± 21.945  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputNoSlotSharingPipelined          thrpt   30  1870.771 ± 50.255  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputSlotSharingBlocking             thrpt   30  1836.818 ± 17.767  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputSlotSharingForwardPipelined     thrpt   30  1809.482 ± 26.410  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputSlotSharingPipelined            thrpt   30  1929.729 ± 21.632  ops/ms


d8b1a6fd36 (FLINK-23593)

Benchmark                                                                    Mode  Cnt     Score    Error   Units
SortingBoundedInputBenchmarks.sortedTwoInputNoSlotSharingBlocking           thrpt   30  1887.372 ± 27.990  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputNoSlotSharingForwardPipelined   thrpt   30  1875.029 ± 20.378  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputNoSlotSharingPipelined          thrpt   30  1985.825 ± 25.675  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputSlotSharingBlocking             thrpt   30  1834.068 ± 48.316  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputSlotSharingForwardPipelined     thrpt   30  1833.997 ± 30.467  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputSlotSharingPipelined            thrpt   30  2015.552 ± 27.705  ops/ms


6aa0a8a0dd (master)

Benchmark                                                                    Mode  Cnt     Score    Error   Units
SortingBoundedInputBenchmarks.sortedTwoInputNoSlotSharingBlocking           thrpt   30  1642.628 ± 21.183  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputNoSlotSharingForwardPipelined   thrpt   30  1672.128 ± 15.114  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputNoSlotSharingPipelined          thrpt   30  1761.725 ± 18.225  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputSlotSharingBlocking             thrpt   30  1681.684 ± 17.065  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputSlotSharingForwardPipelined     thrpt   30  1689.087 ± 18.509  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputSlotSharingPipelined            thrpt   30  1731.022 ± 32.813  ops/ms
{code}

Branch: https://github.com/twalthr/flink-benchmarks/tree/FLINK-23593
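
For reference, a local run like the one above can be driven with the plain JMH 
runner API (a minimal sketch; the include pattern simply matches the benchmark 
names in the tables above, the rest is generic JMH and not flink-benchmarks 
specific):

{code}
import org.openjdk.jmh.results.format.ResultFormatType;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class RunSortingBenchmarks {
    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                // Run all variants shown in the tables above.
                .include("SortingBoundedInputBenchmarks.sortedTwoInput.*")
                .forks(1)
                // Keep the raw scores so different commits can be compared later.
                .resultFormat(ResultFormatType.CSV)
                .result("sorting-benchmarks.csv")
                .build();
        new Runner(options).run();
    }
}
{code}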


was (Author: twalthr):
I performed a couple of benchmarks locally for the previously mentioned flags. 
I don't think that FLINK-23372 caused a major regression. However, it seems we 
have recently introduced some regression that slows down this benchmark:

{code}
bb175622e3 (1.13 cut)

Benchmark                                                                    Mode  Cnt     Score    Error   Units
SortingBoundedInputBenchmarks.sortedTwoInputNoSlotSharingBlocking           thrpt   30  1753.489 ± 15.902  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputNoSlotSharingForwardPipelined   thrpt   30  1782.957 ± 21.945  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputNoSlotSharingPipelined          thrpt   30  1870.771 ± 50.255  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputSlotSharingBlocking             thrpt   30  1836.818 ± 17.767  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputSlotSharingForwardPipelined     thrpt   30  1809.482 ± 26.410  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputSlotSharingPipelined            thrpt   30  1929.729 ± 21.632  ops/ms


d8b1a6fd36 (FLINK-23593)

Benchmark                                                                    Mode  Cnt     Score    Error   Units
SortingBoundedInputBenchmarks.sortedTwoInputNoSlotSharingBlocking           thrpt   30  1887.372 ± 27.990  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputNoSlotSharingForwardPipelined   thrpt   30  1875.029 ± 20.378  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputNoSlotSharingPipelined          thrpt   30  1985.825 ± 25.675  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputSlotSharingBlocking             thrpt   30  1834.068 ± 48.316  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputSlotSharingForwardPipelined     thrpt   30  1833.997 ± 30.467  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputSlotSharingPipelined            thrpt   30  2015.552 ± 27.705  ops/ms

[jira] [Comment Edited] (FLINK-23593) Performance regression on 15.07.2021

2021-08-03 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392423#comment-17392423
 ] 

Piotr Nowojski edited comment on FLINK-23593 at 8/3/21, 4:58 PM:
-

Because of https://issues.apache.org/jira/browse/FLINK-23392 and 
https://issues.apache.org/jira/browse/FLINK-23560, you cannot compare the 
results from July 15th to the current results. Also, because of various breaking 
changes like https://issues.apache.org/jira/browse/FLINK-23464, you cannot use 
the benchmarking code from the current `flink-benchmarks` master to run old 
`flink` code. You have to use both the Flink and the flink-benchmarks code from 
the time of the regression.

I was able to quite easily reproduce the regression of FLINK-23392 using 
flink-benchmarks commit: d816a18

http://codespeed.dak8s.net:8080/job/flink-benchmark-request/345/ (last good, 
flink commit: 4a78097d038)

{noformat}
"Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedMultiInput","thrpt",1,30,1996.460479,28.904057,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedOneInput","thrpt",1,30,2337.385239,43.234577,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedTwoInput","thrpt",1,30,1946.457665,28.919437,"ops/ms"
{noformat}

http://codespeed.dak8s.net:8080/job/flink-benchmark-request/347/ (first bad, 
flink commit: d8b1a6fd368)

{noformat}
"Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedMultiInput","thrpt",1,30,1837.391829,23.495855,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedOneInput","thrpt",1,30,2370.271382,37.804557,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedTwoInput","thrpt",1,30,1788.425393,22.619503,"ops/ms"
{noformat}
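
Taking the scores from the two CSV snippets above, the drop is roughly 8% for 
both affected benchmarks; a throwaway helper to compute it (illustration only, 
not part of flink-benchmarks):

{code}
public class RegressionDelta {
    // Relative change, in percent, between a baseline score and a new score.
    static double relativeChange(double baseline, double current) {
        return (current - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Scores (ops/ms) taken from runs 345 (last good) and 347 (first bad) above.
        System.out.printf("sortedMultiInput: %.1f%%%n", relativeChange(1996.460479, 1837.391829)); // ~ -8.0%
        System.out.printf("sortedTwoInput:   %.1f%%%n", relativeChange(1946.457665, 1788.425393)); // ~ -8.1%
    }
}
{code}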



was (Author: pnowojski):
Because of https://issues.apache.org/jira/browse/FLINK-23392 and 
https://issues.apache.org/jira/browse/FLINK-23560, you cannot compare the 
results from July 15th to the current results. Also, because of various breaking 
changes like https://issues.apache.org/jira/browse/FLINK-23464, you cannot use 
the benchmarking code from the current `flink-benchmarks` master to run old 
`flink` code. You have to use both the Flink and the flink-benchmarks code from 
the time of the regression.

I was able to quite easily reproduce the regression of FLINK-23392 using 
flink-benchmarks commit: d816a18

http://codespeed.dak8s.net:8080/job/flink-benchmark-request/345/ (last good, 
flink commit: 4a78097d038)
http://codespeed.dak8s.net:8080/job/flink-benchmark-request/347/ (first bad, 
flink commit: d8b1a6fd368)


> Performance regression on 15.07.2021
> 
>
> Key: FLINK-23593
> URL: https://issues.apache.org/jira/browse/FLINK-23593
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream, Benchmarks
>Affects Versions: 1.14.0
>Reporter: Piotr Nowojski
>Assignee: Timo Walther
>Priority: Blocker
> Fix For: 1.14.0
>
>
> http://codespeed.dak8s.net:8000/timeline/?ben=sortedMultiInput&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=sortedTwoInput&env=2
> {noformat}
> pnowojski@piotr-mbp: [~/flink -  ((no branch, bisect started on pr/16589))] $ 
> git ls f4afbf3e7de..eb8100f7afe
> eb8100f7afe [4 weeks ago] (pn/bad, bad, refs/bisect/bad) 
> [FLINK-22017][coordination] Allow BLOCKING result partition to be 
> individually consumable [Thesharing]
> d2005268b1e [4 weeks ago] (HEAD, pn/bisect-4, bisect-4) 
> [FLINK-22017][coordination] Get the ConsumedPartitionGroup that 
> IntermediateResultPartition and DefaultResultPartition belong to [Thesharing]
> d8b1a6fd368 [3 weeks ago] [FLINK-23372][streaming-java] Disable 
> AllVerticesInSameSlotSharingGroupByDefault in batch mode [Timo Walther]
> 4a78097d038 [3 weeks ago] (pn/bisect-3, bisect-3, 
> refs/bisect/good-4a78097d0385749daceafd8326930c8cc5f26f1a) 
> [FLINK-21928][clients][runtime] Introduce static method constructors of 
> DuplicateJobSubmissionException for better readability. [David Moravek]
> 172b9e32215 [3 weeks ago] [FLINK-21928][clients] JobManager failover should 
> succeed, when trying to resubmit already terminated job in application mode. 
> [David Moravek]
> f483008db86 [3 weeks ago] [FLINK-21928][core] Introduce 
> org.apache.flink.util.concurrent.FutureUtils#handleException method, that 
> allows future to recover from the specied exception. [David Moravek]
> d7ac08c2ac0 [3 weeks ago] (pn/bisect-2, bisect-2, 
> refs/bisect/good-d7ac08c2ac06b9ff31707f3b8f43c07817814d4f) 
> [FLINK-22843][docs-zh] Document and code are inconsistent [ZhiJie Yang]
> 16c3ea427df [3 weeks ago] [hotfix] Split the final checkpoint related test

[jira] [Comment Edited] (FLINK-23593) Performance regression on 15.07.2021

2021-08-03 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392423#comment-17392423
 ] 

Piotr Nowojski edited comment on FLINK-23593 at 8/3/21, 4:58 PM:
-

Because of https://issues.apache.org/jira/browse/FLINK-23392 and 
https://issues.apache.org/jira/browse/FLINK-23560, you cannot compare the 
results from July 15th to the current results. Also, because of various breaking 
changes like https://issues.apache.org/jira/browse/FLINK-23464, you cannot use 
the benchmarking code from the current `flink-benchmarks` master to run old 
`flink` code. You have to use both the Flink and the flink-benchmarks code from 
the time of the regression.

I was able to quite easily reproduce the regression of FLINK-23392 using 
flink-benchmarks commit: d816a18

http://codespeed.dak8s.net:8080/job/flink-benchmark-request/345/ (last good, 
flink commit: 4a78097d038)

{noformat}
"Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedMultiInput","thrpt",1,30,1996.460479,28.904057,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedOneInput","thrpt",1,30,2337.385239,43.234577,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedTwoInput","thrpt",1,30,1946.457665,28.919437,"ops/ms"
{noformat}

http://codespeed.dak8s.net:8080/job/flink-benchmark-request/347/ (first bad, 
flink commit: d8b1a6fd368)

{noformat}
"Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedMultiInput","thrpt",1,30,1837.391829,23.495855,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedOneInput","thrpt",1,30,2370.271382,37.804557,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedTwoInput","thrpt",1,30,1788.425393,22.619503,"ops/ms"
{noformat}
(those numbers perfectly align with the performance regression visible in the 
webUI on 15.07)


was (Author: pnowojski):
Because of https://issues.apache.org/jira/browse/FLINK-23392 and 
https://issues.apache.org/jira/browse/FLINK-23560, you cannot compare the 
results from July 15th to the current results. Also, because of various breaking 
changes like https://issues.apache.org/jira/browse/FLINK-23464, you cannot use 
the benchmarking code from the current `flink-benchmarks` master to run old 
`flink` code. You have to use both the Flink and the flink-benchmarks code from 
the time of the regression.

I was able to quite easily reproduce the regression of FLINK-23392 using 
flink-benchmarks commit: d816a18

http://codespeed.dak8s.net:8080/job/flink-benchmark-request/345/ (last good, 
flink commit: 4a78097d038)

{noformat}
"Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedMultiInput","thrpt",1,30,1996.460479,28.904057,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedOneInput","thrpt",1,30,2337.385239,43.234577,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedTwoInput","thrpt",1,30,1946.457665,28.919437,"ops/ms"
{noformat}

http://codespeed.dak8s.net:8080/job/flink-benchmark-request/347/ (first bad, 
flink commit: d8b1a6fd368)

{noformat}
"Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedMultiInput","thrpt",1,30,1837.391829,23.495855,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedOneInput","thrpt",1,30,2370.271382,37.804557,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedTwoInput","thrpt",1,30,1788.425393,22.619503,"ops/ms"
{noformat}


> Performance regression on 15.07.2021
> 
>
> Key: FLINK-23593
> URL: https://issues.apache.org/jira/browse/FLINK-23593
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream, Benchmarks
>Affects Versions: 1.14.0
>Reporter: Piotr Nowojski
>Assignee: Timo Walther
>Priority: Blocker
> Fix For: 1.14.0
>
>
> http://codespeed.dak8s.net:8000/timeline/?ben=sortedMultiInput&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=sortedTwoInput&env=2
> {noformat}
> pnowojski@piotr-mbp: [~/flink -  ((no branch, bisect started on pr/16589))] $ 
> git ls f4afbf3e7de..eb8100f7afe
> eb8100f7afe [4 weeks ago] (pn/bad, bad, refs/bisect/bad) 
> [FLINK-22017][coordination] Allow BLOCKING result partition to be 
> individually consumable [Thesharing]
> d2005268b1e [4 weeks ago] (HEAD, pn/bisect-4, bisect-4) 
> [FLINK-22017][coordination] Get the ConsumedPartitionGroup that 
> IntermediateResultPartition and DefaultResultPartition belong to [Thesharing]
> d8b1a6fd368 [3 weeks ago] [FLINK-23372][streami

[jira] [Comment Edited] (FLINK-23593) Performance regression on 15.07.2021

2021-08-03 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392423#comment-17392423
 ] 

Piotr Nowojski edited comment on FLINK-23593 at 8/3/21, 4:59 PM:
-

Because of FLINK-23392 and FLINK-23560, you cannot compare the results from 
July 15th to the current results. Also, because of various breaking changes like 
FLINK-23464, you cannot use the benchmarking code from the current 
`flink-benchmarks` master to run old `flink` code. You have to use both the 
Flink and the flink-benchmarks code from the time of the regression.

I was able to quite easily reproduce the regression of FLINK-23392 using 
flink-benchmarks commit: d816a18

http://codespeed.dak8s.net:8080/job/flink-benchmark-request/345/ (last good, 
flink commit: 4a78097d038)

{noformat}
"Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedMultiInput","thrpt",1,30,1996.460479,28.904057,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedOneInput","thrpt",1,30,2337.385239,43.234577,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedTwoInput","thrpt",1,30,1946.457665,28.919437,"ops/ms"
{noformat}

http://codespeed.dak8s.net:8080/job/flink-benchmark-request/347/ (first bad, 
flink commit: d8b1a6fd368)

{noformat}
"Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedMultiInput","thrpt",1,30,1837.391829,23.495855,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedOneInput","thrpt",1,30,2370.271382,37.804557,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedTwoInput","thrpt",1,30,1788.425393,22.619503,"ops/ms"
{noformat}
(those numbers perfectly align with the performance regression visible in the 
webUI on 15.07)


was (Author: pnowojski):
Because of https://issues.apache.org/jira/browse/FLINK-23392 and 
https://issues.apache.org/jira/browse/FLINK-23560, you cannot compare the 
results from July 15th to the current results. Also, because of various breaking 
changes like https://issues.apache.org/jira/browse/FLINK-23464, you cannot use 
the benchmarking code from the current `flink-benchmarks` master to run old 
`flink` code. You have to use both the Flink and the flink-benchmarks code from 
the time of the regression.

I was able to quite easily reproduce the regression of FLINK-23392 using 
flink-benchmarks commit: d816a18

http://codespeed.dak8s.net:8080/job/flink-benchmark-request/345/ (last good, 
flink commit: 4a78097d038)

{noformat}
"Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedMultiInput","thrpt",1,30,1996.460479,28.904057,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedOneInput","thrpt",1,30,2337.385239,43.234577,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedTwoInput","thrpt",1,30,1946.457665,28.919437,"ops/ms"
{noformat}

http://codespeed.dak8s.net:8080/job/flink-benchmark-request/347/ (first bad, 
flink commit: d8b1a6fd368)

{noformat}
"Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedMultiInput","thrpt",1,30,1837.391829,23.495855,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedOneInput","thrpt",1,30,2370.271382,37.804557,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedTwoInput","thrpt",1,30,1788.425393,22.619503,"ops/ms"
{noformat}
(those numbers perfectly align with the performance regression visible in the 
webUI on 15.07)

> Performance regression on 15.07.2021
> 
>
> Key: FLINK-23593
> URL: https://issues.apache.org/jira/browse/FLINK-23593
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream, Benchmarks
>Affects Versions: 1.14.0
>Reporter: Piotr Nowojski
>Assignee: Timo Walther
>Priority: Blocker
> Fix For: 1.14.0
>
>
> http://codespeed.dak8s.net:8000/timeline/?ben=sortedMultiInput&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=sortedTwoInput&env=2
> {noformat}
> pnowojski@piotr-mbp: [~/flink -  ((no branch, bisect started on pr/16589))] $ 
> git ls f4afbf3e7de..eb8100f7afe
> eb8100f7afe [4 weeks ago] (pn/bad, bad, refs/bisect/bad) 
> [FLINK-22017][coordination] Allow BLOCKING result partition to be 
> individually consumable [Thesharing]
> d2005268b1e [4 weeks ago] (HEAD, pn/bisect-4, bisect-4) 
> [FLINK-22017][coordination] Get the ConsumedPartitionGroup that 
> IntermediateResultPartition and DefaultResultPartition belong to [Thesharing]
> d8b1a6fd368 [3 weeks ago] [FLINK-23372][streaming-java] Disable 
> Al

[jira] [Comment Edited] (FLINK-23593) Performance regression on 15.07.2021

2021-08-03 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392423#comment-17392423
 ] 

Piotr Nowojski edited comment on FLINK-23593 at 8/3/21, 4:59 PM:
-

Because of FLINK-23392 and FLINK-23560, you cannot compare the results from 
July 15th to the current results. Also, because of various breaking changes like 
FLINK-23464, you cannot use the benchmarking code from the current 
`flink-benchmarks` master to run old `flink` code. You have to use both the 
Flink and the flink-benchmarks code from the time of the regression.

I was able to quite easily reproduce the regression from this ticket using 
flink-benchmarks commit: d816a18

http://codespeed.dak8s.net:8080/job/flink-benchmark-request/345/ (last good, 
flink commit: 4a78097d038)

{noformat}
"Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedMultiInput","thrpt",1,30,1996.460479,28.904057,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedOneInput","thrpt",1,30,2337.385239,43.234577,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedTwoInput","thrpt",1,30,1946.457665,28.919437,"ops/ms"
{noformat}

http://codespeed.dak8s.net:8080/job/flink-benchmark-request/347/ (first bad, 
flink commit: d8b1a6fd368)

{noformat}
"Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedMultiInput","thrpt",1,30,1837.391829,23.495855,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedOneInput","thrpt",1,30,2370.271382,37.804557,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedTwoInput","thrpt",1,30,1788.425393,22.619503,"ops/ms"
{noformat}
(those numbers perfectly align with the performance regression visible in the 
webUI on 15.07)


was (Author: pnowojski):
Because of FLINK-23392 and FLINK-23560, you cannot compare the results from 
July 15th to the current results. Also, because of various breaking changes like 
FLINK-23464, you cannot use the benchmarking code from the current 
`flink-benchmarks` master to run old `flink` code. You have to use both the 
Flink and the flink-benchmarks code from the time of the regression.

I was able to quite easily reproduce the regression of FLINK-23392 using 
flink-benchmarks commit: d816a18

http://codespeed.dak8s.net:8080/job/flink-benchmark-request/345/ (last good, 
flink commit: 4a78097d038)

{noformat}
"Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedMultiInput","thrpt",1,30,1996.460479,28.904057,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedOneInput","thrpt",1,30,2337.385239,43.234577,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedTwoInput","thrpt",1,30,1946.457665,28.919437,"ops/ms"
{noformat}

http://codespeed.dak8s.net:8080/job/flink-benchmark-request/347/ (first bad, 
flink commit: d8b1a6fd368)

{noformat}
"Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedMultiInput","thrpt",1,30,1837.391829,23.495855,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedOneInput","thrpt",1,30,2370.271382,37.804557,"ops/ms"
"org.apache.flink.benchmark.SortingBoundedInputBenchmarks.sortedTwoInput","thrpt",1,30,1788.425393,22.619503,"ops/ms"
{noformat}
(those numbers perfectly align with the performance regression visible in the 
webUI on 15.07)

> Performance regression on 15.07.2021
> 
>
> Key: FLINK-23593
> URL: https://issues.apache.org/jira/browse/FLINK-23593
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream, Benchmarks
>Affects Versions: 1.14.0
>Reporter: Piotr Nowojski
>Assignee: Timo Walther
>Priority: Blocker
> Fix For: 1.14.0
>
>
> http://codespeed.dak8s.net:8000/timeline/?ben=sortedMultiInput&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=sortedTwoInput&env=2
> {noformat}
> pnowojski@piotr-mbp: [~/flink -  ((no branch, bisect started on pr/16589))] $ 
> git ls f4afbf3e7de..eb8100f7afe
> eb8100f7afe [4 weeks ago] (pn/bad, bad, refs/bisect/bad) 
> [FLINK-22017][coordination] Allow BLOCKING result partition to be 
> individually consumable [Thesharing]
> d2005268b1e [4 weeks ago] (HEAD, pn/bisect-4, bisect-4) 
> [FLINK-22017][coordination] Get the ConsumedPartitionGroup that 
> IntermediateResultPartition and DefaultResultPartition belong to [Thesharing]
> d8b1a6fd368 [3 weeks ago] [FLINK-23372][streaming-java] Disable 
> AllVerticesInSameSlotSharingGroupByDefault in batch mode [Timo Walther]
> 4a78097d038 [3 weeks ago] (pn/bisect-3, bi

[jira] [Comment Edited] (FLINK-23593) Performance regression on 15.07.2021

2021-08-04 Thread Stephan Ewen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393287#comment-17393287
 ] 

Stephan Ewen edited comment on FLINK-23593 at 8/4/21, 4:00 PM:
---

Here is a summary from discussing this offline with [~twalthr].

**Meaningful Change**

The general change of behavior is meaningful. Not having tasks share their 
slots during batch execution means we don't fragment the memory budget as much 
between different tasks that most likely don't run concurrently anyway.

It should give more reliable performance at scale and more predictable behavior 
by default.

**Regression acceptable**

We are altering behavior here that has a performance impact, so some amount of 
change in the benchmarks is expected.

In particular, slot sharing is beneficial for small scale:
* small data means one slot's memory is enough to accommodate all tasks
* fewer slots allocated means a bit less overhead during slot allocation, less 
bookkeeping.

Not sharing slots is beneficial at larger scale:
* more memory per operator
* means often fewer concurrent tasks so more network buffers per task

**Trying to explain the Regression**

The executed data flow is pretty much the same in all cases. The tasks and the 
network stack (local channels, batch shuffles) don't actually care whether they 
are in one slot or another.

My working assumption is that the difference is caused by a few factors in the 
startup overhead: more slots need to be allocated, and there is more TM / JM 
coordination at startup.

Another option could be that if the keyed operator (with the sorting) gets its 
own dedicated slot (when not slot sharing), it gets more memory. The sorter 
reserves its full share of memory from the MemoryManager, which in turn 
allocates it at startup (and initializes it to zero). While more memory is 
generally good, it also has a slightly longer initialization phase.
[~zhuzh] could that be an explanation?

I think Timo's benchmarks are quite good, comparing slot-sharing vs. 
not-slot-sharing within the same code snapshot, also relative to the different 
batch shuffle settings. That's really what we want to understand here.
The difference between sharing and not sharing slots is pretty small here, 
regardless of the shuffle mode.

{code}
Benchmark                                                             Mode  Cnt     Score    Error   Units
SortingBoundedInputBenchmarks.sortedTwoInputNoSlotSharingBlocking    thrpt   30  1642.628 ± 21.183  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputSlotSharingBlocking      thrpt   30  1681.684 ± 17.065  ops/ms

SortingBoundedInputBenchmarks.sortedTwoInputNoSlotSharingPipelined   thrpt   30  1761.725 ± 18.225  ops/ms
SortingBoundedInputBenchmarks.sortedTwoInputSlotSharingPipelined     thrpt   30  1731.022 ± 32.813  ops/ms
{code}

_(Note, I removed the cases with "ForwardPipelined" because it is the same as 
"Blocking" in that benchmark. There are no forward exchanges, the sink is 
chained, the sources connect via keyBy())_

It is curious, though, that for pipelined execution, the variant without 
sharing slots is actually a bit faster.
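
To make the behavior under discussion concrete, this is roughly how slot 
sharing is controlled on the DataStream side (a minimal sketch with a made-up 
pipeline, not the benchmark job; `slotSharingGroup` and the batch runtime mode 
are the actual DataStream APIs):

{code}
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SlotSharingSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Batch runtime mode: since FLINK-23372 the vertices no longer all end
        // up in one shared slot by default.
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        env.fromSequence(0, 1_000_000)
                .keyBy(x -> x % 10)
                .reduce((a, b) -> a + b)
                // Operators placed into the same named slot sharing group may be
                // scheduled into one slot; this is the knob the change is about.
                .slotSharingGroup("default")
                .print();

        env.execute("slot-sharing-sketch");
    }
}
{code}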


was (Author: stephanewen):
Here is a summary from discussing this offline with [~twalthr].

**Meaningful Change**

The general change of behavior is meaningful. Not having tasks share their 
slots during batch execution means we don't fragment the memory budget as much 
between different tasks that most likely don't run concurrently anyway.

It should give more reliable performance at scale and more predictable behavior 
by default.

**Regression acceptable**

We are altering behavior here that has a performance impact, so some amount of 
change in the benchmarks is expected.

In particular, slot sharing is beneficial for small scale:
* small data means one slot's memory is enough to accommodate all tasks
* fewer slots allocated means a bit less overhead during slot allocation, less 
bookkeeping.

Not sharing slots is beneficial at larger scale:
* more memory per operator
* means often fewer concurrent tasks so more network buffers per task

**Trying to explain the Regression**

The executed data flow is pretty much the same in all cases. The tasks and the 
network stack (local channels, batch shuffles) don't actually care whether they 
are in one slot or another.

My working assumption is that the difference is caused by a few factors in the 
startup overhead: more slots need to be allocated, and there is more TM / JM 
coordination at startup.

Another option could be that if the keyed operator (with the sorting) gets its 
own dedicated slot (when not slot sharing), it gets more memory. The sorter 
reserves its full share of memory from the MemoryManager, which in turn 
allocates it at startup (and initializes it to zero). While more memory is 
generally good, it also has a slightly longer i

[jira] [Comment Edited] (FLINK-23593) Performance regression on 15.07.2021

2021-08-05 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393950#comment-17393950
 ] 

Zhu Zhu edited comment on FLINK-23593 at 8/5/21, 12:19 PM:
---

>> Could the larger difference between local benchmark vs. cloud be that the 
>> cloud is running with regular HDDs and we always spill to disk because 
>> SORT_SPILLING_THRESHOLD is set to 0?

Maybe yes, because the record processing time can be shorter on SSD and the 
increased initialization time (described in *Trying to explain the Regression*) 
will be more obvious.

Another, similar suspicion is that the flink-benchmarks 
[patch|https://github.com/twalthr/flink-benchmarks/commit/dfe3cad86030b551daaa7c4a5951a6e4c06fc061] 
increased `RECORDS_PER_INVOCATION` from 1_500_000 to 3_000_000. This increased 
the processing time and may make the regression in initialization time less obvious.
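
A stripped-down JMH sketch of the effect being described (the constant and the 
benchmark body are illustrative only, not the actual flink-benchmarks code): 
the per-invocation setup cost is fixed, so spreading it over more records 
lowers its share of the reported ops/ms.

{code}
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OperationsPerInvocation;
import org.openjdk.jmh.annotations.OutputTimeUnit;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class RecordsPerInvocationSketch {

    // 1_500_000 before the patch referenced above, 3_000_000 afterwards.
    private static final int RECORDS_PER_INVOCATION = 3_000_000;

    @Benchmark
    @OperationsPerInvocation(RECORDS_PER_INVOCATION)
    public long sortedTwoInputLike() {
        // In the real benchmark the fixed per-invocation cost is the job /
        // MiniCluster startup; JMH divides it by RECORDS_PER_INVOCATION when
        // reporting ops/ms, so a larger constant hides a startup regression.
        long acc = 0;
        for (int i = 0; i < RECORDS_PER_INVOCATION; i++) {
            acc += i; // stands in for processing one record
        }
        return acc;
    }
}
{code}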


was (Author: zhuzh):
>> Could the larger difference between local benchmark vs. cloud be that the 
>> cloud is running with regular HDDs and we always spill to disk because 
>> SORT_SPILLING_THRESHOLD is set to 0?

Maybe yes, because the record processing time can be shorter on SSD and the 
increased initialization time (described in *Trying to explain the Regression*) 
will be more obvious.

Another, similar suspicion is that the flink-benchmarks 
[patch|https://github.com/twalthr/flink-benchmarks/commit/dfe3cad86030b551daaa7c4a5951a6e4c06fc061] 
increased `RECORDS_PER_INVOCATION` from 1_500_000 to 3_000_000. This increased 
processing time may make the regression in initialization time less obvious.

> Performance regression on 15.07.2021
> 
>
> Key: FLINK-23593
> URL: https://issues.apache.org/jira/browse/FLINK-23593
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream, Benchmarks
>Affects Versions: 1.14.0
>Reporter: Piotr Nowojski
>Assignee: Timo Walther
>Priority: Blocker
> Fix For: 1.14.0
>
>
> http://codespeed.dak8s.net:8000/timeline/?ben=sortedMultiInput&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=sortedTwoInput&env=2
> {noformat}
> pnowojski@piotr-mbp: [~/flink -  ((no branch, bisect started on pr/16589))] $ 
> git ls f4afbf3e7de..eb8100f7afe
> eb8100f7afe [4 weeks ago] (pn/bad, bad, refs/bisect/bad) 
> [FLINK-22017][coordination] Allow BLOCKING result partition to be 
> individually consumable [Thesharing]
> d2005268b1e [4 weeks ago] (HEAD, pn/bisect-4, bisect-4) 
> [FLINK-22017][coordination] Get the ConsumedPartitionGroup that 
> IntermediateResultPartition and DefaultResultPartition belong to [Thesharing]
> d8b1a6fd368 [3 weeks ago] [FLINK-23372][streaming-java] Disable 
> AllVerticesInSameSlotSharingGroupByDefault in batch mode [Timo Walther]
> 4a78097d038 [3 weeks ago] (pn/bisect-3, bisect-3, 
> refs/bisect/good-4a78097d0385749daceafd8326930c8cc5f26f1a) 
> [FLINK-21928][clients][runtime] Introduce static method constructors of 
> DuplicateJobSubmissionException for better readability. [David Moravek]
> 172b9e32215 [3 weeks ago] [FLINK-21928][clients] JobManager failover should 
> succeed, when trying to resubmit already terminated job in application mode. 
> [David Moravek]
> f483008db86 [3 weeks ago] [FLINK-21928][core] Introduce 
> org.apache.flink.util.concurrent.FutureUtils#handleException method, that 
> allows future to recover from the specied exception. [David Moravek]
> d7ac08c2ac0 [3 weeks ago] (pn/bisect-2, bisect-2, 
> refs/bisect/good-d7ac08c2ac06b9ff31707f3b8f43c07817814d4f) 
> [FLINK-22843][docs-zh] Document and code are inconsistent [ZhiJie Yang]
> 16c3ea427df [3 weeks ago] [hotfix] Split the final checkpoint related tests 
> to a separate test class. [Yun Gao]
> 31b3d37a22c [7 weeks ago] [FLINK-21089][runtime] Skip the execution of new 
> sources if finished on restore [Yun Gao]
> 20fe062e1b5 [3 weeks ago] [FLINK-21089][runtime] Skip execution for the 
> legacy source task if finished on restore [Yun Gao]
> 874c627114b [3 weeks ago] [FLINK-21089][runtime] Skip the lifecycle method of 
> operators if finished on restore [Yun Gao]
> ceaf24b1d88 [3 weeks ago] (pn/bisect-1, bisect-1, 
> refs/bisect/good-ceaf24b1d881c2345a43f305d40435519a09cec9) [hotfix] Fix 
> isClosed() for operator wrapper and proxy operator close to the operator 
> chain [Yun Gao]
> 41ea591a6db [3 weeks ago] [FLINK-22627][runtime] Remove unused slot request 
> protocol [Yangze Guo]
> 489346b60f8 [3 months ago] [FLINK-22627][runtime] Remove PendingSlotRequest 
> [Yangze Guo]
> 8ffb4d2af36 [3 months ago] [FLINK-22627][runtime] Remove TaskManagerSlot 
> [Yangze Guo]
> 72073741588 [3 months ago] [FLINK-22627][runtime] Remove SlotManagerImpl and 
> its related tests [Yangze Guo]
> bdb3b7541b3 [3 months ago] [hotfix][yarn] Remove unused internal options in 
> YarnConfigOptionsI

[jira] [Comment Edited] (FLINK-23593) Performance regression on 15.07.2021

2021-08-09 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395983#comment-17395983
 ] 

Zhu Zhu edited comment on FLINK-23593 at 8/9/21, 11:20 AM:
---

I tried the benchmarks locally before/after applying FLINK-23372 and did not 
see an obvious regression.
I also tried the benchmarks on commit f4afbf3e7de19ebcc5cb9324a22ba99fcd354dce 
(last good on the 
[codespeed|http://codespeed.dak8s.net:8000/timeline/?ben=sortedTwoInput&env=2#/?exe=1,3,5&ben=sortedTwoInput&env=2&revs=200&equid=off&quarts=on&extr=on] 
curve) and eb8100f7afe1cd2b6fceb55b174de097db752fc7 (first bad on the curve), 
but did not reproduce the regression either. Maybe it's due to HDD, but I have 
no idea yet.


was (Author: zhuzh):
I tried the benchmarks locally before/after applying FLINK-23372 and did not 
see an obvious regression.
I also tried the benchmarks on commit f4afbf3e7de19ebcc5cb9324a22ba99fcd354dce 
(last good on the 
[codespeed|http://codespeed.dak8s.net:8000/timeline/?ben=sortedTwoInput&env=2#/?exe=1,3,5&ben=sortedTwoInput&env=2&revs=200&equid=off&quarts=on&extr=on] 
curve) and eb8100f7afe1cd2b6fceb55b174de097db752fc7 (first bad on the curve), 
but did not reproduce the regression either. Maybe it's due to HDD, but I have 
no idea yet.







> Performance regression on 15.07.2021
> 
>
> Key: FLINK-23593
> URL: https://issues.apache.org/jira/browse/FLINK-23593
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream, Benchmarks
>Affects Versions: 1.14.0
>Reporter: Piotr Nowojski
>Assignee: Timo Walther
>Priority: Blocker
> Fix For: 1.14.0
>
>
> http://codespeed.dak8s.net:8000/timeline/?ben=sortedMultiInput&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=sortedTwoInput&env=2
> {noformat}
> pnowojski@piotr-mbp: [~/flink -  ((no branch, bisect started on pr/16589))] $ 
> git ls f4afbf3e7de..eb8100f7afe
> eb8100f7afe [4 weeks ago] (pn/bad, bad, refs/bisect/bad) 
> [FLINK-22017][coordination] Allow BLOCKING result partition to be 
> individually consumable [Thesharing]
> d2005268b1e [4 weeks ago] (HEAD, pn/bisect-4, bisect-4) 
> [FLINK-22017][coordination] Get the ConsumedPartitionGroup that 
> IntermediateResultPartition and DefaultResultPartition belong to [Thesharing]
> d8b1a6fd368 [3 weeks ago] [FLINK-23372][streaming-java] Disable 
> AllVerticesInSameSlotSharingGroupByDefault in batch mode [Timo Walther]
> 4a78097d038 [3 weeks ago] (pn/bisect-3, bisect-3, 
> refs/bisect/good-4a78097d0385749daceafd8326930c8cc5f26f1a) 
> [FLINK-21928][clients][runtime] Introduce static method constructors of 
> DuplicateJobSubmissionException for better readability. [David Moravek]
> 172b9e32215 [3 weeks ago] [FLINK-21928][clients] JobManager failover should 
> succeed, when trying to resubmit already terminated job in application mode. 
> [David Moravek]
> f483008db86 [3 weeks ago] [FLINK-21928][core] Introduce 
> org.apache.flink.util.concurrent.FutureUtils#handleException method, that 
> allows future to recover from the specied exception. [David Moravek]
> d7ac08c2ac0 [3 weeks ago] (pn/bisect-2, bisect-2, 
> refs/bisect/good-d7ac08c2ac06b9ff31707f3b8f43c07817814d4f) 
> [FLINK-22843][docs-zh] Document and code are inconsistent [ZhiJie Yang]
> 16c3ea427df [3 weeks ago] [hotfix] Split the final checkpoint related tests 
> to a separate test class. [Yun Gao]
> 31b3d37a22c [7 weeks ago] [FLINK-21089][runtime] Skip the execution of new 
> sources if finished on restore [Yun Gao]
> 20fe062e1b5 [3 weeks ago] [FLINK-21089][runtime] Skip execution for the 
> legacy source task if finished on restore [Yun Gao]
> 874c627114b [3 weeks ago] [FLINK-21089][runtime] Skip the lifecycle method of 
> operators if finished on restore [Yun Gao]
> ceaf24b1d88 [3 weeks ago] (pn/bisect-1, bisect-1, 
> refs/bisect/good-ceaf24b1d881c2345a43f305d40435519a09cec9) [hotfix] Fix 
> isClosed() for operator wrapper and proxy operator close to the operator 
> chain [Yun Gao]
> 41ea591a6db [3 weeks ago] [FLINK-22627][runtime] Remove unused slot request 
> protocol [Yangze Guo]
> 489346b60f8 [3 months ago] [FLINK-22627][runtime] Remove PendingSlotRequest 
> [Yangze Guo]
> 8ffb4d2af36 [3 months ago] [FLINK-22627][runtime] Remove TaskManagerSlot 
> [Yangze Guo]
> 72073741588 [3 months ago] [FLINK-22627][runtime] Remove SlotManagerImpl and 
> its related tests [Yangze Guo]
> bdb3b7541b3 [3 months ago] [hotfix][yarn] Remove unused internal options in 
> YarnConfigOptionsInternal [Yangze Guo]
> a6a9b192eac [3 weeks ago] [FLINK-23201][streaming] Reset alignment only for 
> the currently processed checkpoint [Anton Kalashnikov]
> b35701a35c7 [3 weeks ago] [FLINK-23201][streaming] Calculate checkpoint 
> alignment time only for last started checkpoint [Anton Kalashnikov]
> 3abec22c536 [3 weeks ago] [FLINK-231

[jira] [Comment Edited] (FLINK-23593) Performance regression on 15.07.2021

2021-08-09 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393950#comment-17393950
 ] 

Zhu Zhu edited comment on FLINK-23593 at 8/9/21, 11:22 AM:
---

>> Could the larger difference between local benchmark vs. cloud be that the 
>> cloud is running with regular HDDs and we always spill to disk because 
>> SORT_SPILLING_THRESHOLD is set to 0?

-Maybe yes, because the record processing time can be shorter on SSD and the 
increased initialization time (described in *Trying to explain the Regression*) 
will be more obvious.-
Ignore this line because it is wrong.

Another, similar suspicion is that the flink-benchmarks 
[patch|https://github.com/twalthr/flink-benchmarks/commit/dfe3cad86030b551daaa7c4a5951a6e4c06fc061] 
increased `RECORDS_PER_INVOCATION` from 1_500_000 to 3_000_000. This increased 
the processing time and may make the regression in initialization time less obvious.


was (Author: zhuzh):
>> Could the larger difference between local benchmark vs. cloud be that the 
>> cloud is running with regular HDDs and we always spill to disk because 
>> SORT_SPILLING_THRESHOLD is set to 0?

Maybe yes, because the record processing time can be shorter on SSD and the 
increased initialization time (described in *Trying to explain the Regression*) 
will be more obvious.

Another, similar suspicion is that the flink-benchmarks 
[patch|https://github.com/twalthr/flink-benchmarks/commit/dfe3cad86030b551daaa7c4a5951a6e4c06fc061] 
increased `RECORDS_PER_INVOCATION` from 1_500_000 to 3_000_000. This increased 
the processing time and may make the regression in initialization time less obvious.

> Performance regression on 15.07.2021
> 
>
> Key: FLINK-23593
> URL: https://issues.apache.org/jira/browse/FLINK-23593
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream, Benchmarks
>Affects Versions: 1.14.0
>Reporter: Piotr Nowojski
>Assignee: Timo Walther
>Priority: Blocker
> Fix For: 1.14.0
>
>
> http://codespeed.dak8s.net:8000/timeline/?ben=sortedMultiInput&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=sortedTwoInput&env=2
> {noformat}
> pnowojski@piotr-mbp: [~/flink -  ((no branch, bisect started on pr/16589))] $ 
> git ls f4afbf3e7de..eb8100f7afe
> eb8100f7afe [4 weeks ago] (pn/bad, bad, refs/bisect/bad) 
> [FLINK-22017][coordination] Allow BLOCKING result partition to be 
> individually consumable [Thesharing]
> d2005268b1e [4 weeks ago] (HEAD, pn/bisect-4, bisect-4) 
> [FLINK-22017][coordination] Get the ConsumedPartitionGroup that 
> IntermediateResultPartition and DefaultResultPartition belong to [Thesharing]
> d8b1a6fd368 [3 weeks ago] [FLINK-23372][streaming-java] Disable 
> AllVerticesInSameSlotSharingGroupByDefault in batch mode [Timo Walther]
> 4a78097d038 [3 weeks ago] (pn/bisect-3, bisect-3, 
> refs/bisect/good-4a78097d0385749daceafd8326930c8cc5f26f1a) 
> [FLINK-21928][clients][runtime] Introduce static method constructors of 
> DuplicateJobSubmissionException for better readability. [David Moravek]
> 172b9e32215 [3 weeks ago] [FLINK-21928][clients] JobManager failover should 
> succeed, when trying to resubmit already terminated job in application mode. 
> [David Moravek]
> f483008db86 [3 weeks ago] [FLINK-21928][core] Introduce 
> org.apache.flink.util.concurrent.FutureUtils#handleException method, that 
> allows future to recover from the specied exception. [David Moravek]
> d7ac08c2ac0 [3 weeks ago] (pn/bisect-2, bisect-2, 
> refs/bisect/good-d7ac08c2ac06b9ff31707f3b8f43c07817814d4f) 
> [FLINK-22843][docs-zh] Document and code are inconsistent [ZhiJie Yang]
> 16c3ea427df [3 weeks ago] [hotfix] Split the final checkpoint related tests 
> to a separate test class. [Yun Gao]
> 31b3d37a22c [7 weeks ago] [FLINK-21089][runtime] Skip the execution of new 
> sources if finished on restore [Yun Gao]
> 20fe062e1b5 [3 weeks ago] [FLINK-21089][runtime] Skip execution for the 
> legacy source task if finished on restore [Yun Gao]
> 874c627114b [3 weeks ago] [FLINK-21089][runtime] Skip the lifecycle method of 
> operators if finished on restore [Yun Gao]
> ceaf24b1d88 [3 weeks ago] (pn/bisect-1, bisect-1, 
> refs/bisect/good-ceaf24b1d881c2345a43f305d40435519a09cec9) [hotfix] Fix 
> isClosed() for operator wrapper and proxy operator close to the operator 
> chain [Yun Gao]
> 41ea591a6db [3 weeks ago] [FLINK-22627][runtime] Remove unused slot request 
> protocol [Yangze Guo]
> 489346b60f8 [3 months ago] [FLINK-22627][runtime] Remove PendingSlotRequest 
> [Yangze Guo]
> 8ffb4d2af36 [3 months ago] [FLINK-22627][runtime] Remove TaskManagerSlot 
> [Yangze Guo]
> 72073741588 [3 months ago] [FLINK-22627][runtime] Remove SlotManagerImpl and 
> its related tests [Yangze Guo]
> bdb3b7541b3 [3 months ago] [hotfix][yarn] Remove unus

[jira] [Comment Edited] (FLINK-23593) Performance regression on 15.07.2021

2021-08-09 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395983#comment-17395983
 ] 

Zhu Zhu edited comment on FLINK-23593 at 8/9/21, 11:24 AM:
---

I tried the benchmarks locally before/after applying FLINK-23372 and did not 
see an obvious regression.
I also tried the benchmarks on commit f4afbf3e7de19ebcc5cb9324a22ba99fcd354dce 
(last good on the 
[codespeed|http://codespeed.dak8s.net:8000/timeline/?ben=sortedTwoInput&env=2#/?exe=1,3,5&ben=sortedTwoInput&env=2&revs=200&equid=off&quarts=on&extr=on] 
curve) and eb8100f7afe1cd2b6fceb55b174de097db752fc7 (first bad on the curve), 
but did not reproduce the regression either. 
Maybe it is related to environment differences (e.g. HDD), but I have no idea 
yet.


was (Author: zhuzh):
I tried the benchmarks locally before/after applying FLINK-23372 and did not 
see an obvious regression.
I also tried the benchmarks on commit f4afbf3e7de19ebcc5cb9324a22ba99fcd354dce 
(last good on the 
[codespeed|http://codespeed.dak8s.net:8000/timeline/?ben=sortedTwoInput&env=2#/?exe=1,3,5&ben=sortedTwoInput&env=2&revs=200&equid=off&quarts=on&extr=on] 
curve) and eb8100f7afe1cd2b6fceb55b174de097db752fc7 (first bad on the curve), 
but did not reproduce the regression either. Maybe it's due to HDD, but I have 
no idea yet.

> Performance regression on 15.07.2021
> 
>
> Key: FLINK-23593
> URL: https://issues.apache.org/jira/browse/FLINK-23593
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream, Benchmarks
>Affects Versions: 1.14.0
>Reporter: Piotr Nowojski
>Assignee: Timo Walther
>Priority: Blocker
> Fix For: 1.14.0
>
>
> http://codespeed.dak8s.net:8000/timeline/?ben=sortedMultiInput&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=sortedTwoInput&env=2
> {noformat}
> pnowojski@piotr-mbp: [~/flink -  ((no branch, bisect started on pr/16589))] $ 
> git ls f4afbf3e7de..eb8100f7afe
> eb8100f7afe [4 weeks ago] (pn/bad, bad, refs/bisect/bad) 
> [FLINK-22017][coordination] Allow BLOCKING result partition to be 
> individually consumable [Thesharing]
> d2005268b1e [4 weeks ago] (HEAD, pn/bisect-4, bisect-4) 
> [FLINK-22017][coordination] Get the ConsumedPartitionGroup that 
> IntermediateResultPartition and DefaultResultPartition belong to [Thesharing]
> d8b1a6fd368 [3 weeks ago] [FLINK-23372][streaming-java] Disable 
> AllVerticesInSameSlotSharingGroupByDefault in batch mode [Timo Walther]
> 4a78097d038 [3 weeks ago] (pn/bisect-3, bisect-3, 
> refs/bisect/good-4a78097d0385749daceafd8326930c8cc5f26f1a) 
> [FLINK-21928][clients][runtime] Introduce static method constructors of 
> DuplicateJobSubmissionException for better readability. [David Moravek]
> 172b9e32215 [3 weeks ago] [FLINK-21928][clients] JobManager failover should 
> succeed, when trying to resubmit already terminated job in application mode. 
> [David Moravek]
> f483008db86 [3 weeks ago] [FLINK-21928][core] Introduce 
> org.apache.flink.util.concurrent.FutureUtils#handleException method, that 
> allows future to recover from the specied exception. [David Moravek]
> d7ac08c2ac0 [3 weeks ago] (pn/bisect-2, bisect-2, 
> refs/bisect/good-d7ac08c2ac06b9ff31707f3b8f43c07817814d4f) 
> [FLINK-22843][docs-zh] Document and code are inconsistent [ZhiJie Yang]
> 16c3ea427df [3 weeks ago] [hotfix] Split the final checkpoint related tests 
> to a separate test class. [Yun Gao]
> 31b3d37a22c [7 weeks ago] [FLINK-21089][runtime] Skip the execution of new 
> sources if finished on restore [Yun Gao]
> 20fe062e1b5 [3 weeks ago] [FLINK-21089][runtime] Skip execution for the 
> legacy source task if finished on restore [Yun Gao]
> 874c627114b [3 weeks ago] [FLINK-21089][runtime] Skip the lifecycle method of 
> operators if finished on restore [Yun Gao]
> ceaf24b1d88 [3 weeks ago] (pn/bisect-1, bisect-1, 
> refs/bisect/good-ceaf24b1d881c2345a43f305d40435519a09cec9) [hotfix] Fix 
> isClosed() for operator wrapper and proxy operator close to the operator 
> chain [Yun Gao]
> 41ea591a6db [3 weeks ago] [FLINK-22627][runtime] Remove unused slot request 
> protocol [Yangze Guo]
> 489346b60f8 [3 months ago] [FLINK-22627][runtime] Remove PendingSlotRequest 
> [Yangze Guo]
> 8ffb4d2af36 [3 months ago] [FLINK-22627][runtime] Remove TaskManagerSlot 
> [Yangze Guo]
> 72073741588 [3 months ago] [FLINK-22627][runtime] Remove SlotManagerImpl and 
> its related tests [Yangze Guo]
> bdb3b7541b3 [3 months ago] [hotfix][yarn] Remove unused internal options in 
> YarnConfigOptionsInternal [Yangze Guo]
> a6a9b192eac [3 weeks ago] [FLINK-23201][streaming] Reset alignment only for 
> the currently processed checkpoint [Anton Kalashnikov]
> b35701a35c7 [3 weeks ago] [FLINK-23201][streaming] Calculate checkpoint 
> alignment time only for last started checkpoint [Anton Kalashnikov]
> 3abec22c5

[jira] [Comment Edited] (FLINK-23593) Performance regression on 15.07.2021

2021-08-10 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17396636#comment-17396636
 ] 

Zhu Zhu edited comment on FLINK-23593 at 8/10/21, 12:08 PM:


I think I have found the cause of the regression.

*Cause*
The regression happens because FLINK-23372 disables slot sharing of batch job 
tasks, while a default MiniCluster provides just 1 task manager with 1 slot.
This means that the two source tasks of {{sortedTwoInput}} were able to run 
simultaneously before FLINK-23372 but had to run sequentially after it was 
merged. This increased the total execution time and resulted in the regression.

Later, on 07-20, an 
[improvement|https://github.com/apache/flink-benchmarks/commit/70d9b7b4927fc38ecf0950e55a47325b71e2dd63] 
to flink-benchmarks changed the MiniCluster to be pre-launched with 1 task 
manager with 4 slots. This enabled the two source tasks of {{sortedTwoInput}} 
to run simultaneously again, and the regression was gone. That is also why we 
cannot reproduce the regression by reverting FLINK-23372 on the latest master.

This also explains:
- why the regression only happened to {{sortedTwoInput}} and {{sortedMultiInput}} and not to {{sortedOneInput}}
- why the performance increase on 07-20 also only happened to {{sortedTwoInput}} and {{sortedMultiInput}}

*Conclusion*
It is expected that more slots may be needed for a batch job to run its tasks 
simultaneously. However, this does not mean that more resources are needed, 
because in theory each slot can be smaller now that it is no longer shared. 
Therefore, this regression is expected and acceptable.

*Attachment*
||Benchmark||Score||Link||
|07-15 sortedTwoInput before FLINK-23372|1904.626380|[#418|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/418/]|
|07-15 sortedTwoInput after FLINK-23372|1782.644331|[#419|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/419/]|
|07-20 sortedTwoInput before FLINK-23372|1964.448112|[#420|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/420/]|
|07-20 sortedTwoInput after FLINK-23372|1944.880662|[#421|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/421/]|
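
For context, the difference between the two MiniCluster setups described above 
comes down to the slot configuration; a minimal sketch using Flink's 
MiniCluster API (not the actual flink-benchmarks harness):

{code}
import org.apache.flink.runtime.minicluster.MiniCluster;
import org.apache.flink.runtime.minicluster.MiniClusterConfiguration;

public class MiniClusterSlotsSketch {
    public static void main(String[] args) throws Exception {
        MiniClusterConfiguration config = new MiniClusterConfiguration.Builder()
                .setNumTaskManagers(1)
                // With 1 slot the two sources cannot run at the same time once
                // slot sharing is disabled; with 4 slots (the 07-20 benchmark
                // change) they can run simultaneously again.
                .setNumSlotsPerTaskManager(4)
                .build();

        try (MiniCluster miniCluster = new MiniCluster(config)) {
            miniCluster.start();
            // ... submit the benchmark job against this cluster ...
        }
    }
}
{code}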




was (Author: zhuzh):
I think I have found the cause of the regression.

*Cause*
The regression happens because FLINK-23372 disables slot sharing of batch job 
tasks, while a default MiniCluster provides just 1 task manager with 1 slot.
This means that the two source tasks of {{sortedTwoInput}} were able to run 
simultaneously before FLINK-23372 but had to run sequentially after it was 
merged. This increased the total execution time and resulted in the regression.

Later, on 07-20, an 
[improvement|https://github.com/apache/flink-benchmarks/commit/70d9b7b4927fc38ecf0950e55a47325b71e2dd63] 
to flink-benchmarks changed the MiniCluster to be pre-launched with 1 task 
manager with 4 slots. This enabled the two source tasks of {{sortedTwoInput}} 
to run simultaneously again, and the regression was gone. That is also why we 
cannot reproduce the regression by reverting FLINK-23372 on the latest master.

This also explains:
- why the regression only happened to {{sortedTwoInput}} and {{sortedMultiInput}} and not to {{sortedOneInput}}
- why the performance increase on 07-20 also only happened to {{sortedTwoInput}} and {{sortedMultiInput}}

*Conclusion*
It is expected that more slots may be needed for a batch job to run its tasks 
simultaneously. However, this does not mean that more resources are needed, 
because in theory each slot can be smaller now that it is no longer shared. 
Therefore, this regression is expected and acceptable.

*Attachment*
||Benchmark||Score||Link||
|07-15 sortedTwoInput before FLINK-23372|1904.626380|[#418|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/418/]|
|07-15 sortedTwoInput after FLINK-23372|1782.644331|[#419|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/419/]|
|07-20 sortedTwoInput before FLINK-23372|1964.448112|[#420|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/420/]|
|07-20 sortedTwoInput after FLINK-23372|1944.880662|[#421|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/421/]|



> Performance regression on 15.07.2021
> 
>
> Key: FLINK-23593
> URL: https://issues.apache.org/jira/browse/FLINK-23593
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream, Benchmarks
>Affects Versions: 1.14.0
>Reporter: Piotr Nowojski
>Assignee: Timo Walther
>Priority: Blocker
> Fix For: 1.14.0
>
>
> http://codespeed.dak8s.net:8000/timeline/?ben=sortedMultiInput&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=sortedTwoInput&env=2
> {noformat}
> pnowojski@piotr-mbp: [~/flink -  ((no branch, bisec

[jira] [Comment Edited] (FLINK-23593) Performance regression on 15.07.2021

2021-08-10 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17396636#comment-17396636
 ] 

Zhu Zhu edited comment on FLINK-23593 at 8/10/21, 12:09 PM:


I think I have found the cause of the regression.

*Cause*
The regression happens because FLINK-23372 disables slot sharing of batch job 
tasks, while a default MiniCluster provides just 1 task manager with 1 slot.
This means that the two source tasks of {{sortedTwoInput}} were able to run 
simultaneously before FLINK-23372 but had to run sequentially after it was 
merged. This increased the total execution time and resulted in the regression.

Later, on 07-20, an 
[improvement|https://github.com/apache/flink-benchmarks/commit/70d9b7b4927fc38ecf0950e55a47325b71e2dd63] 
to flink-benchmarks changed the MiniCluster to be pre-launched with 1 task 
manager with 4 slots. This enabled the two source tasks of {{sortedTwoInput}} 
to run simultaneously again, and the regression was gone. That is also why we 
cannot reproduce the regression by reverting FLINK-23372 on the latest master.

This also explains:
- why the regression only happened to {{sortedTwoInput}} and {{sortedMultiInput}} and not to {{sortedOneInput}}
- why the performance increase on 07-20 also only happened to {{sortedTwoInput}} and {{sortedMultiInput}}

*Conclusion*
It is expected that more slots may be needed for a batch job to run its tasks 
simultaneously. However, this does not mean that more resources are needed, 
because in theory each slot can be smaller now that it is no longer shared. 
Therefore, this regression is expected and acceptable.

*Attachment*
||Benchmark||Score||Link||
|07-15 sortedTwoInput before FLINK-23372|1904.626380|[#418|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/418/]|
|07-15 sortedTwoInput after FLINK-23372|1782.644331|[#419|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/419/]|
|07-20 sortedTwoInput before FLINK-23372|1964.448112|[#420|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/420/]|
|07-20 sortedTwoInput after FLINK-23372|1944.880662|[#421|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/421/]|




was (Author: zhuzh):
I think I have found the cause of the regression.

*Cause*
The regression happens because FLINK-23372 disables slot sharing of batch job 
tasks, while a default MiniCluster provides just 1 task manager with 1 slot.
This means that the two source tasks of {{sortedTwoInput}} were able to run 
simultaneously before FLINK-23372 but had to run sequentially after it was 
merged. This increased the total execution time and resulted in the regression.

Later, on 07-20, an 
[improvement|https://github.com/apache/flink-benchmarks/commit/70d9b7b4927fc38ecf0950e55a47325b71e2dd63] 
to flink-benchmarks changed the MiniCluster to be pre-launched with 1 task 
manager with 4 slots. This enabled the two source tasks of {{sortedTwoInput}} 
to run simultaneously again, and the regression was gone. That is also why we 
cannot reproduce the regression by reverting FLINK-23372 on the latest master.

This also explains:
- why the regression only happened to {{sortedTwoInput}} and {{sortedMultiInput}} and not to {{sortedOneInput}}
- why the performance increase on 07-20 also only happened to {{sortedTwoInput}} and {{sortedMultiInput}}

*Conclusion*
It is expected that more slots may be needed for a batch job to run its tasks 
simultaneously. However, this does not mean that more resources are needed, 
because in theory each slot can be smaller now that it is no longer shared. 
Therefore, this regression is expected and acceptable.

*Attachment*
||Benchmark||Score||Link||
|07-15 sortedTwoInput before 
FLINK-23372|1904.626380|[#418|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/418/]|
|07-15 sortedTwoInput after 
FLINK-23372|1782.644331|[#419|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/419/]|
|07-20 sortedTwoInput before 
FLINK-23372|1964.448112|[#420|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/420/]|
|07-20 sortedTwoInput after 
FLINK-23372|1944.880662|[#421|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/421/]|



> Performance regression on 15.07.2021
> 
>
> Key: FLINK-23593
> URL: https://issues.apache.org/jira/browse/FLINK-23593
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream, Benchmarks
>Affects Versions: 1.14.0
>Reporter: Piotr Nowojski
>Assignee: Timo Walther
>Priority: Blocker
> Fix For: 1.14.0
>
>
> http://codespeed.dak8s.net:8000/timeline/?ben=sortedMultiInput&env=2
> http://codespeed.dak8s.net:8000/timeline/?ben=sortedTwoInput&env=2
> {noformat}
> pnowojski@piotr-mbp: [~/flink -  ((no branch, bisec

[jira] [Comment Edited] (FLINK-23593) Performance regression on 15.07.2021

2021-08-10 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17396636#comment-17396636
 ] 

Zhu Zhu edited comment on FLINK-23593 at 8/11/21, 2:51 AM:
---

I think I have found the cause of the major regression.

*Cause*
The major regression happens because FLINK-23372 disables slot sharing of 
batch job tasks, while a default MiniCluster provides only 1 task manager with 
1 slot.
This means that the two source tasks of {{sortedTwoInput}} were able to run 
simultaneously before FLINK-23372 and had to run sequentially after it was 
merged. This increased the total execution time and resulted in the regression.

Later, on 07-20, an 
[improvement|https://github.com/apache/flink-benchmarks/commit/70d9b7b4927fc38ecf0950e55a47325b71e2dd63]
 was made to flink-benchmarks that changed the MiniCluster to be pre-launched 
with 1 task manager providing 4 slots. This enabled the two source tasks of 
{{sortedTwoInput}} to run simultaneously again, and the major regression was 
gone. That is also why we cannot reproduce the obvious regression by reverting 
FLINK-23372 on the latest master.

This also explains
- why the obvious regression only happened to {{sortedTwoInput}} and 
{{sortedMultiInput}} and not to {{sortedOneInput}};
- why the performance increase on 07-20 also only happened to 
{{sortedTwoInput}} and {{sortedMultiInput}}.

*Conclusion*
It is expected that more slots may be needed for a batch job to run tasks 
simultaneously. However, this does not mean more resources are needed, because 
in theory each slot can be smaller now that it is no longer shared. Therefore, 
this major regression is expected and acceptable.

Note that there still seems to be a minor regression (~1%) after FLINK-23372. 
The cause may be the increased overhead of slot allocation or memory 
initialization, as Stephan [commented 
above|https://issues.apache.org/jira/browse/FLINK-23593?focusedCommentId=17393287&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393287].
 It is also acceptable in my opinion.

*Attachment*
||Benchmark||Score||Link||
|07-15 sortedTwoInput before 
FLINK-23372|1904.626380|[#418|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/418/]|
|07-15 sortedTwoInput after 
FLINK-23372|1782.644331|[#419|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/419/]|
|07-20 sortedTwoInput before 
FLINK-23372|1964.448112|[#420|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/420/]|
|07-20 sortedTwoInput after 
FLINK-23372|1944.880662|[#421|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/421/]|
|latest sortedTwoInput on latest 
master|1926.685377|[#413|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/413/]|
|latest sortedTwoInput reverting FLINK-23372 on latest 
master|1938.716479|[#414|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/414/]|




was (Author: zhuzh):
I think I have found the cause of the regression.

*Cause*
The regression happens because FLINK-23372 disables slot sharing of batch job 
tasks, while a default MiniCluster provides only 1 task manager with 1 slot.
This means that the two source tasks of {{sortedTwoInput}} were able to run 
simultaneously before FLINK-23372 and had to run sequentially after it was 
merged. This increased the total execution time and resulted in the regression.

Later, on 07-20, an 
[improvement|https://github.com/apache/flink-benchmarks/commit/70d9b7b4927fc38ecf0950e55a47325b71e2dd63]
 was made to flink-benchmarks that changed the MiniCluster to be pre-launched 
with 1 task manager providing 4 slots. This enabled the two source tasks of 
{{sortedTwoInput}} to run simultaneously again, and the regression was gone. 
That is also why we cannot reproduce the regression by reverting FLINK-23372 
on the latest master.

This also explains
- why the regression only happened to {{sortedTwoInput}} and 
{{sortedMultiInput}} and not to {{sortedOneInput}};
- why the performance increase on 07-20 also only happened to 
{{sortedTwoInput}} and {{sortedMultiInput}}.

*Conclusion*
It is expected that more slots may be needed for a batch job to run tasks 
simultaneously. However, this does not mean more resources are needed, because 
in theory each slot can be smaller now that it is no longer shared. Therefore, 
this regression is expected and acceptable.

*Attachment*
||Benchmark||Score||Link||
|07-15 sortedTwoInput before 
FLINK-23372|1904.626380|[#418|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/418/]|
|07-15 sortedTwoInput after 
FLINK-23372|1782.644331|[#419|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/419/]|
|07-20 sortedTwoInput before 
FLINK-23372|1964.448112|[#420|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/420/]|
|07-20 sortedTwoInput after 
FLINK-23372|1944.880662|[#421|http://codespeed.dak8s.net:808

[jira] [Comment Edited] (FLINK-23593) Performance regression on 15.07.2021

2021-08-10 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17396636#comment-17396636
 ] 

Zhu Zhu edited comment on FLINK-23593 at 8/11/21, 2:53 AM:
---

I think I have found the cause of the regression.

*Cause*
The regression happens because FLINK-23372 disables slot sharing of batch job 
tasks, while a default MiniCluster provides only 1 task manager with 1 slot.
This means that the two source tasks of {{sortedTwoInput}} were able to run 
simultaneously before FLINK-23372 and had to run sequentially after it was 
merged. This increased the total execution time and resulted in the regression.

Later, on 07-20, an 
[improvement|https://github.com/apache/flink-benchmarks/commit/70d9b7b4927fc38ecf0950e55a47325b71e2dd63]
 was made to flink-benchmarks that changed the MiniCluster to be pre-launched 
with 1 task manager providing 4 slots. This enabled the two source tasks of 
{{sortedTwoInput}} to run simultaneously again, and the regression was gone. 
That is also why we cannot reproduce the regression by reverting FLINK-23372 
on the latest master.

This also explains
- why the regression only happened to {{sortedTwoInput}} and 
{{sortedMultiInput}} and not to {{sortedOneInput}};
- why the performance increase on 07-20 also only happened to 
{{sortedTwoInput}} and {{sortedMultiInput}}.

*Conclusion*
It is expected that more slots may be needed for a batch job to run tasks 
simultaneously. However, this does not mean more resources are needed, because 
in theory each slot can be smaller now that it is no longer shared. Therefore, 
this regression is expected and acceptable.

*Attachment*
||Benchmark||Score||Link||
|07-15 sortedTwoInput before 
FLINK-23372|1904.626380|[#418|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/418/]|
|07-15 sortedTwoInput after 
FLINK-23372|1782.644331|[#419|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/419/]|
|07-20 sortedTwoInput before 
FLINK-23372|1964.448112|[#420|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/420/]|
|07-20 sortedTwoInput after 
FLINK-23372|1944.880662|[#421|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/421/]|
|latest sortedTwoInput on latest 
master|1926.685377|[#413|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/413/]|
|latest sortedTwoInput reverting FLINK-23372 on latest 
master|1938.716479|[#414|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/414/]|




was (Author: zhuzh):
I think I have found the cause of the major regression.

*Cause*
The major regression happens because FLINK-23372 disables slot sharing of 
batch job tasks, while a default MiniCluster provides only 1 task manager with 
1 slot.
This means that the two source tasks of {{sortedTwoInput}} were able to run 
simultaneously before FLINK-23372 and had to run sequentially after it was 
merged. This increased the total execution time and resulted in the regression.

Later, on 07-20, an 
[improvement|https://github.com/apache/flink-benchmarks/commit/70d9b7b4927fc38ecf0950e55a47325b71e2dd63]
 was made to flink-benchmarks that changed the MiniCluster to be pre-launched 
with 1 task manager providing 4 slots. This enabled the two source tasks of 
{{sortedTwoInput}} to run simultaneously again, and the major regression was 
gone. That is also why we cannot reproduce the obvious regression by reverting 
FLINK-23372 on the latest master.

This also explains
- why the obvious regression only happened to {{sortedTwoInput}} and 
{{sortedMultiInput}} and not to {{sortedOneInput}};
- why the performance increase on 07-20 also only happened to 
{{sortedTwoInput}} and {{sortedMultiInput}}.

*Conclusion*
It is expected that more slots may be needed for a batch job to run tasks 
simultaneously. However, this does not mean more resources are needed, because 
in theory each slot can be smaller now that it is no longer shared. Therefore, 
this major regression is expected and acceptable.

Note that there still seems to be a minor regression (~1%) after FLINK-23372. 
The cause may be the increased overhead of slot allocation or memory 
initialization, as Stephan [commented 
above|https://issues.apache.org/jira/browse/FLINK-23593?focusedCommentId=17393287&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393287].
 It is also acceptable in my opinion.

*Attachment*
||Benchmark||Score||Link||
|07-15 sortedTwoInput before 
FLINK-23372|1904.626380|[#418|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/418/]|
|07-15 sortedTwoInput after 
FLINK-23372|1782.644331|[#419|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/419/]|
|07-20 sortedTwoInput before 
FLINK-23372|1964.448112|[#420|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/420/]|
|07-20 sortedTwoInput after 
FLINK-23372|1944.880662|[#421|http://codespeed.dak8s.net:808

[jira] [Comment Edited] (FLINK-23593) Performance regression on 15.07.2021

2021-08-10 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17396636#comment-17396636
 ] 

Zhu Zhu edited comment on FLINK-23593 at 8/11/21, 3:01 AM:
---

I think I have found the cause of the major regression.

*Cause*
The major regression happens because FLINK-23372 disables slot sharing of 
batch job tasks, while a default MiniCluster provides only 1 task manager with 
1 slot.
This means that the two source tasks of {{sortedTwoInput}} were able to run 
simultaneously before FLINK-23372 and had to run sequentially after it was 
merged. This increased the total execution time and resulted in the major 
regression.

Later, on 07-20, an 
[improvement|https://github.com/apache/flink-benchmarks/commit/70d9b7b4927fc38ecf0950e55a47325b71e2dd63]
 was made to flink-benchmarks that changed the MiniCluster to be pre-launched 
with 1 task manager providing 4 slots. This enabled the two source tasks of 
{{sortedTwoInput}} to run simultaneously again, and the major regression was 
gone. That is also why we cannot reproduce the obvious regression by reverting 
FLINK-23372 on the latest master.

This also explains
- why the obvious regression only happened to {{sortedTwoInput}} and 
{{sortedMultiInput}} and not to {{sortedOneInput}};
- why the performance increase on 07-20 also only happened to 
{{sortedTwoInput}} and {{sortedMultiInput}}.

*Conclusion*
It is expected that more slots may be needed for a batch job to run tasks 
simultaneously. However, this does not mean more resources are needed, because 
in theory each slot can be smaller now that it is no longer shared. Therefore, 
this regression is expected and acceptable.

Note that there still seems to be a minor regression (~1%) after applying 
FLINK-23372. The possible reason is explained above in Stephan's 
[comment|https://issues.apache.org/jira/browse/FLINK-23593?focusedCommentId=17393287&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393287].
 It is also acceptable in my opinion.

*Attachment*
||Benchmark||Score||Link||
|07-15 sortedTwoInput sharing (right before 
FLINK-23372)|1904.626380|[#418|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/418/]|
|07-15 sortedTwoInput non-sharing (right after 
FLINK-23372)|1782.644331|[#419|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/419/]|
|07-20 sortedTwoInput sharing (right before 
FLINK-23372)|1964.448112|[#420|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/420/]|
|07-20 sortedTwoInput non-sharing (right after 
FLINK-23372)|1944.880662|[#421|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/421/]|
|latest sortedTwoInput sharing (reverting FLINK-23372) on latest 
master|1938.716479|[#414|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/414/]|
|latest sortedTwoInput sharing on latest 
master|1926.685377|[#413|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/413/]|




was (Author: zhuzh):
I think I find the cause of the regression.

*Cause*
The regression happens because FLINK-23372 disables slot sharing of batch job 
tasks. And a default MiniCluster would just provide 1 task manager with 1 slot.
This means that the two source tasks of {{sortedTwoInput}} were able to run 
simultaneously before FLINK-23372 and had to run sequentially after FLINK-23372 
was merged. This increased the total execution time and resulted in the 
regression.

Later on 07-20, an 
[improvement|https://github.com/apache/flink-benchmarks/commit/70d9b7b4927fc38ecf0950e55a47325b71e2dd63]
 was made on flink-benchmarks and changed the MiniCluster to be pre-launched 
with 1 task manager with 4 slots. This enabled the two source tasks of 
{{sortedTwoInput}} to run simultaneously again. And the regression was gone. 
And that's why we cannot reproduce the regression by reverting FLINK-23372 on 
latest master.

This also explains that 
- why the regression only happened to {{sortedTwoInput}} and 
{{sortedMultiInput}} and not to {{sortedOneInput}}. 
- why the performance increased on 07-20 and it also only happened to 
{{sortedTwoInput}} and {{sortedMultiInput}}

*Conclusion*
It is expected that more slots may be needed for a batch job to run tasks 
simultaneously. However, this does not mean more resources are needed because 
theoretically each slot can be smaller because it is no longer shared. 
Therefore, this regression is expected and acceptable.

*Attachment*
||Benchmark||Score||Link||
|07-15 sortedTwoInput before 
FLINK-23372|1904.626380|[#418|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/418/]|
|07-15 sortedTwoInput after 
FLINK-23372|1782.644331|[#419|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/419/]|
|07-20 sortedTwoInput before 
FLINK-23372|1964.448112|[#420|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/420/]|
|07-20 sortedTwoInput after 
FLINK-23372|194

[jira] [Comment Edited] (FLINK-23593) Performance regression on 15.07.2021

2021-08-10 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17396636#comment-17396636
 ] 

Zhu Zhu edited comment on FLINK-23593 at 8/11/21, 3:01 AM:
---

I think I have found the cause of the major regression.

*Cause*
The major regression happens because FLINK-23372 disables slot sharing of 
batch job tasks, while a default MiniCluster provides only 1 task manager with 
1 slot.
This means that the two source tasks of {{sortedTwoInput}} were able to run 
simultaneously before FLINK-23372 and had to run sequentially after it was 
merged. This increased the total execution time and resulted in the major 
regression.

Later, on 07-20, an 
[improvement|https://github.com/apache/flink-benchmarks/commit/70d9b7b4927fc38ecf0950e55a47325b71e2dd63]
 was made to flink-benchmarks that changed the MiniCluster to be pre-launched 
with 1 task manager providing 4 slots. This enabled the two source tasks of 
{{sortedTwoInput}} to run simultaneously again, and the major regression was 
gone. That is also why we cannot reproduce the obvious regression by reverting 
FLINK-23372 on the latest master.

This also explains
- why the obvious regression only happened to {{sortedTwoInput}} and 
{{sortedMultiInput}} and not to {{sortedOneInput}};
- why the performance increase on 07-20 also only happened to 
{{sortedTwoInput}} and {{sortedMultiInput}}.

*Conclusion*
It is expected that more slots may be needed for a batch job to run tasks 
simultaneously. However, this does not mean more resources are needed, because 
in theory each slot can be smaller now that it is no longer shared. Therefore, 
this regression is expected and acceptable.

Note that there still seems to be a minor regression (~1%) after applying 
FLINK-23372. The possible reason is explained above in Stephan's 
[comment|https://issues.apache.org/jira/browse/FLINK-23593?focusedCommentId=17393287&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393287].
 It is also acceptable in my opinion.

*Attachment*
||Benchmark||Score||Link||
|07-15 sortedTwoInput sharing (right before 
FLINK-23372)|1904.626380|[#418|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/418/]|
|07-15 sortedTwoInput non-sharing (right after 
FLINK-23372)|1782.644331|[#419|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/419/]|
|07-20 sortedTwoInput sharing (right before 
FLINK-23372)|1964.448112|[#420|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/420/]|
|07-20 sortedTwoInput non-sharing (right after 
FLINK-23372)|1944.880662|[#421|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/421/]|
|latest sortedTwoInput sharing (reverting FLINK-23372) on latest 
master|1938.716479|[#414|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/414/]|
|latest sortedTwoInput non-sharing on latest 
master|1926.685377|[#413|http://codespeed.dak8s.net:8080/job/flink-benchmark-request/413/]|




was (Author: zhuzh):
I think I have found the cause of the major regression.

*Cause*
The major regression happens because FLINK-23372 disables slot sharing of 
batch job tasks, while a default MiniCluster provides only 1 task manager with 
1 slot.
This means that the two source tasks of {{sortedTwoInput}} were able to run 
simultaneously before FLINK-23372 and had to run sequentially after it was 
merged. This increased the total execution time and resulted in the major 
regression.

Later, on 07-20, an 
[improvement|https://github.com/apache/flink-benchmarks/commit/70d9b7b4927fc38ecf0950e55a47325b71e2dd63]
 was made to flink-benchmarks that changed the MiniCluster to be pre-launched 
with 1 task manager providing 4 slots. This enabled the two source tasks of 
{{sortedTwoInput}} to run simultaneously again, and the major regression was 
gone. That is also why we cannot reproduce the obvious regression by reverting 
FLINK-23372 on the latest master.

This also explains
- why the obvious regression only happened to {{sortedTwoInput}} and 
{{sortedMultiInput}} and not to {{sortedOneInput}};
- why the performance increase on 07-20 also only happened to 
{{sortedTwoInput}} and {{sortedMultiInput}}.

*Conclusion*
It is expected that more slots may be needed for a batch job to run tasks 
simultaneously. However, this does not mean more resources are needed, because 
in theory each slot can be smaller now that it is no longer shared. Therefore, 
this regression is expected and acceptable.

Note that there still seems to be a minor regression (~1%) after applying 
FLINK-23372. The possible reason is explained above in Stephan's 
[comment|https://issues.apache.org/jira/browse/FLINK-23593?focusedCommentId=17393287&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393287].
 It is also acceptable in my opinion.

*Attachment*
||Benchmark||Score||Link||
|07-15 sortedTwoInput sharin