Hi Thomas,

Regarding [2], it has more detail infos in the Jira description 
(https://issues.apache.org/jira/browse/FLINK-16404). 

I can also give some basic explanations here to dismiss the concern.
1. In the past, the following buffers after the barrier will be cached on 
downstream side before alignment.
2. In 1.11, the upstream would not send the buffers after the barrier. When the 
downstream finishes the alignment, it will notify the downstream of continuing 
sending following buffers, since it can process them after alignment.
3. The only difference is that the temporary blocked buffers are cached either 
on downstream side or on upstream side before alignment.
4. The side effect would be the additional notification cost for every barrier 
alignment. If the downstream and upstream are deployed in separate TaskManager, 
the cost is network transport delay (the effect can be ignored based on our 
testing with 1s checkpoint interval). For sharing slot in your case, the cost 
is only one method call in processor, can be ignored also.

You mentioned "In this case, the downstream task has a high average checkpoint 
duration(~30s, sync part)." This duration is not reflecting the changes above, 
and it is only indicating the duration for calling `Operation.snapshotState`. 
If this duration is beyond your expectation, you can check or debug whether the 
source/sink operations might take more time to finish `snapshotState` in 
practice. E.g. you can
make the implementation of this method as empty to further verify the effect.

Best,
Zhijiang


------------------------------------------------------------------
From:Thomas Weise <t...@apache.org>
Send Time:2020年7月5日(星期日) 12:22
To:dev <dev@flink.apache.org>; Zhijiang <wangzhijiang...@aliyun.com>
Cc:Yingjie Cao <kevin.ying...@gmail.com>
Subject:Re: [VOTE] Release 1.11.0, release candidate #4

Hi Zhijiang,

Could you please point me to more details regarding: "[2]: Delay send the
following buffers after checkpoint barrier on upstream side until barrier
alignment on downstream side."

In this case, the downstream task has a high average checkpoint duration
(~30s, sync part). If there was a change to hold buffers depending on
downstream performance, could this possibly apply to this case (even when
there is no shuffle that would require alignment)?

Thanks,
Thomas


On Sat, Jul 4, 2020 at 7:39 AM Zhijiang <wangzhijiang...@aliyun.com.invalid>
wrote:

> Hi Thomas,
>
> Thanks for the further update information.
>
> I guess we can dismiss the network stack changes, since in your case the
> downstream and upstream would probably be deployed in the same slot
> bypassing the network data shuffle.
> Also I guess release-1.11 will not bring general performance regression in
> runtime engine, as we also did the performance testing for all general
> cases by [1] in real cluster before and the testing results should fit the
> expectation. But we indeed did not test the specific source and sink
> connectors yet as I known.
>
> Regarding your performance regression with 40%, I wonder it is probably
> related to specific source/sink changes (e.g. kinesis) or environment
> issues with corner case.
> If possible, it would be helpful to further locate whether the regression
> is caused by kinesis, by replacing the kinesis source & sink and keeping
> the others same.
>
> As you said, it would be efficient to contact with you directly next week
> to further discuss this issue. And we are willing/eager to provide any help
> to resolve this issue soon.
>
> Besides that, I guess this issue should not be the blocker for the
> release, since it is probably a corner case based on the current analysis.
> If we really conclude anything need to be resolved after the final
> release, then we can also make the next minor release-1.11.1 come soon.
>
> [1] https://issues.apache.org/jira/browse/FLINK-18433
>
> Best,
> Zhijiang
>
>
> ------------------------------------------------------------------
> From:Thomas Weise <t...@apache.org>
> Send Time:2020年7月4日(星期六) 12:26
> To:dev <dev@flink.apache.org>; Zhijiang <wangzhijiang...@aliyun.com>
> Cc:Yingjie Cao <kevin.ying...@gmail.com>
> Subject:Re: [VOTE] Release 1.11.0, release candidate #4
>
> Hi Zhijiang,
>
> It will probably be best if we connect next week and discuss the issue
> directly since this could be quite difficult to reproduce.
>
> Before the testing result on our side comes out for your respective job
> case, I have some other questions to confirm for further analysis:
>     -  How much percentage regression you found after switching to 1.11?
>
> ~40% throughput decline
>
>     -  Are there any network bottleneck in your cluster? E.g. the network
> bandwidth is full caused by other jobs? If so, it might have more effects
> by above [2]
>
> The test runs on a k8s cluster that is also used for other production jobs.
> There is no reason be believe network is the bottleneck.
>
>     -  Did you adjust the default network buffer setting? E.g.
> "taskmanager.network.memory.floating-buffers-per-gate" or
> "taskmanager.network.memory.buffers-per-channel"
>
> The job is using the defaults, i.e we don't configure the settings. If you
> want me to try specific settings in the hope that it will help to isolate
> the issue please let me know.
>
>     -  I guess the topology has three vertexes "KinesisConsumer -> Chained
> FlatMap -> KinesisProducer", and the partition mode for "KinesisConsumer ->
> FlatMap" and "FlatMap->KinesisProducer" are both "forward"? If so, the edge
> connection is one-to-one, not all-to-all, then the above [1][2] should no
> effects in theory with default network buffer setting.
>
> There are only 2 vertices and the edge is "forward".
>
>     - By slot sharing, I guess these three vertex parallelism task would
> probably be deployed into the same slot, then the data shuffle is by memory
> queue, not network stack. If so, the above [2] should no effect.
>
> Yes, vertices share slots.
>
>     - I also saw some Jira changes for kinesis in this release, could you
> confirm that these changes would not effect the performance?
>
> I will need to take a look. 1.10 already had a regression introduced by the
> Kinesis producer update.
>
>
> Thanks,
> Thomas
>
>
> On Thu, Jul 2, 2020 at 11:46 PM Zhijiang <wangzhijiang...@aliyun.com
> .invalid>
> wrote:
>
> > Hi Thomas,
> >
> > Thanks for your reply with rich information!
> >
> > We are trying to reproduce your case in our cluster to further verify it,
> > and  @Yingjie Cao is working on it now.
> >  As we have not kinesis consumer and producer internally, so we will
> > construct the common source and sink instead in the case of backpressure.
> >
> > Firstly, we can dismiss the rockdb factor in this release, since you also
> > mentioned that "filesystem leads to same symptoms".
> >
> > Secondly, if my understanding is right, you emphasis that the regression
> > only exists for the jobs with low checkpoint interval (10s).
> > Based on that, I have two suspicions with the network related changes in
> > this release:
> >     - [1]: Limited the maximum backlog value (default 10) in subpartition
> > queue.
> >     - [2]: Delay send the following buffers after checkpoint barrier on
> > upstream side until barrier alignment on downstream side.
> >
> > These changes are motivated for reducing the in-flight buffers to speedup
> > checkpoint especially in the case of backpressure.
> > In theory they should have very minor performance effect and actually we
> > also tested in cluster to verify within expectation before merging them,
> >  but maybe there are other corner cases we have not thought of before.
> >
> > Before the testing result on our side comes out for your respective job
> > case, I have some other questions to confirm for further analysis:
> >     -  How much percentage regression you found after switching to 1.11?
> >     -  Are there any network bottleneck in your cluster? E.g. the network
> > bandwidth is full caused by other jobs? If so, it might have more effects
> > by above [2]
> >     -  Did you adjust the default network buffer setting? E.g.
> > "taskmanager.network.memory.floating-buffers-per-gate" or
> > "taskmanager.network.memory.buffers-per-channel"
> >     -  I guess the topology has three vertexes "KinesisConsumer ->
> Chained
> > FlatMap -> KinesisProducer", and the partition mode for "KinesisConsumer
> ->
> > FlatMap" and "FlatMap->KinesisProducer" are both "forward"? If so, the
> edge
> > connection is one-to-one, not all-to-all, then the above [1][2] should no
> > effects in theory with default network buffer setting.
> >     - By slot sharing, I guess these three vertex parallelism task would
> > probably be deployed into the same slot, then the data shuffle is by
> memory
> > queue, not network stack. If so, the above [2] should no effect.
> >     - I also saw some Jira changes for kinesis in this release, could you
> > confirm that these changes would not effect the performance?
> >
> > Best,
> > Zhijiang
> >
> >
> > ------------------------------------------------------------------
> > From:Thomas Weise <t...@apache.org>
> > Send Time:2020年7月3日(星期五) 01:07
> > To:dev <dev@flink.apache.org>; Zhijiang <wangzhijiang...@aliyun.com>
> > Subject:Re: [VOTE] Release 1.11.0, release candidate #4
> >
> > Hi Zhijiang,
> >
> > The performance degradation manifests in backpressure which leads to
> > growing backlog in the source. I switched a few times between 1.10 and
> 1.11
> > and the behavior is consistent.
> >
> > The DAG is:
> >
> > KinesisConsumer -> (Flat Map, Flat Map, Flat Map)   -------- forward
> > ---------> KinesisProducer
> >
> > Parallelism: 160
> > No shuffle/rebalance.
> >
> > Checkpointing config:
> >
> > Checkpointing Mode Exactly Once
> > Interval 10s
> > Timeout 10m 0s
> > Minimum Pause Between Checkpoints 10s
> > Maximum Concurrent Checkpoints 1
> > Persist Checkpoints Externally Enabled (delete on cancellation)
> >
> > State backend: rocksdb  (filesystem leads to same symptoms)
> > Checkpoint size is tiny (500KB)
> >
> > An interesting difference to another job that I had upgraded successfully
> > is the low checkpointing interval.
> >
> > Thanks,
> > Thomas
> >
> >
> > On Wed, Jul 1, 2020 at 9:02 PM Zhijiang <wangzhijiang...@aliyun.com
> > .invalid>
> > wrote:
> >
> > > Hi Thomas,
> > >
> > > Thanks for the efficient feedback.
> > >
> > > Regarding the suggestion of adding the release notes document, I agree
> > > with your point. Maybe we should adjust the vote template accordingly
> in
> > > the respective wiki to guide the following release processes.
> > >
> > > Regarding the performance regression, could you provide some more
> details
> > > for our better measurement or reproducing on our sides?
> > > E.g. I guess the topology only includes two vertexes source and sink?
> > > What is the parallelism for every vertex?
> > > The upstream shuffles data to the downstream via rebalance partitioner
> or
> > > other?
> > > The checkpoint mode is exactly-once with rocksDB state backend?
> > > The backpressure happened in this case?
> > > How much percentage regression in this case?
> > >
> > > Best,
> > > Zhijiang
> > >
> > >
> > >
> > > ------------------------------------------------------------------
> > > From:Thomas Weise <t...@apache.org>
> > > Send Time:2020年7月2日(星期四) 09:54
> > > To:dev <dev@flink.apache.org>
> > > Subject:Re: [VOTE] Release 1.11.0, release candidate #4
> > >
> > > Hi Till,
> > >
> > > Yes, we don't have the setting in flink-conf.yaml.
> > >
> > > Generally, we carry forward the existing configuration and any change
> to
> > > default configuration values would impact the upgrade.
> > >
> > > Yes, since it is an incompatible change I would state it in the release
> > > notes.
> > >
> > > Thanks,
> > > Thomas
> > >
> > > BTW I found a performance regression while trying to upgrade another
> > > pipeline with this RC. It is a simple Kinesis to Kinesis job. Wasn't
> able
> > > to pin it down yet, symptoms include increased checkpoint alignment
> time.
> > >
> > > On Wed, Jul 1, 2020 at 12:04 AM Till Rohrmann <trohrm...@apache.org>
> > > wrote:
> > >
> > > > Hi Thomas,
> > > >
> > > > just to confirm: When starting the image in local mode, then you
> don't
> > > have
> > > > any of the JobManager memory configuration settings configured in the
> > > > effective flink-conf.yaml, right? Does this mean that you have
> > explicitly
> > > > removed `jobmanager.heap.size: 1024m` from the default configuration?
> > If
> > > > this is the case, then I believe it was more of an unintentional
> > artifact
> > > > that it worked before and it has been corrected now so that one needs
> > to
> > > > specify the memory of the JM process explicitly. Do you think it
> would
> > > help
> > > > to explicitly state this in the release notes?
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Wed, Jul 1, 2020 at 7:01 AM Thomas Weise <t...@apache.org> wrote:
> > > >
> > > > > Thanks for preparing another RC!
> > > > >
> > > > > As mentioned in the previous RC thread, it would be super helpful
> if
> > > the
> > > > > release notes that are part of the documentation can be included
> [1].
> > > > It's
> > > > > a significant time-saver to have read those first.
> > > > >
> > > > > I found one more non-backward compatible change that would be worth
> > > > > addressing/mentioning:
> > > > >
> > > > > It is now necessary to configure the jobmanager heap size in
> > > > > flink-conf.yaml (with either jobmanager.heap.size
> > > > > or jobmanager.memory.heap.size). Why would I not want to do that
> > > anyways?
> > > > > Well, we set it dynamically for a cluster deployment via the
> > > > > flinkk8soperator, but the container image can also be used for
> > testing
> > > > with
> > > > > local mode (./bin/jobmanager.sh start-foreground local). That will
> > fail
> > > > if
> > > > > the heap wasn't configured and that's how I noticed it.
> > > > >
> > > > > Thanks,
> > > > > Thomas
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/release-notes/flink-1.11.html
> > > > >
> > > > > On Tue, Jun 30, 2020 at 3:18 AM Zhijiang <
> wangzhijiang...@aliyun.com
> > > > > .invalid>
> > > > > wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > Please review and vote on the release candidate #4 for the
> version
> > > > > 1.11.0,
> > > > > > as follows:
> > > > > > [ ] +1, Approve the release
> > > > > > [ ] -1, Do not approve the release (please provide specific
> > comments)
> > > > > >
> > > > > > The complete staging area is available for your review, which
> > > includes:
> > > > > > * JIRA release notes [1],
> > > > > > * the official Apache source release and binary convenience
> > releases
> > > to
> > > > > be
> > > > > > deployed to dist.apache.org [2], which are signed with the key
> > with
> > > > > > fingerprint 2DA85B93244FDFA19A6244500653C0A2CEA00D0E [3],
> > > > > > * all artifacts to be deployed to the Maven Central Repository
> [4],
> > > > > > * source code tag "release-1.11.0-rc4" [5],
> > > > > > * website pull request listing the new release and adding
> > > announcement
> > > > > > blog post [6].
> > > > > >
> > > > > > The vote will be open for at least 72 hours. It is adopted by
> > > majority
> > > > > > approval, with at least 3 PMC affirmative votes.
> > > > > >
> > > > > > Thanks,
> > > > > > Release Manager
> > > > > >
> > > > > > [1]
> > > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346364
> > > > > > [2]
> https://dist.apache.org/repos/dist/dev/flink/flink-1.11.0-rc4/
> > > > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > > > > [4]
> > > > > >
> > > >
> > https://repository.apache.org/content/repositories/orgapacheflink-1377/
> > > > > > [5]
> > https://github.com/apache/flink/releases/tag/release-1.11.0-rc4
> > > > > > [6] https://github.com/apache/flink-web/pull/352
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> >
> >
>
>

Reply via email to