Hi, I’m hereby canceling the vote for RC2 of Flink 1.8.0 because of various issues mentioned in the vote thread.
Best,
Aljoscha

> On 19. Mar 2019, at 11:48, Stephan Ewen <se...@apache.org> wrote:
>
> @Gordon The tupleKeyBy and benchmarkCount regressions are most likely not
> caused by serializers, but more probably by a network stack change.
> I would look at the AvroSerializer issue independently of those benchmarks.
>
> On Tue, Mar 19, 2019 at 8:23 AM Piotr Nowojski <piotr.nowoj...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> Regarding the regression from mid February, it looks like it happened in
>> the commit range 3d39cb0..a9eb6d7.
>>
>> I'm investigating the regression from January 29th. It happened in the
>> commit range 35fa2b7..81acd0a (I think I managed to reproduce the results
>> locally for it).
>>
>> Piotrek
>>
>> On Tue, 19 Mar 2019 at 07:20, jincheng sun <sunjincheng...@gmail.com>
>> wrote:
>>
>>> Hi Aljoscha,
>>>
>>> I have merged fixes for the following issues found in RC1 and RC2 into
>>> the release-1.8 branch:
>>>
>>> - Add `frocksdbjni` dependency in NOTICE - FLINK-11950
>>> - Improve end-to-end test - FLINK-11892
>>> - Deprecated Window API - FLINK-11918
>>>
>>> Currently, I am performing functional testing of YARN cluster mode and
>>> multiple operating systems. I think these test results will be valid for
>>> the next RC as well.
>>>
>>> Best,
>>> Jincheng
>>>
>>> Shaoxuan Wang <wshaox...@gmail.com> wrote on Tue, 19 Mar 2019 at 11:45 AM:
>>>
>>>> I tested RC2 with the following items:
>>>> - Maven Central Repository contains all artifacts
>>>> - Built the source with Maven (ensured all source files have Apache
>>>>   headers)
>>>> - Checked that the checksums and GPG files (for instance, for
>>>>   flink-core-1.8.0.jar) match the corresponding release files
>>>> - Verified that the source archives do not contain any binaries
>>>> - Manually executed the tests in the IDE
>>>>
>>>> @Aljoscha, per the discussion in RC1, we should consider sending the
>>>> release vote to the user group to gather more feedback.
>>>>
>>>> @Gordon and @Yu, I noticed some perf regressions that occurred on
>>>> Jan. 29 (and have persisted since then) in the stateBackends.FS and
>>>> stateBackends.ROCKS_INC tests:
>>>>
>>>> http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=stateBackends.FS&env=2&revs=200&equid=off&quarts=on&extr=on
>>>>
>>>> http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=tumblingWindow&env=2&revs=200&equid=off&quarts=on&extr=on
>>>>
>>>> @Chesnay, how did you notice and capture the license NOTICE issue? It
>>>> seems very difficult to track. I am trying to understand how we organize
>>>> the license NOTICE files. In this case, why do we only need to add the
>>>> 5.17.2-artisans-1.0 dependency to the NOTICE file of flink-dist? It
>>>> seems there are other modules that bundle the dependency of the
>>>> flink-statebackend.
>>>>
>>>> Regards,
>>>> Shaoxuan
>>>>
>>>> On Tue, Mar 19, 2019 at 10:49 AM Tzu-Li (Gordon) Tai <tzuli...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> The regressions in the benchmark were also brought up earlier in this
>>>>> thread by Yu.
>>>>>
>>>>> From the previous investigations, these are the commits that touched
>>>>> the relevant serializers (TupleSerializer, AvroSerializer, RowSerializer)
>>>>> around Jan / Feb:
>>>>>
>>>>> TupleSerializer -
>>>>> 73e4d0ecfd (Thu Feb 14 11:56:51 2019 +0800) [FLINK-10493] Migrate all
>>>>> subclasses of TupleSerializerBase to use new serialization compatibility
>>>>> abstractions
>>>>>
>>>>> AvroSerializer -
>>>>> 09bb7bbc0f (Wed Feb 20 09:52:57 2019 +0100) [FLINK-9803] Drop canEqual()
>>>>> from TypeSerializer
>>>>> 479ebd5987 (Tue Jan 29 15:06:09 2019 +0800) [FLINK-11436] [avro] Manually
>>>>> Java-deserialize AvroSerializer for backwards compatibility
>>>>>
>>>>> RowSerializer -
>>>>> 09bb7bbc0f (Wed Feb 20 09:52:57 2019 +0100) [FLINK-9803] Drop canEqual()
>>>>> from TypeSerializer
>>>>> b434b32c08 (Wed Jan 30 22:53:27 2019 +0800) [FLINK-11329] [table] Migrating
>>>>> the RowSerializer to use new compatibility API
>>>>>
>>>>> The odd thing is that the times of these commits don't really match the
>>>>> drops in their respective benchmark result timelines.
>>>>> For the tupleKeyBy benchmark, the drop started around the end of January,
>>>>> whereas the TupleSerializer was last touched only in mid February.
>>>>> For the serializerRow and serializerAvro benchmarks, the drop occurred
>>>>> around mid February, whereas the only commit around that time was
>>>>> 09bb7bbc0f ([FLINK-9803] Drop canEqual() from TypeSerializer).
>>>>>
>>>>> For now, the only possible explanation I can offer for the AvroSerializer
>>>>> benchmark drop is 479ebd5987 (FLINK-11436).
>>>>> That commit had to touch the `readObject` method of the AvroSerializer,
>>>>> which introduced some type checks / casts.
>>>>> This may have caused a regression in deserializing the AvroSerializer
>>>>> itself, which would have been accounted for in the job initialization
>>>>> phase of the serializerAvro benchmark.
>>>>> The commit should not have affected the per-record performance of the
>>>>> AvroSerializer.
>>>>> However, again, 479ebd5987 was committed at the end of January, whereas
>>>>> the drop in the serializerAvro benchmark results occurred around mid
>>>>> February.
>>>>>
>>>>> We haven't managed to identify any solid causes so far, only the
>>>>> speculations above.
>>>>>
>>>>> Cheers,
>>>>> Gordon
>>>>>
>>>>> On Tue, Mar 19, 2019 at 1:36 AM Stephan Ewen <se...@apache.org> wrote:
>>>>>
>>>>>> Piotr and I discovered a possible issue in the benchmarks.
>>>>>>
>>>>>> Looking at the time graphs, there seems to be one issue starting around
>>>>>> the end of January. It increased network throughput, but decreased
>>>>>> overall performance and added more variation in time (possibly through
>>>>>> GC). Check the trends in these graphs:
>>>>>>
>>>>>> Increased throughput:
>>>>>> http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=networkThroughput.1000,100ms&env=2&revs=200&equid=off&quarts=on&extr=on
>>>>>>
>>>>>> Higher variance in the count benchmark:
>>>>>> http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=benchmarkCount&env=2&revs=200&equid=off&quarts=on&extr=on
>>>>>>
>>>>>> Drop in the tuple-key-by performance trend:
>>>>>> http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=tupleKeyBy&env=2&revs=200&equid=off&quarts=on&extr=on
>>>>>>
>>>>>> In addition, the Avro and Row serializers seem to have had a performance
>>>>>> drop since mid February:
>>>>>>
>>>>>> http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=serializerAvro&env=2&revs=200&equid=off&quarts=on&extr=on
>>>>>>
>>>>>> http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=serializerRow&env=2&revs=200&equid=off&quarts=on&extr=on
>>>>>>
>>>>>> @Gordon, any idea what could be the cause of this?
>>>>>>
>>>>>> On Mon, Mar 18, 2019 at 3:08 PM Yu Li <car...@gmail.com> wrote:
>>>>>>
>>>>>>> I've been watching the benchmark data for days, and indeed it has
>>>>>>> normalized for the time being. However, the results seem to be
>>>>>>> unstable. I also ran the benchmark locally and observed obvious
>>>>>>> fluctuations even with the same commit...
>>>>>>>
>>>>>>> I guess we may need to improve it, e.g. by increasing
>>>>>>> RECORDS_PER_INVOCATION, to generate reproducible results. IMHO a stable
>>>>>>> micro benchmark is important to verify perf-related improvements (and
>>>>>>> I think the benchmark and website are already great; they just need
>>>>>>> some love). Let me add this to my backlog; I will open a JIRA when
>>>>>>> prepared.
>>>>>>>
>>>>>>> Anyway, good to know it's not a regression, and thanks for the effort
>>>>>>> spent on checking it! @Gordon @Chesnay
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Yu
>>>>>>>
>>>>>>> On Fri, 15 Mar 2019 at 19:20, Chesnay Schepler <ches...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> The regression is already normalizing again. I'd observe it further
>>>>>>>> before doing anything.
>>>>>>>>
>>>>>>>> The same applies to benchmarkCount, which tanked even more in that
>>>>>>>> same run.
>>>>>>>>
>>>>>>>> On 15.03.2019 06:02, Tzu-Li (Gordon) Tai wrote:
>>>>>>>>> @Yu
>>>>>>>>> Thanks for reporting that, Yu. Great that this was noticed.
>>>>>>>>>
>>>>>>>>> The serializerAvro case seems to test only on-wire serialization.
>>>>>>>>> I checked the changes to the `AvroSerializer`, and it seems like
>>>>>>>>> FLINK-11436 [1] with commit 479ebd59 was the only change that may
>>>>>>>>> have affected that.
>>>>>>>>> That commit wasn't introduced exactly when the indicated performance
>>>>>>>>> regression occurred, but it was still before the regression.
>>>>>>>>> The commit introduced some instanceof type checks / type casting in
>>>>>>>>> the readObject of the AvroSerializer, which may have caused this.
>>>>>>>>>
>>>>>>>>> Currently investigating further.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Gordon
>>>>>>>>>
>>>>>>>>> On Fri, Mar 15, 2019 at 11:45 AM Yu Li <car...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Aljoscha and all,
>>>>>>>>>>
>>>>>>>>>> On our performance benchmark website
>>>>>>>>>> (http://codespeed.dak8s.net:8000/changes/) I observed a noticeable
>>>>>>>>>> regression (-6.92%) in the serializerAvro case when comparing the
>>>>>>>>>> latest 100 revisions, which may need some attention. Thanks.
>>>>>>>>>>
>>>>>>>>>> Best Regards,
>>>>>>>>>> Yu
>>>>>>>>>>
>>>>>>>>>> On Thu, 14 Mar 2019 at 20:42, Aljoscha Krettek <aljos...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi everyone,
>>>>>>>>>>> Please review and vote on release candidate 2 for Flink 1.8.0, as
>>>>>>>>>>> follows:
>>>>>>>>>>> [ ] +1, Approve the release
>>>>>>>>>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>>>>>>>>>
>>>>>>>>>>> The complete staging area is available for your review, which
>>>>>>>>>>> includes:
>>>>>>>>>>> * JIRA release notes [1],
>>>>>>>>>>> * the official Apache source release and binary convenience releases
>>>>>>>>>>>   to be deployed to dist.apache.org [2], which are signed with the
>>>>>>>>>>>   key with fingerprint F2A67A8047499BBB3908D17AA8F4FD97121D7293 [3],
>>>>>>>>>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>>>>>>>>>> * source code tag "release-1.8.0-rc2" [5],
>>>>>>>>>>> * website pull request listing the new release [6],
>>>>>>>>>>> * website pull request adding announcement blog post [7].
>>>>>>>>>>>
>>>>>>>>>>> The vote will be open for at least 72 hours.
>>>>>>>>>>> It is adopted by majority approval, with at least 3 PMC affirmative
>>>>>>>>>>> votes.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Aljoscha
>>>>>>>>>>>
>>>>>>>>>>> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344274
>>>>>>>>>>> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.8.0-rc2/
>>>>>>>>>>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>>>>>>>>>>> [4] https://repository.apache.org/content/repositories/orgapacheflink-1213
>>>>>>>>>>> <https://repository.apache.org/content/repositories/orgapacheflink-1210/>
>>>>>>>>>>> [5] https://gitbox.apache.org/repos/asf?p=flink.git;a=tag;h=c77a329b71e3068bfde965ae91921ad5c47246dd
>>>>>>>>>>> <https://gitbox.apache.org/repos/asf?p=flink.git;a=tag;h=2d00b1c26d7b4554707063ab0d1d6cc236cfe8a5>
>>>>>>>>>>> [6] https://github.com/apache/flink-web/pull/180
>>>>>>>>>>> [7] https://github.com/apache/flink-web/pull/179
>>>>>>>>>>>
>>>>>>>>>>> P.S. The difference from RC1 is very small. You can fetch the two
>>>>>>>>>>> tags and run "git log release-1.8.0-rc1..release-1.8.0-rc2" to see
>>>>>>>>>>> the difference in commits. It contains fixes for the issues that led
>>>>>>>>>>> to the cancellation of the previous RC, plus smaller fixes.
>>>>>>>>>>> Most verification/testing that was carried out should apply as is
>>>>>>>>>>> to this RC.