@Gordon The tupleKeyBy and benchmarkCount regressions are most likely not caused by serializers; more probably they come from a network stack change. I would look at the AvroSerializer issue independently of those benchmarks.
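
Since the thread also mentions results that fluctuate on the same commit, here is a minimal sketch of one way to tell a real drop from run-to-run noise: compare the candidate mean against the baseline mean minus a couple of baseline standard deviations. The function names, the two-sigma threshold, and all sample scores are hypothetical and chosen for illustration only; none of them come from codespeed or the Flink benchmarks.

```python
from statistics import mean, stdev


def relative_change(baseline, candidate):
    """Relative change of the candidate mean versus the baseline mean."""
    return (mean(candidate) - mean(baseline)) / mean(baseline)


def is_significant_drop(baseline, candidate, sigmas=2.0):
    """True if the candidate mean lies more than `sigmas` baseline
    standard deviations below the baseline mean (assumed threshold)."""
    return mean(candidate) < mean(baseline) - sigmas * stdev(baseline)


# Hypothetical throughput scores (higher is better), not real benchmark data.
baseline = [1000.0, 1010.0, 990.0, 1005.0, 995.0]
noisy = [980.0, 1015.0, 970.0, 1000.0, 990.0]      # lower mean, but within noise
regressed = [930.0, 935.0, 925.0, 940.0, 920.0]    # clearly below the baseline

for name, runs in (("noisy", noisy), ("regressed", regressed)):
    change = relative_change(baseline, runs)
    print(f"{name}: {change:+.2%}, significant drop: {is_significant_drop(baseline, runs)}")
```

A check like this only works with several runs per commit; with a single run per revision, a drop of a few percent cannot be separated from the variance.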
On Tue, Mar 19, 2019 at 8:23 AM Piotr Nowojski <piotr.nowoj...@gmail.com> wrote:

> Hi all,
>
> Regarding the regression from mid February, it looks like it happened in
> the commit range 3d39cb0..a9eb6d7
>
> I'm investigating the regression from January 29th. It happened in the
> commit range 35fa2b7..81acd0a (I think I managed to reproduce the results
> locally for it)
>
> Piotrek
>
> On Tue, Mar 19, 2019 at 07:20, jincheng sun <sunjincheng...@gmail.com> wrote:
>
>> Hi Aljoscha,
>>
>> I have merged the following issues found in RC1 and RC2 into the
>> release-1.8 branch.
>>
>> - Add `frocksdbjni` dependency in NOTICE - FLINK-11950
>> - Improve end-to-end test - FLINK-11892
>> - Deprecated Window API - FLINK-11918
>>
>> Currently, I am performing functional testing of YARN cluster mode and
>> multiple operating systems. I think these test results will be valid for
>> the next RC as well.
>>
>> Best,
>> Jincheng
>>
>> Shaoxuan Wang <wshaox...@gmail.com> wrote on Tue, Mar 19, 2019 at 11:45 AM:
>>
>>> I tested RC2 with the following items:
>>> - Maven Central Repository contains all artifacts
>>> - Built the source with Maven (ensured all source files have Apache
>>> headers)
>>> - Checked checksums and GPG files (for instance, flink-core-1.8.0.jar)
>>> that match the corresponding release files
>>> - Verified that the source archives do not contain any binaries
>>> - Manually executed the tests in IDE
>>>
>>> @Aljoscha, per the discussion in RC1, we should consider sending the
>>> release vote to the user group to gather more feedback.
>>> @Gordon and @Yu, I noticed some perf regressions that occurred on
>>> Jan. 29 (and have persisted since) for the tests
>>> of stateBackends.FS and stateBackends.ROCKS_INC.
>>>
>>> http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=stateBackends.FS&env=2&revs=200&equid=off&quarts=on&extr=on
>>>
>>> http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=tumblingWindow&env=2&revs=200&equid=off&quarts=on&extr=on
>>> @Chesnay, how did you notice and capture the license Notice issue? It
>>> seems very difficult to track. I am trying to understand how we organize
>>> the license Notice. For this case, why do we only need to add the
>>> dependency of 5.17.2-artisans-1.0 to the Notice file of flink-dist? It
>>> seems there are other modules that bundle the dependency of the
>>> flink-statebackend.
>>>
>>> Regards,
>>> Shaoxuan
>>>
>>>
>>>
>>> On Tue, Mar 19, 2019 at 10:49 AM Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
>>>
>>> > Hi,
>>> >
>>> > The regressions in the benchmark were also brought up earlier in this
>>> > thread by Yu.
>>> > From the previous investigations, these are the commits that touched
>>> > relevant serializers (TupleSerializer, AvroSerializer, RowSerializer)
>>> > around Jan / Feb:
>>> >
>>> > TupleSerializer -
>>> > 73e4d0ecfd (Thu Feb 14 11:56:51 2019 +0800) [FLINK-10493] Migrate all
>>> > subclasses of TupleSerializerBase to use new serialization compatibility
>>> > abstractions
>>> >
>>> > AvroSerializer -
>>> > 09bb7bbc0f (Wed Feb 20 09:52:57 2019 +0100) [FLINK-9803] Drop canEqual()
>>> > from TypeSerializer
>>> > 479ebd5987 (Tue Jan 29 15:06:09 2019 +0800) [FLINK-11436] [avro] Manually
>>> > Java-deserialize AvroSerializer for backwards compatibility
>>> >
>>> > RowSerializer -
>>> > 09bb7bbc0f (Wed Feb 20 09:52:57 2019 +0100) [FLINK-9803] Drop canEqual()
>>> > from TypeSerializer
>>> > b434b32c08 (Wed Jan 30 22:53:27 2019 +0800) [FLINK-11329] [table] Migrating
>>> > the RowSerializer to use new compatibility API
>>> >
>>> > The odd thing is, the times of these commits don't really match the drops
>>> > in their respective benchmark result timeline.
>>> > For the tupleKeyBy benchmark, the drop started around the end of January,
>>> > whereas the TupleSerializer was only last touched mid February.
>>> > For the serializerRow and serializerAvro benchmarks, the drop occurred
>>> > around mid February, whereas the only commit around that time was
>>> > 09bb7bbc0f ([FLINK-9803] Drop canEqual() from TypeSerializer).
>>> >
>>> > The only possible explanation that I can provide for the AvroSerializer
>>> > benchmark drop for now is 479ebd5987 (FLINK-11436).
>>> > That commit had to touch the `readObject` method of the AvroSerializer,
>>> > which introduced some type checks / casts.
>>> > This may have caused a regression in deserializing the AvroSerializer
>>> > itself, which would have been accounted for in the job initialization
>>> > phase of the serializerAvro benchmark.
>>> > The commit should not have affected per-record performance of the
>>> > AvroSerializer.
>>> > However, again, the commit time for 479ebd5987 was the end of January,
>>> > whereas the benchmark result drop occurred around mid February for the
>>> > serializerAvro benchmark.
>>> >
>>> > We haven't managed to identify any solid causes so far, only the above
>>> > speculations.
>>> >
>>> > Cheers,
>>> > Gordon
>>> >
>>> >
>>> > On Tue, Mar 19, 2019 at 1:36 AM Stephan Ewen <se...@apache.org> wrote:
>>> >
>>> > > Piotr and I discovered a possible issue in the benchmarks.
>>> > >
>>> > > Looking at the time graphs, there seems to be one issue coming around
>>> > > the end of January. It increased network throughput, but decreased
>>> > > overall performance and added more variation in time (possibly through GC).
>>> > > Check the trend in these graphs:
>>> > >
>>> > > Increased Throughput:
>>> > > http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=networkThroughput.1000,100ms&env=2&revs=200&equid=off&quarts=on&extr=on
>>> > > Higher variance in count benchmark:
>>> > > http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=benchmarkCount&env=2&revs=200&equid=off&quarts=on&extr=on
>>> > > Drop in tuple-key-by performance trend:
>>> > > http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=tupleKeyBy&env=2&revs=200&equid=off&quarts=on&extr=on
>>> > >
>>> > > In addition, the Avro and Row serializers seem to have a performance
>>> > > drop since mid February:
>>> > > http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=serializerAvro&env=2&revs=200&equid=off&quarts=on&extr=on
>>> > > http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=serializerRow&env=2&revs=200&equid=off&quarts=on&extr=on
>>> > >
>>> > > @Gordon any idea what could be the cause of this?
>>> > >
>>> > >
>>> > > On Mon, Mar 18, 2019 at 3:08 PM Yu Li <car...@gmail.com> wrote:
>>> > >
>>> > > > I have been watching the benchmark data for days, and indeed it has
>>> > > > normalized for the time being. However, the result seems to be
>>> > > > unstable. I also tried the benchmark locally and observed obvious
>>> > > > fluctuation even with the same commit...
>>> > > >
>>> > > > I guess we may need to improve it, for example by increasing
>>> > > > RECORDS_PER_INVOCATION to generate a reproducible result. IMHO a stable
>>> > > > micro benchmark is important to verify perf-related improvements (and I
>>> > > > think the benchmark and website are already great ones but just need
>>> > > > some love). Let me put this on my backlog; I will open a JIRA when
>>> > > > prepared.
>>> > > >
>>> > > > Anyway, good to know it's not a regression, and thanks for the efforts
>>> > > > spent on checking it over! @Gordon @Chesnay
>>> > > >
>>> > > > Best Regards,
>>> > > > Yu
>>> > > >
>>> > > >
>>> > > > On Fri, 15 Mar 2019 at 19:20, Chesnay Schepler <ches...@apache.org> wrote:
>>> > > >
>>> > > > > The regression is already normalizing again. I'd observe it further
>>> > > > > before doing anything.
>>> > > > >
>>> > > > > The same applies to the benchmarkCount which tanked even more in that
>>> > > > > same run.
>>> > > > >
>>> > > > > On 15.03.2019 06:02, Tzu-Li (Gordon) Tai wrote:
>>> > > > > > @Yu
>>> > > > > > Thanks for reporting that Yu, great that this was noticed.
>>> > > > > >
>>> > > > > > The serializerAvro case seems to only be testing on-wire
>>> > > > > > serialization.
>>> > > > > > I checked the changes to the `AvroSerializer`, and it seems like
>>> > > > > > FLINK-11436 [1] with commit 479ebd59 was the only change that may
>>> > > > > > have affected that.
>>> > > > > > That commit wasn't introduced exactly around the time when the
>>> > > > > > indicated performance regression occurred, but was still before the
>>> > > > > > regression.
>>> > > > > > The commit introduced some instanceof type checks / type casting in
>>> > > > > > the readObject of the AvroSerializer, which may have caused this.
>>> > > > > >
>>> > > > > > Currently investigating further.
>>> > > > > >
>>> > > > > > Cheers,
>>> > > > > > Gordon
>>> > > > > >
>>> > > > > > On Fri, Mar 15, 2019 at 11:45 AM Yu Li <car...@gmail.com> wrote:
>>> > > > > >
>>> > > > > >> Hi Aljoscha and all,
>>> > > > > >>
>>> > > > > >> From our performance benchmark web site (
>>> > > > > >> http://codespeed.dak8s.net:8000/changes/) I observed a noticeable
>>> > > > > >> regression (-6.92%) on the serializerAvro case comparing the
>>> > > > > >> latest 100 revisions, which may need some attention.
>>> > > > > >> Thanks.
>>> > > > > >>
>>> > > > > >> Best Regards,
>>> > > > > >> Yu
>>> > > > > >>
>>> > > > > >>
>>> > > > > >> On Thu, 14 Mar 2019 at 20:42, Aljoscha Krettek <aljos...@apache.org> wrote:
>>> > > > > >>
>>> > > > > >>> Hi everyone,
>>> > > > > >>> Please review and vote on the release candidate 2 for Flink 1.8.0,
>>> > > > > >>> as follows:
>>> > > > > >>> [ ] +1, Approve the release
>>> > > > > >>> [ ] -1, Do not approve the release (please provide specific comments)
>>> > > > > >>>
>>> > > > > >>>
>>> > > > > >>> The complete staging area is available for your review, which includes:
>>> > > > > >>> * JIRA release notes [1],
>>> > > > > >>> * the official Apache source release and binary convenience releases
>>> > > > > >>> to be deployed to dist.apache.org [2], which are signed with the key
>>> > > > > >>> with fingerprint F2A67A8047499BBB3908D17AA8F4FD97121D7293 [3],
>>> > > > > >>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> > > > > >>> * source code tag "release-1.8.0-rc2" [5],
>>> > > > > >>> * website pull request listing the new release [6]
>>> > > > > >>> * website pull request adding announcement blog post [7].
>>> > > > > >>>
>>> > > > > >>> The vote will be open for at least 72 hours. It is adopted by
>>> > > > > >>> majority approval, with at least 3 PMC affirmative votes.
>>> > > > > >>>
>>> > > > > >>> Thanks,
>>> > > > > >>> Aljoscha
>>> > > > > >>>
>>> > > > > >>> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344274
>>> > > > > >>> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.8.0-rc2/
>>> > > > > >>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>>> > > > > >>> [4] https://repository.apache.org/content/repositories/orgapacheflink-1213
>>> > > > > >>> <https://repository.apache.org/content/repositories/orgapacheflink-1210/>
>>> > > > > >>> [5] https://gitbox.apache.org/repos/asf?p=flink.git;a=tag;h=c77a329b71e3068bfde965ae91921ad5c47246dd
>>> > > > > >>> <https://gitbox.apache.org/repos/asf?p=flink.git;a=tag;h=2d00b1c26d7b4554707063ab0d1d6cc236cfe8a5>
>>> > > > > >>> [6] https://github.com/apache/flink-web/pull/180
>>> > > > > >>> [7] https://github.com/apache/flink-web/pull/179
>>> > > > > >>>
>>> > > > > >>> P.S. The difference to the previous RC1 is very small, you can fetch
>>> > > > > >>> the two tags and do a "git log release-1.8.0-rc1..release-1.8.0-rc2"
>>> > > > > >>> to see the difference in commits.
>>> > > > > >>> It contains fixes for the issues that led to the
>>> > > > > >>> cancellation of the previous RC plus smaller fixes. Most
>>> > > > > >>> verification/testing that was carried out should apply as is to
>>> > > > > >>> this RC.