Hi, The regressions in the benchmark were also brought up earlier in this thread by Yu. >From the previous investigations, these are the commits that touched relevant serializers (TupleSerializer, AvroSerializer, RowSerializer) around Jan / Feb:
TupleSerializer - 73e4d0ecfd (Thu Feb 14 11:56:51 2019 +0800) [FLINK-10493] Migrate all subclasses of TupleSerializerBase to use new serialization compatibility abstractions AvroSerializer - 09bb7bbc0f (Wed Feb 20 09:52:57 2019 +0100) [FLINK-9803] Drop canEqual() from TypeSerializer 479ebd5987 (Tue Jan 29 15:06:09 2019 +0800) [FLINK-11436] [avro] Manually Java-deserialize AvroSerializer for backwards compatibility RowSerializer - 09bb7bbc0f (Wed Feb 20 09:52:57 2019 +0100) [FLINK-9803] Drop canEqual() from TypeSerializer b434b32c08 (Wed Jan 30 22:53:27 2019 +0800) [FLINK-11329] [table] Migrating the RowSerializer to use new compatibility API The odd thing is, the times of these commits don't really match the drops in their respective benchmark result timeline. For tupleKeyBy benchmark, the drop started around end of January, where as the TupleSerializer was only last touched mid February. For the serializerRow and serializerAvro benchmarks, the drop occurred around mid February, where as the only commit around that time was 09bb7bbc0f ([FLINK-9803] Drop canEqual() from TypeSerializer). The only possible explanation that I can provide for the AvroSerializer benchmark drop for now, is due to 479ebd5987 (FLINK-11436). That commit had to touch the `readObject` method of the AvroSerializer, which introduced some type checks / casts. This may have caused regression in deserializing the AvroSerializer itself, which would have been accounted for in the job initialization phase of the serializerAvro benchmark. The commit should not have affected per-record performance of the AvroSerializer. However, again, the commit time for 479ebd5987 was end of January, where as the benchmark result drop occurred around mid February for the serializerAvro benchmark. We haven't managed to identify any solid causes so far, only the above speculations. Cheers, Gordon On Tue, Mar 19, 2019 at 1:36 AM Stephan Ewen <se...@apache.org> wrote: > Piotr and me discovered a possible issue in the benchmarks. > > Looking at the time graphs, there seems to be one issue coming around end > of January. It increased network throughput, but decreased overall > performance and added more variation in time (possibly through GC). Check > the trend in these graphs: > > Increased Throughput: > > http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=networkThroughput.1000,100ms&env=2&revs=200&equid=off&quarts=on&extr=on > Higher variance in count benchmark: > > http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=benchmarkCount&env=2&revs=200&equid=off&quarts=on&extr=on > Drop in tuple-key-by performance trend: > > http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=tupleKeyBy&env=2&revs=200&equid=off&quarts=on&extr=on > > In addition, the Avro and Row serializers seem to have a performance drop > since mid February: > > http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=serializerAvro&env=2&revs=200&equid=off&quarts=on&extr=on > > http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=serializerRow&env=2&revs=200&equid=off&quarts=on&extr=on > > @Gordon any idea what could be the cause of this? > > > On Mon, Mar 18, 2019 at 3:08 PM Yu Li <car...@gmail.com> wrote: > > > Watching the benchmark data for days and indeed it's normalized for the > > time being. However, the result seems to be unstable. I also tried the > > benchmark locally and observed obvious wave even with the same commit... > > > > I guess we may need to improve it such as increasing the > > RECORDS_PER_INVOCATION to generate a reproducible result. IMHO a stable > > micro benchmark is important to verify perf-related improvements (and I > > think the benchmark and website are already great ones but just need some > > love). Let me mark this as one of my backlog and will open a JIRA when > > prepared. > > > > Anyway good to know it's not a regression, and thanks for the efforts > spent > > on checking it over! @Gordon @Chesnay > > > > Best Regards, > > Yu > > > > > > On Fri, 15 Mar 2019 at 19:20, Chesnay Schepler <ches...@apache.org> > wrote: > > > > > The regressions is already normalizing again. I'd observer it further > > > before doing anything. > > > > > > The same applies to the benchmarkCount which tanked even more in that > > > same run. > > > > > > On 15.03.2019 06:02, Tzu-Li (Gordon) Tai wrote: > > > > @Yu > > > > Thanks for reporting that Yu, great that this was noticed. > > > > > > > > The serializerAvro case seems to only be testing on-wire > serialization. > > > > I checked the changes to the `AvroSerializer`, and it seems like > > > > FLINK-11436 [1] with commit 479ebd59 was the only change that may > have > > > > affected that. > > > > That commit wasn't introduced exactly around the time when the > > indicated > > > > performance regression occurred, but was still before the regression. > > > > The commit introduced some instanceof type checks / type casting in > the > > > > readObject of the AvroSerializer, which may have caused this. > > > > > > > > Currently investigating further. > > > > > > > > Cheers, > > > > Gordon > > > > > > > > On Fri, Mar 15, 2019 at 11:45 AM Yu Li <car...@gmail.com> wrote: > > > > > > > >> Hi Aljoscha and all, > > > >> > > > >> From our performance benchmark web site ( > > > >> http://codespeed.dak8s.net:8000/changes/) I observed a noticeable > > > >> regression (-6.92%) on the serializerAvro case comparing the latest > > 100 > > > >> revisions, which may need some attention. Thanks. > > > >> > > > >> Best Regards, > > > >> Yu > > > >> > > > >> > > > >> On Thu, 14 Mar 2019 at 20:42, Aljoscha Krettek <aljos...@apache.org > > > > > >> wrote: > > > >> > > > >>> Hi everyone, > > > >>> Please review and vote on the release candidate 2 for Flink 1.8.0, > as > > > >>> follows: > > > >>> [ ] +1, Approve the release > > > >>> [ ] -1, Do not approve the release (please provide specific > comments) > > > >>> > > > >>> > > > >>> The complete staging area is available for your review, which > > includes: > > > >>> * JIRA release notes [1], > > > >>> * the official Apache source release and binary convenience > releases > > to > > > >> be > > > >>> deployed to dist.apache.org <http://dist.apache.org/> [2], which > are > > > >>> signed with the key with fingerprint > > > >>> F2A67A8047499BBB3908D17AA8F4FD97121D7293 [3], > > > >>> * all artifacts to be deployed to the Maven Central Repository [4], > > > >>> * source code tag "release-1.8.0-rc2" [5], > > > >>> * website pull request listing the new release [6] > > > >>> * website pull request adding announcement blog post [7]. > > > >>> > > > >>> The vote will be open for at least 72 hours. It is adopted by > > majority > > > >>> approval, with at least 3 PMC affirmative votes. > > > >>> > > > >>> Thanks, > > > >>> Aljoscha > > > >>> > > > >>> [1] > > > >>> > > > >> > > > > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344274 > > > >>> < > > > >>> > > > >> > > > > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344274 > > > >>> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.8.0-rc2/ > < > > > >>> https://dist.apache.org/repos/dist/dev/flink/flink-1.8.0-rc2/> > > > >>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS < > > > >>> https://dist.apache.org/repos/dist/release/flink/KEYS> > > > >>> [4] > > > >> > > https://repository.apache.org/content/repositories/orgapacheflink-1213 > > > >>> < > > > > https://repository.apache.org/content/repositories/orgapacheflink-1210/ > > > >>> > > > >>> [5] > > > >>> > > > >> > > > > > > https://gitbox.apache.org/repos/asf?p=flink.git;a=tag;h=c77a329b71e3068bfde965ae91921ad5c47246dd > > > >>> < > > > >>> > > > >> > > > > > > https://gitbox.apache.org/repos/asf?p=flink.git;a=tag;h=2d00b1c26d7b4554707063ab0d1d6cc236cfe8a5 > > > >>> [6] https://github.com/apache/flink-web/pull/180 < > > > >>> https://github.com/apache/flink-web/pull/180> > > > >>> [7] https://github.com/apache/flink-web/pull/179 < > > > >>> https://github.com/apache/flink-web/pull/179> > > > >>> > > > >>> P.S. The difference to the previous RC1 is very small, you can > fetch > > > the > > > >>> two tags and do a "git log release-1.8.0-rc1..release-1.8.0-rc2” to > > see > > > >> the > > > >>> difference in commits. Its fixes for the issues that led to the > > > >>> cancellation of the previous RC plus smaller fixes. Most > > > >>> verification/testing that was carried out should apply as is to > this > > > RC. > > > > > > > > > > > >