Hi,

The regressions in the benchmark were also brought up earlier in this
thread by Yu.
>From the previous investigations, these are the commits that touched
relevant serializers (TupleSerializer, AvroSerializer, RowSerializer)
around Jan / Feb:

TupleSerializer -
73e4d0ecfd (Thu Feb 14 11:56:51 2019 +0800) [FLINK-10493] Migrate all
subclasses of TupleSerializerBase to use new serialization compatibility
abstractions

AvroSerializer -
09bb7bbc0f (Wed Feb 20 09:52:57 2019 +0100) [FLINK-9803] Drop canEqual()
from TypeSerializer
479ebd5987 (Tue Jan 29 15:06:09 2019 +0800) [FLINK-11436] [avro] Manually
Java-deserialize AvroSerializer for backwards compatibility

RowSerializer -
09bb7bbc0f (Wed Feb 20 09:52:57 2019 +0100) [FLINK-9803] Drop canEqual()
from TypeSerializer
b434b32c08 (Wed Jan 30 22:53:27 2019 +0800) [FLINK-11329] [table] Migrating
the RowSerializer to use new compatibility API

The odd thing is, the times of these commits don't really match the drops
in their respective benchmark result timeline.
For tupleKeyBy benchmark, the drop started around end of January, where as
the TupleSerializer was only last touched mid February.
For the serializerRow and serializerAvro benchmarks, the drop occurred
around mid February, where as the only commit around that time was
09bb7bbc0f ([FLINK-9803] Drop canEqual() from TypeSerializer).

The only possible explanation that I can provide for the AvroSerializer
benchmark drop for now, is due to 479ebd5987 (FLINK-11436).
That commit had to touch the `readObject` method of the AvroSerializer,
which introduced some type checks / casts.
This may have caused regression in deserializing the AvroSerializer itself,
which would have been accounted for in the job initialization phase of the
serializerAvro benchmark.
The commit should not have affected per-record performance of the
AvroSerializer.
However, again, the commit time for 479ebd5987 was end of January, where as
the benchmark result drop occurred around mid February for the
serializerAvro benchmark.

We haven't managed to identify any solid causes so far, only the above
speculations.

Cheers,
Gordon


On Tue, Mar 19, 2019 at 1:36 AM Stephan Ewen <se...@apache.org> wrote:

> Piotr and me discovered a possible issue in the benchmarks.
>
> Looking at the time graphs, there seems to be one issue coming around end
> of January. It increased network throughput, but decreased overall
> performance and added more variation in time (possibly through GC). Check
> the trend in these graphs:
>
> Increased Throughput:
>
> http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=networkThroughput.1000,100ms&env=2&revs=200&equid=off&quarts=on&extr=on
> Higher variance in count benchmark:
>
> http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=benchmarkCount&env=2&revs=200&equid=off&quarts=on&extr=on
> Drop in tuple-key-by performance trend:
>
> http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=tupleKeyBy&env=2&revs=200&equid=off&quarts=on&extr=on
>
> In addition, the Avro and Row serializers seem to have a performance drop
> since mid February:
>
> http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=serializerAvro&env=2&revs=200&equid=off&quarts=on&extr=on
>
> http://codespeed.dak8s.net:8000/timeline/#/?exe=1&ben=serializerRow&env=2&revs=200&equid=off&quarts=on&extr=on
>
> @Gordon any idea what could be the cause of this?
>
>
> On Mon, Mar 18, 2019 at 3:08 PM Yu Li <car...@gmail.com> wrote:
>
> > Watching the benchmark data for days and indeed it's normalized for the
> > time being. However, the result seems to be unstable. I also tried the
> > benchmark locally and observed obvious wave even with the same commit...
> >
> > I guess we may need to improve it such as increasing the
> > RECORDS_PER_INVOCATION to generate a reproducible result. IMHO a stable
> > micro benchmark is important to verify perf-related improvements (and I
> > think the benchmark and website are already great ones but just need some
> > love). Let me mark this as one of my backlog and will open a JIRA when
> > prepared.
> >
> > Anyway good to know it's not a regression, and thanks for the efforts
> spent
> > on checking it over! @Gordon @Chesnay
> >
> > Best Regards,
> > Yu
> >
> >
> > On Fri, 15 Mar 2019 at 19:20, Chesnay Schepler <ches...@apache.org>
> wrote:
> >
> > > The regressions is already normalizing again. I'd observer it further
> > > before doing anything.
> > >
> > > The same applies to the benchmarkCount which tanked even more in that
> > > same run.
> > >
> > > On 15.03.2019 06:02, Tzu-Li (Gordon) Tai wrote:
> > > > @Yu
> > > > Thanks for reporting that Yu, great that this was noticed.
> > > >
> > > > The serializerAvro case seems to only be testing on-wire
> serialization.
> > > > I checked the changes to the `AvroSerializer`, and it seems like
> > > > FLINK-11436 [1] with commit 479ebd59 was the only change that may
> have
> > > > affected that.
> > > > That commit wasn't introduced exactly around the time when the
> > indicated
> > > > performance regression occurred, but was still before the regression.
> > > > The commit introduced some instanceof type checks / type casting in
> the
> > > > readObject of the AvroSerializer, which may have caused this.
> > > >
> > > > Currently investigating further.
> > > >
> > > > Cheers,
> > > > Gordon
> > > >
> > > > On Fri, Mar 15, 2019 at 11:45 AM Yu Li <car...@gmail.com> wrote:
> > > >
> > > >> Hi Aljoscha and all,
> > > >>
> > > >>  From our performance benchmark web site (
> > > >> http://codespeed.dak8s.net:8000/changes/) I observed a noticeable
> > > >> regression (-6.92%) on the serializerAvro case comparing the latest
> > 100
> > > >> revisions, which may need some attention. Thanks.
> > > >>
> > > >> Best Regards,
> > > >> Yu
> > > >>
> > > >>
> > > >> On Thu, 14 Mar 2019 at 20:42, Aljoscha Krettek <aljos...@apache.org
> >
> > > >> wrote:
> > > >>
> > > >>> Hi everyone,
> > > >>> Please review and vote on the release candidate 2 for Flink 1.8.0,
> as
> > > >>> follows:
> > > >>> [ ] +1, Approve the release
> > > >>> [ ] -1, Do not approve the release (please provide specific
> comments)
> > > >>>
> > > >>>
> > > >>> The complete staging area is available for your review, which
> > includes:
> > > >>> * JIRA release notes [1],
> > > >>> * the official Apache source release and binary convenience
> releases
> > to
> > > >> be
> > > >>> deployed to dist.apache.org <http://dist.apache.org/> [2], which
> are
> > > >>> signed with the key with fingerprint
> > > >>> F2A67A8047499BBB3908D17AA8F4FD97121D7293 [3],
> > > >>> * all artifacts to be deployed to the Maven Central Repository [4],
> > > >>> * source code tag "release-1.8.0-rc2" [5],
> > > >>> * website pull request listing the new release [6]
> > > >>> * website pull request adding announcement blog post [7].
> > > >>>
> > > >>> The vote will be open for at least 72 hours. It is adopted by
> > majority
> > > >>> approval, with at least 3 PMC affirmative votes.
> > > >>>
> > > >>> Thanks,
> > > >>> Aljoscha
> > > >>>
> > > >>> [1]
> > > >>>
> > > >>
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344274
> > > >>> <
> > > >>>
> > > >>
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344274
> > > >>> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.8.0-rc2/
> <
> > > >>> https://dist.apache.org/repos/dist/dev/flink/flink-1.8.0-rc2/>
> > > >>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS <
> > > >>> https://dist.apache.org/repos/dist/release/flink/KEYS>
> > > >>> [4]
> > > >>
> > https://repository.apache.org/content/repositories/orgapacheflink-1213
> > > >>> <
> > >
> https://repository.apache.org/content/repositories/orgapacheflink-1210/
> > > >>>
> > > >>> [5]
> > > >>>
> > > >>
> > >
> >
> https://gitbox.apache.org/repos/asf?p=flink.git;a=tag;h=c77a329b71e3068bfde965ae91921ad5c47246dd
> > > >>> <
> > > >>>
> > > >>
> > >
> >
> https://gitbox.apache.org/repos/asf?p=flink.git;a=tag;h=2d00b1c26d7b4554707063ab0d1d6cc236cfe8a5
> > > >>> [6] https://github.com/apache/flink-web/pull/180 <
> > > >>> https://github.com/apache/flink-web/pull/180>
> > > >>> [7] https://github.com/apache/flink-web/pull/179 <
> > > >>> https://github.com/apache/flink-web/pull/179>
> > > >>>
> > > >>> P.S. The difference to the previous RC1 is very small, you can
> fetch
> > > the
> > > >>> two tags and do a "git log release-1.8.0-rc1..release-1.8.0-rc2” to
> > see
> > > >> the
> > > >>> difference in commits. Its fixes for the issues that led to the
> > > >>> cancellation of the previous RC plus smaller fixes. Most
> > > >>> verification/testing that was carried out should apply as is to
> this
> > > RC.
> > >
> > >
> > >
> >
>

Reply via email to