+1 non-binding

I’ve re-done the manual EMR tests, including running some examples.

Piotrek

> On 10 Feb 2020, at 11:37, Yang Wang <[email protected]> wrote:
>
> +1 non-binding
>
> - Built from source with all tests skipped
> - Built a custom image with 1.10-rc3
> - K8s tests
>   * Deployed a standalone session cluster on K8s and submitted multiple jobs
>   * Deployed a standalone per-job cluster
>   * Deployed a native session cluster on K8s with and without HA configured;
>     after killing a TM, the jobs recovered successfully
>
> Best,
> Yang
>
> Jingsong Li <[email protected]> wrote on Mon, Feb 10, 2020 at 4:29 PM:
>
>> Hi,
>>
>> +1 (non-binding). Thanks for driving this, Gary & Yu.
>>
>> There is an unfriendly error here: "OutOfMemoryError: Direct buffer memory"
>> in FileChannelBoundedData$FileBufferReader.
>>
>> It forces our batch users to configure
>> "taskmanager.memory.task.off-heap.size" in production jobs, and it is hard
>> for users to know how much memory they need to configure.
>>
>> Even for us developers it is hard to say how much memory is needed; it
>> depends on the tasks left over from the previous stage and on the parallelism.
>>
>> It is not a blocker, but I hope we can resolve it in 1.11.
>>
>> - Verified signatures and checksums
>> - Maven build from source, skipping tests
>> - Verified that the pom files point to the 1.10.0 version
>> - Tested Hive integration and SQL client: both work well
>>
>> Best,
>> Jingsong Lee
>>
>> On Mon, Feb 10, 2020 at 12:28 PM Zhu Zhu <[email protected]> wrote:
>>
>>> My bad. The missing commit info is caused by building from the source code zip,
>>> which does not contain the git info.
>>> So this is not a problem.
>>>
>>> +1 (binding) for rc3
>>> Here's what was verified:
>>> * built successfully from the source code
>>> * ran a sample streaming job and a batch job with parallelism=1000 on a YARN
>>> cluster, with both the new scheduler and the legacy scheduler; the jobs ran well
>>> (after tuning some resource configs to enable the jobs to work well)
>>> * killed TMs to trigger failures; the jobs eventually recovered from the
>>> failures
>>>
>>> Thanks,
>>> Zhu Zhu
>>>
>>> Zhu Zhu <[email protected]> wrote on Mon, Feb 10, 2020 at 12:31 AM:
>>>
>>>> The commit info is shown as <unknown> on the web UI and in logs.
>>>> Not sure if it's a common issue or just happens to my build only.
>>>>
>>>> Thanks,
>>>> Zhu Zhu
>>>>
>>>> aihua li <[email protected]> wrote on Sun, Feb 9, 2020 at 7:42 PM:
>>>>
>>>>> Yes, but the results you see in the Performance Code Speed Center [3]
>>>>> skip FLIP-49.
>>>>> The results of the default configurations are overwritten by the latest
>>>>> results.
>>>>>
>>>>>> On Feb 9, 2020, at 5:29 PM, Yu Li <[email protected]> wrote:
>>>>>>
>>>>>> Thanks for the efforts Aihua! These could definitely improve our RC
>>>>>> test coverage!
>>>>>>
>>>>>> Just to confirm: the stability tests were executed with the same
>>>>>> test suite used for Alibaba production, and the e2e performance tests were
>>>>>> executed with the test suite proposed in FLIP-83 [1] and FLINK-14917 [2],
>>>>>> and the results can also be observed in our performance code-speed
>>>>>> center [3], right?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Best Regards,
>>>>>> Yu
>>>>>>
>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-83%3A+Flink+End-to-end+Performance+Testing+Framework
>>>>>> [2] https://issues.apache.org/jira/browse/FLINK-14917
>>>>>> [3] https://s.apache.org/nglhm
>>>>>>
>>>>>> On Sun, 9 Feb 2020 at 11:20, aihua li <[email protected]> wrote:
>>>>>> +1 (non-binding)
>>>>>>
>>>>>> I ran stability tests and end-to-end performance tests on branch
>>>>>> release-1.10.0-rc3; both passed.
>>>>>>
>>>>>> Stability test: it mainly checks that the Flink job can recover from
>>>>>> various abnormal situations, including disk full,
>>>>>> network interruption, ZK unable to connect, RPC message timeout, etc.
>>>>>> If a job cannot be recovered, the test fails.
>>>>>> The test passed after running for 5 hours.
>>>>>>
>>>>>> End-to-end performance test: it contains 32 test scenarios, which were
>>>>>> designed in FLIP-83.
>>>>>> Test results: performance regresses by about 3% from 1.9.1 with the
>>>>>> default parameters.
>>>>>> The result:
>>>>>>
>>>>>> If FLIP-49 is skipped (by adding the parameters
>>>>>> taskmanager.memory.managed.fraction: 0 and
>>>>>> taskmanager.memory.flink.size: 1568m to flink-conf.yaml),
>>>>>> performance improves by about 5% over 1.9.1. The result:
>>>>>>
>>>>>>
>>>>>> I confirmed with @Xintong Song
>>>>>> <https://cwiki.apache.org/confluence/display/~xintongsong> that the
>>>>>> result makes sense.
>>>>>>
>>>>>>> On Feb 8, 2020, at 5:54 AM, Gary Yao <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi everyone,
>>>>>>> Please review and vote on the release candidate #3 for the version 1.10.0,
>>>>>>> as follows:
>>>>>>> [ ] +1, Approve the release
>>>>>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>>>>>
>>>>>>>
>>>>>>> The complete staging area is available for your review, which includes:
>>>>>>> * JIRA release notes [1],
>>>>>>> * the official Apache source release and binary convenience releases to be
>>>>>>> deployed to dist.apache.org [2], which are signed with the key with
>>>>>>> fingerprint BB137807CEFBE7DD2616556710B12A1F89C115E8 [3],
>>>>>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>>>>>> * source code tag "release-1.10.0-rc3" [5],
>>>>>>> * website pull request listing the new release and adding announcement blog
>>>>>>> post [6][7].
>>>>>>>
>>>>>>> The vote will be open for at least 72 hours. It is adopted by majority
>>>>>>> approval, with at least 3 PMC affirmative votes.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Yu & Gary
>>>>>>>
>>>>>>> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12345845
>>>>>>> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.10.0-rc3/
>>>>>>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>>>>>>> [4] https://repository.apache.org/content/repositories/orgapacheflink-1333
>>>>>>> [5] https://github.com/apache/flink/releases/tag/release-1.10.0-rc3
>>>>>>> [6] https://github.com/apache/flink-web/pull/302
>>>>>>> [7] https://github.com/apache/flink-web/pull/301
>>
>> --
>> Best, Jingsong Lee
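
For anyone repeating the checksum half of the "Verified signatures and checksums" step quoted above, here is a minimal Python sketch. It is not the procedure any voter in this thread actually used. It assumes each artifact in the staging area has a companion checksum file whose first whitespace-separated token is the hex SHA-512 digest (the layout produced by `shasum -a 512`). The function names and the file names in the usage note are hypothetical. Signature (GPG) verification is a separate step and is not covered here.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large release tarballs need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum_file(artifact, checksum_file):
    """Compare an artifact against its checksum file.

    Assumption: the checksum file starts with the hex digest, optionally
    followed by whitespace and a file name.
    """
    expected = Path(checksum_file).read_text().split()[0].lower()
    return sha512_of(artifact) == expected
```

Usage would look like `matches_checksum_file("some-artifact.tgz", "some-artifact.tgz.sha512")`, with both files downloaded from the staging area first; it returns True only when the digests agree.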
