+1 non-binding

I’ve re-done the manual EMR tests, including running some examples.

Piotrek

> On 10 Feb 2020, at 11:37, Yang Wang <danrtsey...@gmail.com> wrote:
> 
> +1 non-binding
> 
> 
> - Building from source with all tests skipped
> - Build a custom image with 1.10-rc3
> - K8s tests
>    * Deploy a standalone session cluster on K8s and submit multiple jobs
>    * Deploy a standalone per-job cluster
>    * Deploy a native session cluster on K8s with/without HA configured;
>      after killing a TM, the jobs recovered successfully
> 
> 
> Best,
> Yang
> 
> On Mon, Feb 10, 2020 at 4:29 PM, Jingsong Li <jingsongl...@gmail.com> wrote:
> 
>> Hi,
>> 
>> 
>> +1 (non-binding) Thanks for driving this, Gary & Yu.
>> 
>> 
>> There is an unfriendly error here: "OutOfMemoryError: Direct buffer memory"
>> in FileChannelBoundedData$FileBufferReader.
>> 
>> It forces our batch users to configure
>> "taskmanager.memory.task.off-heap.size" in production jobs, and it is hard
>> for users to know how much memory they need to configure.
>> 
>> Even for us developers it is hard to say how much memory is needed, since it
>> depends on the tasks left over from the previous stage and on the parallelism.
>> 
>> 
>> It is not a blocker, but I hope we can resolve it in 1.11.
>> 
>> 
>> - Verified signatures and checksums
>> 
>> - Maven build from source, skipping tests
>> 
>> - Verified pom files point to the 1.10.0 version
>> 
>> - Tested Hive integration and SQL client: both work well
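
For readers who want to repeat the checksum part of the checklist above, here is a minimal sketch. It assumes each artifact ships with a `.sha512` file whose first whitespace-separated token is the hex digest; the function names are hypothetical and not part of any Flink tooling. Signature verification (for example with `gpg --verify` against the KEYS file) is a separate step and is not covered here.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact: Path, checksum_file: Path) -> bool:
    """Compare the artifact's digest with the first token of its checksum file.

    Assumption: the checksum file looks like "<hex digest>  <file name>".
    """
    expected = checksum_file.read_text().split()[0].lower()
    return sha512_of(artifact) == expected
```

Calling `checksum_matches(Path("artifact.tgz"), Path("artifact.tgz.sha512"))` returns `True` only when the digests agree; the file names here are placeholders.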
>> 
>> 
>> Best,
>> 
>> Jingsong Lee
>> 
>> On Mon, Feb 10, 2020 at 12:28 PM Zhu Zhu <reed...@gmail.com> wrote:
>> 
>>> My bad. The missing commit info is caused by building from the source code
>>> zip, which does not contain the git info.
>>> So this is not a problem.
>>> 
>>> +1 (binding) for rc3
>>> Here's what was verified:
>>> * built successfully from the source code
>>> * ran a sample streaming job and a batch job with parallelism=1000 on a YARN
>>> cluster, with both the new scheduler and the legacy scheduler; the jobs ran
>>> well (after tuning some resource configs)
>>> * killed TMs to trigger failures; the jobs eventually recovered from the
>>> failures
>>> 
>>> Thanks,
>>> Zhu Zhu
>>> 
>>> On Mon, Feb 10, 2020 at 12:31 AM, Zhu Zhu <reed...@gmail.com> wrote:
>>> 
>>>> The commit info is shown as <unknown> on the web UI and in logs.
>>>> Not sure if it's a common issue or just happens to my build only.
>>>> 
>>>> Thanks,
>>>> Zhu Zhu
>>>> 
>>>> On Sun, Feb 9, 2020 at 7:42 PM, aihua li <liaihua1...@gmail.com> wrote:
>>>> 
>>>>> Yes, but the results you see in the Performance Code Speed Center [3]
>>>>> skip FLIP-49.
>>>>> The results of the default configurations are overwritten by the latest
>>>>> results.
>>>>> 
>>>>>> On Feb 9, 2020, at 5:29 PM, Yu Li <car...@gmail.com> wrote:
>>>>>> 
>>>>>> Thanks for the efforts Aihua! These could definitely improve our RC
>>>>>> test coverage!
>>>>>> 
>>>>>> Just to confirm: the stability tests were executed with the same test
>>>>>> suite as used for Alibaba production usage, and the e2e performance
>>>>>> tests were executed with the test suite proposed in FLIP-83 [1] and
>>>>>> FLINK-14917 [2], and the results can also be observed in our
>>>>>> performance code-speed center [3], right?
>>>>>> 
>>>>>> Thanks.
>>>>>> 
>>>>>> Best Regards,
>>>>>> Yu
>>>>>> 
>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-83%3A+Flink+End-to-end+Performance+Testing+Framework
>>>>>> [2] https://issues.apache.org/jira/browse/FLINK-14917
>>>>>> [3] https://s.apache.org/nglhm
>>>>>> 
>>>>>> On Sun, 9 Feb 2020 at 11:20, aihua li <liaihua1...@gmail.com> wrote:
>>>>>> +1 (non-binding)
>>>>>> 
>>>>>> I ran stability tests and end-to-end performance tests on branch
>>>>>> release-1.10.0-rc3; both passed.
>>>>>> 
>>>>>> Stability test: it mainly checks that a Flink job can recover from
>>>>>> various abnormal situations, including a full disk, network
>>>>>> interruption, ZooKeeper being unreachable, RPC message timeouts, etc.
>>>>>> If a job cannot be recovered, the test fails.
>>>>>> The test passed after running for 5 hours.
>>>>>> 
>>>>>> End-to-end performance test: it contains 32 test scenarios, which were
>>>>>> designed in FLIP-83.
>>>>>> Test results: performance regresses by about 3% from 1.9.1 with the
>>>>>> default parameters;
>>>>>> The result:
>>>>>> 
>>>>>> if FLIP-49 is skipped (by adding the parameters
>>>>>> taskmanager.memory.managed.fraction: 0 and
>>>>>> taskmanager.memory.flink.size: 1568m to flink-conf.yaml),
>>>>>> performance improves by about 5% over 1.9.1. The result:
>>>>>> 
>>>>>> 
>>>>>> I confirmed with @Xintong Song <https://cwiki.apache.org/confluence/display/~xintongsong>
>>>>>> that the result makes sense.
>>>>>> 
>>>>>>> On Feb 8, 2020, at 5:54 AM, Gary Yao <g...@apache.org> wrote:
>>>>>>> 
>>>>>>> Hi everyone,
>>>>>>> Please review and vote on the release candidate #3 for the version
>>>>>>> 1.10.0, as follows:
>>>>>>> [ ] +1, Approve the release
>>>>>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>>>>> 
>>>>>>> 
>>>>>>> The complete staging area is available for your review, which includes:
>>>>>>> * JIRA release notes [1],
>>>>>>> * the official Apache source release and binary convenience releases
>>>>>>> to be deployed to dist.apache.org [2], which are signed with the key
>>>>>>> with fingerprint BB137807CEFBE7DD2616556710B12A1F89C115E8 [3],
>>>>>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>>>>>> * source code tag "release-1.10.0-rc3" [5],
>>>>>>> * website pull request listing the new release and adding announcement
>>>>>>> blog post [6][7].
>>>>>>> 
>>>>>>> The vote will be open for at least 72 hours. It is adopted by majority
>>>>>>> approval, with at least 3 PMC affirmative votes.
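
The adoption rule quoted above can be sketched as a small tally function. This is only an illustration under stated assumptions: it treats "majority approval" as more binding +1 votes than binding -1 votes, counts only PMC (binding) votes toward the outcome, and uses a hypothetical `Vote` type that is not part of any Apache tooling.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int     # +1 to approve, -1 to reject
    binding: bool  # True when cast by a PMC member


def release_vote_passes(votes: Iterable[Vote], min_binding_approvals: int = 3) -> bool:
    """Apply the quoted rule: majority approval with at least 3 PMC +1 votes.

    Assumption: non-binding votes are advisory and do not affect the result.
    """
    binding = [v for v in votes if v.binding]
    approvals = sum(1 for v in binding if v.value > 0)
    rejections = sum(1 for v in binding if v.value < 0)
    return approvals >= min_binding_approvals and approvals > rejections
```

Under this reading, two binding +1 votes plus any number of non-binding +1 votes would not yet be enough to adopt the release.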
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Yu & Gary
>>>>>>> 
>>>>>>> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12345845
>>>>>>> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.10.0-rc3/
>>>>>>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>>>>>>> [4] https://repository.apache.org/content/repositories/orgapacheflink-1333
>>>>>>> [5] https://github.com/apache/flink/releases/tag/release-1.10.0-rc3
>>>>>>> [6] https://github.com/apache/flink-web/pull/302
>>>>>>> [7] https://github.com/apache/flink-web/pull/301
>>>>>> 
>>>>> 
>>>>> 
>>> 
>> 
>> 
>> --
>> Best, Jingsong Lee
>> 
