[https://issues.apache.org/jira/browse/FLINK-14968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984565#comment-16984565]
Aljoscha Krettek commented on FLINK-14968:
------------------------------------------
Something very strange is going on. When I try this on a (dockerized) YARN
cluster the job sometimes needs 3 slots to run and sometimes needs 4 slots. I
run this job:
{code}
bin/flink run -m yarn-cluster -p 3 -yjm 2000 -ytm 2000 \
  examples/streaming/WordCount.jar \
  --input hdfs:///wc-in-1 --input hdfs:///wc-in-2 \
  --output hdfs:///wc-out
{code}
The attached logs show the (DEBUG) jobmanager.log of two different runs.
> Kerberized YARN on Docker test (custom fs plugin) fails on Travis
> -----------------------------------------------------------------
>
> Key: FLINK-14968
> URL: https://issues.apache.org/jira/browse/FLINK-14968
> Project: Flink
> Issue Type: Bug
> Components: FileSystems, Tests
> Affects Versions: 1.10.0
> Reporter: Gary Yao
> Priority: Blocker
> Labels: test-stability
> Fix For: 1.10.0
>
> Attachments: run-with-3-slots.txt, run-with-4-slots.txt
>
>
> This change made the test flaky:
> https://github.com/apache/flink/commit/749965348170e4608ff2a23c9617f67b8c341df5.
> It changes the job to have two sources instead of one, which, under the
> tight resource setup of this test, requires too many slots to run, so the
> job fails.
> The setup of this test is very intricate: we configure YARN to have two
> NodeManagers with 2500mb of memory each:
> https://github.com/apache/flink/blob/413a77157caf25dbbfb8b0caaf2c9e12c7374d98/flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/config/yarn-site.xml#L39.
> We run the job with parallelism 3 and configure Flink to use 1000mb of
> TaskManager memory and 1000mb of JobManager memory. This means that the job
> fits into the YARN memory budget, but additional TaskManagers would not fit.
> We deliberately don't increase the YARN resources, because we want the
> Flink job to spread its TMs across different NMs: we once had a bug where
> Kerberos config file shipping was not working correctly, but the bug did
> not materialise if all TMs were on the same NM.
> https://api.travis-ci.org/v3/job/612782888/log.txt
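The memory budget described above can be sketched as a quick back-of-envelope check. This is a hedged illustration only: one slot per TaskManager and exact 1000mb containers are assumptions, and a real YARN cluster rounds container requests up to yarn.scheduler.minimum-allocation-mb and adds overhead.

```python
# Sketch of the test's YARN packing: 2 NodeManagers x 2500mb, one 1000mb
# JobManager container, and as many 1000mb TaskManager containers as fit.
NM_MEMORY_MB = 2500      # per NodeManager (from yarn-site.xml in the test)
NUM_NMS = 2
JM_CONTAINER_MB = 1000   # JobManager container size
TM_CONTAINER_MB = 1000   # TaskManager container size
SLOTS_PER_TM = 1         # assumption: default taskmanager.numberOfTaskSlots

def max_tms():
    """Place the JM, then greedily pack TMs across the NodeManagers."""
    free = [NM_MEMORY_MB] * NUM_NMS
    free[0] -= JM_CONTAINER_MB  # JobManager lands on one NM
    tms = 0
    while any(f >= TM_CONTAINER_MB for f in free):
        i = next(i for i, f in enumerate(free) if f >= TM_CONTAINER_MB)
        free[i] -= TM_CONTAINER_MB
        tms += 1
    return tms

tms = max_tms()
print(tms, tms * SLOTS_PER_TM)  # 3 TMs -> 3 slots
```

Under these assumptions only 3 TaskManager containers fit next to the JobManager, i.e. 3 slots: a run that happens to need 3 slots succeeds, while a run that needs 4 cannot get a fourth container and fails, which matches the two attached logs.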