[ https://issues.apache.org/jira/browse/BEAM-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919211#comment-16919211 ]

Hannah Jiang edited comment on BEAM-7993 at 8/30/19 5:44 AM:
-------------------------------------------------------------

I tried running it locally with py2 and py36.

I ran it more than 10 times, each time running py2 and py36 in parallel, and
none of the runs failed. I haven't tried running them one by one, because
nothing so far shows that a non-parallel run would solve the problem.
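One way to make this local comparison repeatable is a small driver that runs
the same Gradle command many times and counts the failing runs, once with
Gradle's `--parallel` switch and once without it (by default Gradle executes
one task at a time). This is a hedged sketch in Python, not the build.gradle
mentioned in this comment: `count_failures` is a hypothetical helper, and the
py2/py36 task paths are assumptions modeled on the py35 task named in the issue.

```python
import subprocess
import sys


def count_failures(command, iterations):
    """Run `command` `iterations` times and return how many runs exited non-zero."""
    failures = 0
    for i in range(iterations):
        result = subprocess.run(command)
        if result.returncode != 0:
            failures += 1
            print(f"run {i + 1}: exit code {result.returncode}", file=sys.stderr)
    return failures


# Hypothetical task paths, modeled on
# :sdks:python:test-suites:portable:py35:portableWordCountBatch.
TASKS = [
    ":sdks:python:test-suites:portable:py2:portableWordCountBatch",
    ":sdks:python:test-suites:portable:py36:portableWordCountBatch",
]

# --parallel lets Gradle run tasks from different projects concurrently;
# without it Gradle executes one task at a time.
PARALLEL_CMD = ["./gradlew", "--parallel"] + TASKS
SEQUENTIAL_CMD = ["./gradlew"] + TASKS
```

From the Beam repository root, comparing `count_failures(PARALLEL_CMD, 10)`
with `count_failures(SEQUENTIAL_CMD, 10)` gives a rough failure rate for each
mode on the same machine.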

I wrote a build.gradle to trigger the Python portable precommit job locally.
Jenkins uses a job to build the tasks, so my setup might not be exactly the
same.

[~markflyhigh], are you familiar with Gradle/Jenkins? I tried to remove the
parallel part from the Jenkins job, but all precommit jobs share the same job
builder, so the change would affect many jobs. Is it possible for you to remove
the parallel part for the Python Portable Precommit tasks only?


> portable python precommit is flaky
> ----------------------------------
>
>                 Key: BEAM-7993
>                 URL: https://issues.apache.org/jira/browse/BEAM-7993
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core, test-failures, testing
>    Affects Versions: 2.15.0
>            Reporter: Udi Meiri
>            Assignee: Mark Liu
>            Priority: Major
>              Labels: currently-failing
>             Fix For: 2.16.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> I'm not sure what the root cause is here.
> Example log where 
> :sdks:python:test-suites:portable:py35:portableWordCountBatch failed:
> {code}
> 11:51:22 [CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)
> 11:51:22 [CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)
> 11:51:22 [CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)
> 11:51:22 [CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)
> 11:51:22 java.lang.Exception: The user defined 'open()' method caused an exception: java.io.IOException: Received exit code 1 for command 'docker inspect -f {{.State.Running}} 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1'. stderr: Error: No such object: 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1
> 11:51:22      at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498)
> 11:51:22      at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
> 11:51:22      at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712)
> 11:51:22      at java.lang.Thread.run(Thread.java:748)
> 11:51:22 Caused by: org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: java.io.IOException: Received exit code 1 for command 'docker inspect -f {{.State.Running}} 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1'. stderr: Error: No such object: 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1
> 11:51:22      at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4966)
> 11:51:22      at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:211)
> 11:51:22      at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:202)
> 11:51:22      at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:185)
> 11:51:22      at org.apache.beam.runners.flink.translation.functions.FlinkDefaultExecutableStageContext.getStageBundleFactory(FlinkDefaultExecutableStageContext.java:49)
> 11:51:22      at org.apache.beam.runners.flink.translation.functions.ReferenceCountingFlinkExecutableStageContextFactory$WrappedContext.getStageBundleFactory(ReferenceCountingFlinkExecutableStageContextFactory.java:203)
> 11:51:22      at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.open(FlinkExecutableStageFunction.java:129)
> 11:51:22      at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
> 11:51:22      at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:494)
> 11:51:22      ... 3 more
> {code}
> https://builds.apache.org/job/beam_PreCommit_Portable_Python_Commit/5512/consoleFull
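
The failing step in the log above is a shell-out to
`docker inspect -f {{.State.Running}} <id>`. For an existing container Docker
prints `true` or `false`; when no container with that ID exists (for instance,
if it had already exited and been removed by the time it was queried), it exits
with code 1 and `Error: No such object`. The Python sketch below reproduces that
check outside Beam and treats a missing container as its own state, which may
help when watching the container lifecycle by hand during a parallel run. It
assumes the `docker` CLI is on `PATH`; `container_state` and `classify_inspect`
are hypothetical helpers, not Beam's implementation.

```python
import subprocess


def classify_inspect(returncode, stdout, stderr):
    """Interpret the result of `docker inspect -f {{.State.Running}} <id>`.

    Returns 'running', 'stopped', or 'missing'; raises on any other failure.
    """
    if returncode != 0:
        # The case seen in the log: Docker has no container with this ID.
        if "No such object" in stderr:
            return "missing"
        raise RuntimeError(f"docker inspect failed ({returncode}): {stderr.strip()}")
    # The Go template prints 'true' or 'false' for an existing container.
    return "running" if stdout.strip() == "true" else "stopped"


def container_state(container_id):
    """Run the same docker inspect command as in the log and classify its result."""
    result = subprocess.run(
        ["docker", "inspect", "-f", "{{.State.Running}}", container_id],
        capture_output=True,
        text=True,
    )
    return classify_inspect(result.returncode, result.stdout, result.stderr)
```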



--
This message was sent by Atlassian Jira
(v8.3.2#803003)