[ 
https://issues.apache.org/jira/browse/BEAM-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919866#comment-16919866
 ] 

Hannah Jiang commented on BEAM-7993:
------------------------------------

I investigated a little more.

We are not parallel running multiple sdist tasks for Python portable precommit 
tasks. Only one sdist task is running and all precommit tasks depend on it , 
which means all precommit tests are sharing the same beam tar ball, so I think 
we do have endpoints_pb2.py, because it doesn't fail with py2. But somehow, it 
is not recognized.

We need to investigate 
 # Is above conclusion correct? [~markflyhigh], can you please help with it?
 # why import failure only happens at Jenkins, but not at local? What is env 
diff? who can help with it?
 # why import doesn't fail some time? If it's a py3 issue, it should happen 
consistently, but it doesn't fail some time, though this chance is small. 
[~tvalentyn], do you have any insights?
 # anything I missed here..

> portable python precommit is flaky
> ----------------------------------
>
>                 Key: BEAM-7993
>                 URL: https://issues.apache.org/jira/browse/BEAM-7993
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core, test-failures, testing
>    Affects Versions: 2.15.0
>            Reporter: Udi Meiri
>            Assignee: Mark Liu
>            Priority: Major
>              Labels: currently-failing
>             Fix For: 2.16.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> I'm not sure what the root cause is here.
> Example log where 
> :sdks:python:test-suites:portable:py35:portableWordCountBatch failed:
> {code}
> 11:51:22 [CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap 
> (FlatMap at ExtractOutput[0]) (2/2)] ERROR 
> org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN 
> MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at 
> ExtractOutput[0]) (2/2)
> 11:51:22 [CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap 
> (FlatMap at ExtractOutput[0]) (1/2)] ERROR 
> org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN 
> MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at 
> ExtractOutput[0]) (1/2)
> 11:51:22 [CHAIN MapPartition (MapPartition at 
> [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), 
> Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)] ERROR 
> org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN 
> MapPartition (MapPartition at 
> [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), 
> Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)
> 11:51:22 [CHAIN MapPartition (MapPartition at 
> [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), 
> Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)] ERROR 
> org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN 
> MapPartition (MapPartition at 
> [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), 
> Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)
> 11:51:22 java.lang.Exception: The user defined 'open()' method caused an 
> exception: java.io.IOException: Received exit code 1 for command 'docker 
> inspect -f {{.State.Running}} 
> 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1'. stderr: 
> Error: No such object: 
> 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1
> 11:51:22      at 
> org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498)
> 11:51:22      at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
> 11:51:22      at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712)
> 11:51:22      at java.lang.Thread.run(Thread.java:748)
> 11:51:22 Caused by: 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
>  java.io.IOException: Received exit code 1 for command 'docker inspect -f 
> {{.State.Running}} 
> 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1'. stderr: 
> Error: No such object: 
> 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1
> 11:51:22      at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4966)
> 11:51:22      at 
> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:211)
> 11:51:22      at 
> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:202)
> 11:51:22      at 
> org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:185)
> 11:51:22      at 
> org.apache.beam.runners.flink.translation.functions.FlinkDefaultExecutableStageContext.getStageBundleFactory(FlinkDefaultExecutableStageContext.java:49)
> 11:51:22      at 
> org.apache.beam.runners.flink.translation.functions.ReferenceCountingFlinkExecutableStageContextFactory$WrappedContext.getStageBundleFactory(ReferenceCountingFlinkExecutableStageContextFactory.java:203)
> 11:51:22      at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.open(FlinkExecutableStageFunction.java:129)
> 11:51:22      at 
> org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
> 11:51:22      at 
> org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:494)
> 11:51:22      ... 3 more
> {code}
> https://builds.apache.org/job/beam_PreCommit_Portable_Python_Commit/5512/consoleFull



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

Reply via email to