[ https://issues.apache.org/jira/browse/BEAM-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919866#comment-16919866 ]
Hannah Jiang commented on BEAM-7993:
------------------------------------

I investigated a little more. We are not running multiple sdist tasks in parallel for the Python portable precommit tasks. Only one sdist task runs, and all precommit tasks depend on it, which means all precommit tests share the same Beam tarball. So I think we do have endpoints_pb2.py, because the import doesn't fail with py2, but somehow it is not recognized.

We need to investigate:
# Is the above conclusion correct? [~markflyhigh], can you please help with it?
# Why does the import failure happen only on Jenkins and not locally? What differs between the environments? Who can help with this?
# Why does the import sometimes not fail? If it were a py3 issue, it should fail consistently, but occasionally it does not, although the chance is small. [~tvalentyn], do you have any insights?
# Anything I missed here.

> portable python precommit is flaky
> ----------------------------------
>
> Key: BEAM-7993
> URL: https://issues.apache.org/jira/browse/BEAM-7993
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core, test-failures, testing
> Affects Versions: 2.15.0
> Reporter: Udi Meiri
> Assignee: Mark Liu
> Priority: Major
> Labels: currently-failing
> Fix For: 2.16.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> I'm not sure what the root cause is here.
> Example log where :sdks:python:test-suites:portable:py35:portableWordCountBatch failed:
> {code}
> 11:51:22 [CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)
> 11:51:22 [CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)
> 11:51:22 [CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)
> 11:51:22 [CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)
> 11:51:22 java.lang.Exception: The user defined 'open()' method caused an exception: java.io.IOException: Received exit code 1 for command 'docker inspect -f {{.State.Running}} 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1'. stderr: Error: No such object: 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1
> 11:51:22 at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498)
> 11:51:22 at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
> 11:51:22 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712)
> 11:51:22 at java.lang.Thread.run(Thread.java:748)
> 11:51:22 Caused by: org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: java.io.IOException: Received exit code 1 for command 'docker inspect -f {{.State.Running}} 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1'. stderr: Error: No such object: 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1
> 11:51:22 at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4966)
> 11:51:22 at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:211)
> 11:51:22 at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:202)
> 11:51:22 at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:185)
> 11:51:22 at org.apache.beam.runners.flink.translation.functions.FlinkDefaultExecutableStageContext.getStageBundleFactory(FlinkDefaultExecutableStageContext.java:49)
> 11:51:22 at org.apache.beam.runners.flink.translation.functions.ReferenceCountingFlinkExecutableStageContextFactory$WrappedContext.getStageBundleFactory(ReferenceCountingFlinkExecutableStageContextFactory.java:203)
> 11:51:22 at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.open(FlinkExecutableStageFunction.java:129)
> 11:51:22 at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
> 11:51:22 at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:494)
> 11:51:22 ... 3 more
> {code}
> https://builds.apache.org/job/beam_PreCommit_Portable_Python_Commit/5512/consoleFull

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
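
One way to check the first question in the comment (whether the shared Beam tarball really contains endpoints_pb2.py) might be to list the sdist archive directly. The following is only a minimal sketch, assuming the sdist output is a regular tar archive; `find_in_sdist` is a hypothetical helper name, not part of Beam, and the tarball location has to be supplied by whoever runs it.

```python
import tarfile


def find_in_sdist(tarball_path, basename):
    """Return archive member paths whose final path component equals basename.

    Hypothetical helper for illustration only; it is not part of Beam.
    """
    # "r:*" lets tarfile detect the compression (gz, bz2, xz, or none).
    with tarfile.open(tarball_path, "r:*") as archive:
        return [
            name
            for name in archive.getnames()
            if name.rsplit("/", 1)[-1] == basename
        ]
```

Calling `find_in_sdist(path_to_tarball, "endpoints_pb2.py")` on the tarball used by the Jenkins job and on a locally built one would show whether the file is packaged in both. An empty list would mean the file is missing from the archive; a non-empty list would point toward an import-path or environment difference instead.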