[ https://issues.apache.org/jira/browse/TEZ-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ming Ma resolved TEZ-3239.
--------------------------
    Resolution: Invalid

Verified that the issue no longer exists in the master branch.

> ShuffleVertexManager recovery issue when auto parallelism is enabled
> --------------------------------------------------------------------
>
>                 Key: TEZ-3239
>                 URL: https://issues.apache.org/jira/browse/TEZ-3239
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Ming Ma
>         Attachments: tez.am.recovery.attempt.auto.parallelism.log
>
>
> Repro:
> * Enable {{tez.shuffle-vertex-manager.enable.auto-parallel}}.
> * Kill the Tez AM container after the job has reached the point where the VM has reconfigured the edge.
> * The new Tez AM attempt will fail with the following error.
> {noformat}
> org.apache.tez.dag.api.TezUncheckedException: Atleast 1 bipartite source should exist
>         at org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager.onVertexStarted(ShuffleVertexManager.java:497)
>         at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventOnVertexStarted.invoke(VertexManager.java:589)
>         at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:658)
>         at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:653)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
> {noformat}
> That is because the edge routing type changes to {{DataMovementType.CUSTOM}} after reconfiguration, so the recovered AM attempt no longer sees any {{SCATTER_GATHER}} source edge. Allowing {{DataMovementType.CUSTOM}} in the following check seems to fix the issue.
> {noformat}
> if (entry.getValue().getDataMovementType() == DataMovementType.SCATTER_GATHER) {
>   bipartiteSources++;
> }
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
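
For illustration, the relaxed check proposed in the report could look roughly like the sketch below. It is not the code in Tez: the class `BipartiteSourceCheck`, the method `countBipartiteSources`, and the local `DataMovementType` enum are hypothetical stand-ins so that the example compiles without Tez on the classpath, and counting every CUSTOM edge as a bipartite source is an assumption taken from the reporter's suggestion. The issue was resolved as Invalid, so no such change was confirmed as necessary on master.

```java
import java.util.Map;

// Hypothetical helper, not part of Tez; illustrates the check from the report.
final class BipartiteSourceCheck {

    // Local stand-in for Tez's EdgeProperty.DataMovementType.
    enum DataMovementType { ONE_TO_ONE, BROADCAST, SCATTER_GATHER, CUSTOM }

    // Counts source edges treated as bipartite. Unlike the check quoted in
    // the report, CUSTOM is accepted too, because a reconfigured
    // SCATTER_GATHER edge is reported as CUSTOM after AM recovery.
    static int countBipartiteSources(Map<String, DataMovementType> sourceEdges) {
        int bipartiteSources = 0;
        for (Map.Entry<String, DataMovementType> entry : sourceEdges.entrySet()) {
            DataMovementType type = entry.getValue();
            if (type == DataMovementType.SCATTER_GATHER
                    || type == DataMovementType.CUSTOM) {
                bipartiteSources++;
            }
        }
        return bipartiteSources;
    }
}
```

With this variant, a vertex whose only source edge was reconfigured to CUSTOM still yields a count of 1, so the "Atleast 1 bipartite source should exist" condition would not trip. The trade-off is that a CUSTOM edge that was never SCATTER_GATHER would also be counted.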