Minimal details - Vikram / Gunther should be able to provide more. At the moment Hive is using this to implement Bucketed Map Joins, where one side of the join does not need to be pre-bucketed.
A simple 2 table example: Table 1 is pre-bucketed. Table 2 is not - so it will be bucketed dynamically during execution. Table 1 determines the number of tasks, and the distribution of work to individual tasks. A single bucket may span multiple tasks. Depending on the task distribution, buckets generated by Table 2 are routed to the correct set of tasks (belonging to the appropriate bucket). Custom Edge/VertexManagers are used since this isn't a standard routing pattern.

Thanks
- Sid

On 3/7/14, 11:41 PM, "Rohini Palaniswamy" <[email protected]> wrote:

>Hi,
>   Could you guys tell us what is the hive team using custom edges for?
>
>Regards,
>Rohini
>
>---------- Forwarded message ----------
>From: Siddharth Seth (JIRA) <[email protected]>
>Date: Thu, Mar 6, 2014 at 10:24 AM
>Subject: [jira] [Commented] (TEZ-917) NPE when executing running via a custom edge
>To: [email protected]
>
>
>    [ https://issues.apache.org/jira/browse/TEZ-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922841#comment-13922841 ]
>
>Siddharth Seth commented on TEZ-917:
>------------------------------------
>
>Scratch that. Looking at the trace, this is likely a race in the Hive custom edge plugin..
>
>> NPE when executing running via a custom edge
>> --------------------------------------------
>>
>>                 Key: TEZ-917
>>                 URL: https://issues.apache.org/jira/browse/TEZ-917
>>             Project: Apache Tez
>>          Issue Type: Bug
>>            Reporter: Siddharth Seth
>>
>> Reported by [~vikram.dixit]. Likely a race in event routing.
>> {code}
>> java.lang.NullPointerException
>>     at org.apache.hadoop.hive.ql.exec.tez.CustomPartitionEdge.getNumSourceTaskPhysicalOutputs(CustomPartitionEdge.java:55)
>>     at org.apache.tez.dag.app.dag.impl.Edge.getSourceSpec(Edge.java:183)
>>     at org.apache.tez.dag.app.dag.impl.VertexImpl.getOutputSpecList(VertexImpl.java:2371)
>>     at org.apache.tez.dag.app.dag.impl.TaskAttemptImpl.createRemoteTaskSpec(TaskAttemptImpl.java:518)
>>     at org.apache.tez.dag.app.dag.impl.TaskAttemptImpl$ScheduleTaskattemptTransition.transition(TaskAttemptImpl.java:1038)
>>     at org.apache.tez.dag.app.dag.impl.TaskAttemptImpl$ScheduleTaskattemptTransition.transition(TaskAttemptImpl.java:1027)
>>     at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>>     at org.apache.tez.dag.app.dag.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:721)
>>     at org.apache.tez.dag.app.dag.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:105)
>>     at org.apache.tez.dag.app.DAGAppMaster$TaskAttemptEventDispatcher.handle(DAGAppMaster.java:1432)
>>     at org.apache.tez.dag.app.DAGAppMaster$TaskAttemptEventDispatcher.handle(DAGAppMaster.java:1417)
>>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>>     at java.lang.Thread.run(Thread.java:695)
>> 2014-03-03 14:55:56,519 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye
>> {code}
>
>
>
>--
>This message was sent by Atlassian JIRA
>(v6.2#6252)
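
For anyone reading this thread later, the 2 table routing described at the top of this message can be sketched roughly as below. This is an illustration only, not the Hive CustomPartitionEdge code: the class BucketRoutingSketch and its methods assignTasks, bucketFor and route are hypothetical names, the task counts per bucket are made up, and hashing the join key modulo the bucket count is an assumed bucketing function. It also assumes that a Table 2 row is delivered to every task of its bucket.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BucketRoutingSketch {

    // Hypothetical: Table 1 (pre-bucketed) fixes how many tasks serve each bucket.
    // Task indices are handed out consecutively, so one bucket may span several tasks.
    static Map<Integer, List<Integer>> assignTasks(int[] tasksPerBucket) {
        Map<Integer, List<Integer>> bucketToTasks = new LinkedHashMap<>();
        int nextTask = 0;
        for (int bucket = 0; bucket < tasksPerBucket.length; bucket++) {
            List<Integer> tasks = new ArrayList<>();
            for (int i = 0; i < tasksPerBucket[bucket]; i++) {
                tasks.add(nextTask++);
            }
            bucketToTasks.put(bucket, tasks);
        }
        return bucketToTasks;
    }

    // Assumed bucketing function for Table 2 rows: hash of the join key modulo
    // the number of buckets. The real function may differ.
    static int bucketFor(Object joinKey, int numBuckets) {
        return Math.floorMod(joinKey.hashCode(), numBuckets);
    }

    // A dynamically bucketed Table 2 row goes to every task that owns its bucket.
    static List<Integer> route(Object joinKey, Map<Integer, List<Integer>> bucketToTasks) {
        return bucketToTasks.get(bucketFor(joinKey, bucketToTasks.size()));
    }

    public static void main(String[] args) {
        // Three buckets served by 2, 1 and 3 tasks respectively (made-up numbers).
        Map<Integer, List<Integer>> bucketToTasks = assignTasks(new int[] {2, 1, 3});
        for (int key : new int[] {0, 1, 5}) {
            System.out.println("key " + key
                    + " -> bucket " + bucketFor(key, bucketToTasks.size())
                    + " -> tasks " + route(key, bucketToTasks));
        }
    }
}
```

The bucket-to-tasks map is the part that does not fit a standard scatter-gather or broadcast pattern, which is presumably why a custom edge and vertex manager are needed.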
