[ https://issues.apache.org/jira/browse/TEZ-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bikas Saha updated TEZ-1447:
----------------------------
    Assignee: Siddharth Seth  (was: Bikas Saha)

> Handle parallelism updates and versioning w/ custom InputInitializerEvents
> --------------------------------------------------------------------------
>
>                 Key: TEZ-1447
>                 URL: https://issues.apache.org/jira/browse/TEZ-1447
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Gunther Hagleitner
>            Assignee: Siddharth Seth
>            Priority: Blocker
>             Fix For: 0.5.0
>
>
> I'm trying to do dynamic partition pruning through input initializer events
> in Hive. That means that the initializer of a table scan vertex has to
> receive events from all tasks in another vertex (which contain the pruning
> info) before generating tasks to run.
> The problems I ran into with the current API:
> getNumTasks: I'm currently using a busy loop to wait for the number of tasks
> for a vertex to be decided (-1 -> x). There's no way around it, because it's
> the only way to find out how many events to expect (0 is a valid number of
> tasks, so I can't wait for the first task to complete).
> With auto-reducer parallelism I have to employ another busy loop, because I
> might initially be expecting 10 events, and that number might later get
> knocked down to 5. Since there's no event associated with this change, I have
> to periodically check whether I have enough events.
> Versioning: Events have a version number, but I don't know which task they
> are coming from. Thus I can't de-dup events.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
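
For illustration only, here is a minimal Java sketch of the bookkeeping the issue asks for. It is not the Tez API and not the fix that was eventually committed. The class `PruningEventCollector` and its methods are hypothetical. The sketch assumes each event carries a source task index in addition to its version (the field the reporter says is missing), that a callback announces parallelism changes, and that reducing parallelism to n leaves task indices 0..n-1.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical collector; none of these names are part of the Tez API. */
class PruningEventCollector {
    // Highest version accepted so far, keyed by (assumed) source task index.
    private final Map<Integer, Integer> latestVersion = new HashMap<>();
    // Payload of the accepted event for each source task index.
    private final Map<Integer, String> payloads = new HashMap<>();
    // -1 means the source vertex's parallelism has not been decided yet.
    private int expectedTasks = -1;

    /** Assumed callback: the source vertex's task count was decided or changed. */
    synchronized void onParallelismUpdated(int numTasks) {
        expectedTasks = numTasks;
        // Assumption: after a reduction to n tasks, only indices 0..n-1 remain.
        latestVersion.keySet().removeIf(i -> i >= numTasks);
        payloads.keySet().removeIf(i -> i >= numTasks);
        notifyAll(); // wake waiters instead of making them poll
    }

    /** Returns true if the event was accepted, false if it was stale or a duplicate. */
    synchronized boolean onEvent(int taskIndex, int version, String payload) {
        if (expectedTasks >= 0 && taskIndex >= expectedTasks) {
            return false; // from a task that no longer exists
        }
        Integer seen = latestVersion.get(taskIndex);
        if (seen != null && seen >= version) {
            return false; // same or older version from the same task
        }
        latestVersion.put(taskIndex, version);
        payloads.put(taskIndex, payload);
        notifyAll();
        return true;
    }

    /** True once parallelism is known and every expected task has reported. */
    synchronized boolean isComplete() {
        return expectedTasks >= 0 && payloads.size() >= expectedTasks;
    }

    /** Blocks without a busy loop until isComplete() holds. */
    synchronized void awaitComplete() throws InterruptedException {
        while (!isComplete()) {
            wait();
        }
    }
}
```

With a task index available, a re-sent or re-run event from the same task replaces or is rejected against the earlier one rather than being counted twice, and a parallelism of 0 completes immediately once it is announced.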