[ 
https://issues.apache.org/jira/browse/TEZ-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101693#comment-14101693
 ] 

Siddharth Seth commented on TEZ-1447:
-------------------------------------

Blocker for 0.5.1, which we should target a week or two after 0.5.0 goes in.
This is required for the InputInitializerEvent feature to work in a usable
manner. Since I put that in, with only minimal APIs at the time, I would
like to take the feature to completion. Past experience is nice, but I don't
believe this is complicated enough for it to matter much. At the end of the
day, we need to define the APIs and mechanism for this.

We will need to define the set of transitions, and other information which is
useful to user plugins. This includes start events, completion, failures, and
parallelism updates, of which there could eventually be multiple. Not every
state transition matters. These could be defined either as APIs on
InputInitializer, VertexManager, etc., or as notification registrations on the
context; either way, the ability to query for relevant state is important.
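
To make the "notification registrations on the context" option concrete, here
is a minimal sketch of what such a registration could look like. Every name in
it (VertexTransition, VertexTransitionEvent, VertexTransitionListener,
StateNotificationRegistry) is made up for illustration and is not an existing
Tez API; it only shows a plugin subscribing to the transitions it cares about
and querying the last known state.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical types for illustration only; none of these are Tez APIs.
enum VertexTransition { STARTED, PARALLELISM_UPDATED, SUCCEEDED, FAILED }

final class VertexTransitionEvent {
    final String vertexName;
    final VertexTransition transition;
    final int numTasks; // -1 while the task count is still undecided

    VertexTransitionEvent(String vertexName, VertexTransition transition, int numTasks) {
        this.vertexName = vertexName;
        this.transition = transition;
        this.numTasks = numTasks;
    }
}

interface VertexTransitionListener {
    void onTransition(VertexTransitionEvent event);
}

// Stand-in for the "context" a plugin would register with.
final class StateNotificationRegistry {
    private static final class Registration {
        final String vertexName;
        final Set<VertexTransition> transitions;
        final VertexTransitionListener listener;

        Registration(String vertexName, Set<VertexTransition> transitions,
                     VertexTransitionListener listener) {
            this.vertexName = vertexName;
            this.transitions = new HashSet<>(transitions);
            this.listener = listener;
        }
    }

    private final List<Registration> registrations = new ArrayList<>();
    private final Map<String, VertexTransitionEvent> lastEvent = new HashMap<>();

    // A plugin subscribes only to the transitions it cares about.
    void register(String vertexName, Set<VertexTransition> transitions,
                  VertexTransitionListener listener) {
        registrations.add(new Registration(vertexName, transitions, listener));
    }

    // Called by the framework side whenever a vertex changes state.
    void publish(VertexTransitionEvent event) {
        lastEvent.put(event.vertexName, event);
        for (Registration r : registrations) {
            if (r.vertexName.equals(event.vertexName)
                    && r.transitions.contains(event.transition)) {
                r.listener.onTransition(event);
            }
        }
    }

    // Query path: last published event for a vertex, or null if none yet.
    VertexTransitionEvent lastKnown(String vertexName) {
        return lastEvent.get(vertexName);
    }
}
```

The same listener interface could equally be implemented directly by the
plugin class; the registration form just lets each plugin filter out the
transitions that do not matter to it.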

> Handle parallelism updates and versioning w/ custom InputInitializerEvents
> --------------------------------------------------------------------------
>
>                 Key: TEZ-1447
>                 URL: https://issues.apache.org/jira/browse/TEZ-1447
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Gunther Hagleitner
>            Assignee: Bikas Saha
>            Priority: Blocker
>             Fix For: 0.5.0
>
>
> I'm trying to do dynamic partition pruning through input initializer events 
> in Hive. That means that the initializer of a table scan vertex has to 
> receive events from all tasks in another vertex (which contain the pruning 
> info) before generating tasks to run.
> The problems I ran into with the current API:
> getNumTasks: I'm currently using a busy loop to wait for the num tasks for a 
> vertex to be decided (-1 -> x). There's no way around it, because it's the 
> only way to find out what number of events to expect (0 is a valid number of 
> tasks - so I can't wait for the first to complete).
> With auto-reducer parallelism I have to employ another busy loop, because I 
> might initially be expecting 10 events, which later gets knocked down to 5. 
> Since there's no event associated with this, I have to periodically check 
> whether I have enough events.
> Versioning: Events have a version number, but I don't know which task they 
> are coming from. Thus I can't de-dup events.
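
For illustration, the bookkeeping described in the quoted issue could look
roughly like the sketch below, assuming each event carried the index of its
source task alongside its version and that the initializer were told when the
task count changes. PruningEventCollector and its methods are hypothetical
names, not Tez APIs, and keeping the highest version per task is an assumed
de-dup policy.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not a Tez class: tracks which source tasks have
// reported and whether enough events have arrived to generate tasks.
final class PruningEventCollector {
    // Highest event version seen so far, keyed by source task index.
    private final Map<Integer, Integer> latestVersionByTask = new HashMap<>();
    private int expectedTasks = -1; // -1 means the task count is undecided

    // Replaces polling getNumTasks: called when the count is set or changed,
    // e.g. 10 initially and later 5 after auto-reducer parallelism kicks in.
    void onParallelismDecided(int numTasks) {
        expectedTasks = numTasks;
    }

    // Returns true if the event is new, false if it is a duplicate or an
    // older version from the same task.
    boolean accept(int sourceTaskIndex, int version) {
        Integer known = latestVersionByTask.get(sourceTaskIndex);
        if (known != null && known >= version) {
            return false;
        }
        latestVersionByTask.put(sourceTaskIndex, version);
        return true;
    }

    // Zero tasks is a valid count, so an empty collector can be complete.
    boolean isComplete() {
        return expectedTasks >= 0 && latestVersionByTask.size() >= expectedTasks;
    }
}
```

With this shape the initializer checks isComplete() only when an event or a
parallelism update arrives, so neither busy loop is needed.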



--
This message was sent by Atlassian JIRA
(v6.2#6252)
