[ 
https://issues.apache.org/jira/browse/TEZ-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120874#comment-14120874
 ] 

Bikas Saha commented on TEZ-1447:
---------------------------------

bq.  Advantages of an event is that additional information can be provided via 
them, but that can be looked at later.
We could do .onStateChangeNotify(ENUM, Event) where the event currently 
contains the vertex name. It's simple but future-proof because of the wrapping 
object, which currently has no hierarchy. If we ever need a hierarchy we can add 
it compatibly.
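
A minimal sketch of what such a wrapping object could look like. All names 
here (VertexStateType, VertexStateEvent, VertexStateListener and the enum 
values) are hypothetical and only illustrate the shape of the callback; they 
are not existing Tez APIs.

```java
import java.util.Objects;

// Hypothetical names throughout; none of these are existing Tez APIs.
enum VertexStateType { CONFIGURED, RUNNING, SUCCEEDED, FAILED }

/**
 * Wrapper object handed to listeners. For now it only carries the vertex
 * name. More fields, or subclasses, could be added later without changing
 * the listener signature.
 */
class VertexStateEvent {
    private final String vertexName;

    VertexStateEvent(String vertexName) {
        this.vertexName = Objects.requireNonNull(vertexName, "vertexName");
    }

    String getVertexName() {
        return vertexName;
    }
}

/** Callback in the .onStateChangeNotify(ENUM, Event) style. */
interface VertexStateListener {
    void onStateChangeNotify(VertexStateType state, VertexStateEvent event);
}
```

Because listeners only see the wrapper type, extra information can be attached 
to the event later without touching code that implements the listener.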

bq. The reason the API doesn't exist is that I don't think registration for 
individual event types is very useful. 
This is a point of disagreement. IMO, users want to be notified when something 
that interests them happens, and not have to worry about getting notified about 
every transition that a vertex exposes. Interesting event types will only 
increase, not decrease. E.g. if I only need to know that a vertex succeeded, 
I currently have to poll on that vertex. This feature allows me to get notified 
instead of polling. But I don't want to get notified about a bunch of things 
that I don't care about. Why burden the user? 
Since we are not coming to a consensus on this, and it's important because this 
is an API, perhaps other watchers on this can comment. [~hitesh] 
[~hagleitn] Please share your views on the API. 
Short summary:
Option 1) register(VertexName) - the listener gets notifications about all 
state changes published by the vertex.
Option 2) register(ENUM, VertexName) - the listener registers for a specific 
change and gets notified when that happens.
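
To make the two options concrete, here is a rough sketch of a registry that 
supports both. The class, method and enum names (StateChangeRegistry, 
StateListener, State, publish) are made up for illustration and are not part 
of any Tez API; the sketch is single-threaded and ignores unregistration.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; names and states are illustrative, not Tez APIs.
enum State { CONFIGURED, RUNNING, SUCCEEDED, FAILED }

interface StateListener {
    void onStateChange(State state, String vertexName);
}

class StateChangeRegistry {
    /** One registration: a vertex, the states of interest, and the listener. */
    private static final class Registration {
        final String vertexName;
        final Set<State> states;
        final StateListener listener;

        Registration(String vertexName, Set<State> states, StateListener listener) {
            this.vertexName = vertexName;
            this.states = states;
            this.listener = listener;
        }
    }

    private final List<Registration> registrations = new ArrayList<>();

    /** Option 1: the listener hears about every state change of the vertex. */
    void register(String vertexName, StateListener listener) {
        registrations.add(
            new Registration(vertexName, EnumSet.allOf(State.class), listener));
    }

    /** Option 2: the listener hears about one specific state change only. */
    void register(State state, String vertexName, StateListener listener) {
        registrations.add(
            new Registration(vertexName, EnumSet.of(state), listener));
    }

    /** Called by the framework side when a vertex changes state. */
    void publish(State state, String vertexName) {
        for (Registration r : registrations) {
            if (r.vertexName.equals(vertexName) && r.states.contains(state)) {
                r.listener.onStateChange(state, vertexName);
            }
        }
    }
}
```

In this sketch option 1 is just option 2 with every state selected, so the 
difference is only in who does the filtering: the listener with option 1, or 
the framework with option 2.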

bq. Another thing would be to add the same notification APIs on the VMs because 
they will also need this.
A separate jira is fine, though the feature seems incomplete without adding it 
to the VertexManagerPluginContext.


> Handle parallelism updates and versioning w/ custom InputInitializerEvents
> --------------------------------------------------------------------------
>
>                 Key: TEZ-1447
>                 URL: https://issues.apache.org/jira/browse/TEZ-1447
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.5.0
>            Reporter: Gunther Hagleitner
>            Assignee: Siddharth Seth
>            Priority: Blocker
>         Attachments: TEZ-1447.1.wip.txt
>
>
> I'm trying to do dynamic partition pruning through input initializer events 
> in Hive. That means that the initializer of a table scan vertex has to 
> receive events from all tasks in another vertex (which contain the pruning 
> info) before generating tasks to run.
> The problem with the current API I ran into:
> getNumTasks: I'm currently using a busy loop to wait for the num tasks for a 
> vertex to be decided (-1 -> x). There's no way around it, because it's the 
> only way to find out what number of events to expect (0 is a valid number of 
> tasks - so I can't wait for the first to complete).
> With auto-reducer parallelism I have to employ another busy loop, because I 
> might be initially expecting 10 events, which later gets knocked down to 5. 
> Since there's no event associated with this, I have to periodically check 
> whether I have enough events.
> Versioning: Events have a version number, but I don't know which task they 
> are coming from. Thus I can't de-dup events.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
