Robert Kanter created OOZIE-2179:
------------------------------------

             Summary: Use HDFS INotify to track HDFS data dependencies instead of polling
                 Key: OOZIE-2179
                 URL: https://issues.apache.org/jira/browse/OOZIE-2179
             Project: Oozie
          Issue Type: New Feature
          Components: coordinator
            Reporter: Robert Kanter


Instead of polling the NameNode every minute for Coordinators, we should look 
into using the new INotify feature added in HDFS-6634.  It lets you read a 
stream of events from HDFS.  Internally, it still uses a polling mechanism for 
now, but even so, it would likely be more efficient and less heavy-handed than 
what we're doing.

We'd probably still have to check if the directory exists when a coordinator 
action starts in case we missed the event, but while waiting for an HDFS 
dependency to be available, we can use INotify.
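
A rough sketch of that flow, just to illustrate the idea: check once when the 
action starts, then rely on notifications.  Everything here is hypothetical 
(DependencyWatcher, the existsCheck predicate, onPathCreated); it is not 
existing Oozie or HDFS code, and the real INotify event types are not modeled.

```java
import java.util.function.Predicate;

/** Hypothetical sketch: tracks one HDFS dependency path for a coordinator action. */
final class DependencyWatcher {
    private final String path;
    private boolean available;

    /**
     * @param path        the dependency path the action is waiting on
     * @param existsCheck assumed stand-in for a single "does this path exist" call to HDFS
     */
    DependencyWatcher(String path, Predicate<String> existsCheck) {
        this.path = path;
        // One-time check at action start, in case the create event
        // fired before we started listening.
        this.available = existsCheck.test(path);
    }

    /** Feed each "path created" notification here; returns true once the dependency is available. */
    boolean onPathCreated(String createdPath) {
        if (createdPath.equals(path)) {
            available = true;
        }
        return available;
    }

    boolean isAvailable() {
        return available;
    }
}
```

With this shape, the existence check happens once per action instead of once per 
minute, and everything after that is driven by events.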

For HCat dependencies we still do a backup poll every 10 minutes in case a 
JMS message is missed or lost.  I don't think we'll need to do this for INotify 
because you can read past events as long as you keep track of the event ID.  
For example, if Oozie is restarted and we kept track of the last ID Oozie 
looked at, we could resume from there without losing anything.
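
To make the resume idea concrete, here is a minimal sketch.  EventLog and 
ResumableReader are hypothetical names for illustration only; EventLog stands 
in for whatever the HDFS client exposes for reading events after a given ID, 
and where Oozie would actually persist the checkpoint is left open.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an ordered event stream that can be read from a given ID. */
interface EventLog {
    /** Returns the IDs of all events with id greater than afterId, in ascending order. */
    List<Long> eventsAfter(long afterId);
}

/** Hypothetical reader that remembers the last event ID it handled. */
final class ResumableReader {
    // In Oozie this would need to be persisted somewhere durable; kept in memory here.
    private long lastSeenId;

    ResumableReader(long checkpoint) {
        this.lastSeenId = checkpoint;
    }

    /** Handles every event newer than the checkpoint and returns the IDs that were handled. */
    List<Long> drain(EventLog log) {
        List<Long> handled = new ArrayList<>();
        for (long id : log.eventsAfter(lastSeenId)) {
            handled.add(id);
            // Advance the checkpoint only after the event has been handled.
            lastSeenId = id;
        }
        return handled;
    }

    long checkpoint() {
        return lastSeenId;
    }
}
```

After a restart, constructing a new ResumableReader with the saved checkpoint 
picks up at the first event that was not yet handled, which is why a backup 
poll may not be needed here.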

The INotify stream is asynchronous, so we won't receive a notification 
immediately.  We should look into the guarantees of how long it can take for 
the notification to show up.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
