Provide hooks / callbacks to execute some code based on events happening in 
HDFS (file / directory creation, opening, closing, etc.)
-----------------------------------------------------------------------------------------------------------------------------------

                 Key: HDFS-1742
                 URL: https://issues.apache.org/jira/browse/HDFS-1742
             Project: Hadoop HDFS
          Issue Type: New Feature
          Components: name-node
            Reporter: Mikhail Yakshin


We're working on a system that continuously runs various Hadoop jobs based on
the data that appears in HDFS. For example, we have a job that processes a
day's worth of data and writes its output to {{/output/YYYY/MM/DD}}. Before it
can start, it must wait for the directory with externally uploaded input,
{{/input/YYYY/MM/DD}}, to appear, and also for the previous day's output to
appear, i.e. {{/output/YYYY/MM/DD-1}}.
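To make the dependency concrete: "DD-1" means the previous calendar day, so on the first of a month the job waits for the last day of the previous month. A minimal sketch of that rule, assuming the {{YYYY/MM/DD}} layout above; {{DailyJobPaths}} and its methods are illustrative names, not an existing API:

{code:java}
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;

// Hypothetical helper: computes the HDFS paths a daily job depends on,
// assuming the /input/YYYY/MM/DD and /output/YYYY/MM/DD layout.
final class DailyJobPaths {
    private static final DateTimeFormatter LAYOUT =
            DateTimeFormatter.ofPattern("yyyy/MM/dd");

    private DailyJobPaths() {}

    /** Paths that must exist before the job for {@code day} can start. */
    static List<String> inputsFor(LocalDate day) {
        return List.of(
                "/input/" + day.format(LAYOUT),
                // minusDays(1) rolls over month and year boundaries correctly
                "/output/" + day.minusDays(1).format(LAYOUT));
    }

    /** Path the job for {@code day} writes its results to. */
    static String outputFor(LocalDate day) {
        return "/output/" + day.format(LAYOUT);
    }
}
{code}

For example, the job for 2011-03-01 would wait for {{/input/2011/03/01}} and {{/output/2011/02/28}}.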

One possible solution is to poll periodically for the files/directories we're
waiting for, but this is generally a poor fit: a short polling interval loads
the NameNode with repeated metadata requests, while a long one delays the start
of each job. A better approach is something like a file alteration monitor or
[inode activity notifiers|http://en.wikipedia.org/wiki/Inotify], such as the
inotify facility in the Linux kernel.

The basic idea is that one can specify (inject) code that will be executed on
every major event happening in HDFS, such as:
* File created / opened
* File closed
* File deleted
* Directory created
* Directory deleted
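One way to represent these events as plain data, for example so they can be queued and delivered by a separate notification thread, is sketched below. {{HdfsEventType}} and {{HdfsEvent}} are hypothetical types that do not exist in HDFS:

{code:java}
import java.util.Objects;

// Hypothetical event model mirroring the list of events; these types do not
// exist in HDFS and only illustrate what a notification could carry.
enum HdfsEventType {
    FILE_CREATED, // file created or opened
    FILE_CLOSED,
    FILE_DELETED,
    DIRECTORY_CREATED,
    DIRECTORY_DELETED;

    boolean isDirectoryEvent() {
        return this == DIRECTORY_CREATED || this == DIRECTORY_DELETED;
    }
}

/** One immutable notification: what happened and to which absolute path. */
final class HdfsEvent {
    private final HdfsEventType type;
    private final String path;

    HdfsEvent(HdfsEventType type, String path) {
        this.type = Objects.requireNonNull(type, "type");
        this.path = Objects.requireNonNull(path, "path");
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("HDFS path must be absolute: " + path);
        }
    }

    HdfsEventType type() { return type; }
    String path() { return path; }

    @Override public String toString() { return type + " " + path; }
}
{code}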

I see a simple implementation as follows: the NameNode (NN) defines an
interface for a callback/hook mechanism, i.e. something like:

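{code}
// SomeFileInformation is a placeholder for whatever metadata (e.g. path,
// owner, size) the NameNode would pass to the callback.
interface NameNodeCallback {
    void onFileCreate(SomeFileInformation f);
    void onFileClose(SomeFileInformation f);
    void onFileDelete(SomeFileInformation f);
    // ... more events
}
{code}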
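To show how such an interface could serve the use case from the beginning, here is a sketch of a user-side callback that starts a job once all of its inputs exist. It assumes {{SomeFileInformation}} only carries a path, adds a hypothetical {{onDirectoryCreate}} method for the "Directory created" event, and uses the illustrative name {{DependencyTrigger}}; none of this is an existing HDFS API:

{code:java}
import java.util.Collection;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Placeholder from the proposal; in this sketch it only carries the path.
final class SomeFileInformation {
    private final String path;
    SomeFileInformation(String path) { this.path = path; }
    String getPath() { return path; }
}

// Proposed interface plus a hypothetical directory-creation event.
interface NameNodeCallback {
    void onFileCreate(SomeFileInformation f);
    void onFileClose(SomeFileInformation f);
    void onFileDelete(SomeFileInformation f);
    void onDirectoryCreate(SomeFileInformation f);
}

/** Runs an action once, after every required path has been created. */
final class DependencyTrigger implements NameNodeCallback {
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    private final AtomicBoolean fired = new AtomicBoolean(false);
    private final Runnable action;

    DependencyTrigger(Collection<String> requiredPaths, Runnable action) {
        pending.addAll(requiredPaths);
        this.action = action;
    }

    @Override public void onDirectoryCreate(SomeFileInformation f) { markPresent(f.getPath()); }
    @Override public void onFileClose(SomeFileInformation f) { markPresent(f.getPath()); }
    @Override public void onFileCreate(SomeFileInformation f) { /* incomplete until closed */ }
    @Override public void onFileDelete(SomeFileInformation f) { /* ignored in this sketch */ }

    private void markPresent(String path) {
        // compareAndSet guarantees the action runs at most once, even if two
        // callbacks empty the pending set concurrently.
        if (pending.remove(path) && pending.isEmpty() && fired.compareAndSet(false, true)) {
            action.run();
        }
    }

    boolean hasFired() { return fired.get(); }
}
{code}

Because callbacks only report events that happen after registration, a real trigger would also have to check once at startup which of the required paths already exist.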

A user creates a class that implements this interface and loads it into the
NameNode's JVM somehow (for example, using an extra jar on the classpath). The
NameNode provides a configuration option that specifies the names of such
class(es); it then instantiates them and calls their methods (in a separate
thread) on every valid event.
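The loading and dispatch steps could look roughly like the sketch below. It assumes the configuration value is a comma-separated list of fully qualified class names, each with a public no-argument constructor; {{PathListener}}, {{RecordingListener}} and {{CallbackDispatcher}} are hypothetical names, and {{PathListener}} is a simplified, String-based stand-in for {{NameNodeCallback}}. Running user code on a worker thread presumably keeps slow or faulty callbacks from stalling NameNode operations:

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical, simplified stand-in for the proposed callback interface.
interface PathListener {
    void onFileCreate(String path);
}

// Example user class, the kind that would be shipped in an extra jar.
final class RecordingListener implements PathListener {
    static final List<String> SEEN = Collections.synchronizedList(new ArrayList<>());
    public RecordingListener() {}
    @Override public void onFileCreate(String path) { SEEN.add(path); }
}

/** Instantiates listeners named in a config value and notifies them off-thread. */
final class CallbackDispatcher<T> implements AutoCloseable {
    private final List<T> listeners = new ArrayList<>();
    // One worker thread: user code never runs on the caller's thread, and
    // events are delivered in submission order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    CallbackDispatcher(Class<T> type, String commaSeparatedClassNames)
            throws ReflectiveOperationException {
        for (String name : commaSeparatedClassNames.split(",")) {
            String trimmed = name.trim();
            if (trimmed.isEmpty()) continue;
            // asSubclass fails fast if the class does not implement the interface
            Class<? extends T> cls = Class.forName(trimmed).asSubclass(type);
            listeners.add(cls.getDeclaredConstructor().newInstance());
        }
    }

    /** Queues the event for every listener and returns immediately. */
    void dispatch(Consumer<? super T> event) {
        for (T listener : listeners) {
            worker.execute(() -> {
                try {
                    event.accept(listener);
                } catch (RuntimeException e) {
                    // A faulty listener must not kill the worker; a real NameNode would log this.
                }
            });
        }
    }

    int size() { return listeners.size(); }

    @Override public void close() throws InterruptedException {
        worker.shutdown();
        worker.awaitTermination(5, TimeUnit.SECONDS);
    }
}
{code}

A single worker thread preserves event order, but one slow listener delays all the others; a dedicated single-thread executor per listener would isolate listeners while still keeping each listener's events in order.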

This would allow systems such as the one described at the beginning to be
implemented without polling.
