Provide hooks / callbacks to execute some code based on events happening in HDFS (file / directory creation, opening, closing, etc)
-----------------------------------------------------------------------------------
Key: HDFS-1742
URL: https://issues.apache.org/jira/browse/HDFS-1742
Project: Hadoop HDFS
Issue Type: New Feature
Components: name-node
Reporter: Mikhail Yakshin

We're working on a system that continuously runs various Hadoop jobs, triggered by data appearing in HDFS. For example, we have a job that processes a day's worth of data and writes its output to {{/output/YYYY/MM/DD}}. Before it can start, it must wait for the directory with externally uploaded data, {{/input/YYYY/MM/DD}}, to appear, and also for the previous day's output, {{/output/YYYY/MM/DD-1}}.

Obviously, one possible solution is to poll every so often for the files/directories we're waiting for, but polling is generally a poor solution. A better one is something like a file alteration monitor or [inode activity notifiers|http://en.wikipedia.org/wiki/Inotify], such as the ones implemented in Linux filesystems.

The basic idea is that one can specify (inject) some code that will be executed on every major event happening in HDFS, such as:

* File created / opened
* File closed
* File deleted
* Directory created
* Directory deleted

I see a simple implementation as follows: the NameNode (NN) defines interfaces for a callback/hook mechanism, something like:

{code}
interface NameNodeCallback {
    public void onFileCreate(SomeFileInformation f);
    public void onFileClose(SomeFileInformation f);
    public void onFileDelete(SomeFileInformation f);
    ...
}
{code}

A user creates a class that implements this interface and loads it into the NameNode's JVM (for example, via an extra jar on the classpath). The NameNode includes a configuration option that specifies the names of such classes; it instantiates them and calls their methods (in a separate thread) on every relevant event.

This would allow systems such as the one described at the beginning to be implemented without polling.
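To make the proposal above more concrete, here is a minimal standalone sketch in plain Java. It uses no Hadoop classes, and everything in it is an assumption made for illustration: a path string stands in for {{SomeFileInformation}}, {{CallbackDispatcher}} and {{RecordingCallback}} are hypothetical names, and the comma-separated string passed to {{fromClassNames}} plays the role of the proposed NameNode configuration option. It only shows the shape of the idea: load callback classes by name, then deliver events to them on a separate thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Standalone sketch: none of these types exist in Hadoop; all names are hypothetical.
class HookDemo {
    public static void main(String[] args) throws Exception {
        // Stand-in for a NameNode configuration option listing callback class names.
        CallbackDispatcher dispatcher = CallbackDispatcher.fromClassNames("RecordingCallback");
        dispatcher.fileCreated("/input/2011/03/10/part-0");
        dispatcher.fileClosed("/input/2011/03/10/part-0");
        dispatcher.shutdown(); // waits for queued events to be delivered
        System.out.println(RecordingCallback.EVENTS);
    }
}

/** Callback interface along the lines proposed above; a path stands in for SomeFileInformation. */
interface NameNodeCallback {
    void onFileCreate(String path);
    void onFileClose(String path);
    void onFileDelete(String path);
}

/** Example user-supplied implementation that simply records the events it receives. */
class RecordingCallback implements NameNodeCallback {
    static final List<String> EVENTS = new CopyOnWriteArrayList<>();
    public void onFileCreate(String path) { EVENTS.add("create " + path); }
    public void onFileClose(String path)  { EVENTS.add("close " + path); }
    public void onFileDelete(String path) { EVENTS.add("delete " + path); }
}

/** Instantiates the configured callbacks and invokes them off the caller's thread. */
class CallbackDispatcher {
    private final List<NameNodeCallback> callbacks = new ArrayList<>();
    // Single worker thread: events are delivered in submission order, and a
    // slow callback cannot block the thread that raised the event.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    static CallbackDispatcher fromClassNames(String commaSeparated)
            throws ReflectiveOperationException {
        CallbackDispatcher d = new CallbackDispatcher();
        for (String name : commaSeparated.split(",")) {
            // Load each class by name and create it via its no-argument constructor.
            Class<?> cls = Class.forName(name.trim());
            d.callbacks.add((NameNodeCallback) cls.getDeclaredConstructor().newInstance());
        }
        return d;
    }

    void fileCreated(String path) { dispatch(cb -> cb.onFileCreate(path)); }
    void fileClosed(String path)  { dispatch(cb -> cb.onFileClose(path)); }
    void fileDeleted(String path) { dispatch(cb -> cb.onFileDelete(path)); }

    private void dispatch(Consumer<NameNodeCallback> call) {
        worker.execute(() -> {
            for (NameNodeCallback cb : callbacks) {
                try {
                    call.accept(cb);
                } catch (RuntimeException e) {
                    // A faulty hook must not take down the dispatcher.
                    System.err.println("callback failed: " + e);
                }
            }
        });
    }

    /** Stops accepting events and waits briefly for pending ones to be delivered. */
    void shutdown() throws InterruptedException {
        worker.shutdown();
        worker.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

The single-threaded executor is one possible reading of "in a separate thread": it keeps events ordered and isolates the caller from slow hooks, at the cost of one slow hook delaying all later events. A real implementation would also need to decide what happens when the queue grows without bound.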