+ Dev ML

You can model this as a process in Falcon with no inputs or outputs and run the Azkaban jobs as a Java action. This will not add any data dependency to your pipeline.
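Below is a minimal sketch of what the main class of such a Java action might look like. It assumes Azkaban's AJAX API (a GET to `/executor?ajax=executeFlow` with `session.id`, `project` and `flow` parameters). It also assumes a session id has already been obtained through Azkaban's login call. The class name `AzkabanFlowLauncher` and its argument order are hypothetical. This is an illustration, not part of Falcon or Azkaban.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical main class for a java action that triggers an existing
 * Azkaban flow via Azkaban's AJAX API (assumed endpoint: executor?ajax=executeFlow).
 * Assumes a valid session.id was obtained beforehand via Azkaban's login call.
 */
public class AzkabanFlowLauncher {

    /** Builds the executeFlow URL; every parameter value is URL-encoded. */
    static String buildExecuteFlowUrl(String baseUrl, String sessionId,
                                      String project, String flow) {
        return baseUrl + "/executor?ajax=executeFlow"
                + "&session.id=" + enc(sessionId)
                + "&project=" + enc(project)
                + "&flow=" + enc(flow);
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    /**
     * Azkaban is assumed to report most failures as an "error" field in the
     * JSON body rather than via the HTTP status, so both are checked.
     * This is a crude string check; a real tool would parse the JSON.
     */
    static boolean looksSuccessful(int httpStatus, String body) {
        return httpStatus == 200 && !body.contains("\"error\"");
    }

    /** Reads a whole response stream; getErrorStream() may return null. */
    private static String readAll(InputStream stream) throws IOException {
        if (stream == null) return "";
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) sb.append(line);
        }
        return sb.toString();
    }

    // args: <azkabanBaseUrl> <sessionId> <project> <flow>
    public static void main(String[] args) throws IOException {
        if (args.length != 4) {
            throw new IllegalArgumentException(
                "usage: AzkabanFlowLauncher <baseUrl> <sessionId> <project> <flow>");
        }
        String url = buildExecuteFlowUrl(args[0], args[1], args[2], args[3]);
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("GET");
            int status = conn.getResponseCode();
            String body = readAll(status < 400 ? conn.getInputStream() : conn.getErrorStream());
            // An uncaught exception fails the java action, so the process run is marked failed.
            if (!looksSuccessful(status, body)) {
                throw new IllegalStateException("Azkaban executeFlow failed: " + body);
            }
            System.out.println(body);
        } finally {
            conn.disconnect();
        }
    }
}
```

Because the Falcon process declares no inputs or outputs, it would be scheduled purely by its frequency and validity window. The Azkaban flow itself keeps doing whatever input checks it already performs.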
You could also write a simple workflow engine implementation for Azkaban and enhance Falcon to drive Azkaban flows.

On Sun, Jun 15, 2014 at 8:27 PM, Venkat R <[email protected]> wrote:

> Hey Venkatesh,
>
> Good to know. Have a great time in Bangalore.
>
> I want to leverage Falcon, but one hurdle that we are facing is the migration. Users have a large number of Azkaban flows that are free-form, meaning there is no need to declare the input and output feeds, etc. An Azkaban package contains some properties files and the Pig script or MR jar files, and it just runs them. There is no data dependency or data-availability trigger; it's up to your Java code to check whether the input is ready and move to the next step in the Azkaban flow.
>
> Now, converting all of them into Falcon input/output/process definitions is daunting, and I'm hoping it can be mitigated by some tools, though I haven't worked out how to modify Pig scripts/MR jobs to redefine the LOAD statement to use the Falcon INPUT/OUTPUT variables. Let me know if you have any thoughts on this.
>
> Enjoy your vacation and hope to talk to you soon.
>
> Thanks
> Venkat
>
> On Saturday, June 14, 2014 10:47 PM, Seetharam Venkatesh <[email protected]> wrote:
>
> Hi Venkat,
>
> Vacation in India for a few weeks.
>
> On Wed, Jun 11, 2014 at 3:54 PM, Venkat R <[email protected]> wrote:
>
> > Hey Venkatesh,
> >
> > There is an idea for a Hadoop DR implementation based on parsing the HDFS audit log to see which folders are accessed by a set of users and periodically replicating them to the standby cluster.
>
> This is exactly what Oozie does, except that Oozie polls directories. The audit log is not a public API, and depending on a log is odd since its format could change.
>
> > Since this can take care of the input datasets needed to launch the jobs on the standby clusters, the flows can be restarted on the standby clusters. This looks appealing because users don't need to define input and output datasets, etc.
>
> This is already done by Falcon without you writing custom code. This is an active-passive configuration.
>
> > I'm sure you have thought about this implementation. Any idea where it will break? Does DR get implemented at Yahoo like this?
>
> What do you gain by going out and doing this by hand when it is already solved by existing tools?
>
> > Appreciate your insights
> > Venkat
>
> --
> Regards,
>
> Venkatesh
>
> “Perfection (in design) is achieved not when there is nothing more to add, but rather when there is nothing more to take away.”
> - Antoine de Saint-Exupéry

--
Regards,
Venkatesh

“Perfection (in design) is achieved not when there is nothing more to add, but rather when there is nothing more to take away.”
- Antoine de Saint-Exupéry
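For comparison, the audit-log approach raised in the quoted thread might be sketched as below. It assumes the tab-separated `key=value` lines (`allowed`, `ugi`, `cmd`, `src`, ...) that the NameNode writes to its HDFS audit log. As pointed out in the thread, that format is not a public API and may change between Hadoop versions. The class and method names are hypothetical. The sketch only collects the parent directories of files opened by the chosen users; it does not replicate anything and says nothing about whether that data is complete.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/**
 * Hypothetical sketch: scan HDFS audit log lines and collect the directories
 * a given set of users read from. Assumes the tab-separated key=value layout
 * of the NameNode audit log, which is not a stable public API.
 */
public class AuditLogDirCollector {

    /** Parses the key=value fields of one audit line. */
    static Map<String, String> parseFields(String line) {
        Map<String, String> fields = new HashMap<>();
        for (String token : line.split("\t")) {
            int eq = token.indexOf('=');
            if (eq < 0) continue;
            String key = token.substring(0, eq);
            // The first token still carries the log prefix ("... FSNamesystem.audit: allowed").
            key = key.substring(key.lastIndexOf(' ') + 1);
            fields.put(key, token.substring(eq + 1));
        }
        return fields;
    }

    /** Returns the sorted parent directories of files successfully opened by the given users. */
    static SortedSet<String> readDirs(List<String> lines, Set<String> users) {
        SortedSet<String> dirs = new TreeSet<>();
        for (String line : lines) {
            Map<String, String> f = parseFields(line);
            if (!"true".equals(f.get("allowed")) || !"open".equals(f.get("cmd"))) continue;
            // ugi looks like "alice (auth:SIMPLE)"; keep only the leading user name.
            String user = f.getOrDefault("ugi", "").split(" ")[0];
            String src = f.get("src");
            if (!users.contains(user) || src == null) continue;
            int slash = src.lastIndexOf('/');
            dirs.add(slash <= 0 ? "/" : src.substring(0, slash));
        }
        return dirs;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "2014-06-11 15:54:00,123 INFO FSNamesystem.audit: allowed=true\tugi=alice (auth:SIMPLE)\tip=/10.0.0.1\tcmd=open\tsrc=/data/clicks/2014-06-11/part-00000\tdst=null\tperm=null",
            "2014-06-11 15:54:01,456 INFO FSNamesystem.audit: allowed=true\tugi=bob (auth:SIMPLE)\tip=/10.0.0.2\tcmd=open\tsrc=/data/ads/part-00000\tdst=null\tperm=null",
            "2014-06-11 15:54:02,789 INFO FSNamesystem.audit: allowed=true\tugi=alice (auth:SIMPLE)\tip=/10.0.0.1\tcmd=listStatus\tsrc=/data/views\tdst=null\tperm=null");
        // prints [/data/clicks/2014-06-11]
        System.out.println(readDirs(sample, Set.of("alice")));
    }
}
```

Even in this toy form, the sketch shows the fragility discussed in the thread. It depends on the exact field names and separators of a log file, whereas Falcon's feed definitions state the data locations explicitly.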
