Alex,

Oozie lets you schedule data processing jobs. Its emphasis is on processing, which you define through a workflow or a coordinator (a recurring workflow). In a coordinator you can specify the input datasets for the processing, along with data properties such as path and frequency. If two coordinators depend on the same data, these details have to be defined twice. If you then want to add data eviction (deleting old data), you have to define yet another coordinator. Oozie provides APIs to manage these coordinators, but there is no easy way to define and track the data lifecycle.
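To make the duplication concrete, here is a minimal Python sketch. It is only an illustration: the dicts, names and paths (`coord_a`, `datasets`, `/data/clicks`, and so on) are hypothetical and are not part of the Oozie or Falcon APIs. The first half embeds the dataset details in each coordinator, and the second half defines the dataset once and refers to it by name.

```python
# Illustrative only: plain dicts stand in for coordinator and dataset
# definitions. None of these names come from the Oozie or Falcon APIs.

# Style 1: each coordinator embeds its own copy of the dataset details.
coord_a = {"name": "coord-a", "input": {"path": "/data/clicks", "frequency": "hourly"}}
coord_b = {"name": "coord-b", "input": {"path": "/data/clicks", "frequency": "hourly"}}


def copies_to_update(coordinators, path):
    """Return how many coordinator definitions embed the given path."""
    return sum(1 for c in coordinators if c["input"]["path"] == path)


# Style 2: the dataset is defined once under a name; jobs refer to the name.
datasets = {"clicks": {"path": "/data/clicks", "frequency": "hourly"}}
job_a = {"name": "job-a", "input": "clicks"}
job_b = {"name": "job-b", "input": "clicks"}


def resolve_input(job, datasets):
    """Look up the dataset definition a job refers to by name."""
    return datasets[job["input"]]
```

In the first style, moving the data means editing every coordinator that `copies_to_update` counts. In the second, changing the single entry in `datasets` is enough, because every job resolves its input through the shared name.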
In contrast, Falcon gives you a data-centric view. Data is defined as a Feed entity (with a unique name), which contains the data path, the frequency, the clusters where the data exists, how long the data is retained in each cluster (eviction), how the data is replicated across clusters, and so on. Standard data recipes such as acquisition, eviction and replication are available directly. To enable data processing across datasets, Falcon exposes a Process entity, which contains the input and output feed names (referring to feeds that are already defined), the frequency of processing, and how the data should be processed. The processing itself can be defined using a Pig script, a Hive script or an Oozie workflow. In the backend, the different data lifecycles are implemented using a scheduler, which is currently Oozie but can be replaced easily. The Falcon APIs hide the scheduler details and give you an easy way to define and manage the data lifecycles.

Regards,
Shwetha

On Tue, Sep 9, 2014 at 9:01 PM, Alex Nastetsky wrote:

> Hi,
>
> I have a general usage question about Falcon. I don't see a "user" mailing
> list, so I am sending it here. If there's a better place to direct the
> question, please let me know.
>
> I have been looking at the OnBoarding:
> http://falcon.incubator.apache.org/docs/OnBoarding.html
>
> I understand that Falcon uses Oozie underneath. What is the advantage of
> using Falcon instead of using Oozie directly?
>
> It looks like you can specify in your Input Feed information about your
> input data, but you can parameterize your paths in Oozie as well (using
> job.properties).
>
> I have also heard conflicting information about whether Falcon generates
> Oozie workflow.xml files, but in that on-boarding example, it looks like
> you need to create the workflow.xml manually. Which is correct?
>
> Thanks in advance,
> Alex.
