Hi,

I was looking at the Falcon basic user guide [1] and the recent Hortonworks blog post on the same topic [2].
I was wondering whether there is any proposal to reduce the amount of XML needed to ingest a new feed or process into the system. Could we have some properties defined globally in the system, for example:

- Cluster A, Cluster B, etc.
- Cluster A temp dir, Cluster B temp dir
- Cluster A Hive parent dir, Cluster B Hive parent dir

Then for any new feed we would just need to write something similar to what we do in a Cascading or Pig script: 3-4 declarative steps saying what has to be done with that data. Write 3-4 lines of code and it's all done. Falcon could generate the XMLs in the background if needed to make it work, but writing XML for ingesting every new feed is the scariest thing for me right now about using it in production. Imagine we have 500 feeds; how many XML files would be needed to support them?

What are your thoughts on this?
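To make it concrete, here is a rough sketch of the kind of generator I mean (plain Python, not any real Falcon API; the cluster names, paths, and defaults are all made up). It emits the feed XML described in [1] from a few declarative properties, with the cluster settings defined once globally:

#!/usr/bin/env python
# Rough sketch, not a real Falcon API: generate feed entity XML from a
# few declarative properties plus globally defined cluster settings.
# Element names follow the entity specification [1]; they would need to
# be checked against the actual feed XSD.
import xml.etree.ElementTree as ET

# Defined once per installation, instead of repeated in every feed XML.
CLUSTERS = {
    "cluster-a": {"temp_dir": "/tmp/a", "hive_parent_dir": "/apps/hive/a"},
    "cluster-b": {"temp_dir": "/tmp/b", "hive_parent_dir": "/apps/hive/b"},
}

def feed_xml(name, path, frequency="hours(1)", retention="days(90)",
             clusters=("cluster-a",)):
    """Build Falcon feed XML from a handful of declarative properties."""
    feed = ET.Element("feed", name=name)
    feed.set("xmlns", "uri:falcon:feed:0.1")
    ET.SubElement(feed, "frequency").text = frequency
    cs = ET.SubElement(feed, "clusters")
    for c in clusters:
        cl = ET.SubElement(cs, "cluster", name=c, type="source")
        ET.SubElement(cl, "validity", start="2014-01-01T00:00Z",
                      end="2099-12-31T00:00Z")
        ET.SubElement(cl, "retention", limit=retention, action="delete")
    locs = ET.SubElement(feed, "locations")
    ET.SubElement(locs, "location", type="data", path=path)
    ET.SubElement(feed, "ACL", owner="etl", group="users", permission="0755")
    ET.SubElement(feed, "schema", location="/none", provider="none")
    return ET.tostring(feed, encoding="unicode")

# The 3-4 declarative lines a user would actually write per feed:
print(feed_xml("clicks", "/data/clicks/${YEAR}-${MONTH}-${DAY}",
               frequency="hours(1)", clusters=("cluster-a", "cluster-b")))

The last call is the only part anyone would write for a new feed; everything else is defined once. With 500 feeds that is 500 short declarations instead of 500 XML files.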
Thanks,
Jagat Singh

[1] http://falcon.incubator.apache.org/docs/EntitySpecification.html
[2] http://hortonworks.com/hadoop-tutorial/defining-processing-data-end-end-data-pipeline-apache-falcon/