The use case is pretty simple: I have a UDF that validates input rows against a certain set of criteria. When a row fails, the UDF emits an entry containing contextual information (the corresponding input split, the specific validation that failed, etc.) to a side-file. On query completion, these side-files are analyzed further to identify potential problems (if any).
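To make the use case concrete, here is a minimal Java sketch of just the validation-and-logging logic, with no Pig or Hadoop dependencies. `RowValidator`, the check names and the tab-separated record layout are all hypothetical. The side-file is modelled as a plain `java.io.Writer`. In a real job it would be a file opened under the task's side-effect output directory (the mechanism from the Hadoop tutorial quoted in this thread), and reaching that directory from a Pig 0.7 UDF is exactly the open question here.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/**
 * Hypothetical sketch: validates rows against named checks and appends one
 * tab-separated record (split, failed check, row) per failure to a side-file.
 */
public class RowValidator {
    // LinkedHashMap keeps checks (and therefore side-file records) in insertion order.
    private final Map<String, Predicate<List<String>>> checks = new LinkedHashMap<>();
    private final String inputSplit; // contextual info: which input split this task reads
    private final Writer sideFile;   // assumption: in a real job, a file in the task's work output dir

    public RowValidator(String inputSplit, Writer sideFile) {
        this.inputSplit = inputSplit;
        this.sideFile = sideFile;
    }

    public RowValidator addCheck(String name, Predicate<List<String>> check) {
        checks.put(name, check);
        return this;
    }

    /** Returns true if every check passes; otherwise records each failed check and returns false. */
    public boolean validate(List<String> row) {
        boolean ok = true;
        for (Map.Entry<String, Predicate<List<String>>> e : checks.entrySet()) {
            if (!e.getValue().test(row)) {
                ok = false;
                try {
                    sideFile.write(inputSplit + "\t" + e.getKey() + "\t" + String.join(",", row) + "\n");
                } catch (IOException ex) {
                    throw new UncheckedIOException(ex);
                }
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        StringWriter side = new StringWriter();
        RowValidator v = new RowValidator("part-00000:0+1024", side)
                .addCheck("three-fields", r -> r.size() == 3)
                .addCheck("numeric-id", r -> !r.isEmpty() && r.get(0).matches("\\d+"));
        v.validate(List.of("42", "alice", "ok")); // passes, nothing logged
        v.validate(List.of("x7", "bob"));         // fails both checks
        System.out.print(side);
    }
}
```

Running `main` prints two records for the second row, one per failed check, each prefixed with the split identifier so the post-query analysis can trace a failure back to its input.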
While it might be possible to reorganize the script into a multi-query, doing so would almost certainly add complexity and redundancy (this validation is performed in a bunch of scripts).

On Wed, May 5, 2010 at 2:44 PM, Arun C Murthy <a...@yahoo-inc.com> wrote:
> You usually do not have the level of control necessary to do this in tasks
> of the Pig jobs. What is your use-case?
>
> Maybe, you could consider pig-streaming which will let you do this. Or, as
> Richard pointed out you could use multi-query if it suffices.
>
> Arun
>
> On May 5, 2010, at 10:42 AM, Sandesh Devaraju wrote:
>
>> With Pig 0.6, as per
>>
>> http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Task+Side-Effect+Files
>> , I was able to write to side-files. However, I am unable to find an
>> obvious way to accomplish this in Pig 0.7.
>>
>> Thanks in advance!
>>
>> - Sandesh