Re: Task Side-Effect Files in Pig

Sandesh Devaraju Wed, 05 May 2010 12:16:41 -0700

The use case is pretty simple:
I have a UDF which validates input rows against a certain set of
criteria. Upon failure, it emits an entry containing contextual
information (corresponding input split, specific validation that
failed, etc.) to a side-file. On query completion, these side-files are
further analyzed to figure out potential problems (if any).
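
A rough sketch of the validation logic described above, in plain Java
with no Pig or Hadoop dependencies so it can be run on its own. The
class name, the two validation rules, and the tab-separated entry
layout are all made up for illustration; in an actual UDF the side-file
path would have to point into the task's work output directory, which
is not shown here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;

// Hypothetical stand-in for the validating UDF: checks a row and, on
// failure, appends one entry with context to a side-file.
public class SideFileValidator {
    private final Path sideFile;       // assumed per-task side-file location
    private final int expectedFields;  // assumed validation criterion

    public SideFileValidator(Path sideFile, int expectedFields) {
        this.sideFile = sideFile;
        this.expectedFields = expectedFields;
    }

    // Returns true if the row passes every check. Otherwise appends
    // "<split>\t<failed check>\t<row>" to the side-file and returns false.
    public boolean validate(String splitId, String row) throws IOException {
        String failure = check(row);
        if (failure == null) {
            return true;
        }
        String entry = splitId + "\t" + failure + "\t" + row;
        Files.write(sideFile, Collections.singletonList(entry),
                StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return false;
    }

    // Returns the name of the first failed check, or null if the row is valid.
    private String check(String row) {
        if (row == null || row.isEmpty()) {
            return "EMPTY_ROW";
        }
        int n = row.split("\t", -1).length;
        if (n != expectedFields) {
            return "FIELD_COUNT(" + n + ")";
        }
        return null;
    }
}
```

Each failed row adds one line to the side-file, so the post-query
analysis only needs to read those files back and group by split or by
failed check.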


While it might be possible to reorganize the script into a multi-query
script, it will almost certainly come at the cost of added complexity
and redundancy (this validation is performed in a bunch of scripts).

On Wed, May 5, 2010 at 2:44 PM, Arun C Murthy <a...@yahoo-inc.com> wrote:
> You usually do not have the level of control necessary to do this in tasks
> of the Pig jobs. What is your use-case?
>
> Maybe, you could consider pig-streaming which will let you do this. Or, as
> Richard pointed out you could use multi-query if it suffices.
>
> Arun
>
> On May 5, 2010, at 10:42 AM, Sandesh Devaraju wrote:
>
>> With Pig 0.6, as per
>>
>> http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Task+Side-Effect+Files
>> , I was able to write to side-files. However, I am unable to find an
>> obvious way to accomplish this in Pig 0.7.
>>
>> Thanks in advance!
>>
>> - Sandesh
>
>
