Currently I am forced to write Python scripts that consume the entire
line (or map key) and modify only the field I am interested in. This
unnecessarily complicates my script, as I have to worry about not
disturbing any of the remaining fields. It also forces me to write a
separate Python script file.
For example, to replace a set of strings in a particular set of fields,
I could have used the following Python UDFs:
define REPLACE1 `python -c "import sys;[sys.stdout.write('%s' %(L.replace(',','='))) for L in sys.stdin]"`;
define REPLACE2 `python -c "import sys;[sys.stdout.write('%s' %(L.replace(':','='))) for L in sys.stdin]"`;
and then applied them directly to the relevant fields within the same
relation. But doing it all in a single script means much more
complicated code, which may or may not be doable as a one-liner
python -c command.
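
To illustrate the extra bookkeeping, here is a rough sketch of what such a
separate script might look like. It assumes tab-separated fields, and the
names replace_in_fields and main, as well as the field positions, are
hypothetical and chosen only for this example:

```python
import sys


def replace_in_fields(line, replacements, sep="\t"):
    """Apply per-field string replacements and pass all other fields through.

    replacements maps a 0-based field index to an (old, new) pair.
    """
    fields = line.rstrip("\n").split(sep)
    for index, (old, new) in replacements.items():
        # Skip rules that point past the end of a short line.
        if index < len(fields):
            fields[index] = fields[index].replace(old, new)
    return sep.join(fields)


def main(stream_in=sys.stdin, stream_out=sys.stdout):
    # Assumed layout: ',' is replaced in field 0 and ':' in field 2.
    rules = {0: (",", "="), 2: (":", "=")}
    for line in stream_in:
        stream_out.write(replace_in_fields(line, rules) + "\n")
```

Calling main() at the bottom of the file would turn this into a streaming
script. Even in this small form, the script has to know the delimiter and
the position of every field it touches, which is exactly the overhead a
per-field UDF would avoid.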
-Prasen
On Mon, Mar 1, 2010 at 2:30 AM, Mridul Muralidharan
<[email protected]> wrote:
>
> You can get in touch with Arnab if you want more info on it ... I am sure he
> will be very much interested to see others using it :-)
>
>
> Regards,
> Mridul
>
> On Friday 26 February 2010 08:43 AM, prasenjit mukherjee wrote:
>>
>> Any thoughts on including python-based UDFs like the following :
>> http://arnab.org/blog/baconsnake-inlined-python-udfs-pig
>>
>> This will be a big help indeed.
>>
>> -Thanks,
>> Prasen
>
>