Here is a sample I am working on: a distributed s3-fetch Pig script that
streams its input through a Python script.

s3fetch.pig:
DEFINE CMD `s3fetch.py` SHIP('/root/s3fetch.py');
r1 = LOAD '/ip/s3fetch_input_files' AS (filename:chararray);
grp_r1 = GROUP r1 BY filename PARALLEL 5;
r2 = FOREACH grp_r1 GENERATE FLATTEN(r1);
r3 = STREAM r2 THROUGH CMD;
STORE r3 INTO '/op/s3fetch_debug_log';

And here is my s3fetch.py:

#!/usr/bin/env python
import os
import sys

# Read one S3 file name per line from the stream and copy it into HDFS.
for word in sys.stdin:
  word = word.rstrip()
  cmd = ('/usr/local/hadoop-0.20.0/bin/hadoop fs -cp '
         's3n://<s3-credentials>@bucket/dir-name/' + word + ' /ip/data/.')
  sys.stdout.write('\n\n' + word + ':\t' + cmd + '\n')
  # Run the copy and echo its output/errors back through the stream.
  (input_str, out_err) = os.popen4(cmd)
  for line in out_err.readlines():
    sys.stdout.write('\t' + word + '::\t' + line)
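In case it is useful, here is a minimal, untested sketch of the same loop
using subprocess instead of the deprecated os.popen4. The hadoop path, the
s3n URI placeholder and the destination directory are just assumptions
carried over from the script above:

#!/usr/bin/env python
import subprocess
import sys

# Assumed locations, copied from the script above.
HADOOP = '/usr/local/hadoop-0.20.0/bin/hadoop'
S3_PREFIX = 's3n://<s3-credentials>@bucket/dir-name/'
DEST = '/ip/data/.'

for word in sys.stdin:
  word = word.rstrip()
  # Passing the command as a list avoids shell quoting problems.
  cmd = [HADOOP, 'fs', '-cp', S3_PREFIX + word, DEST]
  proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT)
  out, _ = proc.communicate()
  sys.stdout.write('\n\n' + word + ':\t' + ' '.join(cmd) + '\n')
  for line in out.splitlines(True):
    sys.stdout.write('\t' + word + '::\t' + line)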



On Thu, Jan 21, 2010 at 10:48 PM, Alexey Tigarev
<alexey.tiga...@gmail.com> wrote:

> On Thu, Jan 21, 2010 at 8:53 AM, Sunil Kulkarni
> <sunil_kulka...@persistent.co.in> wrote:
> > I am new to Hadoop. Presently, I am reading Hadoop Streaming related
> > documents.
> > Does anyone have a sample Hadoop Streaming program that uses a shell
> > script for Map/Reduce?
> > Please help me with this.
>
> Here's an article with a simple example,
> "Finding Similar Items with Amazon Elastic MapReduce, Python, and
> Hadoop Streaming", Pete Skomoroch:
> http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2294
>
> Hope this helps.
>
> --
> Best regards, Alexey Tigarev
> <ti...@nlp.od.ua> Jabber: ti...@jabber.od.ua Skype: t__gra
>
> How a programmer can become a freelancer and earn their first $1000 on oDesk:
> http://freelance-start.com/earn-first-1000-on-odesk
>
