Here is a sample I am working on: a distributed S3-fetch Pig script that streams its input through a Python script.
s3fetch.pig:

    define CMD `s3fetch.py` SHIP('/root/s3fetch.py');
    r1 = LOAD '/ip/s3fetch_input_files' AS (filename:chararray);
    grp_r1 = GROUP r1 BY filename PARALLEL 5;
    r2 = FOREACH grp_r1 GENERATE FLATTEN(r1);
    r3 = STREAM r2 THROUGH CMD;
    STORE r3 INTO '/op/s3fetch_debug_log';

And here is my s3fetch.py:

    #!/usr/bin/env python
    import os
    import sys

    for word in sys.stdin:
        word = word.rstrip()
        cmd = ('/usr/local/hadoop-0.20.0/bin/hadoop fs -cp '
               's3n://<s3-credentials>@bucket/dir-name/' + word +
               ' /ip/data/.')
        sys.stdout.write('\n\n' + word + ':\t' + cmd + '\n')
        (input_str, out_err) = os.popen4(cmd)
        for line in out_err.readlines():
            sys.stdout.write('\t' + word + '::\t' + line)

On Thu, Jan 21, 2010 at 10:48 PM, Alexey Tigarev
<alexey.tiga...@gmail.com> wrote:
> On Thu, Jan 21, 2010 at 8:53 AM, Sunil Kulkarni
> <sunil_kulka...@persistent.co.in> wrote:
> > I am new to Hadoop. At the moment I am reading documents about Hadoop
> > Streaming. Does anyone have a sample Hadoop Streaming Map/Reduce
> > program that uses a shell script? Please help me with this.
>
> Here is an article with a simple example, "Finding Similar Items with
> Amazon Elastic MapReduce, Python, and Hadoop Streaming" by Pete
> Skomoroch:
> http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2294
>
> Hope this helps.
>
> --
> Best regards, Alexey Tigarev
> <ti...@nlp.od.ua> Jabber: ti...@jabber.od.ua Skype: t__gra
>
> How a programmer can become a freelancer and earn the first $1000 on oDesk:
> http://freelance-start.com/earn-first-1000-on-odesk
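For reference, here is a minimal sketch of the same per-line streaming pattern, but using subprocess instead of os.popen4 (which is Python 2 only and deprecated). The `echo` command template is my stand-in assumption so the function can be tried without a cluster; on a real cluster you would pass the actual `hadoop fs -cp ...` command string instead.

```python
import subprocess
import sys

def fetch(filename, cmd_template='echo fetched {0}'):
    """Run a shell command for one input line and return (line, combined output).

    cmd_template is a placeholder; substitute the real
    'hadoop fs -cp s3n://...' command on a cluster.
    """
    cmd = cmd_template.format(filename)
    proc = subprocess.Popen(cmd, shell=True,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)  # merge stderr like popen4
    out, _ = proc.communicate()
    return filename, out.decode().strip()

if __name__ == '__main__':
    # Same shape as s3fetch.py: one filename per stdin line,
    # emit the command output tagged with the filename.
    for line in sys.stdin:
        name, output = fetch(line.rstrip())
        sys.stdout.write('%s\t%s\n' % (name, output))
```

subprocess.Popen with stdout=PIPE and stderr=STDOUT reproduces the combined-output behavior of os.popen4, and communicate() avoids the deadlock risk of reading the pipe manually.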