Hi Eduardo,

You could run the Pig command as an inline script and capture the output into a log file.
Example:

pig -f pigscript.pig > pig_log.log

Kamal Hakim
American Express Big Data Architecture
Phone: 602-537-6819

________________________________________
From: Eduardo Afonso Ferreira [[email protected]]
Sent: Tuesday, September 04, 2012 02:24 PM
To: [email protected]
Subject: Re: Capturing Pig action output

Hey, thanks for the response.

What I need is not the log file that Pig creates when there's an error, you know, the one that lists the stack trace and things like that. I need the output that Pig sends to stdout/stderr. This is the output that includes information about the Hadoop jobs created, started/finished timestamps, success/failure, etc. Here's an example of part of that output:

________________________________
HadoopVersion    PigVersion       UserId   StartedAt            FinishedAt           Features
0.20.2-cdh3u3    0.11.0-SNAPSHOT  mapred   2012-09-04 17:00:56  2012-09-04 17:06:26  GROUP_BY,DISTINCT,FILTER

Success!

Job Stats (time in seconds):
JobId                    Maps  Reduces  MaxMapTime  MinMapTIme  AvgMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  Alias                   Feature            Outputs
job_201206281058_812355  14    4        20          11          16          250            235            246            B,esi,filt1,keys,slots  DISTINCT
job_201206281058_812414  4     4        7           6           7           26             22             24             1-4,C,E                 GROUP_BY,DISTINCT  hbase://active_video_plays,

Input(s):
Successfully read 144190 records (5079 bytes) from: "hbase://events_sessions"

Output(s):
Successfully stored 20835 records in: "hbase://active_video_plays"

Counters:
Total records written : 20835
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
________________________________

Virag,

We're currently using Oozie 2.3.2 (2.3.2-cdh3u3) and I guess the wf configuration you mentioned is not available in this version.

Eduardo.
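One caveat with the redirection Kamal suggests: the summary table Eduardo quotes is emitted through Pig's logging, which typically goes to stderr, so `>` alone may miss it; redirecting both streams (`2>&1`) is safer. A minimal sketch of capturing and then mining the log (file names follow Kamal's example; the seeded log lines are taken from Eduardo's sample output purely for illustration):

```shell
# The pig run itself would look like this (capturing stderr too, since
# the job-stats summary arrives via Pig's logging on stderr):
#
#   pig -f pigscript.pig > pig_log.log 2>&1
#
# For illustration only, seed pig_log.log with two lines from the
# sample output quoted in the thread:
cat > pig_log.log <<'EOF'
Successfully read 144190 records (5079 bytes) from: "hbase://events_sessions"
Successfully stored 20835 records in: "hbase://active_video_plays"
EOF

# Pull the read/stored record counts out of the captured log:
grep -E 'Successfully (read|stored) [0-9]+ records' pig_log.log |
  sed -E 's/.*Successfully (read|stored) ([0-9]+) records.*/\1 \2/'
# -> read 144190
#    stored 20835
```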
________________________________
From: Mona Chitnis <[email protected]>
To: "[email protected]" <[email protected]>; Eduardo Afonso Ferreira <[email protected]>
Sent: Thursday, August 30, 2012 4:01 PM
Subject: Re: Capturing Pig action output

Hi Eduardo,

The log file to which the Oozie Pig action's output is written is a local file in the current working directory of the map task, not a file on HDFS. Also, passing a custom path using "pig -logfile <HDFS_PATH>" is not allowed.

Can you try passing another argument at the end of the arguments list of your Pig action, as a redirection to some file of your choice? E.g., trying to recreate something like the following; note the redirection to 'myfile.txt' at the end:

pig -param $xyz=1000 myscript.pig &>myfile.txt

I haven't tried this out myself, but it would be helpful to find out. Otherwise, Virag's suggestion can help you access specific stats-related information.

Regards,
--
Mona Chitnis

On 8/30/12 11:59 AM, "Virag Kothari" <[email protected]> wrote:

>Hi,
>
>From 3.2 onwards, counters and Hadoop job IDs for Pig and map-reduce
>actions can be accessed through the API or an EL function.
>
>First, the following should be set in the wf configuration. This will
>store the Pig/MR-related statistics in the DB.
>
><property>
>  <name>oozie.action.external.stats.write</name>
>  <value>true</value>
></property>
>
>Then, the stats and job IDs can be accessed using the verbose API:
>
>oozie job -info <jobId> -verbose
>
>Also, the Hadoop job IDs can be retrieved for a Pig action through the
>EL function:
>
>wf:actionData(<pig-action-name>)["hadoopJobs"]
>
>Detailed docs at
>http://incubator.apache.org/oozie/docs/3.2.0-incubating/docs/WorkflowFunctionalSpec.html.
>Look under "4.2.5 Hadoop EL Functions".
>
>Thanks,
>Virag
>
>On 8/30/12 10:31 AM, "Eduardo Afonso Ferreira" <[email protected]> wrote:
>
>>Hi there,
>>
>>I have a Pig script that is run periodically by Oozie via a coordinator
>>with a set frequency.
>>I wanted to capture the Pig script output because I need to look at some
>>information in the results to keep track of several things.
>>I know I can look at the output by doing a whole bunch of clicks starting
>>at the Oozie web console, as follows:
>>
>>- Open the Oozie web console (ex.: http://localhost:11000/oozie/)
>>- Find and click the specific job under "Workflow Jobs"
>>- Select (click) the Pig action in the window that pops up
>>- Click the magnifying glass icon in the "Console URL" field
>>- Click the Map of the launcher job
>>- Click the task ID
>>- Click All under "Task Logs"
>>
>>My question is: how can I know the exact name and location of that log
>>file in HDFS, so I can programmatically retrieve the file from HDFS and
>>parse it, looking for what I need?
>>
>>Is this something I can determine ahead of time, like passing a
>>parameter/argument to the action/Pig so that it will store the log where
>>I want, with the file name I want?
>>
>>Thanks in advance for your help.
>>Eduardo.

American Express made the following annotations on Tue Sep 04 2012 14:40:20
******************************************************************************
"This message and any attachments are solely for the intended recipient and may contain confidential or privileged information. If you are not the intended recipient, any disclosure, copying, use, or distribution of the information included in this message and any attachments is prohibited. If you have received this communication in error, please notify us by reply e-mail and immediately and permanently delete this message and any attachments. Thank you."
******************************************************************************
-------------------------------------------------------------------------------
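[Editor's sketch on Virag's suggestion above: the emails do not show where `oozie.action.external.stats.write` is placed in the workflow definition. A plausible placement, assuming it is set per action inside the Pig action's `<configuration>` block; the action name, script name, and ok/error transitions are placeholders:]

```xml
<action name="pig-node">
  <pig>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
      <!-- Persist Pig/MR stats in the Oozie DB (per Virag's email;
           requires Oozie 3.2+). -->
      <property>
        <name>oozie.action.external.stats.write</name>
        <value>true</value>
      </property>
    </configuration>
    <script>myscript.pig</script>
  </pig>
  <ok to="end"/>
  <error to="fail"/>
</action>
```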
