Hey, thanks for the response. What I need is not the log file that Pig creates when there's an error, i.e., the one that lists the stack trace and so on.
I need the output that Pig sends to stdout/stderr. This is the output that includes information about the Hadoop jobs created, started/finished timestamps, success/failure, etc. Here's an example of part of that output:

________________________________

HadoopVersion    PigVersion       UserId  StartedAt            FinishedAt           Features
0.20.2-cdh3u3    0.11.0-SNAPSHOT  mapred  2012-09-04 17:00:56  2012-09-04 17:06:26  GROUP_BY,DISTINCT,FILTER

Success!

Job Stats (time in seconds):
JobId                    Maps  Reduces  MaxMapTime  MinMapTIme  AvgMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  Alias                   Feature            Outputs
job_201206281058_812355  14    4        20          11          16          250            235            246            B,esi,filt1,keys,slots  DISTINCT
job_201206281058_812414  4     4        7           6           7           26             22             24             1-4,C,E                 GROUP_BY,DISTINCT  hbase://active_video_plays,

Input(s):
Successfully read 144190 records (5079 bytes) from: "hbase://events_sessions"

Output(s):
Successfully stored 20835 records in: "hbase://active_video_plays"

Counters:
Total records written : 20835
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
________________________________

Virag,

We're currently using Oozie 2.3.2 (2.3.2-cdh3u3), and I guess the wf configuration you mentioned is not available in this version.

Eduardo.

________________________________
From: Mona Chitnis <[email protected]>
To: "[email protected]" <[email protected]>; Eduardo Afonso Ferreira <[email protected]>
Sent: Thursday, August 30, 2012 4:01 PM
Subject: Re: Capturing Pig action output

Hi Eduardo,

The log file where the Oozie pig action's output is written is a local file in the current working directory of the map task, not a file on HDFS. Also, passing a custom path using "pig -logfile <HDFS_PATH>" is not allowed.

Can you try passing another argument at the end of the arguments list of your pig action, as a redirection to some file of your choice? E.g., trying to recreate something like the following.
Note the redirection to 'myfile.txt' at the end:

pig -param $xyz=1000 myscript.pig &>myfile.txt

I haven't tried this out myself, but it will be helpful to find out. Otherwise, Virag's suggestion can help you access specific stats-related information.

Regards,
--
Mona Chitnis

On 8/30/12 11:59 AM, "Virag Kothari" <[email protected]> wrote:

>Hi,
>
>From 3.2 onwards, counters and hadoop job ids for Pig and Map-Reduce can
>be accessed through the API or an EL function.
>
>First, the following should be set in the wf configuration. This will store
>the Pig/MR related statistics in the DB.
>
><property>
>  <name>oozie.action.external.stats.write</name>
>  <value>true</value>
></property>
>
>Then, the stats and job ids can be accessed using the verbose API:
>
>oozie job -info <jobId> -verbose
>
>Also, the hadoop job ids can be retrieved for a Pig action through the
>EL function
>
>wf:actionData(<pig-action-name>)["hadoopJobs"]
>
>Detailed docs at
>http://incubator.apache.org/oozie/docs/3.2.0-incubating/docs/WorkflowFunctionalSpec.html
>Look under "4.2.5 Hadoop EL Functions"
>
>Thanks,
>Virag
>
>
>On 8/30/12 10:31 AM, "Eduardo Afonso Ferreira" <[email protected]> wrote:
>
>>Hi there,
>>
>>I have a Pig script that runs periodically via an Oozie coordinator with a
>>set frequency.
>>I wanted to capture the Pig script output because I need to look at some
>>information in the results to keep track of several things.
>>I know I can look at the output by doing a whole bunch of clicks, starting
>>at the Oozie web console, as follows:
>>
>>- Open the Oozie web console (e.g., http://localhost:11000/oozie/)
>>- Find and click the specific job under "Workflow Jobs"
>>- Select (click) the pig action in the window that pops up
>>- Click the magnifying glass icon on the "Console URL" field
>>- Click the Map of the launcher job
>>- Click the task ID
>>- Click "All" under "Task Logs"
>>
>>My question is: how can I know the exact name and location of that log
>>file in HDFS, so I can programmatically retrieve the file from HDFS and
>>parse it to look for what I need?
>>
>>Is this something I can determine ahead of time, like passing a
>>parameter/argument to the action/pig so that it will store the log where
>>I want, with the file name I want?
>>
>>Thanks in advance for your help.
>>Eduardo.
>
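If the redirection trick does work and the Pig console output ends up in a file you control, the numbers Eduardo is after can be pulled out with a few regular expressions. Below is a minimal sketch, not an official parser: the patterns are assumptions based only on the sample output quoted at the top of this thread, and the `sample` string stands in for the captured file contents.

```python
import re

# Stand-in for the captured Pig console output (excerpt from the sample
# quoted earlier in this thread).
sample = '''
job_201206281058_812355 14 4 20 11 16 250 235 246 B,esi,filt1,keys,slots DISTINCT
Input(s):
Successfully read 144190 records (5079 bytes) from: "hbase://events_sessions"
Output(s):
Successfully stored 20835 records in: "hbase://active_video_plays"
'''

def parse_pig_console(text):
    """Extract Hadoop job ids and record counts from Pig's console output."""
    # Hadoop job ids look like job_<timestamp>_<sequence>.
    job_ids = re.findall(r'\bjob_\d+_\d+\b', text)
    # "Successfully read N records ... from: "<source>""
    reads = [(int(n), src) for n, src in
             re.findall(r'Successfully read (\d+) records.*?from: "([^"]+)"', text)]
    # "Successfully stored N records in: "<destination>""
    writes = [(int(n), dst) for n, dst in
              re.findall(r'Successfully stored (\d+) records in: "([^"]+)"', text)]
    return {'job_ids': job_ids, 'reads': reads, 'writes': writes}

stats = parse_pig_console(sample)
print(stats['job_ids'])   # job ids found in the output
print(stats['reads'])     # (record count, source) pairs
print(stats['writes'])    # (record count, destination) pairs
```

In practice you would read the redirected file (or `hadoop fs -cat` it, if the action copies it to HDFS) instead of the hard-coded `sample` string; the exact wording of Pig's summary lines can vary between versions, so the patterns should be checked against your own output.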
