Hey, thanks for the response.

What I need is not the log file created by Pig when there's an error,
you know, the one that lists the stack trace and things like that.

I need the output that Pig sends to stdout/stderr. This is the output that
includes information about the Hadoop jobs created, started/finished
timestamps, success/failure, etc.

Here's an example of part of that output:

________________________________
 
HadoopVersion    PigVersion    UserId    StartedAt    FinishedAt    Features
0.20.2-cdh3u3    0.11.0-SNAPSHOT    mapred    2012-09-04 17:00:56    2012-09-04 17:06:26    GROUP_BY,DISTINCT,FILTER

Success!

Job Stats (time in seconds):
JobId    Maps    Reduces    MaxMapTime    MinMapTIme    AvgMapTime    MaxReduceTime    MinReduceTime    AvgReduceTime    Alias    Feature    Outputs
job_201206281058_812355    14    4    20    11    16    250    235    246    B,esi,filt1,keys,slots    DISTINCT
job_201206281058_812414    4    4    7    6    7    26    22    24    1-4,C,E    GROUP_BY,DISTINCT    hbase://active_video_plays,

Input(s):
Successfully read 144190 records (5079 bytes) from: "hbase://events_sessions"

Output(s):
Successfully stored 20835 records in: "hbase://active_video_plays"

Counters:
Total records written : 20835
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
________________________________
 
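For what it's worth, once that console output is captured to a file, the figures above could be pulled out with a small parsing sketch like this (a hypothetical helper, not part of Pig or Oozie; the regexes assume the exact "Successfully read/stored ..." wording shown above, which may vary across Pig versions):

```python
import re

# Match the Input(s)/Output(s) summary lines from Pig's console output.
# The "(5079 bytes)" fragment is skipped with a lazy wildcard.
READ_RE = re.compile(r'Successfully read (\d+) records .*? from: "([^"]+)"')
STORED_RE = re.compile(r'Successfully stored (\d+) records in: "([^"]+)"')

def parse_pig_summary(text):
    """Return (inputs, outputs): lists of (record_count, location) tuples."""
    inputs = [(int(n), loc) for n, loc in READ_RE.findall(text)]
    outputs = [(int(n), loc) for n, loc in STORED_RE.findall(text)]
    return inputs, outputs

# Sample taken from the output pasted above.
sample = '''Input(s):
Successfully read 144190 records (5079 bytes) from: "hbase://events_sessions"

Output(s):
Successfully stored 20835 records in: "hbase://active_video_plays"
'''

inputs, outputs = parse_pig_summary(sample)
print(inputs)   # [(144190, 'hbase://events_sessions')]
print(outputs)  # [(20835, 'hbase://active_video_plays')]
```

The Job Stats table could be parsed the same way by splitting each row on runs of whitespace, though the column layout also differs between Pig versions.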

Virag,

We're currently using Oozie 2.3.2 (2.3.2-cdh3u3) and I guess the wf
configuration you mentioned is not available in this version.


Eduardo.




________________________________
 From: Mona Chitnis <[email protected]>
To: "[email protected]" <[email protected]>; 
Eduardo Afonso Ferreira <[email protected]> 
Sent: Thursday, August 30, 2012 4:01 PM
Subject: Re: Capturing Pig action output
 
Hi Eduardo,

The log file where the Oozie pig action's output is written is a local
file in the current working directory of the map task, not a file on HDFS.
Also, passing a custom path using "pig -logfile <HDFS_PATH>" is not
allowed. Can you try passing another argument at the end of the arguments
list of your pig action, as a redirection to some file of your choice?

E.g., trying to recreate something like the following; note the redirection
to 'myfile.txt' at the end:

pig -param xyz=1000 myscript.pig &>myfile.txt

I haven't tried this out myself, but it would be helpful to find out whether
it works. Otherwise, Virag's suggestion can help you access specific
stats-related information.
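For concreteness, the action Mona describes might be sketched in workflow.xml roughly like this (untested, as she notes; the action name, script name, and 'myfile.txt' are placeholders, and the trailing argument only has an effect if the arguments are actually interpreted by a shell):

```xml
<action name="my-pig-action">
    <pig>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <script>myscript.pig</script>
        <param>xyz=1000</param>
        <!-- hypothetical trailing argument carrying the suggested
             redirection; no effect unless shell-interpreted -->
        <argument>&amp;>myfile.txt</argument>
    </pig>
    <ok to="end"/>
    <error to="fail"/>
</action>
```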

Regards,

--
Mona Chitnis




On 8/30/12 11:59 AM, "Virag Kothari" <[email protected]> wrote:

>Hi,
>
>From Oozie 3.2 onwards, counters and Hadoop job IDs for Pig and Map-Reduce
>actions can be accessed through the API or an EL function.
>
>First, the following should be set in wf configuration. This will store
>the Pig/MR related statistics in the DB.
><property>
>    <name>oozie.action.external.stats.write</name>
>    <value>true</value>
></property>
>
>Then, the stats and jobIds can be accessed using the verbose API
>oozie job -info <jobId> -verbose
>
>Also, the Hadoop job IDs can be retrieved for a Pig action through the
>EL function
>
>wf:actionData(<pig-action-name>)["hadoopJobs"]
>
>
>Detailed docs at
>http://incubator.apache.org/oozie/docs/3.2.0-incubating/docs/WorkflowFunctionalSpec.html.
>Look under "4.2.5 Hadoop EL Functions"
>
>Thanks,
>Virag
>
>
>
>
>
>On 8/30/12 10:31 AM, "Eduardo Afonso Ferreira" <[email protected]> wrote:
>
>>Hi there,
>>
>>I have a Pig script that runs periodically, launched by Oozie via a
>>coordinator with a set frequency.
>>I want to capture the Pig script's output because I need to look at some
>>information in the results to keep track of several things.
>>I know I can look at the output through a whole bunch of clicks starting
>>at the Oozie web console, as follows:
>>
>>- Open oozie web console (ex.: http://localhost:11000/oozie/)
>>- Find and click the specific job under "Workflow Jobs"
>>- Select (click) the pig action in the window that pops up
>>- Click the magnifying glass icon on the "Console URL" field
>>- Click the Map of the launcher job
>>- Click the task ID
>>- Click All under "Task Logs"
>>
>>My question is: how can I know the exact name and location of that log
>>file in HDFS, so I can programmatically retrieve the file from HDFS and
>>parse it for what I need?
>>
>>Is this something I can determine ahead of time, like pass a
>>parameter/argument to the action/pig so that it will store the log where
>>I want with the file name I want?
>>
>>Thanks in advance for your help.
>>Eduardo.
>
