Re: JobTracker History data+analysis

2008-07-28 Thread Amareshwari Sriramadasu

Paco NATHAN wrote:

> Thanks Amareshwari -
>
> That could be quite useful to access summary analysis from within the code.
>
> Currently this is not written as a public class, which makes it
> difficult to use inside application code.
>
> Are there plans to make it a public class?

  
I created a jira for the same,
https://issues.apache.org/jira/browse/HADOOP-3850. You can give your
inputs there.


Thanks
Amareshwari

> Paco
>
>
> On Mon, Jul 28, 2008 at 1:42 AM, Amareshwari Sriramadasu
> <[EMAIL PROTECTED]> wrote:
>> HistoryViewer is used in JobClient to view the history files in the
>> directory provided on the command line. The command is
>> $ bin/hadoop job -history   # by default history is stored in output dir.
>> outputDir in the constructor of HistoryViewer is the directory passed on the
>> command-line.
>>
>> You can specify a location to store the history files of a particular job
>> using "hadoop.job.history.user.location". If nothing is specified, the logs
>> are stored in the job's output directory, i.e. "mapred.output.dir".
>> The files are stored in "_logs/history/" inside the directory.
>> Thanks
>> Amareshwari
>>
>> Paco NATHAN wrote:
>>>
>>> Thank you, Amareshwari -
>>>
>>> That helps.  Hadn't noticed HistoryViewer before. It has no JavaDoc.
>>>
>>> What is a typical usage?  In other words, what would be the
>>> "outputDir" value in the context of ToolRunner, JobClient, etc. ?
>>>
>>> Paco
>>>
>>>
>>> On Sun, Jul 27, 2008 at 11:48 PM, Amareshwari Sriramadasu
>>> <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Can you have a look at org.apache.hadoop.mapred.HistoryViewer and see if
>>>> it makes sense?
>>>>
>>>> Thanks
>>>> Amareshwari
>>>>
>>>> Paco NATHAN wrote:
>>>>>
>>>>> We have a need to access data found in the JobTracker History link.
>>>>> Specifically in the "Analyse This Job" analysis. Must be run in Java,
>>>>> between jobs, in the same code which calls ToolRunner and JobClient.
>>>>> In essence, we need to collect descriptive statistics about task
>>>>> counts and times for map, shuffle, reduce.
>>>>>
>>>>> After tracing the flow of the JSP in "src/webapps/job"...  Is there a
>>>>> better way to get at this data, *not* from the web UI perspective but
>>>>> from the code?
>>>>>
>>>>> Tried to find any applicable patterns in JobTracker, ClusterStatus,
>>>>> JobClient, etc., but no joy.
>>>>>
>>>>> Thanks,
>>>>> Paco


  






Re: JobTracker History data+analysis

2008-07-28 Thread Paco NATHAN
Thanks Amareshwari -

That could be quite useful to access summary analysis from within the code.

Currently this is not written as a public class, which makes it
difficult to use inside application code.

Are there plans to make it a public class?


Paco


On Mon, Jul 28, 2008 at 1:42 AM, Amareshwari Sriramadasu
<[EMAIL PROTECTED]> wrote:
> HistoryViewer is used in JobClient to view the history files in the
> directory provided on the command line. The command is
> $ bin/hadoop job -history   # by default history is stored in output dir.
> outputDir in the constructor of HistoryViewer is the directory passed on the
> command-line.
>
> You can specify a location to store the history files of a particular job
> using "hadoop.job.history.user.location". If nothing is specified, the logs
> are stored in the job's
> output directory i.e. "mapred.output.dir". The files are stored in
> "_logs/history/" inside the directory.
> Thanks
> Amareshwari
>
> Paco NATHAN wrote:
>>
>> Thank you, Amareshwari -
>>
>> That helps.  Hadn't noticed HistoryViewer before. It has no JavaDoc.
>>
>> What is a typical usage?  In other words, what would be the
>> "outputDir" value in the context of ToolRunner, JobClient, etc. ?
>>
>> Paco
>>
>>
>> On Sun, Jul 27, 2008 at 11:48 PM, Amareshwari Sriramadasu
>> <[EMAIL PROTECTED]> wrote:
>>
>>>
>>> Can you have a look at org.apache.hadoop.mapred.HistoryViewer and see if
>>> it makes sense?
>>>
>>> Thanks
>>> Amareshwari
>>>
>>> Paco NATHAN wrote:
>>>

>>>> We have a need to access data found in the JobTracker History link.
>>>> Specifically in the "Analyse This Job" analysis. Must be run in Java,
>>>> between jobs, in the same code which calls ToolRunner and JobClient.
>>>> In essence, we need to collect descriptive statistics about task
>>>> counts and times for map, shuffle, reduce.
>>>>
>>>> After tracing the flow of the JSP in "src/webapps/job"...  Is there a
>>>> better way to get at this data, *not* from the web UI perspective but
>>>> from the code?
>>>>
>>>> Tried to find any applicable patterns in JobTracker, ClusterStatus,
>>>> JobClient, etc., but no joy.
>>>>
>>>> Thanks,
>>>> Paco


>>>
>>>
>
>


Re: JobTracker History data+analysis

2008-07-27 Thread Amareshwari Sriramadasu
HistoryViewer is used in JobClient to view the history files in the 
directory provided on the command line. The command is
$ bin/hadoop job -history   # by default history is stored in output dir.
outputDir in the constructor of HistoryViewer is the directory passed on 
the command-line.


You can specify a location to store the history files of a particular 
job using "hadoop.job.history.user.location". If nothing is specified, 
the logs are stored in the job's
output directory i.e. "mapred.output.dir". The files are stored in 
"_logs/history/" inside the directory.
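
For illustration only, the lookup rule described above could be sketched in
plain Java roughly as follows. This is not Hadoop code: the class name
HistoryLocationSketch and the method historyDir are hypothetical, and the
sketch assumes that "_logs/history" is appended to whichever base directory
applies (the user-specified location if set, otherwise the job's output
directory).

```java
// Sketch only: mirrors the lookup rule described above in plain Java.
// HistoryLocationSketch and historyDir are hypothetical names, not Hadoop APIs.
class HistoryLocationSketch {

    // Sub-directory named in the message above.
    static final String HISTORY_SUBDIR = "_logs/history";

    /**
     * @param userLocation value of "hadoop.job.history.user.location",
     *                     or null/empty if nothing is specified
     * @param outputDir    value of "mapred.output.dir"
     * @return the directory where the job's history files are expected
     *         (assumption: the sub-directory applies to either base)
     */
    static String historyDir(String userLocation, String outputDir) {
        // Fall back to the job's output directory when no location is set.
        String base = (userLocation == null || userLocation.isEmpty())
                ? outputDir
                : userLocation;
        // Drop one trailing slash so the joined path has no doubled separator.
        if (base.endsWith("/")) {
            base = base.substring(0, base.length() - 1);
        }
        return base + "/" + HISTORY_SUBDIR;
    }

    public static void main(String[] args) {
        // No user location: history lives under the output directory.
        System.out.println(historyDir(null, "/user/paco/output"));
        // prints /user/paco/output/_logs/history

        // User location given: it takes precedence over the output directory.
        System.out.println(historyDir("/user/paco/history", "/user/paco/output"));
        // prints /user/paco/history/_logs/history
    }
}
```

The paths in main are made-up examples. In application code the two values
would come from the job configuration rather than from literals.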

Thanks
Amareshwari

Paco NATHAN wrote:

> Thank you, Amareshwari -
>
> That helps.  Hadn't noticed HistoryViewer before. It has no JavaDoc.
>
> What is a typical usage?  In other words, what would be the
> "outputDir" value in the context of ToolRunner, JobClient, etc. ?
>
> Paco
>
>
> On Sun, Jul 27, 2008 at 11:48 PM, Amareshwari Sriramadasu
> <[EMAIL PROTECTED]> wrote:
>> Can you have a look at org.apache.hadoop.mapred.HistoryViewer and see if it
>> makes sense?
>>
>> Thanks
>> Amareshwari
>>
>> Paco NATHAN wrote:
>>>
>>> We have a need to access data found in the JobTracker History link.
>>> Specifically in the "Analyse This Job" analysis. Must be run in Java,
>>> between jobs, in the same code which calls ToolRunner and JobClient.
>>> In essence, we need to collect descriptive statistics about task
>>> counts and times for map, shuffle, reduce.
>>>
>>> After tracing the flow of the JSP in "src/webapps/job"...  Is there a
>>> better way to get at this data, *not* from the web UI perspective but
>>> from the code?
>>>
>>> Tried to find any applicable patterns in JobTracker, ClusterStatus,
>>> JobClient, etc., but no joy.
>>>
>>> Thanks,
>>> Paco

  





Re: JobTracker History data+analysis

2008-07-27 Thread Paco NATHAN
Thank you, Amareshwari -

That helps.  Hadn't noticed HistoryViewer before. It has no JavaDoc.

What is a typical usage?  In other words, what would be the
"outputDir" value in the context of ToolRunner, JobClient, etc. ?

Paco


On Sun, Jul 27, 2008 at 11:48 PM, Amareshwari Sriramadasu
<[EMAIL PROTECTED]> wrote:
> Can you have a look at org.apache.hadoop.mapred.HistoryViewer and see if it
> makes sense?
>
> Thanks
> Amareshwari
>
> Paco NATHAN wrote:
>>
>> We have a need to access data found in the JobTracker History link.
>> Specifically in the "Analyse This Job" analysis. Must be run in Java,
>> between jobs, in the same code which calls ToolRunner and JobClient.
>> In essence, we need to collect descriptive statistics about task
>> counts and times for map, shuffle, reduce.
>>
>> After tracing the flow of the JSP in "src/webapps/job"...  Is there a
>> better way to get at this data, *not* from the web UI perspective but
>> from the code?
>>
>> Tried to find any applicable patterns in JobTracker, ClusterStatus,
>> JobClient, etc., but no joy.
>>
>> Thanks,
>> Paco
>>
>
>


Re: JobTracker History data+analysis

2008-07-27 Thread Amareshwari Sriramadasu
Can you have a look at org.apache.hadoop.mapred.HistoryViewer and see if
it makes sense?


Thanks
Amareshwari

Paco NATHAN wrote:

> We have a need to access data found in the JobTracker History link.
> Specifically in the "Analyse This Job" analysis. Must be run in Java,
> between jobs, in the same code which calls ToolRunner and JobClient.
> In essence, we need to collect descriptive statistics about task
> counts and times for map, shuffle, reduce.
>
> After tracing the flow of the JSP in "src/webapps/job"...  Is there a
> better way to get at this data, *not* from the web UI perspective but
> from the code?
>
> Tried to find any applicable patterns in JobTracker, ClusterStatus,
> JobClient, etc., but no joy.
>
> Thanks,
> Paco
  




JobTracker History data+analysis

2008-07-27 Thread Paco NATHAN
We have a need to access data found in the JobTracker History link.
Specifically in the "Analyse This Job" analysis. Must be run in Java,
between jobs, in the same code which calls ToolRunner and JobClient.
In essence, we need to collect descriptive statistics about task
counts and times for map, shuffle, reduce.

After tracing the flow of the JSP in "src/webapps/job"...  Is there a
better way to get at this data, *not* from the web UI perspective but
from the code?

Tried to find any applicable patterns in JobTracker, ClusterStatus,
JobClient, etc., but no joy.

Thanks,
Paco