Re: Accessing Job counters displayed in WEB GUI in Hadoop Code

2011-12-11 Thread Harsh J
Hej again,

You can get task data via the completion events call from your client:
http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapreduce/Job.html#getTaskCompletionEvents(int)

They should carry the data you seek.
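Here's a rough, untested sketch of how you could pull the per-task run times out of
those events and average them. I'm going from memory on the exact class names, so
double-check the TaskCompletionEvent import (and which package it lives in) against
your release:

import org.apache.hadoop.mapred.TaskCompletionEvent;
import org.apache.hadoop.mapreduce.Job;

// 'job' is the org.apache.hadoop.mapreduce.Job you already submitted and waited on.
static void printAverageTaskRunTime(Job job) throws Exception {
  int from = 0;
  long totalMillis = 0;
  int attempts = 0;
  while (true) {
    // Events come back in pages; keep asking until an empty page is returned.
    TaskCompletionEvent[] events = job.getTaskCompletionEvents(from);
    if (events.length == 0) break;
    for (TaskCompletionEvent event : events) {
      totalMillis += event.getTaskRunTime();   // reported in milliseconds
      attempts++;
    }
    from += events.length;
  }
  if (attempts > 0) {
    System.out.println("Average task run time: " + (totalMillis / attempts)
        + " ms over " + attempts + " attempts");
  }
}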

On Mon, Dec 12, 2011 at 4:47 AM, W.P. McNeill  wrote:
> You can read counter values from the Job.getCounters API
> (http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapreduce/Job.html).
> I'm not sure about the other information like execution times. I've
> been wondering that myself.
>
> On 12/10/11, ArunKumar  wrote:
>> Hi guys!
>>
>> Can I access the Job counters displayed in the web GUI from Hadoop code
>> after the job has finished execution?
>> If so, how can I access values like "average task run time" and counters
>> like "FILE/HDFS BYTES READ/WRITTEN" immediately after the job has completed,
>> in the JobQueueTaskScheduler code or in some other code file?
>>
>>
>> Thanks,
>> Arun
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Accessing-Job-counters-displayed-in-WEB-GUI-in-Hadoop-Code-tp3576925p3576925.html
>> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>>



-- 
Harsh J


Re: HDFS Backup nodes

2011-12-11 Thread M. C. Srivas
You are out of luck if you don't want to use NFS, and yet want redundancy
for the NN.  Even the new "NN HA" work being done by the community will
require NFS ... and the NFS itself needs to be HA.

But if you use a Netapp, then the likelihood of the Netapp crashing is
lower than the likelihood of a garbage-collection-of-death happening in the
NN.

[ disclaimer:  I don't work for Netapp, I work for MapR ]


On Wed, Dec 7, 2011 at 4:30 PM, randy  wrote:

> Thanks Joey. We've had enough problems with nfs (mainly under very high
> load) that we thought it might be riskier to use it for the NN.
>
> randy
>
>
> On 12/07/2011 06:46 PM, Joey Echeverria wrote:
>
>> Hey Rand,
>>
>> It will mark that storage directory as failed and ignore it from then
>> on. In order to do this correctly, you need a couple of options
>> enabled on the NFS mount to make sure that it doesn't retry
>> infinitely. I usually run with the tcp,soft,intr,timeo=10,retrans=10
>> options set.
>>
>> -Joey
>>
>> On Wed, Dec 7, 2011 at 12:37 PM,  wrote:
>>
>>> What happens then if the nfs server fails or isn't reachable? Does hdfs
>>> lock up? Does it gracefully ignore the nfs copy?
>>>
>>> Thanks,
>>> randy
>>>
>>> - Original Message -
>>> From: "Joey Echeverria"
>>> To: common-user@hadoop.apache.org
>>> Sent: Wednesday, December 7, 2011 6:07:58 AM
>>> Subject: Re: HDFS Backup nodes
>>>
>>> You should also configure the Namenode to use an NFS mount for one of
>>> its storage directories. That will give the most up-to-date backup of
>>> the metadata in case of total node failure.
>>>
>>> -Joey
>>>
>>> On Wed, Dec 7, 2011 at 3:17 AM, praveenesh kumar
>>>  wrote:
>>>
 This means we are still relying on the Secondary NameNode ideology for
 the Namenode's backup.
 Is OS-level mirroring of the Namenode a good alternative to keep it alive all
 the time?

 Thanks,
 Praveenesh

 On Wed, Dec 7, 2011 at 1:35 PM, Uma Maheswara Rao G <mahesw...@huawei.com> wrote:

 AFAIK the backup node was introduced from the 0.21 release onwards.
> ________________________________
> From: praveenesh kumar [praveen...@gmail.com]
> Sent: Wednesday, December 07, 2011 12:40 PM
> To: common-user@hadoop.apache.org
> Subject: HDFS Backup nodes
>
> Does hadoop 0.20.205 supports configuring HDFS backup nodes ?
>
> Thanks,
> Praveenesh
>
>
>>>
>>>
>>> --
>>> Joseph Echeverria
>>> Cloudera, Inc.
>>> 443.305.9434
>>>
>>
>>
>>
>>
>


RE: Grouping nodes into different racks in Hadoop Cluster

2011-12-11 Thread Devaraj K
Hi Arun,

You can enable rack awareness for your hadoop cluster by configuring
the "topology.script.file.name" property.


Please go through this link for more details about rack awareness.

http://hadoop.apache.org/common/docs/r0.19.2/cluster_setup.html#Hadoop+Rack+Awareness
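
For example, a minimal topology script could look like the sketch below (the
subnets and rack names are made up; adapt them to your own layout). You point
"topology.script.file.name" in core-site.xml at an executable like this; Hadoop
calls it with a list of IPs/hostnames and expects one rack path printed per
argument:

#!/bin/bash
# Hypothetical rack mapping -- adjust the patterns to your own network.
for node in "$@"; do
  case "$node" in
    192.168.1.1?)  echo "/dc1/rack1" ;;   # e.g. 192.168.1.10-19 -> rack1
    192.168.1.2?)  echo "/dc1/rack2" ;;   # e.g. 192.168.1.20-29 -> rack2
    *)             echo "/default-rack" ;;
  esac
done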


Devaraj K 


-Original Message-
From: ArunKumar [mailto:arunk...@gmail.com] 
Sent: Saturday, December 10, 2011 1:20 PM
To: hadoop-u...@lucene.apache.org
Subject: Grouping nodes into different racks in Hadoop Cluster

Hi guys !

I am able to set up Hadoop Multinode Clusters as per
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/ .
I have all my nodes in a LAN.
How do I group them into different racks?
I have only a 5-node cluster of, say, 4GB RAM, and I want to create a big
cluster with these using virtualization and group the resulting Linux
instances and some real nodes under different racks. How can I do it?

Any help ?

Thanks,
Arun


--
View this message in context:
http://lucene.472066.n3.nabble.com/Grouping-nodes-into-different-racks-in-Hadoop-Cluster-tp3574978p3574978.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.



Hadoop-Snappy is integrated into Hadoop Common (JUN 2011).

2011-12-11 Thread Jinyan Xu
This project is integrated into Hadoop Common (JUN 2011).

Hadoop-Snappy can be used as an add-on for recent (released) versions of Hadoop 
that do not provide Snappy Codec support yet.

Hadoop-Snappy is being kept in synch with Hadoop Common.

What does this mean?

And how do I enable Snappy on hadoop-0.20.205.0?





RE: Accessing Job counters displayed in WEB GUI in Hadoop Code

2011-12-11 Thread Devaraj K
Hi Arun,

You can get the Counters object from the job after completion and
then look up whichever counter you want from the Counters object using the
findCounter API. Please find below a sample snippet for accessing a counter
after job completion:


Configuration conf = new Configuration();

Cluster cluster = new Cluster(conf);

Job job = Job.getInstance(cluster, conf);

// ... set mapper/reducer classes, input/output paths, etc. ...

boolean result = job.waitForCompletion(true);

// CustomCOUNTER is your own counter enum, incremented from your map/reduce tasks.
Counters counters = job.getCounters();

Counter c1 = counters.findCounter(CustomCOUNTER.NOOFBADRECORDS);

System.out.println(c1.getDisplayName() + ":" + c1.getValue());
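
If what you are after are the built-in counters shown in the web UI (FILE/HDFS
bytes read/written), you can also look them up by group and counter name. Rough
sketch -- the group and counter names below are what I remember from the 0.20 web
UI, so verify them against what your job page shows (and if your version's
Counters lacks the two-argument findCounter, going through getGroup() works too):

// Assumed group/counter names -- check them against your job's web UI.
long hdfsRead    = counters.findCounter("FileSystemCounters", "HDFS_BYTES_READ").getValue();
long hdfsWritten = counters.findCounter("FileSystemCounters", "HDFS_BYTES_WRITTEN").getValue();
long fileRead    = counters.findCounter("FileSystemCounters", "FILE_BYTES_READ").getValue();
long fileWritten = counters.findCounter("FileSystemCounters", "FILE_BYTES_WRITTEN").getValue();

System.out.println("HDFS bytes read/written: " + hdfsRead + "/" + hdfsWritten);
System.out.println("Local FS bytes read/written: " + fileRead + "/" + fileWritten);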





Devaraj K 

-Original Message-
From: ArunKumar [mailto:arunk...@gmail.com] 
Sent: Sunday, December 11, 2011 12:15 PM
To: hadoop-u...@lucene.apache.org
Subject: Accessing Job counters displayed in WEB GUI in Hadoop Code

Hi guys!

Can I access the Job counters displayed in the web GUI from Hadoop code
after the job has finished execution?
If so, how can I access values like "average task run time" and counters
like "FILE/HDFS BYTES READ/WRITTEN" immediately after the job has completed,
in the JobQueueTaskScheduler code or in some other code file?


Thanks,
Arun

--
View this message in context:
http://lucene.472066.n3.nabble.com/Accessing-Job-counters-displayed-in-WEB-GUI-in-Hadoop-Code-tp3576925p3576925.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.



Re: Namenode does not start and generates no error messages

2011-12-11 Thread Hemanth Makkapati
You need to format the namenode before you start its daemon.
Try "./hadoop namenode -format" and then run start-dfs.sh.

On Sun, Dec 11, 2011 at 6:25 PM, W.P. McNeill  wrote:

> I am trying to run Hadoop as a single-node cluster on OS X 10.7 (Lion),
> Hadoop 0.20.203. The namenode does not start and gives no indication
> of what is wrong.
>
> > start-dfs.sh
> starting namenode, logging to
>
> /tmp/hadoop/logs/hadoop-williammcneill-namenode-William-McNeills-MacBook.local.out
> localhost: starting datanode, logging to
>
> /tmp/hadoop/logs/hadoop-williammcneill-datanode-William-McNeills-MacBook.local.out
> localhost: starting secondarynamenode, logging to
>
> /tmp/hadoop/logs/hadoop-williammcneill-secondarynamenode-William-McNeills-MacBook.local.out
>
> The web interface at http://localhost:50070/dfshealth.jsp does not
> respond and "hadoop fs" commands fail because the namenode is not
> running.
> > jps
> 1883 DataNode
> 1974 Jps
> 308
> 1954 SecondaryNameNode
> 316 RemoteMavenServer
>
> The other DFS daemons appear to be running. I can also start the task
> tracker.
>
> The namenode log file is not helpful. It just contains the following line:
> > cat
> /tmp/hadoop/logs/hadoop-williammcneill-namenode-William-McNeills-MacBook.local.out
> 2011-12-11 15:08:50.065 java[1811:1903] Unable to load realm info from
> SCDynamicStore
> This is an unrelated issue (see HADOOP-7489).
>
> I have been able to run the exact same version and configuration of
> Hadoop on this machine before. I don't know why it is not working now.
> The only major change to the machine I can think of is that I upgraded
> to OS X 10.7.
>
> Does anyone have an idea as to what the issue might be, or how I can
> get the namenode to emit more helpful debugging information?
>


Namenode does not start and generates no error messages

2011-12-11 Thread W.P. McNeill
I am trying to run Hadoop as a single-node cluster on OS X 10.7 (Lion),
Hadoop 0.20.203. The namenode does not start and gives no indication
of what is wrong.

> start-dfs.sh
starting namenode, logging to
/tmp/hadoop/logs/hadoop-williammcneill-namenode-William-McNeills-MacBook.local.out
localhost: starting datanode, logging to
/tmp/hadoop/logs/hadoop-williammcneill-datanode-William-McNeills-MacBook.local.out
localhost: starting secondarynamenode, logging to
/tmp/hadoop/logs/hadoop-williammcneill-secondarynamenode-William-McNeills-MacBook.local.out

The web interface at http://localhost:50070/dfshealth.jsp does not
respond and "hadoop fs" commands fail because the namenode is not
running.
> jps
1883 DataNode
1974 Jps
308
1954 SecondaryNameNode
316 RemoteMavenServer

The other DFS daemons appear to be running. I can also start the task tracker.

The namenode log file is not helpful. It just contains the following line:
> cat 
> /tmp/hadoop/logs/hadoop-williammcneill-namenode-William-McNeills-MacBook.local.out
2011-12-11 15:08:50.065 java[1811:1903] Unable to load realm info from
SCDynamicStore
This is an unrelated issue (see HADOOP-7489).

I have been able to run the exact same version and configuration of
Hadoop on this machine before. I don't know why it is not working now.
The only major change to the machine I can think of is that I upgraded
to OS X 10.7.

Does anyone have an idea as to what the issue might be, or how I can
get the namenode to emit more helpful debugging information?


Re: Accessing Job counters displayed in WEB GUI in Hadoop Code

2011-12-11 Thread W.P. McNeill
You can read counter values from the Job.getCounters API
(http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapreduce/Job.html).
I'm not sure about the other information like execution times. I've
been wondering that myself.

On 12/10/11, ArunKumar  wrote:
> Hi guys!
>
> Can I access the Job counters displayed in the web GUI from Hadoop code
> after the job has finished execution?
> If so, how can I access values like "average task run time" and counters
> like "FILE/HDFS BYTES READ/WRITTEN" immediately after the job has completed,
> in the JobQueueTaskScheduler code or in some other code file?
>
>
> Thanks,
> Arun
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Accessing-Job-counters-displayed-in-WEB-GUI-in-Hadoop-Code-tp3576925p3576925.html
> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>


Re: Running a job continuously

2011-12-11 Thread Inder Pall
Have you looked at Kafka? It provides a streaming view of the data stream.
Flume, at the moment, is being rewritten as Flume NG.
On Dec 6, 2011 4:28 PM, "Praveen Sripati"  wrote:

> If the requirement is for real time data processing, using Flume
> will not suffice as there is a time lag between the collection of files
> by Flume and processing done by Hadoop. Consider frameworks like S4,
> Storm (from Twitter), HStreaming etc which suits realtime processing.
>
> Regards,
> Praveen
>
> On Tue, Dec 6, 2011 at 10:39 AM, Ravi teja ch n v
> wrote:
>
> > Hi Burak,
> >
> > > Bejoy Ks, I have a continuous inflow of data but I think I need a near
> > > real-time system.
> >
> > Just to add to Bejoy's point,
> > with Oozie, you can specify a data dependency for running your job.
> > When a specific amount of data is in, you can configure Oozie to run your
> > job.
> > I think this will satisfy your requirement.
> >
> > Regards,
> > Ravi Teja
> >
> > 
> > From: burakkk [burak.isi...@gmail.com]
> > Sent: 06 December 2011 04:03:59
> > To: mapreduce-u...@hadoop.apache.org
> > Cc: common-user@hadoop.apache.org
> > Subject: Re: Running a job continuously
> >
> > Athanasios Papaoikonomou, a cron job isn't useful for me, because I want to
> > execute the MR job with the same algorithm but the different files arrive
> > at different velocities.
> >
> > Both Storm and Facebook's Hadoop are designed for that, but I want to use
> > the Apache distribution.
> >
> > Bejoy Ks, I have a continuous inflow of data but I think I need a near
> > real-time system.
> >
> > Mike Spreitzer, both output and input are continuous. The output isn't
> > related to the input. All I want is for all the incoming files to be
> > processed by the same job and the same algorithm.
> > For example, take the wordcount problem. When you want to run wordcount,
> > you implement this:
> > http://wiki.apache.org/hadoop/WordCount
> >
> > But when the program reaches the line "job.waitForCompletion(true);", the
> > job will eventually end. When you want to make it run continuously, what
> > would you do in Hadoop without other tools?
> > One more thing: assume that the input file's name is
> > filename_timestamp (filename_20111206_0030).
> >
> > public static void main(String[] args) throws Exception {
> >   Configuration conf = new Configuration();
> >   Job job = new Job(conf, "wordcount");
> >   job.setOutputKeyClass(Text.class);
> >   job.setOutputValueClass(IntWritable.class);
> >   job.setMapperClass(Map.class);
> >   job.setReducerClass(Reduce.class);
> >   job.setInputFormatClass(TextInputFormat.class);
> >   job.setOutputFormatClass(TextOutputFormat.class);
> >   FileInputFormat.addInputPath(job, new Path(args[0]));
> >   FileOutputFormat.setOutputPath(job, new Path(args[1]));
> >   job.waitForCompletion(true);
> > }
> >
> > On Mon, Dec 5, 2011 at 11:19 PM, Bejoy Ks 
> wrote:
> >
> > > Burak
> > > If you have a continuous inflow of data, you can choose Flume to
> > > aggregate the files into larger sequence files or similar if they are
> > > small, and push that data onto hdfs once you have a substantial chunk of
> > > data (equal to the hdfs block size). Based on your SLAs you need to
> > > schedule your jobs using oozie or a simple shell script. In very simple
> > > terms:
> > > - push input data (could be from a flume collector) into a staging hdfs dir
> > > - before triggering the job (hadoop jar), copy the input from staging to
> > >   the main input dir
> > > - execute the job
> > > - archive the input and output into archive dirs (any other dirs);
> > >   the output archive dir could be the source of output data
> > > - delete the output dir and empty the input dir
> > >
> > > Hope it helps!...
> > >
> > > Regards
> > > Bejoy.K.S
> > >
> > > On Tue, Dec 6, 2011 at 2:19 AM, burakkk 
> wrote:
> > >
> > >> Hi everyone,
> > >> I want to run a MR job continuously, because I have streaming data and I
> > >> try to analyze it all the time in my way (algorithm). For example, say you
> > >> want to solve the wordcount problem. It's the simplest one :) If you have
> > >> multiple files and new files keep coming in, how do you handle it?
> > >> You could execute a MR job per file, but you would have to do it
> > >> repeatedly. So what do you think?
> > >>
> > >> Thanks
> > >> Best regards...
> > >>
> > >> --
> > >>
> > >> BURAK ISIKLI | http://burakisikli.wordpress.com
> > >>
> > >
> > >
> >
> >
> > --
> >
> > BURAK ISIKLI | http://burakisikli.wordpress.com
> >
>