Re: risks of using Hadoop

2011-09-16 Thread George Kousiouris


Hi,

When you say that 0.20.205 will support appends, do you mean for general-purpose
writes on HDFS, or only for HBase?


Thanks,
George

On 9/17/2011 7:08 AM, Uma Maheswara Rao G 72686 wrote:

6. If you plan to use HBase, it requires append support. The 0.20-append branch
has append support, and the 0.20.205 release will also have it but is not yet
released. Choose the correct version to avoid sudden surprises.



Regards,
Uma
- Original Message -
From: Kobina Kwarko
Date: Saturday, September 17, 2011 3:42 am
Subject: Re: risks of using Hadoop
To: common-user@hadoop.apache.org


We are planning to use Hadoop in my organisation for quality of services
analysis out of CDR records from mobile operators. We are thinking of having
a small cluster of maybe 10-15 nodes and I'm preparing the proposal. My
office requires that I provide some risk analysis in the proposal.

thank you.

On 16 September 2011 20:34, Uma Maheswara Rao G 72686
wrote:


Hello,

First of all, where are you planning to use Hadoop?

Regards,
Uma
- Original Message -
From: Kobina Kwarko
Date: Saturday, September 17, 2011 0:41 am
Subject: risks of using Hadoop
To: common-user


Hello,

Please can someone point out some of the risks we may incur if we
decide to implement Hadoop?

BR,

Isaac.






--

---

George Kousiouris
Electrical and Computer Engineer
Division of Communications,
Electronics and Information Engineering
School of Electrical and Computer Engineering
Tel: +30 210 772 2546
Mobile: +30 6939354121
Fax: +30 210 772 2569
Email: gkous...@mail.ntua.gr
Site: http://users.ntua.gr/gkousiou/

National Technical University of Athens
9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece



Re: Job Scheduler, Task Scheduler and Fair Scheduler

2011-09-16 Thread Arun C Murthy

On Sep 16, 2011, at 11:26 PM, kartheek muthyala wrote:

> Any updates!!

A bit of patience will help. It also helps to do some homework and ask specific 
questions.

I don't know if you have looked at any of the code, but there are 3 schedulers:
JobQueueTaskScheduler (aka default scheduler or fifo scheduler)
Capacity Scheduler (CS)
Fair Scheduler (FS).

TaskScheduler is just an interface for all schedulers (default, CS, FS).

Then there is JobInProgress which handles scheduling for map tasks of an 
individual job based on data locality (JobInProgress.obtainNew*MapTask).

Other than that, each of the schedulers (default, CS, FS) uses different criteria
for picking which job to offer a 'slot' on a given TT when one becomes available.

All of this has changed completely with MRv2, which is now in branch-0.23 and
trunk and allows MR and non-MR apps on the same Hadoop cluster:
http://wiki.apache.org/hadoop/NextGenMapReduce
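
For reference, which TaskScheduler implementation the JobTracker loads is chosen by a
single property in mapred-site.xml. A minimal sketch (class names as shipped in the 0.20
line; the fair/capacity scheduler contrib jar must also be on the JobTracker classpath):

  <!-- mapred-site.xml: the JobTracker loads exactly one TaskScheduler implementation -->
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <!-- default: org.apache.hadoop.mapred.JobQueueTaskScheduler (FIFO)  -->
    <!-- or:      org.apache.hadoop.mapred.CapacityTaskScheduler         -->
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>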

Arun

> 
> -- Forwarded message --
> From: kartheek muthyala 
> Date: Fri, Sep 16, 2011 at 8:38 PM
> Subject: Job Scheduler, Task Scheduler and Fair Scheduler
> To: common-user@hadoop.apache.org
> 
> 
> Hi all,
> Can anyone explain the responsibilities of each scheduler? I am
> interested in the flow of commands that goes between these schedulers, and whether
> anyone has any info on how the job scheduler schedules a job based
> on data locality. As far as I know, there is a heartbeat mechanism that
> goes from the task scheduler to the job scheduler, and in response the job scheduler
> finds the node where the data is located most closely
> and schedules the task on that node. Is there a more elaborate
> explanation of this area? Any help will be greatly appreciated.
> Thanks and Regards,
> Kartheek.



Fwd: Job Scheduler, Task Scheduler and Fair Scheduler

2011-09-16 Thread kartheek muthyala
Any updates!!

-- Forwarded message --
From: kartheek muthyala 
Date: Fri, Sep 16, 2011 at 8:38 PM
Subject: Job Scheduler, Task Scheduler and Fair Scheduler
To: common-user@hadoop.apache.org


Hi all,
Can anyone explain the responsibilities of each scheduler? I am
interested in the flow of commands that goes between these schedulers, and whether
anyone has any info on how the job scheduler schedules a job based
on data locality. As far as I know, there is a heartbeat mechanism that
goes from the task scheduler to the job scheduler, and in response the job scheduler
finds the node where the data is located most closely
and schedules the task on that node. Is there a more elaborate
explanation of this area? Any help will be greatly appreciated.
Thanks and Regards,
Kartheek.


Re: risks of using Hadoop

2011-09-16 Thread Kobina Kwarko
Hi Uma,

Response very much appreciated.

Thanks.

Isaac.

On 17 September 2011 05:08, Uma Maheswara Rao G 72686
wrote:

> Hi Kobina,
>
>  Some experiences which may be helpful for you with respect to HDFS.
>
>  1. Selecting the correct version.
>    I would recommend using a 0.20.x version. It is a pretty stable version,
> other organizations prefer it, and it is well tested.
>  Don't go for the 0.21 version. It is not a stable release; that is a risk.
>
> 2. You should perform thorough tests with your customer operations.
>  (of course you will do this :-))
>
> 3. The 0.20.x versions have the problem of a SPOF.
>   If the NameNode goes down you will lose the data. One way of recovering is by
> using the SecondaryNameNode: you can recover the data up to the last
> checkpoint, but manual intervention is required.
> In the latest trunk the SPOF will be addressed by HDFS-1623.
>
> 4. The 0.20.x NameNode cannot scale. Federation changes are included in later
> versions (I think in 0.22). This may not be a problem for your cluster,
> but please consider this aspect as well.
>
> 5. Please select the Hadoop version depending on your security
> requirements. There are 0.20.x releases with security support as well.
>
> 6. If you plan to use HBase, it requires append support. The 0.20-append branch
> has append support, and the 0.20.205 release will also have it but is not
> yet released. Choose the correct version to avoid sudden surprises.
>
>
>
> Regards,
> Uma
> - Original Message -
> From: Kobina Kwarko 
> Date: Saturday, September 17, 2011 3:42 am
> Subject: Re: risks of using Hadoop
> To: common-user@hadoop.apache.org
>
> > We are planning to use Hadoop in my organisation for quality of services
> > analysis out of CDR records from mobile operators. We are thinking of having
> > a small cluster of maybe 10-15 nodes and I'm preparing the proposal. My
> > office requires that I provide some risk analysis in the proposal.
> >
> > thank you.
> >
> > On 16 September 2011 20:34, Uma Maheswara Rao G 72686
> > wrote:
> >
> > > Hello,
> > >
> > > First of all, where are you planning to use Hadoop?
> > >
> > > Regards,
> > > Uma
> > > - Original Message -
> > > From: Kobina Kwarko 
> > > Date: Saturday, September 17, 2011 0:41 am
> > > Subject: risks of using Hadoop
> > > To: common-user 
> > >
> > > > Hello,
> > > >
> > > > Please can someone point out some of the risks we may incur if we
> > > > decide to implement Hadoop?
> > > >
> > > > BR,
> > > >
> > > > Isaac.
> > > >
> > >
> >
>


Re: risks of using Hadoop

2011-09-16 Thread Uma Maheswara Rao G 72686
Hi Kobina,
 
 Some experiences which may be helpful for you with respect to HDFS.

 1. Selecting the correct version.
    I would recommend using a 0.20.x version. It is a pretty stable version,
other organizations prefer it, and it is well tested.
 Don't go for the 0.21 version. It is not a stable release; that is a risk.

2. You should perform thorough tests with your customer operations.
  (of course you will do this :-))

3. The 0.20.x versions have the problem of a SPOF.
   If the NameNode goes down you will lose the data. One way of recovering is by
using the SecondaryNameNode: you can recover the data up to the last checkpoint,
but manual intervention is required.
In the latest trunk the SPOF will be addressed by HDFS-1623.

4. The 0.20.x NameNode cannot scale. Federation changes are included in later
versions (I think in 0.22). This may not be a problem for your cluster, but
please consider this aspect as well.

5. Please select the Hadoop version depending on your security requirements.
There are 0.20.x releases with security support as well.

6. If you plan to use HBase, it requires append support. The 0.20-append branch
has append support, and the 0.20.205 release will also have it but is not yet
released. Choose the correct version to avoid sudden surprises.
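
A rough illustration of what "append support" means in configuration terms: in the
0.20-append / 0.20.205 line it is gated by a flag in hdfs-site.xml. A minimal sketch;
check the release notes of the exact release you pick:

  <!-- hdfs-site.xml: enable the append/sync support HBase needs (0.20-append / 0.20.205) -->
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>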



Regards,
Uma
- Original Message -
From: Kobina Kwarko 
Date: Saturday, September 17, 2011 3:42 am
Subject: Re: risks of using Hadoop
To: common-user@hadoop.apache.org

> We are planning to use Hadoop in my organisation for quality of services
> analysis out of CDR records from mobile operators. We are thinking of having
> a small cluster of maybe 10-15 nodes and I'm preparing the proposal. My
> office requires that I provide some risk analysis in the proposal.
> 
> thank you.
> 
> On 16 September 2011 20:34, Uma Maheswara Rao G 72686
> wrote:
> 
> > Hello,
> >
> > First of all, where are you planning to use Hadoop?
> >
> > Regards,
> > Uma
> > - Original Message -
> > From: Kobina Kwarko 
> > Date: Saturday, September 17, 2011 0:41 am
> > Subject: risks of using Hadoop
> > To: common-user 
> >
> > > Hello,
> > >
> > > Please can someone point out some of the risks we may incur if we
> > > decide to implement Hadoop?
> > >
> > > BR,
> > >
> > > Isaac.
> > >
> >
> 


Re: risks of using Hadoop

2011-09-16 Thread Kobina Kwarko
We are planning to use Hadoop in my organisation for quality of services
analysis out of CDR records from mobile operators. We are thinking of having
a small cluster of maybe 10-15 nodes and I'm preparing the proposal. My
office requires that I provide some risk analysis in the proposal.

thank you.

On 16 September 2011 20:34, Uma Maheswara Rao G 72686
wrote:

> Hello,
>
> First of all, where are you planning to use Hadoop?
>
> Regards,
> Uma
> - Original Message -
> From: Kobina Kwarko 
> Date: Saturday, September 17, 2011 0:41 am
> Subject: risks of using Hadoop
> To: common-user 
>
> > Hello,
> >
> > Please can someone point out some of the risks we may incur if we
> > decide to implement Hadoop?
> >
> > BR,
> >
> > Isaac.
> >
>


RE: risks of using Hadoop

2011-09-16 Thread Michael Segel

Risks?

Well if you come to Hadoop World in Nov, we actually have a presentation that 
might help reduce some of your initial risks.

There are always risks when starting a new project. Regardless of the
underlying technology, you have costs associated with failure, and unless you
can level-set expectations you'll increase your odds of failure.

Best advice... don't listen to sales critters or marketing folks. ;-) [Right 
Tom?]
They have an agenda.
 ;-)

> Date: Fri, 16 Sep 2011 20:11:20 +0100
> Subject: risks of using Hadoop
> From: kobina.kwa...@gmail.com
> To: common-user@hadoop.apache.org
> 
> Hello,
> 
> Please can someone point out some of the risks we may incur if we decide to
> implement Hadoop?
> 
> BR,
> 
> Isaac.
  

Re: risks of using Hadoop

2011-09-16 Thread Tom Deutsch

And that once your business folks see what they have been missing, you'll
never be able to stop giving them the benefit of that insight.

--Original Message--
From: Harsh J
To: common-user
ReplyTo: common-user
Subject: Re: risks of using Hadoop
Sent: Sep 16, 2011 12:38 PM

Hey Kobina,

You might find some interesting results with your data that may change
the world. Big risk, I'd say :-)

On Sat, Sep 17, 2011 at 12:41 AM, Kobina Kwarko 
wrote:
> Hello,
>
> Please can someone point out some of the risks we may incur if we decide to
> implement Hadoop?

J/K. As Uma says, we need more context.

--
Harsh J


---
Sent from my Blackberry so please excuse typing and spelling errors.



Re: risks of using Hadoop

2011-09-16 Thread Harsh J
Hey Kobina,

You might find some interesting results with your data that may change
the world. Big risk, I'd say :-)

On Sat, Sep 17, 2011 at 12:41 AM, Kobina Kwarko  wrote:
> Hello,
>
> Please can someone point out some of the risks we may incur if we decide to
> implement Hadoop?

J/K. As Uma says, we need more context.

-- 
Harsh J


Re: Creating a hive table for a custom log

2011-09-16 Thread Raimon Bosch



Any Ideas? 

The most common approach would be writing your own SerDe and plugging it into your
Hive, like:

http://code.google.com/p/hive-json-serde/

But I'm wondering if there is some work already done in this area.


Raimon Bosch wrote:
> 
> Hi,
> 
> I'm trying to create a table similar to apache_log, but I want to avoid
> writing my own map-reduce task because I don't want to store my HDFS files
> twice.
> 
> So if you're working with log lines like this:
> 
> 186.92.134.151 [31/Aug/2011:00:10:41 +] "GET
> /client/action1/?transaction_id=8002&user_id=87179311248&ts=1314749223525&item1=271&item2=6045&environment=2
> HTTP/1.1"
> 
> 112.201.65.238 [31/Aug/2011:00:10:41 +] "GET
> /client/action1/?transaction_id=9002&ts=1314749223525&user_id=9048871793100&item2=6045&item1=271&environment=2
> HTTP/1.1"
> 
> 90.45.198.251 [31/Aug/2011:00:10:41 +] "GET
> /client/action2/?transaction_id=9022&ts=1314749223525&user_id=9048871793100&item2=6045&item1=271&environment=2
> HTTP/1.1"
> 
> Keeping in mind that the parameters can appear in different orders, what
> would be the best strategy to create this table? Write my own
> org.apache.hadoop.hive.contrib.serde2 SerDe? Is there any resource already
> implemented that I could use to perform this task?
> 
> In the end the objective is to convert all the parameters into fields and use
> the "action" as the type. With this big table I will be able to run my
> queries, my joins, and my views.
> 
> Any ideas?
> 
> Thanks in Advance,
> Raimon Bosch.
> 

-- 
View this message in context: 
http://old.nabble.com/Creating-a-hive-table-for-a-custom-log-tp32379849p32481457.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
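
One possible approach, sketched below with made-up table/column names and an untuned
regex, is the contrib RegexSerDe: keep the raw request as one column and extract the
parameters at query time, so the HDFS files are not stored twice. A custom SerDe (as
suggested above) would still be the cleaner long-term option.

  ADD JAR /path/to/hive-contrib.jar;   -- RegexSerDe ships in the Hive contrib jar

  CREATE EXTERNAL TABLE IF NOT EXISTS raw_access_log (
    ip      STRING,
    ts      STRING,
    request STRING
  )
  ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
  WITH SERDEPROPERTIES (
    "input.regex" = "(\\S+) \\[([^\\]]+)\\] \"GET ([^\"]+) HTTP/1.1\"",
    "output.format.string" = "%1$s %2$s %3$s"
  )
  STORED AS TEXTFILE
  LOCATION '/logs/access/';            -- points at the existing HDFS files, no copy

  -- Parameters can then be pulled out at query time, whatever order they appear in:
  SELECT
    regexp_extract(request, 'transaction_id=([^&]+)', 1) AS transaction_id,
    regexp_extract(request, 'user_id=([^&]+)', 1)        AS user_id,
    regexp_extract(request, '/client/([^/]+)/', 1)       AS action
  FROM raw_access_log;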



Re: risks of using Hadoop

2011-09-16 Thread Uma Maheswara Rao G 72686
Hello,

First of all, where are you planning to use Hadoop?

Regards,
Uma
- Original Message -
From: Kobina Kwarko 
Date: Saturday, September 17, 2011 0:41 am
Subject: risks of using Hadoop
To: common-user 

> Hello,
> 
> Please can someone point out some of the risks we may incur if we
> decide to implement Hadoop?
> 
> BR,
> 
> Isaac.
> 


risks of using Hadoop

2011-09-16 Thread Kobina Kwarko
Hello,

Please can someone point out some of the risks we may incur if we decide to
implement Hadoop?

BR,

Isaac.


Running dependent jobs in 0.20.2

2011-09-16 Thread Tucker Barbour
I'm using hadoop 0.20.2. I would like to use ChainMapper to chain [Map /
Reduce / Map]. I noticed that ChainMapper expects a JobConf object which is
deprecated in 0.20.2. Do I need to switch to using the deprecated
JobConf, or is there a way to use ChainMapper with the current mapreduce.Job?
Or something similar to ChainMapper? I did see that
mapreduce.lib.chain.ChainMapper is available in 0.21.0 but I'd like to stay
with a stable release. Any suggestions would be appreciated.
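
In 0.20.2 the only ChainMapper available is the old-API one, so the chain has to be
wired up with JobConf. A minimal sketch of the [Map / Reduce / Map] wiring, using the
stock IdentityMapper/IdentityReducer as stand-ins for real mapper and reducer classes:

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.lib.ChainMapper;
  import org.apache.hadoop.mapred.lib.ChainReducer;
  import org.apache.hadoop.mapred.lib.IdentityMapper;
  import org.apache.hadoop.mapred.lib.IdentityReducer;

  public class ChainJob {
    public static void main(String[] args) throws Exception {
      JobConf job = new JobConf(ChainJob.class);
      job.setJobName("map-reduce-map chain");
      FileInputFormat.setInputPaths(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));

      // First map in the chain (swap IdentityMapper for your own old-API Mapper).
      ChainMapper.addMapper(job, IdentityMapper.class,
          LongWritable.class, Text.class, LongWritable.class, Text.class,
          true, new JobConf(false));

      // The single reduce of the job.
      ChainReducer.setReducer(job, IdentityReducer.class,
          LongWritable.class, Text.class, LongWritable.class, Text.class,
          true, new JobConf(false));

      // A map that runs on the reducer output before it is written out.
      ChainReducer.addMapper(job, IdentityMapper.class,
          LongWritable.class, Text.class, LongWritable.class, Text.class,
          false, new JobConf(false));

      JobClient.runJob(job);
    }
  }

The new-API org.apache.hadoop.mapreduce.lib.chain.ChainMapper only appears in 0.21, so
staying on 0.20.2 effectively means staying on JobConf for this particular job.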


Re: Tutorial about Security in Hadoop

2011-09-16 Thread Uma Maheswara Rao G 72686
Hi,

Please find the links below:
https://media.blackhat.com/bh-us-10/whitepapers/Becherer/BlackHat-USA-2010-Becherer-Andrew-Hadoop-Security-wp.pdf
http://markmail.org/download.xqy?id=yjdqleg3zv5pr54t&number=1

They should help you understand more.
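
As a rough orientation (a sketch only; the principal, keytab, and name-mapping settings
still have to be configured per the documents above), the main switch on a
security-enabled 0.20.20x release lives in core-site.xml:

  <!-- core-site.xml on every node -->
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>   <!-- default is "simple" -->
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>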

Regards,
Uma

- Original Message -
From: Xianqing Yu 
Date: Friday, September 16, 2011 10:43 pm
Subject: Tutorial about Security in Hadoop
To: common-user@hadoop.apache.org

> Hi Community,
> 
> I am trying to set up a security mechanism in Hadoop, for instance using
> Kerberos. However, I didn't find much information about it. Does anyone
> know of any link to a tutorial about setting up Kerberos
> in Hadoop?
> 
> Thanks,
> 
> Xianqing Yu
> 
> --
> Graduate Research Assistant, Cyber Defense Lab
> Department of Computer Science
> North Carolina State University, Raleigh, NC
> E-mail: x...@ncsu.edu 
> 
> 


RE: Datanodes going down frequently

2011-09-16 Thread Aaron Baff
By KVM I was referring to Keyboard-Video-Mouse console. Basically a cart with a 
monitor, mouse & keyboard that you plug into a server for console access.

Ah, yes, it does sound like your OS was having problems with memory then.

We're not generally having problems with MR jobs per se, but it _appears_ that
there is something going on when doing HDFS accesses. Most of our jobs use
custom grouping & sorting comparators, but they aren't joins, so probably not
too intensive.

The newer cluster we are going to be using from now on is CDH3u1, and from the
mailing list they don't really have a clue why we're seeing this behavior.
We're running on FreeBSD with the Diablo JVM (Java 1.6), which a guy on their
list feels is a pretty unusual configuration that people aren't really running.

--Aaron
-Original Message-
From: john smith [mailto:js1987.sm...@gmail.com]
Sent: Friday, September 16, 2011 10:04 AM
To: common-user@hadoop.apache.org
Subject: Re: Datanodes going down frequently

Hi Aaron,

I haven't really run any MR jobs on my cluster till now. I've just been
pushing data into HDFS, so the network shouldn't be a problem.

Initially my HADOOP_HEAPSIZE was set to 2000 MB and my RAM size was 2 GB.
This resulted in datanodes going down randomly. I eventually realized that the
OS kept crashing and the system went unresponsive until I manually powered it on
again.

So I reduced HADOOP_HEAPSIZE to 800 MB and the cluster seems to be stable
again; the datanodes have been stable for the past few hours. (I am not sure
though, I need to run a few heavy tasks to check it thoroughly.)

Looks like my problem wasn't the Ethernet interface going down; it's
actually a full OS crash. I am not used to KVM, so I'll have to google it,
attach it to the datanodes, and watch them closely in case they fail
again in the future.

What about your cluster? Are you running any "shuffle-intensive" jobs like JOINs
or CROSS PRODUCTs?

Thanks

On Fri, Sep 16, 2011 at 10:16 PM, Aaron Baff wrote:

> John,
>
> Are the machines simply unreachable? Or has the OS crashed? We've been
> having quite a few problems with our network mbufs filling up and not
> getting released, which causes a machine to eventually become unreachable
> via the network, although they are otherwise up and running fine. Can you
> attach a KVM to a machine when it becomes unreachable and take a look? Or
> add some monitoring to keep an eye on the network mbufs? Don't know if this
> is your problem as well or not.
>
> --Aaron
> -Original Message-
> From: john smith [mailto:js1987.sm...@gmail.com]
> Sent: Thursday, September 15, 2011 9:46 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Datanodes going down frequently
>
> Hi All,
>
> Thanks for your inputs,
>
> @Aaron : No, they aren't recovering. They are losing network connectivity
> and they are not getting it back. I am unable to ssh to them and I need to
> manually go and restart the networking.
>
> @harsh and Raj,
>
> One thing I noticed in my hadoop-env.sh is "export HADOOP_HEAPSIZE=2000".
> Isn't this strange? Allocating my whole RAM to the JVM? Should I
> reconsider this? Right now I am not running any MR jobs as such.
>
> I've started my cluster and I've put around 30 to 40 GB of data in with a
> replication factor of 3. This takes the machines down. Looks like a swapping
> issue... but how can I see whether I am swapping or not? Any help?
>
> Thanks
> jS
>
> On Fri, Sep 16, 2011 at 10:03 AM, Harsh J  wrote:
>
> > I bet it's swapping. You may just be oversubscribing those machines
> > with your MR slots and heap per slot or otherwise. It could also be low
> > heap given the number of blocks it's got to report (which would equate to a
> > small-files issue given your cluster size possibly, but that's a
> > different discussion).
> >
> > On Fri, Sep 16, 2011 at 3:36 AM, john smith 
> > wrote:
> > > Hi all,
> > >
> > > I am running a 10-node cluster (1 NN + 9 DN, Ubuntu Server 10.04, 2 GB RAM
> > > each). I am facing a strange problem. My datanodes go down randomly and
> > > nothing shows up in the logs. They lose their network connectivity suddenly
> > > and the NN declares them dead. Has anyone faced this problem? Is it because
> > > of Hadoop, or is it some problem with my infrastructure?
> > >
> > > The worst part of the problem is that I need to manually go to the remote
> > > machine and restart networking. Can someone help me with this? Did anyone
> > > face a similar kind of problem?
> > >
> > > Btw: my Hadoop version: 0.20.2
> > >
> > > Thanks,
> > > jS
> > >
> >
> >
> >
> > --
> > Harsh J
> >
>


Tutorial about Security in Hadoop

2011-09-16 Thread Xianqing Yu

Hi Community,

I am trying to set up a security mechanism in Hadoop, for instance using
Kerberos. However, I didn't find much information about it. Does anyone know
of any link to a tutorial about setting up Kerberos
in Hadoop?


Thanks,

Xianqing Yu

--
Graduate Research Assistant, Cyber Defense Lab
Department of Computer Science
North Carolina State University, Raleigh, NC
E-mail: x...@ncsu.edu 



Re: Datanodes going down frequently

2011-09-16 Thread john smith
Hi Aaron,

I haven't really run any MR jobs on my cluster till now. I've just been
pushing data into HDFS, so the network shouldn't be a problem.

Initially my HADOOP_HEAPSIZE was set to 2000 MB and my RAM size was 2 GB.
This resulted in datanodes going down randomly. I eventually realized that the
OS kept crashing and the system went unresponsive until I manually powered it on
again.

So I reduced HADOOP_HEAPSIZE to 800 MB and the cluster seems to be stable
again; the datanodes have been stable for the past few hours. (I am not sure
though, I need to run a few heavy tasks to check it thoroughly.)

Looks like my problem wasn't the Ethernet interface going down; it's
actually a full OS crash. I am not used to KVM, so I'll have to google it,
attach it to the datanodes, and watch them closely in case they fail
again in the future.

What about your cluster? Are you running any "shuffle-intensive" jobs like JOINs
or CROSS PRODUCTs?

Thanks
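
(On the swapping question quoted below: a quick way to check is plain Linux tooling on
any of the nodes, nothing Hadoop-specific. A sketch:)

  # Watch the si/so (swap-in/swap-out) columns while load is running; sustained
  # non-zero values mean the node is swapping.
  vmstat 5

  # One-off snapshot of RAM vs. swap usage on the node.
  free -m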

On Fri, Sep 16, 2011 at 10:16 PM, Aaron Baff wrote:

> John,
>
> Are the machines simply unreachable? Or has the OS crashed? We've been
> having quite a few problems with our network mbufs filling up and not
> getting released, which causes a machine to eventually become unreachable
> via the network, although they are otherwise up and running fine. Can you
> attach a KVM to a machine when it becomes unreachable and take a look? Or
> add some monitoring to keep an eye on the network mbufs? Don't know if this
> is your problem as well or not.
>
> --Aaron
> -Original Message-
> From: john smith [mailto:js1987.sm...@gmail.com]
> Sent: Thursday, September 15, 2011 9:46 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Datanodes going down frequently
>
> Hi All,
>
> Thanks for your inputs,
>
> @Aaron : No, they aren't recovering. They are losing network connectivity
> and they are not getting it back. I am unable to ssh to them and I need to
> manually go and restart the networking.
>
> @harsh and Raj,
>
> One thing I noticed in my hadoop-env.sh is "export HADOOP_HEAPSIZE=2000".
> Isn't this strange? Allocating my whole RAM to the JVM? Should I
> reconsider this? Right now I am not running any MR jobs as such.
>
> I've started my cluster and I've put around 30 to 40 GB of data in with a
> replication factor of 3. This takes the machines down. Looks like a swapping
> issue... but how can I see whether I am swapping or not? Any help?
>
> Thanks
> jS
>
> On Fri, Sep 16, 2011 at 10:03 AM, Harsh J  wrote:
>
> > I bet it's swapping. You may just be oversubscribing those machines
> > with your MR slots and heap per slot or otherwise. It could also be low
> > heap given the number of blocks it's got to report (which would equate to a
> > small-files issue given your cluster size possibly, but that's a
> > different discussion).
> >
> > On Fri, Sep 16, 2011 at 3:36 AM, john smith 
> > wrote:
> > > Hi all,
> > >
> > > I am running a 10-node cluster (1 NN + 9 DN, Ubuntu Server 10.04, 2 GB RAM
> > > each). I am facing a strange problem. My datanodes go down randomly and
> > > nothing shows up in the logs. They lose their network connectivity suddenly
> > > and the NN declares them dead. Has anyone faced this problem? Is it because
> > > of Hadoop, or is it some problem with my infrastructure?
> > >
> > > The worst part of the problem is that I need to manually go to the remote
> > > machine and restart networking. Can someone help me with this? Did anyone
> > > face a similar kind of problem?
> > >
> > > Btw: my Hadoop version: 0.20.2
> > >
> > > Thanks,
> > > jS
> > >
> >
> >
> >
> > --
> > Harsh J
> >
>


RE: Datanodes going down frequently

2011-09-16 Thread Aaron Baff
John,

Are the machines simply unreachable? Or has the OS crashed? We've been having 
quite a few problems with our network mbufs filling up and not getting 
released, which causes a machine to eventually become unreachable via the 
network, although they are otherwise up and running fine. Can you attach a KVM 
to a machine when it becomes unreachable and take a look? Or add some 
monitoring to keep an eye on the network mbufs? Don't know if this is your 
problem as well or not.

--Aaron
-Original Message-
From: john smith [mailto:js1987.sm...@gmail.com]
Sent: Thursday, September 15, 2011 9:46 PM
To: common-user@hadoop.apache.org
Subject: Re: Datanodes going down frequently

Hi All,

Thanks for your inputs,

@Aaron : No, they aren't recovering. They are losing network connectivity
and they are not getting it back. I am unable to ssh to them and I need to
manually go and restart the networking.

@harsh and Raj,

One thing I noticed in my hadoop-env.sh is "export HADOOP_HEAPSIZE=2000".
Isn't this strange? Allocating my whole RAM to the JVM? Should I reconsider
this? Right now I am not running any MR jobs as such.

I've started my cluster and I've put around 30 to 40 GB of data in with a
replication factor of 3. This takes the machines down. Looks like a swapping
issue... but how can I see whether I am swapping or not? Any help?

Thanks
jS

On Fri, Sep 16, 2011 at 10:03 AM, Harsh J  wrote:

> I bet it's swapping. You may just be oversubscribing those machines
> with your MR slots and heap per slot or otherwise. It could also be low
> heap given the number of blocks it's got to report (which would equate to a
> small-files issue given your cluster size possibly, but that's a
> different discussion).
>
> On Fri, Sep 16, 2011 at 3:36 AM, john smith 
> wrote:
> > Hi all,
> >
> > I am running a 10-node cluster (1 NN + 9 DN, Ubuntu Server 10.04, 2 GB RAM
> > each). I am facing a strange problem. My datanodes go down randomly and
> > nothing shows up in the logs. They lose their network connectivity suddenly
> > and the NN declares them dead. Has anyone faced this problem? Is it because
> > of Hadoop, or is it some problem with my infrastructure?
> >
> > The worst part of the problem is that I need to manually go to the remote
> > machine and restart networking. Can someone help me with this? Did anyone
> > face a similar kind of problem?
> >
> > Btw: my Hadoop version: 0.20.2
> >
> > Thanks,
> > jS
> >
>
>
>
> --
> Harsh J
>


Job Scheduler, Task Scheduler and Fair Scheduler

2011-09-16 Thread kartheek muthyala
Hi all,
Can anyone explain the responsibilities of each scheduler? I am
interested in the flow of commands that goes between these schedulers, and whether
anyone has any info on how the job scheduler schedules a job based
on data locality. As far as I know, there is a heartbeat mechanism that
goes from the task scheduler to the job scheduler, and in response the job scheduler
finds the node where the data is located most closely
and schedules the task on that node. Is there a more elaborate
explanation of this area? Any help will be greatly appreciated.
Thanks and Regards,
Kartheek.


Re: Running example application with capacity scheduler ?

2011-09-16 Thread ArunKumar
Hi all !

Problem found!
I had set the queue properties in mapred-site.xml instead of
capacity-scheduler.xml.
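
(For anyone hitting the same thing, roughly: queue names are declared in mapred-site.xml,
but per-queue capacities belong in capacity-scheduler.xml. A sketch with a made-up queue
name "myqueue"; property names as in the 0.20 capacity scheduler:)

  <!-- mapred-site.xml: declare the queues and load the capacity scheduler -->
  <property>
    <name>mapred.queue.names</name>
    <value>default,myqueue</value>
  </property>
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
  </property>

  <!-- capacity-scheduler.xml: per-queue settings such as capacity (percent of cluster) -->
  <property>
    <name>mapred.capacity-scheduler.queue.myqueue.capacity</name>
    <value>30</value>
  </property>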

Arun




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Running-example-application-with-capacity-scheduler-tp3335471p3341934.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.


Search over the index created by hadoop contrib/index

2011-09-16 Thread 27g
I have used the source code in hadoop contrib/index to build a Lucene index, but
I didn't use shards; I used an index path (as can be seen in the code of
UpdateIndex.java). Now, how can I search over this index? And if I can search,
can it find a single word from a file?
Thank you !

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-over-the-index-created-by-hadoop-contrib-index-tp3341458p3341458.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
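
One way to try it, sketched under some assumptions: class and package names below follow
Lucene 3.x (adjust to the Lucene version bundled with contrib/index), and the field name
"contents" is a guess — use whatever field UpdateIndex.java actually populated. Copy the
index directory out of HDFS first (e.g. hadoop fs -get /your/indexpath local-index), then
open it with plain Lucene:

  import java.io.File;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.queryParser.QueryParser;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.ScoreDoc;
  import org.apache.lucene.search.TopDocs;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.util.Version;

  public class SearchLocalIndex {
    public static void main(String[] args) throws Exception {
      // args[0] = local copy of the index directory, args[1] = the word to search for
      IndexSearcher searcher = new IndexSearcher(FSDirectory.open(new File(args[0])));
      QueryParser parser =
          new QueryParser(Version.LUCENE_30, "contents", new StandardAnalyzer(Version.LUCENE_30));
      TopDocs hits = searcher.search(parser.parse(args[1]), 10);
      for (ScoreDoc sd : hits.scoreDocs) {
        System.out.println(searcher.doc(sd.doc));  // prints the stored fields of each hit
      }
      searcher.close();
    }
  }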


HELP NEEDED: What to do after crash and fsck says that .2% Blocks missing. Namenode in safemode

2011-09-16 Thread Robert J Berger
I just had an HDFS/HBase instance where all the slave/regionserver processes
crashed but the namenode stayed up. I did a proper shutdown of the namenode.

After bringing Hadoop back up, the namenode is stuck in safe mode. Fsck shows
235 corrupt/missing blocks out of 117280 blocks. All the slaves are logging
"DataBlockScanner: Verification succeeded." As far as I can tell there are no
errors on the datanodes.

Can I expect it to self-heal, or do I need to do something to help it along?
Any way to tell how long it will take to recover if I do just have to wait?

Other than the verification messages on the datanodes, the namenode fsck 
numbers are not changing and the namenode log continues to say:

The ratio of reported blocks 0.9980 has not reached the threshold 0.9990. Safe 
mode will be turned off automatically.

The ratio has not changed for over an hour now.

If you happen to know the answer, please get back to me right away by email or 
on #hadoop IRC as I'm trying to figure it out now...

Thanks!
__
Robert J Berger - CTO
Runa Inc.
+1 408-838-8896
http://blog.ibd.com
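
For anyone who lands on this thread in the same state, the usual manual levers are
sketched below (commands as in the 0.20 line; note that fsck -delete permanently drops
the files owning the missing blocks, so use it with care):

  # Identify which files own the missing/corrupt blocks.
  hadoop fsck / -files -blocks -locations

  # If the missing replicas are really gone, leave safe mode manually so the
  # cluster can come up without them:
  hadoop dfsadmin -safemode leave

  # Then move the damaged files to /lost+found, or delete them, so fsck reports healthy:
  hadoop fsck / -move
  # or: hadoop fsck / -delete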