Hadoop - Browse file system error

2013-09-16 Thread Manickam P
Hi,
I've installed and configured hadoop-2.1.0-beta. When I open 
http://10.108.19.68:50070 I get the page, but when I click the browse file 
system link I get a DNS unresolved-host-name error. The browser URL it goes 
to is given below.

http://lab2-hadoop2-vm1.eng.com:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=/&nnaddr=10.108.19.68:9000

I checked all the configuration files and I've given the correct IP 
everywhere; nothing points to localhost. I also have the host name entry in 
the /etc/hosts file with the correct host name and IP address.

Please help me.

Thanks,
Manickam P

Re: Hadoop - Browse file system error

2013-09-16 Thread Jitendra Yadav
Hi,

From where are you accessing this "http://10.108.19.68:50070" URL?

Regards
Jitendra



RE: Hadoop - Browse file system error

2013-09-16 Thread Manickam P
Hi,
The 10.108.19.68 machine is my name node. When I try to open it in the 
browser I get that error.



Thanks,
Manickam P

Re: Hadoop - Browse file system error

2013-09-16 Thread Jitendra Yadav
Hi,

It looks like you are using Hadoop in a virtual environment, right?

Make sure you have the lab2-hadoop2-vm1.eng.dnb.com domain and IP entry in 
the hosts file of the machine from which you are accessing 
http://10.108.19.68:50070. I believe you are accessing this URL from your 
physical box, right?

Thanks
Jitendra
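
For reference, a hypothetical /etc/hosts line on the machine running the 
browser would look like the one below (the hostname is taken from the 
failing URL; the IP is the data node address given later in this thread; 
adjust both for your environment):

  10.108.19.69   lab2-hadoop2-vm1.eng.com   lab2-hadoop2-vm1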




RE: Hadoop - Browse file system error

2013-09-16 Thread Manickam P
Hi,
I checked the hosts entry; it has all the details. I also checked my data 
nodes, and they have the proper host entries too. I don't have any clue here.


Thanks,
Manickam P

Re: Hadoop - Browse file system error

2013-09-16 Thread Jitendra Yadav
If you don't mind, can you please share your hosts entries from all the nodes?
Also let me know from which host you are accessing the URL.

Regards
Jitendra




RE: Hadoop - Browse file system error

2013-09-16 Thread Manickam P
Hi Jitendra,
Below are the hosts entries on my name node. Here .68 is the name node and 
the other two are data nodes. I have the same entries in the data nodes' 
hosts files.

10.108.19.68    lab2-hadoop.eng.com         lab2-hadoop
10.108.19.69    lab2-hadoop2-vm1.eng.com    lab2-hadoop2-vm1
10.108.19.70    lab2-hadoop2-vm2.eng.com    lab2-hadoop2-vm2

I tried to access the URL from the master node. I used 
http://10.108.19.68:50070/ to open the file system.


Thanks,
Manickam P

Re: Hadoop - Browse file system error

2013-09-16 Thread Jitendra Yadav
Can you try this from your web browser?

http://10.108.19.69:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=/&nnaddr=10.108.19.68:9000

Thanks




RE: Hadoop - Browse file system error

2013-09-16 Thread Manickam P
Hi,
It works. But why does it not work from the master node?

Thanks,
Manickam P


Re: Hadoop - Browse file system error

2013-09-16 Thread Jitendra Yadav
Hi,

Because you were accessing the name node through its IP address, not by 
domain name.
Maybe you have an IP/domain-name resolution issue; can you please try 
pinging by domain name, IP, and hostname from all the nodes?

Regards
Jitendra
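
For example, a quick name-resolution sanity check on each box (hostnames 
taken from the entries posted earlier in this thread; assuming standard 
Linux tools):

  ping -c 1 lab2-hadoop.eng.com
  ping -c 1 lab2-hadoop2-vm1.eng.com
  ping -c 1 lab2-hadoop2-vm2.eng.com
  getent hosts lab2-hadoop2-vm1.eng.com   # shows whether /etc/hosts (or DNS) resolves the name, and to which IP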




hadoop join example

2013-09-16 Thread Konstantinos A .
Hi all,

Can anyone explain how the join example in the Hadoop source code examples 
folder works?

What I don't really understand is how the "mapred.join.expr" parameter in 
the JobConf works.

Thanks in advance!

K.A
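
No reply appears in this archive, but for context, a minimal hypothetical 
sketch of how "mapred.join.expr" is normally populated via 
CompositeInputFormat.compose() looks like this (the input paths and input 
format are placeholders; both inputs must be partitioned and sorted 
identically, e.g. produced by jobs with the same number of reducers and the 
same key type):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.join.CompositeInputFormat;

public class JoinExprSketch {
  public static void main(String[] args) {
    JobConf conf = new JobConf(JoinExprSketch.class);
    // CompositeInputFormat performs the join on the map side.
    conf.setInputFormat(CompositeInputFormat.class);
    // compose() builds the expression string stored under "mapred.join.expr";
    // "inner" is an inner join, "outer" and "override" are also available.
    conf.set("mapred.join.expr",
        CompositeInputFormat.compose("inner",
            SequenceFileInputFormat.class,
            new Path("/data/left"),     // hypothetical input paths
            new Path("/data/right")));
    // ... set mapper/reducer, output format/path, then JobClient.runJob(conf).
  }
}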

RE: Hadoop - Browse file system error

2013-09-16 Thread Manickam P
Hi,
I checked as you said; I'm able to ping my data nodes using the domain name 
as well as the IP address.

Thanks,
Manickam P

Re: Unclear Hadoop 2.1X documentation

2013-09-16 Thread Karthik Kambatla
Moving general@ to bcc and redirecting this to the appropriate list -
user@hadoop.apache.org


On Mon, Sep 16, 2013 at 2:18 AM, Jagat Singh  wrote:

> Hello Mahmoud
>
> You can run on your machine also.
>
> I learnt everything on my 3 GB, 2 GHz machine and only recently got a better machine.
>
> If you follow the post below you should be able to install and run Hadoop
> in 30 minutes.
>
> If your machine is not Linux, then I suggest you download VirtualBox,
> give it 1400 MB of RAM, and start Ubuntu in it.
>
> Then just follow steps here.
>
>
> http://jugnu-life.blogspot.com.au/2012/05/hadoop-20-install-tutorial-023x.html
>
> Thanks,
>
> Jagat
> On 16/09/2013 7:07 PM, "Mahmoud Al-Ewiwi"  wrote:
>
> > Thanks Ted,
> >
> > for now I just need to learn the basics of Hadoop before going to ask
> > my university for more powerful machines.
> > I just want to know how to install it and write some simple programs so
> > I can ask my supervisor for more server machines.
> >
> > Best Regards
> >
> >
> > On Mon, Sep 16, 2013 at 3:57 AM, Ted Dunning 
> > wrote:
> >
> > > This is a very small amount of memory for running Hadoop + user
> programs.
> > >
> > > You might consider running your tests on a cloud provider like Amazon.
> > >  That will give you access to decent sized machines for a relatively
> > small
> > > cost.
> > >
> > >
> > > On Sun, Sep 15, 2013 at 11:27 AM, Mahmoud Al-Ewiwi 
> > > wrote:
> > >
> > > > Thanks to all. I've tried to use some of these sandboxes, but
> > > > unfortunately most of them need a large amount of memory (3 GB) for
> > > > the guest machine and I only have 3 GB on my (old) machine, so I'm
> > > > going to go with the normal installation (I have no choice).
> > > >
> > > > Thanks
> > > >
> > > >
> > > > On Sun, Sep 15, 2013 at 9:13 AM, Roman Shaposhnik 
> > > wrote:
> > > >
> > > > > On Sat, Sep 14, 2013 at 10:54 AM, Mahmoud Al-Ewiwi <
> mew...@gmail.com
> > >
> > > > > wrote:
> > > > > > Hello,
> > > > > >
> > > > > > I'm new to Hadoop and i want to learn it in order to do a
> project.
> > > > > > I'v started reading the documentation at this site:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://hadoop.apache.org/docs/r2.1.0-beta/hadoop-project-dist/hadoop-common/SingleCluster.html
> > > > > >
> > > > > > for setting up a single node, but I could not figure out a lot
> > > > > > of things in this documentation.
> > > > >
> > > > > For a first-timer like yourself, perhaps using a Hadoop distribution
> > > > > would be the best way to get started. Bigtop offers a 100% community
> > > > > driven distro, but there are, of course, vendor choices as well.
> > > > >
> > > > > Here's the info on Bigtop:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/BIGTOP/How+to+install+Hadoop+distribution+from+Bigtop+0.6.0
> > > > >
> > > > > Thanks,
> > > > > Roman.
> > > > >
> > > >
> > >
> >
>


Re: assign tasks to specific nodes

2013-09-16 Thread Omkar Joshi
Potentially you could, but I think you would have to update the
partitioning code and, correspondingly, the RMContainerAllocator (YARN
MapReduce) code. Today all map tasks share one priority and all reduce
tasks share another. What you could do is change the map task priorities
based on partition size (file size), making sure that when you assign
priorities to the container requests for the corresponding map tasks,
apartment > room > villa.

However, you should note a few things here, and I have a few questions for
you:
1) I don't see why you want to do this; for your job to succeed you will
need all of the map tasks to finish anyway. Why do you want this ordering?
Any benefits?
2) Even if you submit all the requests with the specified priorities, you
are not guaranteed to get containers in the same order, because most of
these requests are for specific host machines (node managers), so we don't
know in advance whether sufficient resources will be available there.

Thanks,
Omkar Joshi
*Hortonworks Inc.* 


On Wed, Sep 11, 2013 at 4:08 PM, Mark Olimpiati  wrote:

> Hi Vinod, I had the node assignment at first but in my second email I
> explained how I want to change the order of data partition execution. The
> default is to run tasks based on the *size* of the partition assigned to it.
> Now I want to run tasks such that specific order of partitions is to be
> executed.
>
> Eg. First assume input is directory Houses/ with files {Villa, Apartment,
> Room} such that file "Villa" is larger in size than "Apartments" than
> "Room".
>
> The default hadoop would run :
> map1 --> Villa
> map2 --> Apartment
> map3 --> Room
>
> I want to assign priorities to the *data partitions* such that
> Apartment=1, Room=2, Villa=3 then the scheduler will run the following in
> this order:
> map1 --> Apartment
> map2 --> Room
> map3 --> Villa
>
> My question is that possible? Notice this is regardless of the assigned
> node.
> Thank you,
> Mark
>
>
> On Wed, Sep 11, 2013 at 10:45 AM, Vinod Kumar Vavilapalli <
> vino...@apache.org> wrote:
>
>>
>> I assume you are talking about MapReduce. And 1.x release or 2.x?
>>
>> In either of the releases, this cannot be done directly.
>>
>> In 1.x, the framework doesn't expose a feature like this as it is a
>> shared service, and if enough jobs flock to a node, it will lead to
>> utilization and failure handling issues.
>>
>> In Hadoop 2 YARN, the platform does expose this functionality. But
>> MapReduce framework doesn't yet expose this functionality to the end users.
>>
>> What exactly is your use case? Why are some nodes of higher priority than
>> others?
>>
>>  Thanks,
>> +Vinod Kumar Vavilapalli
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>> On Sep 11, 2013, at 10:09 AM, Mark Olimpiati wrote:
>>
>> Thanks for replying Rev, but the link is talking about reducers which
>> seems to be like a similar case but what if I assigned priorities to the
>> data partitions (eg. partition B=1, partition C=2, partition A=3,...) such
>> that first map task is assigned partition B to run first. Then second map
>> is given partition C, .. etc. This is instead of assigning based on
>> partition size. Is that possible?
>>
>> Thanks,
>> Mark
>>
>>
>> On Mon, Sep 9, 2013 at 11:17 AM, Ravi Prakash  wrote:
>>
>>>
>>> http://lucene.472066.n3.nabble.com/Assigning-reduce-tasks-to-specific-nodes-td4022832.html
>>>
>>>   --
>>>  *From:* Mark Olimpiati 
>>> *To:* user@hadoop.apache.org
>>> *Sent:* Friday, September 6, 2013 1:47 PM
>>> *Subject:* assign tasks to specific nodes
>>>
>>> Hi guys,
>>>
>>>I'm wondering if there is a way for me to assign tasks to specific
>>> machines or at least assign priorities to the tasks to be executed in that
>>> order. Any suggestions?
>>>
>>> Thanks,
>>> Mark
>>>
>>>
>>>
>>
>>


Re: Resource limits with Hadoop and JVM

2013-09-16 Thread Vinod Kumar Vavilapalli
I assume you are on Linux, and also that your tasks are so resource 
intensive that they are taking down nodes. You should enable per-task limits; 
see http://hadoop.apache.org/docs/stable/cluster_setup.html#Memory+monitoring

What this does is force jobs to declare their resource requirements up 
front, and the TaskTrackers then enforce those limits.

HTH
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/
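
To make that concrete, a rough sketch of the kind of limits being discussed 
for a Hadoop 1.x cluster might look like the following (values are 
illustrative placeholders; the per-task memory properties only take effect 
once the TaskTracker's memory monitoring is enabled, so verify the property 
names and semantics against the memory-monitoring docs for your exact 
version):

# mapred-site.xml (sketch)
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>            <!-- heap cap for each map/reduce child JVM -->
</property>
<property>
  <name>mapred.child.ulimit</name>
  <value>1572864</value>             <!-- virtual-memory ulimit for child processes, in KB -->
</property>
<property>
  <name>mapred.job.map.memory.mb</name>
  <value>1024</value>                <!-- memory the TaskTracker monitors per map task -->
</property>
<property>
  <name>mapred.job.reduce.memory.mb</name>
  <value>1024</value>
</property>

# /etc/security/limits.conf (OS-level limits for the user running the daemons)
hadoop  soft  nofile  32768
hadoop  hard  nofile  32768
hadoop  soft  nproc   32768
hadoop  hard  nproc   32768

Since the child ulimit is inherited by anything a task launches (e.g. 
ffmpeg), it also bounds those helper processes.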







dfs.namenode.edits.dir and dfs.namenode.shared.edits.dir

2013-09-16 Thread Bryan Beaudreault
I am running CDH4.2.

I've noticed that my NameNodes are logging edits both locally and to the
journalnodes.  I took a look at the code, and this doesn't seem to be
required -- and also, it's the whole point of QJM right?

However, due to the following, we are logging both locally and to the
quorum:


<property>
  <name>dfs.namenode.edits.dir</name>
  <value>${dfs.namenode.name.dir}</value>
  <source>hdfs-default.xml</source>
</property>


Two questions:

1) Is this intended, or should I file a JIRA?
2) Is it indeed safe/recommended to set dfs.namenode.edits.dir to empty, so
that we only write to the quorum?
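
For context, the QJM setup being discussed usually carries something like 
the following in hdfs-site.xml (the JournalNode hostnames and nameservice ID 
below are hypothetical); the open question above is whether 
dfs.namenode.edits.dir can then be left empty so edits go only to the quorum:

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
</property>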


Re: SequenceFile output in Wordcount example

2013-09-16 Thread Karthik Kambatla
Moving general@ to bcc



On Mon, Sep 16, 2013 at 1:20 PM, xeon  wrote:

> Hi,
>
> - I want that the wordcount example produces a SequenceFile output with
> the result. How I do this?
>
> - I want also to do a cat to the SequenceFile and read the result. A
> simple "hdfs dfs -cat sequencefile" is enough?
>
>
> Thanks,
>
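
No answer appears in this archive, but a minimal sketch of one common way 
to do both (the output path below is a placeholder):

// In the WordCount driver (new API), swap out the default TextOutputFormat:
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
// ...
job.setOutputFormatClass(SequenceFileOutputFormat.class);

// A plain "hdfs dfs -cat" prints the SequenceFile's binary records raw.
// "hdfs dfs -text" understands SequenceFiles (and their compression) and
// prints key<TAB>value lines instead:
//   hdfs dfs -text /user/hadoop/wordcount-output/part-r-00000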


Re: Resource limits with Hadoop and JVM

2013-09-16 Thread Forrest Aldrich

Yes, as I mentioned, we're running RHEL.

In this case, when I went to add the node, I ran "hadoop mradmin 
-refreshNodes" (as user hadoop) and the master node went completely nuts 
- the system load jumped to 60 ("top" was frozen on the console) and 
required a hard reboot.


Whether or not the slave node I added had errors in its *.xml files, this 
should never happen. At least, I would like it to never happen again ;-)


We're running:

java version "1.6.0_39"
Java(TM) SE Runtime Environment (build 1.6.0_39-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01, mixed mode)

Hadoop v1.0.1

Perhaps we ran into a bug? I know we need to upgrade, but we're being 
very cautious about changes to the production environment: an "if it works, 
don't fix it" type of approach.




Thanks,

Forrest





Re: Cloudera Vs Hortonworks Vs MapR

2013-09-16 Thread Xuri Nagarin
So I will try to answer the OP's question best I can without deviating too
much into opinions and stick to facts. Disclaimer: I am not an employee of
either vendor or any partner of theirs.

Context is important: My team's use case was general data exploration of
semi-structured log data and we had no typical data-warehouse type of
existing use cases. Also, ours is a small cluster (fewer than 30 nodes). In
terms of ops/maintenance, we only have one person. I point this out because
lots of hadoop shops have dedicated team for each - OS administration,
Hadoop admin, Hadoop developers. And, they are very mature in terms of
their compute use cases. To my mind, these aspects can significantly impact
your vendor choices.

MapR: My team simply did not consider them because of all the proprietary
code in there. We are trying to move from a monolithic proprietary product
and one of the criteria we set was - if we decided to move away from the
chosen hadoop vendor, can we easily unlock our data?
HortonWorks: Distro uses HDFS 1.x with MRv2. All open source. Cluster
management is via Ambari. Compared to Cloudera's CM, Ambari has very
rudimentary features. But you have to keep in mind that Ambari is only an
year old where as CM already has been under development for several years.
This was a major selection factor for us because Ambari did not have all
the automation/feature-set compared to CM for a single
administrator/developer to easily maintain the cluster. Also, during the
trial period, Hortonworks' packaging format/structure apparently kept
changing, which made things a bit difficult to centrally deploy/administer.

Cloudera: Distro uses HDFS 2.x with MRv1. All open source except cluster
management which is via their proprietary Cloudera Manager tool. It is free
for use without certain feature like auditing and cluster replication
features. Maybe a few more features are restricted to
Enterprise/Licensed-only version. Offers much more features than Ambari. In
terms of cluster administration, I found CM much easier to work with than
Ambari. Pretty much everything, from deploying new nodes to configuration
and troubleshooting, is much more refined than in Ambari.

During the selection process, what I found was that both vendors are very
aggressive in their pitch. So much so that each pushes some FUD regarding
the competition.

HW uses HDFS 1.x + MRv2 while CDH uses HDFS 2.x + MRv1. HW claimed that
Cloudera's distro is heavily patched off course from the core Apache trunk,
which can cause severe data corruption issues. Yes, Cloudera has some 1500+
patches over apache's Hadoop distro but (1) they aren't private patches.
You can pull the list and verify that yourself just as I did. (2) In our
testing and talking to other Cloudera customers, I couldn't find any issues
with data corruption. It is true though that HDFS 2.x is still in beta but
so is MRv2 that HW uses. I think both are stable and work well - depending
on what you need but each uses that point to create FUD.

HW also claimed that a new SQL engine that Cloudera's including in their
distro - Impala is proprietary. Not true. The software is open source. But
if you want support for Impala then Cloudera will charge you separately per
node for Impala over and above what they charge per node for Hadoop support.

In my experience, both products have plenty of issues when it comes to
compute engines - Hive, Pig etc and their cluster management software. HDFS
seem to be solid in both distros. So I wouldn't call either of them
trouble-free and neither is at the maturity level of other popular
enterprise products like say, Oracle. That said, you have to keep in mind
that both vendors/products are successfully used by several customers so
again, it is more a question of what fits your needs.

In the end, we chose to go with Cloudera mostly because a more positive
experience with CM in terms of administration/operations and their
pre-sales team when compared to HW. Again, that said, another team that we
closely work with chose HW for their cluster. I use both vendors/clusters
at work and neither has any significant issues.




On Sat, Sep 14, 2013 at 12:37 PM, Chris Mattmann wrote:

> Here's the deal, folks can post questions to the list that aren't
> abusive and simply asking what the difference between different vendor
> implementations (downstream) of Apache  Hadoop is not an inflammatory
> or abusive question.
>
> Stick to the facts. Discuss it here. Why should the Apache Hadoop
> PMC push off potentially useful questions that may have upstream
> implications to the Apache  Hadoop core and let all the innovation
> occur downstream?
>
> Have the conversations here if you'd like. I wouldn't turn anyone
> away..
>
> My 2c.
>
> Cheers,
> Chris
>
> Original Message-
>
> From: Shahab Yunus 
> Reply-To: "user@hadoop.apache.org" 
> Date: Friday, September 13, 2013 10:48 AM
> To: "user@hadoop.apache.org" 
> Subject: Re: Cloudera Vs Hortonworks Vs MapR
>
> >I think, in my opinion

Resource limits with Hadoop and JVM

2013-09-16 Thread Forrest Aldrich
We recently experienced a couple of situations that brought one or more 
Hadoop nodes down (unresponsive).   One was related to a bug in a 
utility we use (ffmpeg) that was resolved by compiling a new version. 
The next, today, occurred after attempting to join a new node to the 
cluster.


A basic start of the (local) tasktracker and datanode did not work -- so 
based on reference, I issued: hadoop mradmin -refreshNodes, which was to 
be followed by hadoop dfsadmin -refreshNodes. The load average 
literally jumped to 60 and the master (which also runs a slave) became 
unresponsive.


Seems to me that this should never happen.   But, looking around, I saw 
an article from Spotify which mentioned the need to set certain resource 
limits on the JVM as well as in the system itself (limits.conf, we run 
RHEL). I (and we) are fairly new to Hadoop, so some of these issues 
are very new.


I wonder if some of the experts here might be able to comment on this 
issue - perhaps point out settings and other measures we can take to 
prevent this sort of incident in the future.


Our setup is not complicated.   Have 3 hadoop nodes, the first is also a 
master and a slave (has more resources, too).   The underlying system we 
do is split up tasks to ffmpeg  (which is another issue as it tends to 
eat resources, but so far with a recompile, we are good).   We have two 
more hardware nodes to add shortly.



Thanks!


mapred.join package not migrated to mapreduce

2013-09-16 Thread Ivan Balashov
Hi,

Just wondering if there is any particular reason that the 'mapred.join' package
never found its way into 'mapreduce'. Being in the old space makes its use
rather inconvenient when most of its former neighbors now happily live in
the new package.

Is this package recommended for production at all, or is there perhaps an
alternative?

Thanks,

-- 
Ivan Balashov


Re: Cloudera Vs Hortonworks Vs MapR

2013-09-16 Thread Chris Embree
Our evaluation was similar, except we did not consider the "management"
tools any vendor provided, as that's just as much lock-in as any proprietary
tool. What if I want to trade vendors? Do I have to re-tool to use their
management software? Nope; we wrote our own.

Being in a large enterprise, we went with the "perceived" more stable
platform. Draw your own conclusions.



Re: mapred.join package not migrated to mapreduce

2013-09-16 Thread kun yan
The mapred API is not recommended for new use. As I understand it, mapred is
interface-based, while mapreduce is based on abstract classes.




-- 

In the Hadoop world I am just a novice, exploring the entire Hadoop
ecosystem; I hope one day I can contribute my own code.

YanBit
yankunhad...@gmail.com


Re: Cloudera Vs Hortonworks Vs MapR

2013-09-16 Thread M. C. Srivas
So here's an example of marketing FUD at work.

On Mon, Sep 16, 2013 at 3:10 PM, Xuri Nagarin  wrote:

> So I will try to answer the OP's question best I can without deviating too
> much into opinions and stick to facts. Disclaimer: I am not an employee of
> either vendor or any partner of theirs.
>
> Context is important: My team's use case was general data exploration of
> semi-structured log data and we had no typical data-warehouse type of
> existing use cases. Also, our's is a small (less than 30 nodes cluster). In
> terms of ops/maintenance, we only have one person. I point this out because
> lots of hadoop shops have dedicated team for each - OS administration,
> Hadoop admin, Hadoop developers. And, they are very mature in terms of
> their compute use cases. To my mind, these aspects can significantly impact
> your vendor choices.
>
> MapR: My team simply did not consider them because of all the proprietary
> code in there. We are trying to move from a monolithic proprietary product
> and one of the criteria we set was - if we decided to move away from the
> chosen hadoop vendor, can we easily unlock our data?
>

Unlock your data? How about distcp? Or just "cp"?

The fact is there are 10x more standard ways to access your data in a MapR
cluster versus a Cloudera or Hortonworks cluster.

MapR is entirely open source, with proprietary add-ons, just like Cloudera
or Hortonworks.

The difference is MapR has innovated both above and below the Hadoop stack,
while Cloudera and Horton have only done so above the stack. MapR's
innovations have set the bar so high that its competition likes to spread
FUD.

[disclaimer: I work for MapR ]



> HortonWorks: Distro uses HDFS 1.x with MRv2. All open source. Cluster
> management is via Ambari. Compared to Cloudera's CM, Ambari has very
> rudimentary features. But you have to keep in mind that Ambari is only an
> year old where as CM already has been under development for several years.
> This was a major selection factor for us because Ambari did not have all
> the automation/feature-set compared to CM for a single
> administrator/developer to easily maintain the cluster. Also, during the
> trial period, Hortonwork's packing format/structure apparently kept
> changing which made things a bit difficult to centrally deploy/administer.
>
> Cloudera: Distro uses HDFS 2.x with MRv1. All open source except cluster
> management which is via their proprietary Cloudera Manager tool. It is free
> for use without certain feature like auditing and cluster replication
> features. Maybe a few more features are restricted to
> Enterprise/Licensed-only version. Offers much more features than Ambari. In
> terms of cluster administration, I found CM much easy to work with than
> Ambari. Pretty much all aspects from deploying new nodes to configuration
> and troubleshooting is much more refined than Ambari.
>
> During the selection process, what I found was that both vendors are very
> aggressive in their pitch. So much so that each pushes some FUD regarding
> the competition.
>

Obviously some of it worked, given some of the statements earlier.



>
> HW uses HDFS 1.x + MRv2 while CDH uses HDFS 2.x + MRv1. HW claimed that
> Cloudera's distro is heavily patched off-course from the core Apache trunk
> that can cause severe data corruption issues. Yes, Cloudera has some 1500+
> patches over apache's Hadoop distro but (1) they aren't private patches.
> You can pull the list and verify that yourself just as I did. (2) In our
> testing and talking to other Cloudera customers, I couldn't find any issues
> with data corruption. It is true though that HDFS 2.x is still in beta but
> so is MRv2 that HW uses. I think both are stable and work well - depending
> on what you need but each uses that point to create FUD.
>
> HW also claimed that a new SQL engine that Cloudera's including in their
> distro - Impala is proprietary. Not true. The software is open source. But
> if you want support for Impala then Cloudera will charge you separately per
> node for Impala over and above what they charge per node for Hadoop support.
>
> In my experience, both products have plenty of issues when it comes to
> compute engines - Hive, Pig etc and their cluster management software. HDFS
> seem to be solid in both distros. So I wouldn't call either of them
> trouble-free and neither is at the maturity level of other popular
> enterprise products like say, Oracle. That said, you have to keep in mind
> that both vendors/products are successfully used by several customers so
> again, it is more a question of what fits your needs.
>
> In the end, we chose to go with Cloudera mostly because a more positive
> experience with CM in terms of administration/operations and their
> pre-sales team when compared to HW. Again, that said, another team that we
> closely work with chose HW for their cluster. I use both vendors/clusters
> at work and neither has any significant issues.
>
>
>
>
> On Sat, Sep 14, 2013 at 12:37 PM,