Re: AWS Setting for setting up Hadoop cluster

2015-03-05 Thread Jonathan Aquilina
 

I have experience using EMR at my full-time job, and the damn thing is quick
and cheap. The interesting part is wrapping your head around the concepts. If
you need things up and running fast, EMR is the way to go. It spins up a number
of EC2 instances for you.

By default you get 1 master and 2 core nodes. All three are m3.large nodes,
which run you about 7 cents per hour each. To run one year's worth of data,
which is about 1.1 billion records from the database, it took 50 minutes from
cluster spin-up to job completion and shutdown of the cluster.
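
For anyone who wants to reproduce that kind of transient run, here is a minimal
sketch of launching a small EMR cluster from the AWS CLI. It is not the exact
job described above; the cluster name, key pair, log bucket, AMI version and
instance type are placeholders to adapt to your own account (check
`aws emr create-cluster help` for the options your CLI version supports):

# Hedged sketch: 1 master + 2 core nodes, shutting itself down when the step finishes.
aws emr create-cluster \
  --name "test-cluster" \
  --ami-version 3.3.1 \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --ec2-attributes KeyName=my-keypair \
  --log-uri s3://my-log-bucket/emr/ \
  --auto-terminate \
  --steps Type=STREAMING,Name="count",Args=[-input,s3://my-bucket/input,-output,s3://my-bucket/output,-mapper,/bin/cat,-reducer,/usr/bin/wc]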

---
Regards,
Jonathan Aquilina
Founder Eagle Eye T

On 2015-03-05 23:41, Dieter De Witte wrote: 

 You can install Hadoop on Amazon EC2 instances and use the free tier for new 
 members but you can also use Amazon EMR which is not free but is up and 
 running in a couple of seconds... 
 
 2015-03-05 23:28 GMT+01:00 Krish Donald gotomyp...@gmail.com:
 
 Hi, 
 
 I am tired of setting Hadoop cluster using my laptop which has 8GB RAM. 
 I tried 2gb for namenode and 1-1 gb for 3 datanoded so total 5gb I was using 
 . 
 And I was using very basic Hadoop services only. 
 But it is so slow that I am not able to do anything on that. 
 
 Hence I would like to try the AWS service now. 
 
 Can anybody please help me, which configuration I should use it without 
 paying at all? 
 What are the tips you have for AWS ? 
 
 Thanks 
 Krish
 

Re: AWS Setting for setting up Hadoop cluster

2015-03-05 Thread Dieter De Witte
 You can install Hadoop on Amazon EC2 instances and use the free tier for
new members, but you can also use Amazon EMR, which is not free but is up and
running in a couple of seconds...
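
As a hedged illustration of the free-tier route (not from Dieter's message):
t2.micro is the size covered by the free tier, and a handful can be launched
from the CLI roughly like this; the AMI ID, key pair and security group are
placeholders for your own region and account:

# Four free-tier-eligible t2.micro instances (1 vCPU, 1 GB RAM each).
aws ec2 run-instances \
  --image-id ami-xxxxxxxx \
  --instance-type t2.micro \
  --count 4 \
  --key-name my-keypair \
  --security-group-ids sg-xxxxxxxx

Note that the free tier only covers 750 t2.micro instance-hours per month, so
several nodes running at the same time eat into that allowance quickly.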

2015-03-05 23:28 GMT+01:00 Krish Donald gotomyp...@gmail.com:

 Hi,

 I am tired of setting Hadoop cluster using my laptop which has 8GB RAM.
 I tried 2gb for namenode and 1-1 gb for 3 datanoded so total 5gb I was
 using .
 And I was using very basic Hadoop services only.
 But it is so slow that I am not able to do anything on that.

 Hence I would like to try the AWS service now.

 Can anybody please help me, which configuration I should use it without
 paying at all?
 What are the tips you have for AWS ?

 Thanks
 Krish



Re: AWS Setting for setting up Hadoop cluster

2015-03-05 Thread Krish Donald
Because I am new to AWS, I would like to explore the free service first; later
I can use EMR.
Which EC2 option is fast and free too?

Thanks


On Thu, Mar 5, 2015 at 2:47 PM, Jonathan Aquilina jaquil...@eagleeyet.net
wrote:

  I have experience with my full time job using EMR damn thing is quick
 and cheap. The interesting part is wrapping your head around the concepts.
 If you need things quickly and fast EMR is the way to go. It spawns up a
 number of ec2 instances

 by default you have 1 master and 2 core nodes. The three of them are
 m3.large nodes which run you 7 cents per hour. to run one years with of
 data which is about 1.1 billion records from the database it took 50 min
 from cluster spawn up to completion and shutting down of the cluster.



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

  On 2015-03-05 23:41, Dieter De Witte wrote:

  You can install Hadoop on Amazon EC2 instances and use the free tier for
 new members but you can also use Amazon EMR which is not free but is up and
 running in a couple of seconds...

 2015-03-05 23:28 GMT+01:00 Krish Donald gotomyp...@gmail.com:

  Hi,

 I am tired of setting Hadoop cluster using my laptop which has 8GB RAM.
 I tried 2gb for namenode and 1-1 gb for 3 datanoded so total 5gb I was
 using .
 And I was using very basic Hadoop services only.
 But it is so slow that I am not able to do anything on that.

 Hence I would like to try the AWS service now.

 Can anybody please help me, which configuration I should use it without
 paying at all?
 What are the tips you have for AWS ?

 Thanks
 Krish




Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?

2015-03-05 Thread Jonathan Aquilina
 

Krish, EMR won't cost you much. With all the testing and data we ran
through the test systems, as well as the large amount of data when everything
was read, we paid about 15.00 USD. I honestly do not think that the specs
there would be enough, as Java can be pretty RAM hungry.

---
Regards,
Jonathan Aquilina
Founder Eagle Eye T

On 2015-03-06 00:41, Krish Donald wrote: 

 Hi, 
 
 I am new to AWS and would like to setup Hadoop cluster using cloudera manager 
 for 6-7 nodes. 
 
 t2.micro on AWS; Is it enough for setting up Hadoop cluster ? 
 I would like to use free service as of now. 
 
 Please advise. 
 
 Thanks 
 Krish
 

Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?

2015-03-05 Thread Krish Donald
Thanks Jonathan,

I will try to explore the EMR option also.
Can you please let me know the configuration which you have used?
Can you please recommend one for me also?
I would like to set up a Hadoop cluster using Cloudera Manager and then would
like to do the things below:

setup kerberos
setup federation
setup monitoring
setup HA/DR
backup and recovery
authorization using Sentry
backup and recovery of individual components
performance tuning
upgrade of CDH
upgrade of CM
Hue user administration
Spark
Solr


Thanks
Krish


On Thu, Mar 5, 2015 at 3:57 PM, Jonathan Aquilina jaquil...@eagleeyet.net
wrote:

  krish EMR wont cost you much with all the testing and data we ran
 through the test systems as well as the large amont of data when everythign
 was read we paid about 15.00 USD. I honestly do not think that the specs
 there would be enough as java can be pretty ram hungry.



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

  On 2015-03-06 00:41, Krish Donald wrote:

  Hi,

 I am new to AWS and would like to setup Hadoop cluster using cloudera
 manager for 6-7 nodes.

 t2.micro on AWS; Is it enough for setting up Hadoop cluster ?
 I would like to use free service as of now.

 Please advise.

 Thanks
 Krish




Re: AWS Setting for setting up Hadoop cluster

2015-03-05 Thread Jonathan Aquilina
 

The advantage of EMR is that you don't have to spend time screwing around with
installing Hadoop; it does all that for you, so you are ready to go.

---
Regards,
Jonathan Aquilina
Founder Eagle Eye T

On 2015-03-05 23:51, Krish Donald wrote: 

 Because I am new to AWS, I would like to explore the free service first and 
 then later I can use EMR. 
 Which one is fast in EC2 and free too? 
 
 Thanks 
 
 On Thu, Mar 5, 2015 at 2:47 PM, Jonathan Aquilina jaquil...@eagleeyet.net 
 wrote:
 
 I have experience with my full time job using EMR damn thing is quick and 
 cheap. The interesting part is wrapping your head around the concepts. If you 
 need things quickly and fast EMR is the way to go. It spawns up a number of 
 ec2 instances 
 
 by default you have 1 master and 2 core nodes. The three of them are m3.large 
 nodes which run you 7 cents per hour. to run one years with of data which is 
 about 1.1 billion records from the database it took 50 min from cluster spawn 
 up to completion and shutting down of the cluster. 
 
 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T
 
 On 2015-03-05 23:41, Dieter De Witte wrote: 
 You can install Hadoop on Amazon EC2 instances and use the free tier for new 
 members but you can also use Amazon EMR which is not free but is up and 
 running in a couple of seconds... 
 
 2015-03-05 23:28 GMT+01:00 Krish Donald gotomyp...@gmail.com:
 
 Hi, 
 
 I am tired of setting Hadoop cluster using my laptop which has 8GB RAM. 
 I tried 2gb for namenode and 1-1 gb for 3 datanoded so total 5gb I was using 
 . 
 And I was using very basic Hadoop services only. 
 But it is so slow that I am not able to do anything on that. 
 
 Hence I would like to try the AWS service now. 
 
 Can anybody please help me, which configuration I should use it without 
 paying at all? 
 What are the tips you have for AWS ? 
 
 Thanks 
 Krish
 

AWS Setting for setting up Hadoop cluster

2015-03-05 Thread Krish Donald
Hi,

I am tired of setting up a Hadoop cluster on my laptop, which has 8GB RAM.
I tried 2 GB for the namenode and 1 GB each for 3 datanodes, so in total I was
using 5 GB, and I was running only very basic Hadoop services.
But it is so slow that I am not able to do anything on it.

Hence I would like to try the AWS service now.

Can anybody please tell me which configuration I should use without paying at
all?
What tips do you have for AWS?

Thanks
Krish


t2.micro on AWS; Is it enough for setting up Hadoop cluster ?

2015-03-05 Thread Krish Donald
Hi,

I am new to AWS and would like to set up a Hadoop cluster of 6-7 nodes using
Cloudera Manager.

Is a t2.micro on AWS enough for setting up a Hadoop cluster?
I would like to use the free service for now.

Please advise.

Thanks
Krish


Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?

2015-03-05 Thread Alexander Pivovarov
What about DNS?
If you have 2 computers (nn and dn), how does the nn know the dn's IP?

The script puts only this computer's IP into /etc/hosts.

On Thu, Mar 5, 2015 at 6:39 PM, max scalf oracle.bl...@gmail.com wrote:

 Here is an easy way to go about assigning a static name to your EC2 instance.
 When you launch an EC2 instance from the AWS console and get to the screen for
 selecting the VPC and IP address, there is a field that says USER DATA. Put the
 script below in with the appropriate host name (change CHANGE_HOST_NAME_HERE to
 whatever you want) and that should get you a static name.

 #!/bin/bash

 HOSTNAME_TAG=CHANGE_HOST_NAME_HERE

 cat > /etc/sysconfig/network << EOF
 NETWORKING=yes
 NETWORKING_IPV6=no
 HOSTNAME=${HOSTNAME_TAG}
 EOF

 IP=$(curl http://169.254.169.254/latest/meta-data/local-ipv4)
 echo "${IP} ${HOSTNAME_TAG}.localhost ${HOSTNAME_TAG}" >> /etc/hosts

 echo ${HOSTNAME_TAG} > /proc/sys/kernel/hostname
 service network restart


 Also note I was able to do this on a couple of spot instances for a cheap
 price. The only thing is that once you shut it down, or someone outbids you,
 you lose that instance, but it's easy and cheap to play around with. I have
 used a couple of m3.medium instances for my NN/SNN and a couple of them for
 data nodes...

 On Thu, Mar 5, 2015 at 7:19 PM, Jonathan Aquilina jaquil...@eagleeyet.net
  wrote:

  I dont know how you would do that to be honest. With EMR you have
 destinctions master core and task nodes. If you need to change
 configuration you just ssh into the EMR master node.



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

  On 2015-03-06 02:11, Alexander Pivovarov wrote:

 What is the easiest way to assign names to aws ec2 computers?
 I guess computer need static hostname and dns name before it can be used
 in hadoop cluster.
 On Mar 5, 2015 4:36 PM, Jonathan Aquilina jaquil...@eagleeyet.net
 wrote:

  When I started with EMR it was alot of testing and trial and error.
 HUE is already supported as something that can be installed from the AWS
 console. What I need to know is if you need this cluster on all the time or
 this is goign ot be what amazon call a transient cluster. Meaning you fire
 it up run the job and tear it back down.



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

  On 2015-03-06 01:10, Krish Donald wrote:

  Thanks Jonathan,

 I will try to explore EMR option also.
 Can you please let me know the configuration which you have used it?
 Can you please recommend for me also?
 I would like to setup Hadoop cluster using cloudera manager and then
 would like to do below things:

 setup kerberos
 setup federation
 setup monitoring
 setup hadr
 backup and recovery
 authorization using sentry
 backup and recovery of individual componenets
 performamce tuning
 upgrade of cdh
 upgrade of CM
 Hue User Administration
 Spark
 Solr


 Thanks
 Krish


 On Thu, Mar 5, 2015 at 3:57 PM, Jonathan Aquilina 
 jaquil...@eagleeyet.net wrote:

  krish EMR wont cost you much with all the testing and data we ran
 through the test systems as well as the large amont of data when everythign
 was read we paid about 15.00 USD. I honestly do not think that the specs
 there would be enough as java can be pretty ram hungry.



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

   On 2015-03-06 00:41, Krish Donald wrote:

  Hi,

 I am new to AWS and would like to setup Hadoop cluster using cloudera
 manager for 6-7 nodes.

 t2.micro on AWS; Is it enough for setting up Hadoop cluster ?
 I would like to use free service as of now.

 Please advise.

 Thanks
 Krish





Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?

2015-03-05 Thread Alexander Pivovarov
I think EMR has its own limitations.

E.g. I want to set up hadoop 2.6.0 with kerberos + hive-1.2.0 to test my
hive patch.

How can EMR help me? It supports hadoop only up to 2.4.0 (not even 2.4.1):
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-hadoop-version.html







On Thu, Mar 5, 2015 at 9:51 PM, Jonathan Aquilina jaquil...@eagleeyet.net
wrote:

  Hi guys, I know you want to keep costs down, but why go through all the
 effort of setting up EC2 instances yourself? When you deploy EMR it takes the
 time to provision and set up the EC2 instances for you. All configuration for
 the entire cluster is then done on the master node of that cluster, and setting
 up additional software is all done through the EMR console. We were doing some
 geospatial calculations and we loaded a 3rd-party jar file called esri into the
 EMR cluster. I then had to pass a small bootstrap action (script) to have it
 distribute esri to the entire cluster.

 Why are you guys reinventing the wheel?
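
 For reference, a bootstrap action like the one mentioned above is just a shell
 script stored in S3 that EMR runs on every node while the cluster is
 provisioning. This is only a hedged sketch; the bucket, jar name and target
 directory are placeholders, not the actual esri setup, and it assumes the AWS
 CLI is available on the node image:

 #!/bin/bash
 # copy-extra-jar.sh -- hypothetical bootstrap action, executed on each node.
 # Pull the third-party jar from S3 and drop it where Hadoop jobs can see it.
 aws s3 cp s3://my-bucket/jars/esri-geometry-api.jar /home/hadoop/lib/

 It would then be attached at launch with something like
 --bootstrap-actions Path=s3://my-bucket/bootstrap/copy-extra-jar.sh
 on aws emr create-cluster.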



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

  On 2015-03-06 03:35, Alexander Pivovarov wrote:

I found the following solution to this problem

 I registered 2 subdomains  (public and local) for each computer on
 https://freedns.afraid.org/subdomain/
 e.g.
 myhadoop-nn.crabdance.com
 myhadoop-nn-local.crabdance.com

 then I added cron job which sends http requests to update public and local
 ip on freedns server
 hint: public ip is detected automatically
 ip address for local name can be set using request parameter address=10.x.x.x
 (don't forget to escape )

 as a result my nn computer has 2 DNS names with currently assigned ip
 addresses , e.g.
 myhadoop-nn.crabdance.com  54.203.181.177
 myhadoop-nn-local.crabdance.com   10.220.149.103

 in hadoop configuration I can use local machine names
 to access my cluster outside of AWS I can use public names

 Just curious if AWS provides easier way to name EC2 computers?

 On Thu, Mar 5, 2015 at 5:19 PM, Jonathan Aquilina jaquil...@eagleeyet.net
  wrote:

  I dont know how you would do that to be honest. With EMR you have
 destinctions master core and task nodes. If you need to change
 configuration you just ssh into the EMR master node.



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

   On 2015-03-06 02:11, Alexander Pivovarov wrote:

 What is the easiest way to assign names to aws ec2 computers?
 I guess computer need static hostname and dns name before it can be used
 in hadoop cluster.
 On Mar 5, 2015 4:36 PM, Jonathan Aquilina jaquil...@eagleeyet.net
 wrote:

  When I started with EMR it was alot of testing and trial and error.
 HUE is already supported as something that can be installed from the AWS
 console. What I need to know is if you need this cluster on all the time or
 this is goign ot be what amazon call a transient cluster. Meaning you fire
 it up run the job and tear it back down.



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

  On 2015-03-06 01:10, Krish Donald wrote:

  Thanks Jonathan,

 I will try to explore EMR option also.
 Can you please let me know the configuration which you have used it?
 Can you please recommend for me also?
 I would like to setup Hadoop cluster using cloudera manager and then
 would like to do below things:

 setup kerberos
 setup federation
 setup monitoring
 setup hadr
 backup and recovery
 authorization using sentry
 backup and recovery of individual componenets
 performamce tuning
 upgrade of cdh
 upgrade of CM
 Hue User Administration
 Spark
 Solr


 Thanks
 Krish


 On Thu, Mar 5, 2015 at 3:57 PM, Jonathan Aquilina 
 jaquil...@eagleeyet.net wrote:

  krish EMR wont cost you much with all the testing and data we ran
 through the test systems as well as the large amont of data when everythign
 was read we paid about 15.00 USD. I honestly do not think that the specs
 there would be enough as java can be pretty ram hungry.



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

   On 2015-03-06 00:41, Krish Donald wrote:

  Hi,

 I am new to AWS and would like to setup Hadoop cluster using cloudera
 manager for 6-7 nodes.

 t2.micro on AWS; Is it enough for setting up Hadoop cluster ?
 I would like to use free service as of now.

 Please advise.

 Thanks
 Krish




Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?

2015-03-05 Thread Jonathan Aquilina
 

The only limitations I know of are how many nodes you can have and how many
instances of that particular size the underlying hosts can support. You can
load Hive in EMR, and any other features of the cluster are managed at the
master node level, as you have SSH access there.

What are the advantages of 2.6 over 2.4, for example?

I just feel you guys are reinventing the wheel when Amazon already caters for
Hadoop, granted it might not be 2.6.

---
Regards,
Jonathan Aquilina
Founder Eagle Eye T

On 2015-03-06 07:31, Alexander Pivovarov wrote: 

 I think EMR has its own limitation
 
 e.g. I want to setup hadoop 2.6.0 with kerberos + hive-1.2.0 to test my hive 
 patch. How EMR can help me? it supports hadoop up to 2.4.0 (not even 2.4.1)
 http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-hadoop-version.html
  [1]
 
 On Thu, Mar 5, 2015 at 9:51 PM, Jonathan Aquilina jaquil...@eagleeyet.net 
 wrote:
 
 Hi guys I know you guys want to keep costs down, but why go through all the 
 effort to setup ec2 instances when you deploy EMR it takes the time to 
 provision and setup the ec2 instances for you. All configuration then for the 
 entire cluster is done on the master node of the particular cluster or 
 setting up of additional software that is all done through the EMR console. 
 We were doing some geospatial calculations and we loaded a 3rd party jar file 
 called esri into the EMR cluster. I then had to pass a small bootstrap action 
 (script) to have it distribute esri to the entire cluster. 
 
 Why are you guys reinventing the wheel? 
 
 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T
 
 On 2015-03-06 03:35, Alexander Pivovarov wrote: 
 
 I found the following solution to this problem
 
 I registered 2 subdomains (public and local) for each computer on 
 https://freedns.afraid.org/subdomain/ [2] 
 e.g. 
 myhadoop-nn.crabdance.com [3]
 myhadoop-nn-local.crabdance.com [4] 
 then I added cron job which sends http requests to update public and local ip 
 on freedns server hint: public ip is detected automatically ip address for 
 local name can be set using request parameter address=10.x.x.x (don't forget 
 to escape )
 
 as a result my nn computer has 2 DNS names with currently assigned ip 
 addresses , e.g.
 myhadoop-nn.crabdance.com [3] 54.203.181.177
 myhadoop-nn-local.crabdance.com [4] 10.220.149.103
 
 in hadoop configuration I can use local machine names to access my cluster 
 outside of AWS I can use public names
 
 Just curious if AWS provides easier way to name EC2 computers?
 
 On Thu, Mar 5, 2015 at 5:19 PM, Jonathan Aquilina jaquil...@eagleeyet.net 
 wrote:
 
 I dont know how you would do that to be honest. With EMR you have 
 destinctions master core and task nodes. If you need to change configuration 
 you just ssh into the EMR master node. 
 
 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T
 
 On 2015-03-06 02:11, Alexander Pivovarov wrote: 
 
 What is the easiest way to assign names to aws ec2 computers?
 I guess computer need static hostname and dns name before it can be used in 
 hadoop cluster. 
 On Mar 5, 2015 4:36 PM, Jonathan Aquilina jaquil...@eagleeyet.net wrote:
 
 When I started with EMR it was alot of testing and trial and error. HUE is 
 already supported as something that can be installed from the AWS console. 
 What I need to know is if you need this cluster on all the time or this is 
 goign ot be what amazon call a transient cluster. Meaning you fire it up run 
 the job and tear it back down. 
 
 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T
 
 On 2015-03-06 01:10, Krish Donald wrote: 
 
 Thanks Jonathan, 
 
 I will try to explore EMR option also. 
 Can you please let me know the configuration which you have used it? 
 Can you please recommend for me also? 
 I would like to setup Hadoop cluster using cloudera manager and then would 
 like to do below things: 
 
 setup kerberos
 setup federation
 setup monitoring
 setup hadr
 backup and recovery
 authorization using sentry
 backup and recovery of individual componenets
 performamce tuning
 upgrade of cdh 
 upgrade of CM
 Hue User Administration 
 Spark 
 Solr 
 
 Thanks 
 Krish 
 
 On Thu, Mar 5, 2015 at 3:57 PM, Jonathan Aquilina jaquil...@eagleeyet.net 
 wrote:
 
 krish EMR wont cost you much with all the testing and data we ran through the 
 test systems as well as the large amont of data when everythign was read we 
 paid about 15.00 USD. I honestly do not think that the specs there would be 
 enough as java can be pretty ram hungry. 
 
 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T
 
 On 2015-03-06 00:41, Krish Donald wrote: 
 
 Hi, 
 
 I am new to AWS and would like to setup Hadoop cluster using cloudera manager 
 for 6-7 nodes. 
 
 t2.micro on AWS; Is it enough for setting up Hadoop cluster ? 
 I would like to use free service as of now. 
 
 Please advise. 
 
 Thanks 
 Krish
 


Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?

2015-03-05 Thread daemeon reiydelle
Do a reverse lookup and use the name you find. There are a few areas
of Hadoop that require reverse name lookup, but in general just
create relevant entries (shared across the cluster, e.g. via Ansible
if more than just a few nodes) in /etc/hosts.

Not hard.
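
A hedged sketch of what "shared across the cluster" can look like without any
extra tooling: build one hosts fragment and append it on every node over SSH.
The IPs, names, user and key file below are placeholders:

#!/bin/bash
# Hypothetical helper: push the same host entries to /etc/hosts on every node.
cat > /tmp/cluster-hosts <<'EOF'
10.0.0.10 nn
10.0.0.11 snn
10.0.0.21 dn1
10.0.0.22 dn2
EOF

for host in 10.0.0.10 10.0.0.11 10.0.0.21 10.0.0.22; do
  # assumes key-based SSH and passwordless sudo on each node
  ssh -i ~/.ssh/my-keypair.pem ec2-user@$host \
      "cat | sudo tee -a /etc/hosts > /dev/null" < /tmp/cluster-hosts
done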


On Thu, Mar 5, 2015 at 6:35 PM, Alexander Pivovarov
apivova...@gmail.com wrote:
 I found the following solution to this problem

 I registered 2 subdomains  (public and local) for each computer on
 https://freedns.afraid.org/subdomain/
 e.g.
 myhadoop-nn.crabdance.com
 myhadoop-nn-local.crabdance.com

 then I added cron job which sends http requests to update public and local
 ip on freedns server
 hint: public ip is detected automatically
 ip address for local name can be set using request parameter
 address=10.x.x.x   (don't forget to escape )

 as a result my nn computer has 2 DNS names with currently assigned ip
 addresses , e.g.
 myhadoop-nn.crabdance.com  54.203.181.177
 myhadoop-nn-local.crabdance.com   10.220.149.103

 in hadoop configuration I can use local machine names
 to access my cluster outside of AWS I can use public names

 Just curious if AWS provides easier way to name EC2 computers?

 On Thu, Mar 5, 2015 at 5:19 PM, Jonathan Aquilina jaquil...@eagleeyet.net
 wrote:

 I dont know how you would do that to be honest. With EMR you have
 destinctions master core and task nodes. If you need to change configuration
 you just ssh into the EMR master node.



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

 On 2015-03-06 02:11, Alexander Pivovarov wrote:

 What is the easiest way to assign names to aws ec2 computers?
 I guess computer need static hostname and dns name before it can be used
 in hadoop cluster.

 On Mar 5, 2015 4:36 PM, Jonathan Aquilina jaquil...@eagleeyet.net
 wrote:

 When I started with EMR it was alot of testing and trial and error. HUE
 is already supported as something that can be installed from the AWS
 console. What I need to know is if you need this cluster on all the time or
 this is goign ot be what amazon call a transient cluster. Meaning you fire
 it up run the job and tear it back down.



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

 On 2015-03-06 01:10, Krish Donald wrote:

 Thanks Jonathan,

 I will try to explore EMR option also.
 Can you please let me know the configuration which you have used it?
 Can you please recommend for me also?
 I would like to setup Hadoop cluster using cloudera manager and then
 would like to do below things:

 setup kerberos
 setup federation
 setup monitoring
 setup hadr
 backup and recovery
 authorization using sentry
 backup and recovery of individual componenets
 performamce tuning
 upgrade of cdh
 upgrade of CM
 Hue User Administration
 Spark
 Solr


 Thanks
 Krish


 On Thu, Mar 5, 2015 at 3:57 PM, Jonathan Aquilina
 jaquil...@eagleeyet.net wrote:

 krish EMR wont cost you much with all the testing and data we ran
 through the test systems as well as the large amont of data when everythign
 was read we paid about 15.00 USD. I honestly do not think that the specs
 there would be enough as java can be pretty ram hungry.



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

 On 2015-03-06 00:41, Krish Donald wrote:

 Hi,

 I am new to AWS and would like to setup Hadoop cluster using cloudera
 manager for 6-7 nodes.

 t2.micro on AWS; Is it enough for setting up Hadoop cluster ?
 I would like to use free service as of now.

 Please advise.

 Thanks
 Krish




Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?

2015-03-05 Thread Alexander Pivovarov
What is the easiest way to assign names to AWS EC2 computers?
I guess a computer needs a static hostname and DNS name before it can be used
in a hadoop cluster.
On Mar 5, 2015 4:36 PM, Jonathan Aquilina jaquil...@eagleeyet.net wrote:

  When I started with EMR it was a lot of testing and trial and error. HUE
 is already supported as something that can be installed from the AWS console.
 What I need to know is whether you need this cluster on all the time, or
 whether this is going to be what Amazon calls a transient cluster, meaning you
 fire it up, run the job and tear it back down.



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

  On 2015-03-06 01:10, Krish Donald wrote:

  Thanks Jonathan,

 I will try to explore EMR option also.
 Can you please let me know the configuration which you have used it?
 Can you please recommend for me also?
 I would like to setup Hadoop cluster using cloudera manager and then would
 like to do below things:

 setup kerberos
 setup federation
 setup monitoring
 setup hadr
 backup and recovery
 authorization using sentry
 backup and recovery of individual componenets
 performamce tuning
 upgrade of cdh
 upgrade of CM
 Hue User Administration
 Spark
 Solr


 Thanks
 Krish


 On Thu, Mar 5, 2015 at 3:57 PM, Jonathan Aquilina jaquil...@eagleeyet.net
  wrote:

  krish EMR wont cost you much with all the testing and data we ran
 through the test systems as well as the large amont of data when everythign
 was read we paid about 15.00 USD. I honestly do not think that the specs
 there would be enough as java can be pretty ram hungry.



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

   On 2015-03-06 00:41, Krish Donald wrote:

  Hi,

 I am new to AWS and would like to setup Hadoop cluster using cloudera
 manager for 6-7 nodes.

 t2.micro on AWS; Is it enough for setting up Hadoop cluster ?
 I would like to use free service as of now.

 Please advise.

 Thanks
 Krish




Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?

2015-03-05 Thread max scalf
Here is an easy way to go about assigning a static name to your EC2 instance.
When you launch an EC2 instance from the AWS console and get to the screen for
selecting the VPC and IP address, there is a field that says USER DATA. Put the
script below in with the appropriate host name (change CHANGE_HOST_NAME_HERE to
whatever you want) and that should get you a static name.

#!/bin/bash

HOSTNAME_TAG=CHANGE_HOST_NAME_HERE

# Persist the hostname across reboots.
cat > /etc/sysconfig/network << EOF
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=${HOSTNAME_TAG}
EOF

# Map the instance's private IP to the chosen name in /etc/hosts.
IP=$(curl http://169.254.169.254/latest/meta-data/local-ipv4)
echo "${IP} ${HOSTNAME_TAG}.localhost ${HOSTNAME_TAG}" >> /etc/hosts

# Apply the hostname immediately.
echo ${HOSTNAME_TAG} > /proc/sys/kernel/hostname
service network restart


Also note I was able to do this on a couple of spot instances for a cheap
price. The only thing is that once you shut it down, or someone outbids you,
you lose that instance, but it's easy and cheap to play around with. I have
used a couple of m3.medium instances for my NN/SNN and a couple of them for
data nodes...
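
Once an instance booted with that user data is up, a quick hedged way to check
that the script actually ran (the metadata URL is the standard EC2 one; the
hostname is whatever was substituted for CHANGE_HOST_NAME_HERE):

# Show the user data the instance was launched with, and the resulting name.
curl -s http://169.254.169.254/latest/user-data
hostname
grep "$(hostname)" /etc/hosts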

On Thu, Mar 5, 2015 at 7:19 PM, Jonathan Aquilina jaquil...@eagleeyet.net
wrote:

  I dont know how you would do that to be honest. With EMR you have
 destinctions master core and task nodes. If you need to change
 configuration you just ssh into the EMR master node.



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

  On 2015-03-06 02:11, Alexander Pivovarov wrote:

 What is the easiest way to assign names to aws ec2 computers?
 I guess computer need static hostname and dns name before it can be used
 in hadoop cluster.
 On Mar 5, 2015 4:36 PM, Jonathan Aquilina jaquil...@eagleeyet.net
 wrote:

  When I started with EMR it was alot of testing and trial and error. HUE
 is already supported as something that can be installed from the AWS
 console. What I need to know is if you need this cluster on all the time or
 this is goign ot be what amazon call a transient cluster. Meaning you fire
 it up run the job and tear it back down.



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

  On 2015-03-06 01:10, Krish Donald wrote:

  Thanks Jonathan,

 I will try to explore EMR option also.
 Can you please let me know the configuration which you have used it?
 Can you please recommend for me also?
 I would like to setup Hadoop cluster using cloudera manager and then
 would like to do below things:

 setup kerberos
 setup federation
 setup monitoring
 setup hadr
 backup and recovery
 authorization using sentry
 backup and recovery of individual componenets
 performamce tuning
 upgrade of cdh
 upgrade of CM
 Hue User Administration
 Spark
 Solr


 Thanks
 Krish


 On Thu, Mar 5, 2015 at 3:57 PM, Jonathan Aquilina 
 jaquil...@eagleeyet.net wrote:

  krish EMR wont cost you much with all the testing and data we ran
 through the test systems as well as the large amont of data when everythign
 was read we paid about 15.00 USD. I honestly do not think that the specs
 there would be enough as java can be pretty ram hungry.



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

   On 2015-03-06 00:41, Krish Donald wrote:

  Hi,

 I am new to AWS and would like to setup Hadoop cluster using cloudera
 manager for 6-7 nodes.

 t2.micro on AWS; Is it enough for setting up Hadoop cluster ?
 I would like to use free service as of now.

 Please advise.

 Thanks
 Krish




Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?

2015-03-05 Thread Alexander Pivovarov
OK, how can we easily put all the hadoop computer names and IPs into /etc/hosts
on all computers?
Do you have a script? Or do I need to manually go to each computer, get its IP,
put it in /etc/hosts and then distribute /etc/hosts to all machines?

Don't you think the one-time effort to configure freedns is easier?
The freedns solution works with AWS spot instances as well.

You need to create a snapshot after you configure freedns, hadoop, etc. on a
particular box.
Next time you need a computer you can go to your saved snapshots and create a
spot instance from it.
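
For what that cron job can look like: freedns.afraid.org gives every subdomain
its own update URL containing a private token, and, as noted above, the local
record can be pointed at the private IP with an address parameter. A hedged
sketch with placeholder tokens, added via crontab -e on each node:

# Public record: freedns detects the caller's public IP automatically.
*/5 * * * * curl -s "https://freedns.afraid.org/dynamic/update.php?PUBLIC_RECORD_TOKEN"
# Local record: pass the instance's private IP explicitly via the address parameter.
*/5 * * * * curl -s "https://freedns.afraid.org/dynamic/update.php?LOCAL_RECORD_TOKEN&address=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)"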


On Thu, Mar 5, 2015 at 6:54 PM, max scalf oracle.bl...@gmail.com wrote:

 unfortunately without DNS you have to rely on /etc/hosts, so put in entry
 for all your nodes(nn,snn,dn1,dn2 etc..) on all nodes(/etc/hosts file) and
 i have that tested for hortonworks(using ambari) and cloudera manager and i
 am certainly sure it will work for MapR

 On Thu, Mar 5, 2015 at 8:47 PM, Alexander Pivovarov apivova...@gmail.com
 wrote:

 what about DNS?
 if you have 2 computers (nn and dn) how nn knows dn ip?

 The script puts only this computer ip to /etc/hosts

 On Thu, Mar 5, 2015 at 6:39 PM, max scalf oracle.bl...@gmail.com wrote:

  Here is an easy way to go about assigning a static name to your EC2 instance.
 When you launch an EC2 instance from the AWS console and get to the screen for
 selecting the VPC and IP address, there is a field that says USER DATA. Put the
 script below in with the appropriate host name (change CHANGE_HOST_NAME_HERE to
 whatever you want) and that should get you a static name.

 #!/bin/bash

 HOSTNAME_TAG=CHANGE_HOST_NAME_HERE

 cat > /etc/sysconfig/network << EOF
 NETWORKING=yes
 NETWORKING_IPV6=no
 HOSTNAME=${HOSTNAME_TAG}
 EOF

 IP=$(curl http://169.254.169.254/latest/meta-data/local-ipv4)
 echo "${IP} ${HOSTNAME_TAG}.localhost ${HOSTNAME_TAG}" >> /etc/hosts

 echo ${HOSTNAME_TAG} > /proc/sys/kernel/hostname
 service network restart


 Also note I was able to do this on a couple of spot instances for a cheap
 price. The only thing is that once you shut it down, or someone outbids you,
 you lose that instance, but it's easy and cheap to play around with. I have
 used a couple of m3.medium instances for my NN/SNN and a couple of them for
 data nodes...

 On Thu, Mar 5, 2015 at 7:19 PM, Jonathan Aquilina 
 jaquil...@eagleeyet.net wrote:

  I dont know how you would do that to be honest. With EMR you have
 destinctions master core and task nodes. If you need to change
 configuration you just ssh into the EMR master node.



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

  On 2015-03-06 02:11, Alexander Pivovarov wrote:

 What is the easiest way to assign names to aws ec2 computers?
 I guess computer need static hostname and dns name before it can be
 used in hadoop cluster.
 On Mar 5, 2015 4:36 PM, Jonathan Aquilina jaquil...@eagleeyet.net
 wrote:

  When I started with EMR it was alot of testing and trial and error.
 HUE is already supported as something that can be installed from the AWS
 console. What I need to know is if you need this cluster on all the time 
 or
 this is goign ot be what amazon call a transient cluster. Meaning you fire
 it up run the job and tear it back down.



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

  On 2015-03-06 01:10, Krish Donald wrote:

  Thanks Jonathan,

 I will try to explore EMR option also.
 Can you please let me know the configuration which you have used it?
 Can you please recommend for me also?
 I would like to setup Hadoop cluster using cloudera manager and then
 would like to do below things:

 setup kerberos
 setup federation
 setup monitoring
 setup hadr
 backup and recovery
 authorization using sentry
 backup and recovery of individual componenets
 performamce tuning
 upgrade of cdh
 upgrade of CM
 Hue User Administration
 Spark
 Solr


 Thanks
 Krish


 On Thu, Mar 5, 2015 at 3:57 PM, Jonathan Aquilina 
 jaquil...@eagleeyet.net wrote:

  krish EMR wont cost you much with all the testing and data we ran
 through the test systems as well as the large amont of data when 
 everythign
 was read we paid about 15.00 USD. I honestly do not think that the specs
 there would be enough as java can be pretty ram hungry.



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

   On 2015-03-06 00:41, Krish Donald wrote:

  Hi,

 I am new to AWS and would like to setup Hadoop cluster using cloudera
 manager for 6-7 nodes.

 t2.micro on AWS; Is it enough for setting up Hadoop cluster ?
 I would like to use free service as of now.

 Please advise.

 Thanks
 Krish







Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?

2015-03-05 Thread Jonathan Aquilina
 

I don't know how you would do that, to be honest. With EMR you have
distinct master, core and task node roles. If you need to change
configuration you just SSH into the EMR master node.

---
Regards,
Jonathan Aquilina
Founder Eagle Eye T

On 2015-03-06 02:11, Alexander Pivovarov wrote: 

 What is the easiest way to assign names to aws ec2 computers?
 I guess computer need static hostname and dns name before it can be used in 
 hadoop cluster. 
 On Mar 5, 2015 4:36 PM, Jonathan Aquilina jaquil...@eagleeyet.net wrote:
 
 When I started with EMR it was alot of testing and trial and error. HUE is 
 already supported as something that can be installed from the AWS console. 
 What I need to know is if you need this cluster on all the time or this is 
 goign ot be what amazon call a transient cluster. Meaning you fire it up run 
 the job and tear it back down. 
 
 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T
 
 On 2015-03-06 01:10, Krish Donald wrote: 
 
 Thanks Jonathan, 
 
 I will try to explore EMR option also. 
 Can you please let me know the configuration which you have used it? 
 Can you please recommend for me also? 
 I would like to setup Hadoop cluster using cloudera manager and then would 
 like to do below things: 
 
 setup kerberos
 setup federation
 setup monitoring
 setup hadr
 backup and recovery
 authorization using sentry
 backup and recovery of individual componenets
 performamce tuning
 upgrade of cdh 
 upgrade of CM
 Hue User Administration 
 Spark 
 Solr 
 
 Thanks 
 Krish 
 
 On Thu, Mar 5, 2015 at 3:57 PM, Jonathan Aquilina jaquil...@eagleeyet.net 
 wrote:
 
 krish EMR wont cost you much with all the testing and data we ran through the 
 test systems as well as the large amont of data when everythign was read we 
 paid about 15.00 USD. I honestly do not think that the specs there would be 
 enough as java can be pretty ram hungry. 
 
 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T
 
 On 2015-03-06 00:41, Krish Donald wrote: 
 
 Hi, 
 
 I am new to AWS and would like to setup Hadoop cluster using cloudera manager 
 for 6-7 nodes. 
 
 t2.micro on AWS; Is it enough for setting up Hadoop cluster ? 
 I would like to use free service as of now. 
 
 Please advise. 
 
 Thanks 
 Krish
 

Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?

2015-03-05 Thread max scalf
Unfortunately without DNS you have to rely on /etc/hosts, so put an entry
for all your nodes (nn, snn, dn1, dn2, etc.) in the /etc/hosts file on all
nodes. I have tested that with Hortonworks (using Ambari) and Cloudera Manager,
and I am fairly certain it will work for MapR.

On Thu, Mar 5, 2015 at 8:47 PM, Alexander Pivovarov apivova...@gmail.com
wrote:

 what about DNS?
 if you have 2 computers (nn and dn) how nn knows dn ip?

 The script puts only this computer ip to /etc/hosts

 On Thu, Mar 5, 2015 at 6:39 PM, max scalf oracle.bl...@gmail.com wrote:

  Here is an easy way to go about assigning a static name to your EC2 instance.
 When you launch an EC2 instance from the AWS console and get to the screen for
 selecting the VPC and IP address, there is a field that says USER DATA. Put the
 script below in with the appropriate host name (change CHANGE_HOST_NAME_HERE to
 whatever you want) and that should get you a static name.

 #!/bin/bash

 HOSTNAME_TAG=CHANGE_HOST_NAME_HERE

 cat > /etc/sysconfig/network << EOF
 NETWORKING=yes
 NETWORKING_IPV6=no
 HOSTNAME=${HOSTNAME_TAG}
 EOF

 IP=$(curl http://169.254.169.254/latest/meta-data/local-ipv4)
 echo "${IP} ${HOSTNAME_TAG}.localhost ${HOSTNAME_TAG}" >> /etc/hosts

 echo ${HOSTNAME_TAG} > /proc/sys/kernel/hostname
 service network restart


 Also note I was able to do this on a couple of spot instances for a cheap
 price. The only thing is that once you shut it down, or someone outbids you,
 you lose that instance, but it's easy and cheap to play around with. I have
 used a couple of m3.medium instances for my NN/SNN and a couple of them for
 data nodes...

 On Thu, Mar 5, 2015 at 7:19 PM, Jonathan Aquilina 
 jaquil...@eagleeyet.net wrote:

  I dont know how you would do that to be honest. With EMR you have
 destinctions master core and task nodes. If you need to change
 configuration you just ssh into the EMR master node.



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

  On 2015-03-06 02:11, Alexander Pivovarov wrote:

 What is the easiest way to assign names to aws ec2 computers?
 I guess computer need static hostname and dns name before it can be used
 in hadoop cluster.
 On Mar 5, 2015 4:36 PM, Jonathan Aquilina jaquil...@eagleeyet.net
 wrote:

  When I started with EMR it was alot of testing and trial and error.
 HUE is already supported as something that can be installed from the AWS
 console. What I need to know is if you need this cluster on all the time or
 this is goign ot be what amazon call a transient cluster. Meaning you fire
 it up run the job and tear it back down.



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

  On 2015-03-06 01:10, Krish Donald wrote:

  Thanks Jonathan,

 I will try to explore EMR option also.
 Can you please let me know the configuration which you have used it?
 Can you please recommend for me also?
 I would like to setup Hadoop cluster using cloudera manager and then
 would like to do below things:

 setup kerberos
 setup federation
 setup monitoring
 setup hadr
 backup and recovery
 authorization using sentry
 backup and recovery of individual componenets
 performamce tuning
 upgrade of cdh
 upgrade of CM
 Hue User Administration
 Spark
 Solr


 Thanks
 Krish


 On Thu, Mar 5, 2015 at 3:57 PM, Jonathan Aquilina 
 jaquil...@eagleeyet.net wrote:

  krish EMR wont cost you much with all the testing and data we ran
 through the test systems as well as the large amont of data when 
 everythign
 was read we paid about 15.00 USD. I honestly do not think that the specs
 there would be enough as java can be pretty ram hungry.



 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T

   On 2015-03-06 00:41, Krish Donald wrote:

  Hi,

 I am new to AWS and would like to setup Hadoop cluster using cloudera
 manager for 6-7 nodes.

 t2.micro on AWS; Is it enough for setting up Hadoop cluster ?
 I would like to use free service as of now.

 Please advise.

 Thanks
 Krish






Re: (no subject)

2015-03-05 Thread Raj K Singh
Just configure a logging appender in the log4j settings and rerun the command.
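
A hedged sketch of that kind of check on a package-based CDH install (the paths
are illustrative; parcel installs keep the jars elsewhere). The warning itself
only means no SLF4J binding such as slf4j-log4j12 was found on the client
classpath, so SLF4J falls back to a no-op logger:

# Is there a log4j config and an slf4j-log4j12 binding visible to the client?
ls /etc/hadoop/conf/log4j.properties
SLF4J_JAR=$(find /usr/lib/hadoop -name 'slf4j-log4j12-*.jar' 2>/dev/null | head -1)
echo "$SLF4J_JAR"

# If the jar exists but is not being picked up, add it for this shell and rerun:
export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$SLF4J_JAR"
hadoop fs -ls /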
On Mar 5, 2015 12:30 AM, SP sajid...@gmail.com wrote:

 Hello All,

 Why am I getting this error every time I execute a command? It was working
 fine with CDH4. When I upgraded to CDH5 this message started showing up.

 Does anyone have a resolution for this error?

 sudo -u hdfs hadoop fs -ls /
 SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
 SLF4J: Defaulting to no-operation (NOP) logger implementation
 SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further
 details.
 Found 1 items
 drwxrwxrwt   - hdfs hadoop  0 2015-03-04 10:30 /tmp


 Thanks
 SP



Re: HDFS Append Problem

2015-03-05 Thread Suresh Srinivas
Please take this up on the CDH mailing list.



From: Molnár Bálint molnarcsi...@gmail.com
Sent: Thursday, March 05, 2015 4:53 AM
To: user@hadoop.apache.org
Subject: HDFS Append Problem

Hi Everyone!

I'm experiencing an annoying problem.

My scenario is:

I want to store lots of small files (1-2 MB max) in MapFiles. These files will
come in periodically during the day, so I cannot use the factory writer because
it would create a lot of small MapFiles. (I want to store these files in HDFS
immediately.)

I'm trying to write code that appends to MapFiles. I use the
org.apache.hadoop.fs.FileSystem append() method, which calls the
org.apache.hadoop.hdfs.DistributedFileSystem append() method to do the job.

My code works well, because the stock MapFile Reader can retrieve the files. My
problem appears in the upload phase. When I try to upload a set (1 GB) of small
files, the free space of HDFS decreases fast. The program has only uploaded
400 MB, but according to Cloudera Manager more than 5 GB is used.
The interesting part is that when I terminate the upload and wait 1-2 minutes,
HDFS goes back to the expected size (500 MB), and none of my files are lost. If
I don't terminate the upload, HDFS runs out of free space and the program gets
errors.
I'm using the Cloudera QuickStart VM 5.3 for testing, and the HDFS replication
factor is 1.


Any ideas how to solve this issue?
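
A hedged way to watch the effect while an upload is running is to compare the
logical size of the written files with what the datanodes report; both commands
are stock HDFS tooling and the path is a placeholder. One possible explanation
for the gap is that space for a block that is still open for write is reserved
at the full block size until the file is closed, which would also fit the usage
dropping back a couple of minutes after the writer stops.

# Logical size of what has actually been written so far (placeholder path):
hdfs dfs -du -s -h /user/test/mapfiles

# Capacity, DFS used and remaining as the datanodes currently report it:
hdfs dfsadmin -report | grep -E 'Configured Capacity|DFS Used|DFS Remaining'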


Thanks


Re: File is not written on HDFS after running libhdfs C API

2015-03-05 Thread Azuryy Yu
Can you share your core-site.xml here?


On Thu, Mar 5, 2015 at 4:32 PM, Alexandru Calin alexandrucali...@gmail.com
wrote:

 No change at all, I've added them at the start and end of the CLASSPATH,
 either way it still writes the file on the local fs. I've also restarted
 hadoop.

 On Thu, Mar 5, 2015 at 10:22 AM, Azuryy Yu azury...@gmail.com wrote:

 Yes,  you should do it:)

 On Thu, Mar 5, 2015 at 4:17 PM, Alexandru Calin 
 alexandrucali...@gmail.com wrote:

 Wow, you are so right! it's on the local filesystem!  Do I have to
 manually specify hdfs-site.xml and core-site.xml in the CLASSPATH variable
 ? Like this:
 CLASSPATH=$CLASSPATH:/usr/local/hadoop/etc/hadoop/core-site.xml
 ?

 On Thu, Mar 5, 2015 at 10:04 AM, Azuryy Yu azury...@gmail.com wrote:

 you need to include core-site.xml as well. and I think you can find
 '/tmp/testfile.txt' on your local disk, instead of HDFS.

 if so,  My guess is right.  because you don't include core-site.xml,
 then your Filesystem schema is file:// by default, not hdfs://.



 On Thu, Mar 5, 2015 at 3:52 PM, Alexandru Calin 
 alexandrucali...@gmail.com wrote:

 I am trying to run the basic libhdfs example, it compiles ok, and
 actually runs ok, and executes the whole program, but I cannot see the 
 file
 on the HDFS.

 It is said  here http://hadoop.apache.org/docs/r1.2.1/libhdfs.html,
 that you have to include *the right configuration directory
 containing hdfs-site.xml*

 My hdfs-site.xml:

 <configuration>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
   <property>
     <name>dfs.namenode.name.dir</name>
     <value>file:///usr/local/hadoop/hadoop_data/hdfs/namenode</value>
   </property>
   <property>
     <name>dfs.datanode.data.dir</name>
     <value>file:///usr/local/hadoop/hadoop_store/hdfs/datanode</value>
   </property>
 </configuration>

 I generate my classpath with this:

 #!/bin/bash
 export CLASSPATH=/usr/local/hadoop/
 declare -a subdirs=("hdfs" "tools" "common" "yarn" "mapreduce")
 for subdir in "${subdirs[@]}"
 do
     for file in $(find /usr/local/hadoop/share/hadoop/$subdir -name "*.jar")
     do
         export CLASSPATH=$CLASSPATH:$file
     done
 done

 and I also add export CLASSPATH=$CLASSPATH:/usr/local/hadoop/etc/hadoop , where my
 *hdfs-site.xml* reside.

 MY LD_LIBRARY_PATH =
 /usr/local/hadoop/lib/native:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server
 Code:

 #include "hdfs.h"
 #include <stdio.h>
 #include <string.h>
 #include <stdlib.h>

 int main(int argc, char **argv) {

     hdfsFS fs = hdfsConnect("default", 0);
     const char* writePath = "/tmp/testfile.txt";
     hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY|O_CREAT, 0, 0, 0);
     if (!writeFile) {
         printf("Failed to open %s for writing!\n", writePath);
         exit(-1);
     }
     printf("\nfile opened\n");
     char* buffer = "Hello, World!";
     tSize num_written_bytes = hdfsWrite(fs, writeFile, (void*)buffer, strlen(buffer)+1);
     printf("\nWrote %d bytes\n", (int)num_written_bytes);
     if (hdfsFlush(fs, writeFile)) {
         printf("Failed to 'flush' %s\n", writePath);
         exit(-1);
     }
     hdfsCloseFile(fs, writeFile);
     hdfsDisconnect(fs);
     return 0;
 }

 It compiles and runs without error, but I cannot see the file on HDFS.

 I have Hadoop 2.6.0 on Ubuntu 14.04 64bit.

 Any ideas on this ?









Re: File is not written on HDFS after running libhdfs C API

2015-03-05 Thread Azuryy Yu
you can try:

for file in `hadoop classpath | tr ':' ' ' | sort | uniq`   ;do
  export CLASSPATH=$CLASSPATH:$file
done
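
To make the classpath discussion concrete, this is roughly how the example
program gets built and run against libhdfs once CLASSPATH is populated; the
file name hdfs_write.c and the /usr/local/hadoop prefix are assumptions carried
over from the thread, and the gcc flags are the usual libhdfs ones:

# JNI does not expand wildcards, which is why every jar is listed individually above.
# Also put the conf directory on the classpath so core-site.xml/hdfs-site.xml are found.
export CLASSPATH=$CLASSPATH:/usr/local/hadoop/etc/hadoop
export LD_LIBRARY_PATH=/usr/local/hadoop/lib/native:$JAVA_HOME/jre/lib/amd64/server

gcc hdfs_write.c -I/usr/local/hadoop/include -L/usr/local/hadoop/lib/native -lhdfs -o hdfs_write
./hdfs_write
hadoop fs -ls /tmp/testfile.txt   # the file should now show up on HDFS, not the local disk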


On Thu, Mar 5, 2015 at 4:48 PM, Alexandru Calin alexandrucali...@gmail.com
wrote:

 This is how core-site.xml looks:

 <configuration>
   <property>
     <name>fs.defaultFS</name>
     <value>hdfs://localhost:9000</value>
   </property>
 </configuration>

 On Thu, Mar 5, 2015 at 10:32 AM, Alexandru Calin 
 alexandrucali...@gmail.com wrote:

 No change at all, I've added them at the start and end of the CLASSPATH,
 either way it still writes the file on the local fs. I've also restarted
 hadoop.

 On Thu, Mar 5, 2015 at 10:22 AM, Azuryy Yu azury...@gmail.com wrote:

 Yes,  you should do it:)

 On Thu, Mar 5, 2015 at 4:17 PM, Alexandru Calin 
 alexandrucali...@gmail.com wrote:

 Wow, you are so right! it's on the local filesystem!  Do I have to
 manually specify hdfs-site.xml and core-site.xml in the CLASSPATH variable
 ? Like this:
 CLASSPATH=$CLASSPATH:/usr/local/hadoop/etc/hadoop/core-site.xml
 ?

 On Thu, Mar 5, 2015 at 10:04 AM, Azuryy Yu azury...@gmail.com wrote:

 you need to include core-site.xml as well. and I think you can find
 '/tmp/testfile.txt' on your local disk, instead of HDFS.

 if so,  My guess is right.  because you don't include core-site.xml,
 then your Filesystem schema is file:// by default, not hdfs://.



 On Thu, Mar 5, 2015 at 3:52 PM, Alexandru Calin 
 alexandrucali...@gmail.com wrote:

 I am trying to run the basic libhdfs example, it compiles ok, and
 actually runs ok, and executes the whole program, but I cannot see the 
 file
 on the HDFS.

 It is said  here http://hadoop.apache.org/docs/r1.2.1/libhdfs.html,
 that you have to include *the right configuration directory
 containing hdfs-site.xml*

 My hdfs-site.xml:

 <configuration>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
   <property>
     <name>dfs.namenode.name.dir</name>
     <value>file:///usr/local/hadoop/hadoop_data/hdfs/namenode</value>
   </property>
   <property>
     <name>dfs.datanode.data.dir</name>
     <value>file:///usr/local/hadoop/hadoop_store/hdfs/datanode</value>
   </property>
 </configuration>

 I generate my classpath with this:

 #!/bin/bash
 export CLASSPATH=/usr/local/hadoop/
 declare -a subdirs=("hdfs" "tools" "common" "yarn" "mapreduce")
 for subdir in "${subdirs[@]}"
 do
     for file in $(find /usr/local/hadoop/share/hadoop/$subdir -name "*.jar")
     do
         export CLASSPATH=$CLASSPATH:$file
     done
 done

 and I also add export CLASSPATH=$CLASSPATH:/usr/local/hadoop/etc/hadoop , where my
 *hdfs-site.xml* reside.

 MY LD_LIBRARY_PATH =
 /usr/local/hadoop/lib/native:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server
 Code:

 #include "hdfs.h"
 #include <stdio.h>
 #include <string.h>
 #include <stdlib.h>

 int main(int argc, char **argv) {

     hdfsFS fs = hdfsConnect("default", 0);
     const char* writePath = "/tmp/testfile.txt";
     hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY|O_CREAT, 0, 0, 0);
     if (!writeFile) {
         printf("Failed to open %s for writing!\n", writePath);
         exit(-1);
     }
     printf("\nfile opened\n");
     char* buffer = "Hello, World!";
     tSize num_written_bytes = hdfsWrite(fs, writeFile, (void*)buffer, strlen(buffer)+1);
     printf("\nWrote %d bytes\n", (int)num_written_bytes);
     if (hdfsFlush(fs, writeFile)) {
         printf("Failed to 'flush' %s\n", writePath);
         exit(-1);
     }
     hdfsCloseFile(fs, writeFile);
     hdfsDisconnect(fs);
     return 0;
 }

 It compiles and runs without error, but I cannot see the file on HDFS.

 I have Hadoop 2.6.0 on Ubuntu 14.04 64bit.

 Any ideas on this ?










Re: File is not written on HDFS after running libhdfs C API

2015-03-05 Thread Alexandru Calin
This is how core-site.xml looks:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

On Thu, Mar 5, 2015 at 10:32 AM, Alexandru Calin alexandrucali...@gmail.com
 wrote:

 No change at all, I've added them at the start and end of the CLASSPATH,
 either way it still writes the file on the local fs. I've also restarted
 hadoop.

 On Thu, Mar 5, 2015 at 10:22 AM, Azuryy Yu azury...@gmail.com wrote:

 Yes,  you should do it:)

 On Thu, Mar 5, 2015 at 4:17 PM, Alexandru Calin 
 alexandrucali...@gmail.com wrote:

 Wow, you are so right! it's on the local filesystem!  Do I have to
 manually specify hdfs-site.xml and core-site.xml in the CLASSPATH variable
 ? Like this:
 CLASSPATH=$CLASSPATH:/usr/local/hadoop/etc/hadoop/core-site.xml
 ?

 On Thu, Mar 5, 2015 at 10:04 AM, Azuryy Yu azury...@gmail.com wrote:

 you need to include core-site.xml as well. and I think you can find
 '/tmp/testfile.txt' on your local disk, instead of HDFS.

 if so,  My guess is right.  because you don't include core-site.xml,
 then your Filesystem schema is file:// by default, not hdfs://.



 On Thu, Mar 5, 2015 at 3:52 PM, Alexandru Calin 
 alexandrucali...@gmail.com wrote:

 I am trying to run the basic libhdfs example, it compiles ok, and
 actually runs ok, and executes the whole program, but I cannot see the 
 file
 on the HDFS.

 It is said  here http://hadoop.apache.org/docs/r1.2.1/libhdfs.html,
 that you have to include *the right configuration directory
 containing hdfs-site.xml*

 My hdfs-site.xml:

 <configuration>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
   <property>
     <name>dfs.namenode.name.dir</name>
     <value>file:///usr/local/hadoop/hadoop_data/hdfs/namenode</value>
   </property>
   <property>
     <name>dfs.datanode.data.dir</name>
     <value>file:///usr/local/hadoop/hadoop_store/hdfs/datanode</value>
   </property>
 </configuration>

 I generate my classpath with this:

 #!/bin/bash
 export CLASSPATH=/usr/local/hadoop/
 declare -a subdirs=("hdfs" "tools" "common" "yarn" "mapreduce")
 for subdir in "${subdirs[@]}"
 do
     for file in $(find /usr/local/hadoop/share/hadoop/$subdir -name "*.jar")
     do
         export CLASSPATH=$CLASSPATH:$file
     done
 done

 and I also add export CLASSPATH=$CLASSPATH:/usr/local/hadoop/etc/hadoop , where my
 *hdfs-site.xml* reside.

 MY LD_LIBRARY_PATH =
 /usr/local/hadoop/lib/native:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server
 Code:

 #include "hdfs.h"
 #include <stdio.h>
 #include <string.h>
 #include <stdlib.h>

 int main(int argc, char **argv) {

     hdfsFS fs = hdfsConnect("default", 0);
     const char* writePath = "/tmp/testfile.txt";
     hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY|O_CREAT, 0, 0, 0);
     if (!writeFile) {
         printf("Failed to open %s for writing!\n", writePath);
         exit(-1);
     }
     printf("\nfile opened\n");
     char* buffer = "Hello, World!";
     tSize num_written_bytes = hdfsWrite(fs, writeFile, (void*)buffer, strlen(buffer)+1);
     printf("\nWrote %d bytes\n", (int)num_written_bytes);
     if (hdfsFlush(fs, writeFile)) {
         printf("Failed to 'flush' %s\n", writePath);
         exit(-1);
     }
     hdfsCloseFile(fs, writeFile);
     hdfsDisconnect(fs);
     return 0;
 }

 It compiles and runs without error, but I cannot see the file on HDFS.

 I have Hadoop 2.6.0 on Ubuntu 14.04 64bit.

 Any ideas on this ?









Re: File is not written on HDFS after running libhdfs C API

2015-03-05 Thread Alexandru Calin
Now I've also started YARN (just for the sake of trying anything); the config
for mapred-site.xml and yarn-site.xml is the one from the Apache website. A
jps command shows:

11257 NodeManager
11129 ResourceManager
11815 Jps
10620 NameNode
10966 SecondaryNameNode

On Thu, Mar 5, 2015 at 10:48 AM, Azuryy Yu azury...@gmail.com wrote:

 Can you share your core-site.xml here?


 On Thu, Mar 5, 2015 at 4:32 PM, Alexandru Calin 
 alexandrucali...@gmail.com wrote:

 No change at all, I've added them at the start and end of the CLASSPATH,
 either way it still writes the file on the local fs. I've also restarted
 hadoop.

 On Thu, Mar 5, 2015 at 10:22 AM, Azuryy Yu azury...@gmail.com wrote:

 Yes,  you should do it:)

 On Thu, Mar 5, 2015 at 4:17 PM, Alexandru Calin 
 alexandrucali...@gmail.com wrote:

 Wow, you are so right! it's on the local filesystem!  Do I have to
 manually specify hdfs-site.xml and core-site.xml in the CLASSPATH variable
 ? Like this:
 CLASSPATH=$CLASSPATH:/usr/local/hadoop/etc/hadoop/core-site.xml
 ?

 On Thu, Mar 5, 2015 at 10:04 AM, Azuryy Yu azury...@gmail.com wrote:

 you need to include core-site.xml as well. and I think you can find
 '/tmp/testfile.txt' on your local disk, instead of HDFS.

 if so,  My guess is right.  because you don't include core-site.xml,
 then your Filesystem schema is file:// by default, not hdfs://.



 On Thu, Mar 5, 2015 at 3:52 PM, Alexandru Calin 
 alexandrucali...@gmail.com wrote:

 I am trying to run the basic libhdfs example, it compiles ok, and
 actually runs ok, and executes the whole program, but I cannot see the 
 file
 on the HDFS.

 It is said  here http://hadoop.apache.org/docs/r1.2.1/libhdfs.html,
 that you have to include *the right configuration directory
 containing hdfs-site.xml*

 My hdfs-site.xml:

 configuration
 property
 namedfs.replication/name
 value1/value
 /property
 property
   namedfs.namenode.name.dir/name
   valuefile:///usr/local/hadoop/hadoop_data/hdfs/namenode/value
 /property
 property
   namedfs.datanode.data.dir/name
   valuefile:///usr/local/hadoop/hadoop_store/hdfs/datanode/value
 /property/configuration

 I generate my classpath with this:

 #!/bin/bashexport CLASSPATH=/usr/local/hadoop/
 declare -a subdirs=(hdfs tools common yarn mapreduce)for 
 subdir in ${subdirs[@]}do
 for file in $(find /usr/local/hadoop/share/hadoop/$subdir -name 
 *.jar)
 do
 export CLASSPATH=$CLASSPATH:$file
 donedone

 and I also add export
 CLASSPATH=$CLASSPATH:/usr/local/hadoop/etc/hadoop , where my
 *hdfs-site.xml* reside.

 MY LD_LIBRARY_PATH =
 /usr/local/hadoop/lib/native:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server
 Code:

 #include "hdfs.h"
 #include <stdio.h>
 #include <string.h>
 #include <stdlib.h>

 int main(int argc, char **argv) {
     hdfsFS fs = hdfsConnect("default", 0);
     const char* writePath = "/tmp/testfile.txt";
     hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY|O_CREAT, 0, 0, 0);
     if (!writeFile) {
         printf("Failed to open %s for writing!\n", writePath);
         exit(-1);
     }
     printf("\nfile opened\n");
     char* buffer = "Hello, World!";
     tSize num_written_bytes = hdfsWrite(fs, writeFile, (void*)buffer, strlen(buffer)+1);
     printf("\nWrote %d bytes\n", (int)num_written_bytes);
     if (hdfsFlush(fs, writeFile)) {
         printf("Failed to 'flush' %s\n", writePath);
         exit(-1);
     }
     hdfsCloseFile(fs, writeFile);
     hdfsDisconnect(fs);
     return 0;
 }

 It compiles and runs without error, but I cannot see the file on HDFS.

 I have Hadoop 2.6.0 on Ubuntu 14.04 64bit.

 Any ideas on this ?










Re: File is not written on HDFS after running libhdfs C API

2015-03-05 Thread Alexandru Calin
Wow, you are so right! it's on the local filesystem!  Do I have to manually
specify hdfs-site.xml and core-site.xml in the CLASSPATH variable ? Like
this:
CLASSPATH=$CLASSPATH:/usr/local/hadoop/etc/hadoop/core-site.xml
?

On Thu, Mar 5, 2015 at 10:04 AM, Azuryy Yu azury...@gmail.com wrote:

 you need to include core-site.xml as well. and I think you can find
 '/tmp/testfile.txt' on your local disk, instead of HDFS.

 if so,  My guess is right.  because you don't include core-site.xml, then
 your Filesystem schema is file:// by default, not hdfs://.



 On Thu, Mar 5, 2015 at 3:52 PM, Alexandru Calin 
 alexandrucali...@gmail.com wrote:

 I am trying to run the basic libhdfs example, it compiles ok, and
 actually runs ok, and executes the whole program, but I cannot see the file
 on the HDFS.

 It is said  here http://hadoop.apache.org/docs/r1.2.1/libhdfs.html,
 that you have to include *the right configuration directory containing
 hdfs-site.xml*

 My hdfs-site.xml:

 <configuration>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
   <property>
     <name>dfs.namenode.name.dir</name>
     <value>file:///usr/local/hadoop/hadoop_data/hdfs/namenode</value>
   </property>
   <property>
     <name>dfs.datanode.data.dir</name>
     <value>file:///usr/local/hadoop/hadoop_store/hdfs/datanode</value>
   </property>
 </configuration>

 I generate my classpath with this:

 #!/bin/bash
 export CLASSPATH=/usr/local/hadoop/
 declare -a subdirs=("hdfs" "tools" "common" "yarn" "mapreduce")
 for subdir in "${subdirs[@]}"
 do
     for file in $(find /usr/local/hadoop/share/hadoop/$subdir -name "*.jar")
     do
         export CLASSPATH=$CLASSPATH:$file
     done
 done

 and I also add export CLASSPATH=$CLASSPATH:/usr/local/hadoop/etc/hadoop ,
 where my *hdfs-site.xml* reside.

 MY LD_LIBRARY_PATH =
 /usr/local/hadoop/lib/native:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server
 Code:

 #include "hdfs.h"
 #include <stdio.h>
 #include <string.h>
 #include <stdlib.h>

 int main(int argc, char **argv) {
     hdfsFS fs = hdfsConnect("default", 0);
     const char* writePath = "/tmp/testfile.txt";
     hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY|O_CREAT, 0, 0, 0);
     if (!writeFile) {
         printf("Failed to open %s for writing!\n", writePath);
         exit(-1);
     }
     printf("\nfile opened\n");
     char* buffer = "Hello, World!";
     tSize num_written_bytes = hdfsWrite(fs, writeFile, (void*)buffer, strlen(buffer)+1);
     printf("\nWrote %d bytes\n", (int)num_written_bytes);
     if (hdfsFlush(fs, writeFile)) {
         printf("Failed to 'flush' %s\n", writePath);
         exit(-1);
     }
     hdfsCloseFile(fs, writeFile);
     hdfsDisconnect(fs);
     return 0;
 }

 It compiles and runs without error, but I cannot see the file on HDFS.

 I have Hadoop 2.6.0 on Ubuntu 14.04 64bit.

 Any ideas on this ?






Re: File is not written on HDFS after running libhdfs C API

2015-03-05 Thread Alexandru Calin
No change at all, I've added them at the start and end of the CLASSPATH,
either way it still writes the file on the local fs. I've also restarted
hadoop.

On Thu, Mar 5, 2015 at 10:22 AM, Azuryy Yu azury...@gmail.com wrote:

 Yes,  you should do it:)

 On Thu, Mar 5, 2015 at 4:17 PM, Alexandru Calin 
 alexandrucali...@gmail.com wrote:

 Wow, you are so right! it's on the local filesystem!  Do I have to
 manually specify hdfs-site.xml and core-site.xml in the CLASSPATH variable
 ? Like this:
 CLASSPATH=$CLASSPATH:/usr/local/hadoop/etc/hadoop/core-site.xml
 ?

 On Thu, Mar 5, 2015 at 10:04 AM, Azuryy Yu azury...@gmail.com wrote:

 you need to include core-site.xml as well. and I think you can find
 '/tmp/testfile.txt' on your local disk, instead of HDFS.

 if so,  My guess is right.  because you don't include core-site.xml,
 then your Filesystem schema is file:// by default, not hdfs://.



 On Thu, Mar 5, 2015 at 3:52 PM, Alexandru Calin 
 alexandrucali...@gmail.com wrote:

 I am trying to run the basic libhdfs example, it compiles ok, and
 actually runs ok, and executes the whole program, but I cannot see the file
 on the HDFS.

 It is said  here http://hadoop.apache.org/docs/r1.2.1/libhdfs.html,
 that you have to include *the right configuration directory containing
 hdfs-site.xml*

 My hdfs-site.xml:

 <configuration>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
   <property>
     <name>dfs.namenode.name.dir</name>
     <value>file:///usr/local/hadoop/hadoop_data/hdfs/namenode</value>
   </property>
   <property>
     <name>dfs.datanode.data.dir</name>
     <value>file:///usr/local/hadoop/hadoop_store/hdfs/datanode</value>
   </property>
 </configuration>

 I generate my classpath with this:

 #!/bin/bash
 export CLASSPATH=/usr/local/hadoop/
 declare -a subdirs=("hdfs" "tools" "common" "yarn" "mapreduce")
 for subdir in "${subdirs[@]}"
 do
     for file in $(find /usr/local/hadoop/share/hadoop/$subdir -name "*.jar")
     do
         export CLASSPATH=$CLASSPATH:$file
     done
 done

 and I also add export CLASSPATH=$CLASSPATH:/usr/local/hadoop/etc/hadoop ,
 where my *hdfs-site.xml* reside.

 MY LD_LIBRARY_PATH =
 /usr/local/hadoop/lib/native:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server
 Code:

 #include "hdfs.h"
 #include <stdio.h>
 #include <string.h>
 #include <stdlib.h>

 int main(int argc, char **argv) {
     hdfsFS fs = hdfsConnect("default", 0);
     const char* writePath = "/tmp/testfile.txt";
     hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY|O_CREAT, 0, 0, 0);
     if (!writeFile) {
         printf("Failed to open %s for writing!\n", writePath);
         exit(-1);
     }
     printf("\nfile opened\n");
     char* buffer = "Hello, World!";
     tSize num_written_bytes = hdfsWrite(fs, writeFile, (void*)buffer, strlen(buffer)+1);
     printf("\nWrote %d bytes\n", (int)num_written_bytes);
     if (hdfsFlush(fs, writeFile)) {
         printf("Failed to 'flush' %s\n", writePath);
         exit(-1);
     }
     hdfsCloseFile(fs, writeFile);
     hdfsDisconnect(fs);
     return 0;
 }

 It compiles and runs without error, but I cannot see the file on HDFS.

 I have Hadoop 2.6.0 on Ubuntu 14.04 64bit.

 Any ideas on this ?








Re: File is not written on HDFS after running libhdfs C API

2015-03-05 Thread Azuryy Yu
Yes,  you should do it:)

On Thu, Mar 5, 2015 at 4:17 PM, Alexandru Calin alexandrucali...@gmail.com
wrote:

 Wow, you are so right! it's on the local filesystem!  Do I have to
 manually specify hdfs-site.xml and core-site.xml in the CLASSPATH variable
 ? Like this:
 CLASSPATH=$CLASSPATH:/usr/local/hadoop/etc/hadoop/core-site.xml
 ?

 On Thu, Mar 5, 2015 at 10:04 AM, Azuryy Yu azury...@gmail.com wrote:

 you need to include core-site.xml as well. and I think you can find
 '/tmp/testfile.txt' on your local disk, instead of HDFS.

 if so,  My guess is right.  because you don't include core-site.xml, then
 your Filesystem schema is file:// by default, not hdfs://.



 On Thu, Mar 5, 2015 at 3:52 PM, Alexandru Calin 
 alexandrucali...@gmail.com wrote:

 I am trying to run the basic libhdfs example, it compiles ok, and
 actually runs ok, and executes the whole program, but I cannot see the file
 on the HDFS.

 It is said  here http://hadoop.apache.org/docs/r1.2.1/libhdfs.html,
 that you have to include *the right configuration directory containing
 hdfs-site.xml*

 My hdfs-site.xml:

 <configuration>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
   <property>
     <name>dfs.namenode.name.dir</name>
     <value>file:///usr/local/hadoop/hadoop_data/hdfs/namenode</value>
   </property>
   <property>
     <name>dfs.datanode.data.dir</name>
     <value>file:///usr/local/hadoop/hadoop_store/hdfs/datanode</value>
   </property>
 </configuration>

 I generate my classpath with this:

 #!/bin/bash
 export CLASSPATH=/usr/local/hadoop/
 declare -a subdirs=("hdfs" "tools" "common" "yarn" "mapreduce")
 for subdir in "${subdirs[@]}"
 do
     for file in $(find /usr/local/hadoop/share/hadoop/$subdir -name "*.jar")
     do
         export CLASSPATH=$CLASSPATH:$file
     done
 done

 and I also add export CLASSPATH=$CLASSPATH:/usr/local/hadoop/etc/hadoop ,
 where my *hdfs-site.xml* reside.

 MY LD_LIBRARY_PATH =
 /usr/local/hadoop/lib/native:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server
 Code:

 #include "hdfs.h"
 #include <stdio.h>
 #include <string.h>
 #include <stdlib.h>

 int main(int argc, char **argv) {
     hdfsFS fs = hdfsConnect("default", 0);
     const char* writePath = "/tmp/testfile.txt";
     hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY|O_CREAT, 0, 0, 0);
     if (!writeFile) {
         printf("Failed to open %s for writing!\n", writePath);
         exit(-1);
     }
     printf("\nfile opened\n");
     char* buffer = "Hello, World!";
     tSize num_written_bytes = hdfsWrite(fs, writeFile, (void*)buffer, strlen(buffer)+1);
     printf("\nWrote %d bytes\n", (int)num_written_bytes);
     if (hdfsFlush(fs, writeFile)) {
         printf("Failed to 'flush' %s\n", writePath);
         exit(-1);
     }
     hdfsCloseFile(fs, writeFile);
     hdfsDisconnect(fs);
     return 0;
 }

 It compiles and runs without error, but I cannot see the file on HDFS.

 I have Hadoop 2.6.0 on Ubuntu 14.04 64bit.

 Any ideas on this ?







Re: File is not written on HDFS after running libhdfs C API

2015-03-05 Thread Azuryy Yu
You don't need to start YARN if you only want to write to HDFS using the C API,
and you also don't need to restart HDFS.



On Thu, Mar 5, 2015 at 4:58 PM, Alexandru Calin alexandrucali...@gmail.com
wrote:

 Now I've also started YARN (just for the sake of trying anything); the
 config for mapred-site.xml and yarn-site.xml is the one from the Apache
 website. A *jps* command shows:

 11257 NodeManager
 11129 ResourceManager
 11815 Jps
 10620 NameNode
 10966 SecondaryNameNode

 On Thu, Mar 5, 2015 at 10:48 AM, Azuryy Yu azury...@gmail.com wrote:

 Can you share your core-site.xml here?


 On Thu, Mar 5, 2015 at 4:32 PM, Alexandru Calin 
 alexandrucali...@gmail.com wrote:

 No change at all, I've added them at the start and end of the CLASSPATH,
 either way it still writes the file on the local fs. I've also restarted
 hadoop.

 On Thu, Mar 5, 2015 at 10:22 AM, Azuryy Yu azury...@gmail.com wrote:

 Yes,  you should do it:)

 On Thu, Mar 5, 2015 at 4:17 PM, Alexandru Calin 
 alexandrucali...@gmail.com wrote:

 Wow, you are so right! it's on the local filesystem!  Do I have to
 manually specify hdfs-site.xml and core-site.xml in the CLASSPATH variable
 ? Like this:
 CLASSPATH=$CLASSPATH:/usr/local/hadoop/etc/hadoop/core-site.xml
 ?

 On Thu, Mar 5, 2015 at 10:04 AM, Azuryy Yu azury...@gmail.com wrote:

 you need to include core-site.xml as well. and I think you can find
 '/tmp/testfile.txt' on your local disk, instead of HDFS.

 if so,  My guess is right.  because you don't include core-site.xml,
 then your Filesystem schema is file:// by default, not hdfs://.



 On Thu, Mar 5, 2015 at 3:52 PM, Alexandru Calin 
 alexandrucali...@gmail.com wrote:

 I am trying to run the basic libhdfs example, it compiles ok, and
 actually runs ok, and executes the whole program, but I cannot see the 
 file
 on the HDFS.

 It is said  here http://hadoop.apache.org/docs/r1.2.1/libhdfs.html,
 that you have to include *the right configuration directory
 containing hdfs-site.xml*

 My hdfs-site.xml:

 <configuration>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
   <property>
     <name>dfs.namenode.name.dir</name>
     <value>file:///usr/local/hadoop/hadoop_data/hdfs/namenode</value>
   </property>
   <property>
     <name>dfs.datanode.data.dir</name>
     <value>file:///usr/local/hadoop/hadoop_store/hdfs/datanode</value>
   </property>
 </configuration>

 I generate my classpath with this:

 #!/bin/bash
 export CLASSPATH=/usr/local/hadoop/
 declare -a subdirs=("hdfs" "tools" "common" "yarn" "mapreduce")
 for subdir in "${subdirs[@]}"
 do
     for file in $(find /usr/local/hadoop/share/hadoop/$subdir -name "*.jar")
     do
         export CLASSPATH=$CLASSPATH:$file
     done
 done

 and I also add export
 CLASSPATH=$CLASSPATH:/usr/local/hadoop/etc/hadoop , where my
 *hdfs-site.xml* reside.

 MY LD_LIBRARY_PATH =
 /usr/local/hadoop/lib/native:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server
 Code:

 #include "hdfs.h"
 #include <stdio.h>
 #include <string.h>
 #include <stdlib.h>

 int main(int argc, char **argv) {
     hdfsFS fs = hdfsConnect("default", 0);
     const char* writePath = "/tmp/testfile.txt";
     hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY|O_CREAT, 0, 0, 0);
     if (!writeFile) {
         printf("Failed to open %s for writing!\n", writePath);
         exit(-1);
     }
     printf("\nfile opened\n");
     char* buffer = "Hello, World!";
     tSize num_written_bytes = hdfsWrite(fs, writeFile, (void*)buffer, strlen(buffer)+1);
     printf("\nWrote %d bytes\n", (int)num_written_bytes);
     if (hdfsFlush(fs, writeFile)) {
         printf("Failed to 'flush' %s\n", writePath);
         exit(-1);
     }
     hdfsCloseFile(fs, writeFile);
     hdfsDisconnect(fs);
     return 0;
 }

 It compiles and runs without error, but I cannot see the file on
 HDFS.

 I have Hadoop 2.6.0 on Ubuntu 14.04 64bit.

 Any ideas on this ?











Re: File is not written on HDFS after running libhdfs C API

2015-03-05 Thread Azuryy Yu
you need to include core-site.xml as well. and I think you can find
'/tmp/testfile.txt' on your local disk, instead of HDFS.

if so,  My guess is right.  because you don't include core-site.xml, then
your Filesystem schema is file:// by default, not hdfs://.
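
One way to take the config lookup out of the picture entirely is to point hdfsConnect at an
explicit NameNode instead of "default". A minimal sketch along those lines, assuming the
NameNode RPC address is localhost:9000 (host and port here are placeholders, not values from
this thread; use whatever fs.defaultFS points at):

 #include "hdfs.h"
 #include <stdio.h>
 #include <stdlib.h>

 int main(void) {
     /* Connect to a specific NameNode so a missing core-site.xml cannot
      * silently fall back to the local file:// filesystem. */
     hdfsFS fs = hdfsConnect("localhost", 9000);
     if (!fs) {
         fprintf(stderr, "Could not connect to hdfs://localhost:9000\n");
         exit(1);
     }
     /* hdfsExists returns 0 when the path exists on this filesystem. */
     if (hdfsExists(fs, "/tmp/testfile.txt") == 0) {
         printf("/tmp/testfile.txt exists on HDFS\n");
     } else {
         printf("/tmp/testfile.txt not found on HDFS\n");
     }
     hdfsDisconnect(fs);
     return 0;
 }

If this reports the file as missing while it shows up under /tmp on the local disk, the
earlier writes were going through the file:// scheme exactly as described above.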



On Thu, Mar 5, 2015 at 3:52 PM, Alexandru Calin alexandrucali...@gmail.com
wrote:

 I am trying to run the basic libhdfs example, it compiles ok, and actually
 runs ok, and executes the whole program, but I cannot see the file on the
 HDFS.

 It is said  here http://hadoop.apache.org/docs/r1.2.1/libhdfs.html,
 that you have to include *the right configuration directory containing
 hdfs-site.xml*

 My hdfs-site.xml:

 <configuration>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
   <property>
     <name>dfs.namenode.name.dir</name>
     <value>file:///usr/local/hadoop/hadoop_data/hdfs/namenode</value>
   </property>
   <property>
     <name>dfs.datanode.data.dir</name>
     <value>file:///usr/local/hadoop/hadoop_store/hdfs/datanode</value>
   </property>
 </configuration>

 I generate my classpath with this:

 #!/bin/bash
 export CLASSPATH=/usr/local/hadoop/
 declare -a subdirs=("hdfs" "tools" "common" "yarn" "mapreduce")
 for subdir in "${subdirs[@]}"
 do
     for file in $(find /usr/local/hadoop/share/hadoop/$subdir -name "*.jar")
     do
         export CLASSPATH=$CLASSPATH:$file
     done
 done

 and I also add export CLASSPATH=$CLASSPATH:/usr/local/hadoop/etc/hadoop ,
 where my *hdfs-site.xml* reside.

 MY LD_LIBRARY_PATH =
 /usr/local/hadoop/lib/native:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server
 Code:

 #include "hdfs.h"
 #include <stdio.h>
 #include <string.h>
 #include <stdlib.h>

 int main(int argc, char **argv) {
     hdfsFS fs = hdfsConnect("default", 0);
     const char* writePath = "/tmp/testfile.txt";
     hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY|O_CREAT, 0, 0, 0);
     if (!writeFile) {
         printf("Failed to open %s for writing!\n", writePath);
         exit(-1);
     }
     printf("\nfile opened\n");
     char* buffer = "Hello, World!";
     tSize num_written_bytes = hdfsWrite(fs, writeFile, (void*)buffer, strlen(buffer)+1);
     printf("\nWrote %d bytes\n", (int)num_written_bytes);
     if (hdfsFlush(fs, writeFile)) {
         printf("Failed to 'flush' %s\n", writePath);
         exit(-1);
     }
     hdfsCloseFile(fs, writeFile);
     hdfsDisconnect(fs);
     return 0;
 }

 It compiles and runs without error, but I cannot see the file on HDFS.

 I have Hadoop 2.6.0 on Ubuntu 14.04 64bit.

 Any ideas on this ?





Cloudera monitoring Services not starting

2015-03-05 Thread Krish Donald
Hi,

I have set up a 4-node cluster (1 namenode and 3 datanodes) using Cloudera
Manager 5.2.
But it is not starting the Cloudera Monitoring service, and for Hosts health it
is showing unknown.

How can I disable the Monitoring service completely and work with only the
other cluster features?

Thanks
Krish


Re: Cloudera monitoring Services not starting

2015-03-05 Thread Rich Haase
Please ask cloudera related questions on Cloudera’s forums.  
community.cloudera.com

On Mar 5, 2015, at 11:56 AM, Krish Donald gotomyp...@gmail.com wrote:

 Hi,
 
 I have set up a 4-node cluster (1 namenode and 3 datanodes) using Cloudera 
 Manager 5.2.
 But it is not starting the Cloudera Monitoring service, and for Hosts health it 
 is showing unknown.
 
 How can I disable the Monitoring service completely and work with only the 
 other cluster features?
 
 Thanks
 Krish



Re: File is not written on HDFS after running libhdfs C API

2015-03-05 Thread Alexandru Calin
DataNode was not starting due to this error : java.io.IOException:
Incompatible clusterIDs in /usr/local/hadoop/hadoop_store/hdfs/datanode:
namenode clusterID = CID-b788c93b-a1d7-4351-bd91-28fdd134e9ba; datanode
clusterID = CID-862f3fad-175e-442d-a06b-d65ac57d64b2

I can't imagine how this happened; anyway, I issued this command:
*bin/hdfs namenode -format -clusterId
CID-862f3fad-175e-442d-a06b-d65ac57d64b2*

And that got it started, the file is written correctly.

Thank you very much
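
A quick way to double-check a write like this is to read the file back through the same API.
A minimal sketch, assuming the same /tmp/testfile.txt path and a core-site.xml visible on the
CLASSPATH (so hdfsConnect("default", 0) resolves to hdfs:// rather than file://):

 #include "hdfs.h"
 #include <stdio.h>
 #include <stdlib.h>
 #include <fcntl.h>

 int main(void) {
     hdfsFS fs = hdfsConnect("default", 0);   /* uses the config found on the CLASSPATH */
     const char* readPath = "/tmp/testfile.txt";
     hdfsFile readFile = hdfsOpenFile(fs, readPath, O_RDONLY, 0, 0, 0);
     if (!readFile) {
         printf("Failed to open %s for reading!\n", readPath);
         exit(-1);
     }
     /* Read back at most 31 bytes; the test file only holds "Hello, World!". */
     char buffer[32] = {0};
     tSize num_read = hdfsRead(fs, readFile, buffer, sizeof(buffer) - 1);
     printf("Read %d bytes: %s\n", (int)num_read, buffer);
     hdfsCloseFile(fs, readFile);
     hdfsDisconnect(fs);
     return 0;
 }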


On Thu, Mar 5, 2015 at 2:03 PM, Alexandru Calin alexandrucali...@gmail.com
wrote:

 After putting the CLASSPATH initialization in .bashrc it creates the file,
 but it has 0 size and I also get this warning:

 file opened

 Wrote 14 bytes
 15/03/05 14:00:55 WARN hdfs.DFSClient: DataStreamer Exception
 org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
 /tmp/testfile.txt could only be replicated to 0 nodes instead of
 minReplication (=1).  There are 0 datanode(s) running and no node(s) are
 excluded in this operation.
 at
 org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1549)
 at
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3200)
 at
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:641)
 at
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
 at
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

 at org.apache.hadoop.ipc.Client.call(Client.java:1468)
 at org.apache.hadoop.ipc.Client.call(Client.java:1399)
 at
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
 at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
 at
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
 at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
 at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
 at
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1532)
 at
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1349)
 at
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
 FSDataOutputStream#close error:
 org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
 /tmp/testfile.txt could only be replicated to 0 nodes instead of
 minReplication (=1).  There are 0 datanode(s) running and no node(s) are
 excluded in this operation.
 at
 org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1549)
 at
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3200)
 at
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:641)
 at
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
 at
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
 at java.security.AccessController.doPrivileged(Native Method)
 at 

HDFS Append Problem

2015-03-05 Thread Molnár Bálint
Hi Everyone!


I'm experiencing an annoying problem.


My Scenario is:


I want to store lots of small files (1-2MB max) in MapFiles. These files
will arrive periodically throughout the day, so I cannot use the “factory”
writer because it would create a lot of small MapFiles. (I want to store
these files in HDFS immediately.)


I'm trying to write code that appends to MapFiles. I use the
*org.apache.hadoop.fs.FileSystem append()* method, which calls the
*org.apache.hadoop.hdfs.DistributedFileSystem append()* method to do the job.


My code works well, because the stock MapFile Reader can retrieve the
files. My problem appears in the upload phase. When I try to upload a set
(1GB) of small files, the free space of the HDFS decreases fast: the
program has only uploaded 400MB, but according to Cloudera Manager more
than 5GB has been consumed.

The interesting part is that when I terminate the upload and wait 1-2
minutes, HDFS goes back to its normal size (500MB), and none of my files
are lost. If I don't terminate the upload, HDFS runs out of free space
and the program gets errors.

I'm using the Cloudera QuickStart VM 5.3 for testing, and the HDFS replication
factor is 1.



Any ideas how to solve this issue?



Thanks


Re: File is not written on HDFS after running libhdfs C API

2015-03-05 Thread Alexandru Calin
After putting the CLASSPATH initialization in .bashrc it creates the file,
but it has 0 size and I also get this warning:

file opened

Wrote 14 bytes
15/03/05 14:00:55 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
/tmp/testfile.txt could only be replicated to 0 nodes instead of
minReplication (=1).  There are 0 datanode(s) running and no node(s) are
excluded in this operation.
at
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1549)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3200)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:641)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

at org.apache.hadoop.ipc.Client.call(Client.java:1468)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1532)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1349)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
FSDataOutputStream#close error:
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
/tmp/testfile.txt could only be replicated to 0 nodes instead of
minReplication (=1).  There are 0 datanode(s) running and no node(s) are
excluded in this operation.
at
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1549)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3200)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:641)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

at org.apache.hadoop.ipc.Client.call(Client.java:1468)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
at 

Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?

2015-03-05 Thread Jonathan Aquilina
 

Hi guys, I know you want to keep costs down, but why go through all the
effort of setting up EC2 instances yourselves? When you deploy EMR it
provisions and sets up the EC2 instances for you. All configuration for
the entire cluster is then done on the master node of that cluster, and
installing additional software is all done through the EMR console. We
were doing some geospatial calculations and loaded a 3rd-party jar file
called esri into the EMR cluster; I then had to pass a small bootstrap
action (script) to have it distribute esri to the entire cluster. 

Why are you guys reinventing the wheel? 

---
Regards,
Jonathan Aquilina
Founder Eagle Eye T

On 2015-03-06 03:35, Alexander Pivovarov wrote: 

 I found the following solution to this problem
 
 I registered 2 subdomains (public and local) for each computer on 
 https://freedns.afraid.org/subdomain/ [1] 
 e.g. 
 myhadoop-nn.crabdance.com [2]
 myhadoop-nn-local.crabdance.com [3] 
 then I added a cron job which sends HTTP requests to update the public and local 
 IPs on the freedns server. Hint: the public IP is detected automatically; the IP 
 address for the local name can be set using the request parameter address=10.x.x.x 
 (don't forget to escape it)
 
 as a result my nn computer has 2 DNS names with currently assigned ip 
 addresses , e.g.
 myhadoop-nn.crabdance.com [2] 54.203.181.177
 myhadoop-nn-local.crabdance.com [3] 10.220.149.103
 
 in the hadoop configuration I can use the local machine names; to access my cluster 
 from outside of AWS I can use the public names
 
 Just curious if AWS provides easier way to name EC2 computers?
 
 On Thu, Mar 5, 2015 at 5:19 PM, Jonathan Aquilina jaquil...@eagleeyet.net 
 wrote:
 
 I don't know how you would do that, to be honest. With EMR you have distinct 
 master, core and task node roles. If you need to change configuration 
 you just ssh into the EMR master node. 
 
 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T
 
 On 2015-03-06 02:11, Alexander Pivovarov wrote: 
 
 What is the easiest way to assign names to AWS EC2 computers?
 I guess a computer needs a static hostname and DNS name before it can be used in 
 a hadoop cluster. 
 On Mar 5, 2015 4:36 PM, Jonathan Aquilina jaquil...@eagleeyet.net wrote:
 
 When I started with EMR it was a lot of testing and trial and error. HUE is 
 already supported as something that can be installed from the AWS console. 
 What I need to know is whether you need this cluster on all the time, or whether 
 this is going to be what Amazon calls a transient cluster, meaning you fire it up, 
 run the job and tear it back down. 
 
 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T
 
 On 2015-03-06 01:10, Krish Donald wrote: 
 
 Thanks Jonathan, 
 
 I will try to explore EMR option also. 
 Can you please let me know the configuration which you have used? 
 Can you please recommend one for me as well? 
 I would like to setup Hadoop cluster using cloudera manager and then would 
 like to do below things: 
 
 setup kerberos
 setup federation
 setup monitoring
 setup hadr
 backup and recovery
 authorization using sentry
 backup and recovery of individual components
 performance tuning
 upgrade of cdh 
 upgrade of CM
 Hue User Administration 
 Spark 
 Solr 
 
 Thanks 
 Krish 
 
 On Thu, Mar 5, 2015 at 3:57 PM, Jonathan Aquilina jaquil...@eagleeyet.net 
 wrote:
 
 Krish, EMR won't cost you much: with all the testing and data we ran through the 
 test systems, as well as the large amount of data when everything was read, we 
 paid about 15.00 USD. I honestly do not think that the specs there would be 
 enough, as Java can be pretty RAM hungry. 
 
 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T
 
 On 2015-03-06 00:41, Krish Donald wrote: 
 
 Hi, 
 
 I am new to AWS and would like to setup Hadoop cluster using cloudera manager 
 for 6-7 nodes. 
 
 t2.micro on AWS; Is it enough for setting up Hadoop cluster ? 
 I would like to use free service as of now. 
 
 Please advise. 
 
 Thanks 
 Krish
 

Links:
--
[1] https://freedns.afraid.org/subdomain/
[2] http://myhadoop-nn.crabdance.com
[3] http://myhadoop-nn-local.crabdance.com


Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?

2015-03-05 Thread Jonathan Aquilina
 

When I started with EMR it was a lot of testing and trial and error. HUE
is already supported as something that can be installed from the AWS
console. What I need to know is whether you need this cluster on all the time,
or whether this is going to be what Amazon calls a transient cluster, meaning you
fire it up, run the job and tear it back down. 

---
Regards,
Jonathan Aquilina
Founder Eagle Eye T

On 2015-03-06 01:10, Krish Donald wrote: 

 Thanks Jonathan, 
 
 I will try to explore EMR option also. 
 Can you please let me know the configuration which you have used? 
 Can you please recommend one for me as well? 
 I would like to setup Hadoop cluster using cloudera manager and then would 
 like to do below things: 
 
 setup kerberos
 setup federation
 setup monitoring
 setup hadr
 backup and recovery
 authorization using sentry
 backup and recovery of individual components
 performance tuning
 upgrade of cdh 
 upgrade of CM
 Hue User Administration 
 Spark 
 Solr 
 
 Thanks 
 Krish 
 
 On Thu, Mar 5, 2015 at 3:57 PM, Jonathan Aquilina jaquil...@eagleeyet.net 
 wrote:
 
 Krish, EMR won't cost you much: with all the testing and data we ran through the 
 test systems, as well as the large amount of data when everything was read, we 
 paid about 15.00 USD. I honestly do not think that the specs there would be 
 enough, as Java can be pretty RAM hungry. 
 
 ---
 Regards,
 Jonathan Aquilina
 Founder Eagle Eye T
 
 On 2015-03-06 00:41, Krish Donald wrote: 
 
 Hi, 
 
 I am new to AWS and would like to setup Hadoop cluster using cloudera manager 
 for 6-7 nodes. 
 
 t2.micro on AWS; Is it enough for setting up Hadoop cluster ? 
 I would like to use free service as of now. 
 
 Please advise. 
 
 Thanks 
 Krish
 

Re: (no subject)

2015-03-05 Thread SP
I resolved it by downloading slf4j-simple-1.7.10.jar and copying it
to $HADOOP_HOME/lib.

In .bashrc I added this variable:

export HADOOP_CLASSPATH=$HADOOP_HOME/lib


Issue got resolved.

thanks a lot for your response Raj.

Thanks
SP


On Thu, Mar 5, 2015 at 11:52 AM, Raj K Singh rajkrrsi...@gmail.com wrote:

 just configure logging appender in log4j setting and rerun the command
 On Mar 5, 2015 12:30 AM, SP sajid...@gmail.com wrote:

 Hello All,

 Why am I getting this error every time I execute a command? It was
 working fine with CDH4. When I upgraded to CDH5, this message started
 showing up.

 does any one have resolution for this error

 sudo -u hdfs hadoop fs -ls /
 SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder.
 SLF4J: Defaulting to no-operation (NOP) logger implementation
 SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for
 further details.
 Found 1 items
 drwxrwxrwt   - hdfs hadoop  0 2015-03-04 10:30 /tmp


 Thanks
 SP