Re: AWS Setting for setting up Hadoop cluster
I have experience using EMR at my full-time job; the thing is quick and cheap. The interesting part is wrapping your head around the concepts. If you need something up and running quickly, EMR is the way to go. It spins up a number of EC2 instances; by default you get 1 master and 2 core nodes. All three are m3.large instances, which run about 7 cents per hour each. To process one year's worth of data, which is about 1.1 billion records from the database, it took 50 minutes from cluster spin-up to completion and shutdown of the cluster. --- Regards, Jonathan Aquilina Founder Eagle Eye T
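For reference, a rough sketch of spinning up a similar small cluster from the AWS CLI instead of the console; the cluster name, key pair, log bucket, AMI version and instance type below are placeholders, not the exact setup described above.

# Sketch: one master + two core nodes; pick an AMI version and instance type
# that EMR actually offers in your region.
aws emr create-cluster \
  --name "hadoop-test" \
  --ami-version 3.3.1 \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --ec2-attributes KeyName=my-key-pair \
  --log-uri s3://my-emr-logs/ \
  --use-default-roles

# The cluster keeps running (and billing) until you terminate it:
aws emr terminate-clusters --cluster-ids j-XXXXXXXXXXXXX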
Re: AWS Setting for setting up Hadoop cluster
You can install Hadoop on Amazon EC2 instances and use the free tier for new members, but you can also use Amazon EMR, which is not free but is up and running in a couple of minutes...
Re: AWS Setting for setting up Hadoop cluster
Because I am new to AWS, I would like to explore the free tier first and then later I can use EMR. Which EC2 instance type is fast and also free? Thanks
Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?
Krish, EMR won't cost you much. With all the testing and data we ran through the test systems, as well as the large amount of data when everything was ready, we paid about 15.00 USD. I honestly do not think the t2.micro specs would be enough, as Java can be pretty RAM hungry. --- Regards, Jonathan Aquilina Founder Eagle Eye T
Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?
Thanks Jonathan, I will try to explore the EMR option also. Can you please let me know the configuration you used, and can you recommend one for me as well? I would like to set up a Hadoop cluster using Cloudera Manager and then do the following:
set up Kerberos
set up federation
set up monitoring
set up HA/DR
backup and recovery
authorization using Sentry
backup and recovery of individual components
performance tuning
upgrade of CDH
upgrade of CM
Hue user administration
Spark
Solr
Thanks Krish
Re: AWS Setting for setting up Hadoop cluster
The advantage of EMR is that you don't have to spend time messing around with installing Hadoop; it does all of that for you, so you are ready to go. --- Regards, Jonathan Aquilina Founder Eagle Eye T
AWS Setting for setting up Hadoop cluster
Hi, I am tired of setting up a Hadoop cluster on my laptop, which has 8GB RAM. I tried 2GB for the namenode and 1GB each for 3 datanodes, so I was using 5GB in total, and I was running only very basic Hadoop services. But it is so slow that I am not able to do anything with it. Hence I would like to try AWS now. Can anybody please help me with which configuration I should use without paying at all? What tips do you have for AWS? Thanks Krish
t2.micro on AWS; Is it enough for setting up Hadoop cluster ?
Hi, I am new to AWS and would like to set up a 6-7 node Hadoop cluster using Cloudera Manager. Is a t2.micro on AWS enough for setting up a Hadoop cluster? I would like to use the free tier for now. Please advise. Thanks Krish
Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?
What about DNS? If you have 2 computers (nn and dn), how does the nn know the dn's IP? The script puts only this computer's IP into /etc/hosts.
Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?
I think EMR has its own limitations. E.g. I want to set up Hadoop 2.6.0 with Kerberos + Hive 1.2.0 to test my Hive patch. How can EMR help me? It supports Hadoop only up to 2.4.0 (not even 2.4.1): http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-hadoop-version.html On Thu, Mar 5, 2015 at 9:51 PM, Jonathan Aquilina jaquil...@eagleeyet.net wrote: Hi guys, I know you want to keep costs down, but why go through all the effort of setting up EC2 instances yourself when EMR, once deployed, provisions and sets up the EC2 instances for you? All configuration for the entire cluster is then done on the master node of the particular cluster, and setting up additional software is all done through the EMR console. We were doing some geospatial calculations and we loaded a 3rd-party jar file called esri into the EMR cluster. I then had to pass a small bootstrap action (script) to have it distribute esri to the entire cluster. Why are you guys reinventing the wheel? --- Regards, Jonathan Aquilina Founder Eagle Eye T
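For reference, a bootstrap action like the one Jonathan describes can be attached at cluster creation time; a minimal sketch, where the S3 bucket, script and other names are placeholders rather than the actual setup from this thread:

# Sketch: every node runs install-esri.sh at startup (e.g. to copy a 3rd-party
# jar onto each node). The s3:// paths and key name are hypothetical.
aws emr create-cluster \
  --name "geo-cluster" \
  --ami-version 3.3.1 \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --ec2-attributes KeyName=my-key-pair \
  --bootstrap-actions Path=s3://my-bucket/bootstrap/install-esri.sh,Name=InstallEsri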
Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?
The only limitation I know of is how many nodes you can have and how many instances of that particular size the underlying hosts can support. You can load Hive in EMR, and any other features of the cluster are managed at the master node level, as you have SSH access there. What is the advantage of 2.6 over 2.4, for example? I just feel you guys are reinventing the wheel when Amazon already caters for Hadoop, granted it might not be 2.6. --- Regards, Jonathan Aquilina Founder Eagle Eye T
Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?
Do a reverse lookup and use the name you find. There are a few areas of Hadoop that require reverse name lookup, but in general just create relevant entries (shared across the cluster, e.g. via Ansible if more than just a few nodes) in /etc/hosts. Not hard. On Thu, Mar 5, 2015 at 6:35 PM, Alexander Pivovarov apivova...@gmail.com wrote: I found the following solution to this problem. I registered 2 subdomains (public and local) for each computer on https://freedns.afraid.org/subdomain/ e.g. myhadoop-nn.crabdance.com and myhadoop-nn-local.crabdance.com. Then I added a cron job which sends HTTP requests to update the public and local IP on the freedns server. Hint: the public IP is detected automatically; the IP address for the local name can be set using the request parameter address=10.x.x.x (don't forget to escape the & in the URL). As a result my nn computer has 2 DNS names with the currently assigned IP addresses, e.g. myhadoop-nn.crabdance.com 54.203.181.177 and myhadoop-nn-local.crabdance.com 10.220.149.103. In the Hadoop configuration I can use the local machine names; to access my cluster from outside of AWS I can use the public names. Just curious if AWS provides an easier way to name EC2 computers?
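For reference, the cron job described above could look roughly like the sketch below; the update URL and tokens are assumptions about how freedns.afraid.org dynamic updates are usually triggered, so check your own account page for the exact URLs.

# Sketch of crontab entries that refresh both names every 5 minutes.
# UPDATE_TOKEN_PUBLIC / UPDATE_TOKEN_LOCAL are hypothetical per-subdomain tokens.
*/5 * * * * curl -s "https://freedns.afraid.org/dynamic/update.php?UPDATE_TOKEN_PUBLIC" >/dev/null
*/5 * * * * LOCAL_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4); curl -s "https://freedns.afraid.org/dynamic/update.php?UPDATE_TOKEN_LOCAL&address=${LOCAL_IP}" >/dev/null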
Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?
What is the easiest way to assign names to AWS EC2 computers? I guess a computer needs a static hostname and DNS name before it can be used in a Hadoop cluster. On Mar 5, 2015 4:36 PM, Jonathan Aquilina jaquil...@eagleeyet.net wrote: When I started with EMR it was a lot of testing and trial and error. HUE is already supported as something that can be installed from the AWS console. What I need to know is whether you need this cluster on all the time, or whether this is going to be what Amazon calls a transient cluster, meaning you fire it up, run the job and tear it back down. --- Regards, Jonathan Aquilina Founder Eagle Eye T
Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?
Here is an easy way to go about assigning a static name to your EC2 instance. When you launch an EC2 instance from the AWS console and get to the point of selecting the VPC and IP address, there is a screen that says USER DATA. Put the script below in, with an appropriate host name (change CHANGE_HOST_NAME_HERE to whatever you want), and that should get you a static name.

#!/bin/bash
HOSTNAME_TAG=CHANGE_HOST_NAME_HERE
cat > /etc/sysconfig/network <<EOF
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=${HOSTNAME_TAG}
EOF
IP=$(curl http://169.254.169.254/latest/meta-data/local-ipv4)
echo "${IP} ${HOSTNAME_TAG}.localhost ${HOSTNAME_TAG}" >> /etc/hosts
echo "${HOSTNAME_TAG}" > /proc/sys/kernel/hostname
service network restart

Also note I was able to do this on a couple of spot instances for a cheap price; the only catch is that once you shut it down, or someone outbids you, you lose that instance. But it is easy/cheap to play around with, and I have used a couple of m3.medium instances for my NN/SNN and a couple of them for data nodes...
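If you save that script to a file, you can also pass it as user data from the AWS CLI instead of pasting it into the console; a minimal sketch, where the AMI id, key name and security group are placeholders:

# Sketch: launch an instance with the hostname script above as user data.
aws ec2 run-instances \
  --image-id ami-12345678 \
  --instance-type m3.medium \
  --key-name my-key-pair \
  --security-group-ids sg-12345678 \
  --user-data file://set-hostname.sh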
Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?
OK, but how can we easily put all the Hadoop computer names and IPs into /etc/hosts on all computers? Do you have a script? Or do I need to manually go to each computer, get its IP, put it into /etc/hosts and then distribute /etc/hosts to all machines? Don't you think the one-time effort to configure freedns is easier? The freedns solution works with AWS spot instances as well. You need to create a snapshot after you configure freedns, Hadoop, etc. on a particular box. Next time you need a computer you can go to your saved snapshots and create a spot instance from it.
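For reference, a minimal sketch of pushing the same hosts entries to every node over SSH; the hostnames, private IPs, SSH user and key below are made-up placeholders:

#!/bin/bash
# Sketch: append identical name/IP entries to /etc/hosts on every node.
NODES="10.0.0.10 10.0.0.11 10.0.0.12 10.0.0.13"
HOSTS_FRAGMENT=$(cat <<'EOF'
10.0.0.10 nn
10.0.0.11 snn
10.0.0.12 dn1
10.0.0.13 dn2
EOF
)
for node in $NODES; do
  # tee -a appends on the remote side; adjust user/key to match your AMI
  echo "$HOSTS_FRAGMENT" | ssh -i my-key.pem ec2-user@$node "sudo tee -a /etc/hosts" >/dev/null
done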
Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?
I don't know how you would do that, to be honest. With EMR you have distinct master, core and task node roles. If you need to change configuration you just SSH into the EMR master node. --- Regards, Jonathan Aquilina Founder Eagle Eye T
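For EMR specifically, the AWS CLI has a helper for that SSH step; a sketch, where the cluster id and key file are placeholders:

# Sketch: open an SSH session to the EMR master node.
aws emr ssh --cluster-id j-XXXXXXXXXXXXX --key-pair-file ~/my-key-pair.pem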
Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?
Unfortunately, without DNS you have to rely on /etc/hosts, so put in an entry for all your nodes (nn, snn, dn1, dn2, etc.) in the /etc/hosts file on all nodes. I have that tested for Hortonworks (using Ambari) and Cloudera Manager, and I am quite sure it will work for MapR as well.
Re: (no subject)
Just configure a logging appender in the log4j settings and rerun the command. On Mar 5, 2015 12:30 AM, SP sajid...@gmail.com wrote: Hello All, why am I getting this error every time I execute a command? It was working fine with the CDH4 version; when I upgraded to CDH5 this message started showing up. Does anyone have a resolution for this error?

sudo -u hdfs hadoop fs -ls /
SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Found 1 items
drwxrwxrwt - hdfs hadoop 0 2015-03-04 10:30 /tmp

Thanks SP
Re: HDFS Append Problem
Please take this up on the CDH mailing list. From: Molnár Bálint molnarcsi...@gmail.com Sent: Thursday, March 05, 2015 4:53 AM To: user@hadoop.apache.org Subject: HDFS Append Problem Hi Everyone! I'm experiencing an annoying problem. My scenario is: I want to store lots of small files (1-2MB max) in map files. These files will arrive periodically during the day, so I cannot use the stock writer, because it would create a lot of small MapFiles. (I want to store these files in HDFS immediately.) I'm trying to write code to append to MapFiles. I use the org.apache.hadoop.fs.FileSystem append() method, which calls the org.apache.hadoop.hdfs.DistributedFileSystem append() method to do the job. My code works well, because the stock MapFile Reader can retrieve the files. My problem appears in the upload phase. When I try to upload a set (1GB) of small files, the free space on HDFS decreases fast. The program only uploads 400MB, but according to Cloudera Manager it is more than 5GB. The interesting part is that when I terminate the upload and wait 1-2 minutes, HDFS goes back to the normal size (500MB), and none of my files are lost. If I don't terminate the upload, HDFS runs out of free space and the program gets errors. I'm using the Cloudera QuickStart VM 5.3 for testing, and the HDFS replication factor is 1. Any ideas how to solve this issue? Thanks
Re: File is not written on HDFS after running libhdfs C API
Can you share your core-site.xml here?
Re: File is not written on HDFS after running libhdfs C API
You can try:

for file in `hadoop classpath | tr ':' ' ' | sort | uniq`
do
  export CLASSPATH=$CLASSPATH:$file
done
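A shorter variant, if your Hadoop build supports the --glob option of the classpath command (it expands the wildcard jar entries, which the JVM started by libhdfs does not do on its own); the config directory path is assumed from earlier in this thread:

# Sketch: build the CLASSPATH for a libhdfs program in one line.
# Prepend the conf dir so core-site.xml and hdfs-site.xml are picked up.
export CLASSPATH=/usr/local/hadoop/etc/hadoop:$(hadoop classpath --glob)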
Re: File is not written on HDFS after running libhdfs C API
This is how core-site.xml looks:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
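A quick way to confirm which default filesystem the client-side configuration actually resolves to (it should print the hdfs:// URI above rather than file://):

# Sketch: ask the Hadoop client what it thinks fs.defaultFS is.
hdfs getconf -confKey fs.defaultFS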
Re: File is not written on HDFS after running libhdfs C API
Now I've also started YARN (just for the sake of trying anything); the config for mapred-site.xml and yarn-site.xml are those on the Apache website. A jps command shows:

11257 NodeManager
11129 ResourceManager
11815 Jps
10620 NameNode
10966 SecondaryNameNode
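At this point it is worth checking explicitly where /tmp/testfile.txt ends up after a run, for example:

# Sketch: after running the libhdfs program, see whether the file landed on
# HDFS or on the local filesystem.
hadoop fs -ls /tmp/testfile.txt   # listed here if it went to HDFS
ls -l /tmp/testfile.txt           # listed here if it was written locally instead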
Re: File is not written on HDFS after running libhdfs C API
Wow, you are so right! It's on the local filesystem! Do I have to manually specify hdfs-site.xml and core-site.xml in the CLASSPATH variable? Like this: CLASSPATH=$CLASSPATH:/usr/local/hadoop/etc/hadoop/core-site.xml ? On Thu, Mar 5, 2015 at 10:04 AM, Azuryy Yu azury...@gmail.com wrote: You need to include core-site.xml as well, and I think you can find '/tmp/testfile.txt' on your local disk instead of HDFS. If so, my guess is right: because you don't include core-site.xml, your filesystem schema is file:// by default, not hdfs://. On Thu, Mar 5, 2015 at 3:52 PM, Alexandru Calin alexandrucali...@gmail.com wrote: I am trying to run the basic libhdfs example. It compiles OK, actually runs OK and executes the whole program, but I cannot see the file on HDFS. It is said here http://hadoop.apache.org/docs/r1.2.1/libhdfs.html that you have to include the right configuration directory containing hdfs-site.xml. My hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/local/hadoop/hadoop_data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/hadoop/hadoop_store/hdfs/datanode</value>
  </property>
</configuration>

I generate my classpath with this:

#!/bin/bash
export CLASSPATH=/usr/local/hadoop/
declare -a subdirs=(hdfs tools common yarn mapreduce)
for subdir in "${subdirs[@]}"
do
  for file in $(find /usr/local/hadoop/share/hadoop/$subdir -name "*.jar")
  do
    export CLASSPATH=$CLASSPATH:$file
  done
done

and I also add export CLASSPATH=$CLASSPATH:/usr/local/hadoop/etc/hadoop , where my hdfs-site.xml resides. My LD_LIBRARY_PATH = /usr/local/hadoop/lib/native:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server

Code:

#include "hdfs.h"
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    hdfsFS fs = hdfsConnect("default", 0);
    const char* writePath = "/tmp/testfile.txt";
    hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY|O_CREAT, 0, 0, 0);
    if (!writeFile) {
        printf("Failed to open %s for writing!\n", writePath);
        exit(-1);
    }
    printf("\nfile opened\n");
    char* buffer = "Hello, World!";
    tSize num_written_bytes = hdfsWrite(fs, writeFile, (void*)buffer, strlen(buffer)+1);
    printf("\nWrote %d bytes\n", (int)num_written_bytes);
    if (hdfsFlush(fs, writeFile)) {
        printf("Failed to 'flush' %s\n", writePath);
        exit(-1);
    }
    hdfsCloseFile(fs, writeFile);
    hdfsDisconnect(fs);
    return 0;
}

It compiles and runs without error, but I cannot see the file on HDFS. I have Hadoop 2.6.0 on Ubuntu 14.04 64-bit. Any ideas on this?
Re: File is not written on HDFS after running libhdfs C API
No change at all, I've added them at the start and end of the CLASSPATH; either way it still writes the file on the local fs. I've also restarted Hadoop. On Thu, Mar 5, 2015 at 10:22 AM, Azuryy Yu azury...@gmail.com wrote: Yes, you should do it :) On Thu, Mar 5, 2015 at 4:17 PM, Alexandru Calin alexandrucali...@gmail.com wrote: Wow, you are so right! It's on the local filesystem! Do I have to manually specify hdfs-site.xml and core-site.xml in the CLASSPATH variable? [...] On Thu, Mar 5, 2015 at 10:04 AM, Azuryy Yu azury...@gmail.com wrote: You need to include core-site.xml as well; because you don't include core-site.xml, your filesystem scheme is file:// by default, not hdfs://. On Thu, Mar 5, 2015 at 3:52 PM, Alexandru Calin alexandrucali...@gmail.com wrote: [...]
Re: File is not written on HDFS after running libhdfs C API
Yes, you should do it :) On Thu, Mar 5, 2015 at 4:17 PM, Alexandru Calin alexandrucali...@gmail.com wrote: Wow, you are so right! It's on the local filesystem! Do I have to manually specify hdfs-site.xml and core-site.xml in the CLASSPATH variable? Like this: CLASSPATH=$CLASSPATH:/usr/local/hadoop/etc/hadoop/core-site.xml ? On Thu, Mar 5, 2015 at 10:04 AM, Azuryy Yu azury...@gmail.com wrote: You need to include core-site.xml as well, and I think you can find '/tmp/testfile.txt' on your local disk instead of HDFS. If so, my guess is right: because you don't include core-site.xml, your filesystem scheme is file:// by default, not hdfs://. On Thu, Mar 5, 2015 at 3:52 PM, Alexandru Calin alexandrucali...@gmail.com wrote: [...]
Re: File is not written on HDFS after running libhdfs C API
You don't need to start YARN if you only want to write to HDFS using the C API, and you also don't need to restart HDFS. On Thu, Mar 5, 2015 at 4:58 PM, Alexandru Calin alexandrucali...@gmail.com wrote: Now I've also started YARN (just for the sake of trying anything); the config for mapred-site.xml and yarn-site.xml are the ones on the Apache website. A jps command shows: 11257 NodeManager, 11129 ResourceManager, 11815 Jps, 10620 NameNode, 10966 SecondaryNameNode. On Thu, Mar 5, 2015 at 10:48 AM, Azuryy Yu azury...@gmail.com wrote: Can you share your core-site.xml here? On Thu, Mar 5, 2015 at 4:32 PM, Alexandru Calin alexandrucali...@gmail.com wrote: No change at all, I've added them at the start and end of the CLASSPATH; either way it still writes the file on the local fs. I've also restarted Hadoop. On Thu, Mar 5, 2015 at 10:22 AM, Azuryy Yu azury...@gmail.com wrote: Yes, you should do it :) On Thu, Mar 5, 2015 at 4:17 PM, Alexandru Calin alexandrucali...@gmail.com wrote: Wow, you are so right! It's on the local filesystem! [...] On Thu, Mar 5, 2015 at 10:04 AM, Azuryy Yu azury...@gmail.com wrote: You need to include core-site.xml as well; because you don't include core-site.xml, your filesystem scheme is file:// by default, not hdfs://. On Thu, Mar 5, 2015 at 3:52 PM, Alexandru Calin alexandrucali...@gmail.com wrote: [...]
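As a quick sanity check of that advice, HDFS alone is enough for the libhdfs write test; a minimal sketch, assuming commands are run from the Hadoop install directory (e.g. /usr/local/hadoop):

# start only the HDFS daemons; YARN is not needed for hdfsWrite()
sbin/start-dfs.sh

# NameNode, DataNode and SecondaryNameNode should all appear here;
# note that the jps output quoted above has no DataNode entry
jps

# shows how many live datanodes have actually registered with the namenode
bin/hdfs dfsadmin -report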
Re: File is not written on HDFS after running libhdfs C API
You need to include core-site.xml as well, and I think you can find '/tmp/testfile.txt' on your local disk instead of HDFS. If so, my guess is right: because you don't include core-site.xml, your filesystem scheme is file:// by default, not hdfs://. On Thu, Mar 5, 2015 at 3:52 PM, Alexandru Calin alexandrucali...@gmail.com wrote: I am trying to run the basic libhdfs example; it compiles OK, actually runs OK and executes the whole program, but I cannot see the file on HDFS. It is said here http://hadoop.apache.org/docs/r1.2.1/libhdfs.html that you have to include *the right configuration directory containing hdfs-site.xml*. [...]
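For reference, a minimal core-site.xml that makes hdfs:// the default filesystem looks like the sketch below; the host and port (localhost:9000) are assumptions for a single-node setup and must match the NameNode's RPC address:

# merge with (or back up) any existing core-site.xml before overwriting it
cat > /usr/local/hadoop/etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF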
Cloudera monitoring Services not starting
Hi, I have set up a 4-node cluster (1 namenode and 3 datanodes) using Cloudera Manager 5.2, but it is not starting the Cloudera Monitoring service, and for host health it shows "unknown". How can I disable the Monitoring service completely and work with just the other cluster features? Thanks Krish
Re: Cloudera monitoring Services not starting
Please ask Cloudera-related questions on Cloudera’s forums: community.cloudera.com On Mar 5, 2015, at 11:56 AM, Krish Donald gotomyp...@gmail.com wrote: Hi, I have set up a 4-node cluster (1 namenode and 3 datanodes) using Cloudera Manager 5.2, but it is not starting the Cloudera Monitoring service, and for host health it shows "unknown". How can I disable the Monitoring service completely and work with just the other cluster features? Thanks Krish
Re: File is not written on HDFS after running libhdfs C API
The DataNode was not starting due to this error:

java.io.IOException: Incompatible clusterIDs in /usr/local/hadoop/hadoop_store/hdfs/datanode: namenode clusterID = CID-b788c93b-a1d7-4351-bd91-28fdd134e9ba; datanode clusterID = CID-862f3fad-175e-442d-a06b-d65ac57d64b2

I can't imagine how this happened. Anyway, I issued this command:

bin/hdfs namenode -format -clusterId CID-862f3fad-175e-442d-a06b-d65ac57d64b2

and that got it started; the file is written correctly. Thank you very much.

On Thu, Mar 5, 2015 at 2:03 PM, Alexandru Calin alexandrucali...@gmail.com wrote: After putting the CLASSPATH initialization in .bashrc it creates the file, but it has 0 size and I also get this warning: 15/03/05 14:00:55 WARN hdfs.DFSClient: DataStreamer Exception org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/testfile.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation. [...]
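One way to confirm (or avoid) this kind of mismatch before reformatting the namenode is to compare the clusterID recorded on each side; a sketch using the directory layout from the hdfs-site.xml earlier in this thread:

# the clusterID is stored in the VERSION file of each storage directory
grep clusterID /usr/local/hadoop/hadoop_data/hdfs/namenode/current/VERSION
grep clusterID /usr/local/hadoop/hadoop_store/hdfs/datanode/current/VERSION

# If they differ and the HDFS data is disposable, an alternative to
# reformatting the namenode is to wipe the datanode directory and restart,
# letting the datanode re-register with the existing clusterID (destructive):
#   rm -rf /usr/local/hadoop/hadoop_store/hdfs/datanode/*
#   sbin/stop-dfs.sh && sbin/start-dfs.sh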
HDFS Append Problem
Hi everyone! I'm experiencing an annoying problem. My scenario is: I want to store lots of small files (1-2 MB max) in MapFiles. These files will arrive periodically during the day, so I cannot use the "factory" writer because it would create a lot of small MapFiles, and I want to store these files in HDFS immediately. I'm trying to write code that appends to MapFiles. I use the org.apache.hadoop.fs.FileSystem append() method, which calls the org.apache.hadoop.hdfs.DistributedFileSystem append() method to do the job. My code works well, because the stock MapFile Reader can retrieve the files. My problem appears in the upload phase. When I try to upload a set (1 GB) of small files, the free space of the HDFS decreases fast: the program has only uploaded 400 MB, but according to Cloudera Manager more than 5 GB is in use. The interesting part is that when I terminate the upload and wait 1-2 minutes, HDFS goes back to the expected size (500 MB) and none of my files are lost. If I don't terminate the upload, HDFS runs out of free space and the program gets errors. I'm using the Cloudera QuickStart VM 5.3 for testing, and the HDFS replication factor is 1. Any ideas how to solve this issue? Thanks
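One thing worth checking while the uploader is running is whether the space is being held by files that are still open for write: HDFS can account a full block's worth of space for each block under construction and releases it when the file is closed, which would be consistent with the usage dropping back to normal a couple of minutes after the upload stops. A sketch of the checks; the path is a placeholder for wherever the MapFiles are written:

# files still open for write (blocks under construction) in the target dir
hdfs fsck /user/cloudera/mapfiles -files -blocks -openforwrite

# raw space the datanodes report as used/remaining while the upload runs
hdfs dfsadmin -report | grep -E 'DFS Used|DFS Remaining'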
Re: File is not written on HDFS after running libhdfs C API
After putting the CLASSPATH initialization in .bashrc it creates the file, but it has 0 size and I also get this warning:

file opened

Wrote 14 bytes
15/03/05 14:00:55 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/testfile.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1549)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3200)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:641)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
    at org.apache.hadoop.ipc.Client.call(Client.java:1468)
    at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1532)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1349)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)

FSDataOutputStream#close error:
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/testfile.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
    [same stack trace as above]
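The "could only be replicated to 0 nodes" message generally means no datanode has registered with the namenode. A short sketch of the checks that usually narrow it down; the log path pattern is the common default for a tarball install and may differ on other setups:

# how many live datanodes the namenode can see
hdfs dfsadmin -report

# is a DataNode process running on this machine at all?
jps | grep DataNode

# if not, the datanode log normally names the cause (e.g. an incompatible
# clusterIDs mismatch, as turned out to be the case in this thread)
tail -n 50 /usr/local/hadoop/logs/hadoop-*-datanode-*.log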
Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?
Hi guys, I know you want to keep costs down, but why go through all the effort of setting up EC2 instances yourselves? When you deploy EMR it takes care of provisioning and setting up the EC2 instances for you. All configuration for the entire cluster is then done on the master node of that cluster, and setting up additional software is all done through the EMR console. We were doing some geospatial calculations and loaded a 3rd-party jar file called esri into the EMR cluster; I then had to pass a small bootstrap action (script) to have it distributed to the entire cluster. Why are you guys reinventing the wheel?

--- Regards, Jonathan Aquilina Founder Eagle Eye T

On 2015-03-06 03:35, Alexander Pivovarov wrote: I found the following solution to this problem. I registered 2 subdomains (public and local) for each computer on https://freedns.afraid.org/subdomain/ [1], e.g. myhadoop-nn.crabdance.com [2] and myhadoop-nn-local.crabdance.com [3]. Then I added a cron job which sends HTTP requests to update the public and local IP on the freedns server (hint: the public IP is detected automatically; the IP address for the local name can be set using the request parameter address=10.x.x.x, and don't forget to escape it). As a result my NN computer has 2 DNS names with the currently assigned IP addresses, e.g. myhadoop-nn.crabdance.com [2] 54.203.181.177 and myhadoop-nn-local.crabdance.com [3] 10.220.149.103. In the Hadoop configuration I can use the local machine names; to access my cluster from outside of AWS I can use the public names. Just curious if AWS provides an easier way to name EC2 computers?

On Thu, Mar 5, 2015 at 5:19 PM, Jonathan Aquilina jaquil...@eagleeyet.net wrote: I don't know how you would do that, to be honest. With EMR you have distinct master, core and task nodes. If you need to change configuration you just ssh into the EMR master node.

On 2015-03-06 02:11, Alexander Pivovarov wrote: What is the easiest way to assign names to AWS EC2 computers? I guess a computer needs a static hostname and DNS name before it can be used in a Hadoop cluster.

On Mar 5, 2015 4:36 PM, Jonathan Aquilina jaquil...@eagleeyet.net wrote: When I started with EMR it was a lot of testing and trial and error. [...]

On 2015-03-06 01:10, Krish Donald wrote: Thanks Jonathan, I will try to explore the EMR option also. [...]

On Thu, Mar 5, 2015 at 3:57 PM, Jonathan Aquilina jaquil...@eagleeyet.net wrote: Krish, EMR won't cost you much; with all the testing and data we ran through the test systems, as well as the large amount of data when everything was ready, we paid about 15.00 USD. I honestly do not think the specs there would be enough, as Java can be pretty RAM hungry.

--- Regards, Jonathan Aquilina Founder Eagle Eye T

On 2015-03-06 00:41, Krish Donald wrote: Hi, I am new to AWS and would like to set up a Hadoop cluster using Cloudera Manager for 6-7 nodes. Is t2.micro on AWS enough for setting up a Hadoop cluster? I would like to use the free service as of now. Please advise. Thanks Krish

Links: -- [1] https://freedns.afraid.org/subdomain/ [2] http://myhadoop-nn.crabdance.com [3] http://myhadoop-nn-local.crabdance.com
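A sketch of the cron-driven freedns update described above; the update URLs and tokens are placeholders taken from the freedns account page, and picking the local address with hostname -I is an assumption about the node's setup:

#!/bin/bash
# /usr/local/bin/update-dns.sh -- refresh both subdomains for this node
PUBLIC_URL="https://freedns.afraid.org/dynamic/update.php?PUBLIC_TOKEN"
LOCAL_URL="https://freedns.afraid.org/dynamic/update.php?LOCAL_TOKEN"

LOCAL_IP=$(hostname -I | awk '{print $1}')   # first private address of this node

curl -s "$PUBLIC_URL" > /dev/null                        # public IP detected server-side
curl -s "${LOCAL_URL}&address=${LOCAL_IP}" > /dev/null   # local IP passed explicitly

# crontab entry to run it every 5 minutes:
# */5 * * * * /usr/local/bin/update-dns.sh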
Re: t2.micro on AWS; Is it enough for setting up Hadoop cluster ?
When I started with EMR it was a lot of testing and trial and error. Hue is already supported as something that can be installed from the AWS console. What I need to know is whether you need this cluster on all the time, or whether this is going to be what Amazon calls a transient cluster, meaning you fire it up, run the job and tear it back down.

--- Regards, Jonathan Aquilina Founder Eagle Eye T

On 2015-03-06 01:10, Krish Donald wrote: Thanks Jonathan, I will try to explore the EMR option also. Can you please let me know the configuration which you used? Can you please recommend one for me also? I would like to set up a Hadoop cluster using Cloudera Manager and then do the following: set up Kerberos, set up federation, set up monitoring, set up HA/DR, backup and recovery, authorization using Sentry, backup and recovery of individual components, performance tuning, upgrade of CDH, upgrade of CM, Hue, user administration, Spark, Solr. Thanks Krish

On Thu, Mar 5, 2015 at 3:57 PM, Jonathan Aquilina jaquil...@eagleeyet.net wrote: Krish, EMR won't cost you much; with all the testing and data we ran through the test systems, as well as the large amount of data when everything was ready, we paid about 15.00 USD. I honestly do not think the specs there would be enough, as Java can be pretty RAM hungry.

--- Regards, Jonathan Aquilina Founder Eagle Eye T

On 2015-03-06 00:41, Krish Donald wrote: Hi, I am new to AWS and would like to set up a Hadoop cluster using Cloudera Manager for 6-7 nodes. Is t2.micro on AWS enough for setting up a Hadoop cluster? I would like to use the free service as of now. Please advise. Thanks Krish
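For the transient-cluster case, the AWS CLI can create a cluster, run a job and tear it down in one call. This is only a rough sketch: the cluster name, key pair, S3 paths and bootstrap script are placeholders, and the AMI version, instance settings and application names should be checked against what your account and region offer:

# create a cluster, run one custom-jar step, then auto-terminate
aws emr create-cluster \
  --name "hadoop-lab" \
  --ami-version 3.6.0 \
  --applications Name=Hue \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --ec2-attributes KeyName=my-keypair \
  --bootstrap-actions Path=s3://my-bucket/bootstrap/install-extra-jar.sh \
  --steps Type=CUSTOM_JAR,Name=my-job,Jar=s3://my-bucket/jars/job.jar,Args=[arg1,arg2] \
  --auto-terminate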
Re: (no subject)
I resolved it by downloading slf4j-simple-1.7.10.jar and copying it to $HADOOP_HOME/lib, and in .bashrc I added this variable: export HADOOP_CLASSPATH=$HADOOP_HOME/lib. The issue got resolved. Thanks a lot for your response, Raj. Thanks SP On Thu, Mar 5, 2015 at 11:52 AM, Raj K Singh rajkrrsi...@gmail.com wrote: just configure a logging appender in the log4j settings and rerun the command On Mar 5, 2015 12:30 AM, SP sajid...@gmail.com wrote: Hello All, Why am I getting this error every time I execute a command? It was working fine with the CDH4 version; when I upgraded to CDH5 this message started showing up. Does anyone have a resolution for this error? sudo -u hdfs hadoop fs -ls / SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. Found 1 items drwxrwxrwt - hdfs hadoop 0 2015-03-04 10:30 /tmp Thanks SP
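A sketch of the fix described above; the Maven Central download path is an assumption and should be verified, and the HADOOP_CLASSPATH line simply mirrors what worked for the poster:

# fetch the slf4j binding and drop it where Hadoop will pick it up
wget https://repo1.maven.org/maven2/org/slf4j/slf4j-simple/1.7.10/slf4j-simple-1.7.10.jar
cp slf4j-simple-1.7.10.jar "$HADOOP_HOME/lib/"

# make the extra lib directory visible to the hadoop command
echo 'export HADOOP_CLASSPATH=$HADOOP_HOME/lib' >> ~/.bashrc
source ~/.bashrc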