Running more than one secondary namenode
Hi all, I was wondering whether there are any (technical) issues with running two secondary namenodes on two separate servers rather than just one. Since basically everything stands or falls with a consistent snapshot of the namenode fsimage, I was considering running two secondary namenodes for additional resilience. Has this been done before, or am I being too paranoid? Are there any caveats with doing this? Thanks, Jorn
Does Hbase automatically restore data when regionservers fail?
Hi, I am using CDH3u0 Hadoop and HBase. I know that HDFS automatically replicates data to multiple datanodes, so that when a couple of datanodes fail the user won't lose any data. I am currently using a replication factor of 3. Now, I know HBase has an experimental replication service, but I am not using it: I assumed that since HDFS already does replication, HBase wouldn't need replication of its own. Am I right about this? So my question is: given that I am using HDFS replication and not HBase replication, if a couple of regionservers fail, can I still access all the data I stored before? I would appreciate any help. Thanks. Ed
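For reference, the replication factor Ed mentions is an ordinary hdfs-site.xml setting (3 is also the stock default, so this fragment only makes the choice explicit); a minimal sketch:

```xml
<!-- hdfs-site.xml: block replication factor for new files -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```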
Re: Running more than one secondary namenode
Jorn, If you've configured the Name Node to replicate its fsimage and edit log to both NFS and the Secondary Name Node, and you regularly back up the fsimage and edit logs, you would do better investing the time in understanding exactly how the Name Node builds up its internal database and how it applies its edit logs; 'read the code, Luke'. Then, if you really want to be prepared, you can produce some test scenarios by applying a corruption (one the Name Node can't handle automatically) to the fsimage or edit logs on a sacrificial system (a VM?) and see if you can recover from it. That way, if you ever get hit with a Name Node corruption, you'll be in a much better place to recover most or all of your data. Even with the best setup it can happen if you hit a 'corner case' scenario. Chris
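As a starting point for the regular backups Chris suggests, here is a minimal, cron-able sketch. All names and paths (NAME_DIR, BACKUP_DIR, the demo directories under /tmp) are illustrative assumptions, not anything from this thread: point NAME_DIR at your real dfs.name.dir, and ideally run it while the Name Node is down or the copy may be inconsistent.

```shell
# Hypothetical backup sketch for the NameNode metadata directory.
# Paths are placeholders; on a real cluster NAME_DIR would be e.g. /hadoop/name.
NAME_DIR=${NAME_DIR:-/tmp/demo-name}
BACKUP_DIR=${BACKUP_DIR:-/tmp/demo-backups}

# Demo layout so the sketch runs standalone; a real NameNode maintains
# current/fsimage and current/edits itself.
mkdir -p "$NAME_DIR/current" "$BACKUP_DIR"
touch "$NAME_DIR/current/fsimage" "$NAME_DIR/current/edits"

# Timestamped tarball of the whole current/ directory.
STAMP=$(date +%Y%m%d-%H%M%S)
tar czf "$BACKUP_DIR/namenode-meta-$STAMP.tar.gz" -C "$NAME_DIR" current
```

On a real cluster you would copy the tarball off-host as well; a backup that lives next to the thing it protects is not much of a backup.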
RE: Running more than one secondary namenode
Hi Chris, I am doing exactly what you described there, apart from the regular backups (which are still on the to-do list). Unfortunately my Java knowledge is poor at best, so I'm not sure I would actually understand the Namenode internals. I'm going to give it a try nevertheless! I guess you're quite right that if we have regular backups of the namenode fsimage and edit logs, we're quite safe. Thanks for your feedback. Jorn
Re: Running more than one secondary namenode
Jorn, Speaking beyond what Chris said: it is a very bad idea. You'll end up with a corrupted FS if you do that right now: https://issues.apache.org/jira/browse/HDFS-2305 (fixed in a future release, however). -- Harsh J
Re: Hbase with Hadoop
For what it's worth, I was in a similar situation/dilemma a few days ago and got frustrated figuring out which version combination of hadoop/hbase to use and how to build hadoop manually to be compatible with hbase. The build process didn't work for me either. Eventually I ended up using the Cloudera distribution, and I think it saved me a lot of headache and time. Thanks On Tue, Oct 11, 2011 at 8:29 PM, jigneshmpatel jigneshmpa...@gmail.com wrote: Matt, Thanks a lot. Just wanted to have some more information. If hadoop 0.20.205.0 is voted on by the community members, will it then become a major release? And what if it is not approved by the community? And as you said, I would like to use 0.90.3 if it works. If it is OK, can you share the details of those configuration changes? -Jignesh -- View this message in context: http://lucene.472066.n3.nabble.com/Hbase-with-Hadoop-tp3413950p3414658.html Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
Re: Hbase with Hadoop
Sorry to hear that. Is CDH3 open source, or a paid version? -jignesh
Re: Hbase with Hadoop
It's free and open source too. Basically, their releases are ahead of the public releases of hadoop/hbase - from what I understand, major bug fixes and enhancements are checked in to their branch first and then eventually make it into the public release branches. Thanks
Re: About Tasktracker and DataNode
Just to add a little on top: I believe you have to start the set of datanodes (start-dfs.sh) before the set of tasktrackers (start-mapred.sh). hth P On Tue, Oct 11, 2011 at 9:10 PM, Harsh J ha...@cloudera.com wrote: Yes, you can do this - the services are not coupled to one another. Just start tasktrackers on one set of machines, and datanodes on another set of machines (via bin/hadoop-daemon.sh start {tasktracker,datanode} or so, individually). You will lose out on complete data locality during processing, however. On Wed, Oct 12, 2011 at 9:07 AM, Xianqing Yu x...@ncsu.edu wrote: Hi people, I have a question about how to set up a hadoop cluster. Could I have the TaskTracker and DataNode running on different machines? That is, one machine with a Tasktracker only, and one machine with a DataNode daemon only. Thanks, Xianqing -- Harsh J
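The per-daemon start order Harsh describes can be sketched as below. The `start_role` helper is my own illustrative wrapper (not part of Hadoop), and the `bin/hadoop-daemon.sh` path assumes a plain tarball install; the guard simply prints the intended command on machines where Hadoop isn't present.

```shell
# Hypothetical helper: start one Hadoop daemon by role, or show what
# would be run if bin/hadoop-daemon.sh is not available here.
start_role() {
  role=$1
  if [ -x bin/hadoop-daemon.sh ]; then
    bin/hadoop-daemon.sh start "$role"
  else
    echo "would run: bin/hadoop-daemon.sh start $role"
  fi
}

# HDFS first, on the storage machines...
start_role datanode
# ...then MapReduce, on the compute machines.
start_role tasktracker
```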
Re: Hbase with Hadoop
I have installed Hadoop-0.20.205.0, but when I replace the hadoop 0.20.204.0 Eclipse plugin with the 0.20.205.0 one, Eclipse does not recognize it. -Jignesh
Re: Hbase with Hadoop
The new plugin works after deleting Eclipse and reinstalling it.
Avoiding zero/empty output files in 0.20.203
Hi, how can empty output files (where output.collect() is never called) be avoided in Hadoop 0.20.203? The documentation suggests using LazyOutputFormat: http://hadoop.apache.org/mapreduce/docs/current/mapred_tutorial.html#Lazy+Output+Creation However, it seems to exist only from 0.21.0 onwards: http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapreduce/lib/output/LazyOutputFormat.html I couldn't find it in Hadoop 0.20.203. Best regards, Katja Müller
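One workaround sometimes used before LazyOutputFormat was available is simply to delete zero-byte part files after the job completes. The sketch below demonstrates the idea on a local directory (all paths and file contents here are invented for the demo); on a real cluster you would list the job output with `hadoop fs -ls` and remove empties with `hadoop fs -rm` instead.

```shell
# Demo output directory standing in for a job's output path.
OUT=${OUT:-/tmp/demo-job-output}
mkdir -p "$OUT"
printf 'some output\n' > "$OUT/part-00000"  # reducer that called output.collect()
: > "$OUT/part-00001"                       # reducer that emitted nothing

# Remove zero-byte part files, leaving the real output untouched.
find "$OUT" -name 'part-*' -size 0 -exec rm -f {} +
```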
Re: Hbase with Hadoop
When I tried to run HBase 0.90.4 with hadoop-0.20.205.0 I got the following error:

Jignesh-MacBookPro:hadoop-hbase hadoop-user$ bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.90.4, r1150278, Sun Jul 24 15:53:29 PDT 2011

hbase(main):001:0> status

ERROR: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately. This could be a sign that the server has too many connections (30 is the default). Consider inspecting your ZK server logs for that error and then make sure you are reusing HBaseConfiguration as often as you can. See HTable's javadoc for more information.

And when I try to stop HBase I continuously see dots being printed, with no sign of it stopping. Not sure why it doesn't simply stop:

stopping hbase........
wrong in secondary namenode?
hi hadoopers, just want to check how this looks in your hadoop settings. I am running Cloudera CDH3u0 and am really curious why the previous.checkpoint directory is not there. I haven't set fs.checkpoint.period or fs.checkpoint.size. From Tom White's book, ${fs.checkpoint.dir} should contain current and previous.checkpoint directories.

This is from hdfs-site.xml:

<property>
  <name>dfs.name.dir</name>
  <value>/hadoop/name,/hadoop/backup</value>
</property>
...
<property>
  <name>fs.checkpoint.dir</name>
  <value>/hadoop/namesecondary</value>
</property>

and here is from the secondary namenode:

[namesecondary]$ pwd
/hadoop/namesecondary
[namesecondary]$ ls
current  image  in_use.lock

hope this makes sense P
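For context, the two checkpoint knobs mentioned above are ordinary config properties that can be set alongside fs.checkpoint.dir. The values below are my understanding of the stock defaults in this era of Hadoop (checkpoint every hour, or when the edit log reaches 64 MB), not settings taken from this thread:

```xml
<!-- Assumed stock defaults: checkpoint hourly, or at 64 MB of edits. -->
<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value>
</property>
<property>
  <name>fs.checkpoint.size</name>
  <value>67108864</value>
</property>
```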
Re: Hbase with Hadoop
Hi Jignesh, Not clear what's going on with your ZK, but as a starting point: the hsync/flush feature in 205 was implemented with an on-off switch. Make sure you've turned it on by setting dfs.support.append to true in the hdfs-site.xml config file. Also, are you installing Hadoop with security turned on or off? I'll gather some other config info that should help. --Matt
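The switch Matt mentions is a plain hdfs-site.xml property; a minimal fragment would be:

```xml
<!-- Enable the hsync/flush support HBase relies on in 0.20.205 -->
<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>
```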
Re: Hbase with Hadoop
Hi Jignesh, I have been running quite a few hbase tests on Hadoop 0.20.205 without any issues, on both secure and non-secure clusters. I have seen the error you mention when one has not specified the hbase config directory. Can you please try: hbase --config <path to hbase config directory> shell and check whether that solves the problem? Thanks Ramya