Making Mumak work with capacity scheduler
Hi! I have set up Mumak and am able to run it in the terminal and in Eclipse. I have modified mapred-site.xml and capacity-scheduler.xml as necessary. I tried to apply the patch MAPREDUCE-1253-20100804.patch from https://issues.apache.org/jira/browse/MAPREDUCE-1253 as follows:

{HADOOP_HOME}/contrib/mumak$ patch -p0 < patch_file_location

but I get the error "3 out of 3 hunks FAILED".

Thanks,
Arun
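When every hunk fails, the patch was almost certainly generated against a different source tree. A quick way to confirm that before touching anything is a dry run; the sketch below assumes GNU patch, is run from the root of the tree being patched, and uses the patch file name from the JIRA issue (everything else is illustrative):

    cd ${HADOOP_HOME}                      # root of the source tree being patched
    # report which hunks would apply, without modifying any files
    patch -p0 --dry-run < MAPREDUCE-1253-20100804.patch
    # after a real attempt, rejected hunks are written to *.rej files for inspection
    patch -p0 < MAPREDUCE-1253-20100804.patch
    find . -name "*.rej"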
RE: RE: java.io.IOException: Incorrect data format
I just solved the problem by freeing more space on the related HD partitions. Thank you all for your help!

Wei

-----Original Message-----
From: Peng, Wei [mailto:wei.p...@xerox.com]
Sent: Tuesday, September 20, 2011 9:35 PM
To: common-user@hadoop.apache.org
Subject: RE: RE: java.io.IOException: Incorrect data format

After manually creating /state/partition2/hadoop/dfs/tmp, the datanode still could not be started, and weirdly the /state/partition2/hadoop/dfs/tmp directory is somehow removed again...

Wei

-----Original Message-----
From: Uma Maheswara Rao G 72686 [mailto:mahesw...@huawei.com]
Sent: Tuesday, September 20, 2011 9:30 PM
To: common-user@hadoop.apache.org
Subject: Re: RE: java.io.IOException: Incorrect data format

Are you able to create the directory manually on the DataNode machine?
# mkdir -p /state/partition2/hadoop/dfs/tmp

Regards,
Uma

- Original Message -
From: Peng, Wei wei.p...@xerox.com
Date: Wednesday, September 21, 2011 9:44 am
Subject: RE: java.io.IOException: Incorrect data format
To: common-user@hadoop.apache.org

I modified edits so that the hadoop namenode could be restarted; however, I could not start my datanode. The datanode log shows:

2011-09-20 21:07:10,068 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Mkdirs failed to create /state/partition2/hadoop/dfs/tmp
        at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.<init>(FSDataset.java:394)
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.<init>(FSDataset.java:894)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:318)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:232)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1363)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1318)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1326)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1448)

Wei

-----Original Message-----
From: Uma Maheswara Rao G 72686 [mailto:mahesw...@huawei.com]
Sent: Tuesday, September 20, 2011 9:10 PM
To: common-user@hadoop.apache.org
Subject: Re: java.io.IOException: Incorrect data format

Can you check what the command 'df -h' reports on the NN machine? I think one more possibility could be that the image was corrupted while it was being saved. To avoid such cases, this has already been handled in trunk. For more details, see https://issues.apache.org/jira/browse/HDFS-1594

Regards,
Uma

- Original Message -
From: Peng, Wei wei.p...@xerox.com
Date: Wednesday, September 21, 2011 9:01 am
Subject: java.io.IOException: Incorrect data format
To: common-user@hadoop.apache.org

I was not able to restart my name server because the name server ran out of space. Then I adjusted dfs.datanode.du.reserved to 0 and used tune2fs -m to get some space, but I still could not restart the name node. I got the following error:

java.io.IOException: Incorrect data format. logVersion is -18 but writables.length is 0.

Does anyone know how to resolve this issue?

Best,
Wei
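For anyone hitting the same state: the fixes discussed in this thread come down to freeing space on the partitions behind dfs.name.dir and dfs.data.dir, via df -h, dfs.datanode.du.reserved, and tune2fs -m. A rough sketch; the directory is the one from the thread, the device name is only an example:

    # how full is the partition backing the dfs directories?
    df -h /state/partition2/hadoop/dfs
    # reclaim the root-only reserved blocks on the affected ext2/3 partition
    # (device name is an example; use the device that df reports)
    tune2fs -m 0 /dev/sda3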
Re: Making Mumak work with capacity scheduler
Hello Arun,

Which code base are you trying to apply the patch to? The code bases should match for the patch to apply.

Regards,
Uma

- Original Message -
From: ArunKumar arunk...@gmail.com
Date: Wednesday, September 21, 2011 11:33 am
Subject: Making Mumak work with capacity scheduler
To: hadoop-u...@lucene.apache.org
Re: RE: RE: java.io.IOException: Incorrect data format
I would suggest you clean up some space and try again.

Regards,
Uma

- Original Message -
From: Peng, Wei wei.p...@xerox.com
Date: Wednesday, September 21, 2011 10:03 am
Subject: RE: RE: java.io.IOException: Incorrect data format
To: common-user@hadoop.apache.org

Yes, I can. The datanode is not able to start after crashing without enough HD space.

Wei
Re: Making Mumak work with capacity scheduler
Hi Uma!

I am applying the patch to Mumak in the hadoop-0.21 version.

Arun

On Wed, Sep 21, 2011 at 11:55 AM, Uma Maheswara Rao G [via Lucene] ml-node+s472066n3354652...@n3.nabble.com wrote:

Hello Arun,
Which code base are you trying to apply the patch to? The code bases should match for the patch to apply.
Regards,
Uma
Re: Making Mumak work with capacity scheduler
It looks like those patches are based on the 0.22 version, so you cannot apply them directly; you may need to merge them in logically (back-port them). One more point to note here: 0.21 is not a stable version of Hadoop. Presently the 0.20.x versions are stable.

Regards,
Uma

- Original Message -
From: ArunKumar arunk...@gmail.com
Date: Wednesday, September 21, 2011 12:01 pm
Subject: Re: Making Mumak work with capacity scheduler
To: hadoop-u...@lucene.apache.org

Hi Uma!
I am applying the patch to Mumak in the hadoop-0.21 version.
Arun
Re: Making Mumak work with capacity scheduler
Hi Uma!

Mumak is not part of the stable versions yet; it comes with Hadoop 0.21 onwards. Can you describe in detail what you mean by "You may need to merge them logically (back-port them)"? I don't get it.

Arun

On Wed, Sep 21, 2011 at 12:07 PM, Uma Maheswara Rao G [via Lucene] ml-node+s472066n3354668...@n3.nabble.com wrote:

It looks like those patches are based on the 0.22 version, so you cannot apply them directly; you may need to merge them in logically (back-port them). One more point to note here: 0.21 is not a stable version of Hadoop. Presently the 0.20.x versions are stable.
Regards,
Uma
Re: Making Mumak work with capacity scheduler
Hello Arun,

If you want to apply MAPREDUCE-1253 on the 0.21 version, applying the patch directly with the patch command may not work because of code base changes. So take the patch and apply the changed lines to your code base manually. I am not sure of any other way to do this. Did I understand your intention wrongly?

Regards,
Uma

- Original Message -
From: ArunKumar arunk...@gmail.com
Date: Wednesday, September 21, 2011 1:52 pm
Subject: Re: Making Mumak work with capacity scheduler
To: hadoop-u...@lucene.apache.org

Hi Uma!
Mumak is not part of the stable versions yet; it comes with Hadoop 0.21 onwards. Can you describe in detail what you mean by "You may need to merge them logically (back-port them)"? I don't get it.
Arun
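In other words, back-porting here means reading each hunk of the 0.22-based patch and making the equivalent change by hand in the 0.21 tree. The patch tool can still help locate where each hunk belongs; a rough sketch, assuming GNU patch and the patch file named in this thread:

    # list the files the patch touches
    grep '^+++' MAPREDUCE-1253-20100804.patch
    # retry with extra tolerance for shifted context lines
    patch -p0 --dry-run --fuzz=3 < MAPREDUCE-1253-20100804.patch
    # hunks that still fail end up in *.rej files; merge those changes by hand
    patch -p0 --fuzz=3 < MAPREDUCE-1253-20100804.patch
    find . -name '*.rej'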
Any other way to copy to HDFS ?
Guys,

As far as I know Hadoop, I think that to copy files to HDFS, they first need to be copied to the NameNode's local filesystem. Is that right?

So does it mean that even if I have a Hadoop cluster of 10 nodes with an overall capacity of 6 TB, but my NameNode's hard disk capacity is 500 GB, I cannot copy any file to HDFS greater than 500 GB?

Is there any other way to copy directly to HDFS without copying the file to the NameNode's local filesystem? What can be other ways to copy files larger than the NameNode's disk capacity?

Thanks,
Praveenesh.
Re: Any other way to copy to HDFS ?
Hi,

You need not copy the files to the NameNode. Hadoop provides client APIs to copy files as well. To copy files from another node (not part of the DFS), put the hadoop*.jar files on the classpath and use the code snippet below:

  FileSystem fs = new DistributedFileSystem();
  fs.initialize(NAMENODE_URI, configuration);
  fs.copyFromLocalFile(srcPath, dstPath);

Using this API, you can copy the files from any machine.

Regards,
Uma

- Original Message -
From: praveenesh kumar praveen...@gmail.com
Date: Wednesday, September 21, 2011 2:14 pm
Subject: Any other way to copy to HDFS ?
To: common-user@hadoop.apache.org
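A self-contained version of that snippet, using the equivalent FileSystem.get factory; the NameNode URI and paths below are placeholders, not values from the thread:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyToHdfs {
      public static void main(String[] args) throws Exception {
        // connect directly to the NameNode; this machine need not run any Hadoop daemon
        FileSystem fs = FileSystem.get(new URI("hdfs://namenode-host:9000/"), new Configuration());
        // data streams from this machine straight to the DataNodes,
        // never through the NameNode's local disk
        fs.copyFromLocalFile(new Path("/local/path/bigfile.dat"), new Path("/user/hadoop/bigfile.dat"));
        fs.close();
      }
    }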
Re: Any other way to copy to HDFS ?
For more understanding of the flows, I would recommend you go through the docs below once:
http://hadoop.apache.org/common/docs/r0.16.4/hdfs_design.html#The+File+System+Namespace

Regards,
Uma

- Original Message -
From: Uma Maheswara Rao G 72686 mahesw...@huawei.com
Date: Wednesday, September 21, 2011 2:36 pm
Subject: Re: Any other way to copy to HDFS ?
To: common-user@hadoop.apache.org
Re: Any other way to copy to HDFS ?
So I want to copy a file from a Windows machine to the Linux namenode. How can I define NAMENODE_URI in the code you mention, if I want to copy data from the Windows machine to the namenode machine?

Thanks,
Praveenesh

On Wed, Sep 21, 2011 at 2:37 PM, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote:

For more understanding of the flows, I would recommend you go through the docs below once:
http://hadoop.apache.org/common/docs/r0.16.4/hdfs_design.html#The+File+System+Namespace
Regards,
Uma
Re: Any other way to copy to HDFS ?
When you start the NameNode on the Linux machine, it listens on one address. You configure that address on the NameNode using fs.default.name. From the clients, you give this address to connect to your NameNode.

The initialize API takes a URI and a Configuration. Assume your NameNode is running at hdfs://10.18.52.63:9000. Then you can connect to your NameNode like below:

  FileSystem fs = new DistributedFileSystem();
  fs.initialize(new URI("hdfs://10.18.52.63:9000/"), new Configuration());

Please go through the docs mentioned earlier; you will get more understanding.

> if I want to copy data from windows machine to namenode machine ?

In DFS, the NameNode is responsible only for the namespace. In simple words, to understand the flow quickly: clients ask the NameNode for some DNs to copy the data to. The NN creates the file entry in the namespace and also returns the block entries based on the client's request. Then the clients connect directly to the DNs and copy the data. Reading the data back works the same way. I hope you understand better now :-)

Regards,
Uma

- Original Message -
From: praveenesh kumar praveen...@gmail.com
Date: Wednesday, September 21, 2011 3:11 pm
Subject: Re: Any other way to copy to HDFS ?
To: common-user@hadoop.apache.org
Fwd: Any other way to copy to HDFS ?
Thanks a lot. I am trying to run the following code on my Windows machine, which is not part of the cluster:

public static void main(String args[]) throws IOException, URISyntaxException {
    FileSystem fs = new DistributedFileSystem();
    fs.initialize(new URI("hdfs://162.192.100.53:54310/"), new Configuration());
    fs.copyFromLocalFile(new Path("C:\\Positive.txt"), new Path("/user/hadoop/Positive.txt"));
    System.out.println("Done");
}

But I am getting the following exception:

Exception in thread "main" org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=DrWho, access=WRITE, inode=hadoop:hadoop:supergroup:rwxr-xr-x
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:96)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:58)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2836)
        at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:500)
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:206)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:484)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:465)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:372)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:208)
        at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1189)
        at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1165)
        at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1137)
        at com.musigma.hdfs.HdfsBackup.main(HdfsBackup.java:20)
Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.security.AccessControlException: Permission denied: user=DrWho, access=WRITE, inode=hadoop:hadoop:supergroup:rwxr-xr-x
        at org.apache.hadoop.hdfs.server.namenode.PermissionChecker.check(PermissionChecker.java:176)
        at org.apache.hadoop.hdfs.server.namenode.PermissionChecker.check(PermissionChecker.java:157)
        at org.apache.hadoop.hdfs.server.namenode.PermissionChecker.checkPermission(PermissionChecker.java:105)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4702)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:4672)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1048)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1002)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:381)
        at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)
        at org.apache.hadoop.ipc.Client.call(Client.java:740)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy0.create(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy0.create(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2833)
        ... 10 more

As far as I know, the exception is coming because some user other than my hadoop user is trying to access HDFS. Does it mean I have to change permissions, or is there any other way to do it from Java code?

Thanks,
Praveenesh

---------- Forwarded message ----------
From: Uma Maheswara Rao G 72686 mahesw...@huawei.com
Date: Wed, Sep 21, 2011 at 3:27 PM
Subject: Re: Any other way to copy to HDFS ?
To: common-user@hadoop.apache.org
Re: risks of using Hadoop
On 20/09/11 22:52, Michael Segel wrote:
> PS... There's this junction box in your machine room that has this very large on/off switch. If pulled down, it will cut power to your cluster and you will lose everything. Now would you consider this a risk? Sure. But is it something you should really lose sleep over? Do you understand that there are risks and there are improbable risks?

We follow the @devops_borat Ops book and have a post-it note on the switch saying "not a light switch".
Re: Fwd: Any other way to copy to HDFS ?
Hello Praveenesh,

If you really do not care about permissions, you can disable permission checking on the NN side with the dfs.permissions property (set it to false in hdfs-site.xml). You can also set the permission for a path when creating it, or change it afterwards.

From the docs:

Changes to the File System API
All methods that use a path parameter will throw AccessControlException if permission checking fails.
New methods:
  public FSDataOutputStream create(Path f, FsPermission permission, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress) throws IOException;
  public boolean mkdirs(Path f, FsPermission permission) throws IOException;
  public void setPermission(Path p, FsPermission permission) throws IOException;
  public void setOwner(Path p, String username, String groupname) throws IOException;
  public FileStatus getFileStatus(Path f) throws IOException; will additionally return the user, group and mode associated with the path.

http://hadoop.apache.org/common/docs/r0.20.2/hdfs_permissions_guide.html

Regards,
Uma

- Original Message -
From: praveenesh kumar praveen...@gmail.com
Date: Wednesday, September 21, 2011 3:41 pm
Subject: Fwd: Any other way to copy to HDFS ?
To: common-user@hadoop.apache.org
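If you would rather keep permission checking on, the same API methods quoted above can be used from the client to create a path with an explicit mode or to relax an existing one, provided the call is made as a user that already has access (e.g. the HDFS superuser). A rough sketch; the URI, path and modes are placeholders:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class PermissionExample {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new URI("hdfs://namenode-host:9000/"), new Configuration());
        Path dir = new Path("/user/hadoop/uploads");
        // create the directory with rwxrwxr-x
        fs.mkdirs(dir, new FsPermission((short) 0775));
        // or relax the mode of an existing path so other users may write into it
        fs.setPermission(dir, new FsPermission((short) 0777));
        fs.close();
      }
    }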
Re: risks of using Hadoop
On Wed, 21 Sep 2011 11:21:01 +0100 Steve Loughran ste...@apache.org wrote:
> We follow the @devops_borat Ops book and have a post-it note on the switch saying "not a light switch".

:D
Re: Fwd: Any other way to copy to HDFS ?
Thanks a lot! I guess I can play around with the permissions of dfs for a while.

On Wed, Sep 21, 2011 at 3:59 PM, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote:

Hello Praveenesh,
If you really do not care about permissions, you can disable permission checking on the NN side with the dfs.permissions property (set it to false in hdfs-site.xml). You can also set the permission for a path when creating it, or change it afterwards.
http://hadoop.apache.org/common/docs/r0.20.2/hdfs_permissions_guide.html
Regards,
Uma
Re: risks of using Hadoop
On 21/09/11 11:30, Dieter Plaetinck wrote:
> :D

Also, we have a backup 4-port 1GbE Linksys router for when the main switch fails. The biggest issue these days is that since we switched the backplane to Ethernet over Powerline, a power outage leads to network partitioning even when the racks have UPS.

see also http://twitter.com/#!/DEVOPS_BORAT
Re: Fwd: Any other way to copy to HDFS ?
Praveenesh,

It should be understood, as a takeaway from this, that HDFS is a set of servers, like web servers are. You can send it a request, and you can expect a response. It is also an FS in the sense that it is designed to do FS-like operations (hold inodes, read/write data), but primarily it behaves like any other server would when you want to communicate with it. When you load files into it, the mechanism underneath is merely opening TCP socket connections to the server(s), writing packets through, and closing them down when done. Similarly when reading files back out. Of course the details are more complex than a simple, single TCP connection, but that's how it works.

Hope this helps you understand your Hadoop better ;-)

On Wed, Sep 21, 2011 at 4:29 PM, praveenesh kumar praveen...@gmail.com wrote:

Thanks a lot! I guess I can play around with the permissions of dfs for a while.
Can we run job on some datanodes ?
Is there any way that we can run a particular job in Hadoop on a subset of datanodes?

My problem is that I don't want to use all the nodes to run some job; I am trying to make a job-completion-time vs. number-of-nodes graph for a particular job. One way to do this is to remove datanodes and then see how much time the job takes. Just for curiosity's sake, I want to know whether there is any other way to do this without removing datanodes. I am afraid that if I remove datanodes, I can lose some data blocks that reside on those machines, as I have some files with replication = 1.

Thanks,
Praveenesh
Re: Can we run job on some datanodes ?
Praveenesh,

TaskTrackers run your jobs' tasks for you, not DataNodes directly. So you can statically control the load on nodes by removing TaskTrackers from your cluster; i.e., if you "service hadoop-0.20-tasktracker stop" or "hadoop-daemon.sh stop tasktracker" on the specific nodes, jobs won't run there anymore. Is this what you're looking for?

(There are ways to achieve the exclusion dynamically, by writing a scheduler, but it is hard to tell without knowing what you need specifically, and why you require it.)

--
Harsh J
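For example, on each node to be taken out of the compute pool; this assumes a tarball-style install where hadoop-daemon.sh is on the PATH (the packaged "service" variant mentioned above behaves the same way):

    # take the node out of the compute pool
    hadoop-daemon.sh stop tasktracker
    # ...run the benchmark job from the JobTracker node...
    # then bring the node back
    hadoop-daemon.sh start tasktracker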
RE: risks of using Hadoop
I am truly sorry if at some point in your life someone dropped an IBM logo on your head and it left a dent - but you are being a jerk.

Right after you were engaging in your usual condescension, a person from Xerox posted on the very issue you were blowing off. Things happen. To any system.

I'm not knocking Hadoop - and frankly, making sure new users have a good experience based on the real things they need to be aware of / manage is in everyone's interest here to grow the footprint. Please take note that nowhere in here have I ever said anything to discourage Hadoop deployments/use, or anything that is vendor specific.

Tom Deutsch
Program Director
CTO Office: Information Management
Hadoop Product Manager / Customer Exec
IBM
3565 Harbor Blvd
Costa Mesa, CA 92626-1420
tdeut...@us.ibm.com

Michael Segel michael_se...@hotmail.com
09/20/2011 02:52 PM
Please respond to common-user@hadoop.apache.org
To: common-user@hadoop.apache.org
Subject: RE: risks of using Hadoop

Tom,

I think it is arrogant to parrot FUD when you've never gotten your hands dirty in any real Hadoop environment. So how could your response reflect the operational realities of running a Hadoop cluster?

What Brian was saying was that the SPOF is an overplayed FUD trump card. Anyone who's built clusters will have mitigated the risks of losing the NN. Then there's MapR... where you don't have a SPOF. But again, that's a derivative of Apache Hadoop. (Derivative isn't a bad thing...)

You're right that you need to plan accordingly; however, from a risk perspective, this isn't a risk. In fact, I believe Tom White's book has a good layout to mitigate this, and while I have the first edition, I'll have to double-check the second edition to see if he modified it.

Again, the point Brian was making, and one that I agree with, is that the NN as a SPOF is an overblown 'risk'. You have a greater chance of data loss than you do of losing your NN. Probably the reason why some of us are a bit irritated by the SPOF reference to the NN is that it's clowns who haven't done any work in this space who pick up on the FUD and spread it around. This makes it difficult for guys like me to get anything done, because we constantly have to go back and reassure stakeholders that it's a non-issue.

With respect to naming vendors, I did name MapR outside of Apache because they do have their own derivative release that improves upon the limitations found in Apache's Hadoop.

-Mike

PS... There's this junction box in your machine room that has this very large on/off switch. If pulled down, it will cut power to your cluster and you will lose everything. Now would you consider this a risk? Sure. But is it something you should really lose sleep over? Do you understand that there are risks and there are improbable risks?

To: common-user@hadoop.apache.org
Subject: RE: risks of using Hadoop
From: tdeut...@us.ibm.com
Date: Tue, 20 Sep 2011 12:48:05 -0700

No worries Michael - it would be a stretch to see any arrogance or disrespect in your response.

Kobina has asked a fair question, and deserves a response that reflects the operational realities of where we are. If you are looking at doing large-scale CDR handling - which I believe is the use case here - you need to plan accordingly. Even you use the term mitigate - which is different than prevent. Kobina needs an understanding of what they are looking at. That isn't a pro/con stance on Hadoop; it is just reality, and they should plan accordingly.

(Note - I'm not the one who brought vendors into this - which doesn't strike me as appropriate for this list.)

Tom Deutsch
Program Director
CTO Office: Information Management
Hadoop Product Manager / Customer Exec
IBM
3565 Harbor Blvd
Costa Mesa, CA 92626-1420
tdeut...@us.ibm.com

Michael Segel michael_se...@hotmail.com
09/17/2011 07:37 PM
Please respond to common-user@hadoop.apache.org
To: common-user@hadoop.apache.org
Subject: RE: risks of using Hadoop

Gee Tom,

No disrespect, but I don't believe you have any personal practical experience in designing and building out clusters or putting them to the test. Now to the points that Brian raised...

1) SPOF... it sounds great on paper. Some FUD to scare someone away from Hadoop. But in reality... you can mitigate your risks by setting up RAID on your NN/HM node. You can also NFS mount a copy to your SN (or whatever they're calling it these days...). Or you can go to MapR, which has redesigned HDFS in a way that removes this problem. But with your Apache Hadoop or Cloudera's release, losing your NN is rare. Yes it can happen, but it is not your greatest risk. (Not by a long shot.)

2) Data loss. You can mitigate this as well. Do I need to go through all of the options and DR/BCP planning? Sure there's always a chance that you
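The metadata redundancy Michael describes - keeping a second copy of the NameNode image and edits, e.g. on an NFS mount - is normally configured by listing more than one name directory. A minimal hdfs-site.xml sketch; the paths below are illustrative:

    <!-- hdfs-site.xml on the NameNode: write the fsimage/edits to a local disk
         and to an NFS mount, so losing the NN host does not lose the metadata -->
    <property>
      <name>dfs.name.dir</name>
      <value>/data/1/dfs/nn,/mnt/nfs/dfs/nn</value>
    </property>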
Re: Can we run job on some datanodes ?
Oh wow, I didn't know that. Actually, for me the datanodes/tasktrackers are running on the same machines. I mentioned datanodes because if I delete those machines from the masters list, chances are the data will also be lost, so I don't want to do that. But now I guess that by stopping tasktrackers individually, I can decrease the strength of my cluster by decreasing the number of nodes that run a tasktracker, right? This way I won't lose my data either, right?

On Wed, Sep 21, 2011 at 6:39 PM, Harsh J ha...@cloudera.com wrote:

TaskTrackers run your jobs' tasks for you, not DataNodes directly. So you can statically control the load on nodes by removing TaskTrackers from your cluster; i.e., if you "service hadoop-0.20-tasktracker stop" or "hadoop-daemon.sh stop tasktracker" on the specific nodes, jobs won't run there anymore. Is this what you're looking for?
Re: Can we run job on some datanodes ?
Praveenesh, Absolutely right. Just stop them individually :) On Wed, Sep 21, 2011 at 6:53 PM, praveenesh kumar praveen...@gmail.com wrote: Oh wow.. I didn't know that.. Actually for me datanodes/tasktrackers are running on same machines. I mention datanodes because if I delete those machines from masters list, chances are the data will also loose. So I don't want to do that.. but now I guess by stoping tasktrackers individually... I can decrease the strength of my cluster by decreasing the number of nodes that will run tasktracker .. right ?? This way I won't loose my data also.. Right ?? On Wed, Sep 21, 2011 at 6:39 PM, Harsh J ha...@cloudera.com wrote: Praveenesh, TaskTrackers run your jobs' tasks for you, not DataNodes directly. So you can statically control loads on nodes by removing away TaskTrackers from your cluster. i.e, if you service hadoop-0.20-tasktracker stop or hadoop-daemon.sh stop tasktracker on the specific nodes, jobs won't run there anymore. Is this what you're looking for? (There are ways to achieve the exclusion dynamically, by writing a scheduler, but hard to tell without knowing what you need specifically, and why do you require it?) On Wed, Sep 21, 2011 at 6:32 PM, praveenesh kumar praveen...@gmail.com wrote: Is there any way that we can run a particular job in a hadoop on subset of datanodes ? My problem is I don't want to use all the nodes to run some job, I am trying to make Job completion Vs No. of nodes graph for a particular job. One way to do is I can remove datanodes, and then see how much time the job is taking. Just for curiosity sake, want to know is there any other way possible to do this, without removing datanodes. I am afraid, if I remove datanodes, I can loose some data blocks that reside on those machines as I have some files with replication = 1 ? Thanks, Praveenesh -- Harsh J -- Harsh J
Problem with MR job
Hi all, We are trying to run a mahout job in a hadoop cluster, but we keep getting the same status. The job passes the initial mahout stages and when it comes to be executed as a MR job, it seems to be stuck at 0% progress. Through the UI we see that it is submitted but not running. After a while it gets killed. In the logs the error shown is this one: 2011-09-21 07:47:50,507 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: hdfs://master/var/lib/hadoop-0.20/cache/hdfs/mapred/system org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /var/lib/hadoop-0.20/cache/hdfs/mapred/system. Name nod$ The reported blocks 0 needs additional 12 blocks to reach the threshold 0.9990 of total blocks 13. Safe mode will be turned off automatically. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:1966) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:1940) at org.apache.hadoop.hdfs.server.namenode.NameNode.mkdirs(NameNode.java:770) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) Some staging files seem to have been created however. I was thinking of sending this to the mahout mailing list but it seems a more core hadoop issue. We are using the following command to launch the mahout example: ./mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job --input hdfs://master/user/hdfs/testdata/synthetic_control.data --output hdfs://master/user/hdfs/testdata/output --t1 0.5 --t2 1 --maxIter 50 Any clues? George -- --- George Kousiouris Electrical and Computer Engineer Division of Communications, Electronics and Information Engineering School of Electrical and Computer Engineering Tel: +30 210 772 2546 Mobile: +30 6939354121 Fax: +30 210 772 2569 Email: gkous...@mail.ntua.gr Site: http://users.ntua.gr/gkousiou/ National Technical University of Athens 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece
Re: Problem with MR job
Hello George, Have you looked at your DFS health page (http://NN:50070/)? I believe you have missing or fallen DataNode instances. I'd start them back up, after checking their (DataNode's) logs to figure out why they died. On Wed, Sep 21, 2011 at 7:28 PM, George Kousiouris gkous...@mail.ntua.gr wrote: Hi all, We are trying to run a mahout job in a hadoop cluster, but we keep getting the same status. The job passes the initial mahout stages and when it comes to be executed as a MR job, it seems to be stuck at 0% progress. Through the UI we see that it is submitted but not running. After a while it gets killed. In the logs the error shown is this one: 2011-09-21 07:47:50,507 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: hdfs://master/var/lib/hadoop-0.20/cache/hdfs/mapred/system org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /var/lib/hadoop-0.20/cache/hdfs/mapred/system. Name nod$ The reported blocks 0 needs additional 12 blocks to reach the threshold 0.9990 of total blocks 13. Safe mode will be turned off automatically. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:1966) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:1940) at org.apache.hadoop.hdfs.server.namenode.NameNode.mkdirs(NameNode.java:770) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) Some staging files seem to have been created however. I was thinking of sending this to the mahout mailing list but it seems a more core hadoop issue. We are using the following command to launch the mahout example: ./mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job --input hdfs://master/user/hdfs/testdata/synthetic_control.data --output hdfs://master/user/hdfs/testdata/output --t1 0.5 --t2 1 --maxIter 50 Any clues? George -- --- George Kousiouris Electrical and Computer Engineer Division of Communications, Electronics and Information Engineering School of Electrical and Computer Engineering Tel: +30 210 772 2546 Mobile: +30 6939354121 Fax: +30 210 772 2569 Email: gkous...@mail.ntua.gr Site: http://users.ntua.gr/gkousiou/ National Technical University of Athens 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece -- Harsh J
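Harsh's suggestion above, checking whether the DataNodes the NameNode expects are actually alive, can also be done from a small client program instead of the web UI. The sketch below is a minimal example against the 0.20-era Java client; it assumes the cluster's configuration directory is on the classpath so fs.default.name points at HDFS, and the exact DatanodeInfo accessors may differ slightly between releases.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

    public class ListDataNodes {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        if (!(fs instanceof DistributedFileSystem)) {
          System.err.println("fs.default.name does not point at HDFS");
          return;
        }
        DistributedFileSystem dfs = (DistributedFileSystem) fs;
        // Roughly the same information dfsadmin -report prints:
        // every DataNode the NameNode currently knows about.
        for (DatanodeInfo dn : dfs.getDataNodeStats()) {
          long secsSinceHeartbeat = (System.currentTimeMillis() - dn.getLastUpdate()) / 1000;
          System.out.println(dn.getName()
              + "  used=" + dn.getDfsUsed()
              + "  remaining=" + dn.getRemaining()
              + "  lastContact=" + secsSinceHeartbeat + "s ago");
        }
      }
    }

A node whose last contact keeps growing is effectively dead from the NameNode's point of view, which matches what the :50070 page would show.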
Re: Problem with MR job
Hi, Any cluster restart happend? ..is your NameNode detecting DataNodes as live? Looks DNs did not report anyblocks to NN yet. You have 13 blocks persisted in NameNode namespace. At least 12 blocks should be reported from your DNs. Other wise automatically it will not come out of safemode. Regards, Uma - Original Message - From: George Kousiouris gkous...@mail.ntua.gr Date: Wednesday, September 21, 2011 7:29 pm Subject: Problem with MR job To: common-user@hadoop.apache.org common-user@hadoop.apache.org Hi all, We are trying to run a mahout job in a hadoop cluster, but we keep getting the same status. The job passes the initial mahout stages and when it comes to be executed as a MR job, it seems to be stuck at 0% progress. Through the UI we see that it is submitted but not running. After a while it gets killed. In the logs the error shown is this one: 2011-09-21 07:47:50,507 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: hdfs://master/var/lib/hadoop-0.20/cache/hdfs/mapred/system org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /var/lib/hadoop-0.20/cache/hdfs/mapred/system. Name nod$ The reported blocks 0 needs additional 12 blocks to reach the threshold 0.9990 of total blocks 13. Safe mode will be turned off automatically. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:1966) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:1940) at org.apache.hadoop.hdfs.server.namenode.NameNode.mkdirs(NameNode.java:770) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) Some staging files seem to have been created however. I was thinking of sending this to the mahout mailing list but it seems a more core hadoop issue. We are using the following command to launch the mahout example: ./mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job --input hdfs://master/user/hdfs/testdata/synthetic_control.data -- output hdfs://master/user/hdfs/testdata/output --t1 0.5 --t2 1 --maxIter 50 Any clues? George -- --- George Kousiouris Electrical and Computer Engineer Division of Communications, Electronics and Information Engineering School of Electrical and Computer Engineering Tel: +30 210 772 2546 Mobile: +30 6939354121 Fax: +30 210 772 2569 Email: gkous...@mail.ntua.gr Site: http://users.ntua.gr/gkousiou/ National Technical University of Athens 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece
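The SafeModeException in George's log and the block-count explanation above can also be confirmed programmatically. Below is a minimal sketch against the 0.20-era DistributedFileSystem API (the dfsadmin -safemode command wraps the same call); the FSConstants class and the SafeModeAction constants moved in later releases, so treat the exact imports as assumptions. Forcing the NameNode out of safe mode with SAFEMODE_LEAVE only makes sense once you know why the blocks are not being reported.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.FSConstants;

    public class SafeModeCheck {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumes fs.default.name points at HDFS, not the local file system.
        DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
        // SAFEMODE_GET only queries the state; it does not change it.
        boolean inSafeMode = dfs.setSafeMode(FSConstants.SafeModeAction.SAFEMODE_GET);
        System.out.println("NameNode in safe mode: " + inSafeMode);
        // Once the missing DataNodes are back and their blocks are reported, the
        // NameNode leaves safe mode on its own; SAFEMODE_LEAVE would force it out now.
        // dfs.setSafeMode(FSConstants.SafeModeAction.SAFEMODE_LEAVE);
      }
    }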
Re: Problem with MR job
Hi, The status seems healthy and the datanodes live: Status: HEALTHY Total size:118805326 B Total dirs:31 Total files:38 Total blocks (validated):38 (avg. block size 3126455 B) Minimally replicated blocks:38 (100.0 %) Over-replicated blocks:0 (0.0 %) Under-replicated blocks:9 (23.68421 %) Mis-replicated blocks:0 (0.0 %) Default replication factor:1 Average block replication:1.2368422 Corrupt blocks:0 Missing replicas:72 (153.19148 %) Number of data-nodes:2 Number of racks:1 FSCK ended at Wed Sep 21 10:06:17 EDT 2011 in 9 milliseconds The filesystem under path '/' is HEALTHY The jps command has the following output: hdfs@master:~$ jps 24292 SecondaryNameNode 30010 Jps 24109 DataNode 23962 NameNode Shouldn't this have two datanode listings? In our system, one of the datanodes and the namenode is the same machine, but i seem to remember that in the past even with this setup two datanode listings appeared in the jps output. Thanks, George On 9/21/2011 5:08 PM, Uma Maheswara Rao G 72686 wrote: Hi, Any cluster restart happend? ..is your NameNode detecting DataNodes as live? Looks DNs did not report anyblocks to NN yet. You have 13 blocks persisted in NameNode namespace. At least 12 blocks should be reported from your DNs. Other wise automatically it will not come out of safemode. Regards, Uma - Original Message - From: George Kousiourisgkous...@mail.ntua.gr Date: Wednesday, September 21, 2011 7:29 pm Subject: Problem with MR job To: common-user@hadoop.apache.orgcommon-user@hadoop.apache.org Hi all, We are trying to run a mahout job in a hadoop cluster, but we keep getting the same status. The job passes the initial mahout stages and when it comes to be executed as a MR job, it seems to be stuck at 0% progress. Through the UI we see that it is submitted but not running. After a while it gets killed. In the logs the error shown is this one: 2011-09-21 07:47:50,507 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: hdfs://master/var/lib/hadoop-0.20/cache/hdfs/mapred/system org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /var/lib/hadoop-0.20/cache/hdfs/mapred/system. Name nod$ The reported blocks 0 needs additional 12 blocks to reach the threshold 0.9990 of total blocks 13. Safe mode will be turned off automatically. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:1966) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:1940) at org.apache.hadoop.hdfs.server.namenode.NameNode.mkdirs(NameNode.java:770) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) Some staging files seem to have been created however. I was thinking of sending this to the mahout mailing list but it seems a more core hadoop issue. We are using the following command to launch the mahout example: ./mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job --input hdfs://master/user/hdfs/testdata/synthetic_control.data -- output hdfs://master/user/hdfs/testdata/output --t1 0.5 --t2 1 --maxIter 50 Any clues? 
George -- --- George Kousiouris Electrical and Computer Engineer Division of Communications, Electronics and Information Engineering School of Electrical and Computer Engineering Tel: +30 210 772 2546 Mobile: +30 6939354121 Fax: +30 210 772 2569 Email: gkous...@mail.ntua.gr Site: http://users.ntua.gr/gkousiou/ National Technical University of Athens 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece -- --- George Kousiouris Electrical and Computer Engineer Division of Communications, Electronics and Information Engineering School of Electrical and Computer Engineering Tel: +30 210 772 2546 Mobile: +30 6939354121 Fax: +30 210 772 2569 Email: gkous...@mail.ntua.gr Site: http://users.ntua.gr/gkousiou/ National Technical University of Athens 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece
Re: Problem with MR job
Hi, Some more logs, specifically from the JobTracker: 2011-09-21 10:22:43,482 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201109211018_0001 2011-09-21 10:22:43,538 ERROR org.apache.hadoop.mapred.JobHistory: Failed creating job history log file for job job_201109211018_0001 java.io.FileNotFoundException: /usr/lib/hadoop-0.20/logs/history/master_1316614721548_job_201109211018_0001_hdfs_Input+Driver+running+over+input%3A+hdfs%3A%2F%2Fmaster%2Fuse (P$ at java.io.FileOutputStream.open(Native Method) at java.io.FileOutputStream.init(FileOutputStream.java:179) at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.init(RawLocalFileSystem.java:189) at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.init(RawLocalFileSystem.java:185) at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:243) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.init(ChecksumFileSystem.java:336) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:369) at org.apache.hadoop.mapred.JobHistory$JobInfo.logSubmitted(JobHistory.java:1223) at org.apache.hadoop.mapred.JobInProgress$3.run(JobInProgress.java:681) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115) at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:678) at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:4013) at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 2011-09-21 10:22:43,666 ERROR org.apache.hadoop.mapred.JobHistory: Failed to store job conf in the log dir java.io.FileNotFoundException: /usr/lib/hadoop-0.20/logs/history/master_1316614721548_job_201109211018_0001_conf.xml (Permission denied) at java.io.FileOutputStream.open(Native Method) at java.io.FileOutputStream.init(FileOutputStream.java:179) at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.init(RawLocalFileSystem.java:189) at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.init(RawLocalFileSystem.java:185) at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:243) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.init(ChecksumFileSystem.java:336) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:369) On 9/21/2011 5:15 PM, George Kousiouris wrote: Hi, The status seems healthy and the datanodes live: Status: HEALTHY Total size:118805326 B Total dirs:31 Total files:38 Total blocks (validated):38 (avg. block size 3126455 B) Minimally replicated blocks:38 (100.0 %) Over-replicated blocks:0 (0.0 %) Under-replicated blocks:9 (23.68421 %) Mis-replicated blocks:0 (0.0 %) Default replication factor:1 Average block replication:1.2368422 Corrupt blocks:0 Missing replicas:72 (153.19148 %) Number of data-nodes:2 Number of racks:1 FSCK ended at Wed Sep 21 10:06:17 EDT 2011 in 9 milliseconds The filesystem under path '/' is HEALTHY The jps command has the following output: hdfs@master:~$ jps 24292 SecondaryNameNode 30010 Jps 24109 DataNode 23962 NameNode Shouldn't this have two datanode listings? 
In our system, one of the datanodes and the namenode is the same machine, but i seem to remember that in the past even with this setup two datanode listings appeared in the jps output. Thanks, George On 9/21/2011 5:08 PM, Uma Maheswara Rao G 72686 wrote: Hi, Any cluster restart happend? ..is your NameNode detecting DataNodes as live? Looks DNs did not report anyblocks to NN yet. You have 13 blocks persisted in NameNode namespace. At least 12 blocks should be reported from your DNs. Other wise automatically it will not come out of safemode. Regards, Uma - Original Message - From: George Kousiourisgkous...@mail.ntua.gr Date: Wednesday, September 21, 2011 7:29 pm Subject: Problem with MR job To: common-user@hadoop.apache.orgcommon-user@hadoop.apache.org Hi all, We are trying to run a mahout job in a hadoop cluster, but we keep getting the same status. The job passes the initial mahout stages and when it comes to be executed as a MR job, it seems to be stuck at 0% progress. Through the UI we see that it is submitted but not running. After a while it gets killed. In the logs the error shown is this one: 2011-09-21 07:47:50,507 INFO
Re: Can we run job on some datanodes ?
Praveen, If you are doing performance measurements be aware that having more datanodes then tasktrackers will impact the performance as well (Don't really know for sure how). It will not be the same performance as running on a cluster with just fewer nodes over all. Also if you do shut off datanodes as well as task trackers you will need to give the cluster a while for re-replication to finish before you try to run your performance numbers. --Bobby Evans On 9/21/11 8:27 AM, Harsh J ha...@cloudera.com wrote: Praveenesh, Absolutely right. Just stop them individually :) On Wed, Sep 21, 2011 at 6:53 PM, praveenesh kumar praveen...@gmail.com wrote: Oh wow.. I didn't know that.. Actually for me datanodes/tasktrackers are running on same machines. I mention datanodes because if I delete those machines from masters list, chances are the data will also loose. So I don't want to do that.. but now I guess by stoping tasktrackers individually... I can decrease the strength of my cluster by decreasing the number of nodes that will run tasktracker .. right ?? This way I won't loose my data also.. Right ?? On Wed, Sep 21, 2011 at 6:39 PM, Harsh J ha...@cloudera.com wrote: Praveenesh, TaskTrackers run your jobs' tasks for you, not DataNodes directly. So you can statically control loads on nodes by removing away TaskTrackers from your cluster. i.e, if you service hadoop-0.20-tasktracker stop or hadoop-daemon.sh stop tasktracker on the specific nodes, jobs won't run there anymore. Is this what you're looking for? (There are ways to achieve the exclusion dynamically, by writing a scheduler, but hard to tell without knowing what you need specifically, and why do you require it?) On Wed, Sep 21, 2011 at 6:32 PM, praveenesh kumar praveen...@gmail.com wrote: Is there any way that we can run a particular job in a hadoop on subset of datanodes ? My problem is I don't want to use all the nodes to run some job, I am trying to make Job completion Vs No. of nodes graph for a particular job. One way to do is I can remove datanodes, and then see how much time the job is taking. Just for curiosity sake, want to know is there any other way possible to do this, without removing datanodes. I am afraid, if I remove datanodes, I can loose some data blocks that reside on those machines as I have some files with replication = 1 ? Thanks, Praveenesh -- Harsh J -- Harsh J
Re: Using HBase for real time transaction
On Sep 20, 2011, at 10:06 PM, Jean-Daniel Cryans wrote: I think there has to be some clarification. The OP was asking about a MySQL replacement. HBase will never be an RDBMS replacement. No transactions means no way of doing OLTP. It's the wrong tool for that type of work. Agreed, if you are looking to handle relational data in a relational fashion, it might be better to look elsewhere. I am not looking for a relational database, but at creating a multi-tenant database. At this time I am not sure whether it needs transactions, or even whether that kind of architecture can support transactions. Recognize what HBase is and what it is not. Not sure what you're referring to here. This doesn't mean you can't take in or deliver data in real time, it can. So if you want to use it in a real time manner, sure. Note that, as with other databases, you will have to do some work to handle real time data. I guess you would have to provide a specific use case on what you want to achieve in order to know if it's a good fit. He says: Hope the above line explains what I am interested in (a multi-tenant database). The requirement is to have real time read and write operations. I mean that as soon as data is written, the user should see the data (here the data would be written to HBase). Row mutations in HBase are seen by the user as soon as they are done, atomicity is guaranteed at the row level, which seems to satisfy his requirement. If multi-row transactions are needed then I agree HBase might not be what he wants. Can't we handle transactions through the application or the container, before the data even goes to HBase? And I have one more doubt: how to handle low read latency? -Jignesh
Re: Problem with MR job
Can you check your DN data directories once, whether the blocks present or not? Can you give the DN and NN logs. Please put them in some site and share the link here. Regards, Uma - Original Message - From: George Kousiouris gkous...@mail.ntua.gr Date: Wednesday, September 21, 2011 8:06 pm Subject: Re: Problem with MR job To: common-user@hadoop.apache.org Cc: Uma Maheswara Rao G 72686 mahesw...@huawei.com Hi, Some more logs, specifically from the JobTracker: 2011-09-21 10:22:43,482 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201109211018_0001 2011-09-21 10:22:43,538 ERROR org.apache.hadoop.mapred.JobHistory: Failed creating job history log file for job job_201109211018_0001 java.io.FileNotFoundException: /usr/lib/hadoop- 0.20/logs/history/master_1316614721548_job_201109211018_0001_hdfs_Input+Driver+running+over+input%3A+hdfs%3A%2F%2Fmaster%2Fuse (P$ at java.io.FileOutputStream.open(Native Method) at java.io.FileOutputStream.init(FileOutputStream.java:179) at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.init(RawLocalFileSystem.java:189) at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.init(RawLocalFileSystem.java:185) at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:243) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.init(ChecksumFileSystem.java:336) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:369) at org.apache.hadoop.mapred.JobHistory$JobInfo.logSubmitted(JobHistory.java:1223) at org.apache.hadoop.mapred.JobInProgress$3.run(JobInProgress.java:681) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115) at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:678) at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:4013) at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 2011-09-21 10:22:43,666 ERROR org.apache.hadoop.mapred.JobHistory: Failed to store job conf in the log dir java.io.FileNotFoundException: /usr/lib/hadoop- 0.20/logs/history/master_1316614721548_job_201109211018_0001_conf.xml (Permission denied) at java.io.FileOutputStream.open(Native Method) at java.io.FileOutputStream.init(FileOutputStream.java:179) at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.init(RawLocalFileSystem.java:189) at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.init(RawLocalFileSystem.java:185) at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:243) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.init(ChecksumFileSystem.java:336) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:369) On 9/21/2011 5:15 PM, George Kousiouris wrote: Hi, The status seems healthy and the datanodes live: Status: HEALTHY Total size:118805326 B Total dirs:31 Total files:38 Total blocks (validated):38 (avg. 
block size 3126455 B) Minimally replicated blocks:38 (100.0 %) Over-replicated blocks:0 (0.0 %) Under-replicated blocks:9 (23.68421 %) Mis-replicated blocks:0 (0.0 %) Default replication factor:1 Average block replication:1.2368422 Corrupt blocks:0 Missing replicas:72 (153.19148 %) Number of data-nodes:2 Number of racks:1 FSCK ended at Wed Sep 21 10:06:17 EDT 2011 in 9 milliseconds The filesystem under path '/' is HEALTHY The jps command has the following output: hdfs@master:~$ jps 24292 SecondaryNameNode 30010 Jps 24109 DataNode 23962 NameNode Shouldn't this have two datanode listings? In our system, one of the datanodes and the namenode is the same machine, but i seem to remember that in the past even with this setup two datanode listings appeared in the jps output. Thanks, George On 9/21/2011 5:08 PM, Uma Maheswara Rao G 72686 wrote: Hi, Any cluster restart happend? ..is your NameNode detecting DataNodes as live? Looks DNs did not report anyblocks to NN yet. You have 13 blocks persisted in NameNode namespace. At least 12 blocks should be reported from your DNs. Other wise automatically it will not come out of safemode. Regards, Uma - Original
Re: risks of using Hadoop
Jignesh, Will your point 2 still be valid if we hire very experienced Java programmers? Kobina. On 20 September 2011 21:07, Jignesh Patel jign...@websoft.com wrote: @Kobina 1. Lack of skill set 2. Longer learning curve 3. Single point of failure @Uma I am curious to know about .20.2 is that stable? Is it same as the one you mention in your email(Federation changes), If I need scaled nameNode and append support, which version I should choose. Regarding Single point of failure, I believe Hortonworks(a.k.a Yahoo) is updating the Hadoop API. When that will be integrated with Hadoop. If I need -Jignesh On Sep 17, 2011, at 12:08 AM, Uma Maheswara Rao G 72686 wrote: Hi Kobina, Some experiences which may helpful for you with respective to DFS. 1. Selecting the correct version. I will recommend to use 0.20X version. This is pretty stable version and all other organizations prefers it. Well tested as well. Dont go for 21 version.This version is not a stable version.This is risk. 2. You should perform thorough test with your customer operations. (of-course you will do this :-)) 3. 0.20x version has the problem of SPOF. If NameNode goes down you will loose the data.One way of recovering is by using the secondaryNameNode.You can recover the data till last checkpoint.But here manual intervention is required. In latest trunk SPOF will be addressed bu HDFS-1623. 4. 0.20x NameNodes can not scale. Federation changes included in latest versions. ( i think in 22). this may not be the problem for your cluster. But please consider this aspect as well. 5. Please select the hadoop version depending on your security requirements. There are versions available for security as well in 0.20X. 6. If you plan to use Hbase, it requires append support. 20Append has the support for append. 0.20.205 release also will have append support but not yet released. Choose your correct version to avoid sudden surprises. Regards, Uma - Original Message - From: Kobina Kwarko kobina.kwa...@gmail.com Date: Saturday, September 17, 2011 3:42 am Subject: Re: risks of using Hadoop To: common-user@hadoop.apache.org We are planning to use Hadoop in my organisation for quality of servicesanalysis out of CDR records from mobile operators. We are thinking of having a small cluster of may be 10 - 15 nodes and I'm preparing the proposal. my office requires that i provide some risk analysis in the proposal. thank you. On 16 September 2011 20:34, Uma Maheswara Rao G 72686 mahesw...@huawei.comwrote: Hello, First of all where you are planning to use Hadoop? Regards, Uma - Original Message - From: Kobina Kwarko kobina.kwa...@gmail.com Date: Saturday, September 17, 2011 0:41 am Subject: risks of using Hadoop To: common-user common-user@hadoop.apache.org Hello, Please can someone point some of the risks we may incur if we decide to implement Hadoop? BR, Isaac.
Re: risks of using Hadoop
Jignesh, Please see my comments inline. - Original Message - From: Kobina Kwarko kobina.kwa...@gmail.com Date: Wednesday, September 21, 2011 9:33 pm Subject: Re: risks of using Hadoop To: common-user@hadoop.apache.org Jignesh, Will your point 2 still be valid if we hire very experienced Java programmers? Kobina. On 20 September 2011 21:07, Jignesh Patel jign...@websoft.com wrote: @Kobina 1. Lack of skill set 2. Longer learning curve 3. Single point of failure @Uma I am curious to know about .20.2 is that stable? Is it same as the one you mention in your email(Federation changes), If I need scaled nameNode and append support, which version I should choose. Regarding Single point of failure, I believe Hortonworks(a.k.a Yahoo) is updating the Hadoop API. When that will be integrated with Hadoop. If I need Yes, 0.20 versions are stable. Federation changes will not be available in 0.20 versions. I think Fedaration changes has been merged to 0.23 branch. So, from 0.23 onwards you can get Fedaration implementaion. But there is no release happend for 0.23 branch yet. Regarding NameNode High Availability, there is one issue HDFS-1623 to build.(Inprogress)This may take couple of months to integrate. -Jignesh On Sep 17, 2011, at 12:08 AM, Uma Maheswara Rao G 72686 wrote: Hi Kobina, Some experiences which may helpful for you with respective to DFS. 1. Selecting the correct version. I will recommend to use 0.20X version. This is pretty stable version and all other organizations prefers it. Well tested as well. Dont go for 21 version.This version is not a stable version.This is risk. 2. You should perform thorough test with your customer operations. (of-course you will do this :-)) 3. 0.20x version has the problem of SPOF. If NameNode goes down you will loose the data.One way of recovering is by using the secondaryNameNode.You can recover the data till last checkpoint.But here manual intervention is required. In latest trunk SPOF will be addressed bu HDFS-1623. 4. 0.20x NameNodes can not scale. Federation changes included in latest versions. ( i think in 22). this may not be the problem for your cluster. But please consider this aspect as well. 5. Please select the hadoop version depending on your security requirements. There are versions available for security as well in 0.20X. 6. If you plan to use Hbase, it requires append support. 20Append has the support for append. 0.20.205 release also will have append support but not yet released. Choose your correct version to avoid sudden surprises. Regards, Uma - Original Message - From: Kobina Kwarko kobina.kwa...@gmail.com Date: Saturday, September 17, 2011 3:42 am Subject: Re: risks of using Hadoop To: common-user@hadoop.apache.org We are planning to use Hadoop in my organisation for quality of servicesanalysis out of CDR records from mobile operators. We are thinking of having a small cluster of may be 10 - 15 nodes and I'm preparing the proposal. my office requires that i provide some risk analysis in the proposal. thank you. On 16 September 2011 20:34, Uma Maheswara Rao G 72686 mahesw...@huawei.comwrote: Hello, First of all where you are planning to use Hadoop? Regards, Uma - Original Message - From: Kobina Kwarko kobina.kwa...@gmail.com Date: Saturday, September 17, 2011 0:41 am Subject: risks of using Hadoop To: common-user common-user@hadoop.apache.org Hello, Please can someone point some of the risks we may incur if we decide to implement Hadoop? BR, Isaac. Regards, Uma
Re: risks of using Hadoop
Another way to decrease the risks is just to use Amazon Web Services. That might be a bit expensive On Sun, Sep 18, 2011 at 12:11 AM, Brian Bockelman bbock...@cse.unl.edu wrote: On Sep 16, 2011, at 11:08 PM, Uma Maheswara Rao G 72686 wrote: Hi Kobina, Some experiences which may helpful for you with respective to DFS. 1. Selecting the correct version. I will recommend to use 0.20X version. This is pretty stable version and all other organizations prefers it. Well tested as well. Dont go for 21 version.This version is not a stable version.This is risk. 2. You should perform thorough test with your customer operations. (of-course you will do this :-)) 3. 0.20x version has the problem of SPOF. If NameNode goes down you will loose the data.One way of recovering is by using the secondaryNameNode.You can recover the data till last checkpoint.But here manual intervention is required. In latest trunk SPOF will be addressed bu HDFS-1623. 4. 0.20x NameNodes can not scale. Federation changes included in latest versions. ( i think in 22). this may not be the problem for your cluster. But please consider this aspect as well. With respect to (3) and (4) - these are often completely overblown for many Hadoop use cases. If you use Hadoop as originally designed (large scale batch data processing), these likely don't matter. If you're looking at some of the newer use cases (low latency stuff or time-critical processing), or if you architect your solution poorly (lots of small files), these issues become relevant. Another case where I see folks get frustrated is using Hadoop as a plain old batch system; for non-data workflows, it doesn't measure up against specialized systems. You really want to make sure that Hadoop is the best tool for your job. Brian
RE: risks of using Hadoop
Tom, Normally someone who has a personal beef with someone will take it offline and deal with it. Clearly manners aren't your strong point... unfortunately making me respond to you in public. Since you asked, no, I don't have any beefs with IBM. In fact, I happen to have quite a few friends within IBM's IM pillar. (although many seem to taking Elvis' advice and left the building...) What I do have a problem with is you and your response to the posts in this thread. Its bad enough that you really don't know what you're talking about. But this is compounded by the fact that your posts end with your job title seems to indicate that you are a thought leader from a well known, brand name company. So unlike some schmuck off the street, because of your job title, someone may actually pay attention to you and take what you say at face value. The issue at hand is that the OP wanted to know the risks so that he can address them to give his pointy haired stake holders a warm fuzzy feeling. SPOF isn't a risk, but a point of FUD that is constantly being brought out by people who have an alternative that they wanted to promote. Brian pretty much put it in to perspective. You attempted to correct him, and while Brian was polite, I'm not. Why? Because I happen to know of enough people who still think that what BS IBM trots out must be true and taken at face value. I think you're more concerned with making an appearance than you are with anyone having a good experience. No offense, but again, you're not someone who has actual hands on experience so you're not in position to give advice. I don't know to write what you say out of being arrogant, but I have to wonder if you actually paid attention in your SSM class. Raising FUD and non issues as risk doesn't help anyone promote Hadoop, regardless of the vendor. What it does is cause the stakeholders reason to pause. Overstating risks can cause just as much harm as over promising results. Again, its Sales 101. Perhaps you're still trying to convert these folks off Hadoop on to IBM's DB2? No wait, that was someone else... and it wasn't Hadoop, it was Informix. (Sorry to the list, that was an inside joke that probably went over Tom's head, but for someone's benefit.) To help drill the point of the issue home... 1) Look at MapR, an IBM competitor who's derivative already solves this SPOF problem. 2) Look at how to set up a cluster (Apache, HortonWorks, Cloudera) where you can mitigate this by your node configuration along with simple sysadmin tricks like NFS mounting a drive from a different machine within the cluster (Preferably a different rack for a back up.) 3) Think about your backup and recovery of your Name Node's files. There's more, and I would encourage you to actually talk to a professional before giving out advice. ;-) HTH -Mike PS. My last PS talked about the big power switch in a switch box in the machine room that cuts the power. (When its a lever, do you really need to tell someone that its not a light switch? And I guess you could padlock it too) Seriously, there is more risk to data loss and corruption based on luser issues than there is of a SPOF (NN failure). To: common-user@hadoop.apache.org Subject: RE: risks of using Hadoop From: tdeut...@us.ibm.com Date: Wed, 21 Sep 2011 06:20:53 -0700 I am truly sorry if at some point in your life someone dropped an IBM logo on your head and it left a dent - but you are being a jerk. 
Right after you were engaging in your usual condescension a person from Xerox posted on the very issue you were blowing off. Things happen. To any system. I'm not knocking Hadoop - and frankly making sure new users have a good experience based on the real things that need to be aware of / manage is in everyone's interests here to grow the footprint. Please take note that no where in here have I ever said anything to discourage Hadoop deployments/use or anything that is vendor specific. Tom Deutsch Program Director CTO Office: Information Management Hadoop Product Manager / Customer Exec IBM 3565 Harbor Blvd Costa Mesa, CA 92626-1420 tdeut...@us.ibm.com Michael Segel michael_se...@hotmail.com 09/20/2011 02:52 PM Please respond to common-user@hadoop.apache.org To common-user@hadoop.apache.org cc Subject RE: risks of using Hadoop Tom, I think it is arrogant to parrot FUD when you've never had your hands dirty in any real Hadoop environment. So how could your response reflect the operational realities of running a Hadoop cluster? What Brian was saying was that the SPOF is an over played FUD trump card. Anyone who's built clusters will have mitigated the risks of losing the NN. Then there's MapR... where you don't have a SPOF. But again that's a derivative of Apache Hadoop. (Derivative isn't a bad thing...) You're right that you need
RE: risks of using Hadoop
Kobina The points 1 and 2 are definitely real risks. SPOF is not. As I pointed out in my mini-rant to Tom was that your end users / developers who use the cluster can do more harm to your cluster than a SPOF machine failure. I don't know what one would consider a 'long learning curve'. With the adoption of any new technology, you're talking at least 3-6 months based on the individual and the overall complexity of the environment. Take anyone who is a strong developer, put them through Cloudera's training, plus some play time, and you've shortened the learning curve. The better the java developer, the easier it is for them to pick up Hadoop. I would also suggest taking the approach of hiring a senior person who can cross train and mentor your staff. This too will shorten the runway. HTH -Mike Date: Wed, 21 Sep 2011 17:02:45 +0100 Subject: Re: risks of using Hadoop From: kobina.kwa...@gmail.com To: common-user@hadoop.apache.org Jignesh, Will your point 2 still be valid if we hire very experienced Java programmers? Kobina. On 20 September 2011 21:07, Jignesh Patel jign...@websoft.com wrote: @Kobina 1. Lack of skill set 2. Longer learning curve 3. Single point of failure @Uma I am curious to know about .20.2 is that stable? Is it same as the one you mention in your email(Federation changes), If I need scaled nameNode and append support, which version I should choose. Regarding Single point of failure, I believe Hortonworks(a.k.a Yahoo) is updating the Hadoop API. When that will be integrated with Hadoop. If I need -Jignesh On Sep 17, 2011, at 12:08 AM, Uma Maheswara Rao G 72686 wrote: Hi Kobina, Some experiences which may helpful for you with respective to DFS. 1. Selecting the correct version. I will recommend to use 0.20X version. This is pretty stable version and all other organizations prefers it. Well tested as well. Dont go for 21 version.This version is not a stable version.This is risk. 2. You should perform thorough test with your customer operations. (of-course you will do this :-)) 3. 0.20x version has the problem of SPOF. If NameNode goes down you will loose the data.One way of recovering is by using the secondaryNameNode.You can recover the data till last checkpoint.But here manual intervention is required. In latest trunk SPOF will be addressed bu HDFS-1623. 4. 0.20x NameNodes can not scale. Federation changes included in latest versions. ( i think in 22). this may not be the problem for your cluster. But please consider this aspect as well. 5. Please select the hadoop version depending on your security requirements. There are versions available for security as well in 0.20X. 6. If you plan to use Hbase, it requires append support. 20Append has the support for append. 0.20.205 release also will have append support but not yet released. Choose your correct version to avoid sudden surprises. Regards, Uma - Original Message - From: Kobina Kwarko kobina.kwa...@gmail.com Date: Saturday, September 17, 2011 3:42 am Subject: Re: risks of using Hadoop To: common-user@hadoop.apache.org We are planning to use Hadoop in my organisation for quality of servicesanalysis out of CDR records from mobile operators. We are thinking of having a small cluster of may be 10 - 15 nodes and I'm preparing the proposal. my office requires that i provide some risk analysis in the proposal. thank you. On 16 September 2011 20:34, Uma Maheswara Rao G 72686 mahesw...@huawei.comwrote: Hello, First of all where you are planning to use Hadoop? 
Regards, Uma - Original Message - From: Kobina Kwarko kobina.kwa...@gmail.com Date: Saturday, September 17, 2011 0:41 am Subject: risks of using Hadoop To: common-user common-user@hadoop.apache.org Hello, Please can someone point some of the risks we may incur if we decide to implement Hadoop? BR, Isaac.
How to get hadoop job information effectively?
I'm working on a project to collect MapReduce job information at the application level. For example, a DW ETL process may involve several MapReduce jobs, and we want a dashboard that shows the progress of those jobs for the specific ETL process. JobStatus does not provide all the information that the JobTracker web page does. JobInProgress is used in the JobTracker and JobHistory, but it lives in JobTracker memory and does not seem to be exposed to the client side. The current method I am using is to check the history log files and the job conf XML file to extract that information, the way jobdetailhistory.jsp and jobhistory.jsp do. Is there a better way to collect information like JobInProgress? Thanks.
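Part of what the question asks for is reachable through the public mapred client API without scraping JSPs. The sketch below uses the 0.20-era JobClient/RunningJob classes to list jobs and their map/reduce progress; it does not expose everything JobInProgress holds (that stays inside the JobTracker), and it assumes a mapred-site.xml on the classpath that points mapred.job.tracker at the cluster.

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobStatus;
    import org.apache.hadoop.mapred.RunningJob;

    public class JobProgressDump {
      public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf()); // reads mapred.job.tracker from the config
        for (JobStatus status : client.getAllJobs()) {   // submitted, running and completed jobs
          RunningJob job = client.getJob(status.getJobID());
          if (job == null) continue;                     // the job may have been retired meanwhile
          System.out.println(job.getID() + " " + job.getJobName()
              + " map=" + (int) (job.mapProgress() * 100) + "%"
              + " reduce=" + (int) (job.reduceProgress() * 100) + "%"
              + " complete=" + job.isComplete());
        }
      }
    }

Counters are also available through RunningJob.getCounters(), so a dashboard that only needs progress and counter totals per job can avoid the history files entirely.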
RE: risks of using Hadoop
I would completely agree with Mike's comments with one addition: Hadoop centers around how to manipulate the flow of data in a way to make the framework work for your specific problem. There are recipes for common problems but depending on your domain that might solve only 30-40% of your use cases. It should take little to no time for a good java dev to understand how to make an MR program. It will take significantly more time for that java dev to understand the domain and Hadoop well enough to consistently write *good* MR programs. Mike listed some great ways to cut down on that curve but you really want someone who has not only an affinity for code but can also apply the critical thinking to how you should pipeline your data. If you plan on using it purely with Pig/Hive abstractions on top then this can be negated significantly. Some my might disagree but that is my $0.02 Matt -Original Message- From: Michael Segel [mailto:michael_se...@hotmail.com] Sent: Wednesday, September 21, 2011 12:48 PM To: common-user@hadoop.apache.org Subject: RE: risks of using Hadoop Kobina The points 1 and 2 are definitely real risks. SPOF is not. As I pointed out in my mini-rant to Tom was that your end users / developers who use the cluster can do more harm to your cluster than a SPOF machine failure. I don't know what one would consider a 'long learning curve'. With the adoption of any new technology, you're talking at least 3-6 months based on the individual and the overall complexity of the environment. Take anyone who is a strong developer, put them through Cloudera's training, plus some play time, and you've shortened the learning curve. The better the java developer, the easier it is for them to pick up Hadoop. I would also suggest taking the approach of hiring a senior person who can cross train and mentor your staff. This too will shorten the runway. HTH -Mike Date: Wed, 21 Sep 2011 17:02:45 +0100 Subject: Re: risks of using Hadoop From: kobina.kwa...@gmail.com To: common-user@hadoop.apache.org Jignesh, Will your point 2 still be valid if we hire very experienced Java programmers? Kobina. On 20 September 2011 21:07, Jignesh Patel jign...@websoft.com wrote: @Kobina 1. Lack of skill set 2. Longer learning curve 3. Single point of failure @Uma I am curious to know about .20.2 is that stable? Is it same as the one you mention in your email(Federation changes), If I need scaled nameNode and append support, which version I should choose. Regarding Single point of failure, I believe Hortonworks(a.k.a Yahoo) is updating the Hadoop API. When that will be integrated with Hadoop. If I need -Jignesh On Sep 17, 2011, at 12:08 AM, Uma Maheswara Rao G 72686 wrote: Hi Kobina, Some experiences which may helpful for you with respective to DFS. 1. Selecting the correct version. I will recommend to use 0.20X version. This is pretty stable version and all other organizations prefers it. Well tested as well. Dont go for 21 version.This version is not a stable version.This is risk. 2. You should perform thorough test with your customer operations. (of-course you will do this :-)) 3. 0.20x version has the problem of SPOF. If NameNode goes down you will loose the data.One way of recovering is by using the secondaryNameNode.You can recover the data till last checkpoint.But here manual intervention is required. In latest trunk SPOF will be addressed bu HDFS-1623. 4. 0.20x NameNodes can not scale. Federation changes included in latest versions. ( i think in 22). this may not be the problem for your cluster. 
But please consider this aspect as well. 5. Please select the hadoop version depending on your security requirements. There are versions available for security as well in 0.20X. 6. If you plan to use Hbase, it requires append support. 20Append has the support for append. 0.20.205 release also will have append support but not yet released. Choose your correct version to avoid sudden surprises. Regards, Uma - Original Message - From: Kobina Kwarko kobina.kwa...@gmail.com Date: Saturday, September 17, 2011 3:42 am Subject: Re: risks of using Hadoop To: common-user@hadoop.apache.org We are planning to use Hadoop in my organisation for quality of servicesanalysis out of CDR records from mobile operators. We are thinking of having a small cluster of may be 10 - 15 nodes and I'm preparing the proposal. my office requires that i provide some risk analysis in the proposal. thank you. On 16 September 2011 20:34, Uma Maheswara Rao G 72686 mahesw...@huawei.comwrote: Hello, First of all where you are planning to use Hadoop? Regards, Uma - Original Message - From: Kobina Kwarko kobina.kwa...@gmail.com Date: Saturday, September 17, 2011 0:41 am Subject: risks
Re: Using HBase for real time transaction
On Wed, Sep 21, 2011 at 8:36 AM, Jignesh Patel jign...@websoft.com wrote: I am not looking for a relational database, but at creating a multi-tenant database. At this time I am not sure whether it needs transactions, or even whether that kind of architecture can support transactions. Currently in HBase nothing prevents you from having multiple tenants, as long as they have different table names. Also keep in mind that there's no security implemented, but it *might* make it into 0.92 (crossing fingers). Row mutations in HBase are seen by the user as soon as they are done, atomicity is guaranteed at the row level, which seems to satisfy his requirement. If multi-row transactions are needed then I agree HBase might not be what he wants. Can't we handle transactions through the application or the container, before the data even goes to HBase? Sure, you could do something like what Megastore[1] does, but you really need to evaluate your needs and see if that works. And I do have one more doubt: how to handle low read latency? HBase offers that out of the box; a more precise question would be what 99th-percentile read latency you need. Just for the sake of giving a data point, right now our 99p is 20 ms, but that's with our type of workload, machines, front-end caching, etc., so YMMV. J-D 1. Megastore (transactions are described in chapter 3.3): http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
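As a concrete illustration of the row-level guarantee described above: a Put is applied atomically to its row and is visible to the next Get as soon as the call returns, with no separate commit step. The sketch below uses the HBase 0.90-era client API; the table name, column family and the hbase-site.xml on the classpath are assumptions for the example (and the per-tenant table name is just one way to follow the "different table names per tenant" suggestion).

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SingleRowExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();    // reads hbase-site.xml
        HTable table = new HTable(conf, "tenant_a_events");  // hypothetical per-tenant table
        Put put = new Put(Bytes.toBytes("user42"));
        put.add(Bytes.toBytes("d"), Bytes.toBytes("last_login"), Bytes.toBytes("2011-09-21T10:22"));
        put.add(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("active"));
        table.put(put);                                       // both cells of the row change atomically
        Result row = table.get(new Get(Bytes.toBytes("user42"))); // readable immediately, no commit step
        System.out.println(Bytes.toString(
            row.getValue(Bytes.toBytes("d"), Bytes.toBytes("status"))));
        table.close();
      }
    }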
Re: risks of using Hadoop
I saw this discussion start a few days ago but didn't pay attention to it. This morning I came across some of these messages and, rofl, too much drama. In my experience, there are some real risks in using Hadoop. 1) It is not for real-time or mission-critical work. You may consider Hadoop a good workhorse for offline processing and a good framework for large scale data analysis and processing; however, there are many factors that can affect Hadoop jobs. Even the most well-written and robust code can fail because of exceptional hardware and network problems. 2) Don't put too much hope in efficiency. It can do jobs that were previously impossible to achieve, but maybe not as fast as you imagine; there is no magic by which Hadoop finishes everything in a blink. Usually, and more safely, you may prefer to break your entire large job into several pieces and save and back up the data step by step. In this fashion Hadoop can really get a huge job done, but it still requires a lot of manual effort. 3) There is no integrated workflow or open-source multi-user administrative platform. This point is connected to the previous one, because once a huge Hadoop job has started, especially a statistical analysis or machine learning task that requires many iterations, manual care is indispensable. As far as I know, there is not yet an integrated workflow management system built for Hadoop tasks. Moreover, if you have your own private cluster running Hadoop jobs, coordinating multiple users can be a problem: for a small group a shared schedule is necessary, and for a large group there can be a huge amount of work to configure hardware and virtual machines. In our experience, optimizing cluster performance for Hadoop is non-trivial and we met quite a lot of problems. Amazon EC2 is a good choice, but running long and large tasks on it can be quite expensive. 4) Think about your problem carefully in a key-value fashion and try to minimize the use of reducers. Hadoop is essentially shuffle, sort, and aggregation of key-value pairs. Many practical problems can easily be transformed into a key-value data structure, and much of the work can be done with mappers only. Don't jump into the reducer too early; just carry all the data under a simple key of a few bytes and finish as many mapper-only tasks as possible. In this way you can avoid many unnecessary sort and aggregation steps. Shi On 9/21/2011 1:01 PM, GOEKE, MATTHEW (AG/1000) wrote: I would completely agree with Mike's comments with one addition: Hadoop centers around how to manipulate the flow of data in a way to make the framework work for your specific problem. There are recipes for common problems but depending on your domain that might solve only 30-40% of your use cases. It should take little to no time for a good java dev to understand how to make an MR program. It will take significantly more time for that java dev to understand the domain and Hadoop well enough to consistently write *good* MR programs. Mike listed some great ways to cut down on that curve but you really want someone who has not only an affinity for code but can also apply the critical thinking to how you should pipeline your data. If you plan on using it purely with Pig/Hive abstractions on top then this can be negated significantly.
Some my might disagree but that is my $0.02 Matt -Original Message- From: Michael Segel [mailto:michael_se...@hotmail.com] Sent: Wednesday, September 21, 2011 12:48 PM To: common-user@hadoop.apache.org Subject: RE: risks of using Hadoop Kobina The points 1 and 2 are definitely real risks. SPOF is not. As I pointed out in my mini-rant to Tom was that your end users / developers who use the cluster can do more harm to your cluster than a SPOF machine failure. I don't know what one would consider a 'long learning curve'. With the adoption of any new technology, you're talking at least 3-6 months based on the individual and the overall complexity of the environment. Take anyone who is a strong developer, put them through Cloudera's training, plus some play time, and you've shortened the learning curve. The better the java developer, the easier it is for them to pick up Hadoop. I would also suggest taking the approach of hiring a senior person who can cross train and mentor your staff. This too will shorten the runway. HTH -Mike Date: Wed, 21 Sep 2011 17:02:45 +0100 Subject: Re: risks of using Hadoop From: kobina.kwa...@gmail.com To: common-user@hadoop.apache.org Jignesh, Will your point 2 still be valid if we hire very experienced Java programmers? Kobina. On 20 September 2011 21:07, Jignesh Pateljign...@websoft.com wrote: @Kobina 1. Lack of skill set 2. Longer learning curve 3. Single point of failure @Uma I am curious to know about .20.2 is that stable? Is it same as the one you mention in your email(Federation changes), If I need scaled nameNode and append support,
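Shi's point 4 above, doing as much work as possible in mapper-only passes, comes down to a single setting: a job with zero reduce tasks skips the shuffle and sort entirely and writes mapper output straight to HDFS. Below is a minimal sketch with the 0.20 new-style API; FilterMapper is a hypothetical line filter standing in for whatever extraction step you actually have, and the input/output paths come from the command line.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MapOnlyDriver {

      // Stand-in mapper: keeps only lines containing "ERROR" under a tiny constant key.
      public static class FilterMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final Text outKey = new Text("err");
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
            throws IOException, InterruptedException {
          if (line.toString().contains("ERROR")) {
            ctx.write(outKey, line);
          }
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "map-only-filter");
        job.setJarByClass(MapOnlyDriver.class);
        job.setMapperClass(FilterMapper.class);
        job.setNumReduceTasks(0);   // no reducers: no shuffle, no sort; mapper output goes straight to HDFS
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }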
Re: How to get hadoop job information effectively?
Not that I know of. We scrape web pages, which is a horrible thing to do. There is a JIRA to add some web service APIs to expose this type of information, but it is not going to be available for a while. --Bobby Evans On 9/21/11 1:01 PM, Benyi Wang bewang.t...@gmail.com wrote: I'm working on a project to collect MapReduce job information at the application level. For example, a DW ETL process may involve several MapReduce jobs, and we want a dashboard that shows the progress of those jobs for the specific ETL process. JobStatus does not provide all the information that the JobTracker web page does. JobInProgress is used in the JobTracker and JobHistory, but it lives in JobTracker memory and does not seem to be exposed to the client side. The current method I am using is to check the history log files and the job conf XML file to extract that information, the way jobdetailhistory.jsp and jobhistory.jsp do. Is there a better way to collect information like JobInProgress? Thanks.
Re: risks of using Hadoop
I have been following this thread. Over the last two years that I have been using hadoop with a fairly large cluster, my biggest problem has been analyzing failures. In the beginning it was fairly simple - unformatted name node, task trackers not starting , heap allocation mistakes version id mismatch configuration mistakes, that were easily fixed using this group's help and or analyzing logs. Then the errors got a little more complicated - too many fetch failures, task exited with error code 134, error reading task output, etc where the logs were less useful this mailing list and the source became more useful - and given that I am not a Java expert, I needded to rely on this group more and more. There are wonderful people like Harsha, Steve and Todd who sincerely and correctly answer many queries. But this is a complex system are so many knobs and so many variables that knowing all possible failures is a probably close to impossible. This is just the framework. If you combine this with all the esoteric industries that hadoop is used for the complexity increases because of the domain expertise required. We won't even touch the voodoo magic that is involved in optimizing hadoop runs. So to mitigate the risk of running hadoop you need someone with four heads. - the domain head - one who can think and solve domain problems, the hadoop head- the person to translate this into M/R. The java head who understands java and can take a shot at looking at the source code and finding solutions to problems and the system head , the person who keeps the cluster buzzing along smoothly. So unless you have these heads or able to get these heads as required - there is some definite risk. Thanks once again to this wonderful group and many active people like Todd, Harsha , Steve and many others who have helped me and others go over that stumbling block., From: Ahmed Nagy ahmed.n...@gmail.com To: common-user@hadoop.apache.org Sent: Wednesday, September 21, 2011 2:02 AM Subject: Re: risks of using Hadoop Another way to decrease the risks is just to use Amazon Web Services. That might be a bit expensive On Sun, Sep 18, 2011 at 12:11 AM, Brian Bockelman bbock...@cse.unl.edu wrote: On Sep 16, 2011, at 11:08 PM, Uma Maheswara Rao G 72686 wrote: Hi Kobina, Some experiences which may helpful for you with respective to DFS. 1. Selecting the correct version. I will recommend to use 0.20X version. This is pretty stable version and all other organizations prefers it. Well tested as well. Dont go for 21 version.This version is not a stable version.This is risk. 2. You should perform thorough test with your customer operations. (of-course you will do this :-)) 3. 0.20x version has the problem of SPOF. If NameNode goes down you will loose the data.One way of recovering is by using the secondaryNameNode.You can recover the data till last checkpoint.But here manual intervention is required. In latest trunk SPOF will be addressed bu HDFS-1623. 4. 0.20x NameNodes can not scale. Federation changes included in latest versions. ( i think in 22). this may not be the problem for your cluster. But please consider this aspect as well. With respect to (3) and (4) - these are often completely overblown for many Hadoop use cases. If you use Hadoop as originally designed (large scale batch data processing), these likely don't matter. If you're looking at some of the newer use cases (low latency stuff or time-critical processing), or if you architect your solution poorly (lots of small files), these issues become relevant. 
Another case where I see folks get frustrated is using Hadoop as a plain old batch system; for non-data workflows, it doesn't measure up against specialized systems. You really want to make sure that Hadoop is the best tool for your job. Brian
Re: Java programmatic authentication of Hadoop Kerberos
Hi Lakshmi, Were you able to resolve the issue below? I'm facing the same issue but couldn't resolve it. Please do reply if you have the solution. Thanks in advance. Regards, Sivva.

Sari1983 wrote: Hi, Kerberos has been configured for our Hadoop file system. I wish to do the authentication through a Java program. I'm able to perform the authentication using a normal java application. But if I have any HDFS operations in the Java program, it succeeds in reading the keytab file, but then shows some problems:

org.apache.hadoop.security.UserGroupInformation loginUserFromKeytab
INFO: Login successful for user principal name using keytab file keytab.

The problems (exceptions) are:

org.apache.hadoop.security.UserGroupInformation reloginFromKeytab
INFO: Initiating logout for principal name
Mar 21, 2011 8:56:32 AM org.apache.hadoop.security.UserGroupInformation reloginFromKeytab
INFO: Initiating re-login for principal name
Mar 21, 2011 8:56:34 AM org.apache.hadoop.security.UserGroupInformation hasSufficientTimeElapsed
WARNING: Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
Mar 21, 2011 8:56:38 AM org.apache.hadoop.security.UserGroupInformation hasSufficientTimeElapsed
WARNING: Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
..
Mar 21, 2011 8:56:51 AM org.apache.hadoop.security.UserGroupInformation hasSufficientTimeElapsed
WARNING: Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
Mar 21, 2011 8:57:13 AM org.apache.hadoop.ipc.Client$Connection$1 run
WARNING: Couldn't setup connection for Principal Name to null

Exception in thread "main" java.io.IOException: Call to part of the principal name/10.204.97.33:8020 failed on local exception: java.io.IOException: Couldn't setup connection for principal name to null
  at org.apache.hadoop.ipc.Client.wrapException(Client.java:1139)
  at org.apache.hadoop.ipc.Client.call(Client.java:1107)
  at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
  at $Proxy5.getProtocolVersion(Unknown Source)
  at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:398)
  at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
  at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:111)
  at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:213)
  at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:180)
  at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1514)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1548)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1530)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:111)
  at Fil2.main(Fil2.java:27)
Caused by: java.io.IOException: Couldn't setup connection for principal name to null
  at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:503)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:396)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
  at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:456)
  at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:558)
  at org.apache.hadoop.ipc.Client$Connection.access$2300(Client.java:210)
  at org.apache.hadoop.ipc.Client.getConnection(Client.java:1244)
  at org.apache.hadoop.ipc.Client.call(Client.java:1075)
  ... 15 more
Caused by: java.io.IOException: Failed to specify server's Kerberos principal name
  at org.apache.hadoop.security.SaslRpcClient.<init>(SaslRpcClient.java:85)
  at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:413)
  at org.apache.hadoop.ipc.Client$Connection.access$1100(Client.java:210)
  at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:551)
  at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:548)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:396)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
  at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:547)
  ... 18 more

Please help me in resolving this issue. It's very urgent. I'm new to Kerberos and Hadoop. I appreciate any help. Thanks. Regards, Lakshmi -- View this message in context: http://old.nabble.com/Java-programmatic-authentication-of-Hadoop-Kerberos-tp31198827p32503781.html
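For anyone hitting the same "Failed to specify server's Kerberos principal name" error, a minimal client sketch along these lines may help. It is only an illustration under stated assumptions: the principal names, keytab path, and NameNode address are placeholders for your environment, and on 0.20-era releases the filesystem URI key is fs.default.name (fs.defaultFS on newer ones). The key point is that a secure client must be told the NameNode's service principal (dfs.namenode.kerberos.principal) in addition to logging in from the keytab.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.security.UserGroupInformation;

  public class SecureHdfsClient {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // Tell the client to use Kerberos rather than simple auth.
          conf.set("hadoop.security.authentication", "kerberos");
          // The exception above usually means this property is missing on the
          // client side: it names the NameNode's service principal (placeholder).
          conf.set("dfs.namenode.kerberos.principal", "hdfs/_HOST@EXAMPLE.COM");
          // Placeholder NameNode address; use fs.defaultFS on newer releases.
          conf.set("fs.default.name", "hdfs://namenode.example.com:8020");

          UserGroupInformation.setConfiguration(conf);
          // Log in from the keytab before any HDFS call is made.
          UserGroupInformation.loginUserFromKeytab(
                  "appuser@EXAMPLE.COM", "/etc/security/keytabs/appuser.keytab");

          FileSystem fs = FileSystem.get(conf);
          System.out.println("Root exists: " + fs.exists(new Path("/")));
      }
  }

If these properties are already present in the core-site.xml / hdfs-site.xml on the client's classpath, the two conf.set calls for the principal and filesystem URI are not needed.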
Reducer hanging ( swapping? )
Hi Folks, I am running hive on a 10 node cluster. Since my hive queries have joins in them, their reduce phases are a bit heavy. I have 2GB RAM on each TT. The problem is that my reducer hangs at 76% for a long time. I guess this is due to excessive swapping between disk and memory. vmstat on one of the TTs shows:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff   cache    si   so    bi    bo   in   cs us sy id wa
 1  0   1860  34884 189948 1997644     0    0     2     1    0    1  0  0 100 0

My related config params are pasted below. (I turned off speculative execution for both maps and reduces.) Can anyone suggest some improvements to make my reduce phase a bit faster? (I've allotted 900MB to each task and reduced the other params; even then it is not showing any improvement.) Any suggestions?

<property>
  <name>mapred.min.split.size</name>
  <value>65536</value>
</property>
<property>
  <name>mapred.reduce.copy.backoff</name>
  <value>5</value>
</property>
<property>
  <name>io.sort.factor</name>
  <value>60</value>
</property>
<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>25</value>
</property>
<property>
  <name>io.sort.mb</name>
  <value>70</value>
</property>
<property>
  <name>io.file.buffer.size</name>
  <value>32768</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx900m</value>
</property>
===
Re: Reducer hanging ( swapping? )
2GB for a task tracker? Here are some possible thoughts: compress the map output, and change mapred.reduce.slowstart.completed.maps. By the way, I see no swapping in that vmstat output (si and so are both 0). Anything interesting in the task tracker log? System log? Raj

From: john smith js1987.sm...@gmail.com To: common-user@hadoop.apache.org Sent: Wednesday, September 21, 2011 4:52 PM Subject: Reducer hanging ( swapping? )
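To make Raj's two suggestions concrete, here is a rough sketch of how they could be set per job from Java. The property names are the old 0.20-style mapred.* keys; adjust them for your release, and treat the codec choice and the slowstart value as illustrative rather than recommended settings.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapred.JobConf;

  public class ReduceTuningSketch {
      public static JobConf tune(Configuration base) {
          JobConf conf = new JobConf(base);
          // Compress intermediate map output to cut shuffle I/O.
          conf.setBoolean("mapred.compress.map.output", true);
          conf.set("mapred.map.output.compression.codec",
                   "org.apache.hadoop.io.compress.DefaultCodec");
          // Delay reducer start until most maps have finished (default is 0.05).
          conf.setFloat("mapred.reduce.slowstart.completed.maps", 0.80f);
          return conf;
      }
  }

In Hive these would typically be applied with set commands (for example, set mapred.compress.map.output=true;) before running the query, rather than from Java code.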
Hadoop's use cases
Hello, I would like to collect Hadoop's compelling use cases. I am doing monitoring, measurement and benchmarking on Hadoop and would like to focus on its strong side. I have been working on its less strong sides (small files, where the results compared to other systems with similar goals were not appealing). The main areas would be ETL and ML related, I would think. However, I need to go into further detail: what type of data is appealing to Hadoop, what size, what kind of query? I found Cloudera's page http://www.cloudera.com/blog/category/use-case/, but it doesn't provide this type of detail. Can you help with more info / ideas? Thanks, Keren -- Keren Ouaknine Cell: +972 54 2565404 Web: www.kereno.com
RE: risks of using Hadoop
Amen to that. I haven't heard a good rant in a long time; I am definitely amused and entertained. As a veteran of 3 years with Hadoop I will say that the SPOF issue is whatever you want to make it. But it has not, nor will it ever, deter me from using this great system. Every system has its risks and they can be minimized by careful architectural crafting and intelligent usage. Bill

-Original Message- From: Michael Segel [mailto:michael_se...@hotmail.com] Sent: Wednesday, September 21, 2011 1:48 PM To: common-user@hadoop.apache.org Subject: RE: risks of using Hadoop Kobina, The points 1 and 2 are definitely real risks. SPOF is not. As I pointed out in my mini-rant to Tom, your end users / developers who use the cluster can do more harm to your cluster than a SPOF machine failure. I don't know what one would consider a 'long learning curve'. With the adoption of any new technology, you're talking at least 3-6 months based on the individual and the overall complexity of the environment. Take anyone who is a strong developer, put them through Cloudera's training, plus some play time, and you've shortened the learning curve. The better the java developer, the easier it is for them to pick up Hadoop. I would also suggest taking the approach of hiring a senior person who can cross train and mentor your staff. This too will shorten the runway. HTH -Mike

Date: Wed, 21 Sep 2011 17:02:45 +0100 Subject: Re: risks of using Hadoop From: kobina.kwa...@gmail.com To: common-user@hadoop.apache.org Jignesh, Will your point 2 still be valid if we hire very experienced Java programmers? Kobina. On 20 September 2011 21:07, Jignesh Patel jign...@websoft.com wrote: @Kobina 1. Lack of skill set 2. Longer learning curve 3. Single point of failure @Uma I am curious to know about .20.2 - is that stable? Is it the same as the one you mention in your email (Federation changes)? If I need a scaled nameNode and append support, which version should I choose? Regarding single point of failure, I believe Hortonworks (a.k.a. Yahoo) is updating the Hadoop API. When will that be integrated with Hadoop? If I need -Jignesh On Sep 17, 2011, at 12:08 AM, Uma Maheswara Rao G 72686 wrote: Hi Kobina, Some experiences which may be helpful for you with respect to DFS. 1. Selecting the correct version. I will recommend using the 0.20X version. This is a pretty stable version and other organizations prefer it. Well tested as well. Don't go for the 21 version. That version is not a stable version. This is a risk. 2. You should perform thorough tests with your customer operations (of course you will do this :-)). 3. The 0.20x version has the problem of SPOF. If the NameNode goes down you will lose the data. One way of recovering is by using the secondaryNameNode. You can recover the data up to the last checkpoint, but manual intervention is required. In the latest trunk, SPOF will be addressed by HDFS-1623. 4. 0.20x NameNodes cannot scale. Federation changes are included in later versions (I think in 22). This may not be a problem for your cluster, but please consider this aspect as well. 5. Please select the hadoop version depending on your security requirements. There are versions available for security as well in 0.20X. 6. If you plan to use HBase, it requires append support. 20Append has the support for append. The 0.20.205 release will also have append support but is not yet released. Choose your correct version to avoid sudden surprises. Regards, Uma

- Original Message - From: Kobina Kwarko kobina.kwa...@gmail.com Date: Saturday, September 17, 2011 3:42 am Subject: Re: risks of using Hadoop To: common-user@hadoop.apache.org We are planning to use Hadoop in my organisation for quality of services analysis out of CDR records from mobile operators. We are thinking of having a small cluster of maybe 10-15 nodes and I'm preparing the proposal. My office requires that I provide some risk analysis in the proposal. Thank you. On 16 September 2011 20:34, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote: Hello, First of all, where are you planning to use Hadoop? Regards, Uma - Original Message - From: Kobina Kwarko kobina.kwa...@gmail.com Date: Saturday, September 17, 2011 0:41 am Subject: risks of using Hadoop To: common-user common-user@hadoop.apache.org Hello, Please can someone point out some of the risks we may incur if we decide to implement Hadoop? BR, Isaac.
Re: Can we replace namenode machine with some other machine ?
Copy the same installation to the new machine and change the IP address. After that, configure the new NN address in your clients and DNs. "Also Does Namenode/JobTracker machine's configuration need to be better than datanodes/tasktracker's ??" - I did not get this question. Regards, Uma

- Original Message - From: praveenesh kumar praveen...@gmail.com Date: Thursday, September 22, 2011 10:13 am Subject: Can we replace namenode machine with some other machine ? To: common-user@hadoop.apache.org Hi all, Can we replace our namenode machine later with some other machine? Actually I got a new server machine in my cluster and now I want to make this machine my new namenode and jobtracker node. Also, does the Namenode/JobTracker machine's configuration need to be better than the datanodes/tasktrackers'? How can I achieve this with the least overhead? Thanks, Praveenesh
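In case it helps to see what "configure the new NN address" means in practice, clients (and the DNs/TTs via their core-site.xml) locate the NameNode through the default filesystem URI. A minimal sketch follows; the hostname is a placeholder, and on 0.20-era releases the key is fs.default.name (fs.defaultFS on newer releases). Also keep in mind that "copying the same installation" for the NN includes the dfs.name.dir contents (fsimage and edits); without them the new NN has no namespace to serve.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;

  public class NewNameNodeClient {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // Clients and DataNodes find the NameNode through this URI; on the
          // cluster itself it lives in core-site.xml. Hostname is a placeholder.
          conf.set("fs.default.name", "hdfs://new-namenode-host:8020");
          FileSystem fs = FileSystem.get(conf);
          System.out.println("Connected to: " + fs.getUri());
      }
  }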
Re: RE: risks of using Hadoop
Absolutely agree with you. Mainly we should consider SPOF and minimize the problem through our own carefulness (there are many ways to minimize this issue, as we have seen in this thread). Regards, Uma

- Original Message - From: Bill Habermaas bill.haberm...@oracle.com Date: Thursday, September 22, 2011 10:04 am Subject: RE: risks of using Hadoop To: common-user@hadoop.apache.org
Re: Can we replace namenode machine with some other machine ?
If I just change configuration settings on the slave machines, will it affect any of the data that is currently residing in the cluster? And my second question was: does the master node (the NN/JT hosting machine) need to have a better configuration than our slave machines (the DN/TT hosting machines)? Actually my master node is a weaker machine than my slave machines, because I am assuming that the master does not do much additional work and it's okay to have a weak machine as master. Now a new big server machine has just been added to my cluster. So I am wondering: shall I make this new machine my new master (NN/JT), or just add it as a slave? Thanks, Praveenesh

On Thu, Sep 22, 2011 at 10:20 AM, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote: Copy the same installation to the new machine and change the IP address. After that, configure the new NN address in your clients and DNs. "Also Does Namenode/JobTracker machine's configuration need to be better than datanodes/tasktracker's ??" - I did not get this question. Regards, Uma - Original Message - From: praveenesh kumar praveen...@gmail.com Date: Thursday, September 22, 2011 10:13 am Subject: Can we replace namenode machine with some other machine ? To: common-user@hadoop.apache.org Hi all, Can we replace our namenode machine later with some other machine? Actually I got a new server machine in my cluster and now I want to make this machine my new namenode and jobtracker node. Also, does the Namenode/JobTracker machine's configuration need to be better than the datanodes/tasktrackers'? How can I achieve this with the least overhead? Thanks, Praveenesh
Re: Can we replace namenode machine with some other machine ?
Just changing the configs will not affect your data. You do need to restart your DNs so that they connect to the new NN. For the second question: it depends on your usage. If you have many files in DFS, the NN will consume more memory, since it needs to store all the metadata for those files in its namespace. If the number of files is very large, it is recommended not to put the NN and JT on the same machine. Coming to the DN case: the configured space is used for storing the block files. Once that space is filled, the NN will not select this DN for further writes. So a DN having less space is more acceptable than the NN being under-provisioned in big clusters. If you configure DNs with a very good amount of disk space but the NN does not have enough capacity for your files' metadata, then there is no use in having more space on the DNs, right? :-) Regards, Uma

- Original Message - From: praveenesh kumar praveen...@gmail.com Date: Thursday, September 22, 2011 10:42 am Subject: Re: Can we replace namenode machine with some other machine ? To: common-user@hadoop.apache.org
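As a rough illustration of why the NN's memory matters more than its disk, here is a back-of-the-envelope heap estimate. The ~150 bytes per namespace object (file, directory, or block) is only a commonly quoted rule of thumb, not an exact figure, and the file and block counts below are made up.

  public class NameNodeHeapEstimate {
      public static void main(String[] args) {
          long files = 10000000L;        // hypothetical: 10 million files
          long blocksPerFile = 2L;       // hypothetical average blocks per file
          long bytesPerObject = 150L;    // rule-of-thumb NN heap cost per object
          // Each file plus each of its blocks is an object in the NN namespace.
          long objects = files + files * blocksPerFile;
          long heapBytes = objects * bytesPerObject;
          System.out.println("Estimated NN heap: "
                  + heapBytes / (1024 * 1024) + " MB");
      }
  }

With those made-up numbers the estimate comes to roughly 4.3 GB of NN heap, which is why a weak master machine becomes a problem long before DN disk space does.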