question about speeding up the HDFS balancer
hi, maillist: I want to run the balancer on my HDFS cluster, but the default speed is 1 MB/s, so I want to set the option dfs.balance.bandwidthPerSec to 20 MB/s. Suppose I issue the balancer command on node A: do I only need to set the option in node A's hdfs-site.xml, or do I need to set it on every node of my HDFS cluster? Thanks a lot!
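A note on scope: the bandwidth cap is enforced by each DataNode, so a value in hdfs-site.xml only takes effect on the nodes that read it, not just on the node where the balancer runs. A hedged sketch of the two usual approaches (20 MB/s = 20971520 bytes; in Hadoop 2 the property is spelled dfs.datanode.balance.bandwidthPerSec):

<!-- hdfs-site.xml on every DataNode; the value is bytes per second -->
<property>
  <name>dfs.datanode.balance.bandwidthPerSec</name>
  <value>20971520</value>
</property>

# or push the new limit to all DataNodes at runtime, no restart needed,
# then start the balancer from any one node:
$ hdfs dfsadmin -setBalancerBandwidth 20971520
$ hdfs balancer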
Re: issue about Pig not recognizing the HDFS HA configuration
This name is not a hostname; it is the NameNode HA service (nameservice) name. Behind the name are two NameNode boxes, one active and one standby.

On Wed, Nov 5, 2014 at 7:41 PM, Jagannath Naidu jagannath.na...@fosteringlinux.com wrote:

On 5 November 2014 14:49, ch huang justlo...@gmail.com wrote: hi, maillist: I set up NameNode HA in my HDFS cluster, but it seems Pig does not recognize it. Why?

2014-11-05 14:34:54,710 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Cleaning up the staging area file:/tmp/hadoop-root/mapred/staging/root1861403840/.staging/job_local1861403840_0001
2014-11-05 14:34:54,716 [JobControl] WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:root (auth:SIMPLE) cause:org.apache.pig.backend.executionengine.ExecException: ERROR 2118: java.net.UnknownHostException: develop

Unknown host exception: this can be the issue. Check that the host is discoverable, either from DNS or from the hosts file.

2014-11-05 14:34:54,717 [JobControl] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:DefaultJobName got an error while submitting
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: java.net.UnknownHostException: develop
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:288)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:493)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:510)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
    at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
    at java.lang.Thread.run(Thread.java:744)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: develop
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:237)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:141)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:576)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:521)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:146)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.hcatalog.mapreduce.HCatBaseInputFormat.setInputPath(HCatBaseInputFormat.java:326)
    at org.apache.hcatalog.mapreduce.HCatBaseInputFormat.getSplits(HCatBaseInputFormat.java:127)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
    ... 18 more
Caused by: java.net.UnknownHostException: develop
    ... 33 more

-- Jaggu Naidu
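For completeness: a client that throws java.net.UnknownHostException for a nameservice name usually lacks the HA definitions in its own configuration, so it treats "develop" as a plain hostname. A hedged sketch of the usual client-side hdfs-site.xml entries, using this thread's nameservice name "develop"; the nn1/nn2 ids and hostnames are illustrative assumptions, not taken from the thread:

<property>
  <name>dfs.nameservices</name>
  <value>develop</value>
</property>
<property>
  <name>dfs.ha.namenodes.develop</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.develop.nn1</name>
  <value>nn1-host:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.develop.nn2</name>
  <value>nn2-host:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.develop</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>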
issue about jobs being submitted to the local runner, not to the cluster
hi, maillist: my cluster moved from one IDC to another IDC. When everything was done, I ran a job and found that it runs on the local box, not on the cluster. Why? It worked normally in the old IDC!
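One common cause (an assumption, since the thread shows no configs): after the move, the submitting box lost or reverted its mapred-site.xml, so the client falls back to the local job runner (job IDs of the form job_local..., as in the Pig thread above, are the same symptom). A minimal check on the submitting node:

<!-- mapred-site.xml on the client/submitting node -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>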
how to copy data between two HDFS clusters quickly?
hi, maillist: I am using distcp to migrate data from CDH4.4 to CDH5.1. Copying small files works very well, but transferring big data is very slow. Any good methods to recommend? Thanks
Re: how to copy data between two HDFS clusters quickly?
No, all defaults. On Fri, Oct 17, 2014 at 5:46 PM, Azuryy Yu azury...@gmail.com wrote: Did you specify how many map tasks? On Fri, Oct 17, 2014 at 4:58 PM, ch huang justlo...@gmail.com wrote: hi, maillist: I am using distcp to migrate data from CDH4.4 to CDH5.1. Copying small files works very well, but transferring big data is very slow. Any good methods to recommend? Thanks
Re: how to copy data between two HDFS clusters quickly?
Some files; the total size is 2 TB, and the block size is 128 MB. On Sat, Oct 18, 2014 at 2:26 AM, Shivram Mani sm...@pivotal.io wrote: What is your approximate input size? Do you have multiple files, or is this one large file? What is your block size (source and destination cluster)? On Fri, Oct 17, 2014 at 4:19 AM, ch huang justlo...@gmail.com wrote: No, all defaults. On Fri, Oct 17, 2014 at 5:46 PM, Azuryy Yu azury...@gmail.com wrote: Did you specify how many map tasks? On Fri, Oct 17, 2014 at 4:58 PM, ch huang justlo...@gmail.com wrote: [original question quoted above] -- Thanks Shivram
Re: how to copy data between two HDFS clusters quickly?
Yes. On Sat, Oct 18, 2014 at 3:53 AM, Jakub Stransky stransky...@gmail.com wrote: Distcp? On 17 Oct 2014 20:51, Alexander Pivovarov apivova...@gmail.com wrote: Try running this on a datanode of the destination cluster: $ hadoop fs -cp hdfs://from_cluster/ hdfs://to_cluster/ On Fri, Oct 17, 2014 at 11:26 AM, Shivram Mani sm...@pivotal.io wrote: What is your approximate input size? Do you have multiple files, or is this one large file? What is your block size (source and destination cluster)? On Fri, Oct 17, 2014 at 4:19 AM, ch huang justlo...@gmail.com wrote: No, all defaults. On Fri, Oct 17, 2014 at 5:46 PM, Azuryy Yu azury...@gmail.com wrote: Did you specify how many map tasks? On Fri, Oct 17, 2014 at 4:58 PM, ch huang justlo...@gmail.com wrote: [original question quoted above] -- Thanks Shivram
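For what it's worth, distcp assigns whole files to map tasks (a file is never split across maps), so with the default of 20 maps a few very large files cap the parallelism. A hedged sketch of the usual tuning; the map count and paths are illustrative, and between incompatible versions such as CDH4.4 and CDH5.1 it is common to run distcp on the destination cluster against a webhdfs:// source:

$ hadoop distcp -m 100 -pb webhdfs://old-nn:50070/src hdfs://new-nn:8020/dst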
issue about the NameNode starting slowly
hi, maillist: my Hadoop cluster lost power last weekend. When I restarted my NameNode, I found it fetching edits from the JournalNodes and replaying the transactions, but it was very slow. Watching the 50070 web UI, I see lots of entries like http://hz49:8480/getJournal?jid=develop&segmentTxId=4261627&storageInfo=-55%3A466484546%3A0%3ACID-a140fb1a-ac10-4053-8b91-8f19f2809b7c I looked at the size of my JournalNodes' local storage; it is only about 400 MB. I do not know why the load process takes such a long time.
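A likely explanation (an assumption; the thread does not show the fsimage age): startup time is driven by how many edit transactions have to be replayed since the last fsimage checkpoint, not by the raw size of the JournalNode directory. If checkpointing has not been running, forcing one shrinks the replay window for the next restart. A sketch:

$ hdfs dfsadmin -safemode enter
$ hdfs dfsadmin -saveNamespace
$ hdfs dfsadmin -safemode leave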
issue about letting a regular user run applications on YARN (with Kerberos)
hi, maillist: I use Kerberos for authentication on my Hadoop cluster. I have a 3-node cluster (HDFS + YARN):
z1.example.com (NN, RM)
z2.example.com (NM, DN)
z3.example.com (NM, DN, proxyserver, historyserver)
I created principals for the NN/DN:
hdfs/z1.example@example.com
hdfs/z2.example@example.com
hdfs/z3.example@example.com
and for the RM/NM:
yarn/z1.example@example.com
yarn/z2.example@example.com
yarn/z3.example@example.com
for the MapReduce history server:
mapred/z1.example@example.com
mapred/z2.example@example.com
mapred/z3.example@example.com
and HTTP principals for SPNEGO (instead of Kerberos SSL for HTTP transactions):
HTTP/z1.example@example.com
HTTP/z2.example@example.com
HTTP/z3.example@example.com
I can start the cluster (HDFS + YARN) successfully, but I do not know how to let a regular user run applications on YARN (I do not know Kerberos well). Can anyone help?
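A hedged sketch of the usual flow (the user name is an example): on a Kerberized cluster an end user needs their own user principal and a valid ticket before submitting; a per-node keytab is not required for a plain job submission, though if the LinuxContainerExecutor is in use the same account usually has to exist as an OS user on the NodeManager nodes:

# on the KDC: create a principal for the user (name is illustrative)
$ kadmin.local -q "addprinc alex@EXAMPLE.COM"

# on the client box: obtain a ticket, then submit as usual
$ kinit alex@EXAMPLE.COM
$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 2 100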
DataNode cannot start, with error "Error creating plugin: org.apache.hadoop.metrics2.sink.FileSink"
hi, maillist: I have a 10-worker-node Hadoop cluster using CDH 4.4.0. On one of my datanodes, one of its disks is full. When I restart this datanode, I get an error:

STARTUP_MSG: java = 1.7.0_45
2014-09-04 10:20:00,576 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: registered UNIX signal handlers for [TERM, HUP, INT]
2014-09-04 10:20:01,457 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2014-09-04 10:20:01,465 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Error creating sink 'file'
org.apache.hadoop.metrics2.impl.MetricsConfigException: Error creating plugin: org.apache.hadoop.metrics2.sink.FileSink
    at org.apache.hadoop.metrics2.impl.MetricsConfig.getPlugin(MetricsConfig.java:203)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.newSink(MetricsSystemImpl.java:478)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.configureSinks(MetricsSystemImpl.java:450)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.configure(MetricsSystemImpl.java:429)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:180)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.init(MetricsSystemImpl.java:156)
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.init(DefaultMetricsSystem.java:54)
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.initialize(DefaultMetricsSystem.java:50)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1792)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1728)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1751)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1904)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1925)
Caused by: org.apache.hadoop.metrics2.MetricsException: Error creating datanode-metrics.out
    at org.apache.hadoop.metrics2.sink.FileSink.init(FileSink.java:53)
    at org.apache.hadoop.metrics2.impl.MetricsConfig.getPlugin(MetricsConfig.java:199)
    ... 12 more
Caused by: java.io.FileNotFoundException: datanode-metrics.out (Permission denied)
    at java.io.FileOutputStream.open(Native Method)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
    at java.io.FileWriter.<init>(FileWriter.java:107)
    at org.apache.hadoop.metrics2.sink.FileSink.init(FileSink.java:48)
    ... 13 more
2014-09-04 10:20:01,488 INFO org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: Sink ganglia started
2014-09-04 10:20:01,546 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 5 second(s).
2014-09-04 10:20:01,546 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2014-09-04 10:20:01,547 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is ch15
2014-09-04 10:20:01,569 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened streaming server at /0.0.0.0:50010
2014-09-04 10:20:01,572 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 10485760 bytes/s
2014-09-04 10:20:01,607 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2014-09-04 10:20:01,657 INFO org.apache.hadoop.http.HttpServer: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2014-09-04 10:20:01,660 INFO org.apache.hadoop.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode
2014-09-04 10:20:01,660 INFO org.apache.hadoop.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2014-09-04 10:20:01,660 INFO org.apache.hadoop.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2014-09-04 10:20:01,664 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at 0.0.0.0:50075
2014-09-04 10:20:01,668 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dfs.webhdfs.enabled = true
2014-09-04 10:20:01,670 INFO org.apache.hadoop.http.HttpServer: addJerseyResourcePackage: packageName=org.apache.hadoop.hdfs.server.datanode.web.resources;org.apache.hadoop.hdfs.web.resources, pathSpec=/webhdfs/v1/*
2014-09-04 10:20:01,676 INFO org.apache.hadoop.http.HttpServer: HttpServer.start() threw a non Bind IOException
java.net.BindException: Port in use: 0.0.0.0:50075
    at org.apache.hadoop.http.HttpServer.openListener(HttpServer.java:729)
    at org.apache.hadoop.http.HttpServer.start(HttpServer.java:673)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.startInfoServer(DataNode.java:424)
    at
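Two things in this log can be checked directly (a hedged sketch using standard Linux tools): the metrics FileSink cannot create datanode-metrics.out in the daemon's working directory, and something is already bound to port 50075, often a half-dead DataNode process left over from the disk filling up:

# is a process still listening on the DataNode HTTP port?
$ netstat -tlnp | grep 50075
# can the hdfs user create the metrics file in the daemon's working dir?
$ sudo -u hdfs touch datanode-metrics.out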
issue about migrating Hadoop data between IDCs
hi, maillist: my company signed with a new IDC, and I must move the Hadoop data (about 30 TB) to the new IDC. Any good suggestions?
issue about distcp: "Source and target differ in block-size. Use -pb to preserve block-sizes during copy."
hi, maillist: I am trying to copy data from my old cluster to the new cluster, and I get this error. How should I handle it?

14/07/24 18:35:58 INFO mapreduce.Job: Task Id : attempt_1406182801379_0004_m_00_1, Status : FAILED
Error: java.io.IOException: File copy failed: webhdfs://CH22:50070/mytest/pipe_url_bak/part-m-1 --> webhdfs://develop/tmp/pipe_url_bak/part-m-1
    at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:262)
    at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:229)
    at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying webhdfs://CH22:50070/mytest/pipe_url_bak/part-m-1 to webhdfs://develop/tmp/pipe_url_bak/part-m-1
    at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
    at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:258)
    ... 10 more
Caused by: java.io.IOException: Error writing request body to server
    at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3192)
    at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3175)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:231)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToTmpFile(RetriableFileCopyCommand.java:164)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:118)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:95)
    at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
    ... 11 more
14/07/24 18:35:59 INFO mapreduce.Job: map 16% reduce 0%
14/07/24 18:39:39 INFO mapreduce.Job: map 17% reduce 0%
14/07/24 19:04:27 INFO mapreduce.Job: Task Id : attempt_1406182801379_0004_m_00_2, Status : FAILED
Error: java.io.IOException: File copy failed: webhdfs://CH22:50070/mytest/pipe_url_bak/part-m-1 --> webhdfs://develop/tmp/pipe_url_bak/part-m-1
    at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:262)
    at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:229)
    at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying webhdfs://CH22:50070/mytest/pipe_url_bak/part-m-1 to webhdfs://develop/tmp/pipe_url_bak/part-m-1
    at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
Re: issue about distcp: "Source and target differ in block-size. Use -pb to preserve block-sizes during copy."
' to transaction ID 62737
2014-07-24 17:37:34,195 INFO org.apache.hadoop.hdfs.server.namenode.EditLogInputStream: Fast-forwarding stream 'http://hz24:8480/getJournal?jid=develop&segmentTxId=62737&storageInfo=-55%3A466484546%3A0%3ACID-a140fb1a-ac10-4053-8b91-8f19f2809b7c' to transaction ID 62737
2014-07-24 17:37:34,223 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073753271_12644 192.168.10.51:50010 192.168.10.49:50010 192.168.10.50:50010
2014-07-24 17:37:34,224 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Edits file http://hz24:8480/getJournal?jid=develop&segmentTxId=62737&storageInfo=-55%3A466484546%3A0 :
2014-07-24 17:37:34,225 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Loaded 3 edits starting from txid 62736
2014-07-24 17:37:37,050 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* InvalidateBlocks: ask 192.168.10.51:50010 to delete [blk_1073753271_12644]
2014-07-24 17:37:40,050 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* InvalidateBlocks: ask 192.168.10.49:50010 to delete [blk_1073753271_12644]
2014-07-24 17:37:43,051 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* InvalidateBlocks: ask 192.168.10.50:50010 to delete [blk_1073753271_12644]
2014-07-24 17:39:34,255 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Triggering log roll on remote NameNode hz24/192.168.10.24:8020

On Fri, Jul 25, 2014 at 10:25 AM, Stanley Shi s...@gopivotal.com wrote: Would you please also paste the corresponding namenode log? Regards, *Stanley Shi,*

On Fri, Jul 25, 2014 at 9:15 AM, ch huang justlo...@gmail.com wrote: [original message with the distcp error and stack trace, quoted in full above]
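The error message in the subject line suggests the fix distcp itself recommends: preserve block sizes with -pb. A hedged sketch using this thread's endpoints (run on the destination cluster; the hdfs:// destination is an assumption in place of the thread's webhdfs:// target):

$ hadoop distcp -pb webhdfs://CH22:50070/mytest/pipe_url_bak hdfs://develop/tmp/pipe_url_bak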
Re: issue about running an MR job as a system user
I resolved this by making an alex directory under the staging directory and setting its owner to alex.

On Thu, Jul 24, 2014 at 10:11 PM, java8964 java8...@hotmail.com wrote: Are you sure user 'alex' belongs to the 'hadoop' group? Why not run the command 'id alex' to prove it? And can 'alex' belonging to the 'hadoop' group be confirmed on the namenode? Yong

Date: Thu, 24 Jul 2014 17:11:06 +0800 Subject: issue about run MR job use system user From: justlo...@gmail.com To: user@hadoop.apache.org

hi, maillist: I created a system user on a box of my Hadoop cluster, but when I run an MR job as this user I get a problem. The /data directory is used by the MapReduce history server option, and I also added the user to the hadoop group. Since /data has 775 permissions it should be writable by users in the hadoop group, so why do I still get a permission error? Can anyone help?

# useradd alex

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>192.168.10.49:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>192.168.10.49:19888</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/data</value>
  </property>
  ...
</configuration>

$ hadoop fs -ls /
Found 6 items
drwxrwxr-x - hdfs hadoop 0 2014-07-14 18:17 /data
$ hadoop fs -ls /data
Found 3 items
drwx------ - hdfs hadoop 0 2014-07-09 08:49 /data/hdfs
drwxrwxrwt - hdfs hadoop 0 2014-07-08 18:52 /data/history
drwx------ - pipe hadoop 0 2014-07-14 18:17 /data/pipe

[alex@hz23 ~]$ id
uid=501(alex) gid=501(alex) groups=501(alex),497(hadoop)
[alex@hz23 ~]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.3.0-cdh5.0.2.jar pi 2 100
Number of Maps = 2
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Starting Job
14/07/24 17:06:23 WARN security.UserGroupInformation: PriviledgedActionException as:alex (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=alex, access=WRITE, inode=/data:hdfs:hadoop:drwxrwxr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:265)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:251)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:232)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:176)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5490)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5472)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:5446)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:3600)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3570)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3544)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:739)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:558)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
org.apache.hadoop.security.AccessControlException: Permission denied: user=alex, access=WRITE, inode=/data:hdfs:hadoop:drwxrwxr-x
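For reference, the fix described at the top of this reply, as commands (a sketch; /data is the thread's yarn.app.mapreduce.am.staging-dir and alex is the user in question):

$ sudo -u hdfs hadoop fs -mkdir /data/alex
$ sudo -u hdfs hadoop fs -chown alex:alex /data/alex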
issue about running an MR job as a system user in CDH5
hi, maillist: I set up a CDH5 YARN cluster and set the following option in my mapred-site.xml file:

<property>
  <name>yarn.app.mapreduce.am.staging-dir</name>
  <value>/data</value>
</property>

The MapReduce history server keeps its history dir under /data, but if I submit an MR job as another user I get an error. Adding the user to the hadoop group does not help either. Why, and how can I fix it? Thanks

2014-07-22 14:07:06,734 INFO [main] mapreduce.TableOutputFormat: Created table instance for test_1
2014-07-22 14:07:06,765 WARN [main] security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=hbase, access=EXECUTE, inode=/data:mapred:hadoop:drwxrwx---
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:265)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:251)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:205)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:168)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5490)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3499)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:764)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:764)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=hbase, access=EXECUTE, inode=/data:mapred:hadoop:drwxrwx---
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:265)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:251)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:205)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:168)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5490)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3499)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:764)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:764)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
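A hedged note on the exception: here /data is mapred:hadoop with mode drwxrwx---, so any user outside the hadoop group cannot even traverse it (hence access=EXECUTE). A common convention for a shared staging root, an assumption rather than anything shown in the thread, is to make it world-writable with the sticky bit, like /tmp:

$ sudo -u hdfs hadoop fs -chmod 1777 /data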
why can I not use '*' when removing a Hadoop directory?
hi, maillist: I used to run
sudo -u hdfs hadoop fs -rm -r -skipTrash /user/hive/warehouse/adx.db/dsp_request/2014-03*/*
on CDH4.4, but I find it does not work on CDH5. Why?
# sudo -u hdfs hadoop fs -rm -r -skipTrash /user/hive/warehouse/dsp.db/dsp_request/2014-01*/*
rm: `/user/hive/warehouse/dsp.db/dsp_request/2014-01*/*': No such file or directory
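One thing worth ruling out (an assumption, since the thread does not show the shell context): an unquoted * is expanded by the local shell against the local filesystem first, and what reaches HDFS depends on whether anything matched locally. Quoting the pattern hands it to HDFS globbing unchanged; a "No such file or directory" reply then genuinely means the HDFS glob matched nothing:

$ sudo -u hdfs hadoop fs -rm -r -skipTrash '/user/hive/warehouse/dsp.db/dsp_request/2014-01*/*'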
can I monitor all Hadoop components from one box?
hi, maillist: I want to check whether every Hadoop cluster component process is alive or dead. Can this be done from one machine, the way ZooKeeper nodes can be checked? Thanks
what is the staging dir in the YARN framework used for?
hi, maillist: I see yarn.app.mapreduce.am.staging-dir in the docs, and I do not know what it is used for. I also want to know whether the contents of this dir can be cleaned, and whether it can be set to clean up automatically.
will data be lost if the active node's data is not synced to the standby?
hi, maillist: I have NN HA. If the active NameNode goes down while changed metadata has not yet been written to local disk or synced to the standby, is that metadata lost?
Re: how can I monitor decommission progress?
But it does not show me how much has already been done. On Fri, Jun 6, 2014 at 2:56 AM, Suresh Srinivas sur...@hortonworks.com wrote: The namenode web UI provides that information. On the main web UI, click the link associated with decommissioned nodes. Sent from phone On Jun 5, 2014, at 10:36 AM, Raj K Singh rajkrrsi...@gmail.com wrote: use $hadoop dfsadmin -report Raj K Singh http://in.linkedin.com/in/rajkrrsingh http://www.rajkrrsingh.blogspot.com Mobile Tel: +91 (0)9899821370 On Sat, May 31, 2014 at 11:26 AM, ch huang justlo...@gmail.com wrote: hi, maillist: I decommissioned three nodes out of my cluster, but the question is how I can see the decommission progress; I can only see the admin state in the web UI.
issue about changing the NN node
hi, maillist: I want to replace the NN in my Hadoop cluster (no NN HA, no secondary NN). How can I do this?
Re: should I set the history server address only on the NN, or do I have to set it on each node?
My RM is also my NN node, so I just configure it on the RM node, not on every other node? On Wed, Jun 4, 2014 at 11:47 AM, Stanley Shi s...@gopivotal.com wrote: You should set it on the RM node; Regards, *Stanley Shi,* On Wed, Jun 4, 2014 at 9:24 AM, ch huang justlo...@gmail.com wrote: hi, maillist: I installed my job history server on one of my NNs (I use NN HA). I want to ask whether I need to set the history server address on each node?
Re: issue about moving data between two Hadoop clusters
How long will it take to transfer 50 TB of data? And what if the two clusters can only reach each other over the LAN? On Thu, Jun 5, 2014 at 9:06 AM, Tsuyoshi OZAWA ozawa.tsuyo...@gmail.com wrote: Hi Ch, How about using DistCp? http://hadoop.apache.org/docs/r1.2.1/distcp2.html Thanks, - Tsuyoshi On Wed, Jun 4, 2014 at 5:40 PM, ch huang justlo...@gmail.com wrote: hi, maillist: my company signed with a new IDC. I need to move all 50 TB of data from the old Hadoop cluster to the new cluster in the new location. How do I do it? -- - Tsuyoshi
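A rough sketch of the DistCp route Tsuyoshi suggests, throttled so the migration does not saturate the inter-IDC link (flag values are illustrative; in DistCp v2, -bandwidth is MB/s per map):

$ hadoop distcp -m 50 -bandwidth 20 -update hdfs://old-nn:8020/ hdfs://new-nn:8020/

For a rough time estimate: at an aggregate 500 MB/s, 50 TB is about 52,428,800 MB / 500 MB/s ≈ 29 hours; halve the throughput and the time doubles.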
issue about how to decommission a datanode from a Hadoop cluster
hi, maillist: I use a CDH4.4 YARN + HDFS cluster, and I want to decommission a datanode. Should I modify hdfs-site.xml and mapred-site.xml on every node in the cluster to exclude the node, or do I just need to set hdfs-site.xml and mapred-site.xml on the NN?
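For reference, a hedged sketch of the usual procedure (the exclude-file path is an example): the exclude list is read only by the NameNode, so it does not have to be distributed to every node:

<!-- hdfs-site.xml on the NameNode -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>

# add the datanode's hostname to /etc/hadoop/conf/dfs.exclude, then:
$ hdfs dfsadmin -refreshNodes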
how can I monitor decommission progress?
hi, maillist: I decommissioned three nodes out of my cluster, but the question is how I can see the decommission progress; I can only see the admin state in the web UI.
issue about removing YARN job history logs
hi, maillist: I want to remove job history logs, and I configured the following in yarn-site.xml, but it seems to have no effect. Why? (I use CDH4.4 YARN; I configured this on each datanode, and my job history server is on one of my datanodes.)

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <description>Where to aggregate logs to.</description>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/var/log/hadoop-yarn/apps</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>1209600</value> <!-- 14 days -->
</property>
<property>
  <name>yarn.log-aggregation.retain-check-interval-seconds</name>
  <value>300</value>
</property>
can I shut down a worker node while a cluster balance operation is running?
hi, maillist: I want to update the JDK version on some of my worker nodes, but the whole cluster is in the middle of a balance process. Can I do it?
how to revive a dead node when its disk is full?
hi, maillist: one of my datanodes is full, so it shows as dead in the cluster. What can I do to bring it back?
Re: how to revive a dead node when its disk is full?
Thanks for the reply, but my situation is different: all of the dead node's disks are full, so I cannot move data to another empty disk. On Thu, May 29, 2014 at 8:55 AM, Ted Yu yuzhih...@gmail.com wrote: Cycling old bits: http://search-hadoop.com/m/uMDyU1bxBJS/datanode+disk+fullsubj=Re+Disk+on+data+node+full On Wed, May 28, 2014 at 5:34 PM, ch huang justlo...@gmail.com wrote: hi, maillist: one of my datanodes is full, so it shows as dead in the cluster. What can I do to bring it back?
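Once the node is brought back (e.g. by freeing non-HDFS files), reserving headroom on each data disk helps avoid a repeat. A hedged example; the 10 GB figure is illustrative:

<!-- hdfs-site.xml on the datanodes: bytes per volume kept free for non-HDFS use -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value>
</property>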
issue about node disk capacity
hi, maillist: I have a 3-datanode cluster where each node has 1 TB of disk. Recently I needed to run an app that uses a lot of disk space, so I added another three datanodes with 10 TB of disk each. But as the app runs, the 1 TB nodes are nearly full. How can I balance the data storage between the 1 TB nodes and the 10 TB nodes? Thanks
question about NM heapsize
hi, maillist: I set YARN_NODEMANAGER_HEAPSIZE=15000, so the NM runs in a 15 GB JVM. But in the YARN web UI, under Active Nodes - Mem Avail, I only see 8 GB. Why?
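A hedged explanation: the Mem Avail column reflects the memory the NodeManager offers to containers, which is a separate setting from the NM's own JVM heap; its default is 8192 MB, which matches the 8 GB shown. A sketch:

<!-- yarn-site.xml on each NodeManager -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>15000</value>
</property>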
do I need to tune the mapred.child.java.opts option if I use YARN and MRv2?
hi, maillist: I want to know whether this option still imposes a limit under YARN.
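For reference, a hedged sketch: in MRv2 the old mapred.child.java.opts still applies as a fallback, but the usual knobs are the per-task JVM options plus the matching container sizes (all values illustrative):

<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx800m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1600m</value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>
</property>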
issue about JobListCache
hi, maillist: I see lots of messages like the following from the YARN job history server. Where is the JobListCache located?

2014-05-15 15:15:30,758 WARN org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager: Waiting to remove job_1395143379025_7877 from JobListCache because it is not in done yet
Re: issue about cluster balance
I recorded the disk status before and after the balance, from one source node and one destination node.

Before, source node:
/dev/sdd 1.8T 1009G 733G 58% /data/1
/dev/sde 1.8T 1005G 737G 58% /data/2
/dev/sda 1.8T 980G 762G 57% /data/3
/dev/sdb 1.8T 980G 762G 57% /data/4
/dev/sdc 1.8T 972G 769G 56% /data/5
/dev/sdf 1.8T 980G 762G 57% /data/

Before, destination node:
/dev/sdb 1.8T 2.0G 1.7T 1% /data/1
/dev/sdc 1.8T 2.1G 1.7T 1% /data/2
/dev/sdd 1.8T 2.0G 1.7T 1% /data/3
/dev/sde 1.8T 2.2G 1.7T 1% /data/4
/dev/sdf 1.8T 2.2G 1.7T 1% /data/5

After, source node:
/dev/sdd 1.8T 754G 988G 44% /data/1
/dev/sde 1.8T 736G 1006G 43% /data/2
/dev/sda 1.8T 730G 1011G 42% /data/3
/dev/sdb 1.8T 721G 1020G 42% /data/4
/dev/sdc 1.8T 721G 1021G 42% /data/5
/dev/sdf 1.8T 723G 1019G 42% /data/6

After, destination node:
/dev/sdb 1.8T 388G 1.4T 23% /data/1
/dev/sdc 1.8T 381G 1.4T 22% /data/2
/dev/sdd 1.8T 378G 1.4T 22% /data/3
/dev/sde 1.8T 375G 1.4T 22% /data/4
/dev/sdf 1.8T 374G 1.4T 22% /data/5

What I wonder is why the source node does not end up equal to the destination node, e.g. around 30% each, and why the balance took 62.99 hours.

On Tue, May 6, 2014 at 12:38 PM, Rakesh R rake...@huawei.com wrote: Could you give more details:
- Could you convert the 7% into the total amount of moved data in MB?
- Also, is that 7% data movement per DN?
- What values are shown for the 'over-utilized', 'above-average', 'below-average', and 'under-utilized' nodes? The balancer does its pairing based on these values.
- Please tell me the cluster topology - SAME_NODE_GROUP, SAME_RACK. Basically this matters when choosing the sourceNode vs. balancerNode pairs as well as the proxy source. Do you see all the DNs being used for block movement?
- Were there any exceptions during block movement?
- How many iterations ran in these hours?
-Rakesh

From: ch huang [mailto:justlo...@gmail.com] Sent: 06 May 2014 06:10 To: user@hadoop.apache.org Subject: issue about cluster balance

hi, maillist: I have a 5-node Hadoop cluster, and yesterday I added 5 new boxes to it. After that I started a balance task, but it moved only 7% of the data to the new nodes in 20 hours, and I have already set dfs.datanode.balance.bandwidthPerSec to 10 MB and the threshold to 10%. Why does the balance task take so long?
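On the "why not 30% each" question, a hedged reading (computed loosely from the df output above): the balancer only moves blocks until every node is within the threshold of the cluster-average utilization, not until all nodes are equal. Roughly:

average utilization ≈ (42% × 6 source disks + 22% × 5 destination disks) / 11 ≈ 33%
with -threshold 10, "balanced" means utilization within [23%, 43%]
the source at ~42-44% and the destination at ~22-23% both sit near the
edges of that band, so the balancer considers the cluster balanced and stops.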
issue about cluster balance
hi, maillist: I have a 5-node Hadoop cluster, and yesterday I added 5 new boxes to it. After that I started a balance task, but it moved only 7% of the data to the new nodes in 20 hours, and I have already set dfs.datanode.balance.bandwidthPerSec to 10 MB and the threshold to 10%. Why does the balance task take so long?
Re: how can I archive old data in HDFS?
It just combines several files into one file; no compression happens. On Fri, Apr 11, 2014 at 9:10 PM, Peyman Mohajerian mohaj...@gmail.com wrote: There is: http://hadoop.apache.org/docs/r1.2.1/hadoop_archives.html But I am not sure whether it compresses the data or not. On Thu, Apr 10, 2014 at 9:57 PM, Stanley Shi s...@gopivotal.com wrote: AFAIK, no tools now. Regards, *Stanley Shi,* On Fri, Apr 11, 2014 at 9:09 AM, ch huang justlo...@gmail.com wrote: hi, maillist: how can I archive old data in HDFS? I have a lot of old data that will not be used, but it takes a lot of space to store. I want to archive and compress the old data; can HDFS do this?
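For reference, a hedged sketch of the HAR tool mentioned above (paths are illustrative); as noted, it packs many files into one archive but does not compress them:

$ hadoop archive -archiveName old-2013.har -p /user/hive/warehouse/old /archives
$ hadoop fs -ls har:///archives/old-2013.har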
Re: using setrep to change the number of file replicas does not work
I can use fsck to get the over-replicated blocks, but how can I track the pending deletes? On Thu, Apr 10, 2014 at 10:50 AM, Harsh J ha...@cloudera.com wrote: The replica deletion is asynchronous. You can track its deletions via the NameNode's over-replicated blocks and the pending delete metrics. On Thu, Apr 10, 2014 at 7:16 AM, ch huang justlo...@gmail.com wrote: [original question and transcripts quoted in full below] -- Harsh J
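On tracking the pending deletes Harsh mentions: that counter is exposed as PendingDeletionBlocks in the NameNode's FSNamesystem metrics, which can be read from the JMX servlet. A sketch using this thread's NameNode host; the qry filter is a standard parameter of the /jmx servlet:

$ curl 'http://ch11:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' | grep -i PendingDeletionBlocks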
Re: using setrep to change the number of file replicas does not work
I set the replica number from 3 to 2, but when I dump the NN metrics, PendingDeletionBlocks is zero. Why? And if the check thread sleeps for an interval before doing its check work, how long is that interval? On Thu, Apr 10, 2014 at 10:50 AM, Harsh J ha...@cloudera.com wrote: The replica deletion is asynchronous. You can track its deletions via the NameNode's over-replicated blocks and the pending delete metrics. On Thu, Apr 10, 2014 at 7:16 AM, ch huang justlo...@gmail.com wrote: [original question and transcripts quoted in full below] -- Harsh J
which dirs in HDFS can be cleaned?
hi, maillist: my HDFS cluster has been running for about a year, and I find that many dirs are very large. I wonder whether some of them can be cleaned, like /var/log/hadoop-yarn/apps.
using setrep to change the number of file replicas does not work
hi, maillist: I tried to modify the replica number on a dir, but it seems not to work. Does anyone know why?

# sudo -u hdfs hadoop fs -setrep -R 2 /user/hive/warehouse/mytest
Replication 2 set: /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0

The file is still stored with 3 replicas, but the echoed number changed:

# hadoop fs -ls /user/hive/warehouse/mytest/dsp_request/2014-01-26
Found 1 items
-rw-r--r-- 2 hdfs hdfs 17660 2014-01-26 18:34 /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0

# sudo -u hdfs hdfs fsck /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0 -files -blocks -locations
Connecting to namenode via http://ch11:50070
FSCK started by hdfs (auth:SIMPLE) from /192.168.11.12 for path /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0 at Thu Apr 10 09:39:51 CST 2014
/user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0 17660 bytes, 1 block(s): OK
0. BP-1043055049-192.168.11.11-1382442676609:blk_-9219869107960013037_1976591 len=17660 repl=3 [192.168.11.13:50010, 192.168.11.10:50010, 192.168.11.14:50010]

I removed the file and uploaded a new one. As I understand it, the new file should be stored with 2 replicas, but it is still stored with 3. Why?

# sudo -u hdfs hadoop fs -rm -r -skipTrash /user/hive/warehouse/mytest/dsp_request/2014-01-26/*
Deleted /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0
# hadoop fs -put ./data_0 /user/hive/warehouse/mytest/dsp_request/2014-01-26/
[root@ch12 ~]# hadoop fs -ls /user/hive/warehouse/mytest/dsp_request/2014-01-26
Found 1 items
-rw-r--r-- 3 root hdfs 17660 2014-04-10 09:40 /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0
# sudo -u hdfs hdfs fsck /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0 -files -blocks -locations
Connecting to namenode via http://ch11:50070
FSCK started by hdfs (auth:SIMPLE) from /192.168.11.12 for path /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0 at Thu Apr 10 09:41:12 CST 2014
/user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0 17660 bytes, 1 block(s): OK
0. BP-1043055049-192.168.11.11-1382442676609:blk_6517693524032437780_8889786 len=17660 repl=3 [192.168.11.12:50010, 192.168.11.15:50010, 192.168.11.13:50010]
issue about mv: `/user/hive/warehouse/dsp_execution/2014-01-31/data_00000': Input/output error
hi, maillist: I need to move data files from one dir to another. I find that if I type hadoop fs -mv /A/* /B/ on the command line, it works, but if I let a shell script do it, I get mv: `/user/hive/warehouse/dsp_click/2014-03-31/data_0': Input/output error. I do not know why.
lots of attempt_local296445216_0001_m_000386_0 dirs in the NN dir
hi, maillist: I find many dirs under /data/hadoopmapredlocal/taskTracker/hdfs/jobcache/job_local296445216_0001, which is my mapred local dir. Can I remove them safely? And why are there so many dirs?
issue: "Log aggregation has not completed or is not enabled."
hi, maillist: I tried to look at an application log using the following process:

# yarn application -list
Application-Id Application-Name User Queue State Final-State Tracking-URL
application_1395126130647_0014 select user_id as userid, adverti...stattime(Stage-1) hive hive FINISHED SUCCEEDED ch18:19888/jobhistory/job/job_1395126130647_0014

# yarn logs -applicationId application_1395126130647_0014
Logs not available at /var/log/hadoop-yarn/apps/root/logs/application_1395126130647_0014
Log aggregation has not completed or is not enabled.

But I did enable the log aggregation function. Here is my yarn-site.xml configuration for log aggregation:

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <description>Where to aggregate logs to.</description>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/var/log/hadoop-yarn/apps</value>
</property>

The application logs are not put on HDFS successfully. Why?

# hadoop fs -ls /var/log/hadoop-yarn/apps/root/logs/application_1395126130647_0014
ls: `/var/log/hadoop-yarn/apps/root/logs/application_1395126130647_0014': No such file or directory
any optimization suggestions for highly concurrent writes into HDFS?
hi, maillist: are there any optimizations for a large number of writes into HDFS at the same time? Thanks
issue about append writes into HDFS: "ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: ch12:50010:DataXceiver error processing READ_BLOCK operation"
hi, maillist: I see the following in my HDFS log. The block belongs to a file written by Scribe. I do not know why this happens; is there some limit in the HDFS system?

2014-02-21 10:33:30,235 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opReadBlock BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240 received exception java.io.IOException: Replica gen stamp < block genstamp, block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240, replica=ReplicaWaitingToBeRecovered, blk_-8536558734938003208_3820986, RWR
  getNumBytes()     = 35840
  getBytesOnDisk()  = 35840
  getVisibleLength()= -1
  getVolume()       = /data/4/dn/current
  getBlockFile()    = /data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
  unlinked=false
2014-02-21 10:33:30,235 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.11.12, storageID=DS-754202132-192.168.11.12-50010-1382443087835, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=CID-0e777b8c-19f3-44a1-8af1-916877f2506c;nsid=2086828354;c=0):Got exception while serving BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240 to /192.168.11.15:56564
java.io.IOException: Replica gen stamp < block genstamp, block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240, replica=ReplicaWaitingToBeRecovered, blk_-8536558734938003208_3820986, RWR
  getNumBytes()     = 35840
  getBytesOnDisk()  = 35840
  getVisibleLength()= -1
  getVolume()       = /data/4/dn/current
  getBlockFile()    = /data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
  unlinked=false
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:205)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:326)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
    at java.lang.Thread.run(Thread.java:744)
2014-02-21 10:33:30,236 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: ch12:50010:DataXceiver error processing READ_BLOCK operation src: /192.168.11.15:56564 dest: /192.168.11.12:50010
java.io.IOException: Replica gen stamp < block genstamp, block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240, replica=ReplicaWaitingToBeRecovered, blk_-8536558734938003208_3820986, RWR
  getNumBytes()     = 35840
  getBytesOnDisk()  = 35840
  getVisibleLength()= -1
  getVolume()       = /data/4/dn/current
  getBlockFile()    = /data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
  unlinked=false
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:205)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:326)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
    at java.lang.Thread.run(Thread.java:744)
Re: issue about append writes into HDFS: "ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: ch12:50010:DataXceiver error processing READ_BLOCK operation"
hi, I use CDH4.4. On Fri, Feb 21, 2014 at 12:04 PM, Ted Yu yuzhih...@gmail.com wrote: Which Hadoop release are you using? Cheers On Thu, Feb 20, 2014 at 8:57 PM, ch huang justlo...@gmail.com wrote: [original message with the full log, quoted above]
Re: issue about append writes into HDFS: "ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: ch12:50010:DataXceiver error processing READ_BLOCK operation"
I use the default value; it seems to be 4096. I also checked the hdfs user's limits, and they are large enough:

-bash-4.1$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 514914
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 32768
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65536
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

On Fri, Feb 21, 2014 at 12:25 PM, Anurag Tangri anurag_tan...@yahoo.com wrote: Did you check your Unix open-file limit and the datanode xceiver value? Is it too low for the number of blocks/data in your cluster? Thanks, Anurag Tangri On Feb 20, 2014, at 6:57 PM, ch huang justlo...@gmail.com wrote: [original message with the full log, quoted above]
Re: issue about write append into hdfs ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: ch12:50010:DataXceiver error processing READ_BLOCK operation
One more question: if I need to increase the datanode xceiver value, do I need to add it to my NN config file?

On Fri, Feb 21, 2014 at 12:25 PM, Anurag Tangri anurag_tan...@yahoo.com wrote: Did you check your unix open file limit and datanode xceiver value? Is it too low for the number of blocks/data in your cluster? Thanks, Anurag Tangri
Re: issue about write append into hdfs ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: ch12:50010:DataXceiver error processing READ_BLOCK operation
I changed the config on all datanodes, adding dfs.datanode.max.xcievers with a value of 131072, and restarted all the DNs; it still does not help.

On Fri, Feb 21, 2014 at 12:25 PM, Anurag Tangri anurag_tan...@yahoo.com wrote: Did you check your unix open file limit and datanode xceiver value? Is it too low for the number of blocks/data in your cluster? Thanks, Anurag Tangri
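For what it is worth, dfs.datanode.max.xcievers is read by the DataNodes themselves, so it belongs in hdfs-site.xml on every DN rather than in the NN config. It can also help to confirm the limits the running DataNode process actually inherited, since a daemon started before a limits.conf change keeps its old limits. A minimal sketch, assuming a Linux /proc filesystem and a DataNode running as the hdfs user:

# the open-file limit a fresh hdfs login shell would get
su - hdfs -c 'ulimit -n'
# the limit the live DataNode JVM is actually running with
DN_PID=$(pgrep -f 'org.apache.hadoop.hdfs.server.datanode.DataNode' | head -1)
grep 'open files' /proc/$DN_PID/limits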
issue about writing append to a file, closing, and reopening the same file
hi, maillist: I use scribe to receive data from an app and write it into HDFS. When the system is under highly concurrent connections it causes HDFS errors like the following; incoming connections get blocked and tomcat dies. In the dir /user/hive/warehouse/dsp.db/request the file data_0 is rotated each hour, but scribe (we modified the scribe code) switches back to the same file when the rotation happens, so data_0 is closed and reopened. When the load is high I can observe corrupt replicas of data_0. How can I handle this? thanks

[Thu Feb 13 23:59:59 2014] [hdfs] disconnected fileSys for /user/hive/warehouse/dsp.db/request
[Thu Feb 13 23:59:59 2014] [hdfs] closing /user/hive/warehouse/dsp.db/request/2014-02-13/data_0
[Thu Feb 13 23:59:59 2014] [hdfs] disconnecting fileSys for /user/hive/warehouse/dsp.db/request/2014-02-13/data_0
[Thu Feb 13 23:59:59 2014] [hdfs] disconnected fileSys for /user/hive/warehouse/dsp.db/request/2014-02-13/data_0
[Thu Feb 13 23:59:59 2014] [hdfs] Connecting to HDFS for /user/hive/warehouse/dsp.db/request/2014-02-13/data_0
[Thu Feb 13 23:59:59 2014] [hdfs] opened for append /user/hive/warehouse/dsp.db/request/2014-02-13/data_0
[Thu Feb 13 23:59:59 2014] [dsp_request] Opened file /user/hive/warehouse/dsp.db/request/2014-02-13/data_0 for writing
[Thu Feb 13 23:59:59 2014] [dsp_request] 23:59 rotating file 2014-02-13/data old size 10027577955 max size 100
[Thu Feb 13 23:59:59 2014] [hdfs] Connecting to HDFS for /user/hive/warehouse/dsp.db/request
[Thu Feb 13 23:59:59 2014] [hdfs] disconnecting fileSys for /user/hive/warehouse/dsp.db/request
14/02/13 23:59:59 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink as 192.168.11.13:50010
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1117)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:992)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:494)
14/02/13 23:59:59 WARN hdfs.DFSClient: Error Recovery for block BP-1043055049-192.168.11.11-1382442676609:blk_433572108425800355_3411489 in pipeline 192.168.11.12:50010, 192.168.11.13:50010, 192.168.11.14:50010, 192.168.11.10:50010, 192.168.11.15:50010: bad datanode 192.168.11.13:50010
14/02/13 23:59:59 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink as 192.168.11.10:50010
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1117)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:992)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:494)
14/02/13 23:59:59 WARN hdfs.DFSClient: Error Recovery for block BP-1043055049-192.168.11.11-1382442676609:blk_433572108425800355_3411489 in pipeline 192.168.11.12:50010, 192.168.11.14:50010, 192.168.11.10:50010, 192.168.11.15:50010: bad datanode 192.168.11.10:50010
14/02/13 23:59:59 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink as 192.168.11.15:50010
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1117)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:992)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:494)
14/02/13 23:59:59 WARN hdfs.DFSClient: Error Recovery for block BP-1043055049-192.168.11.11-1382442676609:blk_433572108425800355_3411489 in pipeline 192.168.11.12:50010, 192.168.11.14:50010, 192.168.11.15:50010: bad datanode 192.168.11.15:50010
/user/hive/warehouse/dsp.db/request/2014-02-13/data_0: blk_433572108425800355_3411509 (replicas: l: 1 d: 0 c: 4 e: 0) 192.168.11.12:50010 : 192.168.11.13:50010(corrupt) : 192.168.11.14:50010(corrupt) : 192.168.11.10:50010(corrupt) : 192.168.11.15:50010(corrupt)
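When a specific file ends up with corrupt replicas after an append/reopen cycle, a first step is to ask the NameNode what it thinks of that file. A sketch using commands that appear elsewhere in this compilation; the hdfs debug recoverLease subcommand only exists in newer Hadoop releases, so treat that part as an assumption for this version:

# where the replicas of each block live, and which are corrupt
sudo -u hdfs hdfs fsck /user/hive/warehouse/dsp.db/request/2014-02-13/data_0 -files -blocks -locations
# newer releases only: force lease recovery if a writer died while holding the file open
hdfs debug recoverLease -path /user/hive/warehouse/dsp.db/request/2014-02-13/data_0 -retries 3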
what is the difference between live replicas, excess replicas and excess blocks?
hi, maillist: I am very confused by these three concepts, because I see no excess replicas in the metadata dump file, but I do see one excess block in the metrics output. Why?
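To compare the two views directly, the NameNode JMX servlet can be filtered down to the FSNamesystem bean, which is where the ExcessBlocks counter lives, and the metasave dump can be grepped for the per-block excess count. A sketch, assuming the NameNode HTTP port used elsewhere in this document:

curl -s 'http://NNIP:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' | grep -i -e excess -e corrupt
# metasave writes the named file under the NameNode log directory
sudo -u hdfs hdfs dfsadmin -metasave meta.txt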
Which NUMA policy is best for Hadoop processes?
hi, maillist: a NUMA-architecture CPU has several memory policies. I wonder whether anyone has tested them, and which one is the best?
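Hadoop itself has no NUMA switch; one thing people try with large JVM heaps on NUMA hardware is launching the daemon under an explicit numactl policy so the heap is interleaved across nodes instead of filling one node first. A sketch only, assuming numactl is installed and that daemons are started by hand with the stock scripts:

# inspect the NUMA layout first
numactl --hardware
# start the DataNode JVM with memory interleaved across all NUMA nodes
numactl --interleave=all hadoop-daemon.sh start datanode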
Re: hadoop reports a corrupt block but I cannot find any in the block metadata
And I still cannot find which block is corrupt. I searched for the keyword 'orrupt' and only got the /hbase/.corrupt dir, but it is a dir, not a corrupt block.

On Sat, Jan 25, 2014 at 6:31 PM, Shekhar Sharma shekhar2...@gmail.com wrote: Run the fsck command: hadoop fsck <path> -files -blocks -locations

On 25 Jan 2014 08:04, ch huang justlo...@gmail.com wrote: hi, maillist: this morning nagios alerted that hadoop has a corrupt block. I checked with hdfs dfsadmin -report, and from its output it did have corrupt blocks:

Configured Capacity: 53163259158528 (48.35 TB)
Present Capacity: 50117251458834 (45.58 TB)
DFS Remaining: 45289289015296 (41.19 TB)
DFS Used: 4827962443538 (4.39 TB)
DFS Used%: 9.63%
Under replicated blocks: 277
Blocks with corrupt replicas: 2
Missing blocks: 0

But when I dumped all the metadata with # sudo -u hdfs hdfs dfsadmin -metasave and looked for records whose c: count is not 0, I could not find any block with corrupt replicas. Why?
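fsck marks a block corrupt only when every replica is bad, while the dfsadmin/JMX counter tracks blocks with at least one corrupt replica, a distinction spelled out later in this compilation. Two queries worth trying, using the -list-corruptfileblocks flag that also appears later in this document:

# files with non-recoverable (all-replica) corruption
sudo -u hdfs hdfs fsck / -list-corruptfileblocks
# per-file replica locations for a suspect path
sudo -u hdfs hdfs fsck /hbase -files -blocks -locations | grep -i corrupt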
how to calculate the size of an HDFS directory?
hi, maillist: I want to calculate the size of an HDFS directory. How do I do it?
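The filesystem shell has du and count for this; a short sketch, with the directory path borrowed from an earlier message in this compilation:

# summary size of everything under the directory
hadoop fs -du -s /user/hive/warehouse/dsp.db/request
# directory count, file count, and content size in one line
hadoop fs -count /user/hive/warehouse/dsp.db/request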
how can I get job start and finish times?
hi, maillist: my web UI is not available. I can use yarn application -list; my question is how can I get job start and finish times?
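Without the web UI, the YARN CLI can print a per-application report, using an application id taken from yarn application -list (the id below is borrowed from another message in this compilation). On the releases I have checked the report includes start and finish times, but treat the exact fields as an assumption for this version:

yarn application -status application_1388730279827_2770
# once a job has finished, the MR JobHistory server REST API is another option, e.g.
# curl http://<jobhistory-host>:19888/ws/v1/history/mapreduce/jobs/<job-id>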
issue about how map output is assigned to reducers?
hi, maillist: I looked at the container logs via hadoop fs -cat /var/log/hadoop-yarn/apps/root/logs/application_1388730279827_2770/CHBM221_50853, and the log says it got 25 map outputs and assigned 7 to fetcher 5, 7 to fetcher 4 and 11 to fetcher 3. My question is: why not assign 8 to fetcher 5, 8 to fetcher 4 and 9 to fetcher 3?

2014-01-08 11:28:00,346 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1388730279827_2770_r_00_0: Got 25 new map-outputs
2014-01-08 11:28:00,348 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging CHBM223:8080 with 7 to fetcher#5
2014-01-08 11:28:00,349 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 7 of 7 to CHBM223:8080 to fetcher#5
2014-01-08 11:28:00,349 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging CHBM222:8080 with 7 to fetcher#4
2014-01-08 11:28:00,349 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 7 of 7 to CHBM222:8080 to fetcher#4
2014-01-08 11:28:00,352 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging CHBM221:8080 with 11 to fetcher#3
2014-01-08 11:28:00,352 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 11 of 11 to CHBM221:8080 to fetcher#3

(From the log itself the split is per host rather than per count: all map outputs available on one host are handed to a single fetcher, so CHBM221's 11 outputs go to fetcher 3 together.)
Re: issue about running hive MR job in hadoop
Yes, I checked the code and found that the exception comes from lfs.mkdir(userFileCacheDir, null, false); I also found that when the AM is located on CHBM224 everything fails, but when the AM is located on CHBM223 everything succeeds.

On CHBM224:
# ls -l /data/mrlocal/1/yarn/
total 8
drwxrwxrwx 5 yarn yarn 4096 Nov 5 20:50 local
drwxr-xr-x 3 yarn yarn 4096 Jan 3 15:57 logs
# ls -l /data/mrlocal/2/yarn/
total 8
drwxrwxrwx 5 yarn yarn 4096 Nov 5 20:50 local
drwxr-xr-x 3 yarn yarn 4096 Jan 3 15:57 logs

On CHBM223:
# ls /data/mrlocal/1/yarn/ -l
total 8
drwxr-xr-x 5 yarn yarn 4096 Nov 5 20:51 local
drwxr-xr-x 3 yarn yarn 4096 Jan 3 15:46 logs
# ls /data/mrlocal/2/yarn/ -l
total 8
drwxr-xr-x 5 yarn yarn 4096 Nov 5 20:51 local
drwxr-xr-x 3 yarn yarn 4096 Jan 3 15:46 logs

I also found that if I keep the abnormal node (CHBM224) running and shut down the other, normal nodes, then when I submit an MR job via hive the mode of the dir /data/mrlocal/2/yarn/local/usercache/hive/filecache is flushed back to 710, even after I change it to 755. When I test on a normal node (one normal node up, the others shut down), the dir mode does not change.
# ls -l /data/mrlocal/2/yarn/local/usercache/hive/
total 16
drwx--x--- 7 yarn yarn 4096 Jan 3 16:30 appcache
drwx--x--- 148 yarn yarn 12288 Jan 3 10:03 filecache

On Fri, Jan 3, 2014 at 3:52 PM, Bing Jiang jiangbinglo...@gmail.com wrote: Could you check your yarn-local directory permissions? From the diagnosis, the error occurs at mkdir in a local directory. I guess something is wrong with a local directory that is set as a yarn local dir. -- Bing Jiang, weibo: http://weibo.com/jiangbinglover, BLOG: www.binospace.com
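Given that the failing node's local dirs carry different modes than the healthy node (drwxrwxrwx vs drwxr-xr-x) and that the NodeManager keeps resetting the usercache modes, a commonly suggested remedy is to stop the NodeManager, fix the top-level modes, and clear the usercache so it is rebuilt with correct ownership. A sketch only, with the paths taken from the listings above and the CDH service name as an assumption:

service hadoop-yarn-nodemanager stop
# make the yarn-local roots match the healthy node
chmod 755 /data/mrlocal/1/yarn/local /data/mrlocal/2/yarn/local
chown -R yarn:yarn /data/mrlocal/1/yarn /data/mrlocal/2/yarn
# let the NM recreate the per-user cache dirs from scratch
rm -rf /data/mrlocal/1/yarn/local/usercache/* /data/mrlocal/2/yarn/local/usercache/*
service hadoop-yarn-nodemanager start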
issue about running hive MR job in hadoop
hi, I submit an MR job through hive, but when it runs stage-2 it fails. Why? It seems to be a permission problem, but I do not know which dir causes it.

Application application_1388730279827_0035 failed 1 times due to AM Container for appattempt_1388730279827_0035_01 exited with exitCode: -1000 due to: EPERM: Operation not permitted
  at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
  at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:581)
  at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:388)
  at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1041)
  at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:150)
  at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:190)
  at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:698)
  at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:695)
  at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
  at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:695)
  at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.initDirs(ContainerLocalizer.java:385)
  at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:130)
  at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:103)
  at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:861)
.Failing this attempt.. Failing the application.
issue about hadoop streaming
hi, maillist: I read the doc about hadoop streaming. Is it possible to build a job chain through a pipeline and hadoop streaming? If the first job looks like this:

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -input /alex/messages -output /alex/stout4 -mapper /bin/cat -reducer /tmp/mycount.pl -file /tmp/mycount.pl

then I want the first job's output to become the second job's input. If that is possible, how do I do it? thanks!
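Streaming jobs are ordinary commands that exit non-zero on failure, so a shell && chain is enough to feed one job's output directory into the next. A sketch reusing the first job above; the second reducer script and output path are hypothetical:

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -input /alex/messages -output /alex/stout4 \
    -mapper /bin/cat -reducer /tmp/mycount.pl -file /tmp/mycount.pl \
&& hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -input /alex/stout4 -output /alex/stout5 \
    -mapper /bin/cat -reducer /tmp/second_step.pl -file /tmp/second_step.pl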
why does an empty file occupy one split?
hi, maillist: I read the code of FileInputFormat and found that it makes a split for an empty file. I think there is no point in doing so, and it causes the MR framework to create an extra map task. Can anyone explain?
what are MPP and HAWQ, and what is the relation between them and hadoop?
hi, maillist: as the title says.
Re: issue about no class found when running an MR job
I think it is an official usage; you can type hadoop and read the last line of the help output. I use CDH 4.4; I do not know whether the community version supports this usage.

On Sat, Dec 14, 2013 at 2:27 AM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: That is not the correct usage. You should do hadoop jar your-jar-name main-class-name. Or, if you are adventurous, directly invoke your class using java and setting an appropriate classpath. Thanks, +Vinod

On Dec 12, 2013, at 6:11 PM, ch huang justlo...@gmail.com wrote: hadoop ../test/WordCount
issue about no class found when running an MR job
hi, maillist: I rewrote WordCount.java and tried to compile and run it, but it says it cannot find the main class. Why?

[root@CHBM224 myMR]# cat WordCount.java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.util.Tool;

public class WordCount extends Configured implements Tool {

  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  private static void usage() throws IOException {
    System.err.println("teragen <num rows> <output dir>");
  }

  public int run(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
    Job job = Job.getInstance(getConf());
    if (args.length != 2) {
      usage();
      return 2;
    }
    job.setJobName("wordcount");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new Configuration(), new WordCount(), args);
    System.exit(res);
  }
}

[root@CHBM224 myMR]# javac -cp '/usr/lib/hadoop/*:/usr/lib/hadoop-mapreduce/*' -d ../test WordCount.java
[root@CHBM224 myMR]# hadoop ../test/WordCount
Error: Could not find or load main class ...test.WordCount
Re: issue about no class found when running an MR job
No, it does not need that; hadoop can run a class directly. I tried on another box and it works fine.

# hadoop com/test/demo/WordCount
Error: Could not find or load main class com.test.demo.WordCount
[root@CHBM224 test]# hadoop classpath
/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*
[root@CHBM224 test]# echo $CLASSPATH
.:/usr/java/jdk1.7.0_25/lib/dt.jar:/usr/java/jdk1.7.0_25/lib/tools.jar

--- copy the class directory to another box, and it works fine:

# cd test/
[root@CH22 test]# hadoop com/test/demo/WordCount
teragen <num rows> <output dir>
[root@CH22 test]# hadoop classpath
/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*::/usr/lib/hadoop/lib:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*
[root@CH22 test]# echo $CLASSPATH
.:/usr/java/jdk1.6.0_35/lib/dt.jar:/usr/java/jdk1.6.0_35/lib/tools.jar

On Fri, Dec 13, 2013 at 10:17 AM, Tao Xiao xiaotao.cs@gmail.com wrote: How did you package and compile your jar? Did you specify the main class for the JAR file you generated?
Re: issue about no class found when running an MR job
I found the reason the main class is not found. On the broken box:

# hadoop classpath
/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:/usr/lib/hadoop/lib

and on the normal box:

/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*::/usr/lib/hadoop/lib:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*

There is a "::" in the normal one :) I just did not know what it means. (An empty entry in a Java classpath is interpreted as the current working directory, so on the box with "::" the class files under ./com/test/demo are picked up, and on the other box they are not.)
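For reference, two ways to make the class resolve regardless of what the generated classpath looks like; the directory and jar names here are hypothetical:

# option 1: put the compiled classes on the hadoop classpath explicitly
export HADOOP_CLASSPATH=/root/test
hadoop com.test.demo.WordCount <input> <output>
# option 2: package a jar and use the documented invocation
jar cf wordcount.jar -C /root/test .
hadoop jar wordcount.jar com.test.demo.WordCount <input> <output>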
issue about file in DN datadir
hi, maillist: I have a question about the files that represent a block on a DN. Here is how I look for a block: I have a file part-m-0, and I find that one replica of the block blk_-5451264646515882190_106793 is on box 192.168.10.224, but when I search the datadir on 224 I find only the meta file and no data file. Why?

# sudo -u hdfs hdfs fsck /alex/terasort/10G-input/part-m-0 -files -blocks -locations
Connecting to namenode via http://CHBM220:50070
FSCK started by hdfs (auth:SIMPLE) from /192.168.10.224 for path /alex/terasort/10G-input/part-m-0 at Wed Dec 11 14:45:15 CST 2013
/alex/terasort/10G-input/part-m-0 5 bytes, 8 block(s): OK
0. BP-50684181-192.168.10.220-1383638483950:blk_1612709339511818235_106786 len=67108864 repl=3 [192.168.10.222:50010, 192.168.10.221:50010, 192.168.10.223:50010]
1. BP-50684181-192.168.10.220-1383638483950:blk_-3802055733518151718_106789 len=67108864 repl=3 [192.168.10.222:50010, 192.168.10.221:50010, 192.168.10.223:50010]
2. BP-50684181-192.168.10.220-1383638483950:blk_-1672420361561559829_106791 len=67108864 repl=3 [192.168.10.222:50010, 192.168.10.224:50010, 192.168.10.223:50010]
3. BP-50684181-192.168.10.220-1383638483950:blk_-5451264646515882190_106793 len=67108864 repl=3 [192.168.10.222:50010, 192.168.10.221:50010, 192.168.10.224:50010]
4. BP-50684181-192.168.10.220-1383638483950:blk_6624597853174216221_106795 len=67108864 repl=3 [192.168.10.222:50010, 192.168.10.224:50010, 192.168.10.221:50010]
5. BP-50684181-192.168.10.220-1383638483950:blk_-4947775334639504308_106797 len=67108864 repl=3 [192.168.10.222:50010, 192.168.10.224:50010, 192.168.10.223:50010]
6. BP-50684181-192.168.10.220-1383638483950:blk_214751650269427943_106799 len=67108864 repl=3 [192.168.10.222:50010, 192.168.10.221:50010, 192.168.10.224:50010]

# find /data -name 'blk_-5451264646515882190_106793*'
/data/dataspace/3/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/subdir39/blk_-5451264646515882190_106793.meta
# ls /data/dataspace/3/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/subdir39/
blk_3810334848964580951  blk_4621466474283145207_106809.meta  blk_-5451264646515882190  blk_580162309124277323_106788.meta
blk_3810334848964580951_106801.meta  blk_516060569193828059  blk_-5451264646515882190_106793.meta  blk_4621466474283145207
blk_516060569193828059_106796.meta  blk_580162309124277323
Re: issue about corrupt block test
You are right, but I only find the meta file; why is there no block data file?

# find /data -name 'blk_-5451264646515882190_106793*'
/data/dataspace/3/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/subdir39/blk_-5451264646515882190_106793.meta
# ls /data/dataspace/3/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/subdir39/
blk_3810334848964580951  blk_4621466474283145207_106809.meta  blk_-5451264646515882190  blk_580162309124277323_106788.meta
blk_3810334848964580951_106801.meta  blk_516060569193828059  blk_-5451264646515882190_106793.meta  blk_4621466474283145207
blk_516060569193828059_106796.meta  blk_580162309124277323

On Wed, Dec 11, 2013 at 3:16 PM, Harsh J ha...@cloudera.com wrote: Block files are not stored in a flat directory (to avoid FS limits on the max files under a dir). Instead of looking for them right under finalized, issue a find query with the pattern and you should be able to spot it.

On Wed, Dec 11, 2013 at 9:10 AM, ch huang justlo...@gmail.com wrote: hi, maillist: I am trying to corrupt a block of a file in my benchmark environment. With the following commands I find blk_2504407693800874616_106252; its replica on 192.168.10.224 is my target, but searching all the datadirs on 192.168.10.224 I cannot find the data file belonging to this replica. Why?

# ls /data/dataspace/1/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_3717620888497075523_106232*
ls: cannot access /data/dataspace/1/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_3717620888497075523_106232*: No such file or directory
[root@CHBM224 conf]# ls /data/dataspace/1/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_2504407693800874616_106252*
ls: cannot access /data/dataspace/1/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_2504407693800874616_106252*: No such file or directory
[root@CHBM224 conf]# ls /data/dataspace/2/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_2504407693800874616_106252*
ls: cannot access /data/dataspace/2/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_2504407693800874616_106252*: No such file or directory
[root@CHBM224 conf]# ls /data/dataspace/3/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_2504407693800874616_106252*
ls: cannot access /data/dataspace/3/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_2504407693800874616_106252*: No such file or directory
[root@CHBM224 conf]# hdfs fsck /alex/terasort/1G-input/part-m-0 -files -blocks -locations
Connecting to namenode via http://CHBM220:50070
FSCK started by root (auth:SIMPLE) from /192.168.10.224 for path /alex/terasort/1G-input/part-m-0 at Wed Dec 11 11:35:42 CST 2013
/alex/terasort/1G-input/part-m-0 1 bytes, 2 block(s): OK
0. BP-50684181-192.168.10.220-1383638483950:blk_3717620888497075523_106232 len=67108864 repl=3 [192.168.10.222:50010, 192.168.10.223:50010, 192.168.10.221:50010]
1. BP-50684181-192.168.10.220-1383638483950:blk_2504407693800874616_106252 len=32891136 repl=3 [192.168.10.222:50010, 192.168.10.221:50010, 192.168.10.224:50010]
-- Harsh J
Re: how to handle the corrupt block in HDFS?
The alert is from my production env; I will test on my benchmark env, thanks.

On Thu, Dec 12, 2013 at 2:33 AM, Adam Kawa kawa.a...@gmail.com wrote: I have only a 1-node cluster, so I am not able to verify it when the replication factor is bigger than 1. I ran fsck on a file that consists of 3 blocks, where 1 block has a corrupt replica. fsck said the system is HEALTHY. When I restarted the DN, the block scanner (BlockPoolSliceScanner) started and detected the corrupted replica. Then I ran fsck again on that file, and it told me the system is CORRUPT. If you have a small (and non-production) cluster, can you restart your datanodes and run fsck again?

2013/12/11 ch huang justlo...@gmail.com: thanks for the reply, but if a block has just 1 corrupt replica, hdfs fsck cannot tell you which block of which file has the corrupted replica; fsck is only useful when all of a block's replicas are bad.

On Wed, Dec 11, 2013 at 10:01 AM, Adam Kawa kawa.a...@gmail.com wrote: When you identify a file with corrupt block(s), you can locate the machines that store its blocks by typing $ sudo -u hdfs hdfs fsck <path-to-file> -files -blocks -locations

2013/12/11 Adam Kawa kawa.a...@gmail.com: Maybe this can work for you: $ sudo -u hdfs hdfs fsck / -list-corruptfileblocks ?

2013/12/11 ch huang justlo...@gmail.com: thanks for the reply. What I do not know is how to locate the block that has the corrupt replica (so I can observe how long it takes for the corrupt replica to be removed and replaced by a healthy one; I have been getting the nagios alert for three days, I am not sure whether it is the same corrupt replica causing the alert each time, and I do not know the interval at which HDFS checks for corrupt replicas and cleans them up).

On Tue, Dec 10, 2013 at 6:20 PM, Vinayakumar B vinayakuma...@huawei.com wrote: Hi ch huang, It may seem strange, but the fact is, CorruptBlocks through JMX means "Number of blocks with corrupt replicas", which does not mean all replicas are corrupt (you can check the description through jconsole). Whereas "Corrupt blocks" through fsck means blocks with all replicas corrupt (non-recoverable) or missing. In your case, maybe one replica of the block is corrupt, not all replicas of the same block. The corrupt replica will be deleted automatically if one more datanode is available in your cluster and the block is replicated to it. Regarding replication 10: as Peter Marron said, some of the important files of a mapreduce job are written with replication 10, to make them accessible faster and launch map tasks faster. If the job succeeds, these files are deleted automatically; I think only in some cases, if jobs are killed in between, do these files remain in hdfs showing under-replicated blocks. Thanks and Regards, Vinayakumar B

From: Peter Marron [mailto:peter.mar...@trilliumsoftware.com] Sent: 10 December 2013 14:19 To: user@hadoop.apache.org Subject: RE: how to handle the corrupt block in HDFS?

Hi, I am sure that there are others who will answer this better, but anyway. The default replication level for files in HDFS is 3, and so most files that you see will have a replication level of 3. However, when you run a Map/Reduce job the system knows in advance that every node will need a copy of certain files, specifically the job.xml and the various jars containing classes needed to run the mappers and reducers. So the system arranges for some of these files to have a higher replication level, which increases the chances that a copy will be found locally. By default this higher replication level is 10. This can seem a little odd on a cluster where you only have, say, 3 nodes, because it means that you will almost always have some blocks marked under-replicated. I think there was some discussion a while back about changing the replication level to something like min(10, #number of nodes); as I recall, the general consensus was that this was extra complexity that wasn't really worth it. If it ain't broke... Hope that this helps. Peter Marron, Senior Developer, Research Development, Trillium Software

From: ch huang [mailto:justlo...@gmail.com] Sent: 10 December 2013 01:21 To: user@hadoop.apache.org Subject: Re: how to handle the corrupt block in HDFS? more strange: in my HDFS cluster every block has three replicas, but I find some with ten replicas. Why?
# sudo -u hdfs hadoop fs -ls /data/hisstage/helen/.staging/job_1385542328307_0915
Found 5
Re: how to handle the corrupt block in HDFS?
And is the fsck report data from the BlockPoolSliceScanner? It seems to run once every 3 weeks. Can I restart the DNs one by one without interrupting the jobs that are running?
Re: issue about Shuffled Maps in MR job summary
hi, suppose I have a 5-worker-node cluster where each worker node can allocate 40G of memory. I do not care about the map tasks, because the map tasks in my job finish within half a minute; from my observation the really slow part is the reduce. I allocate 12G to each reduce task, so each worker node can run 3 reduces in parallel and the whole cluster can support 15 reducers, and I run the job with all 15. What I do not know is whether increasing the reducer count from 15 to 30, with 6G of memory per reduce, will speed the job up or not. The job runs on my production env; it has been running for nearly 1 week and still has not finished.

On Wed, Dec 11, 2013 at 9:50 PM, java8964 java8...@hotmail.com wrote: The whole job completion time depends on a lot of factors. Are you sure the reducer part is the bottleneck? It also depends on how many reducer input groups your MR job has. If you only have 20 reducer groups, then even if you bump your reducer count to 40, the reducer phase won't change much, as the additional 20 reducer tasks won't get any data to process. If you have a lot of reducer input groups, your cluster has capacity at this time, and you also have a lot of idle reducer slots, then increasing your reducer count should decrease your whole job completion time. Make sense? Yong

Date: Wed, 11 Dec 2013 14:20:24 +0800 From: justlo...@gmail.com: I read the doc and found that if I have 8 reducers, a map task will output 8 partitions, and each partition will be sent to a different reducer. So if I increase the reducer number the partition count increases but the volume of network traffic stays the same; why does increasing the reducer number sometimes not decrease the job completion time?

On Wed, Dec 11, 2013 at 1:48 PM, Vinayakumar B vinayakuma...@huawei.com wrote: It looks simple :) Shuffled Maps = Number of Map Tasks * Number of Reducers. Thanks and Regards, Vinayakumar B

From: ch huang [mailto:justlo...@gmail.com] Sent: 11 December 2013 10:56 Subject: issue about Shuffled Maps in MR job summary: hi, maillist: I ran terasort with 16 reducers and with 8 reducers; when I doubled the reducer number, the Shuffled Maps count also doubled. My question is: the job only runs 20 map tasks (the total input is 10 files, each 100M; my block size is 64M, so there are 20 splits), so why do I shuffle 160 map outputs in the 8-reducer run and 320 in the 16-reducer run? How is the shuffled maps number calculated? 16 reducer summary output: Shuffled Maps = 320; 8 reducer summary output: Shuffled Maps = 160
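If the 30-reducer experiment is worth trying, the count and per-reduce memory can be overridden per job on the command line for Tool-based jobs; a hypothetical invocation, with the heap flag kept below the container size as an assumption:

hadoop jar your-job.jar YourMainClass \
    -Dmapreduce.job.reduces=30 \
    -Dmapreduce.reduce.memory.mb=6144 \
    -Dmapreduce.reduce.java.opts=-Xmx5120m \
    <input> <output>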
how to corrupt a replica of a block manually?
hi, maillist: what is the simplest way to corrupt one replica of a block? I opened a replica data file and deleted a line, then ran fsck; nothing is reported corrupt. Should the DN be restarted?
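A replica is only noticed as corrupt when a checksum is verified, that is, when the replica is read by a client or visited by the DN block scanner (which is why restarting the DN, which kicks off the scanner, makes it show up, as discussed later in this compilation). A sketch with a hypothetical replica path; overwriting bytes in place keeps the length unchanged but breaks the checksum:

# flip a few bytes in the middle of the replica file (hypothetical path)
dd if=/dev/zero of=<path-to-replica-blk-file> bs=1 count=16 seek=4096 conv=notrunc
# force a client read of the owning file; checksum verification will flag the replica
hadoop fs -cat <path-to-hdfs-file> > /dev/null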
does mapreduce.task.io.sort.mb control both the map merge buffer and the reduce merge buffer?
hi, maillist: due to the heavy load on the reduce tasks, I am trying to increase the buffer size for the sort merge. I wonder: if I increase mapreduce.task.io.sort.mb from 100m (the default value) to 1G, will each map task's sort-merge buffer also become 1G?
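A hedged sketch of the knobs involved, under the assumption (consistent with the MRv2 property names) that io.sort.mb sizes only the map-side sort buffer, while reduce-side merge memory is steered by the shuffle buffer fractions:

# map-side: in-memory sort buffer used while map output is collected
-Dmapreduce.task.io.sort.mb=1024
# reduce-side: fraction of the reducer heap holding shuffled map outputs,
# and the usage threshold at which the in-memory merge starts
-Dmapreduce.reduce.shuffle.input.buffer.percent=0.70
-Dmapreduce.reduce.shuffle.merge.percent=0.66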
Re: issue about Shuffled Maps in MR job summary
One of the important things is that my input files are very small (each file is less than 10M) and I have a huge number of files.

On Thu, Dec 12, 2013 at 9:58 AM, java8964 java8...@hotmail.com wrote: Assume the block size is 128M and each of your mappers finishes within half a minute; then there is not much logic in your mappers, since each processes about 128M in 30 seconds. If your reducers cannot finish within 1 week, then something is wrong. You need to find out the following: 1) How many mappers were generated in your MR job? 2) Have they all finished? (Check them in the jobtracker through the web or the command line.) 3) How many reducers does this job have? 4) Have the reducers started? What stage are they in: Copying, Sorting or Reducing? 5) If in the reducing stage, check the userlogs of the reducers: is your code running now? All this information you can find in the Job Tracker web UI. Yong
Re: issue about running an example job with a custom mapreduce variable
Yes, you are right, thanks.

On Wed, Dec 11, 2013 at 7:16 AM, Adam Kawa kawa.a...@gmail.com wrote: Accidentally, I clicked Send by mistake. Please try: hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort -Dmapreduce.job.reduces=34 /alex/terasort/1G-input /alex/terasort/1G-output

2013/12/11 Adam Kawa kawa.a...@gmail.com: Please try

2013/12/10 ch huang justlo...@gmail.com: hi, maillist: I try to assign the reduce number on the command line, but it seems to have no effect. I run terasort like this:
# hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort /alex/terasort/1G-input /alex/terasort/1G-output -Dmapreduce.job.reduces=34
The default in mapred-site.xml assigns 16 reducers; I try to run with 34 reducers, but the job still runs with 16. Why? Here is some output:
Job Counters
Launched map tasks=1
Launched reduce tasks=16
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=2318
Total time spent by all reduces in occupied slots (ms)=99714
Re: how to handle the corrupt block in HDFS?
Regarding "By default this higher replication level is 10": can this value be controlled via some option or variable? I only have a 5-worker-node cluster, and I think 5 replicas would be better, because then every node can get a local replica. Another question: why does hdfs fsck report the cluster as healthy with no corrupt blocks, while I see one corrupt block when checking the NN metrics with curl http://NNIP:50070/jmx ? thanks On Tue, Dec 10, 2013 at 4:48 PM, Peter Marron peter.mar...@trilliumsoftware.com wrote: Hi, I am sure that there are others who will answer this better, but anyway. The default replication level for files in HDFS is 3 and so most files that you see will have a replication level of 3. However when you run a Map/Reduce job the system knows in advance that every node will need a copy of certain files. Specifically the job.xml and the various jars containing classes that will be needed to run the mappers and reducers. So the system arranges that some of these files have a higher replication level. This increases the chances that a copy will be found locally. By default this higher replication level is 10. This can seem a little odd on a cluster where you only have, say, 3 nodes, because it means that you will almost always have some blocks that are marked under-replicated. I think that there was some discussion a while back to change this to make the replication level something like min(10, #number of nodes). However, as I recall, the general consensus was that this was extra complexity that wasn't really worth it. If it ain't broke… Hope that this helps. *Peter Marron* Senior Developer, Research Development Office: +44 (0) 118-940-7609 peter.mar...@trilliumsoftware.com Theale Court First Floor, 11-13 High Street, Theale, RG7 5AH, UK *From:* ch huang [mailto:justlo...@gmail.com] *Sent:* 10 December 2013 01:21 *To:* user@hadoop.apache.org *Subject:* Re: how to handle the corrupt block in HDFS? more strange: in my HDFS cluster every block has three replicas, but I find that some have ten replicas. Why? # sudo -u hdfs hadoop fs -ls /data/hisstage/helen/.staging/job_1385542328307_0915 Found 5 items -rw-r--r-- 3 helen hadoop 7 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/appTokens -rw-r--r-- 10 helen hadoop 2977839 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/job.jar -rw-r--r-- 10 helen hadoop 3696 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/job.split On Tue, Dec 10, 2013 at 9:15 AM, ch huang justlo...@gmail.com wrote: the strange thing is that when I use the following command I find 1 corrupt block # curl -s http://ch11:50070/jmx |grep orrupt CorruptBlocks : 1, but when I run hdfs fsck / I get none; everything seems fine # sudo -u hdfs hdfs fsck / Status: HEALTHY Total size: 1479728140875 B (Total open files size: 1677721600 B) Total dirs: 21298 Total files: 100636 (Files currently being written: 25) Total blocks (validated): 119788 (avg.
block size 12352891 B) (Total open file blocks (not validated): 37) Minimally replicated blocks: 119788 (100.0 %) Over-replicated blocks: 0 (0.0 %) Under-replicated blocks: 166 (0.13857816 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor: 3 Average block replication: 3.0027633 Corrupt blocks: 0 Missing replicas: 831 (0.23049656 %) Number of data-nodes: 5 Number of racks: 1 FSCK ended at Tue Dec 10 09:14:48 CST 2013 in 3276 milliseconds The filesystem under path '/' is HEALTHY On Tue, Dec 10, 2013 at 8:32 AM, ch huang justlo...@gmail.com wrote: hi,maillist: my nagios alerts me all day that there is a corrupt block in HDFS, but I do not know how to remove it. Will HDFS handle this automatically? And will removing the corrupt block cause any data loss? thanks
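The job-file replication level asked about above is configurable; a sketch for mapred-site.xml, assuming the MR2 property name (the MR1 equivalent is mapred.submit.replication; both default to 10):

    <property>
      <name>mapreduce.client.submit.replication</name>
      <value>5</value>
    </property>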
Re: how to handle the corrupt block in HDFS?
thanks for the reply. What I do not know is how I can locate the block that has the corrupt replica (so I can observe how long it takes for the corrupt replica to be removed and a healthy replica to replace it; I have been getting the nagios alert for three days, I am not sure whether the same corrupt replica is causing the alert, and I do not know at what interval HDFS checks for corrupt replicas and cleans them up) On Tue, Dec 10, 2013 at 6:20 PM, Vinayakumar B vinayakuma...@huawei.com wrote: Hi ch huang, It may seem strange, but the fact is, *CorruptBlocks* through JMX means *"Number of blocks with corrupt replicas". Not all replicas are necessarily corrupt. *You can check the description through jconsole. Whereas *Corrupt blocks* through fsck means *blocks with all replicas corrupt (non-recoverable) or missing.* In your case, probably one replica of a block is corrupt, not all replicas of the same block. This corrupt replica will be deleted automatically if one more datanode is available in your cluster and the block is replicated to it. Regarding replication 10: as Peter Marron said, *some of the important files of a mapreduce job are given a replication of 10, to make them accessible faster and to launch map tasks faster. * Anyway, if the job succeeds these files are deleted automatically. I think only in some cases, if jobs are killed in between, these files will remain in hdfs, showing under-replicated blocks. Thanks and Regards, Vinayakumar B
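To watch just this metric between nagios alerts, the NameNode JMX servlet also accepts a qry parameter to select one bean (host name reused from the commands in this thread):

    # query only the FSNamesystem bean and pick out the corrupt-block counters
    curl -s 'http://ch11:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' | grep -i corrupt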
Re: how to handle the corrupt block in HDFS?
thanks for the reply, but if a block has just 1 corrupt replica, hdfs fsck cannot tell you which block of which file has the corrupted replica; fsck is only useful when all of a block's replicas are bad On Wed, Dec 11, 2013 at 10:01 AM, Adam Kawa kawa.a...@gmail.com wrote: When you identify a file with corrupt block(s), you can locate the machines that store its blocks by typing $ sudo -u hdfs hdfs fsck path-to-file -files -blocks -locations 2013/12/11 Adam Kawa kawa.a...@gmail.com Maybe this can work for you: $ sudo -u hdfs hdfs fsck / -list-corruptfileblocks ? 2013/12/11 ch huang justlo...@gmail.com thanks for the reply. What I do not know is how I can locate the block that has the corrupt replica (so I can observe how long it takes for the corrupt replica to be removed and a healthy replica to replace it; I have been getting the nagios alert for three days, I am not sure whether the same corrupt replica is causing the alert, and I do not know at what interval HDFS checks for corrupt replicas and cleans them up)
issue about corrupt block test
hi,maillist: I am trying to corrupt a block of a file in my benchmark environment. With the following commands I found blk_2504407693800874616_106252; its replica on 192.168.10.224 is my target, but I searched all the data dirs on 192.168.10.224 and cannot find the data file that belongs to this replica. Why? # ls /data/dataspace/1/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_3717620888497075523_106232* ls: cannot access /data/dataspace/1/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_3717620888497075523_106232*: No such file or directory [root@CHBM224 conf]# ls /data/dataspace/1/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_2504407693800874616_106252* ls: cannot access /data/dataspace/1/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_2504407693800874616_106252*: No such file or directory [root@CHBM224 conf]# ls /data/dataspace/2/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_2504407693800874616_106252* ls: cannot access /data/dataspace/2/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_2504407693800874616_106252*: No such file or directory [root@CHBM224 conf]# ls /data/dataspace/3/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_2504407693800874616_106252* ls: cannot access /data/dataspace/3/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_2504407693800874616_106252*: No such file or directory [root@CHBM224 conf]# hdfs fsck /alex/terasort/1G-input/part-m-0 -files -blocks -locations Connecting to namenode via http://CHBM220:50070 FSCK started by root (auth:SIMPLE) from /192.168.10.224 for path /alex/terasort/1G-input/part-m-0 at Wed Dec 11 11:35:42 CST 2013 /alex/terasort/1G-input/part-m-0 1 bytes, 2 block(s): OK 0. BP-50684181-192.168.10.220-1383638483950:blk_3717620888497075523_106232 len=67108864 repl=3 [192.168.10.222:50010, 192.168.10.223:50010, 192.168.10.221:50010] 1. BP-50684181-192.168.10.220-1383638483950:blk_2504407693800874616_106252 len=32891136 repl=3 [192.168.10.222:50010, 192.168.10.221:50010, 192.168.10.224:50010]
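Two things worth checking here (inferred from the DataNode on-disk layout, not stated in the thread): the block may live in a nested subdir directory inside finalized/ rather than directly in it, and the block data file on disk is named blk_2504407693800874616 without the _106252 generation-stamp suffix (only the .meta file carries it), so the ls pattern above can miss it. Searching the whole tree avoids both issues:

    # search every data dir recursively for the block id
    find /data/dataspace -name 'blk_2504407693800874616*'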
issue about Shuffled Maps in MR job summary
hi,maillist: I run terasort with 16 reducers and with 8 reducers, and when I double the reducer count, the Shuffled Maps counter also doubles. My question: the job only runs 20 map tasks (there are 10 input files in total, each file is 100M, and my block size is 64M, so there are 20 splits), so why do I need to shuffle 160 maps in the 8-reducer run and 320 maps in the 16-reducer run? How is the Shuffled Maps number calculated? 16 reducer summary output: Shuffled Maps =320 8 reducer summary output: Shuffled Maps =160
Re: issue about Shuffled Maps in MR job summary
I read the doc and found that if I have 8 reducers, a map task will output 8 partitions and each partition will be sent to a different reducer. So if I increase the reducer count, the partition count increases, but the volume of network traffic stays the same; why does increasing the reducer count sometimes not decrease the job completion time? On Wed, Dec 11, 2013 at 1:48 PM, Vinayakumar B vinayakuma...@huawei.com wrote: It looks simple, J Shuffled Maps = Number of Map Tasks * Number of Reducers Thanks and Regards, Vinayakumar B *From:* ch huang [mailto:justlo...@gmail.com] *Sent:* 11 December 2013 10:56 *To:* user@hadoop.apache.org *Subject:* issue about Shuffled Maps in MR job summary hi,maillist: I run terasort with 16 reducers and with 8 reducers, and when I double the reducer count, the Shuffled Maps counter also doubles. My question: the job only runs 20 map tasks (there are 10 input files in total, each file is 100M, and my block size is 64M, so there are 20 splits), so why do I need to shuffle 160 maps in the 8-reducer run and 320 maps in the 16-reducer run? How is the Shuffled Maps number calculated? 16 reducer summary output: Shuffled Maps =320 8 reducer summary output: Shuffled Maps =160
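Plugging the numbers from this thread into that formula as a quick check: 20 map tasks * 8 reducers = 160, and 20 * 16 = 320, matching both job summaries. The counter counts map outputs fetched per reducer (every reducer pulls one partition from every map), so it grows with the product of the two task counts even though the total number of shuffled bytes stays roughly constant.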
Re: how to handle the corrupt block in HDFS?
the strange thing is that when I use the following command I find 1 corrupt block # curl -s http://ch11:50070/jmx |grep orrupt CorruptBlocks : 1, but when I run hdfs fsck / I get none; everything seems fine # sudo -u hdfs hdfs fsck / Status: HEALTHY Total size: 1479728140875 B (Total open files size: 1677721600 B) Total dirs: 21298 Total files: 100636 (Files currently being written: 25) Total blocks (validated): 119788 (avg. block size 12352891 B) (Total open file blocks (not validated): 37) Minimally replicated blocks: 119788 (100.0 %) Over-replicated blocks: 0 (0.0 %) Under-replicated blocks: 166 (0.13857816 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor: 3 Average block replication: 3.0027633 Corrupt blocks: 0 Missing replicas: 831 (0.23049656 %) Number of data-nodes: 5 Number of racks: 1 FSCK ended at Tue Dec 10 09:14:48 CST 2013 in 3276 milliseconds The filesystem under path '/' is HEALTHY On Tue, Dec 10, 2013 at 8:32 AM, ch huang justlo...@gmail.com wrote: hi,maillist: my nagios alerts me all day that there is a corrupt block in HDFS, but I do not know how to remove it. Will HDFS handle this automatically? And will removing the corrupt block cause any data loss? thanks
Re: how to handle the corrupt block in HDFS?
more strange: in my HDFS cluster every block has three replicas, but I find that some have ten replicas. Why? # sudo -u hdfs hadoop fs -ls /data/hisstage/helen/.staging/job_1385542328307_0915 Found 5 items -rw-r--r-- 3 helen hadoop 7 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/appTokens -rw-r--r-- 10 helen hadoop 2977839 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/job.jar -rw-r--r-- 10 helen hadoop 3696 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/job.split
issue about running an example job with custom MapReduce variables
hi,maillist: I tried to set the reducer count on the command line, but it seems to have no effect. I run terasort like this: # hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort /alex/terasort/1G-input /alex/terasort/1G-output -Dmapreduce.job.reduces=34 The default reducer count in mapred-site.xml is 16; I tried to run with 34 reducers, but the job still ran with 16 reducers. Why? Here is some output: Job Counters Launched map tasks=1 Launched reduce tasks=16 Rack-local map tasks=1 Total time spent by all maps in occupied slots (ms)=2318 Total time spent by all reduces in occupied slots (ms)=99714
how many job requests can be queued when the first MR job is blocked due to lack of resources?
hi,maillist: are there any variables that can control this?
issue about terasort reading the partition file from the local FS instead of HDFS
hi,maillist: I am trying to use terasort to benchmark my cluster. When I run it, I found that terasort tries to read the partition file from the local filesystem, not HDFS. I can see the partition file in HDFS; when I copy this file to the local filesystem and run terasort again, it works fine, but it runs on the local host instead of on the cluster. Why? And how can I make it run on the cluster? # hadoop fs -ls /alex/terasort/1G-input Found 3 items -rw-r--r-- 3 root hadoop 0 2013-12-05 14:34 /alex/terasort/1G-input/_SUCCESS -rw-r--r-- 3 root hadoop 129 2013-12-06 10:20 /alex/terasort/1G-input/_partition.lst -rw-r--r-- 3 root hadoop 10 2013-12-05 14:34 /alex/terasort/1G-input/part-0
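A job that reads local paths and runs on the local host usually means the client fell back to its built-in defaults (file:/// and the local job runner), typically because core-site.xml and mapred-site.xml were not on the client's classpath. A sketch of the two settings to verify; the NameNode host is taken from earlier threads and the port is the usual default, both assumptions:

    <!-- core-site.xml -->
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://CHBM220:8020</value>
    </property>

    <!-- mapred-site.xml (MR2/YARN) -->
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>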
Re: how many job requests can be queued when the first MR job is blocked due to lack of resources?
I searched the code; only the src/hadoop-mapreduce1-project/src/contrib/capacity-scheduler/src/test/org/apache/hadoop/mapred/TestCapacitySchedulerConf.java file contains this variable. I did not see it in ./src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java On Fri, Dec 6, 2013 at 10:44 AM, rtejac rte...@gmail.com wrote: You can take a look at this parameter. It controls the number of jobs a user can initialize. *mapred.capacity-scheduler.queue.default.maximum-initialized-jobs-per-user = …. * On Dec 5, 2013, at 5:33 PM, ch huang justlo...@gmail.com wrote: hi,maillist: are there any variables that can control this?
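That split matches the two schedulers: the parameter above belongs to the MR1 capacity scheduler and has no effect on YARN. On YARN the closest knobs live in capacity-scheduler.xml; a sketch with illustrative values, not recommendations:

    <!-- cap on pending + running applications in the default queue -->
    <property>
      <name>yarn.scheduler.capacity.root.default.maximum-applications</name>
      <value>100</value>
    </property>

    <!-- fraction of cluster resources usable by ApplicationMasters,
         which bounds how many accepted jobs can run concurrently -->
    <property>
      <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
      <value>0.1</value>
    </property>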
Re: Monitor network traffic in Hadoop
hi, Abdul Navaz: set the shuffle port on each NM using the option mapreduce.shuffle.port in mapred-site.xml, then monitor this port with tcpdump or wireshark. Hope this info helps you On Fri, Dec 6, 2013 at 11:22 AM, navaz navaz@gmail.com wrote: Hello I am following the tutorial for hadoop on a single-node cluster and I am able to test the word count map reduce program; it is working fine. I would like to know how to monitor the network traffic of the shuffle phase, via wireshark or some other means. Please guide me. Thanks Abdul Navaz Graduate student University of Houston, TX
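A sketch of that suggestion end to end; 13562 is the usual default value of mapreduce.shuffle.port, and the interface name is an assumption:

    <!-- mapred-site.xml on each NodeManager -->
    <property>
      <name>mapreduce.shuffle.port</name>
      <value>13562</value>
    </property>

    # capture shuffle traffic on the NodeManager for later inspection in wireshark
    tcpdump -i eth0 'tcp port 13562' -w shuffle.pcap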
Re: error copying a local file into HDFS
hi: you are right, my DN disks were full; I deleted some files and now it works, thanks On Fri, Dec 6, 2013 at 11:28 AM, Vinayakumar B vinayakuma...@huawei.com wrote: Hi Ch huang, Please check whether all datanodes in your cluster have enough disk space and that the number of non-decommissioned nodes is non-zero. Thanks and regards, Vinayakumar B *From:* ch huang [mailto:justlo...@gmail.com] *Sent:* 06 December 2013 07:14 *To:* user@hadoop.apache.org *Subject:* error copying a local file into HDFS hi,maillist: I got an error when putting a local file into HDFS [root@CHBM224 test]# hadoop fs -copyFromLocal /tmp/aa /alex/ 13/12/06 09:40:29 WARN hdfs.DFSClient: DataStreamer Exception org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /alex/aa._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and no node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1339) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2198) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:299) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44954) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1751) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1747) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1745) at org.apache.hadoop.ipc.Client.call(Client.java:1237) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at com.sun.proxy.$Proxy9.addBlock(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:291) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) at com.sun.proxy.$Proxy10.addBlock(Unknown Source) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1177) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1030) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:488) copyFromLocal: File /alex/aa._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and no node(s) are excluded in this operation. 
13/12/06 09:40:29 ERROR hdfs.DFSClient: Failed to close file /alex/aa._COPYING_ org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /alex/aa._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and no node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1339) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2198) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:299) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44954) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1751) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1747
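A quick way to check the condition Vinayakumar describes, per-datanode capacity and decommission status, before and after cleaning up disks:

    # show per-datanode name, remaining space and decommission status
    sudo -u hdfs hdfs dfsadmin -report | grep -E 'Name|DFS Remaining|Decommission'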
Re: issue about capacity scheduler
if I have 40GB of cluster memory and yarn.scheduler.capacity.maximum-am-resource-percent is set to 0.1, does that mean that when I launch an ApplicationMaster I need to allocate 4GB to it? If so, why does increasing the value cause more ApplicationMasters to run concurrently, instead of fewer? On Thu, Dec 5, 2013 at 5:04 AM, Jian He j...@hortonworks.com wrote: you can probably try increasing yarn.scheduler.capacity.maximum-am-resource-percent, this controls the max concurrently running AMs. Thanks, Jian On Wed, Dec 4, 2013 at 1:33 AM, ch huang justlo...@gmail.com wrote: hi,maillist : I use the YARN framework and the capacity scheduler, and I have two queues, one for hive and the other for big MR jobs. The hive queue works fine, because hive tasks are very fast. But consider this: user A submits two big MR jobs, the first big job eats all the resources belonging to the queue, and the other big MR job has to wait until the first job finishes. How can I let the same user's MR jobs run in parallel?
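A worked reading of the parameter, consistent with Jian's description: the percentage bounds the total, not the per-AM size. 40GB * 0.1 = 4GB is the pool shared by all ApplicationMasters combined; each MR AM still requests its own container (about 1.5GB with the default yarn.app.mapreduce.am.resource.mb=1536), so a 4GB pool fits roughly 2 concurrent AMs. Raising the percent enlarges the pool and therefore lets more AMs, and hence more jobs, run at once.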
Re: issue about capacity scheduler
another question: I set yarn.scheduler.minimum-allocation-mb to 2GB, so a container should be at least 2GB, but I see the appMaster container using only a 1GB heap. Why? # ps -ef|grep 8062 yarn 8062 8047 5 09:04 ? 00:00:09 /usr/java/jdk1.7.0_25/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=/data/mrlocal/1/yarn/logs/application_1386139114497_0024/container_1386139114497_0024_01_01 -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster
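The 2GB is the container size (the resource request, rounded up to yarn.scheduler.minimum-allocation-mb); the -Xmx1024m inside it comes from a separate property. A sketch of where each number lives, using the MR2 defaults:

    <!-- memory asked of YARN for the MR ApplicationMaster container -->
    <property>
      <name>yarn.app.mapreduce.am.resource.mb</name>
      <value>1536</value>
    </property>

    <!-- JVM options actually passed to the AM; this is the -Xmx seen in ps -->
    <property>
      <name>yarn.app.mapreduce.am.command-opts</name>
      <value>-Xmx1024m</value>
    </property>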
Re: issue about the MR job local dir
thank you, but it seems the doc is a little old. The doc says: - *PUBLIC:* local-dir/filecache - *PRIVATE:* local-dir/usercache/<user>/filecache - *APPLICATION:* local-dir/usercache/<user>/appcache/app-id/ but here is my nodemanager directory; I guess nmPrivate belongs to the private dirs, and the filecache dir does not exist under usercache # ls /data/mrlocal/1/yarn/local filecache nmPrivate usercache [root@CHBM223 conf]# ls /data/mrlocal/1/yarn/local/filecache/ -1058429088916409723 4529188628984375230 -7014620238384418063 1084965746802723478 4537624275313838973 -7168597014714301440 -1624997938511480096 4630901056913375526 7270199361370573766 -1664837184667315424 -4642830643595652223 -7332220817185869511 1725675017861848111 4715236827440900877332904188082338506 1838346487029342338 4790459366530674957 -7450121760156930096 1865044782300039774800525395984004560 7478948409771297223 -2348110367263014791 -5080956154405911478 7486468764131639983 -2569725565520513438 524923119076958393-7755253483162230956 -2590767617048813033 -5270961733852362332 -7859425335924192987 2787947055181616358 -5381775829268220744 7967711417630616031 2816094634154232444 -5845090920164902899 8115657316961272063 286373945366133510-587409153437667574 -8196745140008584754 2931191327895309259 -5951079387471670627 -8338714062663466224 -304471400571947298 -6076923167039033115 -8473967805299855837 3250195466880585846080416638029534254 8513492322348652110 -3331048722364108374 6332597539903254606 -8567312237113801580 3360339691049457808 634308406792699 8737308241488535006 3368354412003774516566344665060319340 -8893869581665287805 3628504729266619560 -6639258108397695527 -8895898681278542021 -3801380133229678986 -6653760362065293300 8926294383627727352 3837066533086156807 -6782198269120858036 -8964326004503603190 3929223016635331138 -6814427383139267223 -9049325747073392755 4126862917222506438 -6814979781017122863 -9186700026428986961 [root@CHBM223 conf]# ls /data/mrlocal/1/yarn/local/nmPrivate/ application_1385444985453_0001 application_1385453784842_0010 application_1385522685434_0081 container_1385522685434_0073_01_14.pid application_1385445543402_0003 application_1385453784842_0013 container_1385522685434_0073_01_05.pid container_1385522685434_0073_01_17.pid application_1385445543402_0005 application_1385520079773_0005 container_1385522685434_0073_01_08.pid [root@CHBM223 conf]# ls /data/mrlocal/1/yarn/local/usercache/ hdfs helen hive root On Thu, Dec 5, 2013 at 5:12 AM, Jian He j...@hortonworks.com wrote: The following links may help you http://hortonworks.com/blog/management-of-application-dependencies-in-yarn/ http://hortonworks.com/blog/resource-localization-in-yarn-deep-dive/ Thanks, Jian On Tue, Dec 3, 2013 at 5:26 PM, ch huang justlo...@gmail.com wrote: hi,maillist: I see three dirs in my local MR job dir, and I do not know what these dirs are used for. Does anyone know? # ls /data/1/mrlocal/yarn/local/ filecache/ nmPrivate/ usercache/
error running the terasort tool
hi,maillist: I tried to run terasort on my cluster, but it failed. The following is the error; I do not know why. Can anyone help? # hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort /alex/terasort/1G-input /alex/terasort/1G-output 13/12/05 15:15:43 INFO terasort.TeraSort: starting 13/12/05 15:15:43 INFO mapred.FileInputFormat: Total input paths to process : 1 13/12/05 15:15:45 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library 13/12/05 15:15:45 INFO compress.CodecPool: Got brand-new compressor [.deflate] Making 1 from 10 records Step size is 10.0 13/12/05 15:15:45 WARN hdfs.DFSClient: DataStreamer Exception org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /alex/terasort/1G-input/_partition.lst could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and no node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1339) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2198) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:299) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44954) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1751) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1747) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1745) at org.apache.hadoop.ipc.Client.call(Client.java:1237) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at com.sun.proxy.$Proxy9.addBlock(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:291) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) at com.sun.proxy.$Proxy10.addBlock(Unknown Source) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1177) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1030) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:488) org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /alex/terasort/1G-input/_partition.lst could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and no node(s) are excluded in this operation. 
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1339) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2198) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:299) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44954) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1751) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1747) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1745) at org.apache.hadoop.ipc.Client.call(Client.java:1237)
Re: error running the terasort tool
BTW, I use CDH4.4 On Thu, Dec 5, 2013 at 3:18 PM, ch huang justlo...@gmail.com wrote: hi,maillist: I tried to run terasort on my cluster, but it failed. The following is the error; I do not know why. Can anyone help?
issue about the total input bytes of an MR job
I ran the MR job, and in the MR output I see 13/12/03 14:02:28 INFO mapreduce.JobSubmitter: number of splits:2717 Because my data block size is 64M, the total should be 2717*64M/1024 = 170G, but in the summary at the end I see the following info, so the HDFS bytes read are 126792190158/1024/1024/1024 = 118G. The two numbers are not very close; why? File System Counters FILE: Number of bytes read=9642910241 FILE: Number of bytes written=120327706125 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=126792190158 HDFS: Number of bytes written=0 HDFS: Number of read operations=8151 HDFS: Number of large read operations=0 HDFS: Number of write operations=0
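One plausible reconciliation (reasoning added here; the thread does not answer it): 2717 * 64M is only an upper bound, since it assumes every split is a full block, while the last block of each input file is almost always shorter than 64M. With many input files the shortfall adds up, and the HDFS: Number of bytes read counter reports bytes actually read, so landing at 118G, well under the 170G bound, is expected.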
issue about the MR job local dir
hi,maillist: I see three dirs in my local MR job dir, and I do not know what these dirs are used for. Does anyone know? # ls /data/1/mrlocal/yarn/local/ filecache/ nmPrivate/ usercache/
issue about reading a file from HDFS
hi,maillist: while an HDFS file is being appended to, no other reader can get the in-progress data from it, so when I run a statistics job every five minutes that uses hive to read the HDFS file, I cannot read the data. Can anyone offer a good approach? thanks
Re: issue about reading a file from HDFS
that does not seem like a good suggestion; having a lot of partition dirs and data files will put a big load on the NN On Wed, Dec 4, 2013 at 12:08 PM, Azuryy Yu azury...@gmail.com wrote: One suggestion is to change your hive partitioning: add a hive partition every five minutes, and roll your HDFS file every five minutes as well. On Wed, Dec 4, 2013 at 11:56 AM, ch huang justlo...@gmail.com wrote: hi,maillist: while an HDFS file is being appended to, no other reader can get the in-progress data from it, so when I run a statistics job every five minutes that uses hive to read the HDFS file, I cannot read the data. Can anyone offer a good approach? thanks
how to prevent a Java heap OOM in the shuffle phase of an MR job?
hi,maillist: I recently hit a problem: when I run an MR job, an OOM happens in the shuffle phase. The MR options are the defaults, unchanged; which options should I tune? thanks
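A sketch of the usual knobs for reduce-side shuffle memory, using MR2 property names; the values are illustrative, not recommendations:

    <!-- heap of the reduce task JVM (default is the 200m inherited
         from mapred.child.java.opts) -->
    <property>
      <name>mapreduce.reduce.java.opts</name>
      <value>-Xmx1024m</value>
    </property>

    <!-- share of that heap used to hold fetched map output; default 0.70.
         Lowering it makes the shuffle spill to disk earlier instead of
         exhausting the heap -->
    <property>
      <name>mapreduce.reduce.shuffle.input.buffer.percent</name>
      <value>0.50</value>
    </property>

    <!-- concurrent fetch threads per reducer; default 5 -->
    <property>
      <name>mapreduce.reduce.shuffle.parallelcopies</name>
      <value>5</value>
    </property>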
issue about an MR job on the YARN framework
hi,maillist: I run a job on my CDH4.4 YARN framework; its map tasks finish very fast, but the reduce is very slow. I checked with the ps command and found that its working heap size is 200m, so I tried to increase the heap size used by the reduce tasks: I added YARN_OPTS="$YARN_OPTS -Dmapreduce.reduce.java.opts=-Xmx1024m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$YARN_LOG_DIR/gc-$(hostname)-resourcemanager.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=15M -XX:-UseGCOverheadLimit" in the yarn-env.sh file, but when I restart the nodemanager, I find that new reduce tasks still use a 200m heap. Why? # jps 2853 DataNode 19533 Jps 10949 YarnChild 10661 NodeManager 15130 HRegionServer # ps -ef|grep 10949 yarn 10949 10661 99 09:52 ? 00:19:31 /usr/java/jdk1.7.0_45/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx200m -Djava.io.tmpdir=/data/1/mrlocal/yarn/local/usercache/hdfs/appcache/application_1385983958793_0022/container_1385983958793_0022_01_005650/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=/data/2/mrlocal/yarn/logs/application_1385983958793_0022/container_1385983958793_0022_01_005650 -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 192.168.11.10 48936 attempt_1385983958793_0022_r_00_14 5650
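A likely explanation, inferred from how task JVMs are launched rather than stated in this thread: mapreduce.reduce.java.opts is job configuration, read by the submitting client, whereas yarn-env.sh only affects the NodeManager daemon's own JVM, so the task containers keep the default -Xmx200m. Setting it per job should work (job.jar and MyDriver are placeholders):

    # per-job override, passed as a generic option before the job arguments
    hadoop jar job.jar MyDriver -Dmapreduce.reduce.java.opts=-Xmx1024m <input> <output>

or persistently, on the client's mapred-site.xml:

    <property>
      <name>mapreduce.reduce.java.opts</name>
      <value>-Xmx1024m</value>
    </property>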
Re: issue about an MR job on the YARN framework
another question: why does the map progress go backwards after it reaches 100%? On Tue, Dec 3, 2013 at 10:07 AM, ch huang justlo...@gmail.com wrote: hi,maillist: I run a job on my CDH4.4 YARN framework; its map tasks finish very fast, but the reduce is very slow.