question about speeding up the HDFS balancer
hi, maillist: I want to run the balancer on my HDFS cluster, but the default speed is 1 MB/s, so I want to set the option dfs.balance.bandwidthPerSec to 20 MB/s. Suppose I issue the balancer command on node A: do I only need to set the option in node A's hdfs-site.xml, or do I need to set it on every node of my HDFS cluster? Thanks a lot!
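A note on scope: the bandwidth cap is enforced by each DataNode, so a value in hdfs-site.xml only takes effect on the nodes that read it, not just on the node where the balancer runs. A hedged sketch of the two usual approaches (20 MB/s = 20971520 bytes; in Hadoop 2 the property is spelled dfs.datanode.balance.bandwidthPerSec):

<!-- hdfs-site.xml on every DataNode; the value is bytes per second -->
<property>
  <name>dfs.datanode.balance.bandwidthPerSec</name>
  <value>20971520</value>
</property>

# or push the new limit to all DataNodes at runtime, no restart needed,
# then start the balancer from any one node:
$ hdfs dfsadmin -setBalancerBandwidth 20971520
$ hdfs balancer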
Re: issue about Pig not recognizing the HDFS HA configuration
This name is not a hostname; it is the NameNode HA service (nameservice) name. Behind the name are two NameNode boxes, one active and one standby.

On Wed, Nov 5, 2014 at 7:41 PM, Jagannath Naidu jagannath.na...@fosteringlinux.com wrote:

On 5 November 2014 14:49, ch huang justlo...@gmail.com wrote: hi, maillist: I set up NameNode HA in my HDFS cluster, but it seems Pig does not recognize it. Why?

2014-11-05 14:34:54,710 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Cleaning up the staging area file:/tmp/hadoop-root/mapred/staging/root1861403840/.staging/job_local1861403840_0001
2014-11-05 14:34:54,716 [JobControl] WARN org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:root (auth:SIMPLE) cause:org.apache.pig.backend.executionengine.ExecException: ERROR 2118: java.net.UnknownHostException: develop

Unknown host exception: this can be the issue. Check that the host is discoverable, either from DNS or from the hosts file.

2014-11-05 14:34:54,717 [JobControl] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:DefaultJobName got an error while submitting
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: java.net.UnknownHostException: develop
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:288)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:493)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:510)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
    at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
    at java.lang.Thread.run(Thread.java:744)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: develop
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:237)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:141)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:576)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:521)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:146)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.hcatalog.mapreduce.HCatBaseInputFormat.setInputPath(HCatBaseInputFormat.java:326)
    at org.apache.hcatalog.mapreduce.HCatBaseInputFormat.getSplits(HCatBaseInputFormat.java:127)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
    ... 18 more
Caused by: java.net.UnknownHostException: develop
    ... 33 more

-- Jaggu Naidu
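For completeness: a client that throws java.net.UnknownHostException for a nameservice name usually lacks the HA definitions in its own configuration, so it treats "develop" as a plain hostname. A hedged sketch of the usual client-side hdfs-site.xml entries, using this thread's nameservice name "develop"; the nn1/nn2 ids and hostnames are illustrative assumptions, not taken from the thread:

<property>
  <name>dfs.nameservices</name>
  <value>develop</value>
</property>
<property>
  <name>dfs.ha.namenodes.develop</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.develop.nn1</name>
  <value>nn1-host:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.develop.nn2</name>
  <value>nn2-host:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.develop</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>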
issue about jobs being submitted to the local runner, not to the cluster
hi, maillist: my cluster moved from one IDC to another IDC. When everything was done, I ran a job and found that it runs on the local box, not on the cluster. Why? It worked normally in the old IDC!
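One common cause (an assumption, since the thread shows no configs): after the move, the submitting box lost or reverted its mapred-site.xml, so the client falls back to the local job runner (job IDs of the form job_local..., as in the Pig thread above, are the same symptom). A minimal check on the submitting node:

<!-- mapred-site.xml on the client/submitting node -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>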
how to copy data between two HDFS clusters quickly?
hi, maillist: I am using distcp to migrate data from CDH4.4 to CDH5.1. Copying small files works very well, but transferring big data is very slow. Any good methods to recommend? Thanks
Re: how to copy data between two HDFS clusters quickly?
No, all defaults. On Fri, Oct 17, 2014 at 5:46 PM, Azuryy Yu azury...@gmail.com wrote: Did you specify how many map tasks? On Fri, Oct 17, 2014 at 4:58 PM, ch huang justlo...@gmail.com wrote: hi, maillist: I am using distcp to migrate data from CDH4.4 to CDH5.1. Copying small files works very well, but transferring big data is very slow. Any good methods to recommend? Thanks
Re: how to copy data between two HDFS clusters quickly?
Some files; the total size is 2 TB, and the block size is 128 MB. On Sat, Oct 18, 2014 at 2:26 AM, Shivram Mani sm...@pivotal.io wrote: What is your approximate input size? Do you have multiple files, or is this one large file? What is your block size (source and destination cluster)? On Fri, Oct 17, 2014 at 4:19 AM, ch huang justlo...@gmail.com wrote: No, all defaults. On Fri, Oct 17, 2014 at 5:46 PM, Azuryy Yu azury...@gmail.com wrote: Did you specify how many map tasks? On Fri, Oct 17, 2014 at 4:58 PM, ch huang justlo...@gmail.com wrote: [original question quoted above] -- Thanks Shivram
Re: how to copy data between two HDFS clusters quickly?
Yes. On Sat, Oct 18, 2014 at 3:53 AM, Jakub Stransky stransky...@gmail.com wrote: Distcp? On 17 Oct 2014 20:51, Alexander Pivovarov apivova...@gmail.com wrote: Try running this on a datanode of the destination cluster: $ hadoop fs -cp hdfs://from_cluster/ hdfs://to_cluster/ On Fri, Oct 17, 2014 at 11:26 AM, Shivram Mani sm...@pivotal.io wrote: What is your approximate input size? Do you have multiple files, or is this one large file? What is your block size (source and destination cluster)? On Fri, Oct 17, 2014 at 4:19 AM, ch huang justlo...@gmail.com wrote: No, all defaults. On Fri, Oct 17, 2014 at 5:46 PM, Azuryy Yu azury...@gmail.com wrote: Did you specify how many map tasks? On Fri, Oct 17, 2014 at 4:58 PM, ch huang justlo...@gmail.com wrote: [original question quoted above] -- Thanks Shivram
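For what it's worth, distcp assigns whole files to map tasks (a file is never split across maps), so with the default of 20 maps a few very large files cap the parallelism. A hedged sketch of the usual tuning; the map count and paths are illustrative, and between incompatible versions such as CDH4.4 and CDH5.1 it is common to run distcp on the destination cluster against a webhdfs:// source:

$ hadoop distcp -m 100 -pb webhdfs://old-nn:50070/src hdfs://new-nn:8020/dst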
issue about the NameNode starting slowly
hi, maillist: my Hadoop cluster lost power last weekend. When I restarted my NameNode, I found it fetching edits from the JournalNodes and replaying the transactions, but it was very slow. Watching the 50070 web UI, I see lots of entries like http://hz49:8480/getJournal?jid=develop&segmentTxId=4261627&storageInfo=-55%3A466484546%3A0%3ACID-a140fb1a-ac10-4053-8b91-8f19f2809b7c I looked at the size of my JournalNodes' local storage; it is only about 400 MB. I do not know why the load process takes such a long time.
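A likely explanation (an assumption; the thread does not show the fsimage age): startup time is driven by how many edit transactions have to be replayed since the last fsimage checkpoint, not by the raw size of the JournalNode directory. If checkpointing has not been running, forcing one shrinks the replay window for the next restart. A sketch:

$ hdfs dfsadmin -safemode enter
$ hdfs dfsadmin -saveNamespace
$ hdfs dfsadmin -safemode leave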
issue about letting a regular user run applications on YARN (with Kerberos)
hi, maillist: I use Kerberos for authentication on my Hadoop cluster. I have a 3-node cluster (HDFS + YARN):
z1.example.com (NN, RM)
z2.example.com (NM, DN)
z3.example.com (NM, DN, proxyserver, historyserver)
I created principals for the NN/DN:
hdfs/z1.example@example.com
hdfs/z2.example@example.com
hdfs/z3.example@example.com
and for the RM/NM:
yarn/z1.example@example.com
yarn/z2.example@example.com
yarn/z3.example@example.com
for the MapReduce history server:
mapred/z1.example@example.com
mapred/z2.example@example.com
mapred/z3.example@example.com
and HTTP principals for SPNEGO (instead of Kerberos SSL for HTTP transactions):
HTTP/z1.example@example.com
HTTP/z2.example@example.com
HTTP/z3.example@example.com
I can start the cluster (HDFS + YARN) successfully, but I do not know how to let a regular user run applications on YARN (I do not know Kerberos well). Can anyone help?
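A hedged sketch of the usual flow (the user name is an example): on a Kerberized cluster an end user needs their own user principal and a valid ticket before submitting; a per-node keytab is not required for a plain job submission, though if the LinuxContainerExecutor is in use the same account usually has to exist as an OS user on the NodeManager nodes:

# on the KDC: create a principal for the user (name is illustrative)
$ kadmin.local -q "addprinc alex@EXAMPLE.COM"

# on the client box: obtain a ticket, then submit as usual
$ kinit alex@EXAMPLE.COM
$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 2 100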
DataNode cannot start, with error "Error creating plugin: org.apache.hadoop.metrics2.sink.FileSink"
hi, maillist: I have a 10-worker-node Hadoop cluster using CDH 4.4.0. On one of my datanodes, one of its disks is full. When I restart this datanode, I get an error:

STARTUP_MSG: java = 1.7.0_45
2014-09-04 10:20:00,576 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: registered UNIX signal handlers for [TERM, HUP, INT]
2014-09-04 10:20:01,457 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2014-09-04 10:20:01,465 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Error creating sink 'file'
org.apache.hadoop.metrics2.impl.MetricsConfigException: Error creating plugin: org.apache.hadoop.metrics2.sink.FileSink
    at org.apache.hadoop.metrics2.impl.MetricsConfig.getPlugin(MetricsConfig.java:203)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.newSink(MetricsSystemImpl.java:478)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.configureSinks(MetricsSystemImpl.java:450)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.configure(MetricsSystemImpl.java:429)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:180)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.init(MetricsSystemImpl.java:156)
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.init(DefaultMetricsSystem.java:54)
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.initialize(DefaultMetricsSystem.java:50)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1792)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1728)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1751)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1904)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1925)
Caused by: org.apache.hadoop.metrics2.MetricsException: Error creating datanode-metrics.out
    at org.apache.hadoop.metrics2.sink.FileSink.init(FileSink.java:53)
    at org.apache.hadoop.metrics2.impl.MetricsConfig.getPlugin(MetricsConfig.java:199)
    ... 12 more
Caused by: java.io.FileNotFoundException: datanode-metrics.out (Permission denied)
    at java.io.FileOutputStream.open(Native Method)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
    at java.io.FileWriter.<init>(FileWriter.java:107)
    at org.apache.hadoop.metrics2.sink.FileSink.init(FileSink.java:48)
    ... 13 more
2014-09-04 10:20:01,488 INFO org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: Sink ganglia started
2014-09-04 10:20:01,546 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 5 second(s).
2014-09-04 10:20:01,546 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2014-09-04 10:20:01,547 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is ch15
2014-09-04 10:20:01,569 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened streaming server at /0.0.0.0:50010
2014-09-04 10:20:01,572 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 10485760 bytes/s
2014-09-04 10:20:01,607 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2014-09-04 10:20:01,657 INFO org.apache.hadoop.http.HttpServer: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2014-09-04 10:20:01,660 INFO org.apache.hadoop.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode
2014-09-04 10:20:01,660 INFO org.apache.hadoop.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2014-09-04 10:20:01,660 INFO org.apache.hadoop.http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2014-09-04 10:20:01,664 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at 0.0.0.0:50075
2014-09-04 10:20:01,668 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dfs.webhdfs.enabled = true
2014-09-04 10:20:01,670 INFO org.apache.hadoop.http.HttpServer: addJerseyResourcePackage: packageName=org.apache.hadoop.hdfs.server.datanode.web.resources;org.apache.hadoop.hdfs.web.resources, pathSpec=/webhdfs/v1/*
2014-09-04 10:20:01,676 INFO org.apache.hadoop.http.HttpServer: HttpServer.start() threw a non Bind IOException
java.net.BindException: Port in use: 0.0.0.0:50075
    at org.apache.hadoop.http.HttpServer.openListener(HttpServer.java:729)
    at org.apache.hadoop.http.HttpServer.start(HttpServer.java:673)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.startInfoServer(DataNode.java:424)
    at
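Two things in this log can be checked directly (a hedged sketch using standard Linux tools): the metrics FileSink cannot create datanode-metrics.out in the daemon's working directory, and something is already bound to port 50075, often a half-dead DataNode process left over from the disk filling up:

# is a process still listening on the DataNode HTTP port?
$ netstat -tlnp | grep 50075
# can the hdfs user create the metrics file in the daemon's working dir?
$ sudo -u hdfs touch datanode-metrics.out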
issue about migrating Hadoop data between IDCs
hi, maillist: my company signed with a new IDC, and I must move the Hadoop data (about 30 TB) to the new IDC. Any good suggestions?
issue about distcp: "Source and target differ in block-size. Use -pb to preserve block-sizes during copy."
hi, maillist: I am trying to copy data from my old cluster to the new cluster, and I get this error. How should I handle it?

14/07/24 18:35:58 INFO mapreduce.Job: Task Id : attempt_1406182801379_0004_m_00_1, Status : FAILED
Error: java.io.IOException: File copy failed: webhdfs://CH22:50070/mytest/pipe_url_bak/part-m-1 --> webhdfs://develop/tmp/pipe_url_bak/part-m-1
    at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:262)
    at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:229)
    at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying webhdfs://CH22:50070/mytest/pipe_url_bak/part-m-1 to webhdfs://develop/tmp/pipe_url_bak/part-m-1
    at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
    at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:258)
    ... 10 more
Caused by: java.io.IOException: Error writing request body to server
    at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3192)
    at sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3175)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:231)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToTmpFile(RetriableFileCopyCommand.java:164)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:118)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:95)
    at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
    ... 11 more
14/07/24 18:35:59 INFO mapreduce.Job: map 16% reduce 0%
14/07/24 18:39:39 INFO mapreduce.Job: map 17% reduce 0%
14/07/24 19:04:27 INFO mapreduce.Job: Task Id : attempt_1406182801379_0004_m_00_2, Status : FAILED
Error: java.io.IOException: File copy failed: webhdfs://CH22:50070/mytest/pipe_url_bak/part-m-1 --> webhdfs://develop/tmp/pipe_url_bak/part-m-1
    at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:262)
    at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:229)
    at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying webhdfs://CH22:50070/mytest/pipe_url_bak/part-m-1 to webhdfs://develop/tmp/pipe_url_bak/part-m-1
    at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
Re: issue about distcp: "Source and target differ in block-size. Use -pb to preserve block-sizes during copy."
' to transaction ID 62737
2014-07-24 17:37:34,195 INFO org.apache.hadoop.hdfs.server.namenode.EditLogInputStream: Fast-forwarding stream 'http://hz24:8480/getJournal?jid=develop&segmentTxId=62737&storageInfo=-55%3A466484546%3A0%3ACID-a140fb1a-ac10-4053-8b91-8f19f2809b7c' to transaction ID 62737
2014-07-24 17:37:34,223 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073753271_12644 192.168.10.51:50010 192.168.10.49:50010 192.168.10.50:50010
2014-07-24 17:37:34,224 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Edits file http://hz24:8480/getJournal?jid=develop&segmentTxId=62737&storageInfo=-55%3A466484546%3A0 :
2014-07-24 17:37:34,225 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Loaded 3 edits starting from txid 62736
2014-07-24 17:37:37,050 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* InvalidateBlocks: ask 192.168.10.51:50010 to delete [blk_1073753271_12644]
2014-07-24 17:37:40,050 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* InvalidateBlocks: ask 192.168.10.49:50010 to delete [blk_1073753271_12644]
2014-07-24 17:37:43,051 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* InvalidateBlocks: ask 192.168.10.50:50010 to delete [blk_1073753271_12644]
2014-07-24 17:39:34,255 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Triggering log roll on remote NameNode hz24/192.168.10.24:8020

On Fri, Jul 25, 2014 at 10:25 AM, Stanley Shi s...@gopivotal.com wrote: Would you please also paste the corresponding namenode log? Regards, *Stanley Shi,*

On Fri, Jul 25, 2014 at 9:15 AM, ch huang justlo...@gmail.com wrote: [original message with the distcp error and stack trace, quoted in full above]
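The error message in the subject line suggests the fix distcp itself recommends: preserve block sizes with -pb. A hedged sketch using this thread's endpoints (run on the destination cluster; the hdfs:// destination is an assumption in place of the thread's webhdfs:// target):

$ hadoop distcp -pb webhdfs://CH22:50070/mytest/pipe_url_bak hdfs://develop/tmp/pipe_url_bak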
Re: issue about running an MR job as a system user
I resolved this by making an alex directory under the staging directory and setting its owner to alex.

On Thu, Jul 24, 2014 at 10:11 PM, java8964 java8...@hotmail.com wrote: Are you sure user 'alex' belongs to the 'hadoop' group? Why not run the command 'id alex' to prove it? And can 'alex' belonging to the 'hadoop' group be confirmed on the namenode? Yong

Date: Thu, 24 Jul 2014 17:11:06 +0800 Subject: issue about run MR job use system user From: justlo...@gmail.com To: user@hadoop.apache.org

hi, maillist: I created a system user on a box of my Hadoop cluster, but when I run an MR job as this user I get a problem. The /data directory is used by the MapReduce history server option, and I also added the user to the hadoop group. Since /data has 775 permissions it should be writable by users in the hadoop group, so why do I still get a permission error? Can anyone help?

# useradd alex

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>192.168.10.49:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>192.168.10.49:19888</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/data</value>
  </property>
  ...
</configuration>

$ hadoop fs -ls /
Found 6 items
drwxrwxr-x - hdfs hadoop 0 2014-07-14 18:17 /data
$ hadoop fs -ls /data
Found 3 items
drwx------ - hdfs hadoop 0 2014-07-09 08:49 /data/hdfs
drwxrwxrwt - hdfs hadoop 0 2014-07-08 18:52 /data/history
drwx------ - pipe hadoop 0 2014-07-14 18:17 /data/pipe

[alex@hz23 ~]$ id
uid=501(alex) gid=501(alex) groups=501(alex),497(hadoop)
[alex@hz23 ~]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.3.0-cdh5.0.2.jar pi 2 100
Number of Maps = 2
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Starting Job
14/07/24 17:06:23 WARN security.UserGroupInformation: PriviledgedActionException as:alex (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=alex, access=WRITE, inode=/data:hdfs:hadoop:drwxrwxr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:265)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:251)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:232)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:176)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5490)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5472)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:5446)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:3600)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3570)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3544)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:739)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:558)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
org.apache.hadoop.security.AccessControlException: Permission denied: user=alex, access=WRITE, inode=/data:hdfs:hadoop:drwxrwxr-x
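For reference, the fix described at the top of this reply, as commands (a sketch; /data is the thread's yarn.app.mapreduce.am.staging-dir and alex is the user in question):

$ sudo -u hdfs hadoop fs -mkdir /data/alex
$ sudo -u hdfs hadoop fs -chown alex:alex /data/alex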
issue about running an MR job as a system user in CDH5
hi, maillist: I set up a CDH5 YARN cluster and set the following option in my mapred-site.xml file:

<property>
  <name>yarn.app.mapreduce.am.staging-dir</name>
  <value>/data</value>
</property>

The MapReduce history server keeps its history dir under /data, but if I submit an MR job as another user I get an error. Adding the user to the hadoop group does not help either. Why, and how can I fix it? Thanks

2014-07-22 14:07:06,734 INFO [main] mapreduce.TableOutputFormat: Created table instance for test_1
2014-07-22 14:07:06,765 WARN [main] security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=hbase, access=EXECUTE, inode=/data:mapred:hadoop:drwxrwx---
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:265)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:251)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:205)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:168)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5490)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3499)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:764)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:764)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=hbase, access=EXECUTE, inode=/data:mapred:hadoop:drwxrwx---
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:265)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:251)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:205)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:168)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5490)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3499)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:764)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:764)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
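A hedged note on the exception: here /data is mapred:hadoop with mode drwxrwx---, so any user outside the hadoop group cannot even traverse it (hence access=EXECUTE). A common convention for a shared staging root, an assumption rather than anything shown in the thread, is to make it world-writable with the sticky bit, like /tmp:

$ sudo -u hdfs hadoop fs -chmod 1777 /data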
why can I not use '*' when removing a Hadoop directory?
hi, maillist: I used to run
sudo -u hdfs hadoop fs -rm -r -skipTrash /user/hive/warehouse/adx.db/dsp_request/2014-03*/*
on CDH4.4, but I find it does not work on CDH5. Why?
# sudo -u hdfs hadoop fs -rm -r -skipTrash /user/hive/warehouse/dsp.db/dsp_request/2014-01*/*
rm: `/user/hive/warehouse/dsp.db/dsp_request/2014-01*/*': No such file or directory
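One thing worth ruling out (an assumption, since the thread does not show the shell context): an unquoted * is expanded by the local shell against the local filesystem first, and what reaches HDFS depends on whether anything matched locally. Quoting the pattern hands it to HDFS globbing unchanged; a "No such file or directory" reply then genuinely means the HDFS glob matched nothing:

$ sudo -u hdfs hadoop fs -rm -r -skipTrash '/user/hive/warehouse/dsp.db/dsp_request/2014-01*/*'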
can I monitor all Hadoop components from one box?
hi, maillist: I want to check whether every Hadoop cluster component process is alive or dead. Can this be done from one machine, the way ZooKeeper nodes can be checked? Thanks
what is the staging dir in the YARN framework used for?
hi, maillist: I see yarn.app.mapreduce.am.staging-dir in the docs, and I do not know what it is used for. I also want to know whether the contents of this dir can be cleaned, and whether it can be set to clean up automatically.
will data be lost if the active node's data is not synced to the standby?
hi, maillist: I have NN HA. If the active NameNode goes down while changed metadata has not yet been written to local disk or synced to the standby, is that metadata lost?
Re: how can I monitor decommission progress?
But it does not show me how much has already been done. On Fri, Jun 6, 2014 at 2:56 AM, Suresh Srinivas sur...@hortonworks.com wrote: The namenode web UI provides that information. On the main web UI, click the link associated with decommissioned nodes. Sent from phone On Jun 5, 2014, at 10:36 AM, Raj K Singh rajkrrsi...@gmail.com wrote: use $hadoop dfsadmin -report Raj K Singh http://in.linkedin.com/in/rajkrrsingh http://www.rajkrrsingh.blogspot.com Mobile Tel: +91 (0)9899821370 On Sat, May 31, 2014 at 11:26 AM, ch huang justlo...@gmail.com wrote: hi, maillist: I decommissioned three nodes out of my cluster, but the question is how I can see the decommission progress; I can only see the admin state in the web UI.
issue about changing the NN node
hi, maillist: I want to replace the NN in my Hadoop cluster (no NN HA, no secondary NN). How can I do this?
Re: should I set the history server address only on the NN, or do I have to set it on each node?
My RM is also my NN node, so I just configure it on the RM node, not on every other node? On Wed, Jun 4, 2014 at 11:47 AM, Stanley Shi s...@gopivotal.com wrote: You should set it on the RM node; Regards, *Stanley Shi,* On Wed, Jun 4, 2014 at 9:24 AM, ch huang justlo...@gmail.com wrote: hi, maillist: I installed my job history server on one of my NNs (I use NN HA). I want to ask whether I need to set the history server address on each node?
Re: issue about moving data between two Hadoop clusters
How long will it take to transfer 50 TB of data? And what if the two clusters can only reach each other over the LAN? On Thu, Jun 5, 2014 at 9:06 AM, Tsuyoshi OZAWA ozawa.tsuyo...@gmail.com wrote: Hi Ch, How about using DistCp? http://hadoop.apache.org/docs/r1.2.1/distcp2.html Thanks, - Tsuyoshi On Wed, Jun 4, 2014 at 5:40 PM, ch huang justlo...@gmail.com wrote: hi, maillist: my company signed with a new IDC. I need to move all 50 TB of data from the old Hadoop cluster to the new cluster in the new location. How do I do it? -- - Tsuyoshi
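A rough sketch of the DistCp route Tsuyoshi suggests, throttled so the migration does not saturate the inter-IDC link (flag values are illustrative; in DistCp v2, -bandwidth is MB/s per map):

$ hadoop distcp -m 50 -bandwidth 20 -update hdfs://old-nn:8020/ hdfs://new-nn:8020/

For a rough time estimate: at an aggregate 500 MB/s, 50 TB is about 52,428,800 MB / 500 MB/s ≈ 29 hours; halve the throughput and the time doubles.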
issue about how to decommission a datanode from a Hadoop cluster
hi, maillist: I use a CDH4.4 YARN + HDFS cluster, and I want to decommission a datanode. Should I modify hdfs-site.xml and mapred-site.xml on every node in the cluster to exclude the node, or do I just need to set hdfs-site.xml and mapred-site.xml on the NN?
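For reference, a hedged sketch of the usual procedure (the exclude-file path is an example): the exclude list is read only by the NameNode, so it does not have to be distributed to every node:

<!-- hdfs-site.xml on the NameNode -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>

# add the datanode's hostname to /etc/hadoop/conf/dfs.exclude, then:
$ hdfs dfsadmin -refreshNodes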
how can I monitor decommission progress?
hi, maillist: I decommissioned three nodes out of my cluster, but the question is how I can see the decommission progress; I can only see the admin state in the web UI.
issue about removing YARN job history logs
hi, maillist: I want to remove job history logs, and I configured the following in yarn-site.xml, but it seems to have no effect. Why? (I use CDH4.4 YARN; I configured this on each datanode, and my job history server is on one of my datanodes.)

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <description>Where to aggregate logs to.</description>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/var/log/hadoop-yarn/apps</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>1209600</value> <!-- 14 days -->
</property>
<property>
  <name>yarn.log-aggregation.retain-check-interval-seconds</name>
  <value>300</value>
</property>
can I shut down a worker node while a cluster balance operation is running?
hi, maillist: I want to update the JDK version on some of my worker nodes, but the whole cluster is in the middle of a balance process. Can I do it?
how to revive a dead node when its disk is full?
hi, maillist: one of my datanodes is full, so it shows as dead in the cluster. What can I do to bring it back?
Re: how to revive a dead node when its disk is full?
Thanks for the reply, but my situation is different: all of the dead node's disks are full, so I cannot move data to another empty disk. On Thu, May 29, 2014 at 8:55 AM, Ted Yu yuzhih...@gmail.com wrote: Cycling old bits: http://search-hadoop.com/m/uMDyU1bxBJS/datanode+disk+fullsubj=Re+Disk+on+data+node+full On Wed, May 28, 2014 at 5:34 PM, ch huang justlo...@gmail.com wrote: hi, maillist: one of my datanodes is full, so it shows as dead in the cluster. What can I do to bring it back?
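Once the node is brought back (e.g. by freeing non-HDFS files), reserving headroom on each data disk helps avoid a repeat. A hedged example; the 10 GB figure is illustrative:

<!-- hdfs-site.xml on the datanodes: bytes per volume kept free for non-HDFS use -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value>
</property>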
issue about node disk capacity
hi, maillist: I have a 3-datanode cluster where each node has 1 TB of disk. Recently I needed to run an app that uses a lot of disk space, so I added another three datanodes with 10 TB of disk each. But as the app runs, the 1 TB nodes are nearly full. How can I balance the data storage between the 1 TB nodes and the 10 TB nodes? Thanks
question about NM heapsize
hi, maillist: I set YARN_NODEMANAGER_HEAPSIZE=15000, so the NM runs in a 15 GB JVM. But in the YARN web UI, under Active Nodes - Mem Avail, I only see 8 GB. Why?
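A hedged explanation: the Mem Avail column reflects the memory the NodeManager offers to containers, which is a separate setting from the NM's own JVM heap; its default is 8192 MB, which matches the 8 GB shown. A sketch:

<!-- yarn-site.xml on each NodeManager -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>15000</value>
</property>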
do I need to tune the mapred.child.java.opts option if I use YARN and MRv2?
hi, maillist: I want to know whether this option still imposes a limit under YARN.
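For reference, a hedged sketch: in MRv2 the old mapred.child.java.opts still applies as a fallback, but the usual knobs are the per-task JVM options plus the matching container sizes (all values illustrative):

<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx800m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1600m</value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>
</property>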
issue about JobListCache
hi, maillist: I see lots of messages like the following from the YARN job history server. Where is the JobListCache located?

2014-05-15 15:15:30,758 WARN org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager: Waiting to remove job_1395143379025_7877 from JobListCache because it is not in done yet
Re: issue about cluster balance
I recorded the disk status before and after the balance, from one source node and one destination node.

Before, source node:
/dev/sdd 1.8T 1009G 733G 58% /data/1
/dev/sde 1.8T 1005G 737G 58% /data/2
/dev/sda 1.8T 980G 762G 57% /data/3
/dev/sdb 1.8T 980G 762G 57% /data/4
/dev/sdc 1.8T 972G 769G 56% /data/5
/dev/sdf 1.8T 980G 762G 57% /data/

Before, destination node:
/dev/sdb 1.8T 2.0G 1.7T 1% /data/1
/dev/sdc 1.8T 2.1G 1.7T 1% /data/2
/dev/sdd 1.8T 2.0G 1.7T 1% /data/3
/dev/sde 1.8T 2.2G 1.7T 1% /data/4
/dev/sdf 1.8T 2.2G 1.7T 1% /data/5

After, source node:
/dev/sdd 1.8T 754G 988G 44% /data/1
/dev/sde 1.8T 736G 1006G 43% /data/2
/dev/sda 1.8T 730G 1011G 42% /data/3
/dev/sdb 1.8T 721G 1020G 42% /data/4
/dev/sdc 1.8T 721G 1021G 42% /data/5
/dev/sdf 1.8T 723G 1019G 42% /data/6

After, destination node:
/dev/sdb 1.8T 388G 1.4T 23% /data/1
/dev/sdc 1.8T 381G 1.4T 22% /data/2
/dev/sdd 1.8T 378G 1.4T 22% /data/3
/dev/sde 1.8T 375G 1.4T 22% /data/4
/dev/sdf 1.8T 374G 1.4T 22% /data/5

What I wonder is why the source node does not end up equal to the destination node, e.g. around 30% each, and why the balance took 62.99 hours.

On Tue, May 6, 2014 at 12:38 PM, Rakesh R rake...@huawei.com wrote: Could you give more details:
- Could you convert the 7% into the total amount of moved data in MB?
- Also, is that 7% data movement per DN?
- What values are shown for the 'over-utilized', 'above-average', 'below-average', and 'under-utilized' nodes? The balancer does its pairing based on these values.
- Please tell me the cluster topology - SAME_NODE_GROUP, SAME_RACK. Basically this matters when choosing the sourceNode vs. balancerNode pairs as well as the proxy source. Do you see all the DNs being used for block movement?
- Were there any exceptions during block movement?
- How many iterations ran in these hours?
-Rakesh

From: ch huang [mailto:justlo...@gmail.com] Sent: 06 May 2014 06:10 To: user@hadoop.apache.org Subject: issue about cluster balance

hi, maillist: I have a 5-node Hadoop cluster, and yesterday I added 5 new boxes to it. After that I started a balance task, but it moved only 7% of the data to the new nodes in 20 hours, and I have already set dfs.datanode.balance.bandwidthPerSec to 10 MB and the threshold to 10%. Why does the balance task take so long?
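On the "why not 30% each" question, a hedged reading (computed loosely from the df output above): the balancer only moves blocks until every node is within the threshold of the cluster-average utilization, not until all nodes are equal. Roughly:

average utilization ≈ (42% × 6 source disks + 22% × 5 destination disks) / 11 ≈ 33%
with -threshold 10, "balanced" means utilization within [23%, 43%]
the source at ~42-44% and the destination at ~22-23% both sit near the
edges of that band, so the balancer considers the cluster balanced and stops.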
issue about cluster balance
hi, maillist: I have a 5-node Hadoop cluster, and yesterday I added 5 new boxes to it. After that I started a balance task, but it moved only 7% of the data to the new nodes in 20 hours, and I have already set dfs.datanode.balance.bandwidthPerSec to 10 MB and the threshold to 10%. Why does the balance task take so long?
Re: how can I archive old data in HDFS?
It just combines several files into one file; no compression happens. On Fri, Apr 11, 2014 at 9:10 PM, Peyman Mohajerian mohaj...@gmail.com wrote: There is: http://hadoop.apache.org/docs/r1.2.1/hadoop_archives.html But I am not sure whether it compresses the data or not. On Thu, Apr 10, 2014 at 9:57 PM, Stanley Shi s...@gopivotal.com wrote: AFAIK, no tools now. Regards, *Stanley Shi,* On Fri, Apr 11, 2014 at 9:09 AM, ch huang justlo...@gmail.com wrote: hi, maillist: how can I archive old data in HDFS? I have a lot of old data that will not be used, but it takes a lot of space to store. I want to archive and compress the old data; can HDFS do this?
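For reference, a hedged sketch of the HAR tool mentioned above (paths are illustrative); as noted, it packs many files into one archive but does not compress them:

$ hadoop archive -archiveName old-2013.har -p /user/hive/warehouse/old /archives
$ hadoop fs -ls har:///archives/old-2013.har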
Re: using setrep to change the number of file replicas does not work
I can use fsck to get the over-replicated blocks, but how can I track the pending deletes? On Thu, Apr 10, 2014 at 10:50 AM, Harsh J ha...@cloudera.com wrote: The replica deletion is asynchronous. You can track its deletions via the NameNode's over-replicated blocks and the pending delete metrics. On Thu, Apr 10, 2014 at 7:16 AM, ch huang justlo...@gmail.com wrote: [original question and transcripts quoted in full below] -- Harsh J
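On tracking the pending deletes Harsh mentions: that counter is exposed as PendingDeletionBlocks in the NameNode's FSNamesystem metrics, which can be read from the JMX servlet. A sketch using this thread's NameNode host; the qry filter is a standard parameter of the /jmx servlet:

$ curl 'http://ch11:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' | grep -i PendingDeletionBlocks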
Re: using setrep to change the number of file replicas does not work
I set the replica number from 3 to 2, but when I dump the NN metrics, PendingDeletionBlocks is zero. Why? And if the check thread sleeps for an interval before doing its check work, how long is that interval? On Thu, Apr 10, 2014 at 10:50 AM, Harsh J ha...@cloudera.com wrote: The replica deletion is asynchronous. You can track its deletions via the NameNode's over-replicated blocks and the pending delete metrics. On Thu, Apr 10, 2014 at 7:16 AM, ch huang justlo...@gmail.com wrote: [original question and transcripts quoted in full below] -- Harsh J
which dirs in HDFS can be cleaned?
hi, maillist: my HDFS cluster has been running for about a year, and I find that many dirs are very large. I wonder whether some of them can be cleaned, like /var/log/hadoop-yarn/apps.
using setrep to change the number of file replicas does not work
hi, maillist: I tried to modify the replica number on a dir, but it seems not to work. Does anyone know why?

# sudo -u hdfs hadoop fs -setrep -R 2 /user/hive/warehouse/mytest
Replication 2 set: /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0

The file is still stored with 3 replicas, but the echoed number changed:

# hadoop fs -ls /user/hive/warehouse/mytest/dsp_request/2014-01-26
Found 1 items
-rw-r--r-- 2 hdfs hdfs 17660 2014-01-26 18:34 /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0

# sudo -u hdfs hdfs fsck /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0 -files -blocks -locations
Connecting to namenode via http://ch11:50070
FSCK started by hdfs (auth:SIMPLE) from /192.168.11.12 for path /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0 at Thu Apr 10 09:39:51 CST 2014
/user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0 17660 bytes, 1 block(s): OK
0. BP-1043055049-192.168.11.11-1382442676609:blk_-9219869107960013037_1976591 len=17660 repl=3 [192.168.11.13:50010, 192.168.11.10:50010, 192.168.11.14:50010]

I removed the file and uploaded a new one. As I understand it, the new file should be stored with 2 replicas, but it is still stored with 3. Why?

# sudo -u hdfs hadoop fs -rm -r -skipTrash /user/hive/warehouse/mytest/dsp_request/2014-01-26/*
Deleted /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0
# hadoop fs -put ./data_0 /user/hive/warehouse/mytest/dsp_request/2014-01-26/
[root@ch12 ~]# hadoop fs -ls /user/hive/warehouse/mytest/dsp_request/2014-01-26
Found 1 items
-rw-r--r-- 3 root hdfs 17660 2014-04-10 09:40 /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0
# sudo -u hdfs hdfs fsck /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0 -files -blocks -locations
Connecting to namenode via http://ch11:50070
FSCK started by hdfs (auth:SIMPLE) from /192.168.11.12 for path /user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0 at Thu Apr 10 09:41:12 CST 2014
/user/hive/warehouse/mytest/dsp_request/2014-01-26/data_0 17660 bytes, 1 block(s): OK
0. BP-1043055049-192.168.11.11-1382442676609:blk_6517693524032437780_8889786 len=17660 repl=3 [192.168.11.12:50010, 192.168.11.15:50010, 192.168.11.13:50010]
issue about mv: `/user/hive/warehouse/dsp_execution/2014-01-31/data_00000': Input/output error
hi, maillist: I need to move data files from one dir to another. I find that if I type hadoop fs -mv /A/* /B/ on the command line, it works, but if I let a shell script do it, I get mv: `/user/hive/warehouse/dsp_click/2014-03-31/data_0': Input/output error. I do not know why.
lots of attempt_local296445216_0001_m_000386_0 dirs in the NN dir
hi, maillist: I find many dirs under /data/hadoopmapredlocal/taskTracker/hdfs/jobcache/job_local296445216_0001, which is my mapred local dir. Can I remove them safely? And why are there so many dirs?
issue: "Log aggregation has not completed or is not enabled."
hi, maillist: I tried to look at an application log using the following process:

# yarn application -list
Application-Id Application-Name User Queue State Final-State Tracking-URL
application_1395126130647_0014 select user_id as userid, adverti...stattime(Stage-1) hive hive FINISHED SUCCEEDED ch18:19888/jobhistory/job/job_1395126130647_0014

# yarn logs -applicationId application_1395126130647_0014
Logs not available at /var/log/hadoop-yarn/apps/root/logs/application_1395126130647_0014
Log aggregation has not completed or is not enabled.

But I did enable the log aggregation function. Here is my yarn-site.xml configuration for log aggregation:

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <description>Where to aggregate logs to.</description>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/var/log/hadoop-yarn/apps</value>
</property>

The application logs are not put on HDFS successfully. Why?

# hadoop fs -ls /var/log/hadoop-yarn/apps/root/logs/application_1395126130647_0014
ls: `/var/log/hadoop-yarn/apps/root/logs/application_1395126130647_0014': No such file or directory
any optimization suggestions for highly concurrent writes into HDFS?
hi, maillist: are there any optimizations for a large number of writes into HDFS at the same time? Thanks
issue about append writes into HDFS: "ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: ch12:50010:DataXceiver error processing READ_BLOCK operation"
hi, maillist: I see the following in my HDFS log. The block belongs to a file written by Scribe. I do not know why this happens; is there some limit in the HDFS system?

2014-02-21 10:33:30,235 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opReadBlock BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240 received exception java.io.IOException: Replica gen stamp < block genstamp, block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240, replica=ReplicaWaitingToBeRecovered, blk_-8536558734938003208_3820986, RWR
  getNumBytes()     = 35840
  getBytesOnDisk()  = 35840
  getVisibleLength()= -1
  getVolume()       = /data/4/dn/current
  getBlockFile()    = /data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
  unlinked=false
2014-02-21 10:33:30,235 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.11.12, storageID=DS-754202132-192.168.11.12-50010-1382443087835, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=CID-0e777b8c-19f3-44a1-8af1-916877f2506c;nsid=2086828354;c=0):Got exception while serving BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240 to /192.168.11.15:56564
java.io.IOException: Replica gen stamp < block genstamp, block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240, replica=ReplicaWaitingToBeRecovered, blk_-8536558734938003208_3820986, RWR
  getNumBytes()     = 35840
  getBytesOnDisk()  = 35840
  getVisibleLength()= -1
  getVolume()       = /data/4/dn/current
  getBlockFile()    = /data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
  unlinked=false
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:205)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:326)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
    at java.lang.Thread.run(Thread.java:744)
2014-02-21 10:33:30,236 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: ch12:50010:DataXceiver error processing READ_BLOCK operation src: /192.168.11.15:56564 dest: /192.168.11.12:50010
java.io.IOException: Replica gen stamp < block genstamp, block=BP-1043055049-192.168.11.11-1382442676609:blk_-8536558734938003208_3823240, replica=ReplicaWaitingToBeRecovered, blk_-8536558734938003208_3820986, RWR
  getNumBytes()     = 35840
  getBytesOnDisk()  = 35840
  getVisibleLength()= -1
  getVolume()       = /data/4/dn/current
  getBlockFile()    = /data/4/dn/current/BP-1043055049-192.168.11.11-1382442676609/current/rbw/blk_-8536558734938003208
  unlinked=false
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:205)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:326)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:92)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:64)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
    at java.lang.Thread.run(Thread.java:744)
Re: issue about append writes into HDFS: "ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: ch12:50010:DataXceiver error processing READ_BLOCK operation"
hi, I use CDH4.4. On Fri, Feb 21, 2014 at 12:04 PM, Ted Yu yuzhih...@gmail.com wrote: Which Hadoop release are you using? Cheers On Thu, Feb 20, 2014 at 8:57 PM, ch huang justlo...@gmail.com wrote: [original message with the full log, quoted above]
Re: issue about append writes into HDFS: "ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: ch12:50010:DataXceiver error processing READ_BLOCK operation"
I use the default value; it seems to be 4096. I also checked the hdfs user's limits, and they are large enough:

-bash-4.1$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 514914
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 32768
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65536
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

On Fri, Feb 21, 2014 at 12:25 PM, Anurag Tangri anurag_tan...@yahoo.com wrote: Did you check your Unix open-file limit and the datanode xceiver value? Is it too low for the number of blocks/data in your cluster? Thanks, Anurag Tangri On Feb 20, 2014, at 6:57 PM, ch huang justlo...@gmail.com wrote: [original message with the full log, quoted above]
Re: issue about write append into hdfs ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: ch12:50010:DataXceiver error processing READ_BLOCK operation
One more question: if I need to increase the datanode xceiver value, do I need to add it to my NN config file?

On Fri, Feb 21, 2014 at 12:25 PM, Anurag Tangri anurag_tan...@yahoo.com wrote: Did you check your unix open file limit and datanode xceiver value? Is it too low for the number of blocks/data in your cluster? Thanks, Anurag Tangri
Re: issue about write append into hdfs ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: ch12:50010:DataXceiver error processing READ_BLOCK operation
I changed the config on all datanodes, adding dfs.datanode.max.xcievers with a value of 131072, and restarted all the DNs; it still does not help.

On Fri, Feb 21, 2014 at 12:25 PM, Anurag Tangri anurag_tan...@yahoo.com wrote: Did you check your unix open file limit and datanode xceiver value? Is it too low for the number of blocks/data in your cluster? Thanks, Anurag Tangri
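For what it is worth, dfs.datanode.max.xcievers is read by the DataNodes themselves, so it belongs in hdfs-site.xml on every DN rather than in the NN config. It can also help to confirm the limits the running DataNode process actually inherited, since a daemon started before a limits.conf change keeps its old limits. A minimal sketch, assuming a Linux /proc filesystem and a DataNode running as the hdfs user:

# the open-file limit a fresh hdfs login shell would get
su - hdfs -c 'ulimit -n'
# the limit the live DataNode JVM is actually running with
DN_PID=$(pgrep -f 'org.apache.hadoop.hdfs.server.datanode.DataNode' | head -1)
grep 'open files' /proc/$DN_PID/limits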
issue about writing append to a file, closing, and reopening the same file
hi, maillist: I use scribe to receive data from an app and write it into HDFS. When the system is under highly concurrent connections it causes HDFS errors like the following; incoming connections get blocked and tomcat dies. In the dir /user/hive/warehouse/dsp.db/request the file data_0 is rotated each hour, but scribe (we modified the scribe code) switches back to the same file when the rotation happens, so data_0 is closed and reopened. When the load is high I can observe corrupt replicas of data_0. How can I handle this? thanks

[Thu Feb 13 23:59:59 2014] [hdfs] disconnected fileSys for /user/hive/warehouse/dsp.db/request
[Thu Feb 13 23:59:59 2014] [hdfs] closing /user/hive/warehouse/dsp.db/request/2014-02-13/data_0
[Thu Feb 13 23:59:59 2014] [hdfs] disconnecting fileSys for /user/hive/warehouse/dsp.db/request/2014-02-13/data_0
[Thu Feb 13 23:59:59 2014] [hdfs] disconnected fileSys for /user/hive/warehouse/dsp.db/request/2014-02-13/data_0
[Thu Feb 13 23:59:59 2014] [hdfs] Connecting to HDFS for /user/hive/warehouse/dsp.db/request/2014-02-13/data_0
[Thu Feb 13 23:59:59 2014] [hdfs] opened for append /user/hive/warehouse/dsp.db/request/2014-02-13/data_0
[Thu Feb 13 23:59:59 2014] [dsp_request] Opened file /user/hive/warehouse/dsp.db/request/2014-02-13/data_0 for writing
[Thu Feb 13 23:59:59 2014] [dsp_request] 23:59 rotating file 2014-02-13/data old size 10027577955 max size 100
[Thu Feb 13 23:59:59 2014] [hdfs] Connecting to HDFS for /user/hive/warehouse/dsp.db/request
[Thu Feb 13 23:59:59 2014] [hdfs] disconnecting fileSys for /user/hive/warehouse/dsp.db/request
14/02/13 23:59:59 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink as 192.168.11.13:50010
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1117)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:992)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:494)
14/02/13 23:59:59 WARN hdfs.DFSClient: Error Recovery for block BP-1043055049-192.168.11.11-1382442676609:blk_433572108425800355_3411489 in pipeline 192.168.11.12:50010, 192.168.11.13:50010, 192.168.11.14:50010, 192.168.11.10:50010, 192.168.11.15:50010: bad datanode 192.168.11.13:50010
14/02/13 23:59:59 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink as 192.168.11.10:50010
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1117)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:992)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:494)
14/02/13 23:59:59 WARN hdfs.DFSClient: Error Recovery for block BP-1043055049-192.168.11.11-1382442676609:blk_433572108425800355_3411489 in pipeline 192.168.11.12:50010, 192.168.11.14:50010, 192.168.11.10:50010, 192.168.11.15:50010: bad datanode 192.168.11.10:50010
14/02/13 23:59:59 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink as 192.168.11.15:50010
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1117)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:992)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:494)
14/02/13 23:59:59 WARN hdfs.DFSClient: Error Recovery for block BP-1043055049-192.168.11.11-1382442676609:blk_433572108425800355_3411489 in pipeline 192.168.11.12:50010, 192.168.11.14:50010, 192.168.11.15:50010: bad datanode 192.168.11.15:50010
/user/hive/warehouse/dsp.db/request/2014-02-13/data_0: blk_433572108425800355_3411509 (replicas: l: 1 d: 0 c: 4 e: 0) 192.168.11.12:50010 : 192.168.11.13:50010(corrupt) : 192.168.11.14:50010(corrupt) : 192.168.11.10:50010(corrupt) : 192.168.11.15:50010(corrupt)
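When a specific file ends up with corrupt replicas after an append/reopen cycle, a first step is to ask the NameNode what it thinks of that file. A sketch using commands that appear elsewhere in this compilation; the hdfs debug recoverLease subcommand only exists in newer Hadoop releases, so treat that part as an assumption for this version:

# where the replicas of each block live, and which are corrupt
sudo -u hdfs hdfs fsck /user/hive/warehouse/dsp.db/request/2014-02-13/data_0 -files -blocks -locations
# newer releases only: force lease recovery if a writer died while holding the file open
hdfs debug recoverLease -path /user/hive/warehouse/dsp.db/request/2014-02-13/data_0 -retries 3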
what is the difference between live replicas, excess replicas and excess blocks?
hi, maillist: I am very confused by these three concepts, because I see no excess replicas in the metadata dump file, but I do see one excess block in the metrics output. Why?
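To compare the two views directly, the NameNode JMX servlet can be filtered down to the FSNamesystem bean, which is where the ExcessBlocks counter lives, and the metasave dump can be grepped for the per-block excess count. A sketch, assuming the NameNode HTTP port used elsewhere in this document:

curl -s 'http://NNIP:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' | grep -i -e excess -e corrupt
# metasave writes the named file under the NameNode log directory
sudo -u hdfs hdfs dfsadmin -metasave meta.txt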
Which NUMA policy is best for Hadoop processes?
hi, maillist: a NUMA-architecture CPU has several memory policies. I wonder whether anyone has tested them, and which one is the best?
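Hadoop itself has no NUMA switch; one thing people try with large JVM heaps on NUMA hardware is launching the daemon under an explicit numactl policy so the heap is interleaved across nodes instead of filling one node first. A sketch only, assuming numactl is installed and that daemons are started by hand with the stock scripts:

# inspect the NUMA layout first
numactl --hardware
# start the DataNode JVM with memory interleaved across all NUMA nodes
numactl --interleave=all hadoop-daemon.sh start datanode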
Re: hadoop reports a corrupt block but I cannot find any in the block metadata
And I still cannot find which block is corrupt. I searched for the keyword 'orrupt' and only got the /hbase/.corrupt dir, but it is a dir, not a corrupt block.

On Sat, Jan 25, 2014 at 6:31 PM, Shekhar Sharma shekhar2...@gmail.com wrote: Run the fsck command: hadoop fsck <path> -files -blocks -locations

On 25 Jan 2014 08:04, ch huang justlo...@gmail.com wrote: hi, maillist: this morning nagios alerted that hadoop has a corrupt block. I checked with hdfs dfsadmin -report, and from its output it did have corrupt blocks:

Configured Capacity: 53163259158528 (48.35 TB)
Present Capacity: 50117251458834 (45.58 TB)
DFS Remaining: 45289289015296 (41.19 TB)
DFS Used: 4827962443538 (4.39 TB)
DFS Used%: 9.63%
Under replicated blocks: 277
Blocks with corrupt replicas: 2
Missing blocks: 0

But when I dumped all the metadata with # sudo -u hdfs hdfs dfsadmin -metasave and looked for records whose c: count is not 0, I could not find any block with corrupt replicas. Why?
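fsck marks a block corrupt only when every replica is bad, while the dfsadmin/JMX counter tracks blocks with at least one corrupt replica, a distinction spelled out later in this compilation. Two queries worth trying, using the -list-corruptfileblocks flag that also appears later in this document:

# files with non-recoverable (all-replica) corruption
sudo -u hdfs hdfs fsck / -list-corruptfileblocks
# per-file replica locations for a suspect path
sudo -u hdfs hdfs fsck /hbase -files -blocks -locations | grep -i corrupt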
how to calculate the size of an HDFS directory?
hi, maillist: I want to calculate the size of an HDFS directory. How do I do it?
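The filesystem shell has du and count for this; a short sketch, with the directory path borrowed from an earlier message in this compilation:

# summary size of everything under the directory
hadoop fs -du -s /user/hive/warehouse/dsp.db/request
# directory count, file count, and content size in one line
hadoop fs -count /user/hive/warehouse/dsp.db/request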
how can I get job start and finish times?
hi, maillist: my web UI is not available. I can use yarn application -list; my question is how can I get job start and finish times?
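Without the web UI, the YARN CLI can print a per-application report, using an application id taken from yarn application -list (the id below is borrowed from another message in this compilation). On the releases I have checked the report includes start and finish times, but treat the exact fields as an assumption for this version:

yarn application -status application_1388730279827_2770
# once a job has finished, the MR JobHistory server REST API is another option, e.g.
# curl http://<jobhistory-host>:19888/ws/v1/history/mapreduce/jobs/<job-id>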
issue about how map output is assigned to reducers?
hi, maillist: I looked at the container logs via hadoop fs -cat /var/log/hadoop-yarn/apps/root/logs/application_1388730279827_2770/CHBM221_50853, and the log says it got 25 map outputs and assigned 7 to fetcher 5, 7 to fetcher 4 and 11 to fetcher 3. My question is: why not assign 8 to fetcher 5, 8 to fetcher 4 and 9 to fetcher 3?

2014-01-08 11:28:00,346 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1388730279827_2770_r_00_0: Got 25 new map-outputs
2014-01-08 11:28:00,348 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging CHBM223:8080 with 7 to fetcher#5
2014-01-08 11:28:00,349 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 7 of 7 to CHBM223:8080 to fetcher#5
2014-01-08 11:28:00,349 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging CHBM222:8080 with 7 to fetcher#4
2014-01-08 11:28:00,349 INFO [fetcher#4] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 7 of 7 to CHBM222:8080 to fetcher#4
2014-01-08 11:28:00,352 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging CHBM221:8080 with 11 to fetcher#3
2014-01-08 11:28:00,352 INFO [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 11 of 11 to CHBM221:8080 to fetcher#3

(From the log itself the split is per host rather than per count: all map outputs available on one host are handed to a single fetcher, so CHBM221's 11 outputs go to fetcher 3 together.)
Re: issue about running hive MR job in hadoop
Yes, I checked the code and found that the exception comes from lfs.mkdir(userFileCacheDir, null, false); I also found that when the AM is located on CHBM224 everything fails, but when the AM is located on CHBM223 everything succeeds.

On CHBM224:
# ls -l /data/mrlocal/1/yarn/
total 8
drwxrwxrwx 5 yarn yarn 4096 Nov 5 20:50 local
drwxr-xr-x 3 yarn yarn 4096 Jan 3 15:57 logs
# ls -l /data/mrlocal/2/yarn/
total 8
drwxrwxrwx 5 yarn yarn 4096 Nov 5 20:50 local
drwxr-xr-x 3 yarn yarn 4096 Jan 3 15:57 logs

On CHBM223:
# ls /data/mrlocal/1/yarn/ -l
total 8
drwxr-xr-x 5 yarn yarn 4096 Nov 5 20:51 local
drwxr-xr-x 3 yarn yarn 4096 Jan 3 15:46 logs
# ls /data/mrlocal/2/yarn/ -l
total 8
drwxr-xr-x 5 yarn yarn 4096 Nov 5 20:51 local
drwxr-xr-x 3 yarn yarn 4096 Jan 3 15:46 logs

I also found that if I keep the abnormal node (CHBM224) running and shut down the other, normal nodes, then when I submit an MR job via hive the mode of the dir /data/mrlocal/2/yarn/local/usercache/hive/filecache is flushed back to 710, even after I change it to 755. When I test on a normal node (one normal node up, the others shut down), the dir mode does not change.
# ls -l /data/mrlocal/2/yarn/local/usercache/hive/
total 16
drwx--x--- 7 yarn yarn 4096 Jan 3 16:30 appcache
drwx--x--- 148 yarn yarn 12288 Jan 3 10:03 filecache

On Fri, Jan 3, 2014 at 3:52 PM, Bing Jiang jiangbinglo...@gmail.com wrote: Could you check your yarn-local directory permissions? From the diagnosis, the error occurs at mkdir in a local directory. I guess something is wrong with a local directory that is set as a yarn local dir. -- Bing Jiang, weibo: http://weibo.com/jiangbinglover, BLOG: www.binospace.com
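Given that the failing node's local dirs carry different modes than the healthy node (drwxrwxrwx vs drwxr-xr-x) and that the NodeManager keeps resetting the usercache modes, a commonly suggested remedy is to stop the NodeManager, fix the top-level modes, and clear the usercache so it is rebuilt with correct ownership. A sketch only, with the paths taken from the listings above and the CDH service name as an assumption:

service hadoop-yarn-nodemanager stop
# make the yarn-local roots match the healthy node
chmod 755 /data/mrlocal/1/yarn/local /data/mrlocal/2/yarn/local
chown -R yarn:yarn /data/mrlocal/1/yarn /data/mrlocal/2/yarn
# let the NM recreate the per-user cache dirs from scratch
rm -rf /data/mrlocal/1/yarn/local/usercache/* /data/mrlocal/2/yarn/local/usercache/*
service hadoop-yarn-nodemanager start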
issue about running hive MR job in hadoop
hi, I submit an MR job through hive, but when it runs stage-2 it fails. Why? It seems to be a permission problem, but I do not know which dir causes it.

Application application_1388730279827_0035 failed 1 times due to AM Container for appattempt_1388730279827_0035_01 exited with exitCode: -1000 due to: EPERM: Operation not permitted
  at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
  at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:581)
  at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:388)
  at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1041)
  at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:150)
  at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:190)
  at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:698)
  at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:695)
  at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
  at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:695)
  at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.initDirs(ContainerLocalizer.java:385)
  at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:130)
  at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:103)
  at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:861)
.Failing this attempt.. Failing the application.
issue about hadoop streaming
hi, maillist: I read the doc about hadoop streaming. Is it possible to build a job chain through a pipeline and hadoop streaming? If the first job looks like this:

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -input /alex/messages -output /alex/stout4 -mapper /bin/cat -reducer /tmp/mycount.pl -file /tmp/mycount.pl

then I want the first job's output to become the second job's input. If that is possible, how do I do it? thanks!
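Streaming jobs are ordinary commands that exit non-zero on failure, so a shell && chain is enough to feed one job's output directory into the next. A sketch reusing the first job above; the second reducer script and output path are hypothetical:

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -input /alex/messages -output /alex/stout4 \
    -mapper /bin/cat -reducer /tmp/mycount.pl -file /tmp/mycount.pl \
&& hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -input /alex/stout4 -output /alex/stout5 \
    -mapper /bin/cat -reducer /tmp/second_step.pl -file /tmp/second_step.pl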
why does an empty file occupy one split?
hi, maillist: I read the code of FileInputFormat and found that it makes a split for an empty file. I think there is no point in doing so, and it causes the MR framework to create an extra map task. Can anyone explain?
what are MPP and HAWQ, and what is the relation between them and hadoop?
hi, maillist: as the title says.
Re: issue about no class found when running an MR job
I think it is an official usage; you can type hadoop and read the last line of the help output. I use CDH 4.4; I do not know whether the community version supports this usage.

On Sat, Dec 14, 2013 at 2:27 AM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: That is not the correct usage. You should do hadoop jar your-jar-name main-class-name. Or, if you are adventurous, directly invoke your class using java and setting an appropriate classpath. Thanks, +Vinod

On Dec 12, 2013, at 6:11 PM, ch huang justlo...@gmail.com wrote: hadoop ../test/WordCount
issue about no class found when running an MR job
hi, maillist: I rewrote WordCount.java and tried to compile and run it, but it says it cannot find the main class. Why?

[root@CHBM224 myMR]# cat WordCount.java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.util.Tool;

public class WordCount extends Configured implements Tool {

  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  private static void usage() throws IOException {
    System.err.println("teragen <num rows> <output dir>");
  }

  public int run(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
    Job job = Job.getInstance(getConf());
    if (args.length != 2) {
      usage();
      return 2;
    }
    job.setJobName("wordcount");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new Configuration(), new WordCount(), args);
    System.exit(res);
  }
}

[root@CHBM224 myMR]# javac -cp '/usr/lib/hadoop/*:/usr/lib/hadoop-mapreduce/*' -d ../test WordCount.java
[root@CHBM224 myMR]# hadoop ../test/WordCount
Error: Could not find or load main class ...test.WordCount
Re: issue about no class found when running an MR job
No, it does not need that; hadoop can run a class directly. I tried on another box and it works fine.

# hadoop com/test/demo/WordCount
Error: Could not find or load main class com.test.demo.WordCount
[root@CHBM224 test]# hadoop classpath
/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*
[root@CHBM224 test]# echo $CLASSPATH
.:/usr/java/jdk1.7.0_25/lib/dt.jar:/usr/java/jdk1.7.0_25/lib/tools.jar

--- copy the class directory to another box, and it works fine:

# cd test/
[root@CH22 test]# hadoop com/test/demo/WordCount
teragen <num rows> <output dir>
[root@CH22 test]# hadoop classpath
/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*::/usr/lib/hadoop/lib:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*
[root@CH22 test]# echo $CLASSPATH
.:/usr/java/jdk1.6.0_35/lib/dt.jar:/usr/java/jdk1.6.0_35/lib/tools.jar

On Fri, Dec 13, 2013 at 10:17 AM, Tao Xiao xiaotao.cs@gmail.com wrote: How did you package and compile your jar? Did you specify the main class for the JAR file you generated?
Re: issue about no class found when running an MR job
I found the reason the main class is not found. On the broken box:

# hadoop classpath
/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:/usr/lib/hadoop/lib

and on the normal box:

/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*::/usr/lib/hadoop/lib:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*

There is a "::" in the normal one :) I just did not know what it means. (An empty entry in a Java classpath is interpreted as the current working directory, so on the box with "::" the class files under ./com/test/demo are picked up, and on the other box they are not.)
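For reference, two ways to make the class resolve regardless of what the generated classpath looks like; the directory and jar names here are hypothetical:

# option 1: put the compiled classes on the hadoop classpath explicitly
export HADOOP_CLASSPATH=/root/test
hadoop com.test.demo.WordCount <input> <output>
# option 2: package a jar and use the documented invocation
jar cf wordcount.jar -C /root/test .
hadoop jar wordcount.jar com.test.demo.WordCount <input> <output>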
issue about file in DN datadir
hi, maillist: I have a question about the files that represent a block on a DN. Here is how I look for a block: I have a file part-m-0, and I find that one replica of the block blk_-5451264646515882190_106793 is on box 192.168.10.224, but when I search the datadir on 224 I find only the meta file and no data file. Why?

# sudo -u hdfs hdfs fsck /alex/terasort/10G-input/part-m-0 -files -blocks -locations
Connecting to namenode via http://CHBM220:50070
FSCK started by hdfs (auth:SIMPLE) from /192.168.10.224 for path /alex/terasort/10G-input/part-m-0 at Wed Dec 11 14:45:15 CST 2013
/alex/terasort/10G-input/part-m-0 5 bytes, 8 block(s): OK
0. BP-50684181-192.168.10.220-1383638483950:blk_1612709339511818235_106786 len=67108864 repl=3 [192.168.10.222:50010, 192.168.10.221:50010, 192.168.10.223:50010]
1. BP-50684181-192.168.10.220-1383638483950:blk_-3802055733518151718_106789 len=67108864 repl=3 [192.168.10.222:50010, 192.168.10.221:50010, 192.168.10.223:50010]
2. BP-50684181-192.168.10.220-1383638483950:blk_-1672420361561559829_106791 len=67108864 repl=3 [192.168.10.222:50010, 192.168.10.224:50010, 192.168.10.223:50010]
3. BP-50684181-192.168.10.220-1383638483950:blk_-5451264646515882190_106793 len=67108864 repl=3 [192.168.10.222:50010, 192.168.10.221:50010, 192.168.10.224:50010]
4. BP-50684181-192.168.10.220-1383638483950:blk_6624597853174216221_106795 len=67108864 repl=3 [192.168.10.222:50010, 192.168.10.224:50010, 192.168.10.221:50010]
5. BP-50684181-192.168.10.220-1383638483950:blk_-4947775334639504308_106797 len=67108864 repl=3 [192.168.10.222:50010, 192.168.10.224:50010, 192.168.10.223:50010]
6. BP-50684181-192.168.10.220-1383638483950:blk_214751650269427943_106799 len=67108864 repl=3 [192.168.10.222:50010, 192.168.10.221:50010, 192.168.10.224:50010]

# find /data -name 'blk_-5451264646515882190_106793*'
/data/dataspace/3/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/subdir39/blk_-5451264646515882190_106793.meta
# ls /data/dataspace/3/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/subdir39/
blk_3810334848964580951  blk_4621466474283145207_106809.meta  blk_-5451264646515882190  blk_580162309124277323_106788.meta
blk_3810334848964580951_106801.meta  blk_516060569193828059  blk_-5451264646515882190_106793.meta  blk_4621466474283145207
blk_516060569193828059_106796.meta  blk_580162309124277323
Re: issue about corrupt block test
You are right, but I only find the meta file; why is there no block data file?

# find /data -name 'blk_-5451264646515882190_106793*'
/data/dataspace/3/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/subdir39/blk_-5451264646515882190_106793.meta
# ls /data/dataspace/3/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/subdir39/
blk_3810334848964580951  blk_4621466474283145207_106809.meta  blk_-5451264646515882190  blk_580162309124277323_106788.meta
blk_3810334848964580951_106801.meta  blk_516060569193828059  blk_-5451264646515882190_106793.meta  blk_4621466474283145207
blk_516060569193828059_106796.meta  blk_580162309124277323

On Wed, Dec 11, 2013 at 3:16 PM, Harsh J ha...@cloudera.com wrote: Block files are not stored in a flat directory (to avoid FS limits on the max files under a dir). Instead of looking for them right under finalized, issue a find query with the pattern and you should be able to spot it.

On Wed, Dec 11, 2013 at 9:10 AM, ch huang justlo...@gmail.com wrote: hi, maillist: I am trying to corrupt a block of a file in my benchmark environment. With the following commands I find blk_2504407693800874616_106252; its replica on 192.168.10.224 is my target, but searching all the datadirs on 192.168.10.224 I cannot find the data file belonging to this replica. Why?

# ls /data/dataspace/1/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_3717620888497075523_106232*
ls: cannot access /data/dataspace/1/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_3717620888497075523_106232*: No such file or directory
[root@CHBM224 conf]# ls /data/dataspace/1/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_2504407693800874616_106252*
ls: cannot access /data/dataspace/1/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_2504407693800874616_106252*: No such file or directory
[root@CHBM224 conf]# ls /data/dataspace/2/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_2504407693800874616_106252*
ls: cannot access /data/dataspace/2/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_2504407693800874616_106252*: No such file or directory
[root@CHBM224 conf]# ls /data/dataspace/3/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_2504407693800874616_106252*
ls: cannot access /data/dataspace/3/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_2504407693800874616_106252*: No such file or directory
[root@CHBM224 conf]# hdfs fsck /alex/terasort/1G-input/part-m-0 -files -blocks -locations
Connecting to namenode via http://CHBM220:50070
FSCK started by root (auth:SIMPLE) from /192.168.10.224 for path /alex/terasort/1G-input/part-m-0 at Wed Dec 11 11:35:42 CST 2013
/alex/terasort/1G-input/part-m-0 1 bytes, 2 block(s): OK
0. BP-50684181-192.168.10.220-1383638483950:blk_3717620888497075523_106232 len=67108864 repl=3 [192.168.10.222:50010, 192.168.10.223:50010, 192.168.10.221:50010]
1. BP-50684181-192.168.10.220-1383638483950:blk_2504407693800874616_106252 len=32891136 repl=3 [192.168.10.222:50010, 192.168.10.221:50010, 192.168.10.224:50010]
-- Harsh J
Re: how to handle the corrupt block in HDFS?
The alert is from my production env; I will test on my benchmark env, thanks.

On Thu, Dec 12, 2013 at 2:33 AM, Adam Kawa kawa.a...@gmail.com wrote: I have only a 1-node cluster, so I am not able to verify it when the replication factor is bigger than 1. I ran fsck on a file that consists of 3 blocks, where 1 block has a corrupt replica. fsck said the system is HEALTHY. When I restarted the DN, the block scanner (BlockPoolSliceScanner) started and detected the corrupted replica. Then I ran fsck again on that file, and it told me the system is CORRUPT. If you have a small (and non-production) cluster, can you restart your datanodes and run fsck again?

2013/12/11 ch huang justlo...@gmail.com: thanks for the reply, but if a block has just 1 corrupt replica, hdfs fsck cannot tell you which block of which file has the corrupted replica; fsck is only useful when all of a block's replicas are bad.

On Wed, Dec 11, 2013 at 10:01 AM, Adam Kawa kawa.a...@gmail.com wrote: When you identify a file with corrupt block(s), you can locate the machines that store its blocks by typing $ sudo -u hdfs hdfs fsck <path-to-file> -files -blocks -locations

2013/12/11 Adam Kawa kawa.a...@gmail.com: Maybe this can work for you: $ sudo -u hdfs hdfs fsck / -list-corruptfileblocks ?

2013/12/11 ch huang justlo...@gmail.com: thanks for the reply. What I do not know is how to locate the block that has the corrupt replica (so I can observe how long it takes for the corrupt replica to be removed and replaced by a healthy one; I have been getting the nagios alert for three days, I am not sure whether it is the same corrupt replica causing the alert each time, and I do not know the interval at which HDFS checks for corrupt replicas and cleans them up).

On Tue, Dec 10, 2013 at 6:20 PM, Vinayakumar B vinayakuma...@huawei.com wrote: Hi ch huang, It may seem strange, but the fact is, CorruptBlocks through JMX means "Number of blocks with corrupt replicas", which does not mean all replicas are corrupt (you can check the description through jconsole). Whereas "Corrupt blocks" through fsck means blocks with all replicas corrupt (non-recoverable) or missing. In your case, maybe one replica of the block is corrupt, not all replicas of the same block. The corrupt replica will be deleted automatically if one more datanode is available in your cluster and the block is replicated to it. Regarding replication 10: as Peter Marron said, some of the important files of a mapreduce job are written with replication 10, to make them accessible faster and launch map tasks faster. If the job succeeds, these files are deleted automatically; I think only in some cases, if jobs are killed in between, do these files remain in hdfs showing under-replicated blocks. Thanks and Regards, Vinayakumar B

From: Peter Marron [mailto:peter.mar...@trilliumsoftware.com] Sent: 10 December 2013 14:19 To: user@hadoop.apache.org Subject: RE: how to handle the corrupt block in HDFS?

Hi, I am sure that there are others who will answer this better, but anyway. The default replication level for files in HDFS is 3, and so most files that you see will have a replication level of 3. However, when you run a Map/Reduce job the system knows in advance that every node will need a copy of certain files, specifically the job.xml and the various jars containing classes needed to run the mappers and reducers. So the system arranges for some of these files to have a higher replication level, which increases the chances that a copy will be found locally. By default this higher replication level is 10. This can seem a little odd on a cluster where you only have, say, 3 nodes, because it means that you will almost always have some blocks marked under-replicated. I think there was some discussion a while back about changing the replication level to something like min(10, #number of nodes); as I recall, the general consensus was that this was extra complexity that wasn't really worth it. If it ain't broke... Hope that this helps. Peter Marron, Senior Developer, Research Development, Trillium Software

From: ch huang [mailto:justlo...@gmail.com] Sent: 10 December 2013 01:21 To: user@hadoop.apache.org Subject: Re: how to handle the corrupt block in HDFS? more strange: in my HDFS cluster every block has three replicas, but I find some with ten replicas. Why?
# sudo -u hdfs hadoop fs -ls /data/hisstage/helen/.staging/job_1385542328307_0915
Found 5
Re: how to handle the corrupt block in HDFS?
And is the fsck report data from the BlockPoolSliceScanner? It seems to run once every 3 weeks. Can I restart the DNs one by one without interrupting the jobs that are running?
Re: issue about Shuffled Maps in MR job summary
hi, suppose I have a 5-worker-node cluster where each worker node can allocate 40G of memory. I do not care about the map tasks, because the map tasks in my job finish within half a minute; from my observation the really slow part is the reduce. I allocate 12G to each reduce task, so each worker node can run 3 reduces in parallel and the whole cluster can support 15 reducers, and I run the job with all 15. What I do not know is whether increasing the reducer count from 15 to 30, with 6G of memory per reduce, will speed the job up or not. The job runs on my production env; it has been running for nearly 1 week and still has not finished.

On Wed, Dec 11, 2013 at 9:50 PM, java8964 java8...@hotmail.com wrote: The whole job completion time depends on a lot of factors. Are you sure the reducer part is the bottleneck? It also depends on how many reducer input groups your MR job has. If you only have 20 reducer groups, then even if you bump your reducer count to 40, the reducer phase won't change much, as the additional 20 reducer tasks won't get any data to process. If you have a lot of reducer input groups, your cluster has capacity at this time, and you also have a lot of idle reducer slots, then increasing your reducer count should decrease your whole job completion time. Make sense? Yong

Date: Wed, 11 Dec 2013 14:20:24 +0800 From: justlo...@gmail.com: I read the doc and found that if I have 8 reducers, a map task will output 8 partitions, and each partition will be sent to a different reducer. So if I increase the reducer number the partition count increases but the volume of network traffic stays the same; why does increasing the reducer number sometimes not decrease the job completion time?

On Wed, Dec 11, 2013 at 1:48 PM, Vinayakumar B vinayakuma...@huawei.com wrote: It looks simple :) Shuffled Maps = Number of Map Tasks * Number of Reducers. Thanks and Regards, Vinayakumar B

From: ch huang [mailto:justlo...@gmail.com] Sent: 11 December 2013 10:56 Subject: issue about Shuffled Maps in MR job summary: hi, maillist: I ran terasort with 16 reducers and with 8 reducers; when I doubled the reducer number, the Shuffled Maps count also doubled. My question is: the job only runs 20 map tasks (the total input is 10 files, each 100M; my block size is 64M, so there are 20 splits), so why do I shuffle 160 map outputs in the 8-reducer run and 320 in the 16-reducer run? How is the shuffled maps number calculated? 16 reducer summary output: Shuffled Maps = 320; 8 reducer summary output: Shuffled Maps = 160
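If the 30-reducer experiment is worth trying, the count and per-reduce memory can be overridden per job on the command line for Tool-based jobs; a hypothetical invocation, with the heap flag kept below the container size as an assumption:

hadoop jar your-job.jar YourMainClass \
    -Dmapreduce.job.reduces=30 \
    -Dmapreduce.reduce.memory.mb=6144 \
    -Dmapreduce.reduce.java.opts=-Xmx5120m \
    <input> <output>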
how to corrupt a replica of a block manually?
hi, maillist: what is the simplest way to corrupt one replica of a block? I opened a replica data file and deleted a line, then ran fsck; nothing is reported corrupt. Should the DN be restarted?
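A replica is only noticed as corrupt when a checksum is verified, that is, when the replica is read by a client or visited by the DN block scanner (which is why restarting the DN, which kicks off the scanner, makes it show up, as discussed later in this compilation). A sketch with a hypothetical replica path; overwriting bytes in place keeps the length unchanged but breaks the checksum:

# flip a few bytes in the middle of the replica file (hypothetical path)
dd if=/dev/zero of=<path-to-replica-blk-file> bs=1 count=16 seek=4096 conv=notrunc
# force a client read of the owning file; checksum verification will flag the replica
hadoop fs -cat <path-to-hdfs-file> > /dev/null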
does mapreduce.task.io.sort.mb control both the map merge buffer and the reduce merge buffer?
hi, maillist: due to the heavy load on the reduce tasks, I am trying to increase the buffer size for the sort merge. I wonder: if I increase mapreduce.task.io.sort.mb from 100m (the default value) to 1G, will each map task's sort-merge buffer also become 1G?
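A hedged sketch of the knobs involved, under the assumption (consistent with the MRv2 property names) that io.sort.mb sizes only the map-side sort buffer, while reduce-side merge memory is steered by the shuffle buffer fractions:

# map-side: in-memory sort buffer used while map output is collected
-Dmapreduce.task.io.sort.mb=1024
# reduce-side: fraction of the reducer heap holding shuffled map outputs,
# and the usage threshold at which the in-memory merge starts
-Dmapreduce.reduce.shuffle.input.buffer.percent=0.70
-Dmapreduce.reduce.shuffle.merge.percent=0.66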
Re: issue about Shuffled Maps in MR job summary
One of the important things is that my input files are very small (each file is less than 10M) and I have a huge number of files.

On Thu, Dec 12, 2013 at 9:58 AM, java8964 java8...@hotmail.com wrote: Assume the block size is 128M and each of your mappers finishes within half a minute; then there is not much logic in your mappers, since each processes about 128M in 30 seconds. If your reducers cannot finish within 1 week, then something is wrong. You need to find out the following: 1) How many mappers were generated in your MR job? 2) Have they all finished? (Check them in the jobtracker through the web or the command line.) 3) How many reducers does this job have? 4) Have the reducers started? What stage are they in: Copying, Sorting or Reducing? 5) If in the reducing stage, check the userlogs of the reducers: is your code running now? All this information you can find in the Job Tracker web UI. Yong
Re: issue about running an example job with a custom mapreduce variable
Yes, you are right, thanks.

On Wed, Dec 11, 2013 at 7:16 AM, Adam Kawa kawa.a...@gmail.com wrote: Accidentally, I clicked Send by mistake. Please try: hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort -Dmapreduce.job.reduces=34 /alex/terasort/1G-input /alex/terasort/1G-output

2013/12/11 Adam Kawa kawa.a...@gmail.com: Please try

2013/12/10 ch huang justlo...@gmail.com: hi, maillist: I try to assign the reduce number on the command line, but it seems to have no effect. I run terasort like this:
# hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort /alex/terasort/1G-input /alex/terasort/1G-output -Dmapreduce.job.reduces=34
The default in mapred-site.xml assigns 16 reducers; I try to run with 34 reducers, but the job still runs with 16. Why? Here is some output:
Job Counters
Launched map tasks=1
Launched reduce tasks=16
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=2318
Total time spent by all reduces in occupied slots (ms)=99714
Re: how to handle the corrupt block in HDFS?
Regarding "By default this higher replication level is 10": can this value be controlled via some option or variable? I only have a 5-worker-node cluster, and I think 5 replicas would be better, because then every node can get a local replica. Another question: why does hdfs fsck report the cluster as healthy with no corrupt blocks, while I see one corrupt block when checking the NN metrics with curl http://NNIP:50070/jmx ? thanks On Tue, Dec 10, 2013 at 4:48 PM, Peter Marron peter.mar...@trilliumsoftware.com wrote: Hi, I am sure that there are others who will answer this better, but anyway. The default replication level for files in HDFS is 3 and so most files that you see will have a replication level of 3. However when you run a Map/Reduce job the system knows in advance that every node will need a copy of certain files. Specifically the job.xml and the various jars containing classes that will be needed to run the mappers and reducers. So the system arranges that some of these files have a higher replication level. This increases the chances that a copy will be found locally. By default this higher replication level is 10. This can seem a little odd on a cluster where you only have, say, 3 nodes, because it means that you will almost always have some blocks that are marked under-replicated. I think that there was some discussion a while back to change this to make the replication level something like min(10, #number of nodes). However, as I recall, the general consensus was that this was extra complexity that wasn't really worth it. If it ain't broke… Hope that this helps. *Peter Marron* Senior Developer, Research Development Office: +44 (0) 118-940-7609 peter.mar...@trilliumsoftware.com Theale Court First Floor, 11-13 High Street, Theale, RG7 5AH, UK *From:* ch huang [mailto:justlo...@gmail.com] *Sent:* 10 December 2013 01:21 *To:* user@hadoop.apache.org *Subject:* Re: how to handle the corrupt block in HDFS? more strange: in my HDFS cluster every block has three replicas, but I find that some have ten replicas. Why? # sudo -u hdfs hadoop fs -ls /data/hisstage/helen/.staging/job_1385542328307_0915 Found 5 items -rw-r--r-- 3 helen hadoop 7 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/appTokens -rw-r--r-- 10 helen hadoop 2977839 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/job.jar -rw-r--r-- 10 helen hadoop 3696 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/job.split On Tue, Dec 10, 2013 at 9:15 AM, ch huang justlo...@gmail.com wrote: the strange thing is that when I use the following command I find 1 corrupt block # curl -s http://ch11:50070/jmx |grep orrupt CorruptBlocks : 1, but when I run hdfs fsck / I get none; everything seems fine # sudo -u hdfs hdfs fsck / Status: HEALTHY Total size: 1479728140875 B (Total open files size: 1677721600 B) Total dirs: 21298 Total files: 100636 (Files currently being written: 25) Total blocks (validated): 119788 (avg.
block size 12352891 B) (Total open file blocks (not validated): 37) Minimally replicated blocks: 119788 (100.0 %) Over-replicated blocks: 0 (0.0 %) Under-replicated blocks: 166 (0.13857816 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor: 3 Average block replication: 3.0027633 Corrupt blocks: 0 Missing replicas: 831 (0.23049656 %) Number of data-nodes: 5 Number of racks: 1 FSCK ended at Tue Dec 10 09:14:48 CST 2013 in 3276 milliseconds The filesystem under path '/' is HEALTHY On Tue, Dec 10, 2013 at 8:32 AM, ch huang justlo...@gmail.com wrote: hi,maillist: my nagios alerts me all day that there is a corrupt block in HDFS, but I do not know how to remove it. Will HDFS handle this automatically? And will removing the corrupt block cause any data loss? thanks
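The job-file replication level asked about above is configurable; a sketch for mapred-site.xml, assuming the MR2 property name (the MR1 equivalent is mapred.submit.replication; both default to 10):

    <property>
      <name>mapreduce.client.submit.replication</name>
      <value>5</value>
    </property>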
Re: how to handle the corrupt block in HDFS?
thanks for the reply. What I do not know is how I can locate the block that has the corrupt replica (so I can observe how long it takes for the corrupt replica to be removed and a healthy replica to replace it; I have been getting the nagios alert for three days, I am not sure whether the same corrupt replica is causing the alert, and I do not know at what interval HDFS checks for corrupt replicas and cleans them up) On Tue, Dec 10, 2013 at 6:20 PM, Vinayakumar B vinayakuma...@huawei.com wrote: Hi ch huang, It may seem strange, but the fact is, *CorruptBlocks* through JMX means *"Number of blocks with corrupt replicas". Not all replicas are necessarily corrupt. *You can check the description through jconsole. Whereas *Corrupt blocks* through fsck means *blocks with all replicas corrupt (non-recoverable) or missing.* In your case, probably one replica of a block is corrupt, not all replicas of the same block. This corrupt replica will be deleted automatically if one more datanode is available in your cluster and the block is replicated to it. Regarding replication 10: as Peter Marron said, *some of the important files of a mapreduce job are given a replication of 10, to make them accessible faster and to launch map tasks faster. * Anyway, if the job succeeds these files are deleted automatically. I think only in some cases, if jobs are killed in between, these files will remain in hdfs, showing under-replicated blocks. Thanks and Regards, Vinayakumar B
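To watch just this metric between nagios alerts, the NameNode JMX servlet also accepts a qry parameter to select one bean (host name reused from the commands in this thread):

    # query only the FSNamesystem bean and pick out the corrupt-block counters
    curl -s 'http://ch11:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' | grep -i corrupt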
Re: how to handle the corrupt block in HDFS?
thanks for the reply, but if a block has just 1 corrupt replica, hdfs fsck cannot tell you which block of which file has the corrupted replica; fsck is only useful when all of a block's replicas are bad On Wed, Dec 11, 2013 at 10:01 AM, Adam Kawa kawa.a...@gmail.com wrote: When you identify a file with corrupt block(s), you can locate the machines that store its blocks by typing $ sudo -u hdfs hdfs fsck path-to-file -files -blocks -locations 2013/12/11 Adam Kawa kawa.a...@gmail.com Maybe this can work for you: $ sudo -u hdfs hdfs fsck / -list-corruptfileblocks ? 2013/12/11 ch huang justlo...@gmail.com thanks for the reply. What I do not know is how I can locate the block that has the corrupt replica (so I can observe how long it takes for the corrupt replica to be removed and a healthy replica to replace it; I have been getting the nagios alert for three days, I am not sure whether the same corrupt replica is causing the alert, and I do not know at what interval HDFS checks for corrupt replicas and cleans them up)
issue about corrupt block test
hi,maillist: I am trying to corrupt a block of a file in my benchmark environment. With the following commands I found blk_2504407693800874616_106252; its replica on 192.168.10.224 is my target, but I searched all the data dirs on 192.168.10.224 and cannot find the data file that belongs to this replica. Why? # ls /data/dataspace/1/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_3717620888497075523_106232* ls: cannot access /data/dataspace/1/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_3717620888497075523_106232*: No such file or directory [root@CHBM224 conf]# ls /data/dataspace/1/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_2504407693800874616_106252* ls: cannot access /data/dataspace/1/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_2504407693800874616_106252*: No such file or directory [root@CHBM224 conf]# ls /data/dataspace/2/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_2504407693800874616_106252* ls: cannot access /data/dataspace/2/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_2504407693800874616_106252*: No such file or directory [root@CHBM224 conf]# ls /data/dataspace/3/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_2504407693800874616_106252* ls: cannot access /data/dataspace/3/current/BP-50684181-192.168.10.220-1383638483950/current/finalized/blk_2504407693800874616_106252*: No such file or directory [root@CHBM224 conf]# hdfs fsck /alex/terasort/1G-input/part-m-0 -files -blocks -locations Connecting to namenode via http://CHBM220:50070 FSCK started by root (auth:SIMPLE) from /192.168.10.224 for path /alex/terasort/1G-input/part-m-0 at Wed Dec 11 11:35:42 CST 2013 /alex/terasort/1G-input/part-m-0 1 bytes, 2 block(s): OK 0. BP-50684181-192.168.10.220-1383638483950:blk_3717620888497075523_106232 len=67108864 repl=3 [192.168.10.222:50010, 192.168.10.223:50010, 192.168.10.221:50010] 1. BP-50684181-192.168.10.220-1383638483950:blk_2504407693800874616_106252 len=32891136 repl=3 [192.168.10.222:50010, 192.168.10.221:50010, 192.168.10.224:50010]
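Two things worth checking here (inferred from the DataNode on-disk layout, not stated in the thread): the block may live in a nested subdir directory inside finalized/ rather than directly in it, and the block data file on disk is named blk_2504407693800874616 without the _106252 generation-stamp suffix (only the .meta file carries it), so the ls pattern above can miss it. Searching the whole tree avoids both issues:

    # search every data dir recursively for the block id
    find /data/dataspace -name 'blk_2504407693800874616*'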
issue about Shuffled Maps in MR job summary
hi,maillist: I run terasort with 16 reducers and with 8 reducers, and when I double the reducer count, the Shuffled Maps counter also doubles. My question: the job only runs 20 map tasks (there are 10 input files in total, each file is 100M, and my block size is 64M, so there are 20 splits), so why do I need to shuffle 160 maps in the 8-reducer run and 320 maps in the 16-reducer run? How is the Shuffled Maps number calculated? 16 reducer summary output: Shuffled Maps =320 8 reducer summary output: Shuffled Maps =160
Re: issue about Shuffled Maps in MR job summary
I read the doc and found that if I have 8 reducers, a map task will output 8 partitions and each partition will be sent to a different reducer. So if I increase the reducer count, the partition count increases, but the volume of network traffic stays the same; why does increasing the reducer count sometimes not decrease the job completion time? On Wed, Dec 11, 2013 at 1:48 PM, Vinayakumar B vinayakuma...@huawei.com wrote: It looks simple, J Shuffled Maps = Number of Map Tasks * Number of Reducers Thanks and Regards, Vinayakumar B *From:* ch huang [mailto:justlo...@gmail.com] *Sent:* 11 December 2013 10:56 *To:* user@hadoop.apache.org *Subject:* issue about Shuffled Maps in MR job summary hi,maillist: I run terasort with 16 reducers and with 8 reducers, and when I double the reducer count, the Shuffled Maps counter also doubles. My question: the job only runs 20 map tasks (there are 10 input files in total, each file is 100M, and my block size is 64M, so there are 20 splits), so why do I need to shuffle 160 maps in the 8-reducer run and 320 maps in the 16-reducer run? How is the Shuffled Maps number calculated? 16 reducer summary output: Shuffled Maps =320 8 reducer summary output: Shuffled Maps =160
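Plugging the numbers from this thread into that formula as a quick check: 20 map tasks * 8 reducers = 160, and 20 * 16 = 320, matching both job summaries. The counter counts map outputs fetched per reducer (every reducer pulls one partition from every map), so it grows with the product of the two task counts even though the total number of shuffled bytes stays roughly constant.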
Re: how to handle the corrupt block in HDFS?
the strange thing is that when I use the following command I find 1 corrupt block # curl -s http://ch11:50070/jmx |grep orrupt CorruptBlocks : 1, but when I run hdfs fsck / I get none; everything seems fine # sudo -u hdfs hdfs fsck / Status: HEALTHY Total size: 1479728140875 B (Total open files size: 1677721600 B) Total dirs: 21298 Total files: 100636 (Files currently being written: 25) Total blocks (validated): 119788 (avg. block size 12352891 B) (Total open file blocks (not validated): 37) Minimally replicated blocks: 119788 (100.0 %) Over-replicated blocks: 0 (0.0 %) Under-replicated blocks: 166 (0.13857816 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor: 3 Average block replication: 3.0027633 Corrupt blocks: 0 Missing replicas: 831 (0.23049656 %) Number of data-nodes: 5 Number of racks: 1 FSCK ended at Tue Dec 10 09:14:48 CST 2013 in 3276 milliseconds The filesystem under path '/' is HEALTHY On Tue, Dec 10, 2013 at 8:32 AM, ch huang justlo...@gmail.com wrote: hi,maillist: my nagios alerts me all day that there is a corrupt block in HDFS, but I do not know how to remove it. Will HDFS handle this automatically? And will removing the corrupt block cause any data loss? thanks
Re: how to handle the corrupt block in HDFS?
more strange: in my HDFS cluster every block has three replicas, but I find that some have ten replicas. Why? # sudo -u hdfs hadoop fs -ls /data/hisstage/helen/.staging/job_1385542328307_0915 Found 5 items -rw-r--r-- 3 helen hadoop 7 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/appTokens -rw-r--r-- 10 helen hadoop 2977839 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/job.jar -rw-r--r-- 10 helen hadoop 3696 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/job.split
issue about running an example job with custom MapReduce variables
hi,maillist: I tried to set the reducer count on the command line, but it seems to have no effect. I run terasort like this: # hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort /alex/terasort/1G-input /alex/terasort/1G-output -Dmapreduce.job.reduces=34 The default reducer count in mapred-site.xml is 16; I tried to run with 34 reducers, but the job still ran with 16 reducers. Why? Here is some output: Job Counters Launched map tasks=1 Launched reduce tasks=16 Rack-local map tasks=1 Total time spent by all maps in occupied slots (ms)=2318 Total time spent by all reduces in occupied slots (ms)=99714
how many job requests can be queued when the first MR job is blocked due to lack of resources?
hi,maillist: are there any variables that can control this?
issue about terasort reading the partition file from the local FS instead of HDFS
hi,maillist: I am trying to use terasort to benchmark my cluster. When I run it, I found that terasort tries to read the partition file from the local filesystem, not HDFS. I can see the partition file in HDFS; when I copy this file to the local filesystem and run terasort again, it works fine, but it runs on the local host instead of on the cluster. Why? And how can I make it run on the cluster? # hadoop fs -ls /alex/terasort/1G-input Found 3 items -rw-r--r-- 3 root hadoop 0 2013-12-05 14:34 /alex/terasort/1G-input/_SUCCESS -rw-r--r-- 3 root hadoop 129 2013-12-06 10:20 /alex/terasort/1G-input/_partition.lst -rw-r--r-- 3 root hadoop 10 2013-12-05 14:34 /alex/terasort/1G-input/part-0
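A job that reads local paths and runs on the local host usually means the client fell back to its built-in defaults (file:/// and the local job runner), typically because core-site.xml and mapred-site.xml were not on the client's classpath. A sketch of the two settings to verify; the NameNode host is taken from earlier threads and the port is the usual default, both assumptions:

    <!-- core-site.xml -->
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://CHBM220:8020</value>
    </property>

    <!-- mapred-site.xml (MR2/YARN) -->
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>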
Re: how many job requests can be queued when the first MR job is blocked due to lack of resources?
I searched the code; only the src/hadoop-mapreduce1-project/src/contrib/capacity-scheduler/src/test/org/apache/hadoop/mapred/TestCapacitySchedulerConf.java file contains this variable. I did not see it in ./src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java On Fri, Dec 6, 2013 at 10:44 AM, rtejac rte...@gmail.com wrote: You can take a look at this parameter. It controls the number of jobs a user can initialize. *mapred.capacity-scheduler.queue.default.maximum-initialized-jobs-per-user = …. * On Dec 5, 2013, at 5:33 PM, ch huang justlo...@gmail.com wrote: hi,maillist: are there any variables that can control this?
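That split matches the two schedulers: the parameter above belongs to the MR1 capacity scheduler and has no effect on YARN. On YARN the closest knobs live in capacity-scheduler.xml; a sketch with illustrative values, not recommendations:

    <!-- cap on pending + running applications in the default queue -->
    <property>
      <name>yarn.scheduler.capacity.root.default.maximum-applications</name>
      <value>100</value>
    </property>

    <!-- fraction of cluster resources usable by ApplicationMasters,
         which bounds how many accepted jobs can run concurrently -->
    <property>
      <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
      <value>0.1</value>
    </property>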
Re: Monitor network traffic in Hadoop
hi, Abdul Navaz: set the shuffle port on each NM using the option mapreduce.shuffle.port in mapred-site.xml, then monitor this port with tcpdump or wireshark. Hope this info helps you On Fri, Dec 6, 2013 at 11:22 AM, navaz navaz@gmail.com wrote: Hello I am following the tutorial for hadoop on a single-node cluster and I am able to test the word count map reduce program; it is working fine. I would like to know how to monitor the network traffic of the shuffle phase, via wireshark or some other means. Please guide me. Thanks Abdul Navaz Graduate student University of Houston, TX
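A sketch of that suggestion end to end; 13562 is the usual default value of mapreduce.shuffle.port, and the interface name is an assumption:

    <!-- mapred-site.xml on each NodeManager -->
    <property>
      <name>mapreduce.shuffle.port</name>
      <value>13562</value>
    </property>

    # capture shuffle traffic on the NodeManager for later inspection in wireshark
    tcpdump -i eth0 'tcp port 13562' -w shuffle.pcap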
Re: error copying a local file into HDFS
hi: you are right, my DN disks were full; I deleted some files and now it works, thanks On Fri, Dec 6, 2013 at 11:28 AM, Vinayakumar B vinayakuma...@huawei.com wrote: Hi Ch huang, Please check whether all datanodes in your cluster have enough disk space and that the number of non-decommissioned nodes is non-zero. Thanks and regards, Vinayakumar B *From:* ch huang [mailto:justlo...@gmail.com] *Sent:* 06 December 2013 07:14 *To:* user@hadoop.apache.org *Subject:* error copying a local file into HDFS hi,maillist: I got an error when putting a local file into HDFS [root@CHBM224 test]# hadoop fs -copyFromLocal /tmp/aa /alex/ 13/12/06 09:40:29 WARN hdfs.DFSClient: DataStreamer Exception org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /alex/aa._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and no node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1339) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2198) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:299) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44954) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1751) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1747) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1745) at org.apache.hadoop.ipc.Client.call(Client.java:1237) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at com.sun.proxy.$Proxy9.addBlock(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:291) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) at com.sun.proxy.$Proxy10.addBlock(Unknown Source) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1177) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1030) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:488) copyFromLocal: File /alex/aa._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and no node(s) are excluded in this operation. 
13/12/06 09:40:29 ERROR hdfs.DFSClient: Failed to close file /alex/aa._COPYING_ org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /alex/aa._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and no node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1339) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2198) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:299) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44954) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1751) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1747
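A quick way to check the condition Vinayakumar describes, per-datanode capacity and decommission status, before and after cleaning up disks:

    # show per-datanode name, remaining space and decommission status
    sudo -u hdfs hdfs dfsadmin -report | grep -E 'Name|DFS Remaining|Decommission'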
Re: issue about capacity scheduler
if I have 40GB of cluster memory and yarn.scheduler.capacity.maximum-am-resource-percent is set to 0.1, does that mean that when I launch an ApplicationMaster I need to allocate 4GB to it? If so, why does increasing the value cause more ApplicationMasters to run concurrently, instead of fewer? On Thu, Dec 5, 2013 at 5:04 AM, Jian He j...@hortonworks.com wrote: you can probably try increasing yarn.scheduler.capacity.maximum-am-resource-percent, this controls the max concurrently running AMs. Thanks, Jian On Wed, Dec 4, 2013 at 1:33 AM, ch huang justlo...@gmail.com wrote: hi,maillist : I use the YARN framework and the capacity scheduler, and I have two queues, one for hive and the other for big MR jobs. The hive queue works fine, because hive tasks are very fast. But consider this: user A submits two big MR jobs, the first big job eats all the resources belonging to the queue, and the other big MR job has to wait until the first job finishes. How can I let the same user's MR jobs run in parallel?
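A worked reading of the parameter, consistent with Jian's description: the percentage bounds the total, not the per-AM size. 40GB * 0.1 = 4GB is the pool shared by all ApplicationMasters combined; each MR AM still requests its own container (about 1.5GB with the default yarn.app.mapreduce.am.resource.mb=1536), so a 4GB pool fits roughly 2 concurrent AMs. Raising the percent enlarges the pool and therefore lets more AMs, and hence more jobs, run at once.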
Re: issue about capacity scheduler
another question: I set yarn.scheduler.minimum-allocation-mb to 2GB, so a container should be at least 2GB, but I see the appMaster container using only a 1GB heap. Why? # ps -ef|grep 8062 yarn 8062 8047 5 09:04 ? 00:00:09 /usr/java/jdk1.7.0_25/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=/data/mrlocal/1/yarn/logs/application_1386139114497_0024/container_1386139114497_0024_01_01 -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster
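The 2GB is the container size (the resource request, rounded up to yarn.scheduler.minimum-allocation-mb); the -Xmx1024m inside it comes from a separate property. A sketch of where each number lives, using the MR2 defaults:

    <!-- memory asked of YARN for the MR ApplicationMaster container -->
    <property>
      <name>yarn.app.mapreduce.am.resource.mb</name>
      <value>1536</value>
    </property>

    <!-- JVM options actually passed to the AM; this is the -Xmx seen in ps -->
    <property>
      <name>yarn.app.mapreduce.am.command-opts</name>
      <value>-Xmx1024m</value>
    </property>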
Re: issue about the MR job local dir
thank you, but it seems the doc is a little old. The doc says: - *PUBLIC:* local-dir/filecache - *PRIVATE:* local-dir/usercache/<user>/filecache - *APPLICATION:* local-dir/usercache/<user>/appcache/app-id/ but here is my nodemanager directory; I guess nmPrivate belongs to the private dirs, and the filecache dir does not exist under usercache # ls /data/mrlocal/1/yarn/local filecache nmPrivate usercache [root@CHBM223 conf]# ls /data/mrlocal/1/yarn/local/filecache/ -1058429088916409723 4529188628984375230 -7014620238384418063 1084965746802723478 4537624275313838973 -7168597014714301440 -1624997938511480096 4630901056913375526 7270199361370573766 -1664837184667315424 -4642830643595652223 -7332220817185869511 1725675017861848111 4715236827440900877332904188082338506 1838346487029342338 4790459366530674957 -7450121760156930096 1865044782300039774800525395984004560 7478948409771297223 -2348110367263014791 -5080956154405911478 7486468764131639983 -2569725565520513438 524923119076958393-7755253483162230956 -2590767617048813033 -5270961733852362332 -7859425335924192987 2787947055181616358 -5381775829268220744 7967711417630616031 2816094634154232444 -5845090920164902899 8115657316961272063 286373945366133510-587409153437667574 -8196745140008584754 2931191327895309259 -5951079387471670627 -8338714062663466224 -304471400571947298 -6076923167039033115 -8473967805299855837 3250195466880585846080416638029534254 8513492322348652110 -3331048722364108374 6332597539903254606 -8567312237113801580 3360339691049457808 634308406792699 8737308241488535006 3368354412003774516566344665060319340 -8893869581665287805 3628504729266619560 -6639258108397695527 -8895898681278542021 -3801380133229678986 -6653760362065293300 8926294383627727352 3837066533086156807 -6782198269120858036 -8964326004503603190 3929223016635331138 -6814427383139267223 -9049325747073392755 4126862917222506438 -6814979781017122863 -9186700026428986961 [root@CHBM223 conf]# ls /data/mrlocal/1/yarn/local/nmPrivate/ application_1385444985453_0001 application_1385453784842_0010 application_1385522685434_0081 container_1385522685434_0073_01_14.pid application_1385445543402_0003 application_1385453784842_0013 container_1385522685434_0073_01_05.pid container_1385522685434_0073_01_17.pid application_1385445543402_0005 application_1385520079773_0005 container_1385522685434_0073_01_08.pid [root@CHBM223 conf]# ls /data/mrlocal/1/yarn/local/usercache/ hdfs helen hive root On Thu, Dec 5, 2013 at 5:12 AM, Jian He j...@hortonworks.com wrote: The following links may help you http://hortonworks.com/blog/management-of-application-dependencies-in-yarn/ http://hortonworks.com/blog/resource-localization-in-yarn-deep-dive/ Thanks, Jian On Tue, Dec 3, 2013 at 5:26 PM, ch huang justlo...@gmail.com wrote: hi,maillist: I see three dirs in my local MR job dir, and I do not know what these dirs are used for. Does anyone know? # ls /data/1/mrlocal/yarn/local/ filecache/ nmPrivate/ usercache/
error running the terasort tool
hi,maillist: I tried to run terasort on my cluster, but it failed. The following is the error; I do not know why. Can anyone help? # hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort /alex/terasort/1G-input /alex/terasort/1G-output 13/12/05 15:15:43 INFO terasort.TeraSort: starting 13/12/05 15:15:43 INFO mapred.FileInputFormat: Total input paths to process : 1 13/12/05 15:15:45 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library 13/12/05 15:15:45 INFO compress.CodecPool: Got brand-new compressor [.deflate] Making 1 from 10 records Step size is 10.0 13/12/05 15:15:45 WARN hdfs.DFSClient: DataStreamer Exception org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /alex/terasort/1G-input/_partition.lst could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and no node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1339) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2198) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:299) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44954) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1751) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1747) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1745) at org.apache.hadoop.ipc.Client.call(Client.java:1237) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at com.sun.proxy.$Proxy9.addBlock(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:291) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) at com.sun.proxy.$Proxy10.addBlock(Unknown Source) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1177) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1030) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:488) org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /alex/terasort/1G-input/_partition.lst could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and no node(s) are excluded in this operation. 
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1339) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2198) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:299) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44954) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1751) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1747) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1745) at org.apache.hadoop.ipc.Client.call(Client.java:1237)
Re: error running the terasort tool
BTW, I use CDH4.4 On Thu, Dec 5, 2013 at 3:18 PM, ch huang justlo...@gmail.com wrote: hi,maillist: I tried to run terasort on my cluster, but it failed. The following is the error; I do not know why. Can anyone help?
issue about the total input bytes of an MR job
I ran the MR job, and in the MR output I see 13/12/03 14:02:28 INFO mapreduce.JobSubmitter: number of splits:2717 Because my data block size is 64M, the total should be 2717*64M/1024 = 170G, but in the summary at the end I see the following info, so the HDFS bytes read are 126792190158/1024/1024/1024 = 118G. The two numbers are not very close; why? File System Counters FILE: Number of bytes read=9642910241 FILE: Number of bytes written=120327706125 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=126792190158 HDFS: Number of bytes written=0 HDFS: Number of read operations=8151 HDFS: Number of large read operations=0 HDFS: Number of write operations=0
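One plausible reconciliation (reasoning added here; the thread does not answer it): 2717 * 64M is only an upper bound, since it assumes every split is a full block, while the last block of each input file is almost always shorter than 64M. With many input files the shortfall adds up, and the HDFS: Number of bytes read counter reports bytes actually read, so landing at 118G, well under the 170G bound, is expected.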
issue about the MR job local dir
hi,maillist: I see three dirs in my local MR job dir, and I do not know what these dirs are used for. Does anyone know? # ls /data/1/mrlocal/yarn/local/ filecache/ nmPrivate/ usercache/
issue about reading a file from HDFS
hi,maillist: while an HDFS file is being appended to, no other reader can get the in-progress data from it, so when I run a statistics job every five minutes that uses hive to read the HDFS file, I cannot read the data. Can anyone offer a good approach? thanks
Re: issue about reading a file from HDFS
that does not seem like a good suggestion; having a lot of partition dirs and data files will put a big load on the NN On Wed, Dec 4, 2013 at 12:08 PM, Azuryy Yu azury...@gmail.com wrote: One suggestion is to change your hive partitioning: add a hive partition every five minutes, and roll your HDFS file every five minutes as well. On Wed, Dec 4, 2013 at 11:56 AM, ch huang justlo...@gmail.com wrote: hi,maillist: while an HDFS file is being appended to, no other reader can get the in-progress data from it, so when I run a statistics job every five minutes that uses hive to read the HDFS file, I cannot read the data. Can anyone offer a good approach? thanks
how to prevent a Java heap OOM in the shuffle phase of an MR job?
hi,maillist: I recently hit a problem: when I run an MR job, an OOM happens in the shuffle phase. The MR options are the defaults, unchanged; which options should I tune? thanks
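A sketch of the usual knobs for reduce-side shuffle memory, using MR2 property names; the values are illustrative, not recommendations:

    <!-- heap of the reduce task JVM (default is the 200m inherited
         from mapred.child.java.opts) -->
    <property>
      <name>mapreduce.reduce.java.opts</name>
      <value>-Xmx1024m</value>
    </property>

    <!-- share of that heap used to hold fetched map output; default 0.70.
         Lowering it makes the shuffle spill to disk earlier instead of
         exhausting the heap -->
    <property>
      <name>mapreduce.reduce.shuffle.input.buffer.percent</name>
      <value>0.50</value>
    </property>

    <!-- concurrent fetch threads per reducer; default 5 -->
    <property>
      <name>mapreduce.reduce.shuffle.parallelcopies</name>
      <value>5</value>
    </property>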
issue about an MR job on the YARN framework
hi,maillist: I run a job on my CDH4.4 YARN framework; its map tasks finish very fast, but the reduce is very slow. I checked with the ps command and found that its working heap size is 200m, so I tried to increase the heap size used by the reduce tasks: I added YARN_OPTS="$YARN_OPTS -Dmapreduce.reduce.java.opts=-Xmx1024m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$YARN_LOG_DIR/gc-$(hostname)-resourcemanager.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=15M -XX:-UseGCOverheadLimit" in the yarn-env.sh file, but when I restart the nodemanager, I find that new reduce tasks still use a 200m heap. Why? # jps 2853 DataNode 19533 Jps 10949 YarnChild 10661 NodeManager 15130 HRegionServer # ps -ef|grep 10949 yarn 10949 10661 99 09:52 ? 00:19:31 /usr/java/jdk1.7.0_45/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx200m -Djava.io.tmpdir=/data/1/mrlocal/yarn/local/usercache/hdfs/appcache/application_1385983958793_0022/container_1385983958793_0022_01_005650/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=/data/2/mrlocal/yarn/logs/application_1385983958793_0022/container_1385983958793_0022_01_005650 -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 192.168.11.10 48936 attempt_1385983958793_0022_r_00_14 5650
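A likely explanation, inferred from how task JVMs are launched rather than stated in this thread: mapreduce.reduce.java.opts is job configuration, read by the submitting client, whereas yarn-env.sh only affects the NodeManager daemon's own JVM, so the task containers keep the default -Xmx200m. Setting it per job should work (job.jar and MyDriver are placeholders):

    # per-job override, passed as a generic option before the job arguments
    hadoop jar job.jar MyDriver -Dmapreduce.reduce.java.opts=-Xmx1024m <input> <output>

or persistently, on the client's mapred-site.xml:

    <property>
      <name>mapreduce.reduce.java.opts</name>
      <value>-Xmx1024m</value>
    </property>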
Re: issue about an MR job on the YARN framework
another question: why does the map progress go backwards after it reaches 100%? On Tue, Dec 3, 2013 at 10:07 AM, ch huang justlo...@gmail.com wrote: hi,maillist: I run a job on my CDH4.4 YARN framework; its map tasks finish very fast, but the reduce is very slow.