RE: intermediate results files
If you are 100% sure that all the data nodes are available and healthy for that period of time, you can choose the replication factor as 1 or 3. Thanks Devaraj k From: John Lilley [mailto:john.lil...@redpoint.net] Sent: 02 July 2013 04:40 To: user@hadoop.apache.org Subject: RE: intermediate results files I've seen some benchmarks where replication=1 runs at about 50MB/sec and replication=3 runs at about 33MB/sec, but I can't seem to find that now. John From: Mohammad Tariq [mailto:donta...@gmail.com] Sent: Monday, July 01, 2013 5:03 PM To: user@hadoop.apache.org Subject: Re: intermediate results files Hello John, IMHO, it doesn't matter. Your job will write the result just once. Replica creation is handled at the HDFS layer, so it has nothing to do with your job. Your job will still be writing at the same speed. Warm Regards, Tariq cloudfront.blogspot.com On Tue, Jul 2, 2013 at 4:16 AM, John Lilley john.lil...@redpoint.net wrote: If my reducers are going to create results that are temporary in nature (consumed by the next processing stage) is it recommended to use a replication factor 3 to improve performance? Thanks john
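[Editor's note] If you do decide to lower the factor for such temporary output, a hedged Java sketch of two ways to do it from the client side is below. The output path is illustrative, and setting dfs.replication in the job configuration is assumed to apply to files created by tasks running with that configuration; verify against your Hadoop version.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class TemporaryOutputReplication {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Option 1: files created under this configuration (including the
        // reducers' output files) are created with replication 1 instead of
        // the cluster default.
        conf.setInt("dfs.replication", 1);
        Job job = Job.getInstance(conf, "stage-1");
        // ... set mapper/reducer/input/output paths as usual ...

        // Option 2: lower the factor after the fact on an existing output
        // directory (illustrative path).
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus st : fs.listStatus(new Path("/tmp/stage1-out"))) {
            if (st.isFile()) {
                fs.setReplication(st.getPath(), (short) 1);
            }
        }
    }
}
```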
RE: temporary folders for YARN tasks
You can make use of this configuration to do the same:

<property>
  <description>List of directories to store localized files in. An
  application's localized file directory will be found in:
  ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
  Individual containers' work directories, called container_${contid}, will be
  subdirectories of this.</description>
  <name>yarn.nodemanager.local-dirs</name>
  <value>${hadoop.tmp.dir}/nm-local-dir</value>
</property>

Thanks Devaraj k From: John Lilley [mailto:john.lil...@redpoint.net] Sent: 02 July 2013 02:08 To: user@hadoop.apache.org Subject: temporary folders for YARN tasks When a YARN app and its tasks want to write temporary files, how do they know where to write the files? I am assuming that each task has some temporary space available, and I hope it is available across multiple disk volumes for parallel performance. Are those files cleaned up automatically after task exit? If I want to give lifetime control of the files to an auxiliary service (along the lines of MR shuffle passing files to the aux service), how would I do that, and would that entail different file locations? Thanks John
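[Editor's note] A minimal sketch of how the per-application cache path from the property description above can be assembled; the user name and application id are hypothetical values for illustration only, and it assumes yarn-site.xml is on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AppCachePath {
    public static void main(String[] args) {
        // Hypothetical user/application id, purely illustrative.
        String user = "john";
        String appId = "application_1372723628917_0001";

        Configuration conf = new YarnConfiguration();
        // yarn.nodemanager.local-dirs may list several directories (one per disk),
        // which is how the temp space ends up spread across volumes.
        String[] localDirs = conf.getStrings(
                "yarn.nodemanager.local-dirs", "/tmp/nm-local-dir");

        // Per the property description, each application gets an appcache
        // directory under every configured local dir.
        for (String dir : localDirs) {
            System.out.println(dir + "/usercache/" + user + "/appcache/" + appId);
        }
    }
}
```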
Re: Hang when add/remove a datanode into/from a 2 datanode cluster
Yes, the default replication factor is 3. However, in my case, it's strange: while the decommission hangs, I found that some blocks' expected replica count is 3, but the 'dfs.replication' value in hdfs-site.xml of every cluster node has been 2 since the beginning of the cluster setup. Below are my steps: 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And, in hdfs-site.xml, set 'dfs.replication' to 2 2. Add node dn3 into the cluster as a new datanode, and do not change the 'dfs.replication' value in hdfs-site.xml; keep it as 2 note: step 2 passed 3. Decommission dn3 from the cluster Expected result: dn3 could be decommissioned successfully Actual result: a). The decommission progress hangs and the status always stays at 'Waiting DataNode status: Decommissioned'. But, if I execute 'hadoop dfs -setrep -R 2 /', the decommission continues and completes eventually. b). However, if the initial cluster includes >= 3 datanodes, this issue is not encountered when adding/removing another datanode. For example, if I set up a cluster with 3 datanodes, I can successfully add a 4th datanode into it, and then also successfully remove the 4th datanode from the cluster. I suspect it's a bug and plan to open a JIRA against Hadoop HDFS for this. Any comments? Thanks! 2013/6/21 Harsh J ha...@cloudera.com dfs.replication is a per-file parameter. If you have a client that does not use the supplied configs, then its default replication is 3 and all files it creates (as part of the app or via a job config) will be created with replication factor 3. You can do an -lsr to find all files and filter which ones have been created with a factor of 3 (versus the expected config of 2). On Fri, Jun 21, 2013 at 3:13 PM, sam liu samliuhad...@gmail.com wrote: Hi George, Actually, in my hdfs-site.xml, I always set 'dfs.replication' to 2. But I still encounter this issue. Thanks! 2013/6/21 George Kousiouris gkous...@mail.ntua.gr Hi, I think I have faced this before. The problem is that you have the rep factor=3, so it seems to hang because it needs 3 nodes to achieve the factor (replicas are not created on the same node). If you set the replication factor=2 I think you will not have this issue. So in general you must make sure that the rep factor is <= the number of available datanodes. BR, George On 6/21/2013 12:29 PM, sam liu wrote: Hi, I encountered an issue which hangs the decommission operation. The steps: 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And, in hdfs-site.xml, set 'dfs.replication' to 2 2. Add node dn3 into the cluster as a new datanode, and do not change the 'dfs.replication' value in hdfs-site.xml; keep it as 2 note: step 2 passed 3. Decommission dn3 from the cluster Expected result: dn3 could be decommissioned successfully Actual result: decommission progress hangs and the status always stays at 'Waiting DataNode status: Decommissioned' However, if the initial cluster includes >= 3 datanodes, this issue is not encountered when adding/removing another datanode. Also, after step 2, I noticed that some blocks' expected replica count is 3, but the 'dfs.replication' value in hdfs-site.xml is always 2! Could anyone please help provide some triage? Thanks in advance!
-- --- George Kousiouris, PhD Electrical and Computer Engineer Division of Communications, Electronics and Information Engineering School of Electrical and Computer Engineering Tel: +30 210 772 2546 Mobile: +30 6939354121 Fax: +30 210 772 2569 Email: gkous...@mail.ntua.gr Site: http://users.ntua.gr/gkousiou/ National Technical University of Athens 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece -- Harsh J
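[Editor's note] Harsh's suggestion above is to -lsr the filesystem and filter for files created with a factor of 3. A hedged Java equivalent of that scan is sketched below; it assumes the cluster configuration is on the client's classpath and uses the Hadoop 2.x FileStatus API. The expected factor and starting path are taken from the thread.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FindOverReplicatedFiles {
    // Recursively list files and report any whose stored replication factor
    // exceeds what the cluster config expects (2 in this thread).
    static void scan(FileSystem fs, Path dir, short expected) throws Exception {
        for (FileStatus st : fs.listStatus(dir)) {
            if (st.isDirectory()) {
                scan(fs, st.getPath(), expected);
            } else if (st.getReplication() > expected) {
                System.out.println(st.getPath() + " has replication " + st.getReplication());
                // Optionally fix it, like 'hadoop dfs -setrep' does:
                // fs.setReplication(st.getPath(), expected);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        scan(fs, new Path("/"), (short) 2);
    }
}
```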
Re: temporary folders for YARN tasks
LocalDirAllocator should help with this. You can look through the MapReduce code to see how it's used. -Sandy On Mon, Jul 1, 2013 at 11:01 PM, Devaraj k devara...@huawei.com wrote: You can make use of this configuration to do the same: <property> <description>List of directories to store localized files in. An application's localized file directory will be found in: ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}. Individual containers' work directories, called container_${contid}, will be subdirectories of this.</description> <name>yarn.nodemanager.local-dirs</name> <value>${hadoop.tmp.dir}/nm-local-dir</value> </property> Thanks Devaraj k From: John Lilley [mailto:john.lil...@redpoint.net] Sent: 02 July 2013 02:08 To: user@hadoop.apache.org Subject: temporary folders for YARN tasks When a YARN app and its tasks want to write temporary files, how do they know where to write the files? I am assuming that each task has some temporary space available, and I hope it is available across multiple disk volumes for parallel performance. Are those files cleaned up automatically after task exit? If I want to give lifetime control of the files to an auxiliary service (along the lines of MR shuffle passing files to the aux service), how would I do that, and would that entail different file locations? Thanks John
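[Editor's note] A hedged sketch of the LocalDirAllocator usage Sandy mentions. The configuration key and file name are illustrative; inside a container you would point the allocator at whatever property holds your container's local directories rather than the NodeManager's own setting.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.LocalDirAllocator;
import org.apache.hadoop.fs.Path;

public class TaskScratchSpace {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // The allocator round-robins across all directories listed under the
        // given key, spreading temporary files over the available disks.
        LocalDirAllocator allocator =
                new LocalDirAllocator("yarn.nodemanager.local-dirs");

        // Ask for a writable local path with (at least) 64 MB of free space.
        Path scratch = allocator.getLocalPathForWrite(
                "myapp/spill-0001.tmp", 64L * 1024 * 1024, conf);
        System.out.println("Writing temporary data to " + scratch);
    }
}
```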
reply: a question about dfs.replication
YouPeng Yang, you said that may be the answer. Thank you. From: YouPeng Yang [mailto:yypvsxf19870...@gmail.com] Sent: Tuesday, July 02, 2013 12:52 To: user@hadoop.apache.org Subject: Re: reply: a question about dfs.replication Hi Hu and Yu, I agree that dfs.replication is a client-side configuration, not server side. That makes the point in my last mail make sense. And the command 'hdfs dfs -setrep -R -w 2 /' solves the problem that I could not change the existing files' replication value. 2013/7/2 Azuryy Yu azury...@gmail.com It's not an HDFS issue. dfs.replication is a client-side configuration, not server side, so you need to set it to '2' on your client side (where your application is running). Then execute a command such as 'hdfs dfs -put', or call the HDFS API in a Java application. On Tue, Jul 2, 2013 at 12:25 PM, Francis.Hu francis...@reachjunction.com wrote: Thanks, all of you. I just got the problem fixed through the command: hdfs dfs -setrep -R -w 2 / Is that an issue of HDFS? Why do I need to manually execute a command to tell Hadoop the replication factor even though it is set in hdfs-site.xml? Thanks, Francis.Hu From: Francis.Hu [mailto:francis...@reachjunction.com] Sent: Tuesday, July 02, 2013 11:30 To: user@hadoop.apache.org Subject: RE: RE: a question about dfs.replication Yes, it returns 2 correctly after 'hdfs getconf -confkey dfs.replication', but in the web page it is 3, as below: From: yypvsxf19870706 [mailto:yypvsxf19870...@gmail.com] Sent: Monday, July 01, 2013 23:24 To: user@hadoop.apache.org Subject: Re: RE: a question about dfs.replication Hi, Could you please get the property value by using: hdfs getconf -confkey dfs.replication. Sent from my iPhone. On 2013-7-1, at 15:51, Francis.Hu francis...@reachjunction.com wrote: Actually, my Java client is running with the same configuration as Hadoop's. dfs.replication is already set to 2 in my Hadoop configuration, so I think dfs.replication is already overridden by my configuration in hdfs-site.xml, but it seems it doesn't work even though I evidently overrode the parameter. From: Емельянов Борис [mailto:emelya...@post.km.ru] Sent: Monday, July 01, 2013 15:18 To: user@hadoop.apache.org Subject: Re: a question about dfs.replication On 01.07.2013 10:19, Francis.Hu wrote: Hi, all. I am installing a cluster with Hadoop 2.0.5-alpha. I have one namenode and two datanodes. dfs.replication is set to 2 in hdfs-site.xml. After all configuration work was done, I started all nodes. Then I saved a file into HDFS through a Java client. Now I can access the HDFS web page at x.x.x.x:50070, and I also see the file listed in HDFS. My question is: the replication column in the HDFS web page is showing 3, not 2. Does anyone know what the problem is? ---Actual setting in hdfs-site.xml:

<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>

After that, I ran a command to check the file: hdfs fsck /test3/ The result of the above command:

/test3/hello005.txt: Under replicated BP-609310498-192.168.219.129-1372323727200:blk_-1069303317294683372_1006. Target Replicas is 3 but found 2 replica(s).
Status: HEALTHY
 Total size:                    35 B
 Total dirs:                    1
 Total files:                   1
 Total blocks (validated):      1 (avg. block size 35 B)
 Minimally replicated blocks:   1 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       1 (100.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    2
 Average block replication:     2.0
 Corrupt blocks:                0
 Missing replicas:              1 (33.32 %)
 Number of data-nodes:          3
 Number of racks:               1
FSCK ended at Sat Jun 29 16:51:37 CST 2013 in 6 milliseconds

Thanks, Francis Hu If I'm not mistaken, the dfs.replication parameter in the config sets only the default replication factor, which can be overridden when putting a file into HDFS.
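[Editor's note] A hedged sketch of the behaviour described in this thread: replication is a per-file attribute chosen by the client that creates the file, so a client that never loads the cluster's hdfs-site.xml falls back to the shipped default of 3, and that is what the web UI later reports. It assumes fs.defaultFS points at the cluster and uses the file path from the thread.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationIsClientSide {
    public static void main(String[] args) throws Exception {
        // Suppose hdfs-site.xml is NOT on this client's classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // What this client would use for new files: the shipped default (3)
        // rather than the 2 configured on the cluster nodes.
        System.out.println("client default replication = "
                + fs.getDefaultReplication(new Path("/")));

        // Files remember the factor they were created with, which is what the
        // NameNode web UI shows.
        Path p = new Path("/test3/hello005.txt");   // path taken from the thread
        System.out.println("stored factor = " + fs.getFileStatus(p).getReplication());

        // Per-file fix, equivalent to 'hdfs dfs -setrep 2 /test3/hello005.txt':
        fs.setReplication(p, (short) 2);
    }
}
```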
Re: data loss after cluster wide power loss
Hi Uma, I think there is minimum performance degration if set dfs.datanode.synconclose to true. On Tue, Jul 2, 2013 at 3:31 PM, Uma Maheswara Rao G mahesw...@huawei.comwrote: Hi Dave, Looks like your analysis is correct. I have faced similar issue some time back. See the discussion link: http://markmail.org/message/ruev3aa4x5zh2l4w#query:+page:1+mid:33gcdcu3coodkks3+state:results On sudden restarts, it can lost the OS filesystem edits. Similar thing happened in our case, i.e, after restart blocks were moved back to BeingWritten directory even though they were finalized. After restart they were marked as corrupt. You could set dfs.datanode.synconclose to true to avoid this sort of things, but that will degrade performance. Regards, Uma -Original Message- From: ddlat...@gmail.com [mailto:ddlat...@gmail.com] On Behalf Of Dave Latham Sent: 01 July 2013 16:08 To: hdfs-u...@hadoop.apache.org Cc: hdfs-...@hadoop.apache.org Subject: Re: data loss after cluster wide power loss Much appreciated, Suresh. Let me know if I can provide any more information or if you'd like me to open a JIRA. Dave On Mon, Jul 1, 2013 at 8:48 PM, Suresh Srinivas sur...@hortonworks.com wrote: Dave, Thanks for the detailed email. Sorry I did not read all the details you had sent earlier completely (on my phone). As you said, this is not related to data loss related to HBase log and hsync. I think you are right; the rename operation itself might not have hit the disk. I think we should either ensure metadata operation is synced on the datanode or handle it being reported as blockBeingWritten. Let me spend sometime to debug this issue. One surprising thing is, all the replicas were reported as blockBeingWritten. Regards, Suresh On Mon, Jul 1, 2013 at 6:03 PM, Dave Latham lat...@davelink.net wrote: (Removing hbase list and adding hdfs-dev list as this is pretty internal stuff). Reading through the code a bit: FSDataOutputStream.close calls DFSOutputStream.close calls DFSOutputStream.closeInternal - sets currentPacket.lastPacketInBlock = true - then calls DFSOutputStream.flushInternal - enqueues current packet - waits for ack BlockReceiver.run - if (lastPacketInBlock !receiver.finalized) calls FSDataset.finalizeBlock calls FSDataset.finalizeBlockInternal calls FSVolume.addBlock calls FSDir.addBlock calls FSDir.addBlock - renames block from blocksBeingWritten tmp dir to current dest dir This looks to me as I would expect a synchronous chain from a DFS client to moving the file from blocksBeingWritten to the current dir so that once the file is closed that it the block files would be in the proper directory - even if the contents of the file are still in the OS buffer rather than synced to disk. It's only after this moving of blocks that NameNode.complete file is called. There are several conditions and loops in there that I'm not certain this chain is fully reliable in all cases without a greater understanding of the code. Could it be the case that the rename operation itself is not synced and that ext3 lost the fact that the block files were moved? Or is there a bug in the close file logic that for some reason the block files are not always moved into place when a file is closed? Thanks for your patience, Dave On Mon, Jul 1, 2013 at 3:35 PM, Dave Latham lat...@davelink.net wrote: Thanks for the response, Suresh. I'm not sure that I understand the details properly. From my reading of HDFS-744 the hsync API would allow a client to make sure that at any point in time it's writes so far hit the disk. 
For example, for HBase it could apply a fsync after adding some edits to its WAL to ensure those edits are fully durable for a file which is still open. However, in this case the dfs file was closed and even renamed. Is it the case that even after a dfs file is closed and renamed that the data blocks would still not be synced and would still be stored by the datanode in blocksBeingWritten rather than in current? If that is case, would it be better for the NameNode not to reject replicas that are in blocksBeingWritten, especially if it doesn't have any other replicas available? Dave On Mon, Jul 1, 2013 at 3:16 PM, Suresh Srinivas sur...@hortonworks.comwrote: Yes this is a known issue. The HDFS part of this was addressed in https://issues.apache.org/jira/browse/HDFS-744 for 2.0.2-alpha and is not available in 1.x release. I think HBase does not use this API yet. On Mon, Jul 1, 2013 at 3:00 PM, Dave Latham lat...@davelink.net wrote: We're running HBase over HDFS 1.0.2 on about 1000 nodes. On Saturday the data center we were in had a total power failure and the cluster went down hard. When we brought it back up, HDFS reported 4 files as CORRUPT. We recovered the data in question
Re: How to write/run MPI program on Yarn?
Could anyone help answer the above questions? Thanks a lot! 2013/7/1 sam liu samliuhad...@gmail.com Thanks Pramod and Clark! 1. What's the relationship between the Hadoop 2.x branch and the mpich2-yarn project? 2. Does the Hadoop 2.x branch plan to include an MPI implementation? I noticed there is already a JIRA: https://issues.apache.org/jira/browse/MAPREDUCE-2911 2013/7/1 Clark Yang (杨卓荦) yangzhuo...@gmail.com Hi Sam, Please try it following the README.md on https://github.com/clarkyzl/mpich2-yarn/. The design is simple: each NodeManager runs an smpd daemon, and the ApplicationMaster runs mpiexec to submit MPI jobs. The biggest problem here, I think, is that the YARN scheduler does not support a gang scheduling algorithm, and the scheduling is very slow. If you have any ideas, please feel free to share. Cheers, Zhuoluo (Clark) Yang 2013/6/30 Pramod N npramo...@gmail.com Hi Sam, This might interest you. https://github.com/clarkyzl/mpich2-yarn Pramod N http://atmachinelearner.blogspot.in Bruce Wayne of web @machinelearner https://twitter.com/machinelearner -- On Sun, Jun 30, 2013 at 6:02 PM, sam liu samliuhad...@gmail.com wrote: Hi Experts, Does Hadoop 2.0.5-alpha support MPI programs? - If yes, is there any example of writing an MPI program on YARN? How do I write the client-side code to configure and submit the MPI job? - If no, which Hadoop version will support MPI programming? Thanks!
Re: How to write/run MPI program on Yarn?
Sam, The fundamental idea of YARN is to split up the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. With YARN, we're going to be able to run multiple workload, one of them being MapReduce another one being MPI. So mpich2-yarn is the port of MPI to run on top of yarn. You may want to read : http://hortonworks.com/hadoop/yarn/ http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html Hope it helps, Olivier On 2 July 2013 02:21, sam liu samliuhad...@gmail.com wrote: Any one could help answer above questions? Thanks a lot! 2013/7/1 sam liu samliuhad...@gmail.com Thanks Pramod and Clark! 1. What's the relationship of Hadoop 2.x branch and mpich2-yarn project? 2. Does Hadoop 2.x branch plan to include MPI implementation? I mentioned there is already a JIRA: https://issues.apache.org/jira/browse/MAPREDUCE-2911 2013/7/1 Clark Yang (杨卓荦) yangzhuo...@gmail.com Hi, sam Please try it following the README.md on https://github.com/clarkyzl/mpich2-yarn/. The design is simple ,make each nodemanger runs a smpd daemon and make application master run mpiexec to submit mpi jobs. The biggest problem here, I think, is the yarn scheduler do not support gang secheduling algorithm, and the scheduling is very very slow. Any ideas please feel free to share. Cheers, Zhuoluo (Clark) Yang 2013/6/30 Pramod N npramo...@gmail.com Hi Sam, This might interest you. https://github.com/clarkyzl/mpich2-yarn Pramod N http://atmachinelearner.blogspot.in Bruce Wayne of web @machinelearner https://twitter.com/machinelearner -- On Sun, Jun 30, 2013 at 6:02 PM, sam liu samliuhad...@gmail.comwrote: Hi Experts, Does Hadoop 2.0.5-alpha supports MPI programs? - If yes, is there any example of writing a MPI program on Yarn? How to write the client side code to configure and submit the MPI job? - If no, which Hadoop version will support MPI programming? Thanks! -- Olivier Renault Solution Engineer - Big Data - Hortonworks, Inc. +44 7500 933 036 orena...@hortonworks.com www.hortonworks.com http://hortonworks.com/products/hortonworks-sandbox/
Latest distcp code
Hi, Which branch of Hadoop has the latest distcp code? The branch-1 tree mentions something like distcp2: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1/src/tools/org/apache/hadoop/tools/distcp2/util/DistCpUtils.java The trunk has no mention of distcp2: http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/ Same is the case for the 2.1 branch: http://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.1.0-beta/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/ Can we say that branch-1 has the latest code and that it has not been back-ported to branch-2? What would be the best place to look? Can you please point me to it? Thanks
Re: Latest distcp code
Hello Harsh, Thank you very much for your reply. Regards, Jagat On Tue, Jul 2, 2013 at 8:29 PM, Harsh J ha...@cloudera.com wrote: The trunk's distcp is by default distcp2. The branch-1 tree received a backport of distcp2 recently, so it is named differently. In general we try not to have a new feature introduced in branch-1. All new features must go to the trunk first, before being back-ported into maintained release branches. On Tue, Jul 2, 2013 at 3:19 PM, Jagat Singh jagatsi...@gmail.com wrote: Hi, Which branch of Hadoop has the latest distcp code? The branch-1 tree mentions something like distcp2: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1/src/tools/org/apache/hadoop/tools/distcp2/util/DistCpUtils.java The trunk has no mention of distcp2: http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/ Same is the case for the 2.1 branch: http://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.1.0-beta/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/ Can we say that branch-1 has the latest code and that it has not been back-ported to branch-2? What would be the best place to look? Can you please point me to it? Thanks -- Harsh J
Re: data loss after cluster wide power loss
Hi Uma, Thanks for the pointer. Your case sounds very similar. The main differences that I see are that in my case it happened on all 3 replicas and the power failure occurred merely seconds after the blocks were finalized. So I guess the question is whether HDFS can do anything to better recover from such situations. I'm also curious whether ext4 would be less susceptible than ext3. I will definitely look at enabling dfs.datanode.synconclose once we upgrade to a version of hdfs that has it. I would love to see some performance numbers if anyone has run them. Also appears that HBase is considering enabling it by default (cf. comments on HBase-5954). Dave On Tue, Jul 2, 2013 at 12:31 AM, Uma Maheswara Rao G mahesw...@huawei.comwrote: Hi Dave, Looks like your analysis is correct. I have faced similar issue some time back. See the discussion link: http://markmail.org/message/ruev3aa4x5zh2l4w#query:+page:1+mid:33gcdcu3coodkks3+state:results On sudden restarts, it can lost the OS filesystem edits. Similar thing happened in our case, i.e, after restart blocks were moved back to BeingWritten directory even though they were finalized. After restart they were marked as corrupt. You could set dfs.datanode.synconclose to true to avoid this sort of things, but that will degrade performance. Regards, Uma -Original Message- From: ddlat...@gmail.com [mailto:ddlat...@gmail.com] On Behalf Of Dave Latham Sent: 01 July 2013 16:08 To: hdfs-u...@hadoop.apache.org Cc: hdfs-...@hadoop.apache.org Subject: Re: data loss after cluster wide power loss Much appreciated, Suresh. Let me know if I can provide any more information or if you'd like me to open a JIRA. Dave On Mon, Jul 1, 2013 at 8:48 PM, Suresh Srinivas sur...@hortonworks.com wrote: Dave, Thanks for the detailed email. Sorry I did not read all the details you had sent earlier completely (on my phone). As you said, this is not related to data loss related to HBase log and hsync. I think you are right; the rename operation itself might not have hit the disk. I think we should either ensure metadata operation is synced on the datanode or handle it being reported as blockBeingWritten. Let me spend sometime to debug this issue. One surprising thing is, all the replicas were reported as blockBeingWritten. Regards, Suresh On Mon, Jul 1, 2013 at 6:03 PM, Dave Latham lat...@davelink.net wrote: (Removing hbase list and adding hdfs-dev list as this is pretty internal stuff). Reading through the code a bit: FSDataOutputStream.close calls DFSOutputStream.close calls DFSOutputStream.closeInternal - sets currentPacket.lastPacketInBlock = true - then calls DFSOutputStream.flushInternal - enqueues current packet - waits for ack BlockReceiver.run - if (lastPacketInBlock !receiver.finalized) calls FSDataset.finalizeBlock calls FSDataset.finalizeBlockInternal calls FSVolume.addBlock calls FSDir.addBlock calls FSDir.addBlock - renames block from blocksBeingWritten tmp dir to current dest dir This looks to me as I would expect a synchronous chain from a DFS client to moving the file from blocksBeingWritten to the current dir so that once the file is closed that it the block files would be in the proper directory - even if the contents of the file are still in the OS buffer rather than synced to disk. It's only after this moving of blocks that NameNode.complete file is called. There are several conditions and loops in there that I'm not certain this chain is fully reliable in all cases without a greater understanding of the code. 
Could it be the case that the rename operation itself is not synced and that ext3 lost the fact that the block files were moved? Or is there a bug in the close file logic that for some reason the block files are not always moved into place when a file is closed? Thanks for your patience, Dave On Mon, Jul 1, 2013 at 3:35 PM, Dave Latham lat...@davelink.net wrote: Thanks for the response, Suresh. I'm not sure that I understand the details properly. From my reading of HDFS-744 the hsync API would allow a client to make sure that at any point in time it's writes so far hit the disk. For example, for HBase it could apply a fsync after adding some edits to its WAL to ensure those edits are fully durable for a file which is still open. However, in this case the dfs file was closed and even renamed. Is it the case that even after a dfs file is closed and renamed that the data blocks would still not be synced and would still be stored by the datanode in blocksBeingWritten rather than in current? If that is case, would it be better for the NameNode not to reject replicas that are in blocksBeingWritten, especially if it doesn't have any other replicas available? Dave On Mon, Jul 1, 2013 at 3:16 PM, Suresh Srinivas sur...@hortonworks.comwrote: Yes
Fwd: Number format exception : For input string
I wrote a script as below:

Data = LOAD 'part-r-0' AS (session_start_gmt:long)
FilterData = FILTER Data BY session_start_gmt=1369546091667

I get the error below:

2013-07-01 22:48:06,510 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: For input string: 1369546091667

The detailed log says it is a number format exception. When I run

x = group Data ALL;
y = FOREACH x GENERATE MIN(Data.session_start_gmt) as min_session_start_time, MAX(Data.session_start_gmt) as max_session_start_time;

I get the output below:

(1369546091667,1369638849418)

When I just give session_start_gmt=1369546091 (3 digits less) it works fine. So, why does the first error occur when I compare with the exact value? Thanks
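[Editor's note] The thread contains no answer, but the "For input string" text closely matches the message thrown by Java's Integer.parseInt, which hints (this is a guess, not confirmed in the thread) that the literal is being parsed as a 32-bit int. The small Java illustration below shows why the 13-digit value would fail while the 10-digit one succeeds; if that is indeed the cause, writing the constant with a long suffix (1369546091667L) in the FILTER expression may be enough.

```java
public class ForInputString {
    public static void main(String[] args) {
        String literal = "1369546091667";   // the value from the Pig script

        // Parsing it as a long is fine; 1,369,546,091,667 fits in 64 bits.
        System.out.println(Long.parseLong(literal));

        // Parsing it as an int overflows 2^31 - 1 and throws
        // NumberFormatException with a "For input string: ..." message,
        // similar to the ERROR 1200 text in the thread. The shorter value
        // 1369546091 fits in an int, which would explain why it works.
        try {
            System.out.println(Integer.parseInt(literal));
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage());
        }
    }
}
```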
RE: intermediate results files
Replication also has downstream effects: it puts pressure on the available network bandwidth and disk I/O bandwidth when the cluster is loaded. john From: Mohammad Tariq [mailto:donta...@gmail.com] Sent: Monday, July 01, 2013 6:35 PM To: user@hadoop.apache.org Subject: Re: intermediate results files I see. This difference is because of the fact that the next block of data will not be written to HDFS until the previous block was successfully written to 'all' the DNs selected for replication. This implies that higher RF means more time for the completion of a block write. Warm Regards, Tariq cloudfront.blogspot.comhttp://cloudfront.blogspot.com On Tue, Jul 2, 2013 at 4:39 AM, John Lilley john.lil...@redpoint.netmailto:john.lil...@redpoint.net wrote: I've seen some benchmarks where replication=1 runs at about 50MB/sec and replication=3 runs at about 33MB/sec, but I can't seem to find that now. John From: Mohammad Tariq [mailto:donta...@gmail.commailto:donta...@gmail.com] Sent: Monday, July 01, 2013 5:03 PM To: user@hadoop.apache.orgmailto:user@hadoop.apache.org Subject: Re: intermediate results files Hello John, IMHO, it doesn't matter. Your job will write the result just once. Replica creation is handled at the HDFS layer so it has nothing to with your job. Your job will still be writing at the same speed. Warm Regards, Tariq cloudfront.blogspot.comhttp://cloudfront.blogspot.com On Tue, Jul 2, 2013 at 4:16 AM, John Lilley john.lil...@redpoint.netmailto:john.lil...@redpoint.net wrote: If my reducers are going to create results that are temporary in nature (consumed by the next processing stage) is it recommended to use a replication factor 3 to improve performance? Thanks john
RE: YARN tasks and child processes
Devaraj, Thanks, this is also good information. But I was really asking if a child *process* that was spawned by a task can persist, in addition to the data. john From: Devaraj k [mailto:devara...@huawei.com] Sent: Monday, July 01, 2013 11:50 PM To: user@hadoop.apache.org Subject: RE: YARN tasks and child processes It is possible to persist the data by YARN task, you can choose whichever place you want to persist. If you choose to persist in HDFS, you need to take care deleting the data after using it. If you choose to write in local dir, you may write the data into the nm local dirs (i.e 'yarn.nodemanager.local-dirs' configuration) accordingly with the app id container id, and this will be cleaned up after the app completion. You need to make use of this persisted data before completing the application. Thanks Devaraj k From: John Lilley [mailto:john.lil...@redpoint.net] Sent: 02 July 2013 04:44 To: user@hadoop.apache.orgmailto:user@hadoop.apache.org Subject: YARN tasks and child processes Is it possible for a child process of a YARN task to persist after the task is complete? I am looking at an alternative to a YARN auxiliary process that may be simpler to implement, if I can have a task spawn a process that persists for some time after the task finishes. Thanks, John
Re: intermediate results files
Hi John! If your block is going to be replicated to three nodes, then in the default block placement policy, 2 of them will be on the same rack, and a third one will be on a different rack. Depending on the network bandwidths available intra-rack and inter-rack, writing with replication factor=3 may be almost as fast or (more likely) slower. With replication factor=2, the default block placement is to place them on different racks, so you wouldn't gain much. So you can 1. Either choose replication factor = 1 2. Change the block placement policy such that even with replication factor=2, it will choose two nodes in the same rack. HTH Ravi From: Devaraj k devara...@huawei.com To: user@hadoop.apache.org user@hadoop.apache.org Sent: Tuesday, July 2, 2013 1:00 AM Subject: RE: intermediate results files If you are 100% sure that all the node data nodes are available and healthy for that period of time, you can choose the replication factor as 1 or 3. Thanks Devaraj k From:John Lilley [mailto:john.lil...@redpoint.net] Sent: 02 July 2013 04:40 To: user@hadoop.apache.org Subject: RE: intermediate results files I’ve seen some benchmarks where replication=1 runs at about 50MB/sec and replication=3 runs at about 33MB/sec, but I can’t seem to find that now. John From:Mohammad Tariq [mailto:donta...@gmail.com] Sent: Monday, July 01, 2013 5:03 PM To: user@hadoop.apache.org Subject: Re: intermediate results files Hello John, IMHO, it doesn't matter. Your job will write the result just once. Replica creation is handled at the HDFS layer so it has nothing to with your job. Your job will still be writing at the same speed. Warm Regards, Tariq cloudfront.blogspot.com On Tue, Jul 2, 2013 at 4:16 AM, John Lilley john.lil...@redpoint.net wrote: If my reducers are going to create results that are temporary in nature (consumed by the next processing stage) is it recommended to use a replication factor 3 to improve performance? Thanks john
HDFS file section rewrite
I'm sure this has been asked a zillion times, so please just point me to the JIRA comments: is there a feature underway to allow for re-writing of HDFS file sections? Thanks John
typical JSON data sets
I would like to hear your experiences working with large JSON data sets, specifically: 1) How large is each JSON document? 2) Do they tend to be a single JSON doc per file, or multiples per file? 3) Do the JSON schemas change over time? 4) Are there interesting public data sets you would recommend for experiment? Thanks John
Containers and CPU
I have YARN tasks that benefit from multicore scaling. However, they don't *always* use more than one core. I would like to allocate containers based only on memory, and let each task use as many cores as needed, without allocating exclusive CPU slots in the scheduler. For example, on an 8-core node with 16GB memory, I'd like to be able to run 3 tasks each consuming 4GB memory and each using as much CPU as they like. Is this the default behavior if I don't specify CPU restrictions to the scheduler? Thanks John
RE: some idea about the Data Compression
Geelong, 1. These files will probably be some standard format like .gz or .bz2 or .zip. In that case, pick an appropriate InputFormat. See e.g. http://cotdp.com/2012/07/hadoop-processing-zip-files-in-mapreduce/, http://stackoverflow.com/questions/14497572/reading-gzipped-file-in-hadoop-using-custom-recordreader 2. Generally, compression is a Good Thing and will improve performance. But only if you use a fast compressor like LZO or Snappy. Gzip, ZIP, BZ2, etc are no good for this. You also need to ensure that your compressed files are splittable if you are going to create a single file that will be processed by a later MR stage, for this a SequenceFile is helpful. For typical intermediate outputs it doesn't matter as much because you will have a folder of file parts and these are pre split in some sense. Once upon a time, LZO compression was a thing that you had to install as a separate component, but I think the modern distros include it. See for example: http://kickstarthadoop.blogspot.com/2012/02/use-compression-with-mapreduce.html , http://blog.cloudera.com/blog/2009/05/10-mapreduce-tips/, http://my.safaribooksonline.com/book/software-engineering-and-development/9781449328917/compression/id3689058, https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-4/compression (section 4.2 in the Elephant book). John From: Geelong Yao [mailto:geelong...@gmail.com] Sent: Thursday, June 20, 2013 12:30 AM To: user@hadoop.apache.org Subject: some idea about the Data Compression Hi , everyone I am working on the data compression 1.data compression before the raw data were uploaded into HDFS. 2.data compression while processing in Hadoop to reduce the pressure on IO. Can anyone give me some ideas on above 2 directions BRs Geelong -- From Good To Great
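[Editor's note] To make point 2 concrete, here is a hedged sketch of turning on Snappy for map output and writing block-compressed SequenceFile job output, which keeps the output splittable for a later MR stage. It assumes the Snappy native libraries are installed on the cluster and uses the Hadoop 2.x property and class names.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class CompressionSetup {
    public static Job configure(Configuration conf) throws Exception {
        // Compress intermediate map output with a fast codec.
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                SnappyCodec.class, CompressionCodec.class);

        Job job = Job.getInstance(conf, "compressed-output-example");

        // Write the final output as a block-compressed SequenceFile,
        // which remains splittable for the next MR stage.
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
        SequenceFileOutputFormat.setOutputCompressionType(
                job, SequenceFile.CompressionType.BLOCK);
        return job;
    }
}
```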
Custom JoinRecordReader class
Hi all, I would like some help/direction on implementing a custom join class. I believe this is the way to address my task at hand, which is given 2 matrices in SequenceFile format, I wish to run operations on all pairs of rows between them. The rows may not be equal in number. The actual operations will be taken care of in Mahout. I wrote a custom class working off of InnerJoinRecordReader and OuterJoinRecordReader but they of course always get fed and thus return pairs of keys that match. How can I get a return of all key pairs? Or does this go completely against the hadoop map-reduce framework? Thanks in advance for any input.
RE: Containers and CPU
I believe this is the default behavior. By default, only memory limit on resources is enforced. The capacity scheduler will use DefaultResourceCalculator to compute resource allocation for containers by default, which also does not take CPU into account. -Chuan From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Tuesday, July 02, 2013 8:57 AM To: user@hadoop.apache.org Subject: Containers and CPU I have YARN tasks that benefit from multicore scaling. However, they don't *always* use more than one core. I would like to allocate containers based only on memory, and let each task use as many cores as needed, without allocating exclusive CPU slots in the scheduler. For example, on an 8-core node with 16GB memory, I'd like to be able to run 3 tasks each consuming 4GB memory and each using as much CPU as they like. Is this the default behavior if I don't specify CPU restrictions to the scheduler? Thanks John
Re: Containers and CPU
CPU limits are only enforced if cgroups is turned on. With cgroups on, they are only limited when there is contention, in which case tasks are given CPU time in proportion to the number of cores requested for/allocated to them. Does that make sense? -Sandy On Tue, Jul 2, 2013 at 9:50 AM, Chuan Liu chuan...@microsoft.com wrote: I believe this is the default behavior. By default, only memory limit on resources is enforced. The capacity scheduler will use DefaultResourceCalculator to compute resource allocation for containers by default, which also does not take CPU into account. ** ** -Chuan ** ** *From:* John Lilley [mailto:john.lil...@redpoint.net] *Sent:* Tuesday, July 02, 2013 8:57 AM *To:* user@hadoop.apache.org *Subject:* Containers and CPU ** ** I have YARN tasks that benefit from multicore scaling. However, they don’t **always** use more than one core. I would like to allocate containers based only on memory, and let each task use as many cores as needed, without allocating exclusive CPU “slots” in the scheduler. For example, on an 8-core node with 16GB memory, I’d like to be able to run 3 tasks each consuming 4GB memory and each using as much CPU as they like. Is this the default behavior if I don’t specify CPU restrictions to the scheduler? Thanks John ** ** ** **
RE: Yarn HDFS and Yarn Exceptions when processing larger datasets.
Blah blah, Can you build and run the DistributedShell example? If it does not run correctly this would tend to implicate your configuration. If it run correctly then your code is suspect. John From: blah blah [mailto:tmp5...@gmail.com] Sent: Tuesday, June 25, 2013 6:09 PM To: user@hadoop.apache.org Subject: Yarn HDFS and Yarn Exceptions when processing larger datasets. Hi All First let me excuse for the poor thread title but I have no idea how to express the problem in one sentence. I have implemented new Application Master with the use of Yarn. I am using old Yarn development version. Revision 1437315, from 2013-01-23 (SNAPSHOT 3.0.0). I can not update to current trunk version, as prototype deadline is soon, and I don't have time to include Yarn API changes. Currently I execute experiments in pseudo-distributed mode, I use guava version 14.0-rc1. I have a problem with Yarn's and HDFS Exceptions for larger datasets. My AM works fine and I can execute it without a problem for a debug dataset (1MB size). But when I increase the size of input to 6.8 MB, I am getting the following exceptions: AM_Exceptions_Stack Exception in thread Thread-3 java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135) at org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.allocate(AMRMProtocolPBClientImpl.java:77) at org.apache.hadoop.yarn.client.AMRMClientImpl.allocate(AMRMClientImpl.java:194) at org.tudelft.ludograph.app.AppMasterContainerRequester.sendContainerAskToRM(AppMasterContainerRequester.java:219) at org.tudelft.ludograph.app.AppMasterContainerRequester.run(AppMasterContainerRequester.java:315) at java.lang.Thread.run(Thread.java:662) Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Response is null.; Host Details : local host is: linux-ljc5.site/127.0.0.1http://127.0.0.1; destination host is: 0.0.0.0:8030; at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:212) at $Proxy10.allocate(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.allocate(AMRMProtocolPBClientImpl.java:75) ... 4 more Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Response is null.; Host Details : local host is: linux-ljc5.site/127.0.0.1http://127.0.0.1; destination host is: 0.0.0.0:8030; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:760) at org.apache.hadoop.ipc.Client.call(Client.java:1240) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) ... 6 more Caused by: java.io.IOException: Response is null. at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:950) at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844) Container_Exception Exception in thread org.apache.hadoop.hdfs.SocketCache@6da0d866mailto:org.apache.hadoop.hdfs.SocketCache@6da0d866 java.lang.NoSuchMethodError: com.google.common.collect.LinkedListMultimap.values()Ljava/util/List; at org.apache.hadoop.hdfs.SocketCache.clear(SocketCache.java:257) at org.apache.hadoop.hdfs.SocketCache.access$100(SocketCache.java:45) at org.apache.hadoop.hdfs.SocketCache$1.run(SocketCache.java:126) at java.lang.Thread.run(Thread.java:662) As I said this problem does not occur for the 1MB input. For the 6MB input nothing is changed except the input dataset. 
Now a little bit of what am I doing, to give you the context of the problem. My AM starts N (debug 4) containers and each container reads its input data part. When this process is finished I am exchanging parts of input between containers (exchanging IDs of input structures, to provide means for communication between data structures). During the process of exchanging IDs these exceptions occur. I start Netty Server/Client on each container and I use ports 12000-12099 as mean of communicating these IDs. Any help will be greatly appreciated. Sorry for any typos and if the explanation is not clear just ask for any details you are interested in. Currently it is after 2 AM I hope this will be a valid excuse. regards tmp
Re: HDFS file section rewrite
HDFS only supports regular writes and append. Random write is not supported. I do not know of any feature/jira that is underway to support this feature. On Tue, Jul 2, 2013 at 9:01 AM, John Lilley john.lil...@redpoint.netwrote: I’m sure this has been asked a zillion times, so please just point me to the JIRA comments: is there a feature underway to allow for re-writing of HDFS file sections? Thanks John ** ** -- http://hortonworks.com/download/
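[Editor's note] A brief hedged illustration of the two write modes Suresh describes, using the public FileSystem API. The path is illustrative, and append support may need to be enabled (or may behave differently) depending on the HDFS version in use.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteAndAppend {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/tmp/example.txt");   // illustrative path

        // Regular write: the file is written front to back and then closed.
        try (FSDataOutputStream out = fs.create(p, true)) {
            out.writeBytes("first line\n");
        }

        // Append: new bytes can only be added at the end of the file.
        // There is no API to seek back and overwrite an earlier section.
        try (FSDataOutputStream out = fs.append(p)) {
            out.writeBytes("second line\n");
        }
    }
}
```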
RE: YARN tasks and child processes
Thanks, that answers my question. I am trying to explore alternatives to a YARN auxiliary service, but apparently this isn’t an option. John From: Ravi Prakash [mailto:ravi...@ymail.com] Sent: Tuesday, July 02, 2013 9:55 AM To: user@hadoop.apache.org Subject: Re: YARN tasks and child processes Nopes! The node manager kills the entire process tree when the task reports that it is done. Now if you were able to figure out a way for one of the children to break out of the process tree, maybe? However your approach is obviously not recommended. You would be stealing from the resources that YARN should have available. From: John Lilley john.lil...@redpoint.netmailto:john.lil...@redpoint.net To: user@hadoop.apache.orgmailto:user@hadoop.apache.org user@hadoop.apache.orgmailto:user@hadoop.apache.org Sent: Tuesday, July 2, 2013 10:41 AM Subject: RE: YARN tasks and child processes Devaraj, Thanks, this is also good information. But I was really asking if a child *process* that was spawned by a task can persist, in addition to the data. john From: Devaraj k [mailto:devara...@huawei.com] Sent: Monday, July 01, 2013 11:50 PM To: user@hadoop.apache.orgmailto:user@hadoop.apache.org Subject: RE: YARN tasks and child processes It is possible to persist the data by YARN task, you can choose whichever place you want to persist. If you choose to persist in HDFS, you need to take care deleting the data after using it. If you choose to write in local dir, you may write the data into the nm local dirs (i.e ‘yarn.nodemanager.local-dirs’ configuration) accordingly with the app id container id, and this will be cleaned up after the app completion. You need to make use of this persisted data before completing the application. Thanks Devaraj k From: John Lilley [mailto:john.lil...@redpoint.net] Sent: 02 July 2013 04:44 To: user@hadoop.apache.orgmailto:user@hadoop.apache.org Subject: YARN tasks and child processes Is it possible for a child process of a YARN task to persist after the task is complete? I am looking at an alternative to a YARN auxiliary process that may be simpler to implement, if I can have a task spawn a process that persists for some time after the task finishes. Thanks, John
RE: Containers and CPU
Sandy, Sorry, I don't completely follow. When you say with cgroups on, is that an attribute of the AM, the Scheduler, or the Site/RM? In other words is it site-wide or something that my application can control? With cgroups on, is there still a way to get my desired behavior? I'd really like all tasks to have access to all CPU cores and simply fight it out in the OS thread scheduler. Thanks, john From: Sandy Ryza [mailto:sandy.r...@cloudera.com] Sent: Tuesday, July 02, 2013 11:56 AM To: user@hadoop.apache.org Subject: Re: Containers and CPU CPU limits are only enforced if cgroups is turned on. With cgroups on, they are only limited when there is contention, in which case tasks are given CPU time in proportion to the number of cores requested for/allocated to them. Does that make sense? -Sandy On Tue, Jul 2, 2013 at 9:50 AM, Chuan Liu chuan...@microsoft.commailto:chuan...@microsoft.com wrote: I believe this is the default behavior. By default, only memory limit on resources is enforced. The capacity scheduler will use DefaultResourceCalculator to compute resource allocation for containers by default, which also does not take CPU into account. -Chuan From: John Lilley [mailto:john.lil...@redpoint.netmailto:john.lil...@redpoint.net] Sent: Tuesday, July 02, 2013 8:57 AM To: user@hadoop.apache.orgmailto:user@hadoop.apache.org Subject: Containers and CPU I have YARN tasks that benefit from multicore scaling. However, they don't *always* use more than one core. I would like to allocate containers based only on memory, and let each task use as many cores as needed, without allocating exclusive CPU slots in the scheduler. For example, on an 8-core node with 16GB memory, I'd like to be able to run 3 tasks each consuming 4GB memory and each using as much CPU as they like. Is this the default behavior if I don't specify CPU restrictions to the scheduler? Thanks John
RE: Containers and CPU
To explain my reasoning, suppose that I have an application that performs some CPU-intensive calculation, and can scale to multiple cores internally, but it doesn't need those cores all the time because the CPU-intensive phase is only a part of the overall computation. I'm not sure I understand cgroups' CPU control - does it statically mask cores available to processes, or does it set up a prioritization for access to all available cores? Thanks, John From: John Lilley [mailto:john.lil...@redpoint.net] Sent: Tuesday, July 02, 2013 1:12 PM To: user@hadoop.apache.org Subject: RE: Containers and CPU Sandy, Sorry, I don't completely follow. When you say with cgroups on, is that an attribute of the AM, the Scheduler, or the Site/RM? In other words is it site-wide or something that my application can control? With cgroups on, is there still a way to get my desired behavior? I'd really like all tasks to have access to all CPU cores and simply fight it out in the OS thread scheduler. Thanks, john From: Sandy Ryza [mailto:sandy.r...@cloudera.com] Sent: Tuesday, July 02, 2013 11:56 AM To: user@hadoop.apache.orgmailto:user@hadoop.apache.org Subject: Re: Containers and CPU CPU limits are only enforced if cgroups is turned on. With cgroups on, they are only limited when there is contention, in which case tasks are given CPU time in proportion to the number of cores requested for/allocated to them. Does that make sense? -Sandy On Tue, Jul 2, 2013 at 9:50 AM, Chuan Liu chuan...@microsoft.commailto:chuan...@microsoft.com wrote: I believe this is the default behavior. By default, only memory limit on resources is enforced. The capacity scheduler will use DefaultResourceCalculator to compute resource allocation for containers by default, which also does not take CPU into account. -Chuan From: John Lilley [mailto:john.lil...@redpoint.netmailto:john.lil...@redpoint.net] Sent: Tuesday, July 02, 2013 8:57 AM To: user@hadoop.apache.orgmailto:user@hadoop.apache.org Subject: Containers and CPU I have YARN tasks that benefit from multicore scaling. However, they don't *always* use more than one core. I would like to allocate containers based only on memory, and let each task use as many cores as needed, without allocating exclusive CPU slots in the scheduler. For example, on an 8-core node with 16GB memory, I'd like to be able to run 3 tasks each consuming 4GB memory and each using as much CPU as they like. Is this the default behavior if I don't specify CPU restrictions to the scheduler? Thanks John
HDFS temporary file locations
Is there any convention for clients/applications wishing to use temporary file space in HDFS? For example, my application wants to: 1) Load data into some temporary space in HDFS as an external client 2) Run an AM, which produces HDFS output (also in the temporary space) 3) Read the data out of the HDFS temporary space 4) Remove the temporary files Thanks John
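[Editor's note] A hedged sketch of the four-step workflow described above using the FileSystem API; the staging directory layout and file names are assumptions for illustration, not an established Hadoop convention.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TempHdfsWorkflow {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // 1) Load input into an application-chosen temporary area (assumed layout).
        Path staging = new Path("/tmp/myapp/run-0001");
        fs.mkdirs(staging);
        fs.copyFromLocalFile(new Path("/local/input.dat"),
                             new Path(staging, "input.dat"));

        // 2) The AM would run here, writing its output under the same area,
        //    e.g. new Path(staging, "output").

        // 3) Read the results back out (assumed file name).
        Path result = new Path(staging, "output/part-00000");
        if (fs.exists(result)) {
            System.out.println("result size = " + fs.getFileStatus(result).getLen());
        }

        // 4) Remove the temporary files when done (recursive delete).
        fs.delete(staging, true);
    }
}
```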
Re: Containers and CPU
Use of cgroups for controlling CPU is off by default, but can be turned on as a nodemanager configuration with yarn.nodemanager.linux-container-executor.resources-handler.class. So it is site-wide. If you want tasks to purely fight it out in the OS thread scheduler, simply don't change from the default. Even with cgroups on, all tasks will have access to all CPU cores. We don't do any pinning of tasks to cores. If a task is requested with a single vcore and placed on an otherwise empty machine with 8 cores, it will have access to all 8 cores. If 3 other tasks that requested a single vcore are later placed on the same node, and all tasks are using as much CPU as they can get their hands on, then each of the tasks will get 2 cores of CPU-time. On Tue, Jul 2, 2013 at 12:12 PM, John Lilley john.lil...@redpoint.netwrote: Sandy, Sorry, I don’t completely follow. When you say “with cgroups on”, is that an attribute of the AM, the Scheduler, or the Site/RM? In other words is it site-wide or something that my application can control? With cgroups on, is there still a way to get my desired behavior? I’d really like all tasks to have access to all CPU cores and simply fight it out in the OS thread scheduler. Thanks, john ** ** *From:* Sandy Ryza [mailto:sandy.r...@cloudera.com] *Sent:* Tuesday, July 02, 2013 11:56 AM *To:* user@hadoop.apache.org *Subject:* Re: Containers and CPU ** ** CPU limits are only enforced if cgroups is turned on. With cgroups on, they are only limited when there is contention, in which case tasks are given CPU time in proportion to the number of cores requested for/allocated to them. Does that make sense? ** ** -Sandy ** ** On Tue, Jul 2, 2013 at 9:50 AM, Chuan Liu chuan...@microsoft.com wrote:* *** I believe this is the default behavior. By default, only memory limit on resources is enforced. The capacity scheduler will use DefaultResourceCalculator to compute resource allocation for containers by default, which also does not take CPU into account. -Chuan *From:* John Lilley [mailto:john.lil...@redpoint.net] *Sent:* Tuesday, July 02, 2013 8:57 AM *To:* user@hadoop.apache.org *Subject:* Containers and CPU I have YARN tasks that benefit from multicore scaling. However, they don’t **always** use more than one core. I would like to allocate containers based only on memory, and let each task use as many cores as needed, without allocating exclusive CPU “slots” in the scheduler. For example, on an 8-core node with 16GB memory, I’d like to be able to run 3 tasks each consuming 4GB memory and each using as much CPU as they like. Is this the default behavior if I don’t specify CPU restrictions to the scheduler? Thanks John ** **
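[Editor's note] For completeness, a hedged sketch of how an ApplicationMaster asks for memory and vcores, written against the AMRMClient API roughly as it looks in later Hadoop 2.x releases (package and class names shifted between the alpha and GA releases, so check your version). As discussed above, with the default DefaultResourceCalculator only the memory figure is enforced; the vcore number matters for scheduling and cgroup shares. The priority and sizes are illustrative.

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class AskForContainers {
    public static void addRequests(AMRMClient<ContainerRequest> amrmClient) {
        // 4 GB of memory and 1 vcore per container.
        Resource capability = Resource.newInstance(4096, 1);
        Priority priority = Priority.newInstance(0);   // illustrative priority

        // Ask for three such containers, e.g. on the 16 GB node from the thread.
        for (int i = 0; i < 3; i++) {
            amrmClient.addContainerRequest(
                    new ContainerRequest(capability, null, null, priority));
        }
    }
}
```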
RE: Containers and CPU
Sandy, Thanks, I think I understand. So it only makes a difference if cgroups is on AND the AM requests multiple cores? E.g. if each task wants 4 cores the RM would only allow two containers per 8-core node? John From: Sandy Ryza [mailto:sandy.r...@cloudera.com] Sent: Tuesday, July 02, 2013 1:26 PM To: user@hadoop.apache.org Subject: Re: Containers and CPU Use of cgroups for controlling CPU is off by default, but can be turned on as a nodemanager configuration with yarn.nodemanager.linux-container-executor.resources-handler.class. So it is site-wide. If you want tasks to purely fight it out in the OS thread scheduler, simply don't change from the default. Even with cgroups on, all tasks will have access to all CPU cores. We don't do any pinning of tasks to cores. If a task is requested with a single vcore and placed on an otherwise empty machine with 8 cores, it will have access to all 8 cores. If 3 other tasks that requested a single vcore are later placed on the same node, and all tasks are using as much CPU as they can get their hands on, then each of the tasks will get 2 cores of CPU-time. On Tue, Jul 2, 2013 at 12:12 PM, John Lilley john.lil...@redpoint.netmailto:john.lil...@redpoint.net wrote: Sandy, Sorry, I don't completely follow. When you say with cgroups on, is that an attribute of the AM, the Scheduler, or the Site/RM? In other words is it site-wide or something that my application can control? With cgroups on, is there still a way to get my desired behavior? I'd really like all tasks to have access to all CPU cores and simply fight it out in the OS thread scheduler. Thanks, john From: Sandy Ryza [mailto:sandy.r...@cloudera.commailto:sandy.r...@cloudera.com] Sent: Tuesday, July 02, 2013 11:56 AM To: user@hadoop.apache.orgmailto:user@hadoop.apache.org Subject: Re: Containers and CPU CPU limits are only enforced if cgroups is turned on. With cgroups on, they are only limited when there is contention, in which case tasks are given CPU time in proportion to the number of cores requested for/allocated to them. Does that make sense? -Sandy On Tue, Jul 2, 2013 at 9:50 AM, Chuan Liu chuan...@microsoft.commailto:chuan...@microsoft.com wrote: I believe this is the default behavior. By default, only memory limit on resources is enforced. The capacity scheduler will use DefaultResourceCalculator to compute resource allocation for containers by default, which also does not take CPU into account. -Chuan From: John Lilley [mailto:john.lil...@redpoint.netmailto:john.lil...@redpoint.net] Sent: Tuesday, July 02, 2013 8:57 AM To: user@hadoop.apache.orgmailto:user@hadoop.apache.org Subject: Containers and CPU I have YARN tasks that benefit from multicore scaling. However, they don't *always* use more than one core. I would like to allocate containers based only on memory, and let each task use as many cores as needed, without allocating exclusive CPU slots in the scheduler. For example, on an 8-core node with 16GB memory, I'd like to be able to run 3 tasks each consuming 4GB memory and each using as much CPU as they like. Is this the default behavior if I don't specify CPU restrictions to the scheduler? Thanks John
Re: Yarn HDFS and Yarn Exceptions when processing larger datasets.
Hi Just a quick short reply (tomorrow is my prototype presentation). @Omkar Joshi - RM port 8030 already running when I start my AM - I'll do the client thread size AM - Only AM communicates with RM - RM/NM no exceptions there (as far as I remember will check later [sorry]) Furthermore in fully distributed mode AM doesn't throw exceptions anymore, only Containers. @John Lilley Yes the problem is with my code (I don't want to imply that it is YARN's problem). I have successfully run Distributed Shell and YARN's MapReduce jobs with much bigger datasets than 1mb ;). I just don't know where to start looking for the problem, especially for the Containers exceptions as they occur after my containers are done with HDFS (until they store final output). The only idea I have is that these exceptions occur during Containers communication. Instead of sending multiple messages my containers aggregate all messages per container into one big message (the biggest around 8k-10k chars), thus each container sends only 1 message to other container (which includes multiple messages). I don't know if this information is important, but I am planning to see what will happen if I partition the messages (1024). I got this idea from the Containers exception org.apache.hadoop.hdfs.SocketCache, I am using SocketChannels to send these big messages, so maybe I am creating some Socket conflict . regards tmp 2013/7/2 John Lilley john.lil...@redpoint.net Blah blah, Can you build and run the DistributedShell example? If it does not run correctly this would tend to implicate your configuration. If it run correctly then your code is suspect. John ** ** ** ** *From:* blah blah [mailto:tmp5...@gmail.com] *Sent:* Tuesday, June 25, 2013 6:09 PM *To:* user@hadoop.apache.org *Subject:* Yarn HDFS and Yarn Exceptions when processing larger datasets. ** ** Hi All First let me excuse for the poor thread title but I have no idea how to express the problem in one sentence. I have implemented new Application Master with the use of Yarn. I am using old Yarn development version. Revision 1437315, from 2013-01-23 (SNAPSHOT 3.0.0). I can not update to current trunk version, as prototype deadline is soon, and I don't have time to include Yarn API changes. Currently I execute experiments in pseudo-distributed mode, I use guava version 14.0-rc1. I have a problem with Yarn's and HDFS Exceptions for larger datasets. My AM works fine and I can execute it without a problem for a debug dataset (1MB size). 
But when I increase the size of the input to 6.8 MB, I am getting the following exceptions:

AM_Exceptions_Stack

Exception in thread "Thread-3" java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
    at org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.allocate(AMRMProtocolPBClientImpl.java:77)
    at org.apache.hadoop.yarn.client.AMRMClientImpl.allocate(AMRMClientImpl.java:194)
    at org.tudelft.ludograph.app.AppMasterContainerRequester.sendContainerAskToRM(AppMasterContainerRequester.java:219)
    at org.tudelft.ludograph.app.AppMasterContainerRequester.run(AppMasterContainerRequester.java:315)
    at java.lang.Thread.run(Thread.java:662)
Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Response is null.; Host Details : local host is: linux-ljc5.site/127.0.0.1; destination host is: 0.0.0.0:8030;
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:212)
    at $Proxy10.allocate(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.allocate(AMRMProtocolPBClientImpl.java:75)
    ... 4 more
Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Response is null.; Host Details : local host is: linux-ljc5.site/127.0.0.1; destination host is: 0.0.0.0:8030;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:760)
    at org.apache.hadoop.ipc.Client.call(Client.java:1240)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    ... 6 more
Caused by: java.io.IOException: Response is null.
    at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:950)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844)

Container_Exception

Exception in thread "org.apache.hadoop.hdfs.SocketCache@6da0d866" java.lang.NoSuchMethodError: com.google.common.collect.LinkedListMultimap.values()Ljava/util/List;
    at org.apache.hadoop.hdfs.SocketCache.clear(SocketCache.java:257)
    at org.apache.hadoop.hdfs.SocketCache.access$100(SocketCache.java:45)
    at org.apache.hadoop.hdfs.SocketCache$1.run(SocketCache.java:126)
    at java.lang.Thread.run(Thread.java:662)

As I said this
RE: HDFS file section rewrite
I found this: https://issues.apache.org/jira/browse/HADOOP-5215 Doesn't seem to have attracted much interest. John

From: Suresh Srinivas [mailto:sur...@hortonworks.com] Sent: Tuesday, July 02, 2013 1:03 PM To: hdfs-u...@hadoop.apache.org Subject: Re: HDFS file section rewrite

HDFS only supports regular writes and append. Random write is not supported. I do not know of any feature/jira that is underway to support this feature.

On Tue, Jul 2, 2013 at 9:01 AM, John Lilley john.lil...@redpoint.net wrote:

I'm sure this has been asked a zillion times, so please just point me to the JIRA comments: is there a feature underway to allow for re-writing of HDFS file sections? Thanks John

-- http://hortonworks.com/download/
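(To illustrate what is and is not supported: a minimal sketch of appending to an existing file, which is the only way to add data to a file already in HDFS. The path is just an example, and append must be enabled in the HDFS release in use; there is no equivalent API for rewriting a byte range in place.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Existing bytes cannot be rewritten in place; new data can only be
        // added at the end of the file.
        FSDataOutputStream out = fs.append(new Path("/user/hadoop/existing.txt"));
        out.writeBytes("appended record\n");
        out.close();
        fs.close();
    }
}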
Re: Containers and CPU
That's correct. -Sandy

On Tue, Jul 2, 2013 at 12:28 PM, John Lilley john.lil...@redpoint.net wrote:

Sandy, Thanks, I think I understand. So it only makes a difference if cgroups is on AND the AM requests multiple cores? E.g. if each task wants 4 cores the RM would only allow two containers per 8-core node? John
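(For reference, a minimal sketch of an AM asking for a multi-vcore container; it assumes the later, Hadoop 2.2-era AMRMClient API, so class and package names may differ in the 2.0.x alphas discussed here.)

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class VcoreRequestSketch {
    public static void main(String[] args) throws Exception {
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(new YarnConfiguration());
        rmClient.start();
        // In a real AM, registerApplicationMaster(...) is called before any requests.
        // Ask for 4096 MB and 4 vcores. Without cgroups, the vcore count only
        // affects how many containers the scheduler packs onto a node; with
        // cgroups on, it also bounds the container's CPU share under contention.
        Resource capability = Resource.newInstance(4096, 4);
        rmClient.addContainerRequest(
            new ContainerRequest(capability, null, null, Priority.newInstance(0)));
        // The request is actually sent to the RM on the next allocate() heartbeat.
    }
}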
Re: Exception in createBlockOutputStream - poss firewall issue
Hi John, exactly what I was thinking; however, I haven't found a way to do that. If I ever have time I'll trawl through the code, but I've managed to avoid the issue by placing both machines inside the firewall. Regards Robin Sent from my iPhone

On 2 Jul 2013, at 19:48, John Lilley john.lil...@redpoint.net wrote:

I don't know the answer... but if it is possible to make the DNs report a domain name instead of an IP quad it may help. John

From: Robin East [mailto:robin.e...@xense.co.uk] Sent: Thursday, June 27, 2013 12:18 AM To: user@hadoop.apache.org Subject: Re: Exception in createBlockOutputStream - poss firewall issue

OK, I should have added that the external address of the cluster is NATed to the internal address. The internal address yyy.yyy.yyy.yyy is not a routable address for the client. The client can reach the namenode on xxx.xxx.xxx.xxx, but I guess the data node must advertise itself using the internal address (yyy.yyy.yyy.yyy). What is the way round this (not an uncommon problem in secure environments)? Robin Sent from my iPad

On 27 Jun 2013, at 03:41, Harsh J ha...@cloudera.com wrote:

Clients will read/write data to the DNs directly. DNs serve on ports 50010 and 50020 by default. Please open up these ports, aside from the NN's RPC ports, to be able to read/write data.

On Thu, Jun 27, 2013 at 2:23 AM, Robin East robin.e...@xense.co.uk wrote:

I have a single-node hadoop cluster set up behind a firewall and am trying to create files using a java program outside the firewall, and I get the exception below. The java program works fine inside the firewall. The ip address for the single-node cluster is xxx.xxx.xxx.xxx; however, it appears that in createBlockOutputStream the client thinks the data node is at ip yyy.yyy.yyy.yyy (the internal address of the cluster), which is not accessible. The java code looks like this (using hadoop 1.1.2):

private static void createHdfsFile() throws IOException {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://" + hdfsHost + ":9000");
    FileSystem hdfs = FileSystem.get(conf);
    System.out.println("HDFS Working Directory: " + hdfs.getWorkingDirectory().toString());
    FSDataOutputStream os = hdfs.create(new Path("/user/hadoop/test2.txt"));
    os.writeChars("Example text\n for a hadoop write call\n\ntesting\n");
    os.close();
}

Any idea how I can get this to work?
HDFS Working Directory: hdfs://xxx.xxx.xxx.xxx:9000/user/z
Jun 26, 2013 8:08:52 PM org.apache.hadoop.hdfs.DFSClient$DFSOutputStream createBlockOutputStream
INFO: Exception in createBlockOutputStream yyy.yyy.yyy.yyy:50010 java.net.ConnectException: Connection timed out: no further information
Jun 26, 2013 8:08:52 PM org.apache.hadoop.hdfs.DFSClient$DFSOutputStream nextBlockOutputStream
INFO: Abandoning block blk_4933973859208379842_1028
Jun 26, 2013 8:08:53 PM org.apache.hadoop.hdfs.DFSClient$DFSOutputStream nextBlockOutputStream
INFO: Excluding datanode yyy.yyy.yyy.yyy:50010
Jun 26, 2013 8:08:53 PM org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer run
WARNING: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/test2.txt could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
    at org.apache.hadoop.ipc.Client.call(Client.java:1107)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
    at $Proxy1.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
    at
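(One approach along the lines John suggests: have the client connect to datanodes by hostname rather than the NAT-hidden internal IP the namenode reports. This is a sketch, not something verified in this thread; it assumes the dfs.client.use.datanode.hostname property from HDFS-3150 is available in the release being used, and that the datanode hostnames resolve to routable addresses on the client side.)

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ExternalClientWriteSketch {
    public static void main(String[] args) throws Exception {
        String hdfsHost = args[0]; // namenode hostname, as in the snippet above
        Configuration conf = new Configuration();
        // Ask the client to contact datanodes by hostname instead of the
        // internal IP address the namenode reports for them.
        conf.setBoolean("dfs.client.use.datanode.hostname", true);
        FileSystem hdfs = FileSystem.get(URI.create("hdfs://" + hdfsHost + ":9000"), conf);
        FSDataOutputStream os = hdfs.create(new Path("/user/hadoop/test2.txt"));
        os.writeChars("Example text\n");
        os.close();
        hdfs.close();
    }
}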
Unable to load native-hadoop library
Hello, I have a Hadoop 2.0.5 Alpha cluster. When I execute any Hadoop command, I see the following message:

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Is it in the lib/native folder? How do I configure the system to load it? Thanks, Chui-hui
Re: Unable to load native-hadoop library
Take a look here: http://search-hadoop.com/m/FXOOOTJruq1

On Tue, Jul 2, 2013 at 3:25 PM, Chui-Hui Chiu cch...@tigers.lsu.edu wrote:

Hello, I have a Hadoop 2.0.5 Alpha cluster. When I execute any Hadoop command, I see the following message:

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Is it in the lib/native folder? How do I configure the system to load it? Thanks, Chui-hui
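(A small way to verify whether the native library is actually being picked up, assuming the Hadoop jars are on the classpath; libhadoop normally lives under $HADOOP_HOME/lib/native and is located via java.library.path.)

import org.apache.hadoop.util.NativeCodeLoader;

public class NativeLoadCheck {
    public static void main(String[] args) {
        // Prints true only if libhadoop was found on java.library.path and loaded.
        System.out.println("native hadoop loaded: " + NativeCodeLoader.isNativeCodeLoaded());
    }
}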
What's Yarn?
Hi Dear all, I just found this by chance; maybe you all know it already, but I am sharing it here again. Yet Another Resource Negotiator (YARN). From: http://adtmag.com/blogs/watersworks/2012/08/apache-yarn-promotion.aspx
Parameter 'yarn.nodemanager.resource.cpu-cores' does not work
Hi, With Hadoop 2.0.4-alpha, yarn.nodemanager.resource.cpu-cores does not work for me:

1. The performance of the same terasort job does not change, even after increasing or decreasing the value of 'yarn.nodemanager.resource.cpu-cores' in yarn-site.xml and restarting the yarn cluster.

2. Even if I set the values of both 'yarn.nodemanager.resource.cpu-cores' and 'yarn.nodemanager.vcores-pcores-ratio' to 0, the MR job still completes without any exception, but the expected behavior should be that no CPU can be assigned to any container, and therefore no job can be executed on the cluster. Right?

Thanks!
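(One way to check whether the setting took effect is to ask the RM what each NodeManager registered. A sketch, assuming the later, Hadoop 2.2-era YarnClient API, which may differ from what 2.0.4-alpha exposes.)

import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NodeVcoreCheck {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();
        // Each node's advertised capability should reflect the configured vcore count.
        for (NodeReport node : yarnClient.getNodeReports(NodeState.RUNNING)) {
            System.out.println(node.getNodeId() + " memory=" + node.getCapability().getMemory()
                + "MB vcores=" + node.getCapability().getVirtualCores());
        }
        yarnClient.stop();
    }
}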