Re: how to control (or understand) the memory usage in hdfs
oh, really? ulimit -n is 2048; I'd assumed that would be sufficient for just testing on my machine. I was going to use 4096 in production. My hdfs-site.xml has dfs.datanode.max.xcievers set to 4096.

As for my logs... there are a lot of INFO entries; I haven't gotten around to configuring it down yet, and I'm not quite sure why it's so extensive at INFO level. My log file is 4.4gb (is this a sign I've configured or done something wrong?). I grep -v INFO in the log to get the actual error entries (assuming the stack trace is actually on the same line, or else those stack lines may be misleading):

2013-03-23 15:11:43,653 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1419421989-192.168.1.5-50010-1363780956652, infoPort=50075, ipcPort=50020):DataXceiveServer: Exiting due to: java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:691)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:133)
    at java.lang.Thread.run(Thread.java:722)

2013-03-23 15:11:44,177 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1419421989-192.168.1.5-50010-1363780956652, infoPort=50075, ipcPort=50020):DataXceiver java.io.InterruptedIOException: Interruped while waiting for IO on channel java.nio.channels.SocketChannel[closed]. 0 millis timeout left.
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:349)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:292)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:339)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:403)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:581)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:406)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
    at java.lang.Thread.run(Thread.java:722)

On 3/23/13, Harsh J ha...@cloudera.com wrote: I'm guessing your OutOfMemory then is due to an "unable to create native thread" message? Do you mind sharing your error logs with us? Because if it's that, then it's a ulimit/system limits issue and not a real memory issue.

On Sat, Mar 23, 2013 at 2:30 PM, Ted r6squee...@gmail.com wrote: I just checked, and after running my tests I generate only 670mb of data, on 89 blocks. What's more, when I ran the test this time I had increased my memory to 2048mb, so it completed fine - but I decided to run jconsole through the test so I could see what's happening. The data node never exceeded 200mb of memory usage. It mostly stayed under 100mb. I'm not sure why it would complain about out of memory and shut itself down when it was only 1024. It was fairly consistently doing that the last few days, including this morning right before I switched it to 2048.
I'm going to run the test again with 1024mb and jconsole running; none of this makes any sense to me.

On 3/23/13, Harsh J ha...@cloudera.com wrote: I run a 128 MB heap size DN for my simple purposes on my Mac and it runs well for what load I apply on it. A DN's primary, growing memory consumption comes from the # of blocks it carries. All of these blocks' file paths are mapped and kept in RAM during its lifetime. If your DN has acquired a lot of blocks by now, say close to a million or more, then 1 GB may not suffice anymore to hold them, and you'd need to scale up (add more RAM, or increase the heap size if you have more RAM) or scale out (add another node and run the balancer).

On Sat, Mar 23, 2013 at 10:03 AM, Ted r6squee...@gmail.com wrote: Hi, I'm new to hadoop/hdfs and I'm just running some tests on my local machine in a single node setup. I'm encountering out of memory errors on the jvm running my data node. I'm pretty sure I can just increase the heap size to fix the errors, but my question is about how memory is actually used. As an example, with other things like an OS's disk cache or, say, databases, if you have or let it use, as an example, 1gb of ram, it will work with what it has available; if the data is more than 1gb of ram it just means it'll swap in and out of memory/disk more
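As a side note on Harsh's point that this particular OutOfMemoryError is a system-limits problem rather than a heap problem: the following is a minimal sketch (not from the original thread) that reproduces "unable to create new native thread". Creation fails once the OS refuses to give the JVM another native thread (ulimit -u / max user processes, available native stack space), regardless of how large -Xmx is.

    public class ThreadLimitDemo {
        public static void main(String[] args) {
            long count = 0;
            try {
                // Cap the attempt so the demo stops even on very permissive systems.
                while (count < 100000) {
                    Thread t = new Thread(new Runnable() {
                        public void run() {
                            try { Thread.sleep(Long.MAX_VALUE); } catch (InterruptedException ignored) {}
                        }
                    });
                    t.setDaemon(true); // let the JVM exit even though the threads never finish
                    t.start();
                    count++;
                }
                System.out.println("Created " + count + " threads without hitting a limit.");
            } catch (OutOfMemoryError e) {
                // Typically "unable to create new native thread": an OS/ulimit limit, not heap.
                System.out.println("Failed after " + count + " threads: " + e.getMessage());
            }
        }
    }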
Best practices to learn Hadoop for new users
Re: DistributedCache - why not read directly from HDFS?
Thanks for your reply Harsh. So if I want to read a simple text file, choosing whether to use DistributedCache or HDFS becomes just a matter of performance. Alberto On 23 March 2013 16:17, Harsh J ha...@cloudera.com wrote: The DistributedCache is not used just to distribute simple files but also native libraries and such, which cannot be loaded directly if they are on HDFS. Also, keeping a file on HDFS could prove less performant, as non-local reads could happen (depending on the file's replication factor). On Sat, Mar 23, 2013 at 8:23 PM, Alberto Cordioli cordioli.albe...@gmail.com wrote: Hi all, I was not able to find an answer to the following question. If the question has already been answered please give me the pointer to the right thread. What are actually the differences between reading a file from HDFS in a mapper and using the DistributedCache? I saw that with the DistributedCache you can give an HDFS path and the task nodes will get the data on the local file system. But what advantages do we have compared with a simple HDFS read with the FSDataInputStream.open() method? Thank you very much, Alberto -- Alberto Cordioli -- Harsh J -- Alberto Cordioli
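To make the comparison concrete, here is a small hedged sketch (Hadoop 1.x APIs; the class name and the /data/lookup.txt path are made up for illustration) of the two approaches being discussed: opening the file from HDFS inside the task, versus registering it in the DistributedCache so each task node gets a local copy.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SideFileAccess {

        // Option 1: read straight from HDFS inside the task. Every task opens the
        // file over the network unless a replica happens to be local.
        static BufferedReader openFromHdfs(Configuration conf) throws Exception {
            FileSystem fs = FileSystem.get(conf);
            FSDataInputStream in = fs.open(new Path("/data/lookup.txt")); // hypothetical path
            return new BufferedReader(new InputStreamReader(in));
        }

        // Option 2: register the file in the DistributedCache at job-submission time;
        // the framework copies it once per node and tasks read it from local disk.
        static void registerInCache(Configuration conf) throws Exception {
            DistributedCache.addCacheFile(new URI("/data/lookup.txt"), conf);
        }

        // Inside a task: resolve the local filesystem copy placed by the framework.
        static Path firstCachedFile(Configuration conf) throws Exception {
            Path[] local = DistributedCache.getLocalCacheFiles(conf);
            return local[0];
        }
    }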
Child JVM memory allocation / Usage
Hi, I configured my child jvm heap to 2 GB. So, I thought I could really read 1.5GB of data and store it in memory (mapper/reducer). I wanted to confirm the same and wrote the following piece of code in the configure method of the mapper:

@Override
public void configure(JobConf job) {
    System.out.println("FREE MEMORY -- " + Runtime.getRuntime().freeMemory());
    System.out.println("MAX MEMORY --- " + Runtime.getRuntime().maxMemory());
}

Surprisingly the output was:

FREE MEMORY -- 341854864 = 320 MB
MAX MEMORY --- 1908932608 = 1.9 GB

I am just wondering what processes are taking up that extra 1.6GB of the heap which I configured for the child jvm. I'd appreciate help understanding the scenario. Regards, Nagarjuna K
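One likely explanation: Runtime.freeMemory() reports free space within the heap the JVM has committed so far (totalMemory()), not within the -Xmx ceiling (maxMemory()), so the gap does not mean 1.6 GB is already in use; the JVM simply has not grown the heap yet. A small sketch (not from the original mail) that prints all three values and a rough "still allocatable" estimate:

    public class HeapProbe {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            long max   = rt.maxMemory();    // -Xmx ceiling (about 1.9 GB for a 2 GB child heap)
            long total = rt.totalMemory();  // heap the JVM has committed so far (grows from -Xms)
            long free  = rt.freeMemory();   // free space within that committed heap only
            long available = max - total + free; // rough headroom before an OutOfMemoryError
            System.out.println("max=" + mb(max) + " total=" + mb(total)
                    + " free=" + mb(free) + " available~=" + mb(available));
        }
        private static String mb(long bytes) { return (bytes / (1024 * 1024)) + " MB"; }
    }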
2 Reduce method in one Job
I want to get reduce output as key and value then I want to pass them to a new reduce as input key and input value. So is there any Map-Reduce-Reduce kind of method? Thanks to all.
Re: 2 Reduce method in one Job
There isn't such a method; you have to submit another MR job. On Mar 24, 2013 9:03 PM, Fatih Haltas fatih.hal...@nyu.edu wrote: I want to get reduce output as key and value then I want to pass them to a new reduce as input key and input value. So is there any Map-Reduce-Reduce kind of method? Thanks to all.
Re: 2 Reduce method in one Job
You seem to want to re-sort/partition your data without materializing it onto HDFS. Azuryy is right: there isn't a way right now, and a second job (with an identity mapper) is necessary. With YARN, though, it becomes more feasible to implement this in the project. The newly inducted incubator project Tez sort of targets this. It's in its nascent stages (for general user use), and the website should hopefully appear at http://incubator.apache.org/projects/tez.html soon. Meanwhile, you can read the proposal behind this project at http://wiki.apache.org/incubator/TezProposal. Initial sources are at https://svn.apache.org/repos/asf/incubator/tez/trunk/. On Sun, Mar 24, 2013 at 6:33 PM, Fatih Haltas fatih.hal...@nyu.edu wrote: I want to get reduce output as key and value then I want to pass them to a new reduce as input key and input value. So is there any Map-Reduce-Reduce kind of method? Thanks to all. -- Harsh J
Re: 2 Reduce method in one Job
Thank you very much. You are right Harsh, it is exactly what I am trying to do. I want to process my result according to the keys, and I do not want to spend time writing this data to HDFS; I want to pass the data as input to another reduce. One more question then: creating 2 different jobs, where the second one has only a reduce, for example, is it possible to pass the first job's output as an argument to the second job? On Sun, Mar 24, 2013 at 5:44 PM, Harsh J ha...@cloudera.com wrote: You seem to want to re-sort/partition your data without materializing it onto HDFS. Azuryy is right: there isn't a way right now, and a second job (with an identity mapper) is necessary. -- Harsh J
Re: 2 Reduce method in one Job
Yes, just use an identity mapper (in the new API, the base Mapper class itself identity-maps; in the old API use the IdentityMapper class) and set the input path to the output path of the first job. If you'll be ending up doing more such step-wise job chaining, consider using Apache Oozie's workflow system. On Sun, Mar 24, 2013 at 7:23 PM, Fatih Haltas fatih.hal...@nyu.edu wrote: One more question then: creating 2 different jobs, where the second one has only a reduce, for example, is it possible to pass the first job's output as an argument to the second job? -- Harsh J
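For reference, here is a hedged sketch of the reduce-only second job Harsh describes, using the new mapreduce API. It assumes the first job wrote its output as a SequenceFile of Text/Text pairs so that the identity mapper (the plain Mapper class) re-emits the same keys and values; the driver class, reducer logic, and paths are placeholders, not from the original mails.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SecondStageDriver {

        // Placeholder second-stage reducer: concatenates the values seen for each key.
        public static class SecondReducer extends Reducer<Text, Text, Text, Text> {
            @Override
            protected void reduce(Text key, Iterable<Text> values, Context ctx)
                    throws java.io.IOException, InterruptedException {
                StringBuilder sb = new StringBuilder();
                for (Text v : values) {
                    sb.append(v.toString()).append(' ');
                }
                ctx.write(key, new Text(sb.toString().trim()));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "reduce-only second stage");
            job.setJarByClass(SecondStageDriver.class);
            job.setMapperClass(Mapper.class);               // identity map: records pass through
            job.setReducerClass(SecondReducer.class);
            job.setInputFormatClass(SequenceFileInputFormat.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.setInputPaths(job, new Path(args[0]));  // first job's output dir
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }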
Re: disk used percentage is not symmetric on datanodes (balancer)
Are you running the balancer? If the balancer is running and it is slow, try increasing the balancer bandwidth.

On 24 March 2013 09:21, Tapas Sarangi tapas.sara...@gmail.com wrote: Thanks for the follow up. I don't know whether an attachment will pass through this mailing list, but I am attaching a pdf that contains the usage of all live nodes. All nodes starting with letter g are the ones with smaller storage space, whereas nodes starting with letter s have larger storage space. As you will see, most of the gXX nodes are completely full whereas the sXX nodes have a lot of unused space. Recently, we are facing a crisis frequently as 'hdfs' goes into a mode where it is not able to write any further even though the total space available in the cluster is about 500 TB. We believe this has something to do with the way it is balancing the nodes, but we don't understand the problem yet. Maybe the attached PDF will help some of you (experts) to see what is going wrong here... Thanks

-- The balancer knows about topology, but when calculating balancing it operates only with nodes, not with racks. You can see how it works in Balancer.java, in BalancerDatanode, around line 509. I was wrong about 350Tb/35Tb; it calculates it in this way. For example: cluster_capacity=3.5Pb, cluster_dfsused=2Pb, avgutil=cluster_dfsused/cluster_capacity*100=57.14% used cluster capacity. Then we know the node utilization (node_dfsused/node_capacity*100). The balancer thinks everything is fine if avgutil+10 >= node_utilization >= avgutil-10. The ideal case is that every node uses avgutil of its capacity, but for a 12 TB node that is only about 6.5 TB and for a 72 TB node it is about 40 TB. The balancer can't help you. Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can. In the ideal case with replication factor 2, with two nodes of 12Tb and 72Tb you will be able to have only 12Tb of replicated data.

Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 TB, but not true for more than two nodes in the cluster.

The best way, in my opinion, is to use multiple racks. Nodes in a rack must have identical capacity, and racks must have identical capacity. For example: rack1: 1 node with 72Tb; rack2: 6 nodes with 12Tb; rack3: 3 nodes with 24Tb. It helps with balancing, because a duplicated block must be on another rack.

The same question I asked earlier in this message: does multiple racks with the default threshold for the balancer minimize the difference between racks?

Why did you select hdfs? Maybe lustre, cephfs or others are a better choice.

It wasn't my decision, and I probably can't change it now. I am new to this cluster and trying to understand a few issues. I will explore other options as you mentioned. -- http://balajin.net/blog http://flic.kr/balajijegan
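To spell out the threshold arithmetic quoted above, here is a small sketch (not from the original mails; the numbers come from the example in the thread): at a cluster average of ~57% utilization, a 12 TB node holding 6.5 TB and a 72 TB node holding 40 TB are both within the default 10-point threshold, so the balancer moves nothing even though the absolute free space differs by a factor of about six.

    public class BalancerMath {
        public static void main(String[] args) {
            double clusterCapacityTb = 3500;   // 3.5 PB, from the example in the thread
            double clusterUsedTb = 2000;       // 2 PB
            double avgUtil = clusterUsedTb / clusterCapacityTb * 100.0;  // ~57.14 %
            double threshold = 10.0;           // balancer default

            // {capacity TB, used TB}: the 12 TB and 72 TB nodes from the example
            double[][] nodes = { {12, 6.5}, {72, 40} };
            for (double[] n : nodes) {
                double util = n[1] / n[0] * 100.0;
                boolean withinThreshold = Math.abs(util - avgUtil) <= threshold;
                System.out.printf(
                    "capacity=%.0f TB used=%.1f TB free=%.1f TB util=%.1f%% balanced=%b%n",
                    n[0], n[1], n[0] - n[1], util, withinThreshold);
            }
            // Both nodes count as "balanced" by percentage even though one has ~5.5 TB free
            // and the other ~32 TB free, which is the asymmetry discussed in this thread.
        }
    }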
Re: question for committer
Is there a reason why you don't want to run MRv2 under yarn? On 22 March 2013 22:49, Azuryy Yu azury...@gmail.com wrote: Is there a way to separate hdfs2 from hadoop2? I want to use hdfs2 and mapreduce 1.0.4 and exclude yarn, because I need HDFS-HA. -- http://balajin.net/blog http://flic.kr/balajijegan
Re: disk used percentage is not symmetric on datanodes (balancer)
Yes, we are running the balancer, though a balancer process runs for almost a day or more before exiting and starting over. The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's bytes, so about 2 gigabytes/sec. Shouldn't that be reasonable? If it is in bits then we have a problem. What's the unit for dfs.balance.bandwidthPerSec? - On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) li...@balajin.net wrote: Are you running the balancer? If the balancer is running and it is slow, try increasing the balancer bandwidth.
Re: disk used percentage is not symmetric on datanodes (balancer)
-setBalancerBandwidth <bandwidth in bytes per second> So the value is in bytes per second. If it is running and exiting, it means it has completed the balancing. On 24 March 2013 11:32, Tapas Sarangi tapas.sara...@gmail.com wrote: Yes, we are running the balancer, though a balancer process runs for almost a day or more before exiting and starting over. The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's bytes, so about 2 gigabytes/sec. What's the unit for dfs.balance.bandwidthPerSec? -- http://balajin.net/blog http://flic.kr/balajijegan
Re: disk used percentage is not symmetric on datanodes (balancer)
Yes, thanks for pointing that out, but I already know that it is completing the balancing when exiting; otherwise it shouldn't exit. Your answer doesn't solve the problem I mentioned earlier in my message: 'hdfs' is stalling and hadoop is not writing unless space is cleared up from the cluster, even though df shows the cluster has about 500 TB of free space. --- On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) bal...@balajin.net wrote: -setBalancerBandwidth <bandwidth in bytes per second> So the value is in bytes per second. If it is running and exiting, it means it has completed the balancing.
Re: disk used percentage is not symmetric on datanodes (balancer)
On both types of nodes, what is your dfs.data.dir set to? Does it specify multiple folders on the same set of drives, or is it 1-1 between folder and drive? If it's set to multiple folders on the same drives, it is probably multiplying the amount of available capacity incorrectly, in that it assumes a 1-1 relationship between folder and total capacity of the drive. On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi tapas.sara...@gmail.com wrote: Yes, thanks for pointing that out, but I already know that it is completing the balancing when exiting; otherwise it shouldn't exit. Your answer doesn't solve the problem I mentioned earlier in my message: 'hdfs' is stalling and hadoop is not writing unless space is cleared up from the cluster, even though df shows the cluster has about 500 TB of free space.
Re: disk used percentage is not symmetric on datanodes (balancer)
Thanks. We have a 1-1 configuration of drives and folders in all the datanodes. -Tapas On Mar 24, 2013, at 3:29 PM, Jamal B jm151...@gmail.com wrote: On both types of nodes, what is your dfs.data.dir set to? Does it specify multiple folders on the same set of drives, or is it 1-1 between folder and drive?
Re: disk used percentage is not symmetric on datanodes (balancer)
Then I think the only way around this would be to decommission the smaller nodes, one at a time, and ensure that the blocks are moved to the larger nodes. And once complete, bring the smaller nodes back in, but maybe only after you tweak the rack topology to match your disk layout more than the network layout, to compensate for the unbalanced nodes. Just my 2 cents. On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi tapas.sara...@gmail.com wrote: Thanks. We have a 1-1 configuration of drives and folders in all the datanodes. -Tapas
Re: disk used percentage is not symmetric on datanodes (balancer)
You said that threshold=10. Run the command manually: hadoop balancer -threshold 9.5, then 9, and so on with a 0.5 step. On Sun, Mar 24, 2013 at 11:01 PM, Tapas Sarangi tapas.sara...@gmail.com wrote: Yes, thanks for pointing that out, but I already know that it is completing the balancing when exiting; otherwise it shouldn't exit. Your answer doesn't solve the problem I mentioned earlier in my message: 'hdfs' is stalling and hadoop is not writing unless space is cleared up from the cluster, even though df shows the cluster has about 500 TB of free space.
Re: disk used percentage is not symmetric on datanodes (balancer)
Hi, Thanks for the idea, I will give this a try and report back. My worry is, if we decommission a small node (one at a time), will it move the data to larger nodes or choke other smaller nodes? In principle it should distribute the blocks; the point is it is not distributing the way we expect it to, so do you think this may cause further problems? - On Mar 24, 2013, at 3:37 PM, Jamal B jm151...@gmail.com wrote: Then I think the only way around this would be to decommission the smaller nodes, one at a time, and ensure that the blocks are moved to the larger nodes. And once complete, bring the smaller nodes back in, but maybe only after you tweak the rack topology to match your disk layout more than the network layout, to compensate for the unbalanced nodes. Just my 2 cents.
Re: disk used percentage is not symmetric on datanodes (balancer)
On Mar 24, 2013, at 3:40 PM, Alexey Babutin zorlaxpokemon...@gmail.com wrote: You said that threshold=10. Run the command manually: hadoop balancer -threshold 9.5, then 9, and so on with a 0.5 step. We are not setting the threshold anywhere in our configuration and are thus using the default, which I believe is 10. Why do you suggest such steps need to be tested for the balancer? Please explain. I guess we had a discussion earlier on this thread and came to the conclusion that the threshold will not help in this situation. -
Re: disk used percentage is not symmetric on datanodes (balancer)
I think that it may help, but start with 1 node and watch where the data has moved. On Mon, Mar 25, 2013 at 12:44 AM, Tapas Sarangi tapas.sara...@gmail.com wrote: Hi, Thanks for the idea, I will give this a try and report back. My worry is, if we decommission a small node (one at a time), will it move the data to larger nodes or choke other smaller nodes?
Re: disk used percentage is not symmetric on datanodes (balancer)
It shouldn't cause further problems since most of your small nodes are already their capacity. You could set or increase the dfs reserved property on your smaller nodes to force the flow of blocks onto the larger nodes. On Mar 24, 2013 4:45 PM, Tapas Sarangi tapas.sara...@gmail.com wrote: Hi, Thanks for the idea, I will give this a try and report back. My worry is if we decommission a small node (one at a time), will it move the data to larger nodes or choke another smaller nodes ? In principle it should distribute the blocks, the point is it is not distributing the way we expect it to, so do you think this may cause further problems ? - On Mar 24, 2013, at 3:37 PM, Jamal B jm151...@gmail.com wrote: Then I think the only way around this would be to decommission 1 at a time, the smaller nodes, and ensure that the blocks are moved to the larger nodes. And once complete, bring back in the smaller nodes, but maybe only after you tweak the rack topology to match your disk layout more than network layout to compensate for the unbalanced nodes. Just my 2 cents On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi tapas.sara...@gmail.comwrote: Thanks. We have a 1-1 configuration of drives and folder in all the datanodes. -Tapas On Mar 24, 2013, at 3:29 PM, Jamal B jm151...@gmail.com wrote: On both types of nodes, what is your dfs.data.dir set to? Does it specify multiple folders on the same set's of drives or is it 1-1 between folder and drive? If it's set to multiple folders on the same drives, it is probably multiplying the amount of available capacity incorrectly in that it assumes a 1-1 relationship between folder and total capacity of the drive. On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi tapas.sara...@gmail.comwrote: Yes, thanks for pointing, but I already know that it is completing the balancing when exiting otherwise it shouldn't exit. Your answer doesn't solve the problem I mentioned earlier in my message. 'hdfs' is stalling and hadoop is not writing unless space is cleared up from the cluster even though df shows the cluster has about 500 TB of free space. --- On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) bal...@balajin.net wrote: -setBalancerBandwidth bandwidth in bytes per second So the value is bytes per second. If it is running and exiting,it means it has completed the balancing. On 24 March 2013 11:32, Tapas Sarangi tapas.sara...@gmail.com wrote: Yes, we are running balancer, though a balancer process runs for almost a day or more before exiting and starting over. Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it is in Bits then we have a problem. What's the unit for dfs.balance.bandwidthPerSec ? - On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) li...@balajin.net wrote: Are you running balancer? If balancer is running and if it is slow, try increasing the balancer bandwidth On 24 March 2013 09:21, Tapas Sarangi tapas.sara...@gmail.com wrote: Thanks for the follow up. I don't know whether attachment will pass through this mailing list, but I am attaching a pdf that contains the usage of all live nodes. All nodes starting with letter g are the ones with smaller storage space where as nodes starting with letter s have larger storage space. As you will see, most of the gXX nodes are completely full whereas sXX nodes have a lot of unused space. 
Recently, we are facing crisis frequently as 'hdfs' goes into a mode where it is not able to write any further even though the total space available in the cluster is about 500 TB. We believe this has something to do with the way it is balancing the nodes, but don't understand the problem yet. May be the attached PDF will help some of you (experts) to see what is going wrong here... Thanks -- Balancer know about topology,but when calculate balancing it operates only with nodes not with racks. You can see how it work in Balancer.java in BalancerDatanode about string 509. I was wrong about 350Tb,35Tb it calculates in such way : For example: cluster_capacity=3.5Pb cluster_dfsused=2Pb avgutil=cluster_dfsused/cluster_capacity*100=57.14% used cluster capacity Then we know avg node utilization (node_dfsused/node_capacity*100) .Balancer think that all good if avgutil +10node_utilizazation=avgutil-10. Ideal case that all node used avgutl of capacity.but for 12TB node its only 6.5Tb and for 72Tb its about 40Tb. Balancer cant help you. Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can. In ideal case with replication factor 2 ,with two nodes 12Tb and 72Tb you will be able to have only 12Tb replication data. Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 TB, but not true for more than two nodes in the cluster.
Re: question for commetter
Good question; I just want HA and don't want to change more configuration. On Mar 25, 2013 2:32 AM, Balaji Narayanan (பாலாஜி நாராயணன்) li...@balajin.net wrote: Is there a reason why you don't want to run MRv2 under YARN? On 22 March 2013 22:49, Azuryy Yu azury...@gmail.com wrote: Is there a way to separate HDFS 2 from Hadoop 2? I want to use HDFS 2 and MapReduce 1.0.4 and exclude YARN, because I need HDFS HA. -- http://balajin.net/blog http://flic.kr/balajijegan
Re: disk used percentage is not symmetric on datanodes (balancer)
Thanks. Does this need a restart of hadoop in the nodes where this modification is made ? - On Mar 24, 2013, at 8:06 PM, Jamal B jm151...@gmail.com wrote: dfs.datanode.du.reserved You could tweak that param on the smaller nodes to force the flow of blocks to other nodes. A short term hack at best, but should help the situation a bit. On Mar 24, 2013 7:09 PM, Tapas Sarangi tapas.sara...@gmail.com wrote: On Mar 24, 2013, at 4:34 PM, Jamal B jm151...@gmail.com wrote: It shouldn't cause further problems since most of your small nodes are already their capacity. You could set or increase the dfs reserved property on your smaller nodes to force the flow of blocks onto the larger nodes. Thanks. Can you please specify which are the dfs properties that we can set or modify to force the flow of blocks directed towards the larger nodes than the smaller nodes ? - On Mar 24, 2013 4:45 PM, Tapas Sarangi tapas.sara...@gmail.com wrote: Hi, Thanks for the idea, I will give this a try and report back. My worry is if we decommission a small node (one at a time), will it move the data to larger nodes or choke another smaller nodes ? In principle it should distribute the blocks, the point is it is not distributing the way we expect it to, so do you think this may cause further problems ? - On Mar 24, 2013, at 3:37 PM, Jamal B jm151...@gmail.com wrote: Then I think the only way around this would be to decommission 1 at a time, the smaller nodes, and ensure that the blocks are moved to the larger nodes. And once complete, bring back in the smaller nodes, but maybe only after you tweak the rack topology to match your disk layout more than network layout to compensate for the unbalanced nodes. Just my 2 cents On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi tapas.sara...@gmail.com wrote: Thanks. We have a 1-1 configuration of drives and folder in all the datanodes. -Tapas On Mar 24, 2013, at 3:29 PM, Jamal B jm151...@gmail.com wrote: On both types of nodes, what is your dfs.data.dir set to? Does it specify multiple folders on the same set's of drives or is it 1-1 between folder and drive? If it's set to multiple folders on the same drives, it is probably multiplying the amount of available capacity incorrectly in that it assumes a 1-1 relationship between folder and total capacity of the drive. On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi tapas.sara...@gmail.com wrote: Yes, thanks for pointing, but I already know that it is completing the balancing when exiting otherwise it shouldn't exit. Your answer doesn't solve the problem I mentioned earlier in my message. 'hdfs' is stalling and hadoop is not writing unless space is cleared up from the cluster even though df shows the cluster has about 500 TB of free space. --- On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) bal...@balajin.net wrote: -setBalancerBandwidth bandwidth in bytes per second So the value is bytes per second. If it is running and exiting,it means it has completed the balancing. On 24 March 2013 11:32, Tapas Sarangi tapas.sara...@gmail.com wrote: Yes, we are running balancer, though a balancer process runs for almost a day or more before exiting and starting over. Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it is in Bits then we have a problem. What's the unit for dfs.balance.bandwidthPerSec ? - On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) li...@balajin.net wrote: Are you running balancer? 
If balancer is running and if it is slow, try increasing the balancer bandwidth On 24 March 2013 09:21, Tapas Sarangi tapas.sara...@gmail.com wrote: Thanks for the follow up. I don't know whether attachment will pass through this mailing list, but I am attaching a pdf that contains the usage of all live nodes. All nodes starting with letter g are the ones with smaller storage space where as nodes starting with letter s have larger storage space. As you will see, most of the gXX nodes are completely full whereas sXX nodes have a lot of unused space. Recently, we are facing crisis frequently as 'hdfs' goes into a mode where it is not able to write any further even though the total space available in the cluster is about 500 TB. We believe this has something to do with the way it is balancing the nodes, but don't understand the problem yet. May be the attached PDF will help some of you (experts) to see what is going wrong here... Thanks -- Balancer know about topology,but when calculate balancing it operates only with nodes not with racks. You can see how it work in Balancer.java in BalancerDatanode about string 509. I was wrong about 350Tb,35Tb it calculates in such way :
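As a side note on the dfs.datanode.du.reserved suggestion in the thread above: the value (configured per datanode) is subtracted from what the DataNode advertises as remaining space, so raising it on the small nodes makes them look full sooner and steers new blocks toward the larger nodes. A rough sketch of the accounting with made-up numbers (the actual bookkeeping inside the DataNode is more involved; this only shows the shape of the calculation):

public class ReservedSpaceExample {
    public static void main(String[] args) {
        long tb = 1024L * 1024 * 1024 * 1024;

        long diskCapacity = 12 * tb; // a small 12 TB datanode
        long dfsUsed      = 11 * tb; // almost full
        long duReserved   = 1 * tb;  // hypothetical dfs.datanode.du.reserved value

        // Roughly what the node ends up advertising as remaining for new blocks:
        long remaining = Math.max(0, diskCapacity - dfsUsed - duReserved);

        System.out.println("Remaining bytes advertised for new blocks: " + remaining);
        // With 1 TB reserved this node advertises ~0 bytes free, so the NameNode
        // stops choosing it as a target and writes flow to the larger nodes.
    }
}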
Re: Child JVM memory allocation / Usage
Did you set the min heap size == your max heap size? If you didn't, free memory only shows you the difference between used and committed, not used and max. On 3/24/13, nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com wrote: Hi, I configured my child JVM heap to 2 GB. So, I thought I could really read 1.5 GB of data and store it in memory (mapper/reducer). I wanted to confirm the same and wrote the following piece of code in the configure method of the mapper. @Override public void configure(JobConf job) { System.out.println("FREE MEMORY -- " + Runtime.getRuntime().freeMemory()); System.out.println("MAX MEMORY --- " + Runtime.getRuntime().maxMemory()); } Surprisingly the output was FREE MEMORY -- 341854864 = 320 MB, MAX MEMORY --- 1908932608 = 1.9 GB. I am just wondering what processes are taking up that extra 1.6 GB of heap which I configured for the child JVM. Appreciate your help in understanding the scenario. Regards, Nagarjuna K -- Ted.
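A small Java illustration of Ted's point (the exact numbers depend on your JVM and -Xms setting): Runtime.freeMemory() is measured against the currently committed heap, not against -Xmx, so a low "free" value right after startup does not mean the 2 GB heap is already consumed.

public class HeapNumbers {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();

        long max   = rt.maxMemory();    // roughly -Xmx (what the heap may grow to)
        long total = rt.totalMemory();  // currently committed heap (starts near -Xms)
        long free  = rt.freeMemory();   // free space within the committed heap only

        // A better estimate of how much the task could still allocate:
        long effectivelyFree = max - (total - free);

        System.out.println("max=" + max + " committed=" + total + " free=" + free);
        System.out.println("effectively free (max - used) = " + effectivelyFree);
        // With -Xmx2048m but a small default -Xms, 'free' can report only a few
        // hundred MB right after startup even though close to 2 GB is still
        // available for allocation.
    }
}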
Re: Child JVM memory allocation / Usage
Hi Ted, As far as I can recollect, I only configured these parameters:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m</value>
  <description>This number is the number of megabytes of memory that each mapper and each reducer will have available to use. If jobs start running out of heap space, this may need to be increased.</description>
</property>

<property>
  <name>mapred.child.ulimit</name>
  <value>3145728</value>
  <description>This number is the number of kilobytes of memory that each mapper and each reducer will have available to use. If jobs start running out of heap space, this may need to be increased.</description>
</property>

On Mon, Mar 25, 2013 at 6:57 AM, Ted r6squee...@gmail.com wrote: Did you set the min heap size == your max heap size? If you didn't, free memory only shows you the difference between used and committed, not used and max. On 3/24/13, nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com wrote: Hi, I configured my child JVM heap to 2 GB. So, I thought I could really read 1.5 GB of data and store it in memory (mapper/reducer). I wanted to confirm the same and wrote the following piece of code in the configure method of the mapper. @Override public void configure(JobConf job) { System.out.println("FREE MEMORY -- " + Runtime.getRuntime().freeMemory()); System.out.println("MAX MEMORY --- " + Runtime.getRuntime().maxMemory()); } Surprisingly the output was FREE MEMORY -- 341854864 = 320 MB, MAX MEMORY --- 1908932608 = 1.9 GB. I am just wondering what processes are taking up that extra 1.6 GB of heap which I configured for the child JVM. Appreciate your help in understanding the scenario. Regards, Nagarjuna K -- Ted.
[no subject]
Dear Sir, I have a question about Hadoop. When I use Hadoop and MapReduce to finish a job (only one job here), can I control which node each file is processed on? For example, I have only one job and this job has 10 files (10 mappers need to run). Also, among my servers I have one head node and four worker nodes. My question is: can I control which node each of those 10 files is processed on? For example: file No.1 on node1, file No.3 on node2, file No.5 on node3 and file No.8 on node4. If I can do this, that means I can control the tasks. Does that also mean I can still control the files in the next round (I have a loop on the head node, so I can run another MapReduce job)? For example, I could have file No.5 processed on node3 in the 1st round and then have file No.5 processed on node2 in the 2nd round. If I cannot, does that mean that, for Hadoop, which node a file is processed on is a “black box”: the user cannot control it, because you think the user does not need to control it and should just let HDFS handle the parallel work? In that case Hadoop cannot control the tasks within one job, but can control multiple jobs. Thank you so much! Fan Bai PhD Candidate Computer Science Department Georgia State University Atlanta, GA 30303
Re: disk used percentage is not symmetric on datanodes (balancer)
Yes On Mar 24, 2013 9:25 PM, Tapas Sarangi tapas.sara...@gmail.com wrote: Thanks. Does this need a restart of hadoop in the nodes where this modification is made ? - On Mar 24, 2013, at 8:06 PM, Jamal B jm151...@gmail.com wrote: dfs.datanode.du.reserved You could tweak that param on the smaller nodes to force the flow of blocks to other nodes. A short term hack at best, but should help the situation a bit. On Mar 24, 2013 7:09 PM, Tapas Sarangi tapas.sara...@gmail.com wrote: On Mar 24, 2013, at 4:34 PM, Jamal B jm151...@gmail.com wrote: It shouldn't cause further problems since most of your small nodes are already their capacity. You could set or increase the dfs reserved property on your smaller nodes to force the flow of blocks onto the larger nodes. Thanks. Can you please specify which are the dfs properties that we can set or modify to force the flow of blocks directed towards the larger nodes than the smaller nodes ? - On Mar 24, 2013 4:45 PM, Tapas Sarangi tapas.sara...@gmail.com wrote: Hi, Thanks for the idea, I will give this a try and report back. My worry is if we decommission a small node (one at a time), will it move the data to larger nodes or choke another smaller nodes ? In principle it should distribute the blocks, the point is it is not distributing the way we expect it to, so do you think this may cause further problems ? - On Mar 24, 2013, at 3:37 PM, Jamal B jm151...@gmail.com wrote: Then I think the only way around this would be to decommission 1 at a time, the smaller nodes, and ensure that the blocks are moved to the larger nodes. And once complete, bring back in the smaller nodes, but maybe only after you tweak the rack topology to match your disk layout more than network layout to compensate for the unbalanced nodes. Just my 2 cents On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi tapas.sara...@gmail.comwrote: Thanks. We have a 1-1 configuration of drives and folder in all the datanodes. -Tapas On Mar 24, 2013, at 3:29 PM, Jamal B jm151...@gmail.com wrote: On both types of nodes, what is your dfs.data.dir set to? Does it specify multiple folders on the same set's of drives or is it 1-1 between folder and drive? If it's set to multiple folders on the same drives, it is probably multiplying the amount of available capacity incorrectly in that it assumes a 1-1 relationship between folder and total capacity of the drive. On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi tapas.sara...@gmail.com wrote: Yes, thanks for pointing, but I already know that it is completing the balancing when exiting otherwise it shouldn't exit. Your answer doesn't solve the problem I mentioned earlier in my message. 'hdfs' is stalling and hadoop is not writing unless space is cleared up from the cluster even though df shows the cluster has about 500 TB of free space. --- On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) bal...@balajin.net wrote: -setBalancerBandwidth bandwidth in bytes per second So the value is bytes per second. If it is running and exiting,it means it has completed the balancing. On 24 March 2013 11:32, Tapas Sarangi tapas.sara...@gmail.com wrote: Yes, we are running balancer, though a balancer process runs for almost a day or more before exiting and starting over. Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it is in Bits then we have a problem. What's the unit for dfs.balance.bandwidthPerSec ? 
- On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) li...@balajin.net wrote: Are you running balancer? If balancer is running and if it is slow, try increasing the balancer bandwidth On 24 March 2013 09:21, Tapas Sarangi tapas.sara...@gmail.comwrote: Thanks for the follow up. I don't know whether attachment will pass through this mailing list, but I am attaching a pdf that contains the usage of all live nodes. All nodes starting with letter g are the ones with smaller storage space where as nodes starting with letter s have larger storage space. As you will see, most of the gXX nodes are completely full whereas sXX nodes have a lot of unused space. Recently, we are facing crisis frequently as 'hdfs' goes into a mode where it is not able to write any further even though the total space available in the cluster is about 500 TB. We believe this has something to do with the way it is balancing the nodes, but don't understand the problem yet. May be the attached PDF will help some of you (experts) to see what is going wrong here... Thanks -- Balancer know about topology,but when calculate balancing it operates only with nodes not with racks. You can see how it work in Balancer.java in BalancerDatanode about string 509. I was wrong about 350Tb,35Tb it calculates in such way : For
Re: is it possible to disable security in MapReduce to avoid having PriviledgedActionException?
What is the exact error you're getting? Can you please paste the full stack trace and the version you are using? Many times the PriviledgedActionException is just a wrapper around the real cause and gets overlooked. It does not necessarily appear due to security code (whether security is enabled or disabled). In any case, if you meant to run MR with zero UGI.doAs calls (which is what wraps with that exception), then no, that's not possible to do. On Mon, Mar 25, 2013 at 12:57 AM, Pedro Sá da Costa psdc1...@gmail.com wrote: Hi, is it possible to disable security in MapReduce to avoid having PriviledgedActionException? Thanks, -- Harsh J
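To illustrate Harsh's point that the wrapper is rarely the real problem: when Hadoop runs code under UserGroupInformation.doAs, the PriviledgedActionException line in the logs is emitted by the security layer, while the exception your code actually needs to read is the underlying one. A minimal sketch (the work inside run() is only a placeholder, not a specific API):

import java.io.IOException;
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

public class UnwrapExample {
    public static void main(String[] args) throws Exception {
        UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
        try {
            ugi.doAs(new PrivilegedExceptionAction<Void>() {
                @Override
                public Void run() throws Exception {
                    // ... the real work (job submission, HDFS access) goes here;
                    // this stand-in failure represents whatever actually went wrong.
                    throw new IOException("real underlying failure");
                }
            });
        } catch (IOException e) {
            // This is the exception worth reading and posting to the list; the
            // PriviledgedActionException line in the logs is printed by the
            // security layer around it.
            e.printStackTrace();
        }
    }
}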
Re:Re: disk used percentage is not symmetric on datanodes (balancer)
If the balancer is not running, or is running with a low bandwidth and reacting slowly, I think there may be a significant asymmetry between datanodes. At 2013-03-25 04:37:05, Jamal B jm151...@gmail.com wrote: Then I think the only way around this would be to decommission 1 at a time, the smaller nodes, and ensure that the blocks are moved to the larger nodes. And once complete, bring back in the smaller nodes, but maybe only after you tweak the rack topology to match your disk layout more than network layout to compensate for the unbalanced nodes. Just my 2 cents On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi tapas.sara...@gmail.com wrote: Thanks. We have a 1-1 configuration of drives and folder in all the datanodes. -Tapas On Mar 24, 2013, at 3:29 PM, Jamal B jm151...@gmail.com wrote: On both types of nodes, what is your dfs.data.dir set to? Does it specify multiple folders on the same set's of drives or is it 1-1 between folder and drive? If it's set to multiple folders on the same drives, it is probably multiplying the amount of available capacity incorrectly in that it assumes a 1-1 relationship between folder and total capacity of the drive. On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi tapas.sara...@gmail.com wrote: Yes, thanks for pointing, but I already know that it is completing the balancing when exiting otherwise it shouldn't exit. Your answer doesn't solve the problem I mentioned earlier in my message. 'hdfs' is stalling and hadoop is not writing unless space is cleared up from the cluster even though df shows the cluster has about 500 TB of free space. --- On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) bal...@balajin.net wrote: -setBalancerBandwidth bandwidth in bytes per second So the value is bytes per second. If it is running and exiting, it means it has completed the balancing. On 24 March 2013 11:32, Tapas Sarangi tapas.sara...@gmail.com wrote: Yes, we are running balancer, though a balancer process runs for almost a day or more before exiting and starting over. Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it is in Bits then we have a problem. What's the unit for dfs.balance.bandwidthPerSec ? - On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) li...@balajin.net wrote: Are you running balancer? If balancer is running and if it is slow, try increasing the balancer bandwidth On 24 March 2013 09:21, Tapas Sarangi tapas.sara...@gmail.com wrote: Thanks for the follow up. I don't know whether attachment will pass through this mailing list, but I am attaching a pdf that contains the usage of all live nodes. All nodes starting with letter g are the ones with smaller storage space where as nodes starting with letter s have larger storage space. As you will see, most of the gXX nodes are completely full whereas sXX nodes have a lot of unused space. Recently, we are facing crisis frequently as 'hdfs' goes into a mode where it is not able to write any further even though the total space available in the cluster is about 500 TB. We believe this has something to do with the way it is balancing the nodes, but don't understand the problem yet. May be the attached PDF will help some of you (experts) to see what is going wrong here... Thanks -- Balancer know about topology,but when calculate balancing it operates only with nodes not with racks. You can see how it work in Balancer.java in BalancerDatanode about string 509.
I was wrong about 350Tb/35Tb; it calculates it this way: For example: cluster_capacity = 3.5 PB, cluster_dfsused = 2 PB, avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster capacity used. Then we know the average node utilization (node_dfsused / node_capacity * 100). The balancer thinks all is good if avgutil + 10 >= node_utilization >= avgutil - 10. The ideal case is that every node uses avgutil of its capacity, but for a 12 TB node that is only about 6.5 TB and for a 72 TB node it is about 40 TB. The balancer can't help you. Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can. In the ideal case with replication factor 2, with two nodes of 12 TB and 72 TB you will be able to have only 12 TB of replicated data. Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 TB, but not true for more than two nodes in the cluster. The best way, in my opinion, is to use multiple racks. Nodes in a rack must have identical capacity, and racks must have identical capacity. For example: rack1: 1 node with 72 TB; rack2: 6 nodes with 12 TB; rack3: 3 nodes with 24 TB. It helps with balancing, because the duplicated block must be on another rack. The same question I asked earlier in this message: do multiple racks with the default threshold for the balancer minimize the difference between racks ? Why did you select HDFS? Maybe Lustre, CephFS or something else would be a better choice. It wasn't my decision,
Hadoop-2.x native libraries
Hi, How do I get the hadoop-2.0.3-alpha native libraries? In the currently released package they were compiled under a 32-bit OS.
Re: Hadoop-2.x native libraries
If you're using a tarball, you'll need to build a native-added tarball yourself with mvn package -Pdist,native,docs -DskipTests -Dtar and then use that. Alternatively, if you're interested in packages, use the Apache Bigtop's scripts from http://bigtop.apache.org/ project's repository and generate the packages with native libs as well. On Mon, Mar 25, 2013 at 9:27 AM, Azuryy Yu azury...@gmail.com wrote: Hi, How to get hadoop-2.0.3-alpha native libraries, it was compiled under 32bits OS in the released package currently. -- Harsh J
Re: Child JVM memory allocation / Usage
The MapTask may consume some memory of its own as well. What is your io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to? On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com wrote: Hi, I configured my child jvm heap to 2 GB. So, I thought I could really read 1.5GB of data and store it in memory (mapper/reducer). I wanted to confirm the same and wrote the following piece of code in the configure method of mapper. @Override public void configure(JobConf job) { System.out.println("FREE MEMORY -- " + Runtime.getRuntime().freeMemory()); System.out.println("MAX MEMORY --- " + Runtime.getRuntime().maxMemory()); } Surprisingly the output was FREE MEMORY -- 341854864 = 320 MB MAX MEMORY --- 1908932608 = 1.9 GB I am just wondering what processes are taking up that extra 1.6GB of heap which I configured for the child jvm heap. Appreciate in helping me understand the scenario. Regards Nagarjuna K -- Harsh J
Re: Child JVM memory allocation / Usage
io.sort.mb = 256 MB On Monday, March 25, 2013, Harsh J wrote: The MapTask may consume some memory of its own as well. What is your io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to? On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com wrote: Hi, I configured my child jvm heap to 2 GB. So, I thought I could really read 1.5GB of data and store it in memory (mapper/reducer). I wanted to confirm the same and wrote the following piece of code in the configure method of mapper. @Override public void configure(JobConf job) { System.out.println("FREE MEMORY -- " + Runtime.getRuntime().freeMemory()); System.out.println("MAX MEMORY --- " + Runtime.getRuntime().maxMemory()); } Surprisingly the output was FREE MEMORY -- 341854864 = 320 MB MAX MEMORY --- 1908932608 = 1.9 GB I am just wondering what processes are taking up that extra 1.6GB of heap which I configured for the child jvm heap. Appreciate in helping me understand the scenario. Regards Nagarjuna K -- Harsh J -- Sent from iPhone
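Rough back-of-the-envelope for Harsh's point, using the numbers from this thread (2 GB child heap, io.sort.mb = 256 MB); note that io.sort.mb applies to map tasks, and the framework-overhead figure below is only an assumed placeholder to show the shape of the accounting, not a measured value:

public class MapTaskHeapBudget {
    public static void main(String[] args) {
        long mb = 1024L * 1024;

        long childHeap      = 2048 * mb; // mapred.child.java.opts = -Xmx2048m
        long ioSortBuffer   = 256 * mb;  // io.sort.mb, allocated inside the same heap
        long frameworkGuess = 100 * mb;  // assumed extra for MapTask bookkeeping, buffers, etc.

        long roughlyForUserData = childHeap - ioSortBuffer - frameworkGuess;
        System.out.println("Roughly available for user objects: "
                + roughlyForUserData / mb + " MB");
        // Roughly 1.6-1.7 GB of the 2 GB heap is realistically usable by the
        // mapper's own data; the rest is claimed by the sort buffer and the
        // framework itself.
    }
}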
shuffling one intermediate pair to more than one reducer
Hello, I have a use case where I want to shuffle the same pair to more than one reducer. Has anyone tried this, or can anyone suggest how to implement it? I have created a JIRA for the same: https://issues.apache.org/jira/browse/MAPREDUCE-5063 Thank you. -- Thanx and Regards, Vikas Jadhav
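One common workaround today (a sketch against the old mapred API; it is not what MAPREDUCE-5063 itself proposes) is to emit each pair once per target reducer from the mapper, tagging the key with a partition number and routing on that tag in a custom partitioner:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Partitioner;
import org.apache.hadoop.mapred.Reporter;

// Mapper: duplicate each pair, prefixing the key with the reducer it should reach.
public class FanOutMapper extends MapReduceBase
        implements Mapper<Text, IntWritable, Text, IntWritable> {
    private int numReducers;

    @Override
    public void configure(JobConf job) {
        numReducers = job.getNumReduceTasks();
    }

    @Override
    public void map(Text key, IntWritable value,
                    OutputCollector<Text, IntWritable> out, Reporter reporter)
            throws IOException {
        for (int r = 0; r < numReducers; r++) {
            out.collect(new Text(r + "#" + key.toString()), value);
        }
    }
}

// Partitioner: route on the numeric tag so each copy lands on a different reducer.
class TagPartitioner implements Partitioner<Text, IntWritable> {
    @Override
    public void configure(JobConf job) {}

    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        int tag = Integer.parseInt(key.toString().split("#", 2)[0]);
        return tag % numPartitions;
    }
}

The reducers then strip the numeric tag from the key before processing; the obvious cost is that the shuffled data volume grows by the fan-out factor.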
Re: MapReduce Failed and Killed
Any MapReduce task needs to communicate with the tasktracker that launched it periodically in order to let the tasktracker know it is still alive and active. The time for which silence is tolerated is controlled by a configuration property mapred.task.timeout. It looks like in your case, this has already been bumped up to 20 minutes (from the default 10 minutes). It also looks like this is not sufficient. You could bump this value even further up. However, the correct approach could be to see what the reducer is actually doing to become inactive during this time. Can you look at the reducer attempt's logs (which you can access from the web UI of the Jobtracker) and post them here ? Thanks hemanth On Fri, Mar 22, 2013 at 5:32 PM, Jinchun Kim cien...@gmail.com wrote: Hi, All. I'm trying to create category-based splits of Wikipedia dataset(41GB) and the training data set(5GB) using Mahout. I'm using following command. $MAHOUT_HOME/bin/mahout wikipediaDataSetCreator -i wikipedia/chunks -o wikipediainput -c $MAHOUT_HOME/examples/temp/categories.txt I had no problem with the training data set, but Hadoop showed following messages when I tried to do a same job with Wikipedia dataset, . 13/03/21 22:31:00 INFO mapred.JobClient: map 27% reduce 1% 13/03/21 22:40:31 INFO mapred.JobClient: map 27% reduce 2% 13/03/21 22:58:49 INFO mapred.JobClient: map 27% reduce 3% 13/03/21 23:22:57 INFO mapred.JobClient: map 27% reduce 4% 13/03/21 23:46:32 INFO mapred.JobClient: map 27% reduce 5% 13/03/22 00:27:14 INFO mapred.JobClient: map 27% reduce 6% 13/03/22 01:06:55 INFO mapred.JobClient: map 27% reduce 7% 13/03/22 01:14:06 INFO mapred.JobClient: map 27% reduce 3% 13/03/22 01:15:35 INFO mapred.JobClient: Task Id : attempt_201303211339_0002_r_00_1, Status : FAILED Task attempt_201303211339_0002_r_00_1 failed to report status for 1200 seconds. Killing! 13/03/22 01:20:09 INFO mapred.JobClient: map 27% reduce 4% 13/03/22 01:33:35 INFO mapred.JobClient: Task Id : attempt_201303211339_0002_m_37_1, Status : FAILED Task attempt_201303211339_0002_m_37_1 failed to report status for 1228 seconds. Killing! 13/03/22 01:35:12 INFO mapred.JobClient: map 27% reduce 5% 13/03/22 01:40:38 INFO mapred.JobClient: map 27% reduce 6% 13/03/22 01:52:28 INFO mapred.JobClient: map 27% reduce 7% 13/03/22 02:16:27 INFO mapred.JobClient: map 27% reduce 8% 13/03/22 02:19:02 INFO mapred.JobClient: Task Id : attempt_201303211339_0002_m_18_1, Status : FAILED Task attempt_201303211339_0002_m_18_1 failed to report status for 1204 seconds. Killing! 13/03/22 02:49:03 INFO mapred.JobClient: map 27% reduce 9% 13/03/22 02:52:04 INFO mapred.JobClient: map 28% reduce 9% Because I just started to learn how to run Hadoop, I have no idea how to solve this problem... Does anyone have an idea how to handle this weird thing? -- *Jinchun Kim*
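Two things usually help with the "failed to report status for 1200 seconds" kills described above: raising mapred.task.timeout (milliseconds) further, and, better, having long-running map or reduce code tell the framework it is still alive. A minimal sketch with the old mapred API (the summing logic is only a placeholder for whatever the real reducer does):

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class HeartbeatReducer extends MapReduceBase
        implements Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    public void reduce(Text key, Iterator<LongWritable> values,
                       OutputCollector<Text, LongWritable> out, Reporter reporter)
            throws IOException {
        long sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
            // Ping the tasktracker so the attempt is not killed as unresponsive
            // even if a single key takes a very long time to process.
            reporter.progress();
        }
        out.collect(key, new LongWritable(sum));
    }
}

Calling reporter.progress() (or updating a counter/status) inside the long-running loop resets the timeout clock, but as noted above, the first step is still to check the attempt logs and find out why the reducer goes silent for 20 minutes at a time.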
Re: Hadoop-2.x native libraries
Thanks Harsh! I used -Pnative and got it; I am compiling the source code. I made MRv1 work with HDFSv2 successfully. On Mar 25, 2013 12:56 PM, Harsh J ha...@cloudera.com wrote: If you're using a tarball, you'll need to build a native-added tarball yourself with mvn package -Pdist,native,docs -DskipTests -Dtar and then use that. Alternatively, if you're interested in packages, use the Apache Bigtop scripts from the http://bigtop.apache.org/ project's repository and generate the packages with native libs as well. On Mon, Mar 25, 2013 at 9:27 AM, Azuryy Yu azury...@gmail.com wrote: Hi, How to get hadoop-2.0.3-alpha native libraries, it was compiled under 32bits OS in the released package currently. -- Harsh J
Any answer ? Candidate application for map reduce
Any answers from any of you? :) Regards, Bala From: AMARNATH, Balachandar [mailto:balachandar.amarn...@airbus.com] Sent: 22 March 2013 10:25 To: user@hadoop.apache.org Subject: Candidate application for map reduce Hello, I am looking for a sample application (preferably image processing, feature detection, etc.) that is a good candidate for the map reduce paradigm. To be very specific, I am looking for a simple open source application that processes data and produces some result (A), such that you can split the data into chunks, feed the chunks to the application, and merge the processed chunks to get A back. Is there any website where I can look at such benchmark applications? Any pointers and thoughts will be helpful here. With thanks and regards, Balachandar The information in this e-mail is confidential. The contents may not be disclosed or used by anyone other than the addressee. Access to this e-mail by anyone else is unauthorised. If you are not the intended recipient, please notify Airbus immediately and delete this e-mail. Airbus cannot accept any responsibility for the accuracy or completeness of this e-mail as it has been sent over public networks. If you have any concerns over the content of this message or its Accuracy or Integrity, please contact Airbus immediately. All outgoing e-mails from Airbus are checked using regularly updated virus scanning software but you should take whatever measures you deem to be appropriate to ensure that this message and any attachments are virus free.