[ https://issues.apache.org/jira/browse/HADOOP-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560037#action_12560037 ]
Raghu Angadi commented on HADOOP-2094:
--------------------------------------

Random partition is fine and the patch looks fine.

If there are two writers, there is a 25% probability that both write to the same partition. With 3 writers it becomes 62.5% (that 2 or more are writing to the same disk), about 90% with 4, etc. If that is OK, then this patch is fine. Assuming these apps are typically IO bound, this sounds like a pretty large penalty.

But I don't know how it fixes the problems reported in the description. Actually, I did not quite understand the problem anyway.

> DFS should not use round robin policy in determing on which volume (file system partition) to allocate for the next block
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-2094
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2094
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Runping Qi
>            Assignee: dhruba borthakur
>         Attachments: randomDatanodePartition.patch
>
>
> When multiple file system partitions are configured for the data storage of
> a data node, it uses a strict round robin policy to decide which partition
> to use for writing the next block.
> This may result in anomalous cases in which the blocks of a file are not
> evenly distributed across the partitions. For example, when we use distcp
> to copy files with each node having 4 mappers running concurrently, those
> 4 mappers are writing to DFS at about the same rate. Thus, it is possible
> that the 4 mappers write out blocks in an interleaved fashion. If there are
> 4 file system partitions configured for the local data node, it is possible
> that each mapper will continue to write its blocks onto the same file
> system partition.
> A simple random placement policy will avoid such anomalous cases, and does
> not have any obvious drawbacks.
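
The percentages in the comment and the lockstep case in the description can both be checked with a few lines of code. The sketch below is illustrative only and is not taken from the patch. It assumes 4 partitions (the comment does not state a count, but 25% / 62.5% / ~90% match that value), it assumes writers pick partitions uniformly and independently, and it assumes the mappers request blocks in strict rotation. The names `p_shared_partition` and `round_robin_volumes` are hypothetical.

```python
from fractions import Fraction


def p_shared_partition(writers: int, partitions: int = 4) -> Fraction:
    """Probability that at least two of `writers` pick the same partition
    when each picks one of `partitions` uniformly and independently."""
    if writers > partitions:
        return Fraction(1)  # pigeonhole: a collision is certain
    p_all_distinct = Fraction(1)
    for i in range(writers):
        # The (i+1)-th writer must avoid the i partitions already taken.
        p_all_distinct *= Fraction(partitions - i, partitions)
    return 1 - p_all_distinct


def round_robin_volumes(mappers: int, volumes: int, blocks_per_mapper: int):
    """Per mapper, the set of volumes its blocks land on when mappers request
    blocks in strict rotation and volumes are handed out round robin."""
    next_volume = 0
    used = [set() for _ in range(mappers)]
    for _ in range(blocks_per_mapper):
        for m in range(mappers):
            used[m].add(next_volume)
            next_volume = (next_volume + 1) % volumes
    return used


for k in (2, 3, 4):
    print(k, float(p_shared_partition(k)))

print(round_robin_volumes(mappers=4, volumes=4, blocks_per_mapper=8))
```

Under these assumptions the first loop gives 1/4, 5/8 and 29/32 for 2, 3 and 4 writers, and the round robin run shows each of the 4 mappers confined to a single volume, which is the case the description is concerned with.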