[jira] Commented: (MAPREDUCE-2216) speculation should normalize progress rates based on amount of input data
[ https://issues.apache.org/jira/browse/MAPREDUCE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970055#action_12970055 ]

Devaraj Das commented on MAPREDUCE-2216:
----------------------------------------

MAPREDUCE-718?

> speculation should normalize progress rates based on amount of input data
> --------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2216
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2216
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>            Reporter: Joydeep Sen Sarma
>
> We frequently see skews in data distribution on both the mappers and
> reducers. The small tasks finish quickly and the longer ones immediately
> get speculated. We should normalize the progress rates used by speculation
> with some metric correlated to the amount of data processed by the task
> (like bytes read or rows processed). That will prevent these unnecessary
> speculations.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1334) contrib/index - test - TestIndexUpdater fails due to an additional presence of file _SUCCESS in hdfs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated MAPREDUCE-1334:
----------------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Kay Kay.

> contrib/index - test - TestIndexUpdater fails due to an additional presence
> of file _SUCCESS in hdfs
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1334
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1334
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/index
>            Reporter: Kay Kay
>            Assignee: Kay Kay
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-1334.patch, MAPREDUCE-1334.patch
>
>
> $ cd src/contrib/index
> $ ant clean test
> This fails the test TestIndexUpdater due to a mismatch in the doneFileNames
> data structure when it is run with different parameters (an
> ArrayIndexOutOfBoundsException is raised when inserting elements into the
> doneFileNames array).
> Debugging further, there seems to be an additional file,
> hdfs://localhost:36021/myoutput/_SUCCESS, taken into consideration in
> addition to those that begin with done*. The presence of the extra file
> causes the error.
> Attaching a patch that circumvents this by increasing the array length of
> shards by 1.
> But longer term, the test fixtures probably need to be revisited to see
> whether the presence of _SUCCESS as a file is a good thing to begin with,
> before we even get to this test case.
> Any comments / suggestions on the same are welcome.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-2215) A more elegant FileSystem#listCorruptFileBlocks API (RAID changes)
[ https://issues.apache.org/jira/browse/MAPREDUCE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated MAPREDUCE-2215:
-------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.23.0
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

I've just committed this. Thanks, Patrick!

> A more elegant FileSystem#listCorruptFileBlocks API (RAID changes)
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2215
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2215
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/raid
>            Reporter: Patrick Kling
>            Assignee: Patrick Kling
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2215.patch
>
>
> Map/reduce changes related to HADOOP-7060 and HDFS-1533.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-2215) A more elegant FileSystem#listCorruptFileBlocks API (RAID changes)
[ https://issues.apache.org/jira/browse/MAPREDUCE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970016#action_12970016 ]

Hairong Kuang commented on MAPREDUCE-2215:
------------------------------------------

+1. This looks good to me.

> A more elegant FileSystem#listCorruptFileBlocks API (RAID changes)
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2215
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2215
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/raid
>            Reporter: Patrick Kling
>            Assignee: Patrick Kling
>         Attachments: MAPREDUCE-2215.patch
>
>
> Map/reduce changes related to HADOOP-7060 and HDFS-1533.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-2215) A more elegant FileSystem#listCorruptFileBlocks API (RAID changes)
[ https://issues.apache.org/jira/browse/MAPREDUCE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969988#action_12969988 ]

Patrick Kling commented on MAPREDUCE-2215:
------------------------------------------

I verified that the tests still pass after updating the patches for
HADOOP-7060 and HDFS-1533.

> A more elegant FileSystem#listCorruptFileBlocks API (RAID changes)
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2215
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2215
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/raid
>            Reporter: Patrick Kling
>            Assignee: Patrick Kling
>         Attachments: MAPREDUCE-2215.patch
>
>
> Map/reduce changes related to HADOOP-7060 and HDFS-1533.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1861) Raid should rearrange the replicas while raiding
[ https://issues.apache.org/jira/browse/MAPREDUCE-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969981#action_12969981 ]

Scott Chen commented on MAPREDUCE-1861:
---------------------------------------

{code}
     [exec] +1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.
     [exec]
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
     [exec]
     [exec]     +1 system test framework.  The patch passed system test framework compile.
     [exec]
     [exec] ======================================================================
     [exec]     Finished build.
     [exec] ======================================================================
{code}

> Raid should rearrange the replicas while raiding
> -------------------------------------------------
>
>                 Key: MAPREDUCE-1861
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1861
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/raid
>    Affects Versions: 0.23.0
>            Reporter: Scott Chen
>            Assignee: Scott Chen
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-1861-v2.txt, MAPREDUCE-1861-v3.txt, MAPREDUCE-1861.txt, MAPREDUCE-1861.txt
>
>
> Raided files introduce extra dependencies between the blocks on the same
> stripe. Therefore we need a new way to place the blocks.
> It is desirable that a raided file satisfies the following two conditions:
> a. Replicas on the same stripe should be on different machines (or racks)
> b. Replicas of the same block should be on different racks
> MAPREDUCE-1831 will try to delete the replicas on the same stripe and the
> same machine (a). But in the meantime, it will try to maintain the number
> of distinct racks for each block (b).
> We cannot satisfy (a) and (b) at the same time with the current logic in
> BlockPlacementPolicyDefault.chooseTarget().
> One choice we have is to change BlockPlacementPolicyDefault.chooseTarget().
> However, this placement is in general good for all files, including the
> unraided ones. It is not clear to us that we can make this good for both
> raided and unraided files.
> So we propose this idea: when raiding the file, we create one more off-rack
> replica (so replication=4 now). Then we delete two replicas using the
> policy in MAPREDUCE-1831 (replication=2 now).
> This way we can rearrange the replicas to satisfy (a) and (b) at the same
> time.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
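The raise-then-delete scheme proposed for MAPREDUCE-1861 can be illustrated with a small simulation. The types and the deletion helper below are hypothetical sketches of the idea, not the actual RAID code:

```java
import java.util.*;

// Hypothetical simulation of the two-phase rearrangement: raise replication
// with one off-rack replica (3 -> 4), then delete down to the target count,
// preferring victims whose rack also hosts another block of the same stripe.
public class StripeRearrange {
    record Replica(String machine, String rack) {}

    // Phase 2: delete replicas until only 'target' remain, removing
    // stripe-colocated replicas first so that the surviving replicas
    // satisfy both conditions (a) and (b) from the description.
    static List<Replica> deleteColocatedFirst(List<Replica> replicas,
                                              Set<String> stripePeerRacks,
                                              int target) {
        List<Replica> kept = new ArrayList<>(replicas);
        while (kept.size() > target) {
            Replica victim = kept.stream()
                    .filter(r -> stripePeerRacks.contains(r.rack()))
                    .findFirst()
                    .orElse(kept.get(kept.size() - 1));  // fall back: trim any
            kept.remove(victim);
        }
        return kept;
    }
}
```

For example, starting from four replicas on racks r1, r1, r2, r3 with a stripe peer on r1, both r1 replicas are deleted first, leaving two replicas on distinct racks that share nothing with the rest of the stripe.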
[jira] Updated: (MAPREDUCE-1861) Raid should rearrange the replicas while raiding
[ https://issues.apache.org/jira/browse/MAPREDUCE-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1861:
----------------------------------

    Attachment: MAPREDUCE-1861-v3.txt

Addressed Ram's comments.

> Raid should rearrange the replicas while raiding
> -------------------------------------------------
>
>                 Key: MAPREDUCE-1861
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1861
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/raid
>    Affects Versions: 0.23.0
>            Reporter: Scott Chen
>            Assignee: Scott Chen
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-1861-v2.txt, MAPREDUCE-1861-v3.txt, MAPREDUCE-1861.txt, MAPREDUCE-1861.txt
>
>
> Raided files introduce extra dependencies between the blocks on the same
> stripe. Therefore we need a new way to place the blocks.
> It is desirable that a raided file satisfies the following two conditions:
> a. Replicas on the same stripe should be on different machines (or racks)
> b. Replicas of the same block should be on different racks
> MAPREDUCE-1831 will try to delete the replicas on the same stripe and the
> same machine (a). But in the meantime, it will try to maintain the number
> of distinct racks for each block (b).
> We cannot satisfy (a) and (b) at the same time with the current logic in
> BlockPlacementPolicyDefault.chooseTarget().
> One choice we have is to change BlockPlacementPolicyDefault.chooseTarget().
> However, this placement is in general good for all files, including the
> unraided ones. It is not clear to us that we can make this good for both
> raided and unraided files.
> So we propose this idea: when raiding the file, we create one more off-rack
> replica (so replication=4 now). Then we delete two replicas using the
> policy in MAPREDUCE-1831 (replication=2 now).
> This way we can rearrange the replicas to satisfy (a) and (b) at the same
> time.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-2216) speculation should normalize progress rates based on amount of input data
[ https://issues.apache.org/jira/browse/MAPREDUCE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated MAPREDUCE-2216:
-----------------------------------------

    Component/s: jobtracker

Hard to believe there's not a JIRA open for this already - please
close/redirect if that is so.

> speculation should normalize progress rates based on amount of input data
> --------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2216
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2216
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>            Reporter: Joydeep Sen Sarma
>
> We frequently see skews in data distribution on both the mappers and
> reducers. The small tasks finish quickly and the longer ones immediately
> get speculated. We should normalize the progress rates used by speculation
> with some metric correlated to the amount of data processed by the task
> (like bytes read or rows processed). That will prevent these unnecessary
> speculations.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-2216) speculation should normalize progress rates based on amount of input data
speculation should normalize progress rates based on amount of input data
--------------------------------------------------------------------------

                 Key: MAPREDUCE-2216
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2216
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Joydeep Sen Sarma


We frequently see skews in data distribution on both the mappers and
reducers. The small tasks finish quickly and the longer ones immediately get
speculated. We should normalize the progress rates used by speculation with
some metric correlated to the amount of data processed by the task (like
bytes read or rows processed). That will prevent these unnecessary
speculations.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
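The normalization proposed in MAPREDUCE-2216 can be sketched as follows. The class and method names below are illustrative only, not Hadoop's actual speculator API:

```java
// Hypothetical sketch of input-size-normalized progress rates. Comparing
// raw rates penalizes large tasks; weighting by input size yields a
// throughput-like figure that is comparable across tasks of different sizes.
public class ProgressRates {
    // Raw rate: progress fraction per millisecond. A small task that
    // finishes quickly makes larger, healthy tasks look like stragglers.
    public static double rawRate(double progress, long elapsedMs) {
        return progress / elapsedMs;
    }

    // Normalized rate: weight progress by the task's input size (e.g. bytes
    // read or rows processed), giving an approximate bytes-per-millisecond
    // throughput instead of a fraction-per-millisecond rate.
    public static double normalizedRate(double progress, long elapsedMs,
                                        long inputBytes) {
        return (progress * inputBytes) / elapsedMs;
    }
}
```

For example, a 10 MB map that finishes in 10 s has a higher raw rate than a 1 GB map that is 50% done after 60 s, even though the larger task is actually processing bytes faster; the normalized rate ranks them the other way and so avoids speculating the large task.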
[jira] Commented: (MAPREDUCE-1831) BlockPlacement policy for RAID
[ https://issues.apache.org/jira/browse/MAPREDUCE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969887#action_12969887 ]

Ramkumar Vadali commented on MAPREDUCE-1831:
--------------------------------------------

+1 looks good. Please run unit-tests and test-patch.

> BlockPlacement policy for RAID
> -------------------------------
>
>                 Key: MAPREDUCE-1831
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1831
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/raid
>    Affects Versions: 0.23.0
>            Reporter: Scott Chen
>            Assignee: Scott Chen
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-1831-v2.txt, MAPREDUCE-1831.20100610.txt, MAPREDUCE-1831.txt, MAPREDUCE-1831.v1.1.txt
>
>
> Raid introduces new dependencies between blocks within a file.
> The blocks help decode each other. Therefore we should avoid putting them
> on the same machine.
> The proposed BlockPlacementPolicy does the following:
> 1. When writing parity blocks, it avoids placing parity blocks and source
> blocks together.
> 2. When reducing the replication number, it deletes the blocks that sit
> with other dependent blocks.
> 3. It does not change the way we write normal files. It only behaves
> differently when processing raid files.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1861) Raid should rearrange the replicas while raiding
[ https://issues.apache.org/jira/browse/MAPREDUCE-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969872#action_12969872 ]

Scott Chen commented on MAPREDUCE-1861:
---------------------------------------

Here is the review board: https://reviews.apache.org/r/160/

> Raid should rearrange the replicas while raiding
> -------------------------------------------------
>
>                 Key: MAPREDUCE-1861
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1861
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/raid
>    Affects Versions: 0.23.0
>            Reporter: Scott Chen
>            Assignee: Scott Chen
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-1861-v2.txt, MAPREDUCE-1861.txt, MAPREDUCE-1861.txt
>
>
> Raided files introduce extra dependencies between the blocks on the same
> stripe. Therefore we need a new way to place the blocks.
> It is desirable that a raided file satisfies the following two conditions:
> a. Replicas on the same stripe should be on different machines (or racks)
> b. Replicas of the same block should be on different racks
> MAPREDUCE-1831 will try to delete the replicas on the same stripe and the
> same machine (a). But in the meantime, it will try to maintain the number
> of distinct racks for each block (b).
> We cannot satisfy (a) and (b) at the same time with the current logic in
> BlockPlacementPolicyDefault.chooseTarget().
> One choice we have is to change BlockPlacementPolicyDefault.chooseTarget().
> However, this placement is in general good for all files, including the
> unraided ones. It is not clear to us that we can make this good for both
> raided and unraided files.
> So we propose this idea: when raiding the file, we create one more off-rack
> replica (so replication=4 now). Then we delete two replicas using the
> policy in MAPREDUCE-1831 (replication=2 now).
> This way we can rearrange the replicas to satisfy (a) and (b) at the same
> time.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1831) BlockPlacement policy for RAID
[ https://issues.apache.org/jira/browse/MAPREDUCE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969871#action_12969871 ]

Scott Chen commented on MAPREDUCE-1831:
---------------------------------------

Here's the review board: https://reviews.apache.org/r/159/

> BlockPlacement policy for RAID
> -------------------------------
>
>                 Key: MAPREDUCE-1831
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1831
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/raid
>    Affects Versions: 0.23.0
>            Reporter: Scott Chen
>            Assignee: Scott Chen
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-1831-v2.txt, MAPREDUCE-1831.20100610.txt, MAPREDUCE-1831.txt, MAPREDUCE-1831.v1.1.txt
>
>
> Raid introduces new dependencies between blocks within a file.
> The blocks help decode each other. Therefore we should avoid putting them
> on the same machine.
> The proposed BlockPlacementPolicy does the following:
> 1. When writing parity blocks, it avoids placing parity blocks and source
> blocks together.
> 2. When reducing the replication number, it deletes the blocks that sit
> with other dependent blocks.
> 3. It does not change the way we write normal files. It only behaves
> differently when processing raid files.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
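The stripe-aware placement idea behind MAPREDUCE-1831 can be sketched as below. The types and method are hypothetical illustrations, not the actual BlockPlacementPolicy API:

```java
import java.util.*;

// Hypothetical sketch: when choosing a target for a parity block, skip
// candidate nodes whose rack already holds a source block of the same
// stripe, so that dependent blocks never sit together.
public class RaidPlacement {
    record Node(String name, String rack) {}

    // Return the first candidate whose rack does not collide with the
    // stripe; empty if every candidate collides.
    static Optional<Node> chooseTarget(List<Node> candidates,
                                       Set<String> stripeRacks) {
        return candidates.stream()
                .filter(n -> !stripeRacks.contains(n.rack()))
                .findFirst();
    }
}
```

With candidates on racks r1 and r2 and a stripe already occupying r1, this picks the r2 node; a normal (non-raid) file would not apply the stripe filter at all, matching point 3 of the description.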
[jira] Updated: (MAPREDUCE-2215) A more elegant FileSystem#listCorruptFileBlocks API (RAID changes)
[ https://issues.apache.org/jira/browse/MAPREDUCE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Kling updated MAPREDUCE-2215:
-------------------------------------

    Status: Patch Available  (was: Open)

> A more elegant FileSystem#listCorruptFileBlocks API (RAID changes)
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2215
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2215
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/raid
>            Reporter: Patrick Kling
>            Assignee: Patrick Kling
>         Attachments: MAPREDUCE-2215.patch
>
>
> Map/reduce changes related to HADOOP-7060 and HDFS-1533.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-2215) A more elegant FileSystem#listCorruptFileBlocks API (RAID changes)
[ https://issues.apache.org/jira/browse/MAPREDUCE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Kling updated MAPREDUCE-2215:
-------------------------------------

    Attachment: MAPREDUCE-2215.patch

This patch passes all unit tests in src/contrib/raid.

ant test-patch output:
{code}
     [exec] -1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no new tests are needed for this patch.
     [exec]                         Also please list what manual steps were performed to verify this patch.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.
     [exec]
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
     [exec]
     [exec]     +1 system test framework.  The patch passed system test framework compile.
{code}

This change is tested indirectly by TestRaidShellFsck and TestBlockFixer.

> A more elegant FileSystem#listCorruptFileBlocks API (RAID changes)
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2215
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2215
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/raid
>            Reporter: Patrick Kling
>            Assignee: Patrick Kling
>         Attachments: MAPREDUCE-2215.patch
>
>
> Map/reduce changes related to HADOOP-7060 and HDFS-1533.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.