[jira] Commented: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777832#action_12777832 ]

Hadoop QA commented on MAPREDUCE-1176:
--------------------------------------

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12424931/MAPREDUCE-1176-v2.patch
against trunk revision 836063.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/244/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/244/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/244/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/244/console

This message is automatically generated.
> Contribution: FixedLengthInputFormat and FixedLengthRecordReader
> ----------------------------------------------------------------
>
>                 Key: MAPREDUCE-1176
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 0.20.1, 0.20.2
>         Environment: Any
>            Reporter: BitsOfInfo
>            Priority: Minor
>         Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch
>
> Hello,
> I would like to contribute the following two classes for incorporation into
> the mapreduce.lib.input package. These two classes can be used when you need
> to read data from files containing fixed-length (fixed-width) records. Such
> files have no CR/LF (or any combination thereof) and no delimiters; each
> record is a fixed length, and extra data is padded with spaces. The data is
> one gigantic line within a file.
> Provided are two classes: the first is the FixedLengthInputFormat, and the
> second is its corresponding FixedLengthRecordReader. When creating a job
> that specifies this input format, the job must have the
> "mapreduce.input.fixedlengthinputformat.record.length" property set, as follows:
> myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length", [myFixedRecordLength]);
> OR
> myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, [myFixedRecordLength]);
> This input format overrides computeSplitSize() to ensure that InputSplits do
> not contain any partial records, since with fixed records there is no way to
> determine where a record begins if that were to occur. Each InputSplit passed
> to the FixedLengthRecordReader will start at the beginning of a record, and
> the last byte in the InputSplit will be the last byte of a record. The
> override of computeSplitSize() delegates to FileInputFormat's compute method,
> and then adjusts the returned split size as follows:
> (Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) * fixedRecordLength)
> This suite of fixed-length input format classes does not support compressed files.
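The configuration step quoted in the description can be sketched as follows. This is a minimal stand-in, not the patch itself: a plain java.util.Properties object plays the role of the job's Configuration/JobConf so the snippet is self-contained, and the constant mirrors the documented property key (the patch is described as exposing it as FixedLengthInputFormat.FIXED_RECORD_LENGTH).

```java
import java.util.Properties;

// Sketch of configuring the fixed record length for the proposed input format.
// Properties stands in for the Hadoop JobConf; the key string is the one
// documented in the issue description.
public class FixedLengthConfigDemo {
  // Mirrors the constant the patch is described as providing.
  static final String FIXED_RECORD_LENGTH =
      "mapreduce.input.fixedlengthinputformat.record.length";

  public static void main(String[] args) {
    Properties conf = new Properties();
    // Equivalent in spirit to: myJobConf.setInt(FIXED_RECORD_LENGTH, 80)
    // for a file of 80-byte records.
    conf.setProperty(FIXED_RECORD_LENGTH, Integer.toString(80));
    System.out.println(conf.getProperty(FIXED_RECORD_LENGTH));
  }
}
```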
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
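The computeSplitSize() adjustment described in the report above is pure integer arithmetic: round the split size computed by FileInputFormat down to the nearest whole multiple of the record length, so no split ends mid-record. A standalone sketch (the method name here is illustrative, not the patch's actual signature):

```java
// Demonstrates the split-size rounding described in the issue: integer
// division already floors, so (computed / recLen) * recLen drops any
// trailing partial record from the split.
public class SplitSizeDemo {
  static long adjustSplitSize(long computedSplitSize, long fixedRecordLength) {
    return (computedSplitSize / fixedRecordLength) * fixedRecordLength;
  }

  public static void main(String[] args) {
    // A 64 MB (67108864-byte) split with 100-byte records is trimmed to
    // the largest multiple of 100 that fits.
    System.out.println(adjustSplitSize(67108864L, 100L)); // 67108800
    // An already-aligned split is unchanged.
    System.out.println(adjustSplitSize(1000L, 250L));     // 1000
  }
}
```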
[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated MAPREDUCE-1176:
-----------------------------------
    Status: Open  (was: Patch Available)

toggling patch status (this kicks the QA bot to rerun the checks)
[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated MAPREDUCE-1176:
-----------------------------------
    Status: Patch Available  (was: Open)
[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BitsOfInfo updated MAPREDUCE-1176:
----------------------------------
    Attachment: MAPREDUCE-1176-v2.patch

Updated version of the patch:
- Moved from 4 spaces to 2 spaces
- Fixed long lines
- Got rid of the .java file attachments
- Removed the 0.20.1 comment and other old comments
[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BitsOfInfo updated MAPREDUCE-1176:
----------------------------------
    Attachment:     (was: FixedLengthRecordReader.java)
[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BitsOfInfo updated MAPREDUCE-1176:
----------------------------------
    Attachment:     (was: FixedLengthInputFormat.java)
[jira] Commented: (MAPREDUCE-1140) Per cache-file refcount can become negative when tasks release distributed-cache files
[ https://issues.apache.org/jira/browse/MAPREDUCE-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777809#action_12777809 ]

Jothi Padmanabhan commented on MAPREDUCE-1140:
----------------------------------------------

In {{getLocalCache}}, if the file is already present, only the reference count
is incremented, so no IOExceptions are thrown. If the file is not present,
{{cachedArchives.put}} is done only after the necessary processing (the read
methods that can throw IOExceptions) is complete. So I do not think this patch
will lead to any inconsistency. I do agree, however, that it is a good idea to
refactor the test case to avoid duplication.

> Per cache-file refcount can become negative when tasks release
> distributed-cache files
> --------------------------------------------------------------
>
>                 Key: MAPREDUCE-1140
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1140
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.20.2, 0.21.0, 0.22.0
>            Reporter: Vinod K V
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1140-1.txt, patch-1140-ydist.txt, patch-1140.txt
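The ordering the comment above relies on can be sketched as follows. This is not the TaskTracker code, just a minimal illustration of the invariant: a cache entry is published only after everything that can throw has succeeded, so a failed localization never leaves behind a refcount that could later go negative. All names here (CacheRefDemo, localize) are hypothetical.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the refcount ordering described in the comment: bump the
// count for an already-cached file (nothing throws on that path), and for a
// new file publish the entry only after fallible work completes.
public class CacheRefDemo {
  static class Entry { int refcount; }
  static final Map<String, Entry> cachedArchives = new HashMap<>();

  static synchronized void getLocalCache(String uri) throws IOException {
    Entry e = cachedArchives.get(uri);
    if (e != null) {
      e.refcount++;        // already localized: increment only, cannot throw
      return;
    }
    localize(uri);         // may throw IOException; nothing published yet
    Entry fresh = new Entry();
    fresh.refcount = 1;
    cachedArchives.put(uri, fresh);  // publish only after success
  }

  // Stand-in for the read methods that can throw IOExceptions.
  static void localize(String uri) throws IOException {
    if (uri.isEmpty()) throw new IOException("bad uri");
  }

  public static void main(String[] args) throws IOException {
    getLocalCache("hdfs:///cache/file1");
    getLocalCache("hdfs:///cache/file1");
    System.out.println(cachedArchives.get("hdfs:///cache/file1").refcount); // 2
  }
}
```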
[jira] Commented: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777802#action_12777802 ]

BitsOfInfo commented on MAPREDUCE-1176:
---------------------------------------

I followed the instructions listed at http://wiki.apache.org/hadoop/HowToContribute:

"Finally, patches should be attached to an issue report in Jira via the Attach
File link on the issue's Jira. When you believe that your patch is ready to be
committed, select the Submit Patch link on the issue's Jira."

So are you saying to delete the two *.java files and only upload the .patch?
The *.patch file does contain a unit test, so I am not sure why the comment
above reported that no tests were included. I ran this patch file against a
clean trunk copy locally on my test machine and also verified it was OK via
the "test-patch" task on the contribute how-to page. I'll remove the 4-space
indents; when looking through other sources I found the 2 spaces and tons of
wrapping unreadable.
[jira] Commented: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1298#action_1298 ]

Todd Lipcon commented on MAPREDUCE-1176:
----------------------------------------

Hi,
- Please *only* upload the patch to Hudson. Otherwise the QA bot gets confused
  and tries to apply your .java files as a patch.
- Also, the coding style guidelines for Hadoop specify an indentation level of
  2 spaces. It looks like your patch is full of tabs. There are a few other
  style violations. The coding style is http://java.sun.com/docs/codeconv/
  with the change of 2 spaces instead of 4. It's probably easier to look
  through other parts of the Hadoop codebase and simply follow their example.
- There's a comment referring to the 0.20.1 code. Since this patch is slated
  for trunk, not 0.20.1, please remove it.
- There are some other bits of commented-out code. These are a no-no: either
  the code works and is important, in which case it should be there, or it's
  not important (or broken) and it shouldn't.

Thanks again for contributing to Hadoop! The review process can take a while,
but it's important to maintain style consistency across the codebase.
[jira] Commented: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1295#action_1295 ]

Hadoop QA commented on MAPREDUCE-1176:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12424924/FixedLengthRecordReader.java
against trunk revision 836063.

    +1 @author. The patch does not contain any @author tags.
    -1 tests included. The patch doesn't appear to include any new or modified
       tests. Please justify why no new tests are needed for this patch. Also
       please list what manual steps were performed to verify this patch.
    -1 patch. The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/243/console

This message is automatically generated.
[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BitsOfInfo updated MAPREDUCE-1176:
----------------------------------
    Affects Version/s: 0.20.2
         Release Note: Addition of FixedLengthInputFormat and
FixedLengthRecordReader in the org.apache.hadoop.mapreduce.lib.input package.
These two classes can be used when you need to read data from files containing
fixed-length (fixed-width) records. Such files have no CR/LF (or any
combination thereof) and no delimiters; each record is a fixed length, and
extra data is padded with spaces. The data is one gigantic line within a file.
When creating a job that specifies this input format, the job must have the
"mapreduce.input.fixedlengthinputformat.record.length" property set, as follows:
myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length", [myFixedRecordLength]);
Please see the javadoc for more details.
               Status: Patch Available  (was: Open)

Attached is a patch file which adds FixedLengthInputFormat,
FixedLengthRecordReader, and a unit test for the new input format. This patch
was made against the trunk as of 11/13.
[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BitsOfInfo updated MAPREDUCE-1176: -- Attachment: FixedLengthInputFormat.java Re-attached the FixedLengthInputFormat source file so that it contains the same version as the patch file > Contribution: FixedLengthInputFormat and FixedLengthRecordReader > > > Key: MAPREDUCE-1176 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Affects Versions: 0.20.1 > Environment: Any >Reporter: BitsOfInfo >Priority: Minor > Attachments: FixedLengthInputFormat.java, > FixedLengthRecordReader.java, MAPREDUCE-1176-v1.patch > > > Hello, > I would like to contribute the following two classes for incorporation into > the mapreduce.lib.input package. These two classes can be used when you need > to read data from files containing fixed-length (fixed-width) records. Such > files have no CR/LF (or any combination thereof), no delimiters, etc.; each > record is a fixed length, and extra data is padded with spaces. The data is > one gigantic line within a file. > Provided are two classes: FixedLengthInputFormat and its > corresponding FixedLengthRecordReader. When creating a job that specifies > this input format, the job must have the > "mapreduce.input.fixedlengthinputformat.record.length" property set, as follows: > myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length", [myFixedRecordLength]); > OR > myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, > [myFixedRecordLength]); > This input format overrides computeSplitSize() in order to ensure that > InputSplits do not contain any partial records, since with fixed-length records there > is no way to determine where a record begins if that were to occur. Each > InputSplit passed to the FixedLengthRecordReader will start at the beginning > of a record, and the last byte in the InputSplit will be the last byte of a > record.
The override of computeSplitSize() delegates to FileInputFormat's > compute method, and then adjusts the returned split size by doing the > following: (Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) > * fixedRecordLength) > This suite of fixed length input format classes does not support compressed > files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
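The split-size adjustment described above is simple arithmetic and can be sketched in plain Java. This is a minimal illustration of the rounding rule, not the actual patch code; the class and method names here are hypothetical:

```java
// Hypothetical sketch of the computeSplitSize() adjustment described above:
// round the split size computed by FileInputFormat down to the nearest
// multiple of the fixed record length, so no InputSplit ends mid-record.
public class FixedLengthSplitMath {

    public static long adjustSplitSize(long computedSplitSize, long fixedRecordLength) {
        if (fixedRecordLength <= 0) {
            throw new IllegalArgumentException("record length must be positive");
        }
        // For positive values, integer division floors, which matches
        // Math.floor(computedSplitSize / fixedRecordLength) * fixedRecordLength.
        return (computedSplitSize / fixedRecordLength) * fixedRecordLength;
    }

    public static void main(String[] args) {
        // In a real job you would first set the record length, e.g.:
        // myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, 100);
        // A 64 MB default split with 100-byte records gets trimmed to a record boundary:
        System.out.println(adjustSplitSize(64L * 1024 * 1024, 100)); // 67108800
    }
}
```

Note that a split smaller than one record rounds down to zero, which is consistent with the requirement that every split contain only whole records.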
[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BitsOfInfo updated MAPREDUCE-1176: -- Attachment: (was: FixedLengthRecordReader.java) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BitsOfInfo updated MAPREDUCE-1176: -- Attachment: FixedLengthRecordReader.java Re-attached the FixedLengthRecordReader source so that it matches the version in the patch file -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BitsOfInfo updated MAPREDUCE-1176: -- Attachment: (was: FixedLengthInputFormat.java) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BitsOfInfo updated MAPREDUCE-1176: -- Attachment: MAPREDUCE-1176-v1.patch Attached is a patch file which adds FixedLengthInputFormat, FixedLengthRecordReader and a unit test for the new input format. This patch was made against the trunk as of 11/13. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars
[ https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1232#action_1232 ] Hadoop QA commented on MAPREDUCE-967: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12424909/mapreduce-967.txt against trunk revision 835968. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 5 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause tar ant target to fail. -1 findbugs. The patch appears to cause Findbugs to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/139/testReport/ Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/139/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/139/console This message is automatically generated. > TaskTracker does not need to fully unjar job jars > - > > Key: MAPREDUCE-967 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-967 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: tasktracker >Affects Versions: 0.21.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Attachments: mapreduce-967-branch-0.20.txt, mapreduce-967.txt, > mapreduce-967.txt, mapreduce-967.txt > > > In practice we have seen some users submitting job jars that consist of > 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning > up after them has a significant cost (both in wall clock and in unnecessary > heavy disk utilization). 
This cost can be easily avoided -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-961) ResourceAwareLoadManager to dynamically decide new tasks based on current CPU/memory load on TaskTracker(s)
[ https://issues.apache.org/jira/browse/MAPREDUCE-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1229#action_1229 ] Hadoop QA commented on MAPREDUCE-961: - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12424888/MAPREDUCE-961-v4.patch against trunk revision 835968. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 9 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/242/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/242/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/242/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/242/console This message is automatically generated. 
> ResourceAwareLoadManager to dynamically decide new tasks based on current > CPU/memory load on TaskTracker(s) > --- > > Key: MAPREDUCE-961 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-961 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: contrib/fair-share >Affects Versions: 0.22.0 >Reporter: dhruba borthakur >Assignee: Scott Chen > Fix For: 0.22.0 > > Attachments: HIVE-961.patch, MAPREDUCE-961-v2.patch, > MAPREDUCE-961-v3.patch, MAPREDUCE-961-v4.patch > > > Design and develop a ResourceAwareLoadManager for the FairShare scheduler that > dynamically decides how many maps/reduces to run on a particular machine > based on the CPU/Memory/diskIO/network usage in that machine. The amount of > resources currently used on each task tracker is being fed into the > ResourceAwareLoadManager in real-time via an entity that is external to > Hadoop. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars
[ https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1223#action_1223 ] Todd Lipcon commented on MAPREDUCE-967: --- Forgot to note - this will probably fail Hudson since it requires the common jar built from HADOOP-6346 to add new functions to support regexes in Configuration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars
[ https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated MAPREDUCE-967: -- Attachment: mapreduce-967.txt New patch for the new version of HADOOP-6346. This one does *not* move the RunJar class to mapreduce, since we determined over in that issue that it isn't the best course of action. One question for the reviewer: the constant for the new configuration key is in JobContext, whereas the default is in JobConf. I was following some other examples from the code, but it seems a little bit messy here. Where are the right places to add new configuration parameters that work in both APIs? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars
[ https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated MAPREDUCE-967: -- Status: Patch Available (was: Open) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars
[ https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated MAPREDUCE-967: -- Status: Open (was: Patch Available) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1214) Add support for counters in Hadoop Local Mode
Add support for counters in Hadoop Local Mode - Key: MAPREDUCE-1214 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1214 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Ankit Modi Currently there is no support for counters (records and bytes written) in Hadoop Local Mode. Pig needs to provide counters to the user when running in Hadoop Local Mode. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-961) ResourceAwareLoadManager to dynamically decide new tasks based on current CPU/memory load on TaskTracker(s)
[ https://issues.apache.org/jira/browse/MAPREDUCE-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777690#action_12777690 ] Scott Chen commented on MAPREDUCE-961: -- I will write an overall design document and post it here soon. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-961) ResourceAwareLoadManager to dynamically decide new tasks based on current CPU/memory load on TaskTracker(s)
[ https://issues.apache.org/jira/browse/MAPREDUCE-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777644#action_12777644 ] Scott Chen commented on MAPREDUCE-961: -- Thanks for the comment, Arun. I have changed the patch a lot following the suggestions from Matei and Vinod; the latest patch is quite different from the first one. I am sorry about the confusion. The design is as follows: 1. We obtain the available memory on the TT using MemoryCalculatorPlugin. Originally this class calculated only the total memory; we added a slight change so that it also computes the available memory. 2. The information is reported back to the JT in TaskTrackerStatus.ResourceStatus. 3. In MemBasedLoadManager, we look at the available memory on the TT, the maximum memory per task (from the jobConf), and a configured reserved memory on the TT. If (available memory - task memory > reserved memory), we return true, which allows the scheduler to launch the task. The initial idea also included using the memory usage of a job collected across the cluster. Right now we only use the value obtained from the jobConf. After MAPREDUCE-220 is done, we can use the task memory estimated by the previous tasks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
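The admission check in step 3 of the design above is a one-line predicate, sketched below in plain Java. The class and method names are hypothetical; in the real MemBasedLoadManager these values would come from TaskTrackerStatus and the jobConf:

```java
// Minimal sketch of the MemBasedLoadManager admission check described above.
// A task is admitted only if launching it would still leave the configured
// reserved headroom free on the TaskTracker. All values in the same unit (e.g. MB).
public class MemBasedLoadCheck {

    public static boolean canLaunchTask(long availableMem, long taskMem, long reservedMem) {
        // If (available memory - task memory > reserved memory), allow the launch.
        return availableMem - taskMem > reservedMem;
    }

    public static void main(String[] args) {
        // 8192 MB free, 2048 MB task, 4096 MB reserved: 6144 MB would remain, so launch.
        System.out.println(canLaunchTask(8192, 2048, 4096));
        // 6000 MB free: only 3952 MB would remain, below the reserve, so refuse.
        System.out.println(canLaunchTask(6000, 2048, 4096));
    }
}
```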
[jira] Updated: (MAPREDUCE-961) ResourceAwareLoadManager to dynamically decide new tasks based on current CPU/memory load on TaskTracker(s)
[ https://issues.apache.org/jira/browse/MAPREDUCE-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-961: - Status: Patch Available (was: Open) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1189) Reduce ivy console output to observable level
[ https://issues.apache.org/jira/browse/MAPREDUCE-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777625#action_12777625 ] Hudson commented on MAPREDUCE-1189: --- Integrated in Hadoop-Mapreduce-trunk-Commit #117 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/117/]) . Reduce ivy console output to observable level. Contributed by Konstantin Boudnik > Reduce ivy console output to observable level > - > > Key: MAPREDUCE-1189 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1189 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: build >Affects Versions: 0.22.0 >Reporter: Konstantin Boudnik >Assignee: Konstantin Boudnik > Fix For: 0.22.0 > > Attachments: MAPREDUCE-1189.patch > > > It is very hard to see what's going on in the build because ivy literally > floods the console with nonsensical messages... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-961) ResourceAwareLoadManager to dynamically decide new tasks based on current CPU/memory load on TaskTracker(s)
[ https://issues.apache.org/jira/browse/MAPREDUCE-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-961: - Attachment: MAPREDUCE-961-v4.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1189) Reduce ivy console output to observable level
[ https://issues.apache.org/jira/browse/MAPREDUCE-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Boudnik updated MAPREDUCE-1189: -- Resolution: Fixed Fix Version/s: 0.22.0 Status: Resolved (was: Patch Available) I've just committed this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-961) ResourceAwareLoadManager to dynamically decide new tasks based on current CPU/memory load on TaskTracker(s)
[ https://issues.apache.org/jira/browse/MAPREDUCE-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777620#action_12777620 ] Arun C Murthy commented on MAPREDUCE-961: - I'm looking at all the comments here and I cannot find a single, coherent design for this feature - a major one. Can you please put up a design? I'd first like to understand/debate the design before I look at the patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-961) ResourceAwareLoadManager to dynamically decide new tasks based on current CPU/memory load on TaskTracker(s)
[ https://issues.apache.org/jira/browse/MAPREDUCE-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-961: - Attachment: MAPREDUCE-961-v4.patch Fixed the failed test in TestTTMemoryReporting. This test runs on Linux only. Previously I was running the tests on a Mac, so I did not catch this failure. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-961) ResourceAwareLoadManager to dynamically decide new tasks based on current CPU/memory load on TaskTracker(s)
[ https://issues.apache.org/jira/browse/MAPREDUCE-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-961: - Attachment: (was: MAPREDUCE-961-v4.patch) > ResourceAwareLoadManager to dynamically decide new tasks based on current > CPU/memory load on TaskTracker(s) > --- > > Key: MAPREDUCE-961 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-961 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: contrib/fair-share >Affects Versions: 0.22.0 >Reporter: dhruba borthakur >Assignee: Scott Chen > Fix For: 0.22.0 > > Attachments: HIVE-961.patch, MAPREDUCE-961-v2.patch, > MAPREDUCE-961-v3.patch > > > Design and develop a ResourceAwareLoadManager for the FairShare scheduler that > dynamically decides how many maps/reduces to run on a particular machine > based on the CPU/Memory/diskIO/network usage in that machine. The amount of > resources currently used on each task tracker is being fed into the > ResourceAwareLoadManager in real-time via an entity that is external to > Hadoop.
[jira] Commented: (MAPREDUCE-1203) DBOutputFormat: add batch size support for JDBC and receive DBWritable object in value not in key
[ https://issues.apache.org/jira/browse/MAPREDUCE-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777516#action_12777516 ] Enis Soztutar commented on MAPREDUCE-1203: -- Yes, unfortunately, JDBC does not specify the behavior for this. As far as I remember, executeBatch returns the number of records that it was able to send. Another use case that I want at some point is to enable processing to continue even after some records fail. Logically, failing to send a few records should be acceptable. Considering this use case and yours, implementing different failure strategies makes sense. I guess we could first try executeBatch, then fall back to executeUpdate one record at a time. Alternatively, we could write a wrapper InputFormat, which limits the number of records in the map tasks so that there is less to send at each task. > DBOutputFormat: add batch size support for JDBC and receive DBWritable > object in value not in key > -- > > Key: MAPREDUCE-1203 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1203 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Alexander Schwid >Assignee: Aaron Kimball >Priority: Minor > Attachments: HADOOP-4331.patch, patch.txt > > > package mapred.lib.db > added batch size support for JDBC in DBOutputFormat > receive DBWritable object in value not in key in DBOutputFormat
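The fallback strategy discussed in the comment above - try the whole batch first, then retry record by record when the batch fails - can be sketched roughly as follows. The `BatchSink` interface and all names here are hypothetical stand-ins for the JDBC `PreparedStatement` calls (`executeBatch`/`executeUpdate`), not the actual DBOutputFormat patch:

```java
import java.util.List;

/** Sketch of the batch-then-fallback write strategy: try one round trip for
 *  the whole batch, and on failure fall back to sending records one at a
 *  time, tolerating individual failures. */
public class BatchFallbackWriter {

  /** Hypothetical stand-in for the JDBC batch API. */
  public interface BatchSink {
    void writeBatch(List<String> records) throws Exception; // like executeBatch()
    void writeOne(String record) throws Exception;          // like executeUpdate()
  }

  /** Returns the number of records successfully sent. */
  public static int flush(List<String> batch, BatchSink sink) {
    try {
      sink.writeBatch(batch); // fast path: one round trip
      return batch.size();
    } catch (Exception batchFailed) {
      int sent = 0;
      for (String record : batch) { // slow path: record-by-record retry
        try {
          sink.writeOne(record);
          sent++;
        } catch (Exception ignored) {
          // Under this strategy, failing to send a few records is acceptable.
        }
      }
      return sent;
    }
  }
}
```

A pluggable "fail strategy" would then just be a different catch branch (abort the task, count and skip, or re-queue), which is why the comment suggests implementing several.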
[jira] Commented: (MAPREDUCE-1140) Per cache-file refcount can become negative when tasks release distributed-cache files
[ https://issues.apache.org/jira/browse/MAPREDUCE-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777476#action_12777476 ] Hemanth Yamijala commented on MAPREDUCE-1140: - Some comments on the test case: - Can we refactor TestTrackerDistributedCacheManager and its subclass for LinuxTaskController by introducing an API like canRun in the base class. Let it return true, and have it overridden in TestTrackerDistributedCacheManagerWithLinuxTaskController to return false if the platform is not Linux. This way, I suppose we don't have to keep overriding every test case in the base class every time we make a change. - I would enhance testReferenceCount to add 3 files instead of 2: the first file should be localized correctly, the second should fail after the reference count is incremented, and the third should not be localized at all. Then we can verify that the reference count for all files is 0. > Per cache-file refcount can become negative when tasks release > distributed-cache files > -- > > Key: MAPREDUCE-1140 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1140 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 0.20.2, 0.21.0, 0.22.0 >Reporter: Vinod K V >Assignee: Amareshwari Sriramadasu > Attachments: patch-1140-1.txt, patch-1140-ydist.txt, patch-1140.txt > >
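The canRun() refactoring suggested above can be sketched in miniature like this; aside from the canRun/testReferenceCount names taken from the comment, all class names and bodies are illustrative, not the real test code:

```java
/** Sketch of the canRun() pattern: the base test class gates every test on
 *  canRun(), and a platform-specific subclass overrides canRun() once
 *  instead of overriding each individual test method. */
public class CanRunPattern {

  public static class BaseCacheManagerTest {
    /** Subclasses override this to skip the whole suite on unsupported platforms. */
    protected boolean canRun() {
      return true;
    }

    /** Each test checks canRun() once and returns early when it is false. */
    public String testReferenceCount() {
      if (!canRun()) {
        return "skipped";
      }
      // ... real assertions about reference counts would go here ...
      return "ran";
    }
  }

  public static class LinuxOnlyCacheManagerTest extends BaseCacheManagerTest {
    @Override
    protected boolean canRun() {
      // The LinuxTaskController variant only makes sense on Linux.
      return System.getProperty("os.name").toLowerCase().contains("linux");
    }
  }
}
```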
[jira] Commented: (MAPREDUCE-1140) Per cache-file refcount can become negative when tasks release distributed-cache files
[ https://issues.apache.org/jira/browse/MAPREDUCE-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777472#action_12777472 ] Hemanth Yamijala commented on MAPREDUCE-1140: - bq. This is done because getLocalCache increments referenceCount first and then localizes. The reference count should be decremented for the one that just failed as well. So, it should be added to the list before the getLocalCache call. Umm. But (at least theoretically), it is still possible that a call to getLocalCache fails before referenceCount is incremented. For example, makeRelative throws IOException; so does getLocalCacheForWrite. Hence, we still have a situation where we record a file as being localized (by storing it in localizedCacheFiles), but the reference count is not actually incremented. And releaseCache would still have the bug this JIRA is talking about. One more point I am slightly uncomfortable about is the duplication of state introduced by the new list localizedCacheFiles. Here's an alternate proposal: - Modify CacheFile to have a boolean saying isLocalized. By default, this is false. It will be set to true if distributedCacheManager.getLocalCache returns successfully. - To handle the case you have mentioned above, where a failure can happen after referenceCount is incremented in getLocalCache, I would suggest we catch exceptions inside getLocalCache, and on an exception, decrement the referenceCount and re-throw the exception. This seems right to me - because if getLocalCache doesn't complete, shouldn't we be consistent by decrementing the reference count? Would this work?
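The decrement-on-exception proposal can be modeled in miniature as below. RefCountSketch is a hypothetical simplification, not the real TrackerDistributedCacheManager API: it increments the per-file count up front and rolls it back on any localization failure, so a later releaseCache can never drive the count negative for a file that was never localized.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Minimal model of the proposed fix: count the reference first, and on any
 *  failure during localization decrement it again before re-throwing. */
public class RefCountSketch {
  private final ConcurrentHashMap<String, Integer> refCount = new ConcurrentHashMap<>();

  /** 'localize' stands in for the actual file-localization work. */
  public void getLocalCache(String uri, Runnable localize) {
    refCount.merge(uri, 1, Integer::sum);    // count the reference first
    try {
      localize.run();                        // may fail part-way through
    } catch (RuntimeException e) {
      refCount.merge(uri, -1, Integer::sum); // roll back on failure
      throw e;                               // caller still sees the error
    }
  }

  public void releaseCache(String uri) {
    refCount.merge(uri, -1, Integer::sum);
  }

  public int count(String uri) {
    return refCount.getOrDefault(uri, 0);
  }
}
```

With this invariant, the caller no longer needs a separate localizedCacheFiles list: a successful return from getLocalCache is the only thing that obligates a matching releaseCache.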
> Per cache-file refcount can become negative when tasks release > distributed-cache files > -- > > Key: MAPREDUCE-1140 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1140 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 0.20.2, 0.21.0, 0.22.0 >Reporter: Vinod K V >Assignee: Amareshwari Sriramadasu > Attachments: patch-1140-1.txt, patch-1140-ydist.txt, patch-1140.txt > >
[jira] Commented: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777470#action_12777470 ] dhruba borthakur commented on MAPREDUCE-1213: - One option is to rename the currentworkdir to a temporary location and then delete these files asynchronously via the technique used in HDFS-611 > TaskTrackers restart is very slow because it deletes distributed cache > directory synchronously > -- > > Key: MAPREDUCE-1213 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.20.1 >Reporter: dhruba borthakur > > We are seeing that when we restart a tasktracker, it tries to recursively > delete all the files in the distributed cache. It invokes > FileUtil.fullyDelete(), which is very slow. This means that the > TaskTracker cannot join the cluster for an extended period of time (up to 2 > hours for us). The problem is acute if the number of files in the distributed > cache is a few thousand.
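The rename-then-delete idea above can be sketched with plain java.nio: a cheap rename moves the cache directory out of the way so the TaskTracker can rejoin immediately, and a background thread deletes the renamed tree at its leisure. The class name and the ".toBeDeleted" suffix are illustrative assumptions, not the HDFS-611 implementation:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Sketch: rename the directory aside (fast), delete it asynchronously. */
public class AsyncCacheCleanup {

  /** Moves 'dir' to a sibling trash location, starts a daemon thread that
   *  deletes the trash tree, and returns that thread to the caller. */
  public static Thread purgeAsync(Path dir) throws IOException {
    Path trash = dir.resolveSibling(dir.getFileName() + ".toBeDeleted");
    Files.move(dir, trash); // a single rename, cheap regardless of tree size
    Thread cleaner = new Thread(() -> {
      try (Stream<Path> walk = Files.walk(trash)) {
        // Sort deepest-first so children are deleted before their parents.
        walk.sorted(Comparator.reverseOrder()).forEach(p -> {
          try { Files.delete(p); } catch (IOException ignored) {}
        });
      } catch (IOException ignored) {}
    });
    cleaner.setDaemon(true); // never block process exit on cleanup
    cleaner.start();         // the caller does not wait for deletion
    return cleaner;
  }
}
```

The caller's observable cost is just the rename; the slow recursive delete happens off the critical path.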
[jira] Commented: (MAPREDUCE-915) For secure environments, the Map/Reduce debug script must be run as the user.
[ https://issues.apache.org/jira/browse/MAPREDUCE-915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777468#action_12777468 ] Hadoop QA commented on MAPREDUCE-915: - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12424820/915-4.patch against trunk revision 835237. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 12 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/138/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/138/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/138/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/138/console This message is automatically generated. > For secure environments, the Map/Reduce debug script must be run as the user. 
> - > > Key: MAPREDUCE-915 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-915 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security, tasktracker >Affects Versions: 0.21.0 >Reporter: Hemanth Yamijala >Assignee: Devaraj Das >Priority: Blocker > Fix For: 0.21.0 > > Attachments: 915-4.patch, 915.1.patch, 915.2.patch, 915.patch > > > The TaskController model allows admins to set up a cluster configuration that > runs tasks as users. The debug script feature of Map/Reduce provided by the > configuration options mapred.map.task.debug.script and > mapred.reduce.task.debug.script needs to be run as the user as well in such > environments, rather than as the tasktracker user.
[jira] Created: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
TaskTrackers restart is very slow because it deletes distributed cache directory synchronously -- Key: MAPREDUCE-1213 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.1 Reporter: dhruba borthakur We are seeing that when we restart a tasktracker, it tries to recursively delete all the files in the distributed cache. It invokes FileUtil.fullyDelete(), which is very slow. This means that the TaskTracker cannot join the cluster for an extended period of time (up to 2 hours for us). The problem is acute if the number of files in the distributed cache is a few thousand.
[jira] Commented: (MAPREDUCE-754) NPE in expiry thread when a TT is lost
[ https://issues.apache.org/jira/browse/MAPREDUCE-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777434#action_12777434 ] Hemanth Yamijala commented on MAPREDUCE-754: Some more comments: - It would be useful to add a javadoc for getNumberOfUniqueHosts, along with a reason why blacklisted hosts must be excluded from this count. Please remember our offline discussion where we spoke about why this number must be excluded when scheduling. Other comments on test cases: - In TestLostTracker, please separate case 3 into a separate test case. It is generally good unit testing practice to test separate conditions in separate tests. - We can assert some state after case 2 and case 3 in addition to just making sure method calls succeed. For example, in the case of blacklisting, we can check that the number of active hosts is decremented by the right value (because we are changing that API as well, and it will be a good check). Likewise, we can also check that a host is blacklisted or a job is finished, etc. - Please create the hosts.exclude file in a folder relative to TEST_DIR. - The test case testBlacklistedNodeDecommissioning can blacklist a node by global blacklisting - rather than the health check script, which is slightly more complicated. One reason for doing so is that we can do this without having to wait for blacklisting to happen asynchronously. - Common code in this class related to global blacklisting because of job failures, as well as refresh of hosts, can also be refactored into separate utility methods and reused. - Instead of checking that the decommissioned tracker is not present in the list of trackers, since we are starting with only one tracker, we can explicitly check that the number of trackers in jt.taskTrackers is 0. 
- Some additional tests that I can suggest -- Blacklist + decommission when there are multiple trackers per host -- Have a cluster with 3 trackers, blacklist one of them, decommission 2 of them, and make sure the active, decommissioned and blacklisted counts all match. > NPE in expiry thread when a TT is lost > -- > > Key: MAPREDUCE-754 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-754 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Affects Versions: 0.20.1 >Reporter: Ramya R >Assignee: Amar Kamat >Priority: Minor > Fix For: 0.22.0 > > Attachments: mapreduce-754-v1.1.patch, mapreduce-754-v1.2.patch, > mapreduce-754-wip.patch > > > NullPointerException is obtained in Tracker Expiry Thread. Below is the > exception obtained in the JT logs > {noformat} > ERROR org.apache.hadoop.mapred.JobTracker: Tracker Expiry Thread got > exception: java.lang.NullPointerException > at > org.apache.hadoop.mapred.JobTracker.updateTaskTrackerStatus(JobTracker.java:2971) > at org.apache.hadoop.mapred.JobTracker.access$300(JobTracker.java:104) > at > org.apache.hadoop.mapred.JobTracker$ExpireTrackers.run(JobTracker.java:381) > at java.lang.Thread.run(Thread.java:619) > {noformat} > The steps to reproduce this issue are: > * Blacklist a TT. > * Restart it. > * The above exception is obtained when the first instance of TT is marked as > lost. > However the above exception does not break any functionality.
[jira] Commented: (MAPREDUCE-1119) When tasks fail to report status, show task's stack dump before killing
[ https://issues.apache.org/jira/browse/MAPREDUCE-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777433#action_12777433 ] Hadoop QA commented on MAPREDUCE-1119: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12424806/MAPREDUCE-1119.5.patch against trunk revision 835237. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 8 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/241/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/241/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/241/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/241/console This message is automatically generated. 
> When tasks fail to report status, show task's stack dump before killing > > > Key: MAPREDUCE-1119 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1119 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 0.22.0 >Reporter: Todd Lipcon >Assignee: Aaron Kimball > Attachments: MAPREDUCE-1119.2.patch, MAPREDUCE-1119.3.patch, > MAPREDUCE-1119.4.patch, MAPREDUCE-1119.5.patch, MAPREDUCE-1119.patch > > > When the TT kills tasks that haven't reported status, it should somehow > gather a stack dump for the task. This could be done either by sending a > SIGQUIT (so the dump ends up in stdout) or perhaps something like JDI to > gather the stack directly from Java. This may be somewhat tricky since the > child may be running as another user (so the SIGQUIT would have to go through > LinuxTaskController). This feature would make debugging these kinds of > failures much easier, especially if we could somehow get it into the > TaskDiagnostic message