[jira] Commented: (MAPREDUCE-1788) o.a.h.mapreduce.Job shouldn't make a copy of the JobConf
[ https://issues.apache.org/jira/browse/MAPREDUCE-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981155#action_12981155 ] Arun C Murthy commented on MAPREDUCE-1788: -- Downgraded; I'm not sure this is critical right away. Also, I remember Owen having some reservations about this... foggy. > o.a.h.mapreduce.Job shouldn't make a copy of the JobConf > > > Key: MAPREDUCE-1788 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1788 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Affects Versions: 0.21.0 >Reporter: Arun C Murthy >Assignee: Arun C Murthy > > Having o.a.h.mapreduce.Job make a copy of the passed-in JobConf has several > issues: any modifications done by various pieces such as InputSplit etc. are > not reflected back, which causes issues for frameworks built on top. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1788) o.a.h.mapreduce.Job shouldn't make a copy of the JobConf
[ https://issues.apache.org/jira/browse/MAPREDUCE-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated MAPREDUCE-1788: - Priority: Major (was: Blocker) > o.a.h.mapreduce.Job shouldn't make a copy of the JobConf > > > Key: MAPREDUCE-1788 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1788 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Affects Versions: 0.21.0 >Reporter: Arun C Murthy >Assignee: Arun C Murthy > > Having o.a.h.mapreduce.Job make a copy of the passed-in JobConf has several > issues: any modifications done by various pieces such as InputSplit etc. are > not reflected back, which causes issues for frameworks built on top.
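The effect described in this issue can be illustrated with a small, self-contained sketch. It does not use the Hadoop API: `SimpleConf`, `CopyingJob`, and `SharingJob` are hypothetical stand-ins for JobConf and Job, assumed only for illustration. The point is that a change made through a job holding a copy never reaches the configuration object the caller still holds.

```java
import java.util.HashMap;
import java.util.Map;

public class JobConfCopyDemo {
    /** Hypothetical stand-in for a JobConf: a mutable key/value map. */
    static class SimpleConf {
        private final Map<String, String> props = new HashMap<>();

        SimpleConf() {}

        /** Copy constructor: the new object shares nothing with the original. */
        SimpleConf(SimpleConf other) {
            props.putAll(other.props);
        }

        void set(String key, String value) { props.put(key, value); }

        String get(String key) { return props.get(key); }
    }

    /** Hypothetical job that copies the configuration it is given. */
    static class CopyingJob {
        final SimpleConf conf;
        CopyingJob(SimpleConf passedIn) { this.conf = new SimpleConf(passedIn); }
    }

    /** Hypothetical job that keeps a reference to the caller's configuration. */
    static class SharingJob {
        final SimpleConf conf;
        SharingJob(SimpleConf passedIn) { this.conf = passedIn; }
    }

    public static void main(String[] args) {
        SimpleConf callerConf = new SimpleConf();

        // A component working through the copying job changes only the copy.
        new CopyingJob(callerConf).conf.set("demo.key", "set-by-component");
        System.out.println("after copying job: " + callerConf.get("demo.key"));

        // With a shared reference the caller sees the change.
        new SharingJob(callerConf).conf.set("demo.key", "set-by-component");
        System.out.println("after sharing job: " + callerConf.get("demo.key"));
    }
}
```

A framework that keeps its own reference to the original configuration sees the change only in the second case, which is the behavior the issue asks for.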
[jira] Updated: (MAPREDUCE-2254) Allow setting of end-of-record delimiter for TextInputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Radwan updated MAPREDUCE-2254: Attachment: (was: HADOOP-7096.patch) > Allow setting of end-of-record delimiter for TextInputFormat > > > Key: MAPREDUCE-2254 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2254 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Ahmed Radwan > Attachments: MAPREDUCE-2245.patch > > > It would be useful to allow setting the end-of-record delimiter for > TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as > the only possible record delimiters. This is a problem if users have embedded > newlines in their data fields (which is pretty common). This is also a > problem for other tools using this TextInputFormat (see, for example, > https://issues.apache.org/jira/browse/PIG-836 and > https://issues.cloudera.org/browse/SQOOP-136). > I have written a patch to address this issue. This patch allows users to > specify any custom end-of-record delimiter using a newly added configuration > property. For backward compatibility, if this new configuration property is > absent, then exactly the same delimiters as before are used (i.e., '\n', '\r' or > '\r\n').
[jira] Commented: (MAPREDUCE-2254) Allow setting of end-of-record delimiter for TextInputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981114#action_12981114 ] Ahmed Radwan commented on MAPREDUCE-2254: - I have attached the patches. The review board has a problem with uploading git diff files. > Allow setting of end-of-record delimiter for TextInputFormat > > > Key: MAPREDUCE-2254 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2254 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Ahmed Radwan > Attachments: HADOOP-7096.patch, MAPREDUCE-2245.patch > > > It would be useful to allow setting the end-of-record delimiter for > TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as > the only possible record delimiters. This is a problem if users have embedded > newlines in their data fields (which is pretty common). This is also a > problem for other tools using this TextInputFormat (see, for example, > https://issues.apache.org/jira/browse/PIG-836 and > https://issues.cloudera.org/browse/SQOOP-136). > I have written a patch to address this issue. This patch allows users to > specify any custom end-of-record delimiter using a newly added configuration > property. For backward compatibility, if this new configuration property is > absent, then exactly the same delimiters as before are used (i.e., '\n', '\r' or > '\r\n').
[jira] Updated: (MAPREDUCE-2254) Allow setting of end-of-record delimiter for TextInputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Radwan updated MAPREDUCE-2254: Attachment: (was: 2.patch) > Allow setting of end-of-record delimiter for TextInputFormat > > > Key: MAPREDUCE-2254 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2254 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Ahmed Radwan > Attachments: HADOOP-7096.patch, MAPREDUCE-2245.patch > > > It would be useful to allow setting the end-of-record delimiter for > TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as > the only possible record delimiters. This is a problem if users have embedded > newlines in their data fields (which is pretty common). This is also a > problem for other tools using this TextInputFormat (see, for example, > https://issues.apache.org/jira/browse/PIG-836 and > https://issues.cloudera.org/browse/SQOOP-136). > I have written a patch to address this issue. This patch allows users to > specify any custom end-of-record delimiter using a newly added configuration > property. For backward compatibility, if this new configuration property is > absent, then exactly the same delimiters as before are used (i.e., '\n', '\r' or > '\r\n').
[jira] Updated: (MAPREDUCE-2254) Allow setting of end-of-record delimiter for TextInputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Radwan updated MAPREDUCE-2254: Attachment: (was: 1.patch) > Allow setting of end-of-record delimiter for TextInputFormat > > > Key: MAPREDUCE-2254 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2254 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Ahmed Radwan > Attachments: HADOOP-7096.patch, MAPREDUCE-2245.patch > > > It would be useful to allow setting the end-of-record delimiter for > TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as > the only possible record delimiters. This is a problem if users have embedded > newlines in their data fields (which is pretty common). This is also a > problem for other tools using this TextInputFormat (see, for example, > https://issues.apache.org/jira/browse/PIG-836 and > https://issues.cloudera.org/browse/SQOOP-136). > I have written a patch to address this issue. This patch allows users to > specify any custom end-of-record delimiter using a newly added configuration > property. For backward compatibility, if this new configuration property is > absent, then exactly the same delimiters as before are used (i.e., '\n', '\r' or > '\r\n').
[jira] Updated: (MAPREDUCE-2254) Allow setting of end-of-record delimiter for TextInputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Radwan updated MAPREDUCE-2254: Attachment: HADOOP-7096.patch MAPREDUCE-2245.patch The updated patches. Added a new test case, and used --no-prefix when generating the git diff files. > Allow setting of end-of-record delimiter for TextInputFormat > > > Key: MAPREDUCE-2254 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2254 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Ahmed Radwan > Attachments: HADOOP-7096.patch, MAPREDUCE-2245.patch > > > It would be useful to allow setting the end-of-record delimiter for > TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as > the only possible record delimiters. This is a problem if users have embedded > newlines in their data fields (which is pretty common). This is also a > problem for other tools using this TextInputFormat (see, for example, > https://issues.apache.org/jira/browse/PIG-836 and > https://issues.cloudera.org/browse/SQOOP-136). > I have written a patch to address this issue. This patch allows users to > specify any custom end-of-record delimiter using a newly added configuration > property. For backward compatibility, if this new configuration property is > absent, then exactly the same delimiters as before are used (i.e., '\n', '\r' or > '\r\n').
[jira] Commented: (MAPREDUCE-2254) Allow setting of end-of-record delimiter for TextInputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981099#action_12981099 ] Ahmed Radwan commented on MAPREDUCE-2254: - Uploaded the patch to review board for easier review. https://reviews.apache.org/r/293/ https://reviews.apache.org/r/312/ I have also added a new test case to the patch. > Allow setting of end-of-record delimiter for TextInputFormat > > > Key: MAPREDUCE-2254 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2254 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Ahmed Radwan > Attachments: 1.patch, 2.patch > > > It would be useful to allow setting the end-of-record delimiter for > TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as > the only possible record delimiters. This is a problem if users have embedded > newlines in their data fields (which is pretty common). This is also a > problem for other tools using this TextInputFormat (see, for example, > https://issues.apache.org/jira/browse/PIG-836 and > https://issues.cloudera.org/browse/SQOOP-136). > I have written a patch to address this issue. This patch allows users to > specify any custom end-of-record delimiter using a newly added configuration > property. For backward compatibility, if this new configuration property is > absent, then exactly the same delimiters as before are used (i.e., '\n', '\r' or > '\r\n').
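The delimiter behavior described in this issue can be sketched without Hadoop. The example below is only an illustration of the idea, not the attached patch: `splitRecords` is a hypothetical helper that works on an in-memory string, and a `null` delimiter stands in for the configuration property being absent (the messages do not name the property).

```java
import java.util.ArrayList;
import java.util.List;

public class RecordDelimiterDemo {
    /**
     * Splits data into records. With a null or empty delimiter the default
     * rules apply: '\n', '\r' or '\r\n' each end a record. Otherwise only the
     * custom delimiter ends a record, so embedded newlines stay in the record.
     */
    static List<String> splitRecords(String data, String delimiter) {
        List<String> records = new ArrayList<>();
        if (delimiter == null || delimiter.isEmpty()) {
            StringBuilder current = new StringBuilder();
            for (int i = 0; i < data.length(); i++) {
                char c = data.charAt(i);
                if (c == '\n' || c == '\r') {
                    records.add(current.toString());
                    current.setLength(0);
                    // Treat "\r\n" as one delimiter, not two.
                    if (c == '\r' && i + 1 < data.length() && data.charAt(i + 1) == '\n') {
                        i++;
                    }
                } else {
                    current.append(c);
                }
            }
            if (current.length() > 0) {
                records.add(current.toString());
            }
            return records;
        }
        int start = 0;
        int idx;
        while ((idx = data.indexOf(delimiter, start)) >= 0) {
            records.add(data.substring(start, idx));
            start = idx + delimiter.length();
        }
        if (start < data.length()) {
            records.add(data.substring(start));
        }
        return records;
    }

    public static void main(String[] args) {
        // Default delimiters: four records.
        System.out.println(splitRecords("a\nb\r\nc\rd", null));
        // Custom delimiter "|#|": the newline inside the first record is kept.
        System.out.println(splitRecords("id=1\nnote|#|id=2", "|#|").size());
    }
}
```

A real record reader also has to handle delimiters that straddle buffer and split boundaries, which this in-memory sketch ignores.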
[jira] Updated: (MAPREDUCE-2250) Fix logging in raid code.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2250: --- Attachment: MAPREDUCE-2250.1.patch Update after svn up > Fix logging in raid code. > - > > Key: MAPREDUCE-2250 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2250 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/raid >Reporter: Ramkumar Vadali >Assignee: Ramkumar Vadali >Priority: Trivial > Attachments: MAPREDUCE-2250.1.patch, MAPREDUCE-2250.patch > > > There are quite a few error messages being logged with a log level of info. > That should be fixed to help debugging. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-2250) Fix logging in raid code.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2250: --- Hadoop Flags: [Reviewed] Status: Patch Available (was: Open) > Fix logging in raid code. > - > > Key: MAPREDUCE-2250 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2250 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/raid >Reporter: Ramkumar Vadali >Assignee: Ramkumar Vadali >Priority: Trivial > Attachments: MAPREDUCE-2250.1.patch, MAPREDUCE-2250.patch > > > There are quite a few error messages being logged with a log level of info. > That should be fixed to help debugging. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-2250) Fix logging in raid code.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-2250: --- Status: Open (was: Patch Available) > Fix logging in raid code. > - > > Key: MAPREDUCE-2250 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2250 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/raid >Reporter: Ramkumar Vadali >Assignee: Ramkumar Vadali >Priority: Trivial > Attachments: MAPREDUCE-2250.1.patch, MAPREDUCE-2250.patch > > > There are quite a few error messages being logged with a log level of info. > That should be fixed to help debugging. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
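As a rough illustration of the change this issue asks for, the sketch below logs a failure at an error level instead of info. It uses `java.util.logging` from the JDK so that it runs on its own; the raid code's actual logging API and messages are not shown in these emails, and `reportFailure`, `reportProgress`, and `CapturingHandler` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class LogLevelDemo {
    private static final Logger LOG = Logger.getLogger(LogLevelDemo.class.getName());

    /** Routine progress belongs at INFO. */
    static void reportProgress(String path) {
        LOG.log(Level.INFO, "Processed " + path);
    }

    /** A failure is logged at SEVERE (the error level here), with its cause. */
    static void reportFailure(String path, Exception cause) {
        LOG.log(Level.SEVERE, "Failed to process " + path, cause);
    }

    /** Test helper that records everything published to the logger. */
    static class CapturingHandler extends Handler {
        final List<LogRecord> records = new ArrayList<>();
        @Override public void publish(LogRecord record) { records.add(record); }
        @Override public void flush() {}
        @Override public void close() {}
    }

    /** Attaches a capturing handler to this class's logger and returns it. */
    static CapturingHandler attachCapture() {
        CapturingHandler handler = new CapturingHandler();
        LOG.addHandler(handler);
        return handler;
    }

    public static void main(String[] args) {
        CapturingHandler handler = attachCapture();
        reportProgress("/demo/file");
        reportFailure("/demo/file", new java.io.IOException("simulated failure"));
        for (LogRecord r : handler.records) {
            System.out.println(r.getLevel() + ": " + r.getMessage());
        }
    }
}
```

With errors at their own level, a log filter set to errors only shows exactly the failures, which is the debugging benefit the issue mentions.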
[jira] Commented: (MAPREDUCE-2248) DistributedRaidFileSystem should unraid only the corrupt block
[ https://issues.apache.org/jira/browse/MAPREDUCE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981065#action_12981065 ] Hudson commented on MAPREDUCE-2248: --- Integrated in Hadoop-Mapreduce-trunk-Commit #577 (See [https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/577/]) MAPREDUCE-2248. DistributedRaidFileSystem should unraid only the corrupt block (Ramkumar Vadali via schen) > DistributedRaidFileSystem should unraid only the corrupt block > -- > > Key: MAPREDUCE-2248 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2248 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.23.0 >Reporter: Ramkumar Vadali >Assignee: Ramkumar Vadali > Fix For: 0.23.0 > > Attachments: MAPREDUCE-2248.1.patch, MAPREDUCE-2248.patch > > > DistributedRaidFileSystem unraids the entire file if it hits a corrupt block. > It is better to unraid just the corrupt block and use the rest of the file as > normal. This becomes really important when we have tera-byte sized files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-2260) Remove auto-generated native build files
Remove auto-generated native build files Key: MAPREDUCE-2260 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2260 Project: Hadoop Map/Reduce Issue Type: Improvement Components: build Reporter: Roman Shaposhnik The repo currently includes the automake and autoconf generated files for the native build. Per discussion on HADOOP-6421 let's remove them and use the host's automake and autoconf. We should also do this for libhdfs and fuse-dfs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-2259) Hadoop Streaming JAR location might be updated
Hadoop Streaming JAR location might be updated -- Key: MAPREDUCE-2259 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2259 Project: Hadoop Map/Reduce Issue Type: Improvement Components: documentation Environment: N/A Reporter: minoru nishikubo Priority: Trivial The examples in docs/streaming.html refer to $HADOOP_HOME/hadoop-streaming.jar. This could be updated to $HADOOP_HOME/contrib/streaming/hadoop-$HADOOP-VERSION-streaming.jar, because some users cannot find the streaming archive at the documented location.
[jira] Updated: (MAPREDUCE-2248) DistributedRaidFileSystem should unraid only the corrupt block
[ https://issues.apache.org/jira/browse/MAPREDUCE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-2248: -- Resolution: Fixed Status: Resolved (was: Patch Available) I just committed this. Thanks Ram. > DistributedRaidFileSystem should unraid only the corrupt block > -- > > Key: MAPREDUCE-2248 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2248 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.23.0 >Reporter: Ramkumar Vadali >Assignee: Ramkumar Vadali > Fix For: 0.23.0 > > Attachments: MAPREDUCE-2248.1.patch, MAPREDUCE-2248.patch > > > DistributedRaidFileSystem unraids the entire file if it hits a corrupt block. > It is better to unraid just the corrupt block and use the rest of the file as > normal. This becomes really important when we have tera-byte sized files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-2248) DistributedRaidFileSystem should unraid only the corrupt block
[ https://issues.apache.org/jira/browse/MAPREDUCE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980951#action_12980951 ] Ramkumar Vadali commented on MAPREDUCE-2248: TEST RESULTS
{code}
test-junit:
[junit] WARNING: multiple versions of ant detected in path for junit
[junit] jar:file:/home/rvadali/local/external/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit] and jar:file:/home/rvadali/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
[junit] Running org.apache.hadoop.hdfs.TestRaidDfs
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 524.787 sec
[junit] Running org.apache.hadoop.hdfs.server.namenode.TestBlockPlacementPolicyRaid
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 154.653 sec
[junit] Running org.apache.hadoop.raid.TestBlockFixer
[junit] Tests run: 14, Failures: 0, Errors: 0, Time elapsed: 944.872 sec
[junit] Running org.apache.hadoop.raid.TestDirectoryTraversal
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 13.241 sec
[junit] Running org.apache.hadoop.raid.TestErasureCodes
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 17.78 sec
[junit] Running org.apache.hadoop.raid.TestGaloisField
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.293 sec
[junit] Running org.apache.hadoop.raid.TestHarIndexParser
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.036 sec
[junit] Running org.apache.hadoop.raid.TestRaidFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 15.007 sec
[junit] Running org.apache.hadoop.raid.TestRaidHar
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 178.351 sec
[junit] Running org.apache.hadoop.raid.TestRaidNode
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 646.931 sec
[junit] Running org.apache.hadoop.raid.TestRaidPurge
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 253.727 sec
[junit] Running org.apache.hadoop.raid.TestRaidShell
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 21.994 sec
[junit] Running org.apache.hadoop.raid.TestRaidShellFsck
[junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 270.783 sec
[junit] Running org.apache.hadoop.raid.TestReedSolomonDecoder
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 25.14 sec
[junit] Running org.apache.hadoop.raid.TestReedSolomonEncoder
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.769 sec
{code}
{code}
[exec]
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 4 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[exec]
[exec] +1 system test framework. The patch passed system test framework compile.
[exec]
[exec]
[exec]
[exec]
[exec] ==
[exec] ==
[exec] Finished build.
[exec] ==
[exec] ==
[exec]
{code}
> DistributedRaidFileSystem should unraid only the corrupt block > -- > > Key: MAPREDUCE-2248 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2248 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.23.0 >Reporter: Ramkumar Vadali >Assignee: Ramkumar Vadali > Fix For: 0.23.0 > > Attachments: MAPREDUCE-2248.1.patch, MAPREDUCE-2248.patch > > > DistributedRaidFileSystem unraids the entire file if it hits a corrupt block. > It is better to unraid just the corrupt block and use the rest of the file as > normal. This becomes really important when we have tera-byte sized files.
[jira] Updated: (MAPREDUCE-2248) DistributedRaidFileSystem should unraid only the corrupt block
[ https://issues.apache.org/jira/browse/MAPREDUCE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-2248: -- Fix Version/s: 0.23.0 Affects Version/s: 0.23.0 Hadoop Flags: [Reviewed] Status: Patch Available (was: Open) > DistributedRaidFileSystem should unraid only the corrupt block > -- > > Key: MAPREDUCE-2248 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2248 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.23.0 >Reporter: Ramkumar Vadali >Assignee: Ramkumar Vadali > Fix For: 0.23.0 > > Attachments: MAPREDUCE-2248.1.patch, MAPREDUCE-2248.patch > > > DistributedRaidFileSystem unraids the entire file if it hits a corrupt block. > It is better to unraid just the corrupt block and use the rest of the file as > normal. This becomes really important when we have tera-byte sized files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
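The idea in this issue, reconstructing only the corrupt block and reading the rest of the file as normal, can be sketched as follows. This is an assumption-laden illustration rather than the committed patch: it uses a simple XOR parity over equal-sized in-memory blocks as a stand-in for whatever code the raid module applies, and `xorParity`, `recoverBlock`, and `readStripe` are hypothetical names.

```java
import java.util.Arrays;

public class SingleBlockRecoveryDemo {
    /** Computes an XOR parity block over equal-sized data blocks. */
    static byte[] xorParity(byte[][] blocks) {
        byte[] parity = new byte[blocks[0].length];
        for (byte[] block : blocks) {
            for (int i = 0; i < parity.length; i++) {
                parity[i] ^= block[i];
            }
        }
        return parity;
    }

    /** Rebuilds one block from the parity and the other, healthy blocks. */
    static byte[] recoverBlock(byte[][] blocks, byte[] parity, int corruptIndex) {
        byte[] recovered = parity.clone();
        for (int b = 0; b < blocks.length; b++) {
            if (b == corruptIndex) {
                continue; // never read the corrupt block's contents
            }
            for (int i = 0; i < recovered.length; i++) {
                recovered[i] ^= blocks[b][i];
            }
        }
        return recovered;
    }

    /** Reads the stripe, reconstructing only the block known to be corrupt. */
    static byte[] readStripe(byte[][] blocks, byte[] parity, int corruptIndex) {
        int blockSize = parity.length;
        byte[] out = new byte[blocks.length * blockSize];
        for (int b = 0; b < blocks.length; b++) {
            byte[] src = (b == corruptIndex) ? recoverBlock(blocks, parity, b) : blocks[b];
            System.arraycopy(src, 0, out, b * blockSize, blockSize);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[][] blocks = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
        byte[] parity = xorParity(blocks);
        Arrays.fill(blocks[1], (byte) 0); // simulate corruption of block 1
        System.out.println(Arrays.toString(readStripe(blocks, parity, 1)));
    }
}
```

Healthy blocks are copied directly, so the reconstruction cost is paid for one block instead of the whole file, which is why this matters for very large files.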
[jira] Updated: (MAPREDUCE-1478) Separate the mapred.lib and mapreduce.lib classes to a different jar and include the user jar ahead of the lib jar.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White updated MAPREDUCE-1478: - Attachment: MAPREDUCE-1478.patch move.sh
Here's a first cut at implementing this. The part for including the user's classpath first is covered in MAPREDUCE-1938. This patch creates a new source tree under src/lib for the MapReduce libraries, and the build creates a separate library jar. It compiles, but I haven't done any testing yet. There are a number of changes that are needed to remove core's dependency on the libraries. Most are small, like removing dependencies on constants or configuration names, but here are some of the other ones I made:
* Introduce InputSplitCallback so MapTask doesn't have a dependency on FileSplit. Make mapred.FileSplit implement this interface so it can modify JobConf before the mapper is run.
* Task depends on FileOutputCommitter. Push this code down into FileOutputCommitter implementations of OutputCommitter#setupTask.
* MapTask and Task depend on the public WrappedMapper and WrappedReducer classes. These need to be constructed reflectively or have private duplicates made (I did the latter in this patch).
* org.apache.hadoop.mapreduce.util.ConfigUtil reflectively calls the new org.apache.hadoop.mapreduce.lib.ConfigUtil so that the deprecated keys for the libraries are added.
You need to run the move.sh script before applying the patch.
> Separate the mapred.lib and mapreduce.lib classes to a different jar and > include the user jar ahead of the lib jar. > --- > > Key: MAPREDUCE-1478 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1478 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: task >Reporter: Owen O'Malley > Attachments: MAPREDUCE-1478.patch, move.sh > > > Currently the user can't include updated library jars as part of their job. > By pulling out the lib classes we can include the classes (e.g. > TextInputFormat) in the user's jar and get their version and not the system > installed one.
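The InputSplitCallback change mentioned in the comment above can be pictured roughly as below. Only the interface name comes from the message; its method name and signature, the `FileSplitLike` class, the `demo.*` keys, and the plain `Map` used in place of JobConf are all assumptions made for this sketch, and the attached patch may look quite different.

```java
import java.util.HashMap;
import java.util.Map;

public class InputSplitCallbackDemo {
    /** Hypothetical shape of the callback; method name and signature are assumed. */
    interface InputSplitCallback {
        void beforeMap(Map<String, String> conf);
    }

    /** Library-side split that publishes its details into the configuration. */
    static class FileSplitLike implements InputSplitCallback {
        private final String path;
        private final long start;

        FileSplitLike(String path, long start) {
            this.path = path;
            this.start = start;
        }

        @Override
        public void beforeMap(Map<String, String> conf) {
            conf.put("demo.input.file", path);
            conf.put("demo.input.start", Long.toString(start));
        }
    }

    /** Core-side code: depends only on the interface, not on the split class. */
    static void runMapTask(Object split, Map<String, String> conf) {
        if (split instanceof InputSplitCallback) {
            ((InputSplitCallback) split).beforeMap(conf);
        }
        // The mapper would run here with the updated configuration.
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        runMapTask(new FileSplitLike("/demo/part-0", 128L), conf);
        System.out.println(conf);
    }
}
```

The design point is the direction of the dependency: the core task code knows only the small interface, so the concrete split class can live in the separate library jar.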
[jira] Commented: (MAPREDUCE-2258) IFile reader closes stream and compressor in wrong order
[ https://issues.apache.org/jira/browse/MAPREDUCE-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980885#action_12980885 ] Todd Lipcon commented on MAPREDUCE-2258: Hong: I agree LzoCodec is preferable to LzopCodec for use in intermediate compression, but I think the bug you referenced is no longer the case. LzopCodecs can now be pooled properly with Chris's patch you referenced plus changes on the lzo side. Just to clarify, you agree this code in IFile is wrong and should be fixed, right? > IFile reader closes stream and compressor in wrong order > > > Key: MAPREDUCE-2258 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2258 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Affects Versions: 0.20.4, 0.22.0 >Reporter: Todd Lipcon > Fix For: 0.22.0 > > > In IFile.Reader.close(), we return the decompressor to the pool and then call > close() on the input stream. This is backwards and causes a rare race in the > case of LzopCodec, since LzopInputStream makes a few calls on the > decompressor object inside close(). If another thread pulls the decompressor > out of the pool and starts to use it in the meantime, the first thread's > close() will cause the second thread to potentially miss pieces of data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-2258) IFile reader closes stream and compressor in wrong order
[ https://issues.apache.org/jira/browse/MAPREDUCE-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980882#action_12980882 ] Hong Tang commented on MAPREDUCE-2258: -- Such a pattern would in general affect all CompressionCodecs and is similar to a bug I filed earlier: HADOOP-4195. On the other hand, as explained by Chris D in HADOOP-4162, LzopCodec cannot be safely reused in Hadoop, and thus the problem you described actually should not happen. In fact, repeatedly getting LzopCodec from CodecPool is likely to get you into an OOM. > IFile reader closes stream and compressor in wrong order > > > Key: MAPREDUCE-2258 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2258 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Affects Versions: 0.20.4, 0.22.0 >Reporter: Todd Lipcon > Fix For: 0.22.0 > > > In IFile.Reader.close(), we return the decompressor to the pool and then call > close() on the input stream. This is backwards and causes a rare race in the > case of LzopCodec, since LzopInputStream makes a few calls on the > decompressor object inside close(). If another thread pulls the decompressor > out of the pool and starts to use it in the meantime, the first thread's > close() will cause the second thread to potentially miss pieces of data.
[jira] Commented: (MAPREDUCE-2258) IFile reader closes stream and compressor in wrong order
[ https://issues.apache.org/jira/browse/MAPREDUCE-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980871#action_12980871 ] Todd Lipcon commented on MAPREDUCE-2258: The following is a unit test I wrote on the hadoop-lzo side that exercises the same behavior as IFile.Reader.close(): https://github.com/toddlipcon/hadoop-lzo/commit/a5af3b93f52f55828dfc05e7503d38383eec9dc5 It fails reliably, since some threads only manage to read part of the data in the file. > IFile reader closes stream and compressor in wrong order > > > Key: MAPREDUCE-2258 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2258 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Affects Versions: 0.20.4, 0.22.0 >Reporter: Todd Lipcon > Fix For: 0.22.0 > > > In IFile.Reader.close(), we return the decompressor to the pool and then call > close() on the input stream. This is backwards and causes a rare race in the > case of LzopCodec, since LzopInputStream makes a few calls on the > decompressor object inside close(). If another thread pulls the decompressor > out of the pool and starts to use it in the meantime, the first thread's > close() will cause the second thread to potentially miss pieces of data.
[jira] Created: (MAPREDUCE-2258) IFile reader closes stream and compressor in wrong order
IFile reader closes stream and compressor in wrong order Key: MAPREDUCE-2258 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2258 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Affects Versions: 0.20.4, 0.22.0 Reporter: Todd Lipcon Fix For: 0.22.0 In IFile.Reader.close(), we return the decompressor to the pool and then call close() on the input stream. This is backwards and causes a rare race in the case of LzopCodec, since LzopInputStream makes a few calls on the decompressor object inside close(). If another thread pulls the decompressor out of the pool and starts to use it in the meantime, the first thread's close() will cause the second thread to potentially miss pieces of data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
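The ordering problem reported here can be shown with a small single-threaded model. It is only a sketch under stated assumptions: `Decompressor`, `Pool`, and `CompressedStream` are hypothetical stand-ins rather than the Hadoop or hadoop-lzo classes, and the `inPool` flag marks the window in which another thread could take the decompressor.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class CloseOrderDemo {
    /** Stand-in for a pooled decompressor. */
    static class Decompressor {
        boolean inPool;
    }

    /** Stand-in for a codec pool. */
    static class Pool {
        private final Deque<Decompressor> free = new ArrayDeque<>();

        Decompressor take() {
            Decompressor d = free.isEmpty() ? new Decompressor() : free.pop();
            d.inPool = false;
            return d;
        }

        void giveBack(Decompressor d) {
            d.inPool = true; // from here on another thread may take it
            free.push(d);
        }
    }

    /** Stand-in for a stream whose close() still calls into its decompressor. */
    static class CompressedStream {
        private final Decompressor decompressor;
        boolean touchedPooledDecompressor;

        CompressedStream(Decompressor decompressor) {
            this.decompressor = decompressor;
        }

        void close() {
            // Using the decompressor after it went back to the pool is unsafe.
            if (decompressor.inPool) {
                touchedPooledDecompressor = true;
            }
        }
    }

    /** Order described in the report: return to pool, then close the stream. */
    static boolean closeReturnFirst(Pool pool) {
        Decompressor d = pool.take();
        CompressedStream in = new CompressedStream(d);
        pool.giveBack(d);
        in.close();
        return in.touchedPooledDecompressor;
    }

    /** Reversed order: close the stream, then return the decompressor. */
    static boolean closeStreamFirst(Pool pool) {
        Decompressor d = pool.take();
        CompressedStream in = new CompressedStream(d);
        in.close();
        pool.giveBack(d);
        return in.touchedPooledDecompressor;
    }

    public static void main(String[] args) {
        Pool pool = new Pool();
        System.out.println("return-first touches pooled object: " + closeReturnFirst(pool));
        System.out.println("stream-first touches pooled object: " + closeStreamFirst(pool));
    }
}
```

In the return-first order, `close()` runs while the decompressor is already available to other threads; closing the stream first removes that window.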