[jira] [Updated] (MAPREDUCE-4511) Add IFile readahead
[ https://issues.apache.org/jira/browse/MAPREDUCE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmed Radwan updated MAPREDUCE-4511:
    Attachment: MAPREDUCE-4511_branch-1_rev4.patch

Here are updated patches that address the case when NativeIO is not available (i.e., raPool == null), and also the case when the conf passed to IFileInputStream is null. I saw these issues while conducting further testing.

Add IFile readahead
-------------------
                Key: MAPREDUCE-4511
                URL: https://issues.apache.org/jira/browse/MAPREDUCE-4511
            Project: Hadoop Map/Reduce
         Issue Type: Bug
         Components: mrv1, mrv2
           Reporter: Ahmed Radwan
           Assignee: Ahmed Radwan
        Attachments: MAPREDUCE-4511_branch-1_rev2.patch, MAPREDUCE-4511_branch-1_rev3.patch, MAPREDUCE-4511_branch-1_rev4.patch, MAPREDUCE-4511_branch1.patch, MAPREDUCE-4511_trunk.patch, MAPREDUCE-4511_trunk_rev2.patch, MAPREDUCE-4511_trunk_rev3.patch, MAPREDUCE-4511_trunk_rev4.patch

This ticket is to add IFile readahead as part of HADOOP-7714.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
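The two defensive checks described in the comment (skip readahead entirely when NativeIO is unavailable, and fall back to a default when the conf is null) can be sketched as follows. All names here (ReadaheadSketch, raPool, readaheadBytes) are illustrative assumptions, not the actual patch code:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of the null-guards described above; not the real
// IFileInputStream implementation.
class ReadaheadSketch {
    static final int DEFAULT_READAHEAD_BYTES = 4 * 1024 * 1024;

    private final ExecutorService raPool;  // null when NativeIO is unavailable
    private final int readaheadBytes;

    ReadaheadSketch(boolean nativeIoAvailable, Integer confReadaheadBytes) {
        // A null conf value falls back to the default readahead length.
        this.readaheadBytes =
            (confReadaheadBytes != null) ? confReadaheadBytes : DEFAULT_READAHEAD_BYTES;
        // Only create a readahead pool when native fadvise support exists.
        this.raPool = nativeIoAvailable ? Executors.newSingleThreadExecutor() : null;
    }

    int getReadaheadBytes() { return readaheadBytes; }

    void maybeReadahead(long offset) {
        if (raPool == null) {
            return;  // no native I/O support: skip readahead rather than NPE
        }
        raPool.execute(() -> {
            // a real implementation would issue posix_fadvise(WILLNEED) here
        });
    }
}

public class ReadaheadDemo {
    public static void main(String[] args) {
        ReadaheadSketch s = new ReadaheadSketch(false, null);
        s.maybeReadahead(0L);  // safe: no pool and no conf, but no exception
        System.out.println(s.getReadaheadBytes());
    }
}
```

The point of the guard is that readahead is purely an optimization, so the stream degrades gracefully to plain reads instead of throwing when native support is missing.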
[jira] [Updated] (MAPREDUCE-4511) Add IFile readahead
[ https://issues.apache.org/jira/browse/MAPREDUCE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmed Radwan updated MAPREDUCE-4511:
    Attachment: MAPREDUCE-4511_trunk_rev4.patch

Here is the trunk version.

Add IFile readahead
-------------------
                Key: MAPREDUCE-4511
                URL: https://issues.apache.org/jira/browse/MAPREDUCE-4511
            Project: Hadoop Map/Reduce
         Issue Type: Bug
         Components: mrv1, mrv2
           Reporter: Ahmed Radwan
           Assignee: Ahmed Radwan
        Attachments: MAPREDUCE-4511_branch-1_rev2.patch, MAPREDUCE-4511_branch-1_rev3.patch, MAPREDUCE-4511_branch-1_rev4.patch, MAPREDUCE-4511_branch1.patch, MAPREDUCE-4511_trunk.patch, MAPREDUCE-4511_trunk_rev2.patch, MAPREDUCE-4511_trunk_rev3.patch, MAPREDUCE-4511_trunk_rev4.patch

This ticket is to add IFile readahead as part of HADOOP-7714.
[jira] [Commented] (MAPREDUCE-1487) io.DataInputBuffer.getLength() semantic wrong/confused
[ https://issues.apache.org/jira/browse/MAPREDUCE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432685#comment-13432685 ]

Devin Bayer commented on MAPREDUCE-1487:

It's very embarrassing this issue isn't fixed. Do the developers realise Hadoop cannot even copy data from mapper to reducer without corruption?

io.DataInputBuffer.getLength() semantic wrong/confused
------------------------------------------------------
                Key: MAPREDUCE-1487
                URL: https://issues.apache.org/jira/browse/MAPREDUCE-1487
            Project: Hadoop Map/Reduce
         Issue Type: Bug
   Affects Versions: 0.20.1, 0.20.2, 0.21.0
        Environment: linux
           Reporter: Yang Yang

I was trying Google Protocol Buffers as a value type on Hadoop, and when I used it in a reducer the parser always failed, while it worked fine on a plain input stream reader or in a mapper. The reason is that the reducer interface in Task.java gave the parser a buffer larger than the actual encoded record, and the parser does not stop until it reaches the buffer end, so it parsed some junk bytes.

The root cause is in hadoop.io.DataInputBuffer. In 0.20.1, DataInputBuffer.java line 47:

{code}
public void reset(byte[] input, int start, int length) {
  this.buf = input;
  this.count = start + length;
  this.mark = start;
  this.pos = start;
}
public byte[] getData() { return buf; }
public int getPosition() { return pos; }
public int getLength() { return count; }
{code}

The logic above assumes that getLength() returns the *end offset* of the content (start + length), not the actual content length, yet later code assumes the semantics that the length is the actual content length, i.e. end - start:

{code}
/** Resets the data that the buffer reads. */
public void reset(byte[] input, int start, int length) {
  buffer.reset(input, start, length);
}
{code}

That is, if you call reset(getData(), getPosition(), getLength()) on the same buffer again and again, the length is inflated on every call.

This confusion in semantics is reflected in many places, at least in IFile.java and Task.java, where it caused the original issue. Around line 980 of Task.java we see:

{code}
valueIn.reset(nextValueBytes.getData(), nextValueBytes.getPosition(), nextValueBytes.getLength())
{code}

If the position above is not zero, this sets up a buffer that is too long, causing the reported issue. Changing Task.java, as a hack, to

{code}
valueIn.reset(nextValueBytes.getData(), nextValueBytes.getPosition(),
    nextValueBytes.getLength() - nextValueBytes.getPosition());
{code}

fixed the issue, but the semantics of DataInputBuffer should be fixed and streamlined.
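The inflation described in the report can be reproduced with a minimal stand-in for the buffer. This is a hypothetical re-implementation of just the semantics the report quotes, not the Hadoop class itself:

```java
// Stand-in reproducing DataInputBuffer's quoted reset()/getLength() semantics.
class LeakyBuffer {
    private byte[] buf;
    private int pos, mark, count;

    void reset(byte[] input, int start, int length) {
        this.buf = input;
        this.count = start + length;  // end offset, NOT the content length
        this.mark = start;
        this.pos = start;
    }

    byte[] getData()  { return buf; }
    int getPosition() { return pos; }
    int getLength()   { return count; }  // misleading: returns the end offset
}

public class DataInputBufferDemo {
    public static void main(String[] args) {
        LeakyBuffer b = new LeakyBuffer();
        byte[] data = new byte[10];
        b.reset(data, 4, 6);               // content is bytes 4..9, true length 6
        System.out.println(b.getLength()); // prints 10: start + length, not 6

        // Feeding getPosition()/getLength() straight back into reset(), as the
        // Task.java call site effectively does, inflates the count each time:
        b.reset(b.getData(), b.getPosition(), b.getLength());
        System.out.println(b.getLength()); // prints 14

        // The workaround from the report: subtract the position first.
        b.reset(b.getData(), b.getPosition(), b.getLength() - b.getPosition());
        System.out.println(b.getLength()); // stays 14; the count no longer grows
    }
}
```

This shows why any consumer that treats getLength() as a byte count, such as a Protocol Buffers parser, reads past the real record into junk bytes.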
[jira] [Commented] (MAPREDUCE-4470) Fix TestCombineFileInputFormat.testForEmptyFile
[ https://issues.apache.org/jira/browse/MAPREDUCE-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432694#comment-13432694 ]

Ilya Katsov commented on MAPREDUCE-4470:

Mariappan, thank you for the test. Could you please clarify what the correct way is to obtain split locations in an InputFormat?

Fix TestCombineFileInputFormat.testForEmptyFile
-----------------------------------------------
                Key: MAPREDUCE-4470
                URL: https://issues.apache.org/jira/browse/MAPREDUCE-4470
            Project: Hadoop Map/Reduce
         Issue Type: Bug
         Components: test
   Affects Versions: 2.0.0-alpha
           Reporter: Kihwal Lee
            Fix For: 2.1.0-alpha, 3.0.0
        Attachments: MAPREDUCE-4470.patch, TestFileInputFormat.java

TestCombineFileInputFormat.testForEmptyFile started failing after HADOOP-8599. It expects one split on an empty input file, but with HADOOP-8599 it gets zero. The new behavior seems correct, but is it breaking anything else?
[jira] [Commented] (MAPREDUCE-4511) Add IFile readahead
[ https://issues.apache.org/jira/browse/MAPREDUCE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432727#comment-13432727 ]

Hadoop QA commented on MAPREDUCE-4511:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12540461/MAPREDUCE-4511_trunk_rev4.patch
against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 1 new or modified test files.
    -1 javac. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2723//console

This message is automatically generated.

Add IFile readahead
-------------------
                Key: MAPREDUCE-4511
                URL: https://issues.apache.org/jira/browse/MAPREDUCE-4511
            Project: Hadoop Map/Reduce
         Issue Type: Bug
         Components: mrv1, mrv2
           Reporter: Ahmed Radwan
           Assignee: Ahmed Radwan
        Attachments: MAPREDUCE-4511_branch-1_rev2.patch, MAPREDUCE-4511_branch-1_rev3.patch, MAPREDUCE-4511_branch-1_rev4.patch, MAPREDUCE-4511_branch1.patch, MAPREDUCE-4511_trunk.patch, MAPREDUCE-4511_trunk_rev2.patch, MAPREDUCE-4511_trunk_rev3.patch, MAPREDUCE-4511_trunk_rev4.patch

This ticket is to add IFile readahead as part of HADOOP-7714.
[jira] [Created] (MAPREDUCE-4548) M/R jobs can not access S3 if Kerberos is enabled
Manuel DE FERRAN created MAPREDUCE-4548:

            Summary: M/R jobs can not access S3 if Kerberos is enabled
                Key: MAPREDUCE-4548
                URL: https://issues.apache.org/jira/browse/MAPREDUCE-4548
            Project: Hadoop Map/Reduce
         Issue Type: Bug
           Reporter: Manuel DE FERRAN

With Kerberos enabled, any job that takes S3 files as input or output fails. It can easily be reproduced with the wordcount example shipped in hadoop-examples.jar and a public S3 file:

{code}
/opt/hadoop/bin/hadoop --config /opt/hadoop/conf/ jar /opt/hadoop/hadoop-examples-1.0.0.jar wordcount s3n://ubikodpublic/test out01
{code}

returns:

{code}
12/08/10 12:40:19 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 192 for hadoop on 10.85.151.233:9000
12/08/10 12:40:19 INFO security.TokenCache: Got dt for hdfs://aws04.machine.com:9000/mapred/staging/hadoop/.staging/job_201208101229_0004;uri=10.85.151.233:9000;t.service=10.85.151.233:9000
12/08/10 12:40:19 INFO mapred.JobClient: Cleaning up the staging area hdfs://aws04.machine.com:9000/mapred/staging/hadoop/.staging/job_201208101229_0004
java.lang.IllegalArgumentException: java.net.UnknownHostException: ubikodpublic
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:293)
    at org.apache.hadoop.security.SecurityUtil.buildDTServiceName(SecurityUtil.java:317)
    at org.apache.hadoop.fs.FileSystem.getCanonicalServiceName(FileSystem.java:189)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:92)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:79)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:197)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
SNIP
{code}

This patch seems to fix it:

{code}
Index: core/org/apache/hadoop/security/SecurityUtil.java
===================================================================
--- core/org/apache/hadoop/security/SecurityUtil.java (revision 1305278)
+++ core/org/apache/hadoop/security/SecurityUtil.java (working copy)
@@ -313,6 +313,9 @@
     if (authority == null || authority.isEmpty()) {
       return null;
     }
+    if (uri.getScheme().equals("s3n") || uri.getScheme().equals("s3")) {
+      return null;
+    }
     InetSocketAddress addr = NetUtils.createSocketAddr(authority, defPort);
     return buildTokenService(addr).toString();
   }
{code}
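The root of the UnknownHostException can be seen from how Java parses an s3n URI: the authority component is the bucket name, not a resolvable host, so building a token-service socket address from it fails. A minimal illustration:

```java
import java.net.URI;

public class S3UriDemo {
    public static void main(String[] args) {
        URI uri = URI.create("s3n://ubikodpublic/test");
        // The authority of an s3n URI is the bucket name, not a hostname;
        // NetUtils.createSocketAddr tries to resolve it and fails.
        System.out.println(uri.getScheme());     // prints s3n
        System.out.println(uri.getAuthority());  // prints ubikodpublic
    }
}
```

This is why the proposed patch short-circuits delegation-token service construction for the s3/s3n schemes: those filesystems do not issue Kerberos delegation tokens, so there is no service address to build.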
[jira] [Updated] (MAPREDUCE-4548) M/R jobs can not access S3 if Kerberos is enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-4548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manuel DE FERRAN updated MAPREDUCE-4548:
    Environment: hadoop-1.0.0; MIT Kerberos; Java 1.6.0_26

M/R jobs can not access S3 if Kerberos is enabled
-------------------------------------------------
                Key: MAPREDUCE-4548
                URL: https://issues.apache.org/jira/browse/MAPREDUCE-4548
            Project: Hadoop Map/Reduce
         Issue Type: Bug
        Environment: hadoop-1.0.0; MIT Kerberos; Java 1.6.0_26
           Reporter: Manuel DE FERRAN

With Kerberos enabled, any job that takes S3 files as input or output fails. It can easily be reproduced with the wordcount example shipped in hadoop-examples.jar and a public S3 file:

{code}
/opt/hadoop/bin/hadoop --config /opt/hadoop/conf/ jar /opt/hadoop/hadoop-examples-1.0.0.jar wordcount s3n://ubikodpublic/test out01
{code}

returns:

{code}
12/08/10 12:40:19 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 192 for hadoop on 10.85.151.233:9000
12/08/10 12:40:19 INFO security.TokenCache: Got dt for hdfs://aws04.machine.com:9000/mapred/staging/hadoop/.staging/job_201208101229_0004;uri=10.85.151.233:9000;t.service=10.85.151.233:9000
12/08/10 12:40:19 INFO mapred.JobClient: Cleaning up the staging area hdfs://aws04.machine.com:9000/mapred/staging/hadoop/.staging/job_201208101229_0004
java.lang.IllegalArgumentException: java.net.UnknownHostException: ubikodpublic
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:293)
    at org.apache.hadoop.security.SecurityUtil.buildDTServiceName(SecurityUtil.java:317)
    at org.apache.hadoop.fs.FileSystem.getCanonicalServiceName(FileSystem.java:189)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:92)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:79)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:197)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
SNIP
{code}

This patch seems to fix it:

{code}
Index: core/org/apache/hadoop/security/SecurityUtil.java
===================================================================
--- core/org/apache/hadoop/security/SecurityUtil.java (revision 1305278)
+++ core/org/apache/hadoop/security/SecurityUtil.java (working copy)
@@ -313,6 +313,9 @@
     if (authority == null || authority.isEmpty()) {
       return null;
     }
+    if (uri.getScheme().equals("s3n") || uri.getScheme().equals("s3")) {
+      return null;
+    }
     InetSocketAddress addr = NetUtils.createSocketAddr(authority, defPort);
     return buildTokenService(addr).toString();
   }
{code}