[jira] [Updated] (MAPREDUCE-4511) Add IFile readahead

2012-08-10 Thread Ahmed Radwan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Radwan updated MAPREDUCE-4511:


Attachment: MAPREDUCE-4511_branch-1_rev4.patch

Here are updated patches that address the case where NativeIO is not available 
(i.e., raPool == null), and also the case where the conf passed to 
IFileInputStream is null. I came across these issues while conducting further testing.
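
For illustration, here is a minimal sketch of how those two cases can be handled defensively when setting up readahead (this is not the attached patch; the field names and configuration keys below are assumptions):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.ReadaheadPool;
import org.apache.hadoop.io.nativeio.NativeIO;

class ReadaheadSetupSketch {
  private ReadaheadPool raPool;     // stays null when native I/O is unavailable
  private boolean doReadahead;
  private int readaheadLength;

  void configure(Configuration conf) {
    if (conf == null) {
      conf = new Configuration();   // fall back to defaults instead of an NPE
    }
    // Hypothetical keys, for illustration only.
    doReadahead = conf.getBoolean("mapreduce.ifile.readahead", true);
    readaheadLength = conf.getInt("mapreduce.ifile.readahead.bytes", 4 * 1024 * 1024);
    // Readahead relies on native posix_fadvise, so only grab a pool when it exists.
    raPool = NativeIO.isAvailable() ? ReadaheadPool.getInstance() : null;
  }

  boolean readaheadEnabled() {
    // Readahead is silently disabled when native I/O (and thus the pool) is missing.
    return doReadahead && raPool != null;
  }
}
{code}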

 Add IFile readahead
 ---

 Key: MAPREDUCE-4511
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4511
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Reporter: Ahmed Radwan
Assignee: Ahmed Radwan
 Attachments: MAPREDUCE-4511_branch-1_rev2.patch, 
 MAPREDUCE-4511_branch-1_rev3.patch, MAPREDUCE-4511_branch-1_rev4.patch, 
 MAPREDUCE-4511_branch1.patch, MAPREDUCE-4511_trunk.patch, 
 MAPREDUCE-4511_trunk_rev2.patch, MAPREDUCE-4511_trunk_rev3.patch, 
 MAPREDUCE-4511_trunk_rev4.patch


 This ticket is to add IFile readahead as part of HADOOP-7714.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-4511) Add IFile readahead

2012-08-10 Thread Ahmed Radwan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Radwan updated MAPREDUCE-4511:


Attachment: MAPREDUCE-4511_trunk_rev4.patch

Here is the trunk version.

 Add IFile readahead
 ---

 Key: MAPREDUCE-4511
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4511
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Reporter: Ahmed Radwan
Assignee: Ahmed Radwan
 Attachments: MAPREDUCE-4511_branch-1_rev2.patch, 
 MAPREDUCE-4511_branch-1_rev3.patch, MAPREDUCE-4511_branch-1_rev4.patch, 
 MAPREDUCE-4511_branch1.patch, MAPREDUCE-4511_trunk.patch, 
 MAPREDUCE-4511_trunk_rev2.patch, MAPREDUCE-4511_trunk_rev3.patch, 
 MAPREDUCE-4511_trunk_rev4.patch


 This ticket is to add IFile readahead as part of HADOOP-7714.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-1487) io.DataInputBuffer.getLength() semantic wrong/confused

2012-08-10 Thread Devin Bayer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432685#comment-13432685
 ] 

Devin Bayer commented on MAPREDUCE-1487:


It's very embarrassing this issue isn't fixed. Do the developers realise Hadoop 
cannot even copy data from mapper to reducer without corruption?

 io.DataInputBuffer.getLength() semantic wrong/confused
 --

 Key: MAPREDUCE-1487
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1487
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1, 0.20.2, 0.21.0
 Environment: linux
Reporter: Yang Yang

 I was trying Google Protocol Buffers as a value type on Hadoop, and when I used 
 it in a reducer the parser always failed, while it worked fine with a plain 
 input-stream reader or in a mapper.
 The reason is that the reducer interface in Task.java handed the parser a buffer 
 larger than the actual encoded record, and the parser does not stop until it 
 reaches the buffer end, so it parsed some junk bytes.
 The root cause is in hadoop.io.DataInputBuffer.java; in 0.20.1, DataInputBuffer.java line 47:
 {code}
 public void reset(byte[] input, int start, int length) {
   this.buf = input;
   this.count = start + length;
   this.mark = start;
   this.pos = start;
 }
 public byte[] getData() { return buf; }
 public int getPosition() { return pos; }
 public int getLength() { return count; }
 {code}
 We see that the above logic assumes getLength() returns the total *capacity* 
 (the end offset) of the buffer, not the actual content length, yet later code 
 assumes the semantics that length is the actual content length, i.e. end - start:
 {code}
 /** Resets the data that the buffer reads. */
 public void reset(byte[] input, int start, int length) {
   buffer.reset(input, start, length);
 }
 {code}
 i.e., if you call reset(getPosition(), getLength()) on the same buffer again 
 and again, the length grows without bound.
 This confusion in semantics shows up in many places, at least in IFile.java 
 and Task.java, where it caused the original issue.
 Around line 980 of Task.java, we see:
 {code}
 valueIn.reset(nextValueBytes.getData(), nextValueBytes.getPosition(),
     nextValueBytes.getLength());
 {code}
 If the position above is not zero, this actually sets up a buffer that is too 
 long, causing the reported issue.
 Changing Task.java, as a hack, to
 {code}
 valueIn.reset(nextValueBytes.getData(), nextValueBytes.getPosition(),
     nextValueBytes.getLength() - nextValueBytes.getPosition());
 {code}
 fixed the issue, but the semantics of DataInputBuffer should be fixed and 
 streamlined.
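 For illustration, the following stand-alone snippet (assuming the 0.20.x behavior quoted above) shows how feeding getPosition()/getLength() back into reset() keeps growing the window:
 {code}
import org.apache.hadoop.io.DataInputBuffer;

public class DataInputBufferSemantics {
  public static void main(String[] args) {
    byte[] data = new byte[100];
    DataInputBuffer buf = new DataInputBuffer();

    buf.reset(data, 10, 20);                 // window is bytes [10, 30)
    System.out.println(buf.getPosition());   // 10
    System.out.println(buf.getLength());     // 30 -- the end offset, not 20

    // Treating getLength() as a content length and resetting again:
    buf.reset(data, buf.getPosition(), buf.getLength());
    System.out.println(buf.getLength());     // 40 -- the window grew by 'start'
  }
}
 {code}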

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4470) Fix TestCombineFileInputFormat.testForEmptyFile

2012-08-10 Thread Ilya Katsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432694#comment-13432694
 ] 

Ilya Katsov commented on MAPREDUCE-4470:


Mariappan,
Thank you for the test. Could you please clarify what the correct way is to 
obtain split locations in an InputFormat?

 Fix TestCombineFileInputFormat.testForEmptyFile
 ---

 Key: MAPREDUCE-4470
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4470
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0-alpha
Reporter: Kihwal Lee
 Fix For: 2.1.0-alpha, 3.0.0

 Attachments: MAPREDUCE-4470.patch, TestFileInputFormat.java


 TestCombineFileInputFormat.testForEmptyFile started failing after 
 HADOOP-8599. 
 It expects one split on an empty input file, but with HADOOP-8599 it gets 
 zero. The new behavior seems correct, but is it breaking anything else?
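 For reference, here is a minimal stand-alone sketch of the behavior under discussion (an illustrative assumption, not the actual test; the dummy class and the /tmp path are made up): with HADOOP-8599 in place, a zero-length input file is expected to produce zero splits.
 {code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat;

public class EmptyFileSplitCheck {
  // Trivial concrete subclass; CombineFileInputFormat is abstract.
  static class DummyInputFormat extends CombineFileInputFormat<Object, Object> {
    @Override
    public RecordReader<Object, Object> createRecordReader(
        InputSplit split, TaskAttemptContext context) {
      return null; // never called when we only compute splits
    }
  }

  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.getLocal(conf);
    Path empty = new Path("/tmp/empty-input-file"); // hypothetical path
    fs.create(empty).close();                       // zero-length file

    Job job = Job.getInstance(conf);
    DummyInputFormat.addInputPath(job, empty);
    // With the HADOOP-8599 behavior this is expected to print 0; before it, 1.
    System.out.println("splits = " + new DummyInputFormat().getSplits(job).size());
  }
}
 {code}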

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4511) Add IFile readahead

2012-08-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432727#comment-13432727
 ] 

Hadoop QA commented on MAPREDUCE-4511:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12540461/MAPREDUCE-4511_trunk_rev4.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 1 new or modified test 
files.

-1 javac.  The patch appears to cause the build to fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2723//console

This message is automatically generated.

 Add IFile readahead
 ---

 Key: MAPREDUCE-4511
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4511
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Reporter: Ahmed Radwan
Assignee: Ahmed Radwan
 Attachments: MAPREDUCE-4511_branch-1_rev2.patch, 
 MAPREDUCE-4511_branch-1_rev3.patch, MAPREDUCE-4511_branch-1_rev4.patch, 
 MAPREDUCE-4511_branch1.patch, MAPREDUCE-4511_trunk.patch, 
 MAPREDUCE-4511_trunk_rev2.patch, MAPREDUCE-4511_trunk_rev3.patch, 
 MAPREDUCE-4511_trunk_rev4.patch


 This ticket is to add IFile readahead as part of HADOOP-7714.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-4548) M/R jobs can not access S3 if Kerberos is enabled

2012-08-10 Thread Manuel DE FERRAN (JIRA)
Manuel DE FERRAN created MAPREDUCE-4548:
---

 Summary: M/R jobs can not access S3 if Kerberos is enabled
 Key: MAPREDUCE-4548
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4548
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Manuel DE FERRAN


With Kerberos enabled, any job that takes S3 files as input or output fails.

It can easily be reproduced with the wordcount example shipped in 
hadoop-examples.jar and a public S3 file:
{code}
/opt/hadoop/bin/hadoop --config /opt/hadoop/conf/ jar 
/opt/hadoop/hadoop-examples-1.0.0.jar wordcount s3n://ubikodpublic/test out01
{code}

returns:
{code}
12/08/10 12:40:19 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 192 
for hadoop on 10.85.151.233:9000
12/08/10 12:40:19 INFO security.TokenCache: Got dt for 
hdfs://aws04.machine.com:9000/mapred/staging/hadoop/.staging/job_201208101229_0004;uri=10.85.151.233:9000;t.service=10.85.151.233:9000
12/08/10 12:40:19 INFO mapred.JobClient: Cleaning up the staging area 
hdfs://aws04.machine.com:9000/mapred/staging/hadoop/.staging/job_201208101229_0004
java.lang.IllegalArgumentException: java.net.UnknownHostException: ubikodpublic
at 
org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:293)
at 
org.apache.hadoop.security.SecurityUtil.buildDTServiceName(SecurityUtil.java:317)
at 
org.apache.hadoop.fs.FileSystem.getCanonicalServiceName(FileSystem.java:189)
at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:92)
at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:79)
at 
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:197)
at 
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
SNIP
{code}

This patch seems to fix it.
{code}
Index: core/org/apache/hadoop/security/SecurityUtil.java
===
--- core/org/apache/hadoop/security/SecurityUtil.java   (revision 1305278)
+++ core/org/apache/hadoop/security/SecurityUtil.java   (working copy)
@@ -313,6 +313,9 @@
     if (authority == null || authority.isEmpty()) {
       return null;
     }
+    if (uri.getScheme().equals("s3n") || uri.getScheme().equals("s3")) {
+      return null;
+    }
     InetSocketAddress addr = NetUtils.createSocketAddr(authority, defPort);
     return buildTokenService(addr).toString();
   }
{code}
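
The idea behind that guard, restated as a hedged stand-alone sketch (the class and method below are illustrative assumptions, not part of the patch): s3/s3n URIs have no Kerberos-secured token service, so no service address should be built or resolved for them, which avoids the DNS lookup on the bucket name that triggers the UnknownHostException above.
{code}
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class TokenServiceSchemeCheck {
  // Schemes for which no delegation-token service name should be built.
  private static final Set<String> NO_TOKEN_SCHEMES =
      new HashSet<String>(Arrays.asList("s3", "s3n"));

  /** True if a token service name should be built (and resolved) for this URI. */
  public static boolean needsTokenService(URI uri) {
    String scheme = uri.getScheme();
    return scheme != null && !NO_TOKEN_SCHEMES.contains(scheme.toLowerCase());
  }

  public static void main(String[] args) {
    System.out.println(needsTokenService(URI.create("hdfs://aws04.machine.com:9000/"))); // true
    System.out.println(needsTokenService(URI.create("s3n://ubikodpublic/test")));        // false
  }
}
{code}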


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-4548) M/R jobs can not access S3 if Kerberos is enabled

2012-08-10 Thread Manuel DE FERRAN (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manuel DE FERRAN updated MAPREDUCE-4548:


Environment: hadoop-1.0.0;MIT kerberos;java 1.6.0_26

 M/R jobs can not access S3 if Kerberos is enabled
 -

 Key: MAPREDUCE-4548
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4548
 Project: Hadoop Map/Reduce
  Issue Type: Bug
 Environment: hadoop-1.0.0;MIT kerberos;java 1.6.0_26
Reporter: Manuel DE FERRAN

 With Kerberos enabled, any job that takes S3 files as input or output fails.
 It can easily be reproduced with the wordcount example shipped in 
 hadoop-examples.jar and a public S3 file:
 {code}
 /opt/hadoop/bin/hadoop --config /opt/hadoop/conf/ jar 
 /opt/hadoop/hadoop-examples-1.0.0.jar wordcount s3n://ubikodpublic/test out01
 {code}
 returns:
 {code}
 12/08/10 12:40:19 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 
 192 for hadoop on 10.85.151.233:9000
 12/08/10 12:40:19 INFO security.TokenCache: Got dt for 
 hdfs://aws04.machine.com:9000/mapred/staging/hadoop/.staging/job_201208101229_0004;uri=10.85.151.233:9000;t.service=10.85.151.233:9000
 12/08/10 12:40:19 INFO mapred.JobClient: Cleaning up the staging area 
 hdfs://aws04.machine.com:9000/mapred/staging/hadoop/.staging/job_201208101229_0004
 java.lang.IllegalArgumentException: java.net.UnknownHostException: 
 ubikodpublic
 at 
 org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:293)
 at 
 org.apache.hadoop.security.SecurityUtil.buildDTServiceName(SecurityUtil.java:317)
 at 
 org.apache.hadoop.fs.FileSystem.getCanonicalServiceName(FileSystem.java:189)
 at 
 org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:92)
 at 
 org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:79)
 at 
 org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:197)
 at 
 org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
 SNIP
 {code}
 This patch seems to fix it.
 {code}
 Index: core/org/apache/hadoop/security/SecurityUtil.java
 ===
 --- core/org/apache/hadoop/security/SecurityUtil.java   (revision 1305278)
 +++ core/org/apache/hadoop/security/SecurityUtil.java   (working copy)
 @@ -313,6 +313,9 @@
      if (authority == null || authority.isEmpty()) {
        return null;
      }
 +    if (uri.getScheme().equals("s3n") || uri.getScheme().equals("s3")) {
 +      return null;
 +    }
      InetSocketAddress addr = NetUtils.createSocketAddr(authority, defPort);
      return buildTokenService(addr).toString();
    }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira