[jira] [Created] (MAPREDUCE-2782) MR-279: Unit (mockito) tests for CS

2011-08-05 Thread Arun C Murthy (JIRA)
MR-279: Unit (mockito) tests for CS
---

 Key: MAPREDUCE-2782
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2782
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: mrv2
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Fix For: 0.23.0


Add (true) unit tests for CapacityScheduler

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-2782) MR-279: Unit (mockito) tests for CS

2011-08-05 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-2782:
-

Attachment: cs_test.patch

Bunch of unit tests for CS covering user limits, queue limits, multiple 
applications per queue, reservations etc.

I'll add more soon for scheduling across queue hierarchies, priorities etc.

> MR-279: Unit (mockito) tests for CS
> ---
>
> Key: MAPREDUCE-2782
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2782
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: mrv2
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Fix For: 0.23.0
>
> Attachments: cs_test.patch
>
>
> Add (true) unit tests for CapacityScheduler





[jira] [Updated] (MAPREDUCE-2702) [MR-279] OutputCommitter changes for MR Application Master recovery

2011-08-05 Thread Sharad Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharad Agarwal updated MAPREDUCE-2702:
--

Attachment: mr2702_v1.patch

OutputCommitter and FileOutputCommitter changes. Work in progress; testing is 
continuing.

> [MR-279] OutputCommitter changes for MR Application Master recovery
> ---
>
> Key: MAPREDUCE-2702
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2702
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: mrv2
>Reporter: Sharad Agarwal
>Assignee: Sharad Agarwal
> Attachments: mr2702_v1.patch
>
>
> When the MR AM recovers from a crash, it only reruns the non-completed tasks. The 
> completed tasks (along with their output, if any) need to be recovered from 
> the previous run. This requires some changes in OutputCommitter.





[jira] [Commented] (MAPREDUCE-2702) [MR-279] OutputCommitter changes for MR Application Master recovery

2011-08-05 Thread Sharad Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079906#comment-13079906
 ] 

Sharad Agarwal commented on MAPREDUCE-2702:
---

Also as part of the last patch, fixed the directory structure from 
mapreduce/mr-client/hadoop-mapreduce-client-core/src/test/mapred to 
mapreduce/mr-client/hadoop-mapreduce-client-core/src/test/java

> [MR-279] OutputCommitter changes for MR Application Master recovery
> ---
>
> Key: MAPREDUCE-2702
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2702
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: mrv2
>Reporter: Sharad Agarwal
>Assignee: Sharad Agarwal
> Attachments: mr2702_v1.patch
>
>
> When the MR AM recovers from a crash, it only reruns the non-completed tasks. The 
> completed tasks (along with their output, if any) need to be recovered from 
> the previous run. This requires some changes in OutputCommitter.





[jira] [Assigned] (MAPREDUCE-2765) DistCp Rewrite

2011-08-05 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu reassigned MAPREDUCE-2765:
--

Assignee: Mithun Radhakrishnan

> DistCp Rewrite
> --
>
> Key: MAPREDUCE-2765
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2765
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: distcp
>Affects Versions: 0.20.203.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: distcpv2.20.203.patch
>
>
> This is a slightly modified version of the DistCp rewrite that Yahoo uses in 
> production today. The rewrite was ground-up, with specific focus on:
> 1. improved startup time (postponing as much work as possible to the MR job)
> 2. support for multiple copy-strategies
> 3. new features (e.g. -atomic, -async, -bandwidth.)
> 4. improved programmatic use
> Some effort has gone into refactoring what used to be achieved by a single 
> large (1.7 KLOC) source file, into a design that (hopefully) reads better too.
> The proposed DistCpV2 preserves command-line-compatibility with the old 
> version, and should be a drop-in replacement.
> New to v2:
> 1. Copy-strategies and the DynamicInputFormat:
>   A copy-strategy determines the policy by which source-file-paths are 
> distributed between map-tasks. (These boil down to the choice of the 
> input-format.) 
>   If no strategy is explicitly specified on the command-line, the policy 
> chosen is "uniform size", where v2 behaves identically to old-DistCp. (The 
> number of bytes transferred by each map-task is roughly equal, at a per-file 
> granularity.) 
>   Alternatively, v2 ships with a "dynamic" copy-strategy (in the 
> DynamicInputFormat). This policy acknowledges that 
>   (a)  dividing files based only on file-size might not be an 
> even distribution (E.g. if some datanodes are slower than others, or if some 
> files are skipped.)
>   (b) a "static" association of a source-path to a map increases 
> the likelihood of long-tails during copy.
>   The "dynamic" strategy divides the list-of-source-paths into a number 
> (> nMaps) of smaller parts. When each map completes its current list of 
> paths, it picks up a new list to process, if available. So if a map-task is 
> stuck on a slow (and not necessarily large) file, other maps can pick up the 
> slack. The thinner the file-list is sliced, the greater the parallelism (and 
> the lower the chances of long-tails). Within reason, of course: the number of 
> these short-lived list-files is capped at an overridable maximum.
>   Internal benchmarks against source/target clusters with some slow(ish) 
> datanodes have indicated significant performance gains when using the 
> dynamic-strategy. Gains are most pronounced when nFiles greatly exceeds nMaps.
>   Please note that the DynamicInputFormat might prove useful outside of 
> DistCp. It is hence available as a mapred/lib, unfettered to DistCpV2. Also 
> note that the copy-strategies have no bearing on the CopyMapper.map() 
> implementation.
>   
> 2. Improved startup-time and programmatic use:
>   When the old-DistCp runs with -update, and creates the 
> list-of-source-paths, it attempts to filter out files that might be skipped 
> (by comparing file-sizes, checksums, etc.) This significantly increases the 
> startup time (or the time spent in serial processing till the MR job is 
> launched), blocking the calling-thread. This becomes pronounced as nFiles 
> increases. (Internal benchmarks have seen situations where more time is spent 
> setting up the job than on the actual transfer.)
>   DistCpV2 postpones as much work as possible to the MR job. The 
> file-listing isn't filtered until the map-task runs (at which time, identical 
> files are skipped). DistCpV2 can now be run "asynchronously". The program 
> quits at job-launch, logging the job-id for tracking. Programmatically, the 
> DistCp.execute() returns a Job instance for progress-tracking.
>   
> 3. New features:
>   (a)   -async: As described in #2.
>   (b)   -atomic: Data is copied to a (user-specifiable) tmp-location, and 
> then moved atomically to destination.
>   (c)   -bandwidth: Enforces a limit on the bandwidth consumed per map.
>   (d)   -strategy: As above.
>   
> A more comprehensive description of the newer features, how the dynamic-strategy 
> works, etc. is available in src/site/xdoc/, and in the pdf that's generated 
> therefrom during the build.
> High on the list of things to do is support to parallelize copies on a 
> per-block level. (i.e. Incorporation of HDFS-222.)
> I look forward to comments, suggestions and discussion that will hopefully 
> ensue. I have this running against Hadoop 0.20.203.0. I also have a po

[jira] [Resolved] (MAPREDUCE-1325) Fix IsolationRunner to run with reduces too

2011-08-05 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved MAPREDUCE-1325.


Resolution: Won't Fix

IsolationRunner is removed via MAPREDUCE-2606. Closing this as "won't fix".

> Fix IsolationRunner to run with reduces too
> ---
>
> Key: MAPREDUCE-1325
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1325
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 0.21.0
>Reporter: Vinod Kumar Vavilapalli
>
> HADOOP-4041 fixed various problems with IsolationRunner, but to keep the 
> patch simple it completely broke it for reduces and postponed the effort for 
> another issue. We should fix IsolationRunner to work with reduces here.





[jira] [Resolved] (MAPREDUCE-1663) mapred.local.dir for IsolationRunner is not set properly

2011-08-05 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved MAPREDUCE-1663.


Resolution: Won't Fix

IsolationRunner is removed via MAPREDUCE-2606. Closing this as "won't fix".

> mapred.local.dir for IsolationRunner is not set properly
> 
>
> Key: MAPREDUCE-1663
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1663
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 0.21.0
>Reporter: Amareshwari Sriramadasu
>
> MAPREDUCE-842 enforces that mapred.local.dir for a Task be the attemptDirs in 
> all local dirs. The conf is not set to the attemptDirs for IsolationRunner, 
> so the map output files are created directly under the cluster's local dir. The 
> same can be seen in TestIsolationRunner; the test does not fail because this is 
> not validated.





[jira] [Commented] (MAPREDUCE-2781) mr279 RM application finishtime not set

2011-08-05 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079958#comment-13079958
 ] 

Thomas Graves commented on MAPREDUCE-2781:
--

Sorry, looks like I forgot to hit enter with my comment last night.  I was 
actually planning on adding some, and wanted to do a bit more validation on all 
the kill transitions. I posted the patch in case anyone wanted to take an initial 
look at it. Thanks.

> mr279 RM application finishtime not set
> ---
>
> Key: MAPREDUCE-2781
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2781
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Minor
> Attachments: finishtime.patch
>
>
> The RM Application finishTime isn't being set.  Looks like it got lost in the 
> RM refactor.





[jira] [Updated] (MAPREDUCE-2059) RecoveryManager attempts to add jobtracker.info

2011-08-05 Thread Subroto Sanyal (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subroto Sanyal updated MAPREDUCE-2059:
--

Attachment: MAPREDUCE-2059.patch

> RecoveryManager attempts to add jobtracker.info
> ---
>
> Key: MAPREDUCE-2059
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2059
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.21.0, 0.22.0
> Environment: 
> https://svn.apache.org/repos/asf/hadoop/mapreduce/trunk@994941
>Reporter: Dan Adkins
> Attachments: MAPREDUCE-2059.patch
>
>
> The jobtracker is treating the file 'jobtracker.info' in the system data 
> directory as a job to be recovered, resulting in the following:
> 10/09/09 18:06:02 WARN mapred.JobTracker: Failed to add the job 
> jobtracker.info
> java.lang.IllegalArgumentException: JobId string : jobtracker.info is not 
> properly formed
> at org.apache.hadoop.mapreduce.JobID.forName(JobID.java:158)
> at org.apache.hadoop.mapred.JobID.forName(JobID.java:84)
> at 
> org.apache.hadoop.mapred.JobTracker$RecoveryManager.addJobForRecovery(JobTracker.java:1057)
> at org.apache.hadoop.mapred.JobTracker.(JobTracker.java:1565)
> at 
> org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:275)
> at 
> org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:267)
> at 
> org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:262)
> at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4256)





[jira] [Commented] (MAPREDUCE-2564) NullPointerException in WritableComparator

2011-08-05 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079969#comment-13079969
 ] 

Devaraj K commented on MAPREDUCE-2564:
--

As per the code, we cannot instantiate WritableComparator directly because its 
constructors are protected. If there is no registered comparator for the output 
key class, get() creates an instance this way, which initializes the buffer and 
keys:

{code:java}
  public static synchronized WritableComparator get(Class c) {
    WritableComparator comparator = comparators.get(c);
    if (comparator == null)
      comparator = new WritableComparator(c, true);
    return comparator;
  }
{code}

When writing a custom comparator, if we don't want to use the existing 
compare() method, we can call the following constructor from the subclass:
{code:java}
protected WritableComparator(Class keyClass) {
    this(keyClass, false);
  }
{code}
Otherwise we can use this constructor:
{code:java}
protected WritableComparator(Class keyClass,
  boolean createInstances)
{code}
Can you elaborate the usage scenario?
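To make the constructor choice concrete, here is a small self-contained sketch of the same two-constructor pattern (plain JDK, no Hadoop dependency; the class and method names below are illustrative stand-ins, not Hadoop code): a subclass that compares raw bytes passes createInstances = false and never touches the deserialized-key fields, so the null buffer is harmless.

```java
// Illustrative stand-in for the WritableComparator constructor pattern
// discussed above. Not Hadoop code: the real class lives in
// org.apache.hadoop.io.
abstract class MiniComparator {
    protected final byte[] buffer; // null unless createInstances is true

    protected MiniComparator(boolean createInstances) {
        // Mirrors WritableComparator(keyClass, createInstances):
        // only allocate the deserialization buffer when asked to.
        this.buffer = createInstances ? new byte[64] : null;
    }

    public abstract int compare(byte[] b1, byte[] b2);
}

// A subclass that overrides compare() on raw bytes can safely use the
// createInstances = false path, as in the first constructor above.
class RawBytesComparator extends MiniComparator {
    RawBytesComparator() {
        super(false); // like super(keyClass): no buffer, no key instances
    }

    @Override
    public int compare(byte[] b1, byte[] b2) {
        // Lexicographic unsigned-byte comparison; never touches this.buffer.
        int n = Math.min(b1.length, b2.length);
        for (int i = 0; i < n; i++) {
            int d = (b1[i] & 0xff) - (b2[i] & 0xff);
            if (d != 0) return d;
        }
        return b1.length - b2.length;
    }
}
```

A subclass that instead wants the inherited deserialize-and-compare behavior would pass true, which is exactly when the buffer and key fields must be non-null.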


> NullPointerException in WritableComparator
> --
>
> Key: MAPREDUCE-2564
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2564
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 0.20.203.0
> Environment: java version "1.6.0_23"
> Java(TM) SE Runtime Environment (build 1.6.0_23-b05)
> Java HotSpot(TM) 64-Bit Server VM (build 19.0-b09, mixed mode)
> hadoop-0.20.203.0rc1
>Reporter: Joseph Shraibman
>Priority: Blocker
>
> java.lang.NullPointerException
> at 
> org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:96)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:1110)
> at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:70)
> at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:59)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1398)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1297)
> at 
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:698)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
> It is easy to see why this is happening.  The WritableComparator is created 
> in JobConf line 776:
> {code:title=JobConf.java}
>return 
> WritableComparator.get(getMapOutputKeyClass().asSubclass(WritableComparable.class));
> }
> {code}
> which calls
> {code:title=WritableComparator.java|borderStyle=solid}
> protected WritableComparator(Class keyClass) {
> this(keyClass, false);
>   }
>   protected WritableComparator(Class keyClass,
>   boolean createInstances) {
> this.keyClass = keyClass;
> if (createInstances) {
>   key1 = newKey();
>   key2 = newKey();
>   buffer = new DataInputBuffer();
> } else {
>   key1 = key2 = null;
>   buffer = null;
> }
>   }
> {code}
> key1, key2, and buffer end up being null. When compare() is called the NPE is 
> thrown because buffer is null
> {code:title=WritableComparator.java|borderStyle=solid}
>  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
> try {
>   buffer.reset(b1, s1, l1);   // parse key1
>   key1.readFields(buffer);
>   
>   buffer.reset(b2, s2, l2);   // parse key2
>   key2.readFields(buffer);
>   
> } catch (IOException e) {
>   throw new RuntimeException(e);
> }
> 
> return compare(key1, key2);   // compare them
>   }
> {code}





[jira] [Updated] (MAPREDUCE-2059) RecoveryManager attempts to add jobtracker.info

2011-08-05 Thread Subroto Sanyal (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subroto Sanyal updated MAPREDUCE-2059:
--

             Tags: Jobtracker
           Labels: hadoop  (was: )
Affects Version/s: (was: 0.22.0)
                   (was: 0.21.0)
                   0.23.0
           Status: Patch Available  (was: Open)

The attached patch verifies that files considered for job recovery do not start 
with the same name as the restartCount file (*jobtracker.info*). This also filters 
out *jobtracker.info.rec* if it happens to be present.
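The filtering idea can be sketched as follows (illustrative only; the actual patch works inside the JobTracker's RecoveryManager, and the class and method names here are made up):

```java
// Illustrative sketch of the jobtracker.info filtering idea; the real
// change lives in JobTracker's recovery code. Names here are made up.
class RecoveryFileFilter {
    private static final String RESTART_COUNT_FILE = "jobtracker.info";

    // A file in the system directory is a candidate for job recovery
    // only if its name does not start with the restart-count file name,
    // which also excludes derivatives such as "jobtracker.info.rec".
    static boolean isRecoverableJob(String fileName) {
        return !fileName.startsWith(RESTART_COUNT_FILE);
    }
}
```

Using startsWith rather than an exact equality check is what makes the *.rec* variant fall out of the candidate set as well.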

> RecoveryManager attempts to add jobtracker.info
> ---
>
> Key: MAPREDUCE-2059
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2059
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.23.0
> Environment: 
> https://svn.apache.org/repos/asf/hadoop/mapreduce/trunk@994941
>Reporter: Dan Adkins
>  Labels: hadoop
> Attachments: MAPREDUCE-2059.patch
>
>
> The jobtracker is treating the file 'jobtracker.info' in the system data 
> directory as a job to be recovered, resulting in the following:
> 10/09/09 18:06:02 WARN mapred.JobTracker: Failed to add the job 
> jobtracker.info
> java.lang.IllegalArgumentException: JobId string : jobtracker.info is not 
> properly formed
> at org.apache.hadoop.mapreduce.JobID.forName(JobID.java:158)
> at org.apache.hadoop.mapred.JobID.forName(JobID.java:84)
> at 
> org.apache.hadoop.mapred.JobTracker$RecoveryManager.addJobForRecovery(JobTracker.java:1057)
> at org.apache.hadoop.mapred.JobTracker.(JobTracker.java:1565)
> at 
> org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:275)
> at 
> org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:267)
> at 
> org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:262)
> at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4256)





[jira] [Commented] (MAPREDUCE-2102) Collect iops per task

2011-08-05 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079983#comment-13079983
 ] 

Robert Joseph Evans commented on MAPREDUCE-2102:


I am not sure how to do this on other operating systems.  On Linux, if Hadoop is 
running on a newer kernel, it can use /proc/<pid>/io, but I believe the kernel 
has to be newer than 2.6.20.  Anything else would require a very ugly hack like 
LD_PRELOAD.
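For reference, /proc/<pid>/io is plain "key: value" text (fields like read_bytes and write_bytes). A minimal parsing sketch, illustrative only: this is not what Hadoop ships, and which fields appear depends on the kernel version:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Parses the "key: value" lines of Linux's /proc/<pid>/io (available on
// kernels newer than ~2.6.20). Illustrative sketch, not Hadoop code.
class ProcIoParser {
    static Map<String, Long> parse(String contents) {
        Map<String, Long> fields = new LinkedHashMap<String, Long>();
        for (String line : contents.split("\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) continue; // skip malformed lines
            String key = line.substring(0, colon).trim();
            long value = Long.parseLong(line.substring(colon + 1).trim());
            fields.put(key, value);
        }
        return fields;
    }
}
```

In practice the file contents would come from reading /proc/<pid>/io for the task's process; the per-task counters could then feed the same reporting path as the CPU/memory counters mentioned in the issue.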

> Collect iops per task
> -
>
> Key: MAPREDUCE-2102
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2102
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: task
>Reporter: Arun C Murthy
>
> We should enhance the MR framework to collect iops per task along with 
> CPU/Memory (MAPREDUCE-220).





[jira] [Commented] (MAPREDUCE-2781) mr279 RM application finishtime not set

2011-08-05 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079987#comment-13079987
 ] 

Thomas Graves commented on MAPREDUCE-2781:
--

startTime was also not being set; fixed that.  The start time is actually more 
like a submit time. It might be nice to add some finer-grained times in there, at 
least for submit and launch.  I also don't think each attempt is tracking its 
start/finish time, so those would be nice enhancements.

> mr279 RM application finishtime not set
> ---
>
> Key: MAPREDUCE-2781
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2781
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Minor
> Attachments: finishtime.patch
>
>
> The RM Application finishTime isn't being set.  Looks like it got lost in the 
> RM refactor.





[jira] [Resolved] (MAPREDUCE-2564) NullPointerException in WritableComparator

2011-08-05 Thread Joseph Shraibman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Shraibman resolved MAPREDUCE-2564.
-

   Resolution: Invalid
Fix Version/s: 0.20.203.0

> NullPointerException in WritableComparator
> --
>
> Key: MAPREDUCE-2564
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2564
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 0.20.203.0
> Environment: java version "1.6.0_23"
> Java(TM) SE Runtime Environment (build 1.6.0_23-b05)
> Java HotSpot(TM) 64-Bit Server VM (build 19.0-b09, mixed mode)
> hadoop-0.20.203.0rc1
>Reporter: Joseph Shraibman
>Priority: Blocker
> Fix For: 0.20.203.0
>
>
> java.lang.NullPointerException
> at 
> org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:96)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:1110)
> at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:70)
> at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:59)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1398)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1297)
> at 
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:698)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
> It is easy to see why this is happening.  The WritableComparator is created 
> in JobConf line 776:
> {code:title=JobConf.java}
>return 
> WritableComparator.get(getMapOutputKeyClass().asSubclass(WritableComparable.class));
> }
> {code}
> which calls
> {code:title=WritableComparator.java|borderStyle=solid}
> protected WritableComparator(Class keyClass) {
> this(keyClass, false);
>   }
>   protected WritableComparator(Class keyClass,
>   boolean createInstances) {
> this.keyClass = keyClass;
> if (createInstances) {
>   key1 = newKey();
>   key2 = newKey();
>   buffer = new DataInputBuffer();
> } else {
>   key1 = key2 = null;
>   buffer = null;
> }
>   }
> {code}
> key1, key2, and buffer end up being null. When compare() is called the NPE is 
> thrown because buffer is null
> {code:title=WritableComparator.java|borderStyle=solid}
>  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
> try {
>   buffer.reset(b1, s1, l1);   // parse key1
>   key1.readFields(buffer);
>   
>   buffer.reset(b2, s2, l2);   // parse key2
>   key2.readFields(buffer);
>   
> } catch (IOException e) {
>   throw new RuntimeException(e);
> }
> 
> return compare(key1, key2);   // compare them
>   }
> {code}





[jira] [Commented] (MAPREDUCE-2564) NullPointerException in WritableComparator

2011-08-05 Thread Joseph Shraibman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079994#comment-13079994
 ] 

Joseph Shraibman commented on MAPREDUCE-2564:
-

Hmm.  This was fixed between 0.20.2 and 0.20.203.  It seems I reported this 
against 203.  Is it possible I used 0.20.2 and looked at the source of 0.20.2 
while thinking it was 203?  It is hard to remember what I was doing on that day 
two months ago.

> NullPointerException in WritableComparator
> --
>
> Key: MAPREDUCE-2564
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2564
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 0.20.203.0
> Environment: java version "1.6.0_23"
> Java(TM) SE Runtime Environment (build 1.6.0_23-b05)
> Java HotSpot(TM) 64-Bit Server VM (build 19.0-b09, mixed mode)
> hadoop-0.20.203.0rc1
>Reporter: Joseph Shraibman
>Priority: Blocker
> Fix For: 0.20.203.0
>
>
> java.lang.NullPointerException
> at 
> org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:96)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:1110)
> at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:70)
> at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:59)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1398)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1297)
> at 
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:698)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
> It is easy to see why this is happening.  The WritableComparator is created 
> in JobConf line 776:
> {code:title=JobConf.java}
>return 
> WritableComparator.get(getMapOutputKeyClass().asSubclass(WritableComparable.class));
> }
> {code}
> which calls
> {code:title=WritableComparator.java|borderStyle=solid}
> protected WritableComparator(Class keyClass) {
> this(keyClass, false);
>   }
>   protected WritableComparator(Class keyClass,
>   boolean createInstances) {
> this.keyClass = keyClass;
> if (createInstances) {
>   key1 = newKey();
>   key2 = newKey();
>   buffer = new DataInputBuffer();
> } else {
>   key1 = key2 = null;
>   buffer = null;
> }
>   }
> {code}
> key1, key2, and buffer end up being null. When compare() is called the NPE is 
> thrown because buffer is null
> {code:title=WritableComparator.java|borderStyle=solid}
>  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
> try {
>   buffer.reset(b1, s1, l1);   // parse key1
>   key1.readFields(buffer);
>   
>   buffer.reset(b2, s2, l2);   // parse key2
>   key2.readFields(buffer);
>   
> } catch (IOException e) {
>   throw new RuntimeException(e);
> }
> 
> return compare(key1, key2);   // compare them
>   }
> {code}





[jira] [Commented] (MAPREDUCE-2764) Fix renewal of dfs delegation tokens

2011-08-05 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080014#comment-13080014
 ] 

Daryn Sharp commented on MAPREDUCE-2764:


bq. The token is used for rpc connection and when token selector does a match 
it needs the service to have rpc port. But if we store hftp uri in the token, 
the port will be http port. How is that being handled?

{{getServiceAuthority()}} extracts the authority from the service if it is a URI; 
otherwise it uses the service itself.

bq. One approach could be to store both issuer and service in the current 
service field.

The service is currently "authority", and I changed it to be either 
"authority" or "scheme://authority".  I'd like to eventually move everything to 
"scheme://authority", hence the compatibility.
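That compatibility rule can be sketched like this (illustrative only, not the patch's actual code; it just shows the "bare authority" vs. "scheme://authority" handling):

```java
import java.net.URI;

// Illustrative sketch of the described compatibility rule: a token
// service may be a bare "authority" (old form, e.g. "host:port") or a
// full "scheme://authority" URI (new form). Not the actual patch code.
class ServiceAuthority {
    static String getServiceAuthority(String service) {
        if (service.contains("://")) {
            // New form: parse as a URI and take its authority part.
            return URI.create(service).getAuthority();
        }
        return service; // old form: already a bare authority
    }
}
```

Either way the caller ends up with the "host:port"-style authority the token selector can match against.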

bq. I think the AbstractDelegationtionTokenSelector change and introduction of 
TokenMatcher is unnecessary.  BlockTokenSelector or JobTokenSelector shouldn't 
need to change at all.

That means the token selectors go back to copy-and-paste...  The base 
{{selectTokens()}} method filters the tokens to find the expected token kind.  
This eliminates the copy-and-paste code, which limits the ability to alter the 
standard behavior.

The {{TokenMatcher}} allows tokens, if they so choose, to provide custom 
selection criteria above and beyond the standard service match.  It is also used 
to match an RPC socket to its token; otherwise that would just add even more 
copy-and-paste.

The {{JobTokenSelector}} change only removes methods that are now in the base 
class.  {{BlockTokenSelector}} oddly doesn't care about the service at all, so 
its {{selectToken()}} is overridden to provide a {{TokenMatcher}} that ignores 
the service.

bq. A change in hdfs's DelegationTokenSelector should suffice for this jira.

Iff we are going back to copy-n-paste...
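The shared kind-then-matcher selection described above can be sketched generically (pure JDK; the class names below are invented for illustration, the real classes live in org.apache.hadoop.security.token):

```java
import java.util.List;

// Generic sketch of selecting a token by kind, then applying a custom
// match predicate (the role the TokenMatcher plays above). All names
// here are illustrative, not the patch's actual API.
class SimpleToken {
    final String kind;
    final String service;
    SimpleToken(String kind, String service) {
        this.kind = kind;
        this.service = service;
    }
}

interface Matcher {
    boolean matches(SimpleToken t);
}

class KindSelector {
    // The base selector filters by kind (shared logic, no copy-paste),
    // then defers to the matcher for any custom criteria such as the
    // service comparison; a selector like BlockTokenSelector could pass
    // a matcher that always returns true to ignore the service.
    static SimpleToken select(List<SimpleToken> tokens, String kind, Matcher m) {
        for (SimpleToken t : tokens) {
            if (t.kind.equals(kind) && m.matches(t)) {
                return t;
            }
        }
        return null;
    }
}
```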

> Fix renewal of dfs delegation tokens
> 
>
> Key: MAPREDUCE-2764
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2764
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 0.20.205.0
>
> Attachments: MAPREDUCE-2764.patch
>
>
> The JT may have issues renewing hftp tokens which disrupt long distcp jobs.  
> The problem is the JT's delegation token renewal code is built on brittle 
> assumptions.  The token's service field contains only the "ip:port" pair.  
> The renewal process assumes that the scheme must be hdfs.  If that fails due 
> to a {{VersionMismatchException}}, it tries https based on another assumption 
> that it must be hftp if it's not hdfs.  A number of other exceptions, most 
> commonly {{IOExceptions}}, can be generated which fouls up the renewal since 
> it won't fallback to https.





[jira] [Commented] (MAPREDUCE-2764) Fix renewal of dfs delegation tokens

2011-08-05 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080034#comment-13080034
 ] 

Daryn Sharp commented on MAPREDUCE-2764:


I just re-read your comments.  There is indeed a problem with hftp, but it's 
not insurmountable, and it's not incompatible with this change.

Other DFS filesystems, and h-s-ftp, need only one token.  Hftp needs two tokens: 
http & https.  I think this explains the non-standard behavior of hftp setting 
up its own token renewal thread.

Unfortunately the existing {{FileSystem.getDelegationToken()}} only returns a 
single token.  It's not unreasonable to think that filesystems, like hftp, will 
require multiple tokens.  I'll probably need to add a 
{{FileSystem.getDelegationToken-s-}} which will return a collection of tokens.

> Fix renewal of dfs delegation tokens
> 
>
> Key: MAPREDUCE-2764
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2764
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 0.20.205.0
>
> Attachments: MAPREDUCE-2764.patch
>
>
> The JT may have issues renewing hftp tokens which disrupt long distcp jobs.  
> The problem is the JT's delegation token renewal code is built on brittle 
> assumptions.  The token's service field contains only the "ip:port" pair.  
> The renewal process assumes that the scheme must be hdfs.  If that fails due 
> to a {{VersionMismatchException}}, it tries https based on another assumption 
> that it must be hftp if it's not hdfs.  A number of other exceptions, most 
> commonly {{IOExceptions}}, can be generated which fouls up the renewal since 
> it won't fallback to https.





[jira] [Updated] (MAPREDUCE-2780) Standardize the value of token service

2011-08-05 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated MAPREDUCE-2780:
---

Attachment: MAPREDUCE-2780.patch

Add {{Token.setService(InetSocketAddress)}} and change all token producers to 
call the method instead of constructing the same string.
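A minimal sketch of the idea described above: one helper that turns an InetSocketAddress into the "ip:port" service string, so token producers stop hand-rolling it. The class and method names (TokenServiceSketch, buildService) are illustrative, not the actual patch code.

```java
import java.net.InetSocketAddress;

public class TokenServiceSketch {

    // Encode an address as "ip:port" — the string each token producer
    // previously built by hand, now centralized in one place.
    public static String buildService(InetSocketAddress addr) {
        return addr.getAddress().getHostAddress() + ":" + addr.getPort();
    }

    public static void main(String[] args) {
        InetSocketAddress nn = new InetSocketAddress("127.0.0.1", 8020);
        System.out.println(buildService(nn));  // 127.0.0.1:8020
    }
}
```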

> Standardize the value of token service
> --
>
> Key: MAPREDUCE-2780
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2780
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 0.20.205.0
>
> Attachments: MAPREDUCE-2780.patch
>
>
> The token's service field must (currently) be set to "ip:port".  All the 
> producers of a token are independently building the service string.  This 
> should be done via a common method to reduce the chance of error, and to 
> facilitate the field value being easily changed in the (near) future.





[jira] [Commented] (MAPREDUCE-2782) MR-279: Unit (mockito) tests for CS

2011-08-05 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080065#comment-13080065
 ] 

Mahadev konar commented on MAPREDUCE-2782:
--

arun,
 I think you generated the diff against an older version of MR-279. There are a 
lot of accidental changes related to SecurityInfo (in common) in your patch.

> MR-279: Unit (mockito) tests for CS
> ---
>
> Key: MAPREDUCE-2782
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2782
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: mrv2
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Fix For: 0.23.0
>
> Attachments: cs_test.patch
>
>
> Add (true) unit tests for CapacityScheduler





[jira] [Updated] (MAPREDUCE-2780) Standardize the value of token service

2011-08-05 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated MAPREDUCE-2780:
---

Attachment: MAPREDUCE-2780-2.patch

Oops, I posted an earlier rev of this patch.  Small change to keep the service 
field encoding completely encapsulated within the Token class.

> Standardize the value of token service
> --
>
> Key: MAPREDUCE-2780
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2780
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 0.20.205.0
>
> Attachments: MAPREDUCE-2780-2.patch, MAPREDUCE-2780.patch
>
>
> The token's service field must (currently) be set to "ip:port".  All the 
> producers of a token are independently building the service string.  This 
> should be done via a common method to reduce the chance of error, and to 
> facilitate the field value being easily changed in the (near) future.





[jira] [Updated] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

2011-08-05 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated MAPREDUCE-2779:
---

Affects Version/s: 0.23.0
   0.22.0

> JobSplitWriter.java can't handle large job.split file
> -
>
> Key: MAPREDUCE-2779
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: job submission
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Ming Ma
> Attachments: MAPREDUCE-2779-trunk.patch
>
>
> We use cascading MultiInputFormat. MultiInputFormat sometimes generates big 
> job.split used internally by hadoop, sometimes it can go beyond 2GB.
> In JobSplitWriter.java, the function that generates such file uses 32bit 
> signed integer to compute offset into job.split.
> writeNewSplits
> ...
> int prevCount = out.size();
> ...
> int currCount = out.size();
> writeOldSplits
> ...
>   long offset = out.size();
> ...
>   int currLen = out.size();
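A small illustration of why a 32-bit offset cannot survive a job.split file beyond 2 GB: DataOutputStream.size() returns an int (and, per its Javadoc, saturates at Integer.MAX_VALUE on overflow), so any offset past 2 GB is lost, while a plain (int) cast of a long position wraps negative. The snippet below only demonstrates the arithmetic; it is not the JobSplitWriter code.

```java
public class SplitOffsetOverflow {
    public static void main(String[] args) {
        long bytesWritten = (1L << 31) + 100;   // just past 2 GB
        int intOffset = (int) bytesWritten;     // what a 32-bit offset would hold
        System.out.println(bytesWritten);       // 2147483748
        System.out.println(intOffset);          // negative: wrapped around
    }
}
```

A long-valued position (e.g. FSDataOutputStream.getPos()) represents the same offset without loss.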





[jira] [Created] (MAPREDUCE-2783) mr279 job history handling after killing application

2011-08-05 Thread Thomas Graves (JIRA)
mr279 job history handling after killing application


 Key: MAPREDUCE-2783
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2783
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Thomas Graves


The job history/application tracking url handling during kill is not 
consistent. Currently if you kill a job that was running the tracking url 
points to job history, but job history server doesn't have the job.  





[jira] [Commented] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

2011-08-05 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080105#comment-13080105
 ] 

Joep Rottinghuis commented on MAPREDUCE-2779:
-

Patch looks good.
Affects 0.20-security-* branches as well.

FSDataOutputStream.getPos is not thread safe, but then again 
DataOutputStream.size does not seem to be thread safe either.
Even though the DataOutputStream.write method is synchronized, 
FSDataOutputStream.write is not synchronized.
This does not seem to be an issue in the current code path because 
createSplitFiles does not expose out.


> JobSplitWriter.java can't handle large job.split file
> -
>
> Key: MAPREDUCE-2779
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: job submission
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Ming Ma
> Attachments: MAPREDUCE-2779-trunk.patch
>
>
> We use cascading MultiInputFormat. MultiInputFormat sometimes generates big 
> job.split used internally by hadoop, sometimes it can go beyond 2GB.
> In JobSplitWriter.java, the function that generates such file uses 32bit 
> signed integer to compute offset into job.split.
> writeNewSplits
> ...
> int prevCount = out.size();
> ...
> int currCount = out.size();
> writeOldSplits
> ...
>   long offset = out.size();
> ...
>   int currLen = out.size();





[jira] [Updated] (MAPREDUCE-2779) JobSplitWriter.java can't handle large job.split file

2011-08-05 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated MAPREDUCE-2779:
---

Affects Version/s: 0.20.205.0
   Status: Patch Available  (was: Open)

> JobSplitWriter.java can't handle large job.split file
> -
>
> Key: MAPREDUCE-2779
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2779
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: job submission
>Affects Versions: 0.20.205.0, 0.22.0, 0.23.0
>Reporter: Ming Ma
> Attachments: MAPREDUCE-2779-trunk.patch
>
>
> We use cascading MultiInputFormat. MultiInputFormat sometimes generates big 
> job.split used internally by hadoop, sometimes it can go beyond 2GB.
> In JobSplitWriter.java, the function that generates such file uses 32bit 
> signed integer to compute offset into job.split.
> writeNewSplits
> ...
> int prevCount = out.size();
> ...
> int currCount = out.size();
> writeOldSplits
> ...
>   long offset = out.size();
> ...
>   int currLen = out.size();





[jira] [Commented] (MAPREDUCE-2782) MR-279: Unit (mockito) tests for CS

2011-08-05 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080113#comment-13080113
 ] 

Arun C Murthy commented on MAPREDUCE-2782:
--

Teaches me to stop working before it's too late. I'll fix it as part of the 
next patch which has more tests. Thanks.

> MR-279: Unit (mockito) tests for CS
> ---
>
> Key: MAPREDUCE-2782
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2782
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: mrv2
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Fix For: 0.23.0
>
> Attachments: cs_test.patch
>
>
> Add (true) unit tests for CapacityScheduler





[jira] [Commented] (MAPREDUCE-2783) mr279 job history handling after killing application

2011-08-05 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080119#comment-13080119
 ] 

Thomas Graves commented on MAPREDUCE-2783:
--

The handling of this is also not right when transitioning from Running to 
Failed state.

> mr279 job history handling after killing application
> 
>
> Key: MAPREDUCE-2783
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2783
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Thomas Graves
>
> The job history/application tracking url handling during kill is not 
> consistent. Currently if you kill a job that was running the tracking url 
> points to job history, but job history server doesn't have the job.  





[jira] [Commented] (MAPREDUCE-2764) Fix renewal of dfs delegation tokens

2011-08-05 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080128#comment-13080128
 ] 

Jitendra Nath Pandey commented on MAPREDUCE-2764:
-

> I'll probably need to add a FileSystem.getDelegationToken-s- which will 
> return a collection of tokens.
 I didn't get that; how will multiple tokens solve this problem? Currently, 
only one token can be passed in the URL for hftp calls, and it is that token 
which needs to be renewed by the JT; the same token is eventually used for rpc.

> Iff we are going back to copy-n-paste...
 There may be some code repeated in the current code base. All I am recommending 
is to do that refactoring in a separate jira. My point was that if we don't do 
that refactoring, in this patch only DelegationTokenSelector will need to 
change. 

> The getServiceAuthority() is extracting the authority from the service if 
> it's a uri, else it uses the service itself if it's not a uri.
> The service is currently "authority", and I changed it to be either 
> "authority" or "scheme://authority". I'd like to eventually move everything 
> to "scheme://authority", hence the compatibility.

 The problem is that we can't use the same authority for both rpc and hftp, 
because the ports are different.

> Fix renewal of dfs delegation tokens
> 
>
> Key: MAPREDUCE-2764
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2764
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 0.20.205.0
>
> Attachments: MAPREDUCE-2764.patch
>
>
> The JT may have issues renewing hftp tokens which disrupt long distcp jobs.  
> The problem is the JT's delegation token renewal code is built on brittle 
> assumptions.  The token's service field contains only the "ip:port" pair.  
> The renewal process assumes that the scheme must be hdfs.  If that fails due 
> to a {{VersionMismatchException}}, it tries https based on another assumption 
> that it must be hftp if it's not hdfs.  A number of other exceptions, most 
> commonly {{IOExceptions}}, can be generated which fouls up the renewal since 
> it won't fallback to https.





[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-08-05 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated MAPREDUCE-2767:
-

Status: Patch Available  (was: Open)

Submitting for tests to run.

> Remove Linux task-controller from 0.22 branch
> -
>
> Key: MAPREDUCE-2767
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.22.0
>Reporter: Milind Bhandarkar
>Assignee: Milind Bhandarkar
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: MR2767.patch
>
>
> There's a potential security hole in the task-controller as it stands. Based 
> on the discussion on general@, removing task-controller from the 0.22 branch 
> will pave way for 0.22.0 release. (This was done for the 0.21.0 release as 
> well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the 
> task-controller when it is fixed.





[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-08-05 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated MAPREDUCE-2767:
-

Status: Open  (was: Patch Available)

Cancelling patch, and re-submitting to see if the (now available) jenkins picks 
it up.

> Remove Linux task-controller from 0.22 branch
> -
>
> Key: MAPREDUCE-2767
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.22.0
>Reporter: Milind Bhandarkar
>Assignee: Milind Bhandarkar
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: MR2767.patch
>
>
> There's a potential security hole in the task-controller as it stands. Based 
> on the discussion on general@, removing task-controller from the 0.22 branch 
> will pave way for 0.22.0 release. (This was done for the 0.21.0 release as 
> well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the 
> task-controller when it is fixed.





[jira] [Commented] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-08-05 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080229#comment-13080229
 ] 

Aaron T. Myers commented on MAPREDUCE-2767:
---

Hey Milind, I'm pretty sure you have to actually attach a new patch. Simply 
going Patch Available -> Open -> Patch Available isn't sufficient, I believe.

> Remove Linux task-controller from 0.22 branch
> -
>
> Key: MAPREDUCE-2767
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.22.0
>Reporter: Milind Bhandarkar
>Assignee: Milind Bhandarkar
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: MR2767.patch
>
>
> There's a potential security hole in the task-controller as it stands. Based 
> on the discussion on general@, removing task-controller from the 0.22 branch 
> will pave way for 0.22.0 release. (This was done for the 0.21.0 release as 
> well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the 
> task-controller when it is fixed.





[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-08-05 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated MAPREDUCE-2767:
-

Status: Open  (was: Patch Available)

Following @atm's directions.

> Remove Linux task-controller from 0.22 branch
> -
>
> Key: MAPREDUCE-2767
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.22.0
>Reporter: Milind Bhandarkar
>Assignee: Milind Bhandarkar
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: MR2767.patch
>
>
> There's a potential security hole in the task-controller as it stands. Based 
> on the discussion on general@, removing task-controller from the 0.22 branch 
> will pave way for 0.22.0 release. (This was done for the 0.21.0 release as 
> well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the 
> task-controller when it is fixed.





[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-08-05 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated MAPREDUCE-2767:
-

Attachment: MR2767.patch

Attaching the same patch as before for jenkins to pick up.

> Remove Linux task-controller from 0.22 branch
> -
>
> Key: MAPREDUCE-2767
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.22.0
>Reporter: Milind Bhandarkar
>Assignee: Milind Bhandarkar
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: MR2767.patch
>
>
> There's a potential security hole in the task-controller as it stands. Based 
> on the discussion on general@, removing task-controller from the 0.22 branch 
> will pave way for 0.22.0 release. (This was done for the 0.21.0 release as 
> well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the 
> task-controller when it is fixed.





[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-08-05 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated MAPREDUCE-2767:
-

Attachment: (was: MR2767.patch)

> Remove Linux task-controller from 0.22 branch
> -
>
> Key: MAPREDUCE-2767
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.22.0
>Reporter: Milind Bhandarkar
>Assignee: Milind Bhandarkar
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: MR2767.patch
>
>
> There's a potential security hole in the task-controller as it stands. Based 
> on the discussion on general@, removing task-controller from the 0.22 branch 
> will pave way for 0.22.0 release. (This was done for the 0.21.0 release as 
> well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the 
> task-controller when it is fixed.





[jira] [Updated] (MAPREDUCE-2767) Remove Linux task-controller from 0.22 branch

2011-08-05 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar updated MAPREDUCE-2767:
-

Status: Patch Available  (was: Open)

Dear jenkins, please pick up the patch.

> Remove Linux task-controller from 0.22 branch
> -
>
> Key: MAPREDUCE-2767
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2767
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.22.0
>Reporter: Milind Bhandarkar
>Assignee: Milind Bhandarkar
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: MR2767.patch
>
>
> There's a potential security hole in the task-controller as it stands. Based 
> on the discussion on general@, removing task-controller from the 0.22 branch 
> will pave way for 0.22.0 release. (This was done for the 0.21.0 release as 
> well: see MAPREDUCE-2014.) We can roll a 0.22.1 release with the 
> task-controller when it is fixed.





[jira] [Updated] (MAPREDUCE-2508) vaidya script uses the wrong path for vaidya jar due to jar renaming

2011-08-05 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated MAPREDUCE-2508:
-

Fix Version/s: (was: 0.20.204.0)
   0.20.205.0

> vaidya script uses the wrong path for vaidya jar due to jar renaming
> 
>
> Key: MAPREDUCE-2508
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2508
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/vaidya
>Reporter: Allen Wittenauer
>Priority: Trivial
> Fix For: 0.20.205.0
>
>
> This clearly wasn't tested in 203.





[jira] [Commented] (MAPREDUCE-2764) Fix renewal of dfs delegation tokens

2011-08-05 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080263#comment-13080263
 ] 

Jitendra Nath Pandey commented on MAPREDUCE-2764:
-

I had a long discussion with Owen. Here is the new proposal:

- Add setKind method in the token
- Add a new kind for tokens i.e. HFTP
- HftpFileSystem client fetches the token and sets the kind to HFTP and the 
service to ip:http-port.
- Renewer determines the filesystem to use (hftp or DFS) using the kind of the 
token. The hftp port is also obtained from the token.
- Hftp client looks for HFTP tokens in the UGI when making an hftp call to nn.
- Hftp client sends the token in the url to the namenode. Before serializing 
the token in the url, Hftp client changes the service to ip:rpc-port and kind 
of the token back to HDFS.

Pros:
  No change to Namenode.
  No change to rpc.
  No change to token selectors.
  Most of the changes are confined to HftpFileSystem class.
  HftpFilesystem already has a mechanism to map the http port to rpc port, 
which it can use to set the service before encoding the token in the url.
  Changing the kind will make sure that this token is not accidentally used for 
rpc connection.

Cons:
  Token obtained over hftp, cannot be used over rpc by clients.
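A rough sketch of the proposed flow, using a minimal stand-in Token class (kind + service fields only) rather than Hadoop's real org.apache.hadoop.security.token.Token; the kind strings and ports are illustrative.

```java
public class HftpTokenFlowSketch {

    // Simplified stand-in for Hadoop's Token: only the two fields this
    // proposal manipulates.
    static final class Token {
        final String kind;
        final String service;
        Token(String kind, String service) { this.kind = kind; this.service = service; }
    }

    // Step 1: the hftp client fetches a token and re-labels it so the
    // renewer can pick the right filesystem from the kind alone.
    static Token fetchForRenewal(String ip, int httpPort) {
        return new Token("HFTP", ip + ":" + httpPort);
    }

    // Step 2: before serializing the token into the URL sent to the NN,
    // flip the service to ip:rpc-port and the kind back to HDFS.
    static Token prepareForUrl(Token t, int rpcPort) {
        String ip = t.service.split(":")[0];
        return new Token("HDFS_DELEGATION_TOKEN", ip + ":" + rpcPort);
    }

    public static void main(String[] args) {
        Token t = HftpTokenFlowSketch.fetchForRenewal("10.0.0.1", 50070);
        System.out.println(t.kind + " " + t.service);    // HFTP 10.0.0.1:50070
        Token nn = prepareForUrl(t, 8020);
        System.out.println(nn.kind + " " + nn.service);  // HDFS_DELEGATION_TOKEN 10.0.0.1:8020
    }
}
```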

> Fix renewal of dfs delegation tokens
> 
>
> Key: MAPREDUCE-2764
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2764
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 0.20.205.0
>
> Attachments: MAPREDUCE-2764.patch
>
>
> The JT may have issues renewing hftp tokens which disrupt long distcp jobs.  
> The problem is the JT's delegation token renewal code is built on brittle 
> assumptions.  The token's service field contains only the "ip:port" pair.  
> The renewal process assumes that the scheme must be hdfs.  If that fails due 
> to a {{VersionMismatchException}}, it tries https based on another assumption 
> that it must be hftp if it's not hdfs.  A number of other exceptions, most 
> commonly {{IOExceptions}}, can be generated which fouls up the renewal since 
> it won't fallback to https.





[jira] [Commented] (MAPREDUCE-2764) Fix renewal of dfs delegation tokens

2011-08-05 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080320#comment-13080320
 ] 

Daryn Sharp commented on MAPREDUCE-2764:


I don't think the goal should be to minimize change in the hopes of reducing 
risk, but rather to improve the code base to reduce risk. First, a brief(ish) 
and polite rebuttal:
# Changes to namenode and rpc client were trivial.  Every token producer should 
not know how to properly encode the socket into a service string.  
{code}
-  String s = NameNode.getAddress(conf).getAddress().getHostAddress()
- + ":" + NameNode.getAddress(conf).getPort();
-  token.setService(new Text(s));
+  token.setService(NameNode.getAddress(conf));
{code}
# Changes to rpc client are trivial.  Just like the namenode changes, the 
encoding of the service is abstracted.
{code}
-  InetSocketAddress addr = remoteId.getAddress();
-  token = tokenSelector.selectToken(new Text(addr.getAddress()
-  .getHostAddress() + ":" + addr.getPort()), 
-  ticket.getTokens());
+  token = tokenSelector.selectToken(server, ticket.getTokens());
{code}
# The refactor of the token selectors is essentially creating a base class to 
eliminate all the copy-n-paste in the individual selectors.  There's not much 
risk there.  Adding URI support was very simple.

The additional cons to the counter-proposal:
* The token renewer will be hardcoded such that mapreduce will need to be 
recompiled when another filesystem is added.
* Mapred will require a mapping of token type to schemes.
* Can't simply subclass an existing filesystem with a new scheme w/o 
recompilation.
* The tokens and selectors are generics-based.  The type is used to do 
unchecked-casting, so adding the ability to change the type is playing with 
fire.
* HftpFilesystem should not be guessing the rpc port when the rpc port is in 
the original token.
* Is not a sustainable design pattern.

IMHO, the token renewer should be "dumb" and not require knowledge of every 
filesystem.  Ergo, all filesystem tokens should have the same type.  All 
filesystem tokens should be routed to their filesystem object.  The filesystem 
object handles renewal.  To solve the hftp/remote-hdfs token issue, I'd prefer 
for the hftp token to simply wrap/contain the remote dfs token instead of 
twiddling its fields.

Pros:
* Overall simple and clean(er) design
* No editing of a token type to scheme mapping for new filesystems
* No recompilation of mapreduce to add a filesystem 
* Eliminates brittle & risky copy-n-paste in the token producers
* Prepares the code to be more flexible and extensible with future service types
* Does not guess the remote rpc port

Cons:
* Slices across multiple components
* It's a bit more work, mainly due to cleanup of existing code

The "nice thing" about this type of low-level change is it will break 
immediately if implemented incorrectly...
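The wrap-instead-of-twiddle alternative can be sketched as follows, with invented class and field names: the hftp side carries the remote dfs token inside its own token, so the inner token's kind and service are never rewritten.

```java
public class WrappedTokenSketch {

    // Stand-in for the remote dfs delegation token; its rpc service
    // is kept exactly as issued.
    static final class DfsToken {
        final String service;
        DfsToken(String service) { this.service = service; }
    }

    // An hftp token that contains, rather than mutates, the dfs token.
    static final class HftpToken {
        final String httpService;
        final DfsToken inner;
        HftpToken(String httpService, DfsToken inner) {
            this.httpService = httpService;
            this.inner = inner;
        }
    }

    public static void main(String[] args) {
        DfsToken dfs = new DfsToken("10.0.0.1:8020");          // rpc port, untouched
        HftpToken hftp = new HftpToken("10.0.0.1:50070", dfs); // http port
        System.out.println(hftp.httpService);   // 10.0.0.1:50070
        System.out.println(hftp.inner.service); // 10.0.0.1:8020
    }
}
```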

> Fix renewal of dfs delegation tokens
> 
>
> Key: MAPREDUCE-2764
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2764
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 0.20.205.0
>
> Attachments: MAPREDUCE-2764.patch
>
>
> The JT may have issues renewing hftp tokens which disrupt long distcp jobs.  
> The problem is the JT's delegation token renewal code is built on brittle 
> assumptions.  The token's service field contains only the "ip:port" pair.  
> The renewal process assumes that the scheme must be hdfs.  If that fails due 
> to a {{VersionMismatchException}}, it tries https based on another assumption 
> that it must be hftp if it's not hdfs.  A number of other exceptions, most 
> commonly {{IOExceptions}}, can be generated which fouls up the renewal since 
> it won't fallback to https.





[jira] [Commented] (MAPREDUCE-1834) TestSimulatorDeterministicReplay timesout on trunk

2011-08-05 Thread arunkumar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080362#comment-13080362
 ] 

arunkumar commented on MAPREDUCE-1834:
--

I did an svn checkout of common, hdfs, and mapreduce from 
http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.21.0/ 
I tried to build all three and the BUILD WAS SUCCESSFUL for each. 
From /mapreduce/src/contrib/mumak/ I ran $ant and $ant test.

The former was successful, but the latter timed out and showed a failure at:

Running org.apache.hadoop.mapred.TestSimulatorDeterministicReplay  
Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 30.647 sec

I tried to apply the latest patch given here, but it gives:
patching file DeterministicCollectionAspects.aj
Hunk #1 FAILED at 21.
Hunk #2 FAILED at 42.
2 out of 2 hunks FAILED -- saving rejects to file 
DeterministicCollectionAspects.aj.rej
patching file FakeConcurrentHashMap.java
patching file TestSimulatorDeterministicReplay.java
Hunk #2 succeeded at 26 with fuzz 2 (offset 1 line).
Hunk #3 succeeded at 50 (offset 1 line).

Actually, in the svn checkout there are no DeterministicCollectionAspects.aj and 
FakeConcurrentHashMap.java files.
I don't know enough about this to fix it. Any help?



> TestSimulatorDeterministicReplay timesout on trunk
> --
>
> Key: MAPREDUCE-1834
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1834
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/mumak
>Affects Versions: 0.21.0
>Reporter: Amareshwari Sriramadasu
>Assignee: Hong Tang
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1834.patch, 
> TestSimulatorDeterministicReplay.log, mr-1834-20100727.patch, 
> mr-1834-20100729.patch, mr-1834-20100802.patch
>
>
> TestSimulatorDeterministicReplay timesout on trunk.
> See hudson patch build 
> http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/216/testReport/org.apache.hadoop.mapred/TestSimulatorDeterministicReplay/testMain/
