[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-12-10 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789044#action_12789044
 ] 

Aaron Kimball commented on MAPREDUCE-1026:
--

I am finding a NullPointerException in Shuffle when I run things with the 
LocalJobRunner:

{code}
09/12/10 16:08:58 WARN mapred.LocalJobRunner: job_local_0001
java.lang.NullPointerException
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:108)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:358)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:299)
{code}

{{reduceTask.getJobTokens()}} is returning null; I can't see anyplace in 
LocalJobRunner where the JobTokens object is being initialized. I think this 
patch is to blame?

 Shuffle should be secure
 

 Key: MAPREDUCE-1026
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Boris Shkolnik
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, 
 MAPREDUCE-1026-13.patch, MAPREDUCE-1026-14.patch, MAPREDUCE-1026-15.patch, 
 MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, 
 MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch


 Since the user's data is available via http from the TaskTrackers, we should 
 require a job-specific secret to access it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-12-10 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789083#action_12789083
 ] 

Devaraj Das commented on MAPREDUCE-1026:


I don't think so. In the local mode, shuffle shouldn't be invoked at all...
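Following the remark above, here is a minimal sketch of a guard, under stated assumptions. The network shuffle runs only when the job is not using LocalJobRunner, and a missing token in distributed mode produces a descriptive error instead of an NPE. It assumes the classic `mapred.job.tracker` setting, where the value `local` selects the local runner. `ShuffleGuard` and its methods are hypothetical, as is the use of a plain `Map` in place of `JobConf` and `Object` in place of the real tokens type. This is not the committed fix.

```java
import java.util.Map;

// Hypothetical helper: decides whether the HTTP shuffle should run at all.
// A plain Map stands in for JobConf so the sketch is self-contained.
final class ShuffleGuard {
  private ShuffleGuard() {}

  /** Assumes mapred.job.tracker == "local" (the classic default) means LocalJobRunner. */
  static boolean isLocalMode(Map<String, String> conf) {
    return "local".equals(conf.getOrDefault("mapred.job.tracker", "local"));
  }

  /**
   * Returns true if reduce inputs must be fetched over HTTP. In distributed
   * mode a missing job token is reported with a clear message instead of
   * surfacing later as a NullPointerException.
   */
  static boolean needsNetworkShuffle(Map<String, String> conf, Object jobTokens) {
    if (isLocalMode(conf)) {
      return false; // map outputs are already local: no fetch, no token needed
    }
    if (jobTokens == null) {
      throw new IllegalStateException(
          "job tokens were not initialized for a distributed job");
    }
    return true;
  }
}
```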




[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-12-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12784923#action_12784923
 ] 

Hudson commented on MAPREDUCE-1026:
---

Integrated in Hadoop-Mapreduce-trunk #162 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/162/])





[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-11-23 Thread Boris Shkolnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781512#action_12781512
 ] 

Boris Shkolnik commented on MAPREDUCE-1026:
---

Created MAPREDUCE-1236 for the LOG.isDebugEnabled issue.




[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-11-20 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12780854#action_12780854
 ] 

Devaraj Das commented on MAPREDUCE-1026:


I missed some LOG.debug statements that create string objects unnecessarily. 
We should make those LOG calls conditional on 'if (LOG.isDebugEnabled())' in a separate jira.
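The guard pattern described above, sketched with `java.util.logging` standing in for the commons-logging `LOG` used in Hadoop: here `isLoggable(Level.FINE)` plays the role of `LOG.isDebugEnabled()`. `FetchLogging` and its counter are hypothetical and exist only to make the skipped work observable.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Stand-in for the commons-logging idiom: LOG.isDebugEnabled() corresponds
// to logger.isLoggable(Level.FINE) in java.util.logging.
final class FetchLogging {
  private static final Logger LOG = Logger.getLogger(FetchLogging.class.getName());

  /** Counts how many debug strings were actually built (illustration only). */
  static int messagesBuilt = 0;

  private FetchLogging() {}

  private static String describe(String mapId, long bytes) {
    messagesBuilt++;
    return "fetched " + bytes + " bytes of map output " + mapId;
  }

  static void logFetch(String mapId, long bytes) {
    // Without the guard, the concatenation in describe() runs on every call,
    // even when debug output is discarded.
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine(describe(mapId, bytes));
    }
  }
}
```

`java.util.logging.Logger` also accepts a `Supplier<String>`, e.g. `LOG.fine(() -> describe(mapId, bytes))`, which defers building the message in the same way.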




[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-11-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12780285#action_12780285
 ] 

Hadoop QA commented on MAPREDUCE-1026:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12425504/MAPREDUCE-1026-14.patch
  against trunk revision 881673.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/146/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/146/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/146/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/146/console

This message is automatically generated.

 Shuffle should be secure
 

 Key: MAPREDUCE-1026
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Boris Shkolnik
 Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, 
 MAPREDUCE-1026-13.patch, MAPREDUCE-1026-14.patch, MAPREDUCE-1026-2.patch, 
 MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, 
 MAPREDUCE-1026.patch, MAPREDUCE-1026.patch


 Since the user's data is available via http from the TaskTrackers, we should 
 require a job-specific secret to access it.




[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-11-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779817#action_12779817
 ] 

Hadoop QA commented on MAPREDUCE-1026:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12425415/MAPREDUCE-1026-13.patch
  against trunk revision 881673.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/251/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/251/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/251/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/251/console


 Shuffle should be secure
 

 Key: MAPREDUCE-1026
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Boris Shkolnik
 Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, 
 MAPREDUCE-1026-13.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, 
 MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, 
 MAPREDUCE-1026.patch


 Since the user's data is available via http from the TaskTrackers, we should 
 require a job-specific secret to access it.




[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-11-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12776260#action_12776260
 ] 

Hadoop QA commented on MAPREDUCE-1026:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12424539/MAPREDUCE-1026-3.patch
  against trunk revision 834284.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 161 release audit warnings 
(more than the trunk's current 159 warnings).

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/235/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/235/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/235/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/235/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/235/console


 Shuffle should be secure
 

 Key: MAPREDUCE-1026
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Boris Shkolnik
 Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-2.patch, 
 MAPREDUCE-1026-3.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch


 Since the user's data is available via http from the TaskTrackers, we should 
 require a job-specific secret to access it.




[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-11-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12775250#action_12775250
 ] 

Hadoop QA commented on MAPREDUCE-1026:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12424422/MAPREDUCE-1026-2.patch
  against trunk revision 834284.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The patch appears to cause tar ant target to fail.

-1 findbugs.  The patch appears to cause Findbugs to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/233/testReport/
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/233/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/233/console


 Shuffle should be secure
 

 Key: MAPREDUCE-1026
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Boris Shkolnik
 Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-2.patch, 
 MAPREDUCE-1026.patch, MAPREDUCE-1026.patch


 Since the user's data is available via http from the TaskTrackers, we should 
 require a job-specific secret to access it.




[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-11-04 Thread Kan Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12773596#action_12773596
 ] 

Kan Zhang commented on MAPREDUCE-1026:
--

@Devaraj
 Since the token will be used (later on in a separate jira) to bootstrap even 
 the task-TT mutual authentication
Are you talking about Task-TT heartbeats over RPC? For this connection, I 
suggest we use a separate key (in the format of a Delegation token) that is 
generated by the TT and given to the Task just before it is launched. This way 
the key is known only to the local task, which helps prevent Tasks running on 
other machines from connecting to this TT accidentally. In terms of 
implementation, the TT can do this the same way the NN does, i.e., instantiate a 
DelegationTokenHandler for generating Delegation tokens and couple it with RPC 
(no need to persist the MasterKey, though).
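A sketch of the delegation-token style scheme suggested above, under assumptions: the TT holds a random master key in memory only, and the password handed to each task is an HMAC of the task attempt ID under that key, so the TT can re-derive and check it without storing per-task state. `TaskSecretManager` and its methods are hypothetical names; HMAC-SHA1 is an assumed choice.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical TaskTracker-side secret manager: password = HMAC(masterKey, attemptId).
// The master key exists only in TT memory and is never persisted; a TT restart
// simply invalidates all outstanding passwords.
final class TaskSecretManager {
  private static final String ALGORITHM = "HmacSHA1";
  private final SecretKeySpec masterKey;

  TaskSecretManager() {
    byte[] raw = new byte[20];
    new SecureRandom().nextBytes(raw);
    masterKey = new SecretKeySpec(raw, ALGORITHM);
  }

  /** Computes the password handed to a task just before it is launched. */
  byte[] createPassword(String taskAttemptId) {
    try {
      Mac mac = Mac.getInstance(ALGORITHM);
      mac.init(masterKey);
      return mac.doFinal(taskAttemptId.getBytes(StandardCharsets.UTF_8));
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("could not compute HMAC", e);
    }
  }

  /** Re-derives the password and compares it with the presented one. */
  boolean verify(String taskAttemptId, byte[] presented) {
    return MessageDigest.isEqual(createPassword(taskAttemptId), presented);
  }
}
```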

 Shuffle should be secure
 

 Key: MAPREDUCE-1026
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Boris Shkolnik
 Attachments: MAPREDUCE-1026.patch, MAPREDUCE-1026.patch


 Since the user's data is available via http from the TaskTrackers, we should 
 require a job-specific secret to access it.




[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-11-04 Thread Kan Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12773599#action_12773599
 ] 

Kan Zhang commented on MAPREDUCE-1026:
--

 This way the key is known only to the local task
Also, no need to persist this key as part of the job. This key is just a 
runtime artifact of the Task and TT.




[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-11-04 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12773611#action_12773611
 ] 

Devaraj Das commented on MAPREDUCE-1026:


Kan, the RPC port on the TaskTracker is supposed to be bound only to localhost, 
so others outside the node in question shouldn't be able to do RPC. 
But let's keep that discussion to a separate jira. 
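Binding to loopback, as described above, can be illustrated with a plain `ServerSocket`. This is a minimal sketch, not the TaskTracker's actual RPC server setup. `LoopbackListener` is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Illustration only: a listener bound to the loopback interface, so that
// processes on other machines cannot open a connection to it at all.
final class LoopbackListener {
  private LoopbackListener() {}

  /** Opens a listener on an ephemeral port bound to the loopback address. */
  static ServerSocket openLocalOnly() {
    try {
      // port 0 = pick an ephemeral port; 50 = accept backlog
      return new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A loopback bind keeps out remote hosts, but not other processes running on the same node.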




[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-11-04 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12773701#action_12773701
 ] 

Devaraj Das commented on MAPREDUCE-1026:


Looked at the patch some more. A few more comments:
1) The TaskTracker needs to maintain a mapping from JobIDs to job tokens.
2) The call to localizeJobTokenFile should be made before the call to 
taskController.initializeJob(context) in the TaskTracker.localizeJob method. 
Could localizeJobTokenFile be called within TaskTracker.localizeJobFiles?
3) Minor: for the request/response HTTP headers, make the first character upper 
case.
4) HMacUtil could override the equals method and put in logic for comparing two 
HMacUtil objects, instead of defining verifyHash.
5) The Comp class in StoreKeys.java seems to be unused. StoreKeys could be 
Writable (as opposed to having to define load/store methods).
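Review items 1 and 4 can be sketched together. The TaskTracker keeps a JobID to job-token map, and a shuffle request is checked by recomputing an HMAC of the request under that job's secret. All names here are hypothetical. The choice of HMAC-SHA1, and of hashing the fetch URL as the message, are assumptions, not details confirmed by the patch.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical TaskTracker-side registry: JobID -> job-token secret,
// plus HMAC checking of incoming shuffle requests.
final class JobTokenRegistry {
  private static final String ALGORITHM = "HmacSHA1";
  private final Map<String, byte[]> secrets = new ConcurrentHashMap<>();

  /** Called when the job is localized on this TaskTracker. */
  void addJob(String jobId, byte[] jobTokenSecret) {
    secrets.put(jobId, jobTokenSecret.clone());
  }

  /** Called when the job's resources are cleaned up. */
  void removeJob(String jobId) {
    secrets.remove(jobId);
  }

  /** HMAC of a request message (for example the fetch URL) under the job's secret. */
  byte[] hash(String jobId, String message) {
    byte[] secret = secrets.get(jobId);
    if (secret == null) {
      throw new IllegalArgumentException("unknown job " + jobId);
    }
    return hmac(secret, message);
  }

  /** Verifies a presented hash; requests for unknown jobs are simply rejected. */
  boolean verify(String jobId, String message, byte[] presentedHash) {
    byte[] secret = secrets.get(jobId);
    return secret != null && MessageDigest.isEqual(hmac(secret, message), presentedHash);
  }

  private static byte[] hmac(byte[] secret, String message) {
    try {
      Mac mac = Mac.getInstance(ALGORITHM);
      mac.init(new SecretKeySpec(secret, ALGORITHM));
      return mac.doFinal(message.getBytes(StandardCharsets.UTF_8));
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("could not compute HMAC", e);
    }
  }
}
```

The comparison can live in a `verifyHash` method or in an `equals` override. Either way, `MessageDigest.isEqual` (constant-time in modern JDKs) avoids revealing how many leading bytes matched.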

For the case where a reduce task fails because the TaskTracker(s) are not 
authentic, we probably need to take care. Two things might happen. The JobTracker 
might get enough notifications from other reduces in the system and decide to 
re-execute the map. The other situation is what is bothering me: the reduce 
task would kill itself after a certain threshold number of trials. 
This would be bad. IIRC it is not predictable which one could happen first.
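One hypothetical way to handle the concern above is to count authentication failures separately from ordinary fetch failures. Then a reduce does not charge "this TaskTracker rejected my credentials" against its own self-kill threshold. The enum, class, and threshold below are illustrative assumptions, not what the patch does.

```java
// Hypothetical classification of fetch outcomes on the reduce side.
enum FetchOutcome { OK, IO_ERROR, AUTH_FAILURE }

// Only ordinary I/O failures count toward the reduce giving up; auth
// failures are tallied separately (e.g. to be reported upstream).
final class FetchFailureTracker {
  private final int maxIoFailures;
  private int ioFailures;
  private int authFailures;

  FetchFailureTracker(int maxIoFailures) {
    this.maxIoFailures = maxIoFailures;
  }

  /** Records one outcome; returns true if the reduce should give up. */
  boolean record(FetchOutcome outcome) {
    switch (outcome) {
      case IO_ERROR:
        ioFailures++;
        break;
      case AUTH_FAILURE:
        authFailures++; // not charged against this reduce's threshold
        break;
      default:
        break;
    }
    return ioFailures >= maxIoFailures;
  }

  int authFailures() {
    return authFailures;
  }
}
```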




[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-11-03 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12773396#action_12773396
 ] 

Devaraj Das commented on MAPREDUCE-1026:


Looked at the patch in brief. Some first level comments:
1) Remove the method setJobTokenFile from JobConf. This is really a TT-Task 
configuration.
2) It probably makes sense to have the task read the configuration from the 
localized file directly. Since the token will be used (later on in a separate 
jira) to bootstrap even the task-TT mutual authentication, it is better to 
check permissions on the localized file before trusting the key. The other 
option is to have the task read it from HDFS.
3) What happens if the shuffle fails due to authentication problems? Maybe that 
needs to be handled specially w.r.t things like fetch failure notifications, 
and the reduce task killing itself after some trials..
4) The JobTracker should create the job-token file during running initTasks for 
the job in question.
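Item 2's permission check could look roughly like the sketch below, which assumes a POSIX file system. It accepts the token file only if it is a regular file with no group or other permission bits set. `TokenFileCheck` is a hypothetical name. A real check would probably also compare the file's owner with the task's user.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical pre-trust check for a localized job-token file.
// Files.getPosixFilePermissions throws UnsupportedOperationException on
// file systems without POSIX attributes.
final class TokenFileCheck {
  private static final Set<PosixFilePermission> ALLOWED =
      EnumSet.of(PosixFilePermission.OWNER_READ, PosixFilePermission.OWNER_WRITE);

  private TokenFileCheck() {}

  /** True if the file is regular and readable/writable by its owner only. */
  static boolean isPrivate(Path tokenFile) {
    try {
      if (!Files.isRegularFile(tokenFile)) {
        return false;
      }
      Set<PosixFilePermission> perms = Files.getPosixFilePermissions(tokenFile);
      return ALLOWED.containsAll(perms); // any group/other bit fails the check
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```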




[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-10-22 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12769011#action_12769011
 ] 

Devaraj Das commented on MAPREDUCE-1026:


Actually, it probably makes sense to write the job token file during the job 
initialization. The other place is to do it in the submitJob RPC method but it 
would mean the RPC handler is blocked during the HDFS access.

 Shuffle should be secure
 

 Key: MAPREDUCE-1026
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Boris Shkolnik

 Since the user's data is available via http from the TaskTrackers, we should 
 require a job-specific secret to access it.




[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-09-25 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759627#action_12759627
 ] 

Owen O'Malley commented on MAPREDUCE-1026:
--

To clarify, in this jira you intend to:

1. Use a job specific random key, which is included in the URL of the fetch.
2. Allow jobs to request encryption of the map output using a second job 
specific random key.  I assume the configuration boolean would be something 
like mapred.job.shuffle.encrypt. If the outputs are encrypted, I assume that we 
checksum the unencrypted data and include the checksum in the encryption.

Once you have done that, there isn't any motivation to pay for https. 
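Point 2 above can be sketched as follows. It checksums the unencrypted map output, then encrypts plaintext and checksum together under a second job-specific key. The comment does not name a cipher: AES in CTR mode, CRC32, and the class and method names are all assumptions. Under a malleable mode such as CTR, a CRC detects accidental corruption but is not a substitute for a cryptographic MAC.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.zip.CRC32;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Sketch: sealed = iv || E(key, plaintext || crc32(plaintext)).
final class ChecksummedEncryption {
  private static final int IV_LEN = 16;
  private static final int CRC_LEN = 8;

  private ChecksummedEncryption() {}

  /** A second, job-specific random key used only for encryption. */
  static SecretKey newJobKey() {
    try {
      KeyGenerator gen = KeyGenerator.getInstance("AES");
      gen.init(128);
      return gen.generateKey();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  static byte[] seal(SecretKey key, byte[] plaintext) {
    CRC32 crc = new CRC32();
    crc.update(plaintext, 0, plaintext.length);
    ByteBuffer body = ByteBuffer.allocate(plaintext.length + CRC_LEN);
    body.put(plaintext).putLong(crc.getValue());
    byte[] iv = new byte[IV_LEN];
    new SecureRandom().nextBytes(iv); // fresh IV per output; never reuse with CTR
    byte[] ct = crypt(Cipher.ENCRYPT_MODE, key, iv, body.array());
    return ByteBuffer.allocate(IV_LEN + ct.length).put(iv).put(ct).array();
  }

  /** Decrypts and re-verifies the plaintext checksum; throws on mismatch. */
  static byte[] open(SecretKey key, byte[] sealed) {
    if (sealed.length < IV_LEN + CRC_LEN) {
      throw new IllegalArgumentException("input too short");
    }
    byte[] iv = Arrays.copyOfRange(sealed, 0, IV_LEN);
    byte[] body = crypt(Cipher.DECRYPT_MODE, key, iv,
        Arrays.copyOfRange(sealed, IV_LEN, sealed.length));
    byte[] plaintext = Arrays.copyOfRange(body, 0, body.length - CRC_LEN);
    long expected = ByteBuffer.wrap(body, body.length - CRC_LEN, CRC_LEN).getLong();
    CRC32 crc = new CRC32();
    crc.update(plaintext, 0, plaintext.length);
    if (crc.getValue() != expected) {
      throw new IllegalStateException("checksum mismatch after decryption");
    }
    return plaintext;
  }

  private static byte[] crypt(int mode, SecretKey key, byte[] iv, byte[] input) {
    try {
      Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
      cipher.init(mode, key, new IvParameterSpec(iv));
      return cipher.doFinal(input);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }
}
```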

 Shuffle should be secure
 

 Key: MAPREDUCE-1026
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Devaraj Das

 Since the user's data is available via http from the TaskTrackers, we should 
 require a job-specific secret to access it.




[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-09-25 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759634#action_12759634
 ] 

Devaraj Das commented on MAPREDUCE-1026:


bq. 1. Use a job specific random key, which is included in the URL of the fetch.
Yes.
bq. 2. Allow jobs to request encryption of the map output using a second job 
specific random key. I assume the configuration boolean would be something like 
mapred.job.shuffle.encrypt.
Yes.

bq. If the outputs are encrypted, I assume that we checksum the unencrypted 
data and include the checksum in the encryption.
I am not sure whether this is required. The encrypted bytes would be 
checksummed automatically as we write them to disk. Do we need to build the 
extra logic of checksumming the unencrypted bytes? That might be a big deal 
when we have multiple map output spills that we finally merge at the end and 
spill to disk. I propose we just live with the (auto) checksum of the 
encrypted bytes.
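The alternative proposed above amounts to a question of stream ordering. If the cipher layer sits above the checksumming layer, the checksum automatically covers only the encrypted bytes that reach disk. This is a sketch with `CheckedOutputStream`/`CRC32` standing in for Hadoop's own checksummed streams. AES/CTR and the name `EncryptedSpillWriter` are assumptions, and the caller must supply a fresh IV per spill, since reusing an IV with the same key breaks CTR.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.security.GeneralSecurityException;
import java.util.zip.CRC32;
import java.util.zip.CheckedOutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Layering: plaintext -> CipherOutputStream -> CheckedOutputStream -> sink.
// The CRC therefore sees only ciphertext, i.e. the bytes stored on disk.
final class EncryptedSpillWriter {
  private EncryptedSpillWriter() {}

  /** Encrypts plaintext into sink and returns the CRC32 of what was written. */
  static long writeSpill(SecretKey key, byte[] iv, byte[] plaintext, OutputStream sink) {
    try {
      Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
      cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
      CheckedOutputStream checked = new CheckedOutputStream(sink, new CRC32());
      // Closing the cipher stream finalizes the cipher and closes the sink.
      try (OutputStream out = new CipherOutputStream(checked, cipher)) {
        out.write(plaintext);
      }
      return checked.getChecksum().getValue();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```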




[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-09-24 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759337#action_12759337
 ] 

Devaraj Das commented on MAPREDUCE-1026:


Summarizing some offline discussions:
1. Performance issues to do with 1.5 extra round trips to the TaskTracker for 
HTTP Digest authentication could be a significant cost when the map outputs are 
small.
2. Instead of that, can we do the following:
   2.1. Tasks authenticate to the TaskTrackers by simply passing the key in the 
URL. This doesn't cost us anything.
   2.2. Map tasks encrypt the final spill file on the map side when it is 
written to disk (and reducers decrypt it). This could be done using a key 
different from the shuffle key used in 2.1.
The idea is that at some point we anyway should have encrypted map outputs to 
have maximum security for the intermediate outputs. We can do that on-the-wire 
via https, or, have encrypted files. The latter should be much less costly when 
compared with the former. The point of having both 2.1 and 2.2 is to make the 
transfer very secure without introducing overheads to do with extra round trips 
for (digest) authentication.
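The split between 2.1 and 2.2 can be illustrated with two independent per-job secrets. One is presented with each fetch, and the other is used only to encrypt spills on the map side and decrypt them on the reduce side. This is a minimal sketch: `JobSecrets`, its members, the 16-byte shuffle key, and AES/CTR are hypothetical choices.

```java
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Hypothetical holder for two independent per-job secrets.
final class JobSecrets {
  /** 2.1: shared secret the reducer presents with each fetch. */
  final byte[] shuffleKey = new byte[16];
  /** 2.2: independent key used only to encrypt and decrypt spill files. */
  final SecretKey spillKey;

  JobSecrets() {
    new SecureRandom().nextBytes(shuffleKey);
    try {
      KeyGenerator gen = KeyGenerator.getInstance("AES");
      gen.init(128);
      spillKey = gen.generateKey();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Map side: encrypts a spill held in memory (illustration only). */
  byte[] encrypt(byte[] spill, byte[] iv) {
    try {
      return cipher(Cipher.ENCRYPT_MODE, iv).doFinal(spill);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Reduce side: wraps fetched bytes so they are decrypted as they are read. */
  InputStream decrypting(InputStream fetched, byte[] iv) {
    return new CipherInputStream(fetched, cipher(Cipher.DECRYPT_MODE, iv));
  }

  private Cipher cipher(int mode, byte[] iv) {
    try {
      Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
      c.init(mode, spillKey, new IvParameterSpec(iv));
      return c;
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Keeping the keys separate means a leaked shuffle key (for example from a logged URL) does not by itself expose the spill contents.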

Thoughts?




[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-09-22 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12758452#action_12758452
 ] 

Owen O'Malley commented on MAPREDUCE-1026:
--

The JobClient should create a random key of 10 characters from [a-zA-Z0-9] and 
put it in the job conf as secret.mapred.job.shuffle.key. I'd propose that we 
add all secret keys in a sub-tree of the config key space (secret.*) so that 
the web ui can hide them. The reducer can include the key in the url and the 
TaskTracker can check to make sure it is correct.
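The proposal above, sketched under assumptions. The config key `secret.mapred.job.shuffle.key`, the 10-character `[a-zA-Z0-9]` format, and the `secret.*` prefix come from the comment. The class, its methods, and the plain `Map` standing in for the job conf are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Map;
import java.util.TreeMap;

final class ShuffleKeys {
  static final String KEY_NAME = "secret.mapred.job.shuffle.key";
  private static final String ALPHABET =
      "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
  private static final SecureRandom RANDOM = new SecureRandom();

  private ShuffleKeys() {}

  /** JobClient side: a random key of the given length drawn from [a-zA-Z0-9]. */
  static String generateKey(int length) {
    StringBuilder sb = new StringBuilder(length);
    for (int i = 0; i < length; i++) {
      sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
    }
    return sb.toString();
  }

  /** Web UI side: a copy of the job conf with every secret.* entry hidden. */
  static Map<String, String> visibleConf(Map<String, String> conf) {
    Map<String, String> visible = new TreeMap<>();
    for (Map.Entry<String, String> e : conf.entrySet()) {
      if (!e.getKey().startsWith("secret.")) {
        visible.put(e.getKey(), e.getValue());
      }
    }
    return visible;
  }

  /** TaskTracker side: compares the key taken from the fetch URL with the job's key. */
  static boolean keyMatches(Map<String, String> conf, String keyFromUrl) {
    String expected = conf.get(KEY_NAME);
    if (expected == null || keyFromUrl == null) {
      return false;
    }
    return MessageDigest.isEqual(expected.getBytes(StandardCharsets.UTF_8),
        keyFromUrl.getBytes(StandardCharsets.UTF_8));
  }
}
```

Ten characters from a 62-symbol alphabet give 62^10, about 8.4 × 10^17 possible keys (roughly 60 bits).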

 Shuffle should be secure
 

 Key: MAPREDUCE-1026
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: security
Reporter: Owen O'Malley
Assignee: Devaraj Das

 Since the user's data is available via http from the TaskTrackers, we should 
 require a job-specific secret to access it.




[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-09-22 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12758477#action_12758477
 ] 

Jeff Hammerbacher commented on MAPREDUCE-1026:
--

Hey Owen (and probably Doug),

While we're here: how would this strategy change if map output was transferred 
to the reducers using Avro's RPC? Is there authentication in the handshake, and 
encryption (ssl?) for the data?

Just trying to educate myself for The Future (tm).

Thanks,
Jeff
