[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure
[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789044#action_12789044 ]

Aaron Kimball commented on MAPREDUCE-1026:

I am finding a NullPointerException in Shuffle when I run things with the LocalJobRunner:

{code}
09/12/10 16:08:58 WARN mapred.LocalJobRunner: job_local_0001
java.lang.NullPointerException
	at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:108)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:358)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:299)
{code}

{{reduceTask.getJobTokens()}} is returning null; I can't see anywhere in LocalJobRunner where the JobTokens object is being initialized. I think this patch is to blame?

Shuffle should be secure

Key: MAPREDUCE-1026
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
Project: Hadoop Map/Reduce
Issue Type: Sub-task
Components: security
Reporter: Owen O'Malley
Assignee: Boris Shkolnik
Fix For: 0.22.0
Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, MAPREDUCE-1026-13.patch, MAPREDUCE-1026-14.patch, MAPREDUCE-1026-15.patch, MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch

Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure
[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789083#action_12789083 ]

Devaraj Das commented on MAPREDUCE-1026:

I don't think so. In the local mode, shuffle shouldn't be invoked at all...
[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure
[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784923#action_12784923 ]

Hudson commented on MAPREDUCE-1026:

Integrated in Hadoop-Mapreduce-trunk #162 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/162/])
[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure
[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781512#action_12781512 ]

Boris Shkolnik commented on MAPREDUCE-1026:

Created MAPREDUCE-1236 for the LOG.isDebugEnabled issue.
[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure
[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780854#action_12780854 ]

Devaraj Das commented on MAPREDUCE-1026:

I missed some LOG.debug statements that create String objects unnecessarily. We should make those LOG calls conditional on 'if (isDebugEnabled)' in a separate jira.
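For readers following along, the guard being suggested can be sketched as below. This is only an illustration and not code from the patch: `DemoLog` is a hypothetical stand-in for the real logger (it only mimics the `isDebugEnabled()` and `debug(...)` methods), and the fetch message and counter are invented so the effect can be observed.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of guarding debug logging; DemoLog is a hypothetical stand-in for a real logger. */
class GuardedDebugDemo {

  /** Minimal logger exposing the two methods the comment refers to. */
  static final class DemoLog {
    private final boolean debugEnabled;
    final List<String> lines = new ArrayList<>();

    DemoLog(boolean debugEnabled) {
      this.debugEnabled = debugEnabled;
    }

    boolean isDebugEnabled() {
      return debugEnabled;
    }

    void debug(String message) {
      if (debugEnabled) {
        lines.add(message);
      }
    }
  }

  /** Counts how many times the message string was actually built. */
  static int messagesBuilt = 0;

  /** Builds the (invented) debug message; string concatenation is the cost being avoided. */
  static String describeFetch(String host, int mapId) {
    messagesBuilt++;
    return "fetching output of map " + mapId + " from " + host;
  }

  /** Unguarded: the message is built even when debug logging is off. */
  static void logUnguarded(DemoLog log, String host, int mapId) {
    log.debug(describeFetch(host, mapId));
  }

  /** Guarded: the message is built only when debug logging is on. */
  static void logGuarded(DemoLog log, String host, int mapId) {
    if (log.isDebugEnabled()) {
      log.debug(describeFetch(host, mapId));
    }
  }

  public static void main(String[] args) {
    DemoLog quiet = new DemoLog(false);
    logUnguarded(quiet, "tt-host", 3);
    System.out.println("messages built after unguarded call: " + messagesBuilt);
    logGuarded(quiet, "tt-host", 3);
    System.out.println("messages built after guarded call: " + messagesBuilt);
  }
}
```

With debug logging disabled, only the unguarded call pays for the string concatenation; the guarded call skips it entirely.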
[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure
[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780285#action_12780285 ]

Hadoop QA commented on MAPREDUCE-1026:

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12425504/MAPREDUCE-1026-14.patch
against trunk revision 881673.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/146/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/146/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/146/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/146/console

This message is automatically generated.
[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure
[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779817#action_12779817 ]

Hadoop QA commented on MAPREDUCE-1026:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12425415/MAPREDUCE-1026-13.patch
against trunk revision 881673.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 1 new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/251/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/251/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/251/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/251/console

This message is automatically generated.
[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure
[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776260#action_12776260 ]

Hadoop QA commented on MAPREDUCE-1026:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12424539/MAPREDUCE-1026-3.patch
against trunk revision 834284.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
-1 release audit. The applied patch generated 161 release audit warnings (more than the trunk's current 159 warnings).
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/235/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/235/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/235/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/235/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/235/console

This message is automatically generated.
[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure
[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775250#action_12775250 ]

Hadoop QA commented on MAPREDUCE-1026:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12424422/MAPREDUCE-1026-2.patch
against trunk revision 834284.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The patch appears to cause tar ant target to fail.
-1 findbugs. The patch appears to cause Findbugs to fail.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/233/testReport/
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/233/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/233/console

This message is automatically generated.
[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure
[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773596#action_12773596 ]

Kan Zhang commented on MAPREDUCE-1026:

@Devaraj

bq. Since the token will be used (later on in a separate jira) to bootstrap even the task-TT mutual authentication

Are you talking about Task-TT heartbeats over RPC? For this connection, I suggest we use a separate key (in the format of a Delegation token) that is generated by the TT and given to the Task just before it is launched. This way the key is known only to the local task, which helps prevent Tasks running on other machines from accidentally connecting to this TT. In terms of implementation, the TT can do this in the same way that the NN does, e.g., instantiate a DelegationTokenHandler for generating the Delegation token and couple it with RPC (no need to persist the MasterKey though).
[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure
[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773599#action_12773599 ]

Kan Zhang commented on MAPREDUCE-1026:

bq. This way the key is known only to the local task

Also, there is no need to persist this key as part of the job. This key is just a runtime artifact of the Task and TT.
[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure
[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773611#action_12773611 ]

Devaraj Das commented on MAPREDUCE-1026:

Kan, the RPC port on the TaskTracker is supposed to be bound only to localhost, so others outside the node in question shouldn't be able to do RPC. But let's keep that discussion for a separate jira.
[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure
[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773701#action_12773701 ]

Devaraj Das commented on MAPREDUCE-1026:

Looked at the patch some more. A few more comments:

1) The tasktracker needs to maintain a mapping from JobIDs to job-tokens.
2) The call to localizeJobTokenFile should be done before the call to taskController.initializeJob(context) in the TaskTracker.localizeJob method. Could localizeJobTokenFile be called within TaskTracker.localizeJobFiles?
3) Minor: for the request/response HTTP headers, make the first character upper case.
4) HMacUtil could override the equals method and put in logic for comparing two HMacUtil objects, instead of defining verifyHash.
5) The Comp class in StoreKeys.java seems to be unused. StoreKeys could be Writable (as opposed to having to define load/store methods).

The case where a reduce task fails because the TaskTracker(s) are not authentic probably needs care. Two things might happen. The JobTracker might get enough notifications from other reduces in the system and decide to re-execute the map. The other situation is what is bothering me: the reduce task would kill itself after a certain threshold number of trials. This would be bad. IIRC it is not predictable which one would happen first.
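Points 1 and 4 above can be pictured with a small sketch. It is not the patch's HMacUtil or TaskTracker code: the class `ShuffleHashDemo`, its method names, the choice of HmacSHA1, and the sample URL are all assumptions made for illustration. It keeps a per-job secret in a map and checks a claimed hash of the request against it, using only standard JDK classes (`javax.crypto.Mac`, `java.security.MessageDigest`).

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch: per-job secrets on the serving side plus HMAC verification. */
class ShuffleHashDemo {
  // Assumption: HmacSHA1 is used here only because every JRE ships it.
  private static final String ALGORITHM = "HmacSHA1";

  /** Mapping from job id to that job's secret (point 1 in the comment above). */
  private final Map<String, byte[]> jobTokens = new ConcurrentHashMap<>();

  void addJobToken(String jobId, byte[] secret) {
    jobTokens.put(jobId, secret.clone());
  }

  void removeJobToken(String jobId) {
    jobTokens.remove(jobId);
  }

  /** Computes the keyed hash of a message (for example, the fetch URL). */
  static byte[] hash(byte[] secret, String message) {
    try {
      Mac mac = Mac.getInstance(ALGORITHM);
      mac.init(new SecretKeySpec(secret, ALGORITHM));
      return mac.doFinal(message.getBytes(StandardCharsets.UTF_8));
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("HMAC not available", e);
    }
  }

  /** Returns true only if the job is known and the claimed hash matches. */
  boolean verify(String jobId, String message, byte[] claimedHash) {
    byte[] secret = jobTokens.get(jobId);
    if (secret == null || claimedHash == null) {
      return false;
    }
    // MessageDigest.isEqual compares the two arrays without a hand-rolled loop.
    return MessageDigest.isEqual(hash(secret, message), claimedHash);
  }

  public static void main(String[] args) {
    ShuffleHashDemo tracker = new ShuffleHashDemo();
    byte[] secret = "demo-secret".getBytes(StandardCharsets.UTF_8);
    tracker.addJobToken("job_0001", secret);

    String url = "/mapOutput?job=job_0001&map=m_000003&reduce=0"; // invented URL
    byte[] fromReducer = hash(secret, url);
    System.out.println("valid request accepted: " + tracker.verify("job_0001", url, fromReducer));
    System.out.println("unknown job accepted: " + tracker.verify("job_0002", url, fromReducer));
  }
}
```

Keeping the comparison in one `verify` method (or in `equals`, as the comment proposes) means callers cannot accidentally compare hashes with `==` or an ad hoc loop.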
[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure
[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773396#action_12773396 ]

Devaraj Das commented on MAPREDUCE-1026:

Looked at the patch in brief. Some first-level comments:

1) Remove the method setJobTokenFile from JobConf. This is really a TT-Task configuration.
2) It probably makes sense to have the task read the configuration from the localized file directly. Since the token will be used (later on in a separate jira) to bootstrap even the task-TT mutual authentication, it is better to check permissions on the localized file before trusting the key. The other option is to have the task read it from HDFS.
3) What happens if the shuffle fails due to authentication problems? Maybe that needs to be handled specially w.r.t. things like fetch failure notifications, and the reduce task killing itself after some trials.
4) The JobTracker should create the job-token file while running initTasks for the job in question.
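The permission check suggested in point 2 might look something like the following sketch. It assumes a POSIX file system (on others, `Files.getPosixFilePermissions` throws `UnsupportedOperationException`), and the class and method names are hypothetical; the real task code may well check ownership or use Hadoop's own file system API instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch: refuse to trust a localized key file that others can access. */
class TokenFilePermissionDemo {
  private static final Set<PosixFilePermission> OWNER_ONLY = EnumSet.of(
      PosixFilePermission.OWNER_READ,
      PosixFilePermission.OWNER_WRITE,
      PosixFilePermission.OWNER_EXECUTE);

  /** True if the file grants no permission bits to group or others. */
  static boolean isOwnerOnly(Path file) {
    try {
      return OWNER_ONLY.containsAll(Files.getPosixFilePermissions(file));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Reads the key only after the permission check passes. */
  static byte[] readTrustedKey(Path file) {
    if (!isOwnerOnly(file)) {
      throw new SecurityException("job token file is accessible to group/others: " + file);
    }
    try {
      return Files.readAllBytes(file);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Demo helper: writes a key to a temp file with the given rwx string, e.g. "rw-------". */
  static Path writeTempKey(byte[] key, String permissions) {
    try {
      Path file = Files.createTempFile("job-token", ".key");
      Files.write(file, key);
      Files.setPosixFilePermissions(file, PosixFilePermissions.fromString(permissions));
      file.toFile().deleteOnExit();
      return file;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Path privateFile = writeTempKey(new byte[] {1, 2, 3}, "rw-------");
    System.out.println("private file trusted: " + isOwnerOnly(privateFile));

    Path sharedFile = writeTempKey(new byte[] {1, 2, 3}, "rw-r--r--");
    System.out.println("world-readable file trusted: " + isOwnerOnly(sharedFile));
  }
}
```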
[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure
[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769011#action_12769011 ]

Devaraj Das commented on MAPREDUCE-1026:

Actually, it probably makes sense to write the job token file during the job initialization. The other option is to do it in the submitJob RPC method, but that would mean the RPC handler is blocked during the HDFS access.
[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure
[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759627#action_12759627 ]

Owen O'Malley commented on MAPREDUCE-1026:

To clarify, in this jira you intend to:

1. Use a job specific random key, which is included in the URL of the fetch.
2. Allow jobs to request encryption of the map output using a second job specific random key. I assume the configuration boolean would be something like mapred.job.shuffle.encrypt.

If the outputs are encrypted, I assume that we checksum the unencrypted data and include the checksum in the encryption. Once you have done that, there isn't any motivation to pay for https.
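The assumption in the last paragraph, checksumming the unencrypted data and carrying the checksum inside the encrypted payload, could be sketched as follows. Everything here is illustrative: the class name, the use of CRC32, and AES in CTR mode with a 16-byte key are assumptions, not choices made in this jira. Note also that a CRC only detects accidental corruption; it is not a cryptographic integrity check.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.zip.CRC32;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch: checksum the plaintext, then encrypt plaintext and checksum together. */
class ChecksumThenEncryptDemo {
  private static final String TRANSFORMATION = "AES/CTR/NoPadding"; // assumed cipher
  private static final int IV_LENGTH = 16;
  private static final SecureRandom RANDOM = new SecureRandom();

  /** Output layout: IV || encrypt(plaintext || crc32(plaintext) as 8 bytes). */
  static byte[] seal(byte[] key, byte[] plaintext) {
    CRC32 crc = new CRC32();
    crc.update(plaintext);
    byte[] payload = ByteBuffer.allocate(plaintext.length + Long.BYTES)
        .put(plaintext)
        .putLong(crc.getValue())
        .array();
    byte[] iv = new byte[IV_LENGTH];
    RANDOM.nextBytes(iv);
    byte[] encrypted = crypt(Cipher.ENCRYPT_MODE, key, iv, payload);
    return ByteBuffer.allocate(IV_LENGTH + encrypted.length).put(iv).put(encrypted).array();
  }

  /** Decrypts and verifies the embedded checksum; throws if it does not match. */
  static byte[] open(byte[] key, byte[] sealed) {
    if (sealed.length < IV_LENGTH + Long.BYTES) {
      throw new IllegalArgumentException("sealed data too short");
    }
    byte[] iv = Arrays.copyOfRange(sealed, 0, IV_LENGTH);
    byte[] encrypted = Arrays.copyOfRange(sealed, IV_LENGTH, sealed.length);
    ByteBuffer payload = ByteBuffer.wrap(crypt(Cipher.DECRYPT_MODE, key, iv, encrypted));
    byte[] plaintext = new byte[payload.remaining() - Long.BYTES];
    payload.get(plaintext);
    long expected = payload.getLong();
    CRC32 crc = new CRC32();
    crc.update(plaintext);
    if (crc.getValue() != expected) {
      throw new IllegalStateException("checksum mismatch");
    }
    return plaintext;
  }

  private static byte[] crypt(int mode, byte[] key, byte[] iv, byte[] input) {
    try {
      Cipher cipher = Cipher.getInstance(TRANSFORMATION);
      cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
      return cipher.doFinal(input);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("cipher failure", e);
    }
  }

  public static void main(String[] args) {
    byte[] key = new byte[16]; // 128-bit demo key
    RANDOM.nextBytes(key);
    byte[] sealed = seal(key, "map output bytes".getBytes());
    System.out.println("round trip: " + new String(open(key, sealed)));
  }
}
```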
[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure
[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759634#action_12759634 ]

Devaraj Das commented on MAPREDUCE-1026:

bq. 1. Use a job specific random key, which is included in the URL of the fetch.

Yes.

bq. 2. Allow jobs to request encryption of the map output using a second job specific random key. I assume the configuration boolean would be something like mapred.job.shuffle.encrypt.

Yes.

bq. If the outputs are encrypted, I assume that we checksum the unencrypted data and include the checksum in the encryption.

I am not sure whether this is required. The encrypted bytes would be checksummed automatically as we write them to disk. Do we need to build the extra logic of checksumming the unencrypted bytes? (That might be a big deal when we have multiple map output spills that we finally merge at the end and spill to disk.) I propose we just live with the (auto) checksum of the encrypted bytes.
[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure
[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759337#action_12759337 ]

Devaraj Das commented on MAPREDUCE-1026:

Summarizing some offline discussions:

1. The 1.5 extra round trips to the TaskTracker for HTTP Digest authentication could be a significant cost when the map outputs are small.
2. Instead of that, can we do the following:
2.1. Tasks authenticate to the TaskTrackers by simply passing the key in the URL. This doesn't cost us anything.
2.2. Map tasks encrypt the final spill file on the map side when it is written to disk (and reducers decrypt it). This could be done using a key different from the shuffle key used in 2.1.

The idea is that at some point we should anyway have encrypted map outputs, to have maximum security for the intermediate outputs. We can do that on the wire via https, or have encrypted files. The latter should be much less costly than the former. The point of having both 2.1 and 2.2 is to make the transfer very secure without introducing the overhead of extra round trips for (digest) authentication.

Thoughts?
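As a rough picture of item 2.2, the map side could wrap the stream it writes the final spill to, and the reducer could wrap the stream it reads from. The sketch below uses in-memory byte arrays in place of the spill file and the HTTP transfer; the class name, the AES/CTR choice, and the idea of storing the IV at the start of the file are assumptions for illustration, not part of the proposal.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch: encrypt the final spill on the map side, decrypt it on the reduce side. */
class EncryptedSpillDemo {
  private static final int IV_LENGTH = 16;
  private static final SecureRandom RANDOM = new SecureRandom();

  /** Generates a random 128-bit key; the spill key is distinct from the shuffle (URL) key. */
  static byte[] newKey() {
    byte[] key = new byte[16];
    RANDOM.nextBytes(key);
    return key;
  }

  /** Map side: returns the bytes that would be written to disk (IV followed by ciphertext). */
  static byte[] encryptSpill(byte[] spillKey, byte[] mapOutput) {
    try {
      byte[] iv = new byte[IV_LENGTH];
      RANDOM.nextBytes(iv);
      ByteArrayOutputStream disk = new ByteArrayOutputStream();
      disk.write(iv);
      try (OutputStream out =
          new CipherOutputStream(disk, cipher(Cipher.ENCRYPT_MODE, spillKey, iv))) {
        out.write(mapOutput);
      }
      return disk.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Reduce side: reads the IV, then streams the rest through a decrypting wrapper. */
  static byte[] decryptSpill(byte[] spillKey, byte[] spillFile) {
    try {
      DataInputStream fetched = new DataInputStream(new ByteArrayInputStream(spillFile));
      byte[] iv = new byte[IV_LENGTH];
      fetched.readFully(iv);
      ByteArrayOutputStream plain = new ByteArrayOutputStream();
      try (InputStream in =
          new CipherInputStream(fetched, cipher(Cipher.DECRYPT_MODE, spillKey, iv))) {
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
          plain.write(buffer, 0, n);
        }
      }
      return plain.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  private static Cipher cipher(int mode, byte[] key, byte[] iv) {
    try {
      Cipher c = Cipher.getInstance("AES/CTR/NoPadding"); // assumed cipher
      c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
      return c;
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("cipher failure", e);
    }
  }

  public static void main(String[] args) {
    byte[] spillKey = newKey();
    byte[] onDisk = encryptSpill(spillKey, "key1\tvalue1\n".getBytes());
    System.out.print("reducer sees: " + new String(decryptSpill(spillKey, onDisk)));
  }
}
```

Because the data is encrypted once when the spill is written, serving it to reducers adds no per-request handshake, which is the cost argument made above against https.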
[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure
[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758452#action_12758452 ]

Owen O'Malley commented on MAPREDUCE-1026:

The JobClient should create a random key of 10 characters from [a-zA-Z0-9] and put it in the job conf as secret.mapred.job.shuffle.key. I'd propose that we add all secret keys in a sub-tree of the config key space (secret.*) so that the web ui can hide them. The reducer can include the key in the url and the TaskTracker can check to make sure it is correct.
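The three pieces of this proposal (generate the key, hide secret.* entries from the web UI, check the key from the URL) can be sketched in a few lines. Only the property name secret.mapred.job.shuffle.key and the 10-character [a-zA-Z0-9] alphabet come from the comment; the class, the method names, and the plain `Map` standing in for the job conf are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the proposed shuffle key handling. */
class ShuffleKeyDemo {
  static final String SHUFFLE_KEY_PROPERTY = "secret.mapred.job.shuffle.key";
  private static final String SECRET_PREFIX = "secret.";
  private static final String ALPHABET =
      "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
  private static final int KEY_LENGTH = 10;
  private static final SecureRandom RANDOM = new SecureRandom();

  /** Client side: a random 10-character key drawn from [a-zA-Z0-9]. */
  static String newShuffleKey() {
    StringBuilder key = new StringBuilder(KEY_LENGTH);
    for (int i = 0; i < KEY_LENGTH; i++) {
      key.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
    }
    return key.toString();
  }

  /** Web UI side: a copy of the configuration without anything under secret.*. */
  static Map<String, String> visibleToWebUi(Map<String, String> conf) {
    Map<String, String> visible = new TreeMap<>();
    for (Map.Entry<String, String> entry : conf.entrySet()) {
      if (!entry.getKey().startsWith(SECRET_PREFIX)) {
        visible.put(entry.getKey(), entry.getValue());
      }
    }
    return visible;
  }

  /** Serving side: compares the key taken from the fetch URL with the job's key. */
  static boolean isAuthorized(String expectedKey, String keyFromUrl) {
    if (expectedKey == null || keyFromUrl == null) {
      return false;
    }
    return MessageDigest.isEqual(
        expectedKey.getBytes(StandardCharsets.UTF_8),
        keyFromUrl.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    Map<String, String> conf = new TreeMap<>();
    conf.put("mapred.job.name", "demo");
    conf.put(SHUFFLE_KEY_PROPERTY, newShuffleKey());

    System.out.println("web ui shows: " + visibleToWebUi(conf).keySet());
    String key = conf.get(SHUFFLE_KEY_PROPERTY);
    System.out.println("correct key accepted: " + isAuthorized(key, key));
    System.out.println("wrong key accepted: " + isAuthorized(key, "not-the-key"));
  }
}
```

A 10-character key over a 62-symbol alphabet carries a little under 60 bits of randomness, so the strength of the scheme rests on the key staying out of logs, the web UI, and anything else that records fetch URLs.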
[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure
[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758477#action_12758477 ]

Jeff Hammerbacher commented on MAPREDUCE-1026:

Hey Owen (and probably Doug),

While we're here: how would this strategy change if map output was transferred to the reducers using Avro's RPC? Is there authentication in the handshake, and encryption (ssl?) for the data? Just trying to educate myself for The Future (tm).

Thanks,
Jeff