[jira] [Updated] (SPARK-5682) Add encrypted shuffle in spark

liyunzhang_intel (JIRA) Thu, 19 Mar 2015 00:56:12 -0700

     [ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


liyunzhang_intel updated SPARK-5682:
------------------------------------
    Attachment: Design Document of Encrypted Spark Shuffle_20150318.docx

[~srowen], I have attached a new design doc, "Design Document of Encrypted 
Spark Shuffle_20150318", and pushed the latest code to the pull request. The 
main changes in this update are:
* Removed the hadoop2.6 profile. We no longer depend on Hadoop 2.6, because I 
added the crypto classes (CryptoInputStream.scala, CryptoOutputStream.scala and 
so on) to the org.apache.spark.crypto package in the core module.
* AES is a specification for the encryption of electronic data. There are five 
common modes of operation for AES, and CTR is one of them. Spark encrypted 
shuffle uses two codecs, JceAesCtrCryptoCodec and OpensslAesCtrCryptoCodec, 
which are also used by Hadoop encrypted shuffle. JceAesCtrCryptoCodec uses the 
encryption algorithms provided by the JDK, while OpensslAesCtrCryptoCodec uses 
those provided by OpenSSL. The current code implements only 
JceAesCtrCryptoCodec; OpensslAesCtrCryptoCodec will be implemented later.
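
For readers unfamiliar with the cipher suite named above, here is a minimal sketch of an AES/CTR/NoPadding round trip using the JDK's JCE API (javax.crypto). It is not the JceAesCtrCryptoCodec code from the pull request. The class name AesCtrSketch and its helper methods are hypothetical and exist only for illustration. It shows a property that makes CTR convenient for stream encryption: the ciphertext has the same length as the plaintext.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration class; not part of Spark or of the pull request.
public class AesCtrSketch {
    // Same cipher suite string as in the configuration discussed in this thread.
    static final String TRANSFORMATION = "AES/CTR/NoPadding";

    // Runs the cipher in the given mode over the whole input.
    private static byte[] run(int mode, byte[] key, byte[] iv, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES/CTR operation failed", e);
        }
    }

    static byte[] encrypt(byte[] key, byte[] iv, byte[] plain) {
        return run(Cipher.ENCRYPT_MODE, key, iv, plain);
    }

    static byte[] decrypt(byte[] key, byte[] iv, byte[] encrypted) {
        return run(Cipher.DECRYPT_MODE, key, iv, encrypted);
    }

    public static void main(String[] args) {
        SecureRandom random = new SecureRandom();
        byte[] key = new byte[16]; // 128-bit AES key
        byte[] iv = new byte[16];  // initial counter block, one AES block long
        random.nextBytes(key);
        random.nextBytes(iv);

        byte[] plain = "shuffle block".getBytes(StandardCharsets.UTF_8);
        byte[] encrypted = encrypt(key, iv, plain);
        byte[] decrypted = decrypt(key, iv, encrypted);

        // CTR turns AES into a stream cipher, so no padding is added.
        System.out.println("same length: " + (encrypted.length == plain.length));
        System.out.println("round trip ok: " + Arrays.equals(plain, decrypted));
    }
}
```

The same key and IV pair must never be reused for different data in CTR mode, which is why the sketch draws both from SecureRandom.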

How to test?
* Download the code from https://github.com/kellyzly/spark
* Build: mvn package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests
* To enable encrypted shuffle, add the following to conf/spark-defaults.conf:
spark.encrypted.shuffle          true
spark.job.encrypted-intermediate-data   true
spark.security.crypto.cipher.suite      AES/CTR/NoPadding
spark.security.crypto.codec.classes.aes.ctr.nopadding   org.apache.spark.crypto.JceAesCtrCryptoCodec
* Start the master and workers: sbin/start-all.sh
* Edit the SparkPi source code so that it runs wordcount, then run it in both modes:
** ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 1g --executor-memory 1g --executor-cores 1 examples/target/my.spark-examples_2.10-1.3.0-SNAPSHOT.jar
** ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 3 --driver-memory 1g --executor-memory 1g --executor-cores 1 examples/target/my.spark-examples_2.10-1.3.0-SNAPSHOT.jar
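
Before submitting the job, it can help to confirm that all four encrypted shuffle settings listed in the steps above are present. The sketch below is one way to do that. It assumes the settings use the whitespace-separated "key value" layout of conf/spark-defaults.conf, which java.util.Properties can parse. The class name ShuffleConfSketch and the method missingKeys are hypothetical, and the sketch says nothing about how the pull request itself reads these keys.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical illustration class; not part of Spark or of the pull request.
public class ShuffleConfSketch {
    // Property names copied from the test instructions in this thread.
    static final String[] REQUIRED_KEYS = {
        "spark.encrypted.shuffle",
        "spark.job.encrypted-intermediate-data",
        "spark.security.crypto.cipher.suite",
        "spark.security.crypto.codec.classes.aes.ctr.nopadding"
    };

    // The settings from the thread, in spark-defaults.conf layout.
    static final String EXAMPLE =
        "spark.encrypted.shuffle          true\n"
        + "spark.job.encrypted-intermediate-data   true\n"
        + "spark.security.crypto.cipher.suite      AES/CTR/NoPadding\n"
        + "spark.security.crypto.codec.classes.aes.ctr.nopadding   "
        + "org.apache.spark.crypto.JceAesCtrCryptoCodec\n";

    // Parses "key value" lines; Properties accepts whitespace as the separator.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the required keys that are absent or have an empty value.
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println("missing keys: " + missingKeys(parse(EXAMPLE)));
    }
}
```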


> Add encrypted shuffle in spark
> ------------------------------
>
>                 Key: SPARK-5682
>                 URL: https://issues.apache.org/jira/browse/SPARK-5682
>             Project: Spark
>          Issue Type: New Feature
>          Components: Shuffle
>            Reporter: liyunzhang_intel
>         Attachments: Design Document of Encrypted Spark 
> Shuffle_20150209.docx, Design Document of Encrypted Spark 
> Shuffle_20150318.docx
>
>
> Encrypted shuffle is enabled in Hadoop 2.6, which makes shuffle data safer. 
> This feature is also needed in Spark. We reuse the Hadoop encrypted shuffle 
> feature in Spark and, because UGI credential info is required for encrypted 
> shuffle, we first enable encrypted shuffle on the spark-on-yarn framework.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
