[jira] [Updated] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-5682: --- Labels: (was: stc) > Add encrypted shuffle in spark > -- > > Key: SPARK-5682 > URL: https://issues.apache.org/jira/browse/SPARK-5682 > Project: Spark > Issue Type: New Feature > Components: Shuffle >Reporter: liyunzhang_intel > Attachments: Design Document of Encrypted Spark > Shuffle_20150209.docx, Design Document of Encrypted Spark > Shuffle_20150318.docx, Design Document of Encrypted Spark > Shuffle_20150402.docx, Design Document of Encrypted Spark > Shuffle_20150506.docx > > > Encrypted shuffle is enabled in hadoop 2.6 which make the process of shuffle > data safer. This feature is necessary in spark. AES is a specification for > the encryption of electronic data. There are 5 common modes in AES. CTR is > one of the modes. We use two codec JceAesCtrCryptoCodec and > OpensslAesCtrCryptoCodec to enable spark encrypted shuffle which is also used > in hadoop encrypted shuffle. JceAesCtrypoCodec uses encrypted algorithms jdk > provides while OpensslAesCtrCryptoCodec uses encrypted algorithms openssl > provides. > Because ugi credential info is used in the process of encrypted shuffle, we > first enable encrypted shuffle on spark-on-yarn framework. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luciano Resende updated SPARK-5682: --- Labels: stc (was: ) > Add encrypted shuffle in spark > -- > > Key: SPARK-5682 > URL: https://issues.apache.org/jira/browse/SPARK-5682 > Project: Spark > Issue Type: New Feature > Components: Shuffle >Reporter: liyunzhang_intel > Labels: stc > Attachments: Design Document of Encrypted Spark > Shuffle_20150209.docx, Design Document of Encrypted Spark > Shuffle_20150318.docx, Design Document of Encrypted Spark > Shuffle_20150402.docx, Design Document of Encrypted Spark > Shuffle_20150506.docx > > > Encrypted shuffle is enabled in hadoop 2.6 which make the process of shuffle > data safer. This feature is necessary in spark. AES is a specification for > the encryption of electronic data. There are 5 common modes in AES. CTR is > one of the modes. We use two codec JceAesCtrCryptoCodec and > OpensslAesCtrCryptoCodec to enable spark encrypted shuffle which is also used > in hadoop encrypted shuffle. JceAesCtrypoCodec uses encrypted algorithms jdk > provides while OpensslAesCtrCryptoCodec uses encrypted algorithms openssl > provides. > Because ugi credential info is used in the process of encrypted shuffle, we > first enable encrypted shuffle on spark-on-yarn framework. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyunzhang_intel updated SPARK-5682: Attachment: Design Document of Encrypted Spark Shuffle_20150506.docx Design Document of Encrypted Spark Shuffle_20150506.docx is the latest design doc. > Add encrypted shuffle in spark > -- > > Key: SPARK-5682 > URL: https://issues.apache.org/jira/browse/SPARK-5682 > Project: Spark > Issue Type: New Feature > Components: Shuffle >Reporter: liyunzhang_intel > Attachments: Design Document of Encrypted Spark > Shuffle_20150209.docx, Design Document of Encrypted Spark > Shuffle_20150318.docx, Design Document of Encrypted Spark > Shuffle_20150402.docx, Design Document of Encrypted Spark > Shuffle_20150506.docx > > > Encrypted shuffle is enabled in hadoop 2.6 which make the process of shuffle > data safer. This feature is necessary in spark. AES is a specification for > the encryption of electronic data. There are 5 common modes in AES. CTR is > one of the modes. We use two codec JceAesCtrCryptoCodec and > OpensslAesCtrCryptoCodec to enable spark encrypted shuffle which is also used > in hadoop encrypted shuffle. JceAesCtrypoCodec uses encrypted algorithms jdk > provides while OpensslAesCtrCryptoCodec uses encrypted algorithms openssl > provides. > Because ugi credential info is used in the process of encrypted shuffle, we > first enable encrypted shuffle on spark-on-yarn framework. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyunzhang_intel updated SPARK-5682: Attachment: Design Document of Encrypted Spark Shuffle_20150402.docx > Add encrypted shuffle in spark > -- > > Key: SPARK-5682 > URL: https://issues.apache.org/jira/browse/SPARK-5682 > Project: Spark > Issue Type: New Feature > Components: Shuffle >Reporter: liyunzhang_intel > Attachments: Design Document of Encrypted Spark > Shuffle_20150209.docx, Design Document of Encrypted Spark > Shuffle_20150318.docx, Design Document of Encrypted Spark > Shuffle_20150402.docx > > > Encrypted shuffle is enabled in hadoop 2.6 which make the process of shuffle > data safer. This feature is necessary in spark. AES is a specification for > the encryption of electronic data. There are 5 common modes in AES. CTR is > one of the modes. We use two codec JceAesCtrCryptoCodec and > OpensslAesCtrCryptoCodec to enable spark encrypted shuffle which is also used > in hadoop encrypted shuffle. JceAesCtrypoCodec uses encrypted algorithms jdk > provides while OpensslAesCtrCryptoCodec uses encrypted algorithms openssl > provides. > Because ugi credential info is used in the process of encrypted shuffle, we > first enable encrypted shuffle on spark-on-yarn framework. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyunzhang_intel updated SPARK-5682: Attachment: (was: Design Document of Encrypted Spark Shuffle_20150401.docx) > Add encrypted shuffle in spark > -- > > Key: SPARK-5682 > URL: https://issues.apache.org/jira/browse/SPARK-5682 > Project: Spark > Issue Type: New Feature > Components: Shuffle >Reporter: liyunzhang_intel > Attachments: Design Document of Encrypted Spark > Shuffle_20150209.docx, Design Document of Encrypted Spark > Shuffle_20150318.docx, Design Document of Encrypted Spark > Shuffle_20150402.docx > > > Encrypted shuffle is enabled in hadoop 2.6 which make the process of shuffle > data safer. This feature is necessary in spark. AES is a specification for > the encryption of electronic data. There are 5 common modes in AES. CTR is > one of the modes. We use two codec JceAesCtrCryptoCodec and > OpensslAesCtrCryptoCodec to enable spark encrypted shuffle which is also used > in hadoop encrypted shuffle. JceAesCtrypoCodec uses encrypted algorithms jdk > provides while OpensslAesCtrCryptoCodec uses encrypted algorithms openssl > provides. > Because ugi credential info is used in the process of encrypted shuffle, we > first enable encrypted shuffle on spark-on-yarn framework. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyunzhang_intel updated SPARK-5682: Attachment: Design Document of Encrypted Spark Shuffle_20150401.docx > Add encrypted shuffle in spark > -- > > Key: SPARK-5682 > URL: https://issues.apache.org/jira/browse/SPARK-5682 > Project: Spark > Issue Type: New Feature > Components: Shuffle >Reporter: liyunzhang_intel > Attachments: Design Document of Encrypted Spark > Shuffle_20150209.docx, Design Document of Encrypted Spark > Shuffle_20150318.docx, Design Document of Encrypted Spark > Shuffle_20150401.docx > > > Encrypted shuffle is enabled in hadoop 2.6 which make the process of shuffle > data safer. This feature is necessary in spark. AES is a specification for > the encryption of electronic data. There are 5 common modes in AES. CTR is > one of the modes. We use two codec JceAesCtrCryptoCodec and > OpensslAesCtrCryptoCodec to enable spark encrypted shuffle which is also used > in hadoop encrypted shuffle. JceAesCtrypoCodec uses encrypted algorithms jdk > provides while OpensslAesCtrCryptoCodec uses encrypted algorithms openssl > provides. > Because ugi credential info is used in the process of encrypted shuffle, we > first enable encrypted shuffle on spark-on-yarn framework. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyunzhang_intel updated SPARK-5682: Description: Encrypted shuffle is enabled in hadoop 2.6 which make the process of shuffle data safer. This feature is necessary in spark. AES is a specification for the encryption of electronic data. There are 5 common modes in AES. CTR is one of the modes. We use two codec JceAesCtrCryptoCodec and OpensslAesCtrCryptoCodec to enable spark encrypted shuffle which is also used in hadoop encrypted shuffle. JceAesCtrypoCodec uses encrypted algorithms jdk provides while OpensslAesCtrCryptoCodec uses encrypted algorithms openssl provides. Because ugi credential info is used in the process of encrypted shuffle, we first enable encrypted shuffle on spark-on-yarn framework. was:Encrypted shuffle is enabled in hadoop 2.6 which make the process of shuffle data safer. This feature is necessary in spark. We reuse hadoop encrypted shuffle feature to spark and because ugi credential info is necessary in encrypted shuffle, we first enable encrypted shuffle on spark-on-yarn framework. > Add encrypted shuffle in spark > -- > > Key: SPARK-5682 > URL: https://issues.apache.org/jira/browse/SPARK-5682 > Project: Spark > Issue Type: New Feature > Components: Shuffle >Reporter: liyunzhang_intel > Attachments: Design Document of Encrypted Spark > Shuffle_20150209.docx, Design Document of Encrypted Spark > Shuffle_20150318.docx > > > Encrypted shuffle is enabled in hadoop 2.6 which make the process of shuffle > data safer. This feature is necessary in spark. AES is a specification for > the encryption of electronic data. There are 5 common modes in AES. CTR is > one of the modes. We use two codec JceAesCtrCryptoCodec and > OpensslAesCtrCryptoCodec to enable spark encrypted shuffle which is also used > in hadoop encrypted shuffle. JceAesCtrypoCodec uses encrypted algorithms jdk > provides while OpensslAesCtrCryptoCodec uses encrypted algorithms openssl > provides. > Because ugi credential info is used in the process of encrypted shuffle, we > first enable encrypted shuffle on spark-on-yarn framework. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyunzhang_intel updated SPARK-5682: Attachment: Design Document of Encrypted Spark Shuffle_20150318.docx [~srowen], i have submitted new design doc-Design Document of Encrypted Spark Shuffle_20150318 and also submitted newest code to pull request. In this submit, following big changes are made: * Delete hadoop2.6 profile. We don't depend on hadoop 2.6 because I add crypto classes like CryptoInputStream.scala,CryptoOutputStream.scala and so on in core module org.apache.Spark.crypto package. * AES is a specification for the encryption of electronic data. There are 5 common modes in AES. CTR is one of the modes. We use two codec JceAesCtrCryptoCodec and OpensslAesCtrCryptoCodec to enable spark encrypted shuffle which is also used in hadoop encrypted shuffle. JceAesCtrypoCodec uses encrypted algorithms jdk provides while OpensslAesCtrCryptoCodec uses encrypted algorithms openssl provides. In current code, we only implement JceAesCtrypoCodec and will implement OpensslAesCtrCryptoCodec later. How to test? * download code from https://github.com/kellyzly/spark * build : mvn package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests * when need enable encrypted shuffle, add following in conf/spark-defaults.conf spark.encrypted.shuffle true spark.job.encrypted-intermediate-data true spark.security.crypto.cipher.suite AES/CTR/NoPadding spark.security.crypto.codec.classes.aes.ctr.nopadding org.apache.spark.crypto.JceAesCtrCryptoCodec * start master and work: sbin/start-all.sh * edit SparkPi source code to worldcount, run wordcount ** ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 1g --executor-memory 1g --executor-cores 1 examples/target/my.spark-examples_2.10-1.3.0-SNAPSHOT.jar ** ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 3 --driver-memory 1g --executor-memory 1g --executor-cores 1 examples/target/my.spark-examples_2.10-1.3.0-SNAPSHOT.jar > Add encrypted shuffle in spark > -- > > Key: SPARK-5682 > URL: https://issues.apache.org/jira/browse/SPARK-5682 > Project: Spark > Issue Type: New Feature > Components: Shuffle >Reporter: liyunzhang_intel > Attachments: Design Document of Encrypted Spark > Shuffle_20150209.docx, Design Document of Encrypted Spark > Shuffle_20150318.docx > > > Encrypted shuffle is enabled in hadoop 2.6 which make the process of shuffle > data safer. This feature is necessary in spark. We reuse hadoop encrypted > shuffle feature to spark and because ugi credential info is necessary in > encrypted shuffle, we first enable encrypted shuffle on spark-on-yarn > framework. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyunzhang_intel updated SPARK-5682: Summary: Add encrypted shuffle in spark (was: Reuse hadoop encrypted shuffle algorithm to enable spark encrypted shuffle) > Add encrypted shuffle in spark > -- > > Key: SPARK-5682 > URL: https://issues.apache.org/jira/browse/SPARK-5682 > Project: Spark > Issue Type: New Feature > Components: Shuffle >Reporter: liyunzhang_intel > Attachments: Design Document of Encrypted Spark Shuffle_20150209.docx > > > Encrypted shuffle is enabled in hadoop 2.6 which make the process of shuffle > data safer. This feature is necessary in spark. We reuse hadoop encrypted > shuffle feature to spark and because ugi credential info is necessary in > encrypted shuffle, we first enable encrypted shuffle on spark-on-yarn > framework. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org