[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652217#comment-15652217 ]

Krish Dey commented on SPARK-5682:
----------------------------------

The method still seems to be unchanged. Doesn't it need to be changed to accommodate encryption of spills to disk?

public UnsafeSorterSpillWriter(
    BlockManager blockManager,
    int fileBufferSize,
    ShuffleWriteMetrics writeMetrics,
    int numRecordsToWrite) throws IOException {
  final Tuple2<TempLocalBlockId, File> spilledFileInfo =
    blockManager.diskBlockManager().createTempLocalBlock();
  this.file = spilledFileInfo._2();
  this.blockId = spilledFileInfo._1();
  this.numRecordsToWrite = numRecordsToWrite;
  // Unfortunately, we need a serializer instance in order to construct a DiskBlockObjectWriter.
  // Our write path doesn't actually use this serializer (since we end up calling the `write()`
  // OutputStream methods), but DiskBlockObjectWriter still calls some methods on it. To work
  // around this, we pass a dummy no-op serializer.
  writer = blockManager.getDiskWriter(
    blockId, file, DummySerializerInstance.INSTANCE, fileBufferSize, writeMetrics);
  // Write the number of records
  writeIntToBuffer(numRecordsToWrite, 0);
  writer.write(writeBuffer, 0, 4);
}

> Add encrypted shuffle in spark
> ------------------------------
>
>                 Key: SPARK-5682
>                 URL: https://issues.apache.org/jira/browse/SPARK-5682
>             Project: Spark
>          Issue Type: New Feature
>          Components: Shuffle
>            Reporter: liyunzhang_intel
>            Assignee: Ferdinand Xu
>             Fix For: 2.1.0
>
>         Attachments: Design Document of Encrypted Spark Shuffle_20150209.docx, Design Document of Encrypted Spark Shuffle_20150318.docx, Design Document of Encrypted Spark Shuffle_20150402.docx, Design Document of Encrypted Spark Shuffle_20150506.docx
>
>
> Encrypted shuffle is enabled in Hadoop 2.6, which makes the process of shuffling data safer. This feature is necessary in Spark. AES is a specification for the encryption of electronic data. There are 5 common modes in AES; CTR is one of them. We use two codecs, JceAesCtrCryptoCodec and OpensslAesCtrCryptoCodec, to enable Spark encrypted shuffle; both are also used in Hadoop encrypted shuffle. JceAesCtrCryptoCodec uses the encryption algorithms the JDK provides, while OpensslAesCtrCryptoCodec uses the encryption algorithms OpenSSL provides.
> Because UGI credential info is used in the process of encrypted shuffle, we first enable encrypted shuffle on the spark-on-yarn framework.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
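On the spill-to-disk question in the comment above: one way to picture what "encrypting the spill" could mean is to wrap the stream that backs the spill file in an AES/CTR cipher stream. The sketch below is only an illustration using standard JDK classes. `EncryptedSpillSketch` and its methods are hypothetical names, not Spark APIs, and it assumes a fresh random IV is stored in the clear at the head of each file. It is not how Spark actually wires encryption into `DiskBlockObjectWriter`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper (not part of Spark): wraps spill streams in AES/CTR. */
final class EncryptedSpillSketch {
  private static final int IV_LENGTH = 16; // AES block size in bytes

  /** Writes a fresh random IV in the clear, then encrypts everything written after it. */
  static OutputStream wrapForWrite(OutputStream raw, byte[] key)
      throws IOException, GeneralSecurityException {
    byte[] iv = new byte[IV_LENGTH];
    new SecureRandom().nextBytes(iv);
    raw.write(iv);
    Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
    cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
    return new CipherOutputStream(raw, cipher);
  }

  /** Reads the IV from the head of the stream, then decrypts the remainder. */
  static InputStream wrapForRead(InputStream raw, byte[] key)
      throws IOException, GeneralSecurityException {
    byte[] iv = new byte[IV_LENGTH];
    new DataInputStream(raw).readFully(iv);
    Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
    cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
    return new CipherInputStream(raw, cipher);
  }

  /** Round trip: writes a 4-byte record count (as the constructor above does) and reads it back. */
  static int roundTripRecordCount(int numRecords, byte[] key) {
    try {
      ByteArrayOutputStream file = new ByteArrayOutputStream(); // stands in for the spill file
      try (DataOutputStream out = new DataOutputStream(wrapForWrite(file, key))) {
        out.writeInt(numRecords);
      }
      try (DataInputStream in = new DataInputStream(
          wrapForRead(new ByteArrayInputStream(file.toByteArray()), key))) {
        return in.readInt();
      }
    } catch (IOException | GeneralSecurityException e) {
      throw new IllegalStateException("Spill round trip failed", e);
    }
  }
}
```

Because CTR is a stream mode, the encrypted file is exactly the IV plus the plaintext length, so the 4-byte record-count header keeps its size. A fresh IV per file matters for the key/IV reuse concern discussed elsewhere in this thread.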
[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012710#comment-15012710 ]

Apache Spark commented on SPARK-5682:
-------------------------------------

User 'winningsix' has created a pull request for this issue:
https://github.com/apache/spark/pull/8880
[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000526#comment-15000526 ]

Ferdinand Xu commented on SPARK-5682:
-------------------------------------

Thank you for your question. The key is generated by a key generator, which is instantiated with the specified keygen algorithm. This part of the work is in the method CryptoConf#initSparkShuffleCredentials. More detailed information is available in the PR (https://github.com/apache/spark/pull/8880).

For the IV part, we use Chimera (https://github.com/intel-hadoop/chimera) as an external library in the latest PR (see https://github.com/intel-hadoop/chimera/blob/master/src/main/java/com/intel/chimera/JceAesCtrCryptoCodec.java#L70 and https://github.com/intel-hadoop/chimera/blob/master/src/main/java/com/intel/chimera/OpensslAesCtrCryptoCodec.java#L81). You can also dig into the code to see how the IV is calculated from the counter and the initial IV (https://github.com/intel-hadoop/chimera/blob/master/src/main/java/com/intel/chimera/AesCtrCryptoCodec.java#L42). The initial IV is generated by a secure random number generator.
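The two steps described in the comment above (key generation from a named algorithm, and deriving a per-position IV from an initial IV plus a counter) can be sketched with JDK classes alone. This is a minimal illustration, not the Chimera or Spark code: `CtrIvSketch` and its method names are hypothetical, and it assumes the IV is treated as an unsigned big-endian integer to which the counter is added, wrapping on overflow.

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

/** Hypothetical sketch: key generation plus counter-based IV derivation for AES/CTR. */
final class CtrIvSketch {
  /** Generates a key with the JDK KeyGenerator for the given algorithm name and size in bits. */
  static SecretKey generateKey(String algorithm, int bits) {
    try {
      KeyGenerator keyGen = KeyGenerator.getInstance(algorithm);
      keyGen.init(bits, new SecureRandom());
      return keyGen.generateKey();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalArgumentException("Unknown key algorithm: " + algorithm, e);
    }
  }

  /** Creates a random 16-byte initial IV from a secure random source. */
  static byte[] randomInitialIv() {
    byte[] iv = new byte[16];
    new SecureRandom().nextBytes(iv);
    return iv;
  }

  /** Returns initIv + counter, treating the IV as an unsigned big-endian integer. */
  static byte[] calculateIv(byte[] initIv, long counter) {
    byte[] iv = new byte[initIv.length];
    int carry = 0;
    // Walk from the least significant (last) byte to the most significant (first) byte.
    for (int i = initIv.length - 1, j = 0; i >= 0; i--, j++) {
      int sum = (initIv[i] & 0xff) + carry;
      if (j < Long.BYTES) {          // the counter contributes only its 8 bytes
        sum += (int) (counter & 0xff);
        counter >>>= 8;
      }
      iv[i] = (byte) sum;            // keep the low 8 bits
      carry = sum >>> 8;             // propagate the overflow to the next byte
    }
    return iv;
  }
}
```

With this layout, the IV for a given stream offset can be recomputed from the initial IV and the block counter alone, which is what makes seeking within a CTR stream possible.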
[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999418#comment-14999418 ]

Mike Yoder commented on SPARK-5682:
-----------------------------------

One quick question about AES/CTR. This cipher mode has many nice properties, but is only safe to use when the key/IV pair are _never_ reused. What assurances do you have that the key/IV aren't reused in your scheme? (I skimmed the doc, but didn't see an obvious answer; please forgive me if the answer was in there.)
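To make the key/IV reuse concern concrete: in CTR mode the ciphertext is the plaintext XORed with a keystream that depends only on the key and IV, so two messages encrypted under the same pair share a keystream. The sketch below demonstrates this with the JDK's `AES/CTR/NoPadding` cipher. `CtrReuseDemo` is a hypothetical name and the inputs are illustrative only.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical demo: reusing a key/IV pair in AES/CTR leaks the XOR of the plaintexts. */
final class CtrReuseDemo {
  static byte[] encrypt(byte[] key, byte[] iv, byte[] plaintext) {
    try {
      Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
      cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
      return cipher.doFinal(plaintext);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  /** XORs two arrays over the length of the shorter one. */
  static byte[] xor(byte[] a, byte[] b) {
    byte[] out = new byte[Math.min(a.length, b.length)];
    for (int i = 0; i < out.length; i++) {
      out[i] = (byte) (a[i] ^ b[i]);
    }
    return out;
  }

  /** True when c1 XOR c2 equals p1 XOR p2, i.e. the shared keystream cancelled out. */
  static boolean reuseLeaksXor(byte[] key, byte[] iv, byte[] p1, byte[] p2) {
    byte[] c1 = encrypt(key, iv, p1);
    byte[] c2 = encrypt(key, iv, p2);
    return Arrays.equals(xor(c1, c2), xor(p1, p2));
  }
}
```

An attacker who sees both ciphertexts learns the XOR of the two plaintexts without knowing the key, which is why each stream needs its own IV (or key).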
[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612719#comment-14612719 ]

liyunzhang_intel commented on SPARK-5682:
-----------------------------------------

[~hujiayin]:
{quote}
the AES solution is a bit heavy to encode/decode the live streaming data.
{quote}
Is there any other solution for encoding/decoding live streaming data? Please share your suggestion with us.
[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611553#comment-14611553 ]

hujiayin commented on SPARK-5682:
---------------------------------

Since encrypted shuffle in Spark is focused on the common module, it may not be good to use the Hadoop API. On the other hand, the AES solution is a bit heavy for encoding/decoding live streaming data.
[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611547#comment-14611547 ]

liyunzhang_intel commented on SPARK-5682:
-----------------------------------------

[~hujiayin]: thanks for your comment. This feature is no longer based on Hadoop 2.6; it was based on Hadoop 2.6 only in the original design. The latest design doc (20150506) shows that there are now two ways to implement encrypted shuffle in Spark. Currently we only implement it on the spark-on-yarn framework.

One way is based on [Chimera (Chimera is a project which strips code related to CryptoInputStream/CryptoOutputStream from Hadoop to facilitate AES-NI based data encryption in other projects)|https://github.com/intel-hadoop/chimera] (see https://github.com/apache/spark/pull/5307). In the other way, we implement all the crypto classes like CryptoInputStream/CryptoOutputStream in Scala under the core/src/main/scala/org/apache/spark/crypto/ package (see https://github.com/apache/spark/pull/4491).

As for the problem of importing the Hadoop API into Spark: if the interface of a Hadoop class is public and stable, it can be used in Spark. https://hadoop.apache.org/docs/current/api/org/apache/hadoop/classification/InterfaceStability.html says:
{quote}
Incompatible changes must not be made to classes marked as stable.
{quote}
which means that when a class is marked stable, later releases will not change it incompatibly.
[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611527#comment-14611527 ]

hujiayin commented on SPARK-5682:
---------------------------------

Steps were added to encode and decode the data, so performance will not be as fast as before. At the same time, the code also has security issues, for example saving plain text in the configuration file that is finally used as part of the key. Also, the feature is based on Hadoop 2.6, which is a limitation; that is why I said it relies on Hadoop. Though the API is public and stable, you cannot be sure it will not change, since it is not commercial software.
[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611491#comment-14611491 ]

liyunzhang_intel commented on SPARK-5682:
-----------------------------------------

[~hujiayin]: thanks for your comment
{quote}
The solution relies on the Hadoop API and may degrade performance.
{quote}
On relying on the Hadoop API: you mean that I use org.apache.hadoop.io.Text in [CommonConfigurationKeys|https://github.com/apache/spark/pull/4491/files#diff-a76c55d0e8f2e4e1a6cb5848826585fe]. But I see this differently:
{code}
@Stringable
@InterfaceAudience.Public
@InterfaceStability.Stable
public class Text extends BinaryComparable
org.apache.hadoop.io.Text
{code}
This shows that org.apache.hadoop.io.Text is stable, which means the interfaces it provides will not change much in later releases.

On degrading performance: do you have any test results that show this?
[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611328#comment-14611328 ]

hujiayin commented on SPARK-5682:
---------------------------------

The solution relies on the Hadoop API and may degrade performance. AES is used for block data encryption in many cases. I think RC4 could be used to encode the stream, or a simple solution with an authentication header could be used. : )
[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390235#comment-14390235 ]

liyunzhang_intel commented on SPARK-5682:
-----------------------------------------

Hi all:
There are now two methods to implement SPARK-5682 (Add encrypted shuffle in spark).

Method 1: use [Chimera|https://github.com/intel-hadoop/chimera] (Chimera is a project which strips code related to CryptoInputStream/CryptoOutputStream from Hadoop to facilitate AES-NI based data encryption in other projects) to implement Spark encrypted shuffle. Pull request: https://github.com/apache/spark/pull/5307.

Method 2: add a crypto package to the spark-core module, containing CryptoInputStream.scala, CryptoOutputStream.scala and so on. Pull request: https://github.com/apache/spark/pull/4491.

Which one is better? Any advice/guidance is welcome!
[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390157#comment-14390157 ]

Apache Spark commented on SPARK-5682:
-------------------------------------

User 'kellyzly' has created a pull request for this issue:
https://github.com/apache/spark/pull/5307
[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375430#comment-14375430 ]

liyunzhang_intel commented on SPARK-5682:
-----------------------------------------

Hi all: I have a question: is there any API in Spark like getInstance(className: String): AnyRef? I looked at org.apache.spark.sql.hive.thriftserver.ReflectionUtils.scala, but it does not provide a getInstance function.

For now I wrote a function getInstance: [org.apache.spark.crypto.CryptoCodec#getInstance|https://github.com/kellyzly/spark/blob/8a74eea7d926507242c50b28c56962b1f1db256a/core/src/main/scala/org/apache/spark/crypto/CryptoCodec.scala/#l49]. In my getInstance(className: String), I compare className against "JceAesCtrCryptoCodec" and "OpensslAesCtrCryptoCodec", and if the name equals "JceAesCtrCryptoCodec", it creates the instance through the scala.reflect.runtime.universe API.

The code could be better written in the following way, but I do not know how to write it. If someone knows, please tell me.
{code}
def getInstance1(className: String): AnyRef = {
  val m = universe.runtimeMirror(getClass.getClassLoader)
  var classLoader: ClassLoader = Thread.currentThread.getContextClassLoader
  val aClass: Class[_] = Class.forName(className, true, classLoader)
  val aType: scala.reflect.api.TypeTags.TypeTag = // how to write this line?
  val classCryptoCodec = universe.typeOf[aType].typeSymbol.asClass
  val cm = m.reflectClass(classCryptoCodec)
  val ctor = universe.typeOf[aType].declaration(universe.nme.CONSTRUCTOR).asMethod
  val ctorm = cm.reflectConstructor(ctor)
  val p = ctorm()
  p
}
{code}

> Add encrypted shuffle in spark
> ------------------------------
>
>                 Key: SPARK-5682
>                 URL: https://issues.apache.org/jira/browse/SPARK-5682
>             Project: Spark
>          Issue Type: New Feature
>          Components: Shuffle
>            Reporter: liyunzhang_intel
>         Attachments: Design Document of Encrypted Spark Shuffle_20150209.docx, Design Document of Encrypted Spark Shuffle_20150318.docx
>
>
> Encrypted shuffle is enabled in Hadoop 2.6, which makes the process of shuffling data safer. This feature is necessary in Spark. We reuse the Hadoop encrypted shuffle feature in Spark, and because UGI credential info is necessary in encrypted shuffle, we first enable encrypted shuffle on the spark-on-yarn framework.
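On the getInstance question in the comment above: when the target class has a public no-argument constructor, plain Java reflection is usually enough and no TypeTag is needed. The sketch below is one possible approach rather than an existing Spark utility. `ReflectiveFactory` is a hypothetical name, the no-argument constructor is an assumption (a codec that takes constructor arguments would need `getDeclaredConstructor(paramTypes...)` instead), and it is written in Java, although the same calls can be made from Scala unchanged.

```java
/** Hypothetical helper: instantiates a class by name through its no-argument constructor. */
final class ReflectiveFactory {
  static Object getInstance(String className) {
    // Prefer the context class loader, falling back to this class's loader.
    ClassLoader loader = Thread.currentThread().getContextClassLoader();
    if (loader == null) {
      loader = ReflectiveFactory.class.getClassLoader();
    }
    try {
      Class<?> clazz = Class.forName(className, true, loader);
      // Assumes a public no-argument constructor exists on the target class.
      return clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot instantiate " + className, e);
    }
  }
}
```

A caller would then cast the result to the expected codec type, which removes the need to compare the class name against a fixed list of strings.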
[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370631#comment-14370631 ]

liyunzhang_intel commented on SPARK-5682:
-----------------------------------------

Hi all:
There are two methods to avoid using the encryption classes like CryptoInputStream.java provided in Hadoop 2.6:
* Isolate code like CryptoInputStream/CryptoOutputStream from the Hadoop source code into a separate lib, put it in a Maven repository, and let other projects depend on it.
* Write CryptoInputStream/CryptoOutputStream and so on in Spark code.

Both methods have advantages and disadvantages:
* Method 1:
   Disadvantage: The Hadoop project or the Spark community needs to review the code in the separate lib. After all the code has been reviewed and the separate lib has been put in a Maven repository, we will introduce it into Spark code. This may take a long time.
   Advantage: With the recognition of the Hadoop or Spark community, we can ensure the quality of the code. If fixes are made to the crypto classes, someone updates the separate lib and then we modify the Maven dependency in Spark.
* Method 2:
   Disadvantage: We need to keep an eye on later fixes to the crypto classes in later Hadoop releases. If something changes, we need to update the code in Scala.
   Advantage: No dependency on another lib. It is convenient for us to make changes if they are really needed in Spark.

For method 1, my teammate is working on it. For method 2, the code in the pull request is finished and is waiting for review. Can anyone give me some advice?
[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368621#comment-14368621 ]

liyunzhang_intel commented on SPARK-5682:
-----------------------------------------

Sorry for replying so late. When running Spark in YARN mode, is there no need to start the master and workers? I'm a newbie to Spark. Any guidance is welcome.