[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370631#comment-14370631 ]
liyunzhang_intel edited comment on SPARK-5682 at 3/23/15 1:31 AM:
------------------------------------------------------------------

Hi all:

There are two ways to avoid using the crypto classes, such as CryptoInputStream.java, that are provided in Hadoop 2.6:
* Method 1: Isolate code such as CryptoInputStream/CryptoOutputStream from the Hadoop source into a separate library, publish it to a Maven repository, and let other projects depend on it.
* Method 2: Write CryptoInputStream/CryptoOutputStream and the related classes in the Spark code.

Both methods have advantages and disadvantages:
* Method 1:
  Disadvantage: The Hadoop project or the Spark community needs to review the code in the separate library. Only after the review is finished and the library has been published to a Maven repository can we introduce it into the Spark code. This may take a long time.
  Advantage: Once the Hadoop or Spark community has accepted the code, we can be confident in its quality. If fixes are made to the crypto classes, someone updates the separate library and we then update the Maven dependency in Spark.
* Method 2:
  Disadvantage: We need to keep an eye on fixes made to the crypto classes in later Hadoop releases. If something changes, we need to update the Scala code.
  Advantage: No dependency on another library. It is convenient for us to make changes if they are really needed in Spark.

For Method 1, my teammate is working on it. For Method 2, the code in the pull request is finished and is waiting for review.

Can anyone give me some advice?

> Add encrypted shuffle in spark
> ------------------------------
>
>                 Key: SPARK-5682
>                 URL: https://issues.apache.org/jira/browse/SPARK-5682
>             Project: Spark
>          Issue Type: New Feature
>          Components: Shuffle
>            Reporter: liyunzhang_intel
>         Attachments: Design Document of Encrypted Spark Shuffle_20150209.docx, Design Document of Encrypted Spark Shuffle_20150318.docx
>
>
> Encrypted shuffle is enabled in Hadoop 2.6, which makes the shuffle process safer. This feature is necessary in Spark. We reuse the Hadoop encrypted shuffle feature in Spark, and because UGI credential info is necessary for encrypted shuffle, we first enable encrypted shuffle on the spark-on-yarn framework.
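
To make Method 2 from the comment above more concrete, here is a rough sketch of what self-contained stream wrappers could look like when built only on the JDK's javax.crypto classes. It is not the code from the pull request and not Hadoop's CryptoInputStream/CryptoOutputStream. The class name ShuffleCryptoSketch and all of its method names are hypothetical, and the choice of AES/CTR/NoPadding is an assumption made for illustration. Key and IV distribution (the UGI credential part of the issue) is out of scope here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper for illustration only; not part of Spark or Hadoop.
class ShuffleCryptoSketch {
    // Assumed transformation: a stream mode, so ciphertext length equals plaintext length.
    private static final String TRANSFORMATION = "AES/CTR/NoPadding";

    private static Cipher newCipher(int mode, byte[] key, byte[] iv)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return cipher;
    }

    // Wraps a raw output stream so that everything written to it is encrypted.
    static OutputStream wrapForWrite(OutputStream out, byte[] key, byte[] iv)
            throws GeneralSecurityException {
        return new CipherOutputStream(out, newCipher(Cipher.ENCRYPT_MODE, key, iv));
    }

    // Wraps a raw input stream so that everything read from it is decrypted.
    static InputStream wrapForRead(InputStream in, byte[] key, byte[] iv)
            throws GeneralSecurityException {
        return new CipherInputStream(in, newCipher(Cipher.DECRYPT_MODE, key, iv));
    }

    // Convenience round-trip helpers over in-memory buffers.
    static byte[] encrypt(byte[] plain, byte[] key, byte[] iv)
            throws IOException, GeneralSecurityException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = wrapForWrite(sink, key, iv)) {
            out.write(plain);
        }
        return sink.toByteArray();
    }

    static byte[] decrypt(byte[] encrypted, byte[] key, byte[] iv)
            throws IOException, GeneralSecurityException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = wrapForRead(new ByteArrayInputStream(encrypted), key, iv)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sink.write(buf, 0, n);
            }
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        SecureRandom random = new SecureRandom();
        byte[] key = new byte[16]; // 128-bit AES key
        byte[] iv = new byte[16];  // one AES block; must not be reused with the same key
        random.nextBytes(key);
        random.nextBytes(iv);

        byte[] plain = "shuffle block".getBytes(StandardCharsets.UTF_8);
        byte[] roundTrip = decrypt(encrypt(plain, key, iv), key, iv);
        System.out.println(Arrays.equals(plain, roundTrip));
    }
}
```

A wrapper like this keeps Spark free of an extra dependency, which is the advantage listed for Method 2, but it also shows the cost: any fix made to the Hadoop crypto classes would have to be ported by hand.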