[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark

2016-11-09 Thread Krish Dey (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652217#comment-15652217
 ] 

Krish Dey commented on SPARK-5682:
--

The method below still seems to be unchanged. Doesn't it need to be changed to 
accommodate encryption of spills to disk?

  public UnsafeSorterSpillWriter(
      BlockManager blockManager,
      int fileBufferSize,
      ShuffleWriteMetrics writeMetrics,
      int numRecordsToWrite) throws IOException {
    final Tuple2<TempLocalBlockId, File> spilledFileInfo =
      blockManager.diskBlockManager().createTempLocalBlock();
    this.file = spilledFileInfo._2();
    this.blockId = spilledFileInfo._1();
    this.numRecordsToWrite = numRecordsToWrite;
    // Unfortunately, we need a serializer instance in order to construct a
    // DiskBlockObjectWriter. Our write path doesn't actually use this serializer
    // (since we end up calling the `write()` OutputStream methods), but
    // DiskBlockObjectWriter still calls some methods on it. To work around this,
    // we pass a dummy no-op serializer.
    writer = blockManager.getDiskWriter(
      blockId, file, DummySerializerInstance.INSTANCE, fileBufferSize, writeMetrics);
    // Write the number of records
    writeIntToBuffer(numRecordsToWrite, 0);
    writer.write(writeBuffer, 0, 4);
  }
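
For readers wondering what "encryption of spill to disk" could look like at the stream level, here is a minimal sketch. It is not Spark's implementation: the class EncryptedSpillSketch and its methods are hypothetical names invented for this illustration, and it uses only the JDK's javax.crypto API. It assumes a fresh random IV per spill file, stored in the clear at the start of the file, with everything after it encrypted with AES/CTR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper, for illustration only; not part of Spark.
final class EncryptedSpillSketch {
    private static final int IV_LENGTH = 16; // AES block size in bytes

    // Writes a fresh random IV in the clear, then encrypts everything written afterwards.
    static OutputStream wrapForEncryption(OutputStream raw, byte[] key) {
        try {
            byte[] iv = new byte[IV_LENGTH];
            new SecureRandom().nextBytes(iv);
            raw.write(iv);
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return new CipherOutputStream(raw, cipher);
        } catch (IOException | GeneralSecurityException e) {
            throw new IllegalStateException("Cannot set up spill encryption", e);
        }
    }

    // Reads the IV back from the head of the stream, then decrypts the rest.
    static InputStream wrapForDecryption(InputStream raw, byte[] key) {
        try {
            byte[] iv = new byte[IV_LENGTH];
            new DataInputStream(raw).readFully(iv);
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return new CipherInputStream(raw, cipher);
        } catch (IOException | GeneralSecurityException e) {
            throw new IllegalStateException("Cannot set up spill decryption", e);
        }
    }

    // Mimics "write the number of records" through the encrypted stream and reads it back.
    static int roundTrip(byte[] key, int numRecords) {
        try {
            ByteArrayOutputStream spill = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(wrapForEncryption(spill, key))) {
                out.writeInt(numRecords);
            }
            InputStream stored = new ByteArrayInputStream(spill.toByteArray());
            try (DataInputStream in = new DataInputStream(wrapForDecryption(stored, key))) {
                return in.readInt();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16]; // all-zero demo key; a real key must come from a key generator
        System.out.println(roundTrip(key, 42)); // prints 42
    }
}
```

The point of the sketch is only that the writer would have to hand its records to a wrapped stream rather than to the raw file stream; how and where that wrapping should happen in Spark is exactly the question raised above.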


> Add encrypted shuffle in spark
> --
>
> Key: SPARK-5682
> URL: https://issues.apache.org/jira/browse/SPARK-5682
> Project: Spark
> Issue Type: New Feature
> Components: Shuffle
> Reporter: liyunzhang_intel
> Assignee: Ferdinand Xu
> Fix For: 2.1.0
>
> Attachments: Design Document of Encrypted Spark 
> Shuffle_20150209.docx, Design Document of Encrypted Spark 
> Shuffle_20150318.docx, Design Document of Encrypted Spark 
> Shuffle_20150402.docx, Design Document of Encrypted Spark 
> Shuffle_20150506.docx
>
>
> Encrypted shuffle was enabled in Hadoop 2.6, which makes the shuffle process 
> safer. This feature is also necessary in Spark. AES is a specification for 
> the encryption of electronic data. There are five common modes of AES; CTR is 
> one of them. We use two codecs, JceAesCtrCryptoCodec and 
> OpensslAesCtrCryptoCodec, to enable Spark encrypted shuffle; both are also used 
> in Hadoop encrypted shuffle. JceAesCtrCryptoCodec uses the encryption 
> algorithms the JDK provides, while OpensslAesCtrCryptoCodec uses the 
> encryption algorithms OpenSSL provides. 
> Because UGI credential info is used in the process of encrypted shuffle, we 
> first enable encrypted shuffle on the Spark-on-YARN framework.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark

2015-11-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012710#comment-15012710
 ] 

Apache Spark commented on SPARK-5682:
-

User 'winningsix' has created a pull request for this issue:
https://github.com/apache/spark/pull/8880




[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark

2015-11-11 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000526#comment-15000526
 ] 

Ferdinand Xu commented on SPARK-5682:
-

Thank you for your question. The key is generated by a key generator that is 
instantiated with the specified key-generation algorithm. That part of the work 
is done in the method CryptoConf#initSparkShuffleCredentials; more detailed 
information is available in the PR (https://github.com/apache/spark/pull/8880). 
For the IV part, the latest PR uses Chimera 
(https://github.com/intel-hadoop/chimera) as an external library; see 
https://github.com/intel-hadoop/chimera/blob/master/src/main/java/com/intel/chimera/JceAesCtrCryptoCodec.java#L70
and 
https://github.com/intel-hadoop/chimera/blob/master/src/main/java/com/intel/chimera/OpensslAesCtrCryptoCodec.java#L81
You can also dig into the code to see how the IV is calculated from the counter 
and the initial IV: 
https://github.com/intel-hadoop/chimera/blob/master/src/main/java/com/intel/chimera/AesCtrCryptoCodec.java#L42
The initial IV is generated by a secure random number generator.
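
To illustrate what "IV calculated from the counter and the initial IV" can mean, here is a small self-contained sketch. It is written for this thread and is not copied from Chimera: the class CtrIvSketch and the method calculateIv are hypothetical names, and the sketch assumes the common CTR construction in which the per-block IV is the initial IV plus the block counter, both read as 128-bit big-endian unsigned integers.

```java
import java.util.Arrays;

// Hypothetical illustration; the linked Chimera code is the authoritative version.
final class CtrIvSketch {
    static final int AES_BLOCK_SIZE = 16;

    // Returns (initIv + counter) mod 2^128, treating both as big-endian unsigned integers.
    static byte[] calculateIv(byte[] initIv, long counter) {
        if (initIv.length != AES_BLOCK_SIZE) {
            throw new IllegalArgumentException("initial IV must be 16 bytes");
        }
        byte[] iv = new byte[AES_BLOCK_SIZE];
        int carry = 0;
        // Walk from the least significant byte (index 15) to the most significant (index 0).
        for (int i = AES_BLOCK_SIZE - 1, j = 0; i >= 0; i--, j++) {
            int sum = (initIv[i] & 0xff) + carry;
            if (j < 8) { // a long contributes only its 8 bytes
                sum += (int) (counter & 0xff);
                counter >>>= 8;
            }
            iv[i] = (byte) sum;
            carry = sum >>> 8;
        }
        return iv;
    }

    public static void main(String[] args) {
        byte[] initIv = new byte[AES_BLOCK_SIZE];
        initIv[15] = (byte) 0xff;
        // Adding 1 to ...00ff carries into the next byte, giving ...0100.
        System.out.println(Arrays.toString(calculateIv(initIv, 1L)));
    }
}
```

Under this assumption, the counter for a given byte offset in the stream would be the offset divided by the 16-byte block size, which is what lets a reader start decrypting in the middle of a stream.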




[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark

2015-11-10 Thread Mike Yoder (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999418#comment-14999418
 ] 

Mike Yoder commented on SPARK-5682:
---

One quick question about AES/CTR.  This cipher mode has many nice properties, 
but is only safe to use when the key/IV pair are _never_ reused. What 
assurances do you have that the key/IV aren't reused in your scheme?  (I 
skimmed the doc, but didn't see an obvious answer; please forgive me if the 
answer was in there.)
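
To make the concern concrete, here is a minimal self-contained Java sketch. It is not taken from the Spark or Chimera code; the class CtrReuseDemo and its helpers are made-up names, and only the JDK's javax.crypto API is used. It shows that when two messages are encrypted with the same key and IV in AES/CTR, XORing the two ciphertexts yields the XOR of the two plaintexts, because the identical key stream cancels out.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo class, for illustration only.
final class CtrReuseDemo {
    // Encrypts plaintext with AES/CTR under the given 16-byte key and 16-byte IV.
    static byte[] encrypt(byte[] key, byte[] iv, byte[] plaintext) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(plaintext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Byte-wise XOR over the common prefix of a and b.
    static byte[] xor(byte[] a, byte[] b) {
        byte[] out = new byte[Math.min(a.length, b.length)];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) (a[i] ^ b[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        SecureRandom random = new SecureRandom();
        byte[] key = new byte[16];
        byte[] iv = new byte[16];
        random.nextBytes(key);
        random.nextBytes(iv);

        byte[] p1 = "shuffle block A!".getBytes(StandardCharsets.UTF_8);
        byte[] p2 = "shuffle block B?".getBytes(StandardCharsets.UTF_8);

        // Same key and same IV for both messages: this is the mistake being demonstrated.
        byte[] c1 = encrypt(key, iv, p1);
        byte[] c2 = encrypt(key, iv, p2);

        // The key stream cancels: c1 XOR c2 equals p1 XOR p2.
        System.out.println(Arrays.equals(xor(c1, c2), xor(p1, p2))); // prints true
    }
}
```

An attacker who sees both ciphertexts and knows or guesses one plaintext can therefore recover the other without ever learning the key.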





[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark

2015-07-02 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611547#comment-14611547
 ] 

liyunzhang_intel commented on SPARK-5682:
-

[~hujiayin]: thanks for your comment.

This feature is no longer based on Hadoop 2.6; only the original design was. 
The latest design doc (20150506) shows that there are now two ways to implement 
encrypted shuffle in Spark. Currently we only implement it on the Spark-on-YARN 
framework. One way is based on [Chimera|https://github.com/intel-hadoop/chimera], 
a project which strips the code related to CryptoInputStream/CryptoOutputStream 
from Hadoop to facilitate AES-NI based data encryption in other projects (see 
https://github.com/apache/spark/pull/5307). In the other way, we implement all 
the crypto classes such as CryptoInputStream/CryptoOutputStream in Scala under 
the core/src/main/scala/org/apache/spark/crypto/ package (see 
https://github.com/apache/spark/pull/4491).

As for the problem of importing the Hadoop API in Spark: if the interface of a 
Hadoop class is public and stable, it can be used in Spark. 
https://hadoop.apache.org/docs/current/api/org/apache/hadoop/classification/InterfaceStability.html 
says:
{quote}
Incompatible changes must not be made to classes marked as stable.
{quote}
which means that once a class is marked stable, later releases will not change 
it incompatibly.








[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark

2015-07-02 Thread hujiayin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611527#comment-14611527
 ] 

hujiayin commented on SPARK-5682:
-

Steps were added to encode and decode the data, so performance will not be as 
fast as before. At the same time, the code also has security issues; for 
example, plain text saved in the configuration file ends up being used as part 
of the key.

The feature is also based on Hadoop 2.6, which is a limitation; that is why I 
said it relies on Hadoop.

Although the API is marked public and stable, you cannot be sure that it will 
not change, since it is not commercial software.





[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark

2015-07-02 Thread hujiayin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611553#comment-14611553
 ] 

hujiayin commented on SPARK-5682:
-

Since encrypted shuffle in Spark targets the common module, it may not be a 
good idea to use the Hadoop API. On the other hand, the AES solution is a bit 
heavy for encoding/decoding live streaming data. 




[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark

2015-07-02 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612719#comment-14612719
 ] 

liyunzhang_intel commented on SPARK-5682:
-

[~hujiayin]:
{quote}
the AES solution is a bit heavy for encoding/decoding live streaming data.
{quote}
Is there another solution for encoding/decoding live streaming data? Please 
share your suggestion with us.




[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark

2015-07-01 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611491#comment-14611491
 ] 

liyunzhang_intel commented on SPARK-5682:
-

[~hujiayin]: thanks for your comment.
{quote}
The solution relies on the Hadoop API and may degrade performance. 
{quote}
On "the solution relies on the Hadoop API": you mean that I use 
org.apache.hadoop.io.Text in 
[CommonConfigurationKeys|https://github.com/apache/spark/pull/4491/files#diff-a76c55d0e8f2e4e1a6cb5848826585fe].
But I have a different view on this:
{code}
// org.apache.hadoop.io.Text
@Stringable
@InterfaceAudience.Public
@InterfaceStability.Stable
public class Text extends BinaryComparable
{code}

This shows that org.apache.hadoop.io.Text is marked stable, which means the 
interfaces it provides will not change much in later releases.

On "may degrade performance": do you have any test results that show this?
 






[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark

2015-07-01 Thread hujiayin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611328#comment-14611328
 ] 

hujiayin commented on SPARK-5682:
-

The solution relies on the Hadoop API and may degrade performance. The AES 
algorithm is used for block data encryption in many cases. I think RC4 could be 
used to encode the stream, or a simple solution with an authentication header 
could be used.   : )




[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark

2015-04-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390157#comment-14390157
 ] 

Apache Spark commented on SPARK-5682:
-

User 'kellyzly' has created a pull request for this issue:
https://github.com/apache/spark/pull/5307




[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark

2015-04-01 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390235#comment-14390235
 ] 

liyunzhang_intel commented on SPARK-5682:
-

Hi all:
  There are now two methods to implement SPARK-5682 (Add encrypted shuffle in 
spark).
  Method 1: Use [Chimera|https://github.com/intel-hadoop/chimera] (a project 
which strips the code related to CryptoInputStream/CryptoOutputStream from 
Hadoop to facilitate AES-NI based data encryption in other projects) to 
implement Spark encrypted shuffle. Pull request: 
https://github.com/apache/spark/pull/5307.
  Method 2: Add a crypto package to the spark-core module and add 
CryptoInputStream.scala, CryptoOutputStream.scala and so on to this package. 
Pull request: https://github.com/apache/spark/pull/4491.

Which one is better? Any advice or guidance is welcome!





[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark

2015-03-22 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375430#comment-14375430
 ] 

liyunzhang_intel commented on SPARK-5682:
-

Hi all, I have a question: is there any API in Spark like 
getInstance(className: String): AnyRef? I looked at 
org.apache.spark.sql.hive.thriftserver.ReflectionUtils.scala, but it does not 
provide a getInstance function. 
For now I wrote my own getInstance function:
[org.apache.spark.crypto.CryptoCodec#getInstance|https://github.com/kellyzly/spark/blob/8a74eea7d926507242c50b28c56962b1f1db256a/core/src/main/scala/org/apache/spark/crypto/CryptoCodec.scala/#l49].
In my getInstance(className: String), I compare className with 
JceAesCtrCryptoCodec and OpensslAesCtrCryptoCodec, and if the name equals 
JceAesCtrCryptoCodec, it creates the instance through the 
scala.reflect.runtime.universe API. The code could be better if written in the 
following way, but I do not know how to write it. If someone knows, please tell 
me.
{code}
def getInstance1(className: String): AnyRef = {
  val m = universe.runtimeMirror(getClass.getClassLoader)
  val classLoader: ClassLoader = Thread.currentThread.getContextClassLoader
  val aClass: Class[_] = Class.forName(className, true, classLoader)
  val aType: scala.reflect.api.TypeTags.TypeTag =  // how to write this line?
  val classCryptoCodec = universe.typeOf[aType].typeSymbol.asClass
  val cm = m.reflectClass(classCryptoCodec)
  val ctor = universe.typeOf[aType].declaration(universe.nme.CONSTRUCTOR).asMethod
  val ctorm = cm.reflectConstructor(ctor)
  val p = ctorm()
  p
}
{code}
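
One possible way to get such a getInstance without needing a TypeTag at all is plain JVM reflection, which is callable from Scala as well. The sketch below is only a suggestion under the assumption that the codec classes have a public no-argument constructor; the class ReflectiveFactory is a hypothetical name for this example and is not an existing Spark utility.

```java
// Hypothetical helper, for illustration only; not an existing Spark API.
final class ReflectiveFactory {
    // Loads className with the context class loader and calls its no-argument constructor.
    static Object getInstance(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ReflectiveFactory.class.getClassLoader();
        }
        try {
            Class<?> cls = Class.forName(className, true, loader);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // Any class with a public no-argument constructor works; ArrayList is just a stand-in.
        Object instance = getInstance("java.util.ArrayList");
        System.out.println(instance.getClass().getName()); // prints java.util.ArrayList
    }
}
```

With this approach the comparison against the two codec class names becomes unnecessary, because the class is resolved from the configured name at run time; the caller casts the result to the common codec type.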


> Add encrypted shuffle in spark
> --
>
> Key: SPARK-5682
> URL: https://issues.apache.org/jira/browse/SPARK-5682
> Project: Spark
> Issue Type: New Feature
> Components: Shuffle
> Reporter: liyunzhang_intel
> Attachments: Design Document of Encrypted Spark 
> Shuffle_20150209.docx, Design Document of Encrypted Spark 
> Shuffle_20150318.docx
>
>
> Encrypted shuffle was enabled in Hadoop 2.6, which makes the shuffle process 
> safer. This feature is also necessary in Spark. We reuse the Hadoop encrypted 
> shuffle feature in Spark, and because UGI credential info is necessary for 
> encrypted shuffle, we first enable encrypted shuffle on the Spark-on-YARN 
> framework.






[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark

2015-03-19 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368621#comment-14368621
 ] 

liyunzhang_intel commented on SPARK-5682:
-

Sorry for the late reply. When running Spark in YARN mode, is there no need to 
start the master and workers? I'm a newbie to Spark. Any guidance is welcome.




[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark

2015-03-19 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370631#comment-14370631
 ] 

liyunzhang_intel commented on SPARK-5682:
-

Hi all:
  There are two ways to avoid using the crypto classes provided in Hadoop 2.6, 
such as CryptoInputStream.java:
* Method 1: Isolate code such as CryptoInputStream/CryptoOutputStream from the 
Hadoop source code into a separate library, publish it to a Maven repository, 
and let other projects depend on it.
* Method 2: Write CryptoInputStream/CryptoOutputStream and so on in the Spark 
code.

Both methods have advantages and disadvantages:
* Method 1:
  Disadvantage: The Hadoop project or the Spark community needs to review the 
code in the separate library. Only after the review is finished and the library 
has been published to a Maven repository can we introduce it into the Spark 
code. This may take a long time.
  Advantage: Once the library is accepted by the Hadoop or Spark community, we 
can be confident in the quality of the code. If fixes to the crypto classes are 
made, someone updates the separate library and we then update the Maven 
dependency in Spark.
* Method 2:
  Disadvantage: We need to keep an eye on later fixes to the crypto classes in 
subsequent Hadoop releases. If something changes, we need to update the Scala 
code.
  Advantage: No dependency on another library. It is convenient for us to make 
changes if they are really needed in Spark.

For method 1, my teammate is working on it. For method 2, the code in the pull 
request is finished and is waiting for review. Can anyone give me some advice?


