[jira] [Commented] (SPARK-12333) Support shuffle spill encryption in Spark

2016-11-09 Thread Krish Dey (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-12333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652263#comment-15652263 ]

Krish Dey commented on SPARK-12333:
---

The constructor still appears to be unchanged. Shouldn't it be modified to 
accommodate encryption of the spill to disk? Moreover, rather than hard-coding 
DummySerializerInstance, it should be possible to pass in any Serializer.

public UnsafeSorterSpillWriter(
    BlockManager blockManager,
    int fileBufferSize,
    ShuffleWriteMetrics writeMetrics,
    int numRecordsToWrite) throws IOException {
  final Tuple2<TempLocalBlockId, File> spilledFileInfo =
      blockManager.diskBlockManager().createTempLocalBlock();
  this.file = spilledFileInfo._2();
  this.blockId = spilledFileInfo._1();
  this.numRecordsToWrite = numRecordsToWrite;
  // Unfortunately, we need a serializer instance in order to construct a DiskBlockObjectWriter.
  // Our write path doesn't actually use this serializer (since we end up calling the `write()`
  // OutputStream methods), but DiskBlockObjectWriter still calls some methods on it. To work
  // around this, we pass a dummy no-op serializer.
  writer = blockManager.getDiskWriter(
      blockId, file, DummySerializerInstance.INSTANCE, fileBufferSize, writeMetrics);
  // Write the number of records
  writeIntToBuffer(numRecordsToWrite, 0);
  writer.write(writeBuffer, 0, 4);
}
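
For illustration, here is a minimal sketch of the kind of change being 
suggested: an overload that accepts an arbitrary SerializerInstance instead of 
hard-coding the dummy one. The extra serializerInstance parameter is 
hypothetical and not part of the Spark code base; everything else mirrors the 
constructor quoted above.

// Hypothetical sketch only; not actual Spark code.
public UnsafeSorterSpillWriter(
    BlockManager blockManager,
    int fileBufferSize,
    SerializerInstance serializerInstance, // hypothetical parameter
    ShuffleWriteMetrics writeMetrics,
    int numRecordsToWrite) throws IOException {
  final Tuple2<TempLocalBlockId, File> spilledFileInfo =
      blockManager.diskBlockManager().createTempLocalBlock();
  this.file = spilledFileInfo._2();
  this.blockId = spilledFileInfo._1();
  this.numRecordsToWrite = numRecordsToWrite;
  // A caller could pass a serializer whose wrapped output streams encrypt
  // bytes on the way to disk, instead of the no-op dummy serializer.
  writer = blockManager.getDiskWriter(
      blockId, file, serializerInstance, fileBufferSize, writeMetrics);
  // Write the number of records to the spill file header.
  writeIntToBuffer(numRecordsToWrite, 0);
  writer.write(writeBuffer, 0, 4);
}

With such an overload, spills could be encrypted transparently by handing in a 
SerializerInstance that wraps its streams in an encrypting stream, while 
existing callers keep passing DummySerializerInstance.INSTANCE.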


> Support shuffle spill encryption in Spark
> -
>
> Key: SPARK-12333
> URL: https://issues.apache.org/jira/browse/SPARK-12333
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Reporter: Ferdinand Xu
>
> Like shuffle file encryption in SPARK-5682, spilled data should also be 
> encrypted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-5682) Add encrypted shuffle in spark

2016-11-09 Thread Krish Dey (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652217#comment-15652217 ]

Krish Dey edited comment on SPARK-5682 at 11/9/16 10:29 PM:


The constructor still appears to be unchanged. Shouldn't it be modified to 
accommodate encryption of the spill to disk? Moreover, rather than hard-coding 
DummySerializerInstance, it should be possible to pass in any Serializer.

public UnsafeSorterSpillWriter(
    BlockManager blockManager,
    int fileBufferSize,
    ShuffleWriteMetrics writeMetrics,
    int numRecordsToWrite) throws IOException {
  final Tuple2<TempLocalBlockId, File> spilledFileInfo =
      blockManager.diskBlockManager().createTempLocalBlock();
  this.file = spilledFileInfo._2();
  this.blockId = spilledFileInfo._1();
  this.numRecordsToWrite = numRecordsToWrite;
  // Unfortunately, we need a serializer instance in order to construct a DiskBlockObjectWriter.
  // Our write path doesn't actually use this serializer (since we end up calling the `write()`
  // OutputStream methods), but DiskBlockObjectWriter still calls some methods on it. To work
  // around this, we pass a dummy no-op serializer.
  writer = blockManager.getDiskWriter(
      blockId, file, DummySerializerInstance.INSTANCE, fileBufferSize, writeMetrics);
  // Write the number of records
  writeIntToBuffer(numRecordsToWrite, 0);
  writer.write(writeBuffer, 0, 4);
}



was (Author: krish.dey):
The method still appears to be unchanged. Shouldn't it be modified to 
accommodate encryption of the spill to disk?

public UnsafeSorterSpillWriter(
    BlockManager blockManager,
    int fileBufferSize,
    ShuffleWriteMetrics writeMetrics,
    int numRecordsToWrite) throws IOException {
  final Tuple2<TempLocalBlockId, File> spilledFileInfo =
      blockManager.diskBlockManager().createTempLocalBlock();
  this.file = spilledFileInfo._2();
  this.blockId = spilledFileInfo._1();
  this.numRecordsToWrite = numRecordsToWrite;
  // Unfortunately, we need a serializer instance in order to construct a DiskBlockObjectWriter.
  // Our write path doesn't actually use this serializer (since we end up calling the `write()`
  // OutputStream methods), but DiskBlockObjectWriter still calls some methods on it. To work
  // around this, we pass a dummy no-op serializer.
  writer = blockManager.getDiskWriter(
      blockId, file, DummySerializerInstance.INSTANCE, fileBufferSize, writeMetrics);
  // Write the number of records
  writeIntToBuffer(numRecordsToWrite, 0);
  writer.write(writeBuffer, 0, 4);
}


> Add encrypted shuffle in spark
> --
>
> Key: SPARK-5682
> URL: https://issues.apache.org/jira/browse/SPARK-5682
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Reporter: liyunzhang_intel
>Assignee: Ferdinand Xu
> Fix For: 2.1.0
>
> Attachments: Design Document of Encrypted Spark 
> Shuffle_20150209.docx, Design Document of Encrypted Spark 
> Shuffle_20150318.docx, Design Document of Encrypted Spark 
> Shuffle_20150402.docx, Design Document of Encrypted Spark 
> Shuffle_20150506.docx
>
>
> Encrypted shuffle is enabled in Hadoop 2.6, which makes the shuffle data 
> safer; this feature is needed in Spark as well. AES is a specification for 
> the encryption of electronic data, and CTR is one of its five common modes. 
> We use the same two codecs that Hadoop's encrypted shuffle uses, 
> JceAesCtrCryptoCodec and OpensslAesCtrCryptoCodec, to enable encrypted 
> shuffle in Spark: JceAesCtrCryptoCodec uses the encryption algorithms the 
> JDK provides, while OpensslAesCtrCryptoCodec uses those OpenSSL provides. 
> Because UGI credential info is used in the process of encrypted shuffle, we 
> first enable encrypted shuffle on the Spark-on-YARN framework.
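
For context, the following self-contained sketch shows AES/CTR stream 
encryption using only the JDK's JCE provider, i.e. the path the 
JceAesCtrCryptoCodec described above relies on. The class name and the 
simplified key/IV handling are illustrative only, not Spark or Hadoop code.

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class AesCtrSpillExample { // illustrative name
  public static void main(String[] args) throws Exception {
    // 128-bit key and 16-byte counter block; a real deployment would derive
    // these from distributed credentials, not generate them ad hoc.
    byte[] key = new byte[16];
    byte[] iv = new byte[16];
    SecureRandom random = new SecureRandom();
    random.nextBytes(key);
    random.nextBytes(iv);

    // CTR mode turns AES into a stream cipher, so bytes can be encrypted
    // as they are written, with no padding and no whole-block buffering.
    Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
    cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
        new IvParameterSpec(iv));

    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (CipherOutputStream out = new CipherOutputStream(sink, cipher)) {
      out.write("shuffle spill bytes".getBytes(StandardCharsets.UTF_8));
    }
    System.out.println("ciphertext length: " + sink.size());
  }
}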






[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark

2016-11-09 Thread Krish Dey (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652217#comment-15652217 ]

Krish Dey commented on SPARK-5682:
--

The method still appears to be unchanged. Shouldn't it be modified to 
accommodate encryption of the spill to disk?

public UnsafeSorterSpillWriter(
    BlockManager blockManager,
    int fileBufferSize,
    ShuffleWriteMetrics writeMetrics,
    int numRecordsToWrite) throws IOException {
  final Tuple2<TempLocalBlockId, File> spilledFileInfo =
      blockManager.diskBlockManager().createTempLocalBlock();
  this.file = spilledFileInfo._2();
  this.blockId = spilledFileInfo._1();
  this.numRecordsToWrite = numRecordsToWrite;
  // Unfortunately, we need a serializer instance in order to construct a DiskBlockObjectWriter.
  // Our write path doesn't actually use this serializer (since we end up calling the `write()`
  // OutputStream methods), but DiskBlockObjectWriter still calls some methods on it. To work
  // around this, we pass a dummy no-op serializer.
  writer = blockManager.getDiskWriter(
      blockId, file, DummySerializerInstance.INSTANCE, fileBufferSize, writeMetrics);
  // Write the number of records
  writeIntToBuffer(numRecordsToWrite, 0);
  writer.write(writeBuffer, 0, 4);
}


> Add encrypted shuffle in spark
> --
>
> Key: SPARK-5682
> URL: https://issues.apache.org/jira/browse/SPARK-5682
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Reporter: liyunzhang_intel
>Assignee: Ferdinand Xu
> Fix For: 2.1.0
>
> Attachments: Design Document of Encrypted Spark 
> Shuffle_20150209.docx, Design Document of Encrypted Spark 
> Shuffle_20150318.docx, Design Document of Encrypted Spark 
> Shuffle_20150402.docx, Design Document of Encrypted Spark 
> Shuffle_20150506.docx
>
>
> Encrypted shuffle is enabled in Hadoop 2.6, which makes the shuffle data 
> safer; this feature is needed in Spark as well. AES is a specification for 
> the encryption of electronic data, and CTR is one of its five common modes. 
> We use the same two codecs that Hadoop's encrypted shuffle uses, 
> JceAesCtrCryptoCodec and OpensslAesCtrCryptoCodec, to enable encrypted 
> shuffle in Spark: JceAesCtrCryptoCodec uses the encryption algorithms the 
> JDK provides, while OpensslAesCtrCryptoCodec uses those OpenSSL provides. 
> Because UGI credential info is used in the process of encrypted shuffle, we 
> first enable encrypted shuffle on the Spark-on-YARN framework.


