[jira] [Commented] (SPARK-12333) Support shuffle spill encryption in Spark
[ https://issues.apache.org/jira/browse/SPARK-12333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652263#comment-15652263 ] Krish Dey commented on SPARK-12333: --- The constructor still seems to be the same as it is. Doesn't this to be changed to accommodate encryption of spill to disk? Moreover passing the DummySerializerInstance it should be allowed to pass any Serializer public UnsafeSorterSpillWriter(BlockManager blockManager, int fileBufferSize, ShuffleWriteMetrics writeMetrics, int numRecordsToWrite) throws IOException{ final Tuple2 spilledFileInfo = blockManager.diskBlockManager().createTempLocalBlock(); this.file = spilledFileInfo._2(); this.blockId = spilledFileInfo._1(); this.numRecordsToWrite = numRecordsToWrite; // Unfortunately, we need a serializer instance in order to construct a DiskBlockObjectWriter. // Our write path doesn't actually use this serializer (since we end up calling the `write()` // OutputStream methods), but DiskBlockObjectWriter still calls some methods on it. To work // around this, we pass a dummy no-op serializer. writer = blockManager.getDiskWriter( blockId, file, DummySerializerInstance.INSTANCE, fileBufferSize, writeMetrics); // Write the number of records writeIntToBuffer(numRecordsToWrite, 0); writer.write(writeBuffer, 0, 4); } > Support shuffle spill encryption in Spark > - > > Key: SPARK-12333 > URL: https://issues.apache.org/jira/browse/SPARK-12333 > Project: Spark > Issue Type: New Feature > Components: Shuffle >Reporter: Ferdinand Xu > > Like shuffle file encryption in SPARK-5682, spills data should also be > encrypted. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652217#comment-15652217 ] Krish Dey edited comment on SPARK-5682 at 11/9/16 10:29 PM: The constructor still seems to be the same as it is. Doesn't this to be changed to accommodate encryption of spill to disk? Moreover passing the DummySerializerInstance it should be allowed to pass any Serializer public UnsafeSorterSpillWriter(BlockManager blockManager, int fileBufferSize, ShuffleWriteMetrics writeMetrics, int numRecordsToWrite) throws IOException { final Tuple2 spilledFileInfo = blockManager.diskBlockManager().createTempLocalBlock(); this.file = spilledFileInfo._2(); this.blockId = spilledFileInfo._1(); this.numRecordsToWrite = numRecordsToWrite; // Unfortunately, we need a serializer instance in order to construct a DiskBlockObjectWriter. // Our write path doesn't actually use this serializer (since we end up calling the `write()` // OutputStream methods), but DiskBlockObjectWriter still calls some methods on it. To work // around this, we pass a dummy no-op serializer. writer = blockManager.getDiskWriter( blockId, file, DummySerializerInstance.INSTANCE, fileBufferSize, writeMetrics); // Write the number of records writeIntToBuffer(numRecordsToWrite, 0); writer.write(writeBuffer, 0, 4); } was (Author: krish.dey): The method still seems to be the same as it is. Doesn't this to be changed to accommodate encryption of spill to disk? public UnsafeSorterSpillWriter(BlockManager blockManager, int fileBufferSize, ShuffleWriteMetrics writeMetrics, int numRecordsToWrite) throws IOException { final Tuple2 spilledFileInfo = blockManager.diskBlockManager().createTempLocalBlock(); this.file = spilledFileInfo._2(); this.blockId = spilledFileInfo._1(); this.numRecordsToWrite = numRecordsToWrite; // Unfortunately, we need a serializer instance in order to construct a DiskBlockObjectWriter. // Our write path doesn't actually use this serializer (since we end up calling the `write()` // OutputStream methods), but DiskBlockObjectWriter still calls some methods on it. To work // around this, we pass a dummy no-op serializer. writer = blockManager.getDiskWriter( blockId, file, DummySerializerInstance.INSTANCE, fileBufferSize, writeMetrics); // Write the number of records writeIntToBuffer(numRecordsToWrite, 0); writer.write(writeBuffer, 0, 4); } > Add encrypted shuffle in spark > -- > > Key: SPARK-5682 > URL: https://issues.apache.org/jira/browse/SPARK-5682 > Project: Spark > Issue Type: New Feature > Components: Shuffle >Reporter: liyunzhang_intel >Assignee: Ferdinand Xu > Fix For: 2.1.0 > > Attachments: Design Document of Encrypted Spark > Shuffle_20150209.docx, Design Document of Encrypted Spark > Shuffle_20150318.docx, Design Document of Encrypted Spark > Shuffle_20150402.docx, Design Document of Encrypted Spark > Shuffle_20150506.docx > > > Encrypted shuffle is enabled in hadoop 2.6 which make the process of shuffle > data safer. This feature is necessary in spark. AES is a specification for > the encryption of electronic data. There are 5 common modes in AES. CTR is > one of the modes. We use two codec JceAesCtrCryptoCodec and > OpensslAesCtrCryptoCodec to enable spark encrypted shuffle which is also used > in hadoop encrypted shuffle. JceAesCtrypoCodec uses encrypted algorithms jdk > provides while OpensslAesCtrCryptoCodec uses encrypted algorithms openssl > provides. > Because ugi credential info is used in the process of encrypted shuffle, we > first enable encrypted shuffle on spark-on-yarn framework. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652217#comment-15652217 ] Krish Dey commented on SPARK-5682: -- The method still seems to be the same as it is. Doesn't this to be changed to accommodate encryption of spill to disk? public UnsafeSorterSpillWriter(BlockManager blockManager, int fileBufferSize, ShuffleWriteMetrics writeMetrics, int numRecordsToWrite) throws IOException { final Tuple2 spilledFileInfo = blockManager.diskBlockManager().createTempLocalBlock(); this.file = spilledFileInfo._2(); this.blockId = spilledFileInfo._1(); this.numRecordsToWrite = numRecordsToWrite; // Unfortunately, we need a serializer instance in order to construct a DiskBlockObjectWriter. // Our write path doesn't actually use this serializer (since we end up calling the `write()` // OutputStream methods), but DiskBlockObjectWriter still calls some methods on it. To work // around this, we pass a dummy no-op serializer. writer = blockManager.getDiskWriter( blockId, file, DummySerializerInstance.INSTANCE, fileBufferSize, writeMetrics); // Write the number of records writeIntToBuffer(numRecordsToWrite, 0); writer.write(writeBuffer, 0, 4); } > Add encrypted shuffle in spark > -- > > Key: SPARK-5682 > URL: https://issues.apache.org/jira/browse/SPARK-5682 > Project: Spark > Issue Type: New Feature > Components: Shuffle >Reporter: liyunzhang_intel >Assignee: Ferdinand Xu > Fix For: 2.1.0 > > Attachments: Design Document of Encrypted Spark > Shuffle_20150209.docx, Design Document of Encrypted Spark > Shuffle_20150318.docx, Design Document of Encrypted Spark > Shuffle_20150402.docx, Design Document of Encrypted Spark > Shuffle_20150506.docx > > > Encrypted shuffle is enabled in hadoop 2.6 which make the process of shuffle > data safer. This feature is necessary in spark. AES is a specification for > the encryption of electronic data. There are 5 common modes in AES. CTR is > one of the modes. We use two codec JceAesCtrCryptoCodec and > OpensslAesCtrCryptoCodec to enable spark encrypted shuffle which is also used > in hadoop encrypted shuffle. JceAesCtrypoCodec uses encrypted algorithms jdk > provides while OpensslAesCtrCryptoCodec uses encrypted algorithms openssl > provides. > Because ugi credential info is used in the process of encrypted shuffle, we > first enable encrypted shuffle on spark-on-yarn framework. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org