[ https://issues.apache.org/jira/browse/SPARK-21702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131537#comment-16131537 ]
George Pongracz edited comment on SPARK-21702 at 8/18/17 12:54 AM:
-------------------------------------------------------------------

*Update:*
When "PartitionBy" has been used, the data-bearing files (files that contain the data payload from the stream) written to S3 report their encryption as "-" in the Properties section when viewed through the AWS S3 GUI and selected using their left-hand check-box.

When written without "PartitionBy", the data-bearing files, viewed and selected the same way, report their encryption as "AES-256".

All related non-data-bearing files, whether or not "PartitionBy" has been used, report their encryption as "AES-256" when selected using their left-hand check-box.

When "PartitionBy" has been used, clicking through the name of a single data-bearing file brings up a dedicated overview screen for that file, which reports it as having AES-256 encryption. This differs from the "-" shown in the parent screen when the same file is selected using its check-box.

As one can see, this labelling of encryption is inconsistent and can cause confusion: a file can appear unencrypted on first inspection, while the click-through view reports it as encrypted.

I think this lowers the weight of this issue, and I can close it if it is deemed a non-issue. However, it would be good if all the files written presented their encryption consistently and correctly, whether data-bearing or not.

I must say I spun my wheels for a while believing I had not encrypted the files, and tried to debug until I stumbled upon what I have just described in this update.

was (Author: gpongracz):
*Update:*
The data-bearing files (files that contain the data payload from the stream) written to S3 report their encryption as "-" in the Properties section when viewed through the AWS S3 GUI and selected using their left-hand check-box.
All related non-data-bearing files report their encryption as "AES-256" when selected using their left-hand check-box.
Clicking through the name of a single data-bearing file brings up a dedicated overview screen for that file, which reports it as having AES-256 encryption.
As one can see, this labelling of encryption is inconsistent and can cause confusion: a file can appear unencrypted on first inspection, while the click-through view reports it as encrypted.
I think this lowers the weight of this issue, and I can close it if it is deemed a non-issue. However, it would be good if all the files written presented their encryption consistently and correctly, whether data-bearing or not.
I must say I lost a bit of time believing I had not encrypted the files, and tried to debug until I stumbled upon what I have just described in this update.
Obviously, this only happens when PartitionBy is used.

> Structured Streaming S3A SSE Encryption Not Visible through AWS S3 GUI when PartitionBy Used
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21702
>                 URL: https://issues.apache.org/jira/browse/SPARK-21702
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.2.0
>         Environment: Hadoop 2.7.3: AWS SDK 1.7.4
>                      Hadoop 2.8.1: AWS SDK 1.10.6
>            Reporter: George Pongracz
>            Priority: Minor
>              Labels: security
>
> Settings:
> .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
> .config("spark.hadoop.fs.s3a.server-side-encryption-algorithm", "AES256")
>
> When writing to an S3 sink from Structured Streaming, the files are encrypted using AES-256.
> When a "PartitionBy" is introduced, the output data files are unencrypted.
> All other supporting files and metadata are encrypted.
> I suspect the write to the temporary location is encrypted and the move/rename is not applying the SSE.
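A minimal sketch for checking the encryption outside the AWS S3 GUI, assuming boto3 is installed and AWS credentials are configured (neither is part of the original report). It calls HeadObject on every object under the sink's output prefix and reads the "ServerSideEncryption" field, which is present only when the object is stored with server-side encryption, so the answer does not depend on which console view is used. The helper names and the bucket/prefix values are hypothetical.

```python
# Sketch (hypothetical helper names): report the server-side encryption
# recorded on each object under an S3 prefix via HeadObject, independent
# of which AWS S3 GUI view is used. Assumes boto3 and AWS credentials.
from collections import Counter


def sse_of(head_response):
    """Return the SSE algorithm recorded on an object, or "none".

    HeadObject includes "ServerSideEncryption" (e.g. "AES256") only when
    the object is stored with server-side encryption.
    """
    return head_response.get("ServerSideEncryption", "none")


def summarize(heads):
    """Count objects per SSE algorithm from a {key: head_response} dict."""
    return Counter(sse_of(head) for head in heads.values())


def fetch_heads(bucket, prefix):
    """Call HeadObject on every object under s3://bucket/prefix."""
    import boto3  # imported here so sse_of()/summarize() work without boto3

    s3 = boto3.client("s3")
    heads = {}
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when no keys match the prefix.
        for obj in page.get("Contents", []):
            heads[obj["Key"]] = s3.head_object(Bucket=bucket, Key=obj["Key"])
    return heads


def report(bucket, prefix):
    """Print one line per object, then the per-algorithm totals."""
    heads = fetch_heads(bucket, prefix)
    for key in sorted(heads):
        print(f"{sse_of(heads[key]):8} {key}")
    print(dict(summarize(heads)))
```

Usage would be along the lines of report("my-bucket", "stream-output/") with a hypothetical bucket and prefix. Since partitioned output lands under column=value/ sub-paths, any part file listed as "none" there would support the suspicion that the move/rename drops SSE, while "AES256" on every key would point to a display inconsistency only.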
-- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org