[ https://issues.apache.org/jira/browse/SPARK-21702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131537#comment-16131537 ]

George Pongracz edited comment on SPARK-21702 at 8/18/17 12:54 AM:
-------------------------------------------------------------------

*Update:*

When "PartitionBy" is used, the data-bearing files (files that contain the data
payload from the stream) written to S3 report their encryption as "-" in the
properties section when viewed through the AWS S3 GUI and selected using their
LHS check-box.

Data-bearing files written without "PartitionBy", when viewed through the AWS S3
GUI and selected using their LHS check-box, report their encryption as
"AES-256" in the properties section.

All related non-data-bearing files, irrespective of whether "PartitionBy" has
been used, report their encryption as "AES-256" in the properties section when
selected using their LHS check-box.

Clicking through the name of a single data-bearing file written with
"PartitionBy" brings up a dedicated overview screen for the file, which reports
it as having AES-256 encryption. This differs from the "-" reported for the
same file in the parent screen when it is selected using its LHS check-box.

As one can see, this labelling of encryption is inconsistent and can cause
confusion: on first inspection a file seems unencrypted, whilst on deeper
inspection via click-through it reports as encrypted.

I think this lowers the severity of this issue, and I can close it if it is
deemed a non-issue; however, it would be good if all the files written
presented their encryption consistently and correctly, whether data or
non-data bearing.

I must say I spun my wheels for a while, believing I had not enabled encryption
and trying to debug, until I stumbled upon what I have just described in this
update.
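One way to see the actual state without relying on the console's list view is to ask S3 for each object's metadata directly. The following is a minimal Scala sketch, assuming the AWS SDK for Java 1.x (1.10.x or similar, as in the issue's Environment field) is on the classpath, credentials come from the default provider chain, and the bucket is reachable through the client's default endpoint. The object name `SseReport` and its bucket/prefix arguments are hypothetical. For each object under the prefix it issues a HEAD request via `getObjectMetadata` and prints the reported SSE algorithm (`AES256` for SSE-S3), or "-" when none is returned.

```scala
import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.ObjectListing
import scala.collection.JavaConverters._

// Sketch: print the SSE algorithm S3 reports for every object under a prefix.
// Hypothetical usage: SseReport <bucket> <prefix>
object SseReport {
  def main(args: Array[String]): Unit = {
    require(args.length == 2, "usage: SseReport <bucket> <prefix>")
    val bucket = args(0)
    val prefix = args(1)

    // Default credential provider chain and default endpoint (assumption:
    // the bucket is reachable through it).
    val s3 = new AmazonS3Client()

    var listing: ObjectListing = s3.listObjects(bucket, prefix)
    var more = true
    while (more) {
      for (summary <- listing.getObjectSummaries.asScala) {
        // HEAD request; getSSEAlgorithm returns null when no SSE header is set.
        val sse = Option(s3.getObjectMetadata(bucket, summary.getKey).getSSEAlgorithm)
        println(s"${sse.getOrElse("-")}\t${summary.getKey}")
      }
      // Listings are paginated; keep fetching while the result is truncated.
      if (listing.isTruncated) listing = s3.listNextBatchOfObjects(listing)
      else more = false
    }
  }
}
```

Pointing it at the partitioned output path shows whether the data files under each partition directory carry SSE, independently of how the console's list view labels them.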


was (Author: gpongracz):
*Update:*

The data-bearing files (files that contain the data payload from the stream)
written to S3 report their encryption as "-" in the properties section when
viewed through the AWS S3 GUI and selected using their LHS check-box.

All related non-data-bearing files report their encryption as "AES-256" in the
properties section when selected using their LHS check-box.

Clicking through the name of a single data-bearing file brings up a dedicated
overview screen for the file, which reports it as having AES-256 encryption.

As one can see, this labelling of encryption is inconsistent and can cause
confusion: on first inspection a file seems unencrypted, whilst on deeper
inspection via click-through it reports as encrypted.

I think this lowers the severity of this issue, and I can close it if it is
deemed a non-issue; however, it would be good if all the files written
presented their encryption consistently and correctly, whether data or
non-data bearing.

I must say I lost a bit of time believing I had not enabled encryption, and I
tried to debug until I stumbled upon what I have just described in this update.

Obviously, this only happens when PartitionBy is used.

> Structured Streaming S3A SSE Encryption Not Visible through AWS S3 GUI when 
> PartitionBy Used
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21702
>                 URL: https://issues.apache.org/jira/browse/SPARK-21702
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.2.0
>         Environment: Hadoop 2.7.3: AWS SDK 1.7.4
> Hadoop 2.8.1: AWS SDK 1.10.6
>            Reporter: George Pongracz
>            Priority: Minor
>              Labels: security
>
> Settings:
>       .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
>       .config("spark.hadoop.fs.s3a.server-side-encryption-algorithm", "AES256")
> When writing to an S3 sink from Structured Streaming, the files are encrypted
> using AES-256.
> When a "PartitionBy" is introduced, the output data files are unencrypted.
> All other supporting files and metadata are encrypted.
> I suspect the write to temp is encrypted and the move/rename is not applying
> the SSE.
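The reported setup can be reproduced with a small job. This is a minimal Scala sketch, assuming Spark 2.2.0 with the hadoop-aws module on the classpath; the object name, the bucket `my-bucket`, the paths, and the partition column `part` are hypothetical placeholders, and the built-in `rate` source stands in for whatever stream was used originally. It applies the two settings quoted above and writes to a partitioned Parquet file sink.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Sketch of the reported setup: S3A with SSE-S3 plus a partitioned file sink.
object Spark21702Repro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SPARK-21702-repro")
      .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
      .config("spark.hadoop.fs.s3a.server-side-encryption-algorithm", "AES256")
      .getOrCreate()

    // The built-in "rate" source emits (timestamp, value) rows; "part" is a
    // hypothetical partition column derived from value.
    val input = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "10")
      .load()
      .withColumn("part", col("value") % 4)

    // Placeholder S3A paths; "my-bucket" is hypothetical.
    val query = input.writeStream
      .format("parquet")
      .partitionBy("part")
      .option("checkpointLocation", "s3a://my-bucket/spark-21702/checkpoint")
      .start("s3a://my-bucket/spark-21702/output")

    query.awaitTermination()
  }
}
```

Removing the `.partitionBy("part")` call gives the unpartitioned case for comparison.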



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
