[ 
https://issues.apache.org/jira/browse/SPARK-20435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981947#comment-15981947
 ] 

Marcelo Vanzin commented on SPARK-20435:
----------------------------------------

bq. The user copies over the entire conf (say from /etc/spark/conf to 
$USER/custom-conf). And, then updates the spark-defaults.conf with the 
appropriate properties containing the password.

While that can be automated with a short script, it's unnecessarily awkward. 
I've seen the idea floated of having a user-specific config file that Spark 
automatically appends to the default configuration, but I haven't seen anybody 
actually implement it. That would solve this problem more cleanly.
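A minimal sketch of that "user config appended to the defaults" idea. To be clear, this is not an existing Spark feature; the precedence rule (user entries override defaults) and the helper names are assumptions for illustration only.

```python
# Sketch: layer a per-user conf file on top of the site-wide defaults, so the
# user only needs to write the one line containing the password instead of
# copying the whole conf dir. Not Spark's actual behavior; illustrative only.
def parse_conf(lines):
    """Parse spark-defaults.conf-style lines ('key value', '#' comments)."""
    conf = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition(" ")
        conf[key] = value.strip()
    return conf

def merge_conf(default_lines, user_lines):
    """Defaults first, then the user's file layered on top (user wins)."""
    merged = parse_conf(default_lines)
    merged.update(parse_conf(user_lines))
    return merged
```

With something like this, the defaults (e.g. from /etc/spark/conf/spark-defaults.conf) stay untouched and the user's file carries only the sensitive overrides.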

bq. Supply the password over command line to spark-submit.

I hope this is not documented as a recommended way of doing this anywhere, 
because it's just not secure. Worst case, the user should be setting passwords 
via env variables (as is allowed for S3 credentials, for example, using 
{{AWS_ACCESS_KEY_ID}} / {{AWS_SECRET_ACCESS_KEY}}).
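A quick sketch of why the environment-variable route is preferable: secrets passed on the command line are visible to anyone who can read the process table, while environment variables are not echoed into logged command lines. The {{AWS_ACCESS_KEY_ID}} / {{AWS_SECRET_ACCESS_KEY}} names are the real AWS conventions; the {{build_submit}} helper below is made up for illustration, not a Spark API.

```python
import os

# Keep secrets off spark-submit's argv: they would otherwise appear in `ps`
# output and in any logged/event-logged command line. Illustrative helper only.
def build_submit(app_jar, secret_env):
    """Return (argv, env): secrets go into the environment, never onto argv."""
    argv = ["spark-submit", "--class", "example.Main", app_jar]
    env = dict(os.environ)
    env.update(secret_env)  # e.g. AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
    return argv, env
```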

> More thorough redaction of sensitive information from logs/UI, more unit tests
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-20435
>                 URL: https://issues.apache.org/jira/browse/SPARK-20435
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Mark Grover
>
> SPARK-18535 and SPARK-19720 added redaction of sensitive information (e.g. 
> hadoop credential provider password, AWS access/secret keys) from event logs 
> + YARN logs + UI, and from the console output, respectively.
> While some unit tests were added along with these changes, they only asserted 
> that, when a sensitive key was found, redaction took place for that key. They 
> didn't assert globally that, when running a full-fledged Spark app (whether on 
> YARN or locally), sensitive information was absent from all of the logs and 
> the UI. Such a test would also prevent regressions in the future if someone 
> unknowingly adds extra logging that writes sensitive information to disk or 
> the UI.
> Consequently, it was found that in some Java configurations, sensitive 
> information was still being leaked in the event logs under the 
> {{SparkListenerEnvironmentUpdate}} event, like so:
> {code}
> "sun.java.command":"org.apache.spark.deploy.SparkSubmit ... --conf 
> spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password ...
> {code}
> "secret_password" should have been redacted.
> Moreover, the previous redaction logic only checked whether the key matched 
> the secret regex pattern; if it did, the key's value was redacted. That worked 
> for most cases. However, in the above case, the key (sun.java.command) doesn't 
> reveal anything sensitive by itself, so the value needs to be searched too. 
> The check therefore needs to be expanded to match against values as well.
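The expanded check described above can be sketched as follows. This is an illustrative model, not Spark's actual implementation (which is in Scala); the function and constant names are made up, though the pattern mirrors the default value of {{spark.redaction.regex}}, {{(?i)secret|password}}.

```python
import re

# Mirrors spark.redaction.regex's default of "(?i)secret|password".
SECRET = re.compile(r"secret|password", re.IGNORECASE)
# Matches an offending "name=value" token inside a longer string.
TOKEN = re.compile(r"(\S*?(?:secret|password)\S*?=)\S+", re.IGNORECASE)
REDACTED = "*********(redacted)"

def redact(kvs):
    """Redact a list of (key, value) pairs.

    Old behavior: if the key matches, hide the whole value.
    Expanded behavior this issue asks for: if only the *value* matches
    (e.g. the key is sun.java.command), redact the offending
    name=value tokens inside the value instead of missing them.
    """
    out = []
    for key, value in kvs:
        if SECRET.search(key):
            out.append((key, REDACTED))
        elif SECRET.search(value):
            out.append((key, TOKEN.sub(r"\g<1>" + REDACTED, value)))
        else:
            out.append((key, value))
    return out
```

Matching against values as well as keys is what catches the {{sun.java.command}} case quoted above while leaving non-sensitive entries untouched.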



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
