[ 
https://issues.apache.org/jira/browse/SPARK-18535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15685225#comment-15685225
 ] 

Mark Grover edited comment on SPARK-18535 at 11/22/16 12:36 AM:
----------------------------------------------------------------

I just issued a PR for this that adds a new customizable property for 
determining which configuration properties are sensitive. Attached is an image 
from the UI with this change.
Here's the text in the YARN logs, with this change:
{{HADOOP_CREDSTORE_PASSWORD -> *********(redacted)}}

Here's the text in the event logs, with this change:
{code}
...,"spark.executorEnv.HADOOP_CREDSTORE_PASSWORD":"*********(redacted)","spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD":"*********(redacted)",...
{code}
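
As a rough illustration of the idea (not the code from the PR), redaction driven by a customizable pattern could be sketched as below. The names {{SENSITIVE_KEY_PATTERN}} and {{redact}}, the default pattern, and the sample values are all assumptions made for this example; only the {{*********(redacted)}} replacement text is taken from the output shown above.

```python
import re

# Hypothetical default pattern. The real property name and its default value
# are not stated in this comment, so this is only an assumption.
SENSITIVE_KEY_PATTERN = re.compile(r"secret|password", re.IGNORECASE)

# Replacement text, copied from the log output quoted in this comment.
REDACTED = "*********(redacted)"


def redact(properties, pattern=SENSITIVE_KEY_PATTERN):
    """Return a new list of (key, value) pairs, masking the value of every
    pair whose key matches the given pattern."""
    return [
        (key, REDACTED if pattern.search(key) else value)
        for key, value in properties
    ]


# Made-up sample properties for illustration.
props = [
    ("spark.executorEnv.HADOOP_CREDSTORE_PASSWORD", "not-a-real-password"),
    ("spark.app.name", "demo"),
]

redacted = redact(props)

# A user-supplied pattern extends what counts as sensitive.
custom = redact(props, re.compile(r"app\.name"))
```

Passing the pattern as a parameter is what makes the set of sensitive keys configurable instead of fixed in the code.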


> Redact sensitive information from Spark logs and UI
> ---------------------------------------------------
>
>                 Key: SPARK-18535
>                 URL: https://issues.apache.org/jira/browse/SPARK-18535
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI, YARN
>    Affects Versions: 2.1.0
>            Reporter: Mark Grover
>         Attachments: redacted.png
>
>
> A Spark user may have to provide sensitive information for a Spark 
> configuration property, or source an environment variable in the 
> executor or driver environment that contains sensitive information. A good 
> example of this is reading/writing data from/to S3 using Spark. 
> The S3 secret and S3 access key can be placed in a [Hadoop credential 
> provider|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html].
>  However, one still needs to provide the password for the credential provider 
> to Spark, which is typically supplied as an environment variable to the 
> driver and executor environments. This environment variable shows up in logs, 
> and may also show up in the UI.
> 1. For logs, it shows up in a few places:
>   1A. Event logs, under the {{SparkListenerEnvironmentUpdate}} event.
>   1B. YARN logs, when printing the executor launch context.
> 2. For the UI, it would show up in the _Environment_ tab, but it is redacted if 
> its name contains the words "password" or "secret". These magic words 
> are 
> [hardcoded|https://github.com/apache/spark/blob/a2d464770cd183daa7d727bf377bde9c21e29e6a/core/src/main/scala/org/apache/spark/ui/env/EnvironmentPage.scala#L30]
>  and hence not customizable.
> This JIRA is to track the work to make sure sensitive information is redacted 
> from all logs and UIs in Spark, while still being passed on to every place 
> that needs it.
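
To make the limitation in point 2 concrete, here is a minimal sketch of a check with a fixed word list. It is written in Python for brevity and is not the Spark source; the function name, the {{******}} mask, the case-insensitive match, and the {{MY_API_TOKEN}} variable are assumptions for illustration.

```python
# The two words named in the description above. They are fixed in the code,
# so a user cannot add to them.
HARDCODED_WORDS = ("password", "secret")

# Placeholder mask text, assumed for this sketch.
MASK = "******"


def mask_for_environment_tab(key, value):
    """Mask the value when the key contains one of the hardcoded words.
    The comparison is case-insensitive, which is an assumption here."""
    if any(word in key.lower() for word in HARDCODED_WORDS):
        return key, MASK
    return key, value


# Caught, because the name contains "password".
caught = mask_for_environment_tab("HADOOP_CREDSTORE_PASSWORD", "s3cr3t-value")

# Missed: a hypothetical sensitive variable whose name uses neither word.
missed = mask_for_environment_tab("MY_API_TOKEN", "s3cr3t-value")
```

Any sensitive value stored under a name outside the fixed list passes through unmasked, which is the gap a customizable setting would close.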



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
