GitHub user markgrover opened a pull request:

    https://github.com/apache/spark/pull/17047

    [SPARK-19720][SPARK SUBMIT] Redact sensitive information from SparkSubmit console

    ## What changes were proposed in this pull request?
    This change redacts sensitive information (based on the `spark.redaction.regex`
    property) from the Spark Submit console logs. Such sensitive information is
    already being redacted from event logs, YARN logs, etc.
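    
    For illustration, here is a minimal Python sketch of key-based redaction. It
    is not Spark's actual (Scala) implementation. The default pattern
    `(?i)secret|password` and the `redact` helper are assumptions, and so is the
    idea that the regex is matched against property names only. The replacement
    text is the one that appears in the console output further down.
    
    ```python
    import re
    
    # Replacement text as it appears in the spark-submit console output.
    REDACTION_REPLACEMENT_TEXT = "*********(redacted)"
    
    # Assumed value for spark.redaction.regex; the real setting is
    # configurable and its default may differ between Spark versions.
    DEFAULT_REDACTION_REGEX = r"(?i)secret|password"
    
    
    def redact(props, pattern=DEFAULT_REDACTION_REGEX):
        """Hypothetical helper: mask the value of every (key, value) pair
        whose key matches the redaction pattern; leave the others as-is."""
        regex = re.compile(pattern)
        return [
            (key, REDACTION_REPLACEMENT_TEXT if regex.search(key) else value)
            for key, value in props
        ]
    
    
    props = [
        ("spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD", "hunter2"),
        ("spark.authenticate", "false"),
    ]
    # Print in the same "(key,value)" shape that spark-submit uses.
    for key, value in redact(props):
        print(f"  ({key},{value})")
    ```
    
    Matching on the key rather than the value means a secret is hidden no
    matter what it looks like, as long as its property name follows a
    recognizable naming convention.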
    
    ## How was this patch tested?
    Testing was done manually to make sure that the console logs did not print
    any sensitive information.
    
    Here's some output from the console:
    
    ```
    Spark properties used, including those specified through
     --conf and those from the properties file /etc/spark2/conf/spark-defaults.conf:
      (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
      (spark.authenticate,false)
      (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
    ```
    
    ```
    System properties:
    (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
    (spark.authenticate,false)
    (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
    ```
    There is a risk that, if new print statements are added to the console later,
    sensitive information may still leak, because no test asserts on the console
    log output. I considered writing an integration test that guards against such
    future leaks to be out of scope for this JIRA.
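    
    A guard of that kind could look roughly like the Python sketch below. It
    captures everything a function prints to stdout and fails if any known secret
    value appears there. `print_properties` and `assert_no_secrets_in_output` are
    hypothetical names, and the `(?i)secret|password` pattern is an assumption. A
    real integration test would instead capture the output of a spark-submit
    process, for example one launched with `--verbose`.
    
    ```python
    import contextlib
    import io
    import re
    
    REDACTED = "*********(redacted)"
    # Assumed redaction pattern, matched against property names.
    SENSITIVE_KEY = re.compile(r"(?i)secret|password")
    
    
    def print_properties(props):
        """Hypothetical stand-in for the spark-submit verbose printout."""
        for key, value in props:
            shown = REDACTED if SENSITIVE_KEY.search(key) else value
            print(f"  ({key},{shown})")
    
    
    def assert_no_secrets_in_output(emit, secrets):
        """Run emit() with stdout captured; raise if any secret value leaked."""
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            emit()
        output = buf.getvalue()
        leaked = [s for s in secrets if s in output]
        if leaked:
            raise AssertionError(f"secret values leaked to console: {leaked}")
        return output
    
    
    props = [
        ("spark.executorEnv.HADOOP_CREDSTORE_PASSWORD", "hunter2"),
        ("spark.authenticate", "false"),
    ]
    # Passes: the password value is redacted before it is printed.
    assert_no_secrets_in_output(lambda: print_properties(props), secrets=["hunter2"])
    ```
    
    Such a check only protects against leaking the secret values it is told
    about, so a test would need to seed known dummy secrets through --conf and
    the properties file before scanning the output.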
    
    I am also running the unit tests to make sure this change does not break anything else.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/markgrover/spark master_redaction

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17047.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17047
    
----
commit 000efb1e3152f837e01ce1f80ae108c596f9baa5
Author: Mark Grover <m...@apache.org>
Date:   2017-02-24T01:30:05Z

    [SPARK-19720][SPARK SUBMIT] Redact sensitive information from SparkSubmit console output
    
    This change redacts sensitive information (based on the spark.redaction.regex
    property) from the Spark Submit console logs. Such sensitive information is
    already being redacted from event logs, YARN logs, etc.
    
    Testing was done manually to make sure that the console logs did not print
    any sensitive information.
    Here's some output from the console:
    Spark properties used, including those specified through
     --conf and those from the properties file /etc/spark2/conf/spark-defaults.conf:
      (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
      (spark.authenticate,false)
      (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))

----