GitHub user dmvieira opened a pull request:

    https://github.com/apache/spark/pull/18765

    [SPARK-19720][CORE] Redact sensitive information from SparkSubmit console
    
    This change redacts sensitive information (based on the default password and
    secret regex) from the Spark Submit console logs. Such sensitive information
    is already redacted from event logs, YARN logs, etc.
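
The key-based redaction described above can be sketched roughly as follows. This is an illustrative Python stand-in, not the Scala code in the patch: the `redact` helper, the sample property values, and the exact pattern are assumptions modeled on the "password and secret" regex mentioned in the description; the replacement text is copied from the console output shown in this message.

```python
import re

# Assumed pattern: case-insensitive match on "secret" or "password" in the key.
REDACTION_PATTERN = re.compile(r"(?i)secret|password")
# Replacement text as it appears in the console output quoted in this message.
REDACTION_REPLACEMENT_TEXT = "*********(redacted)"


def redact(pairs, pattern=REDACTION_PATTERN):
    """Return (key, value) pairs with the value masked when the key matches."""
    return [
        (key, REDACTION_REPLACEMENT_TEXT if pattern.search(key) else value)
        for key, value in pairs
    ]


# Hypothetical properties; the secret values are made up for illustration.
props = [
    ("spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD", "hunter2"),
    ("spark.authenticate", "false"),
    ("spark.executorEnv.HADOOP_CREDSTORE_PASSWORD", "hunter2"),
]

for key, value in redact(props):
    print(f"({key},{value})")
```

Matching on the key rather than the value means a property is masked whatever its contents, but a secret stored under an innocuous key name would still be printed.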
    
    Testing was done manually to make sure that the console logs were not
    printing any sensitive information.
    
    Here's some output from the console:
    
    ```
    Spark properties used, including those specified through
     --conf and those from the properties file /etc/spark2/conf/spark-defaults.conf:
      (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
      (spark.authenticate,false)
      (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
    ```
    
    ```
    System properties:
    (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
    (spark.authenticate,false)
    (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
    ```
    There is a risk that if new print statements are added to the console down
    the road, sensitive information may still get leaked, since there is no test
    that asserts on the console log output. I considered it out of the scope of
    this JIRA to write an integration test to make sure new leaks don't happen
    in the future.
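
For illustration only, a test that asserts on console output might look something like the sketch below. It is written in Python rather than against the real SparkSubmit code; `print_properties`, `captured_console`, `check_no_leak`, the pattern, and the sample secret are all hypothetical names and values, chosen only to show the shape of such a check.

```python
import contextlib
import io
import re

# Assumed key pattern and mask text; neither is taken from the Spark source.
SENSITIVE_KEY = re.compile(r"(?i)secret|password")
MASK = "*********(redacted)"


def print_properties(props):
    """Hypothetical stand-in for the verbose console printing."""
    print("System properties:")
    for key, value in props.items():
        shown = MASK if SENSITIVE_KEY.search(key) else value
        print(f"({key},{shown})")


def captured_console(props):
    """Run the printer and return everything it wrote to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        print_properties(props)
    return buffer.getvalue()


def check_no_leak():
    """Fail if the raw secret value shows up anywhere in the console output."""
    secret = "s3cr3t-value"  # made-up value used only by this check
    output = captured_console({
        "spark.executorEnv.HADOOP_CREDSTORE_PASSWORD": secret,
        "spark.authenticate": "false",
    })
    assert secret not in output
    assert "(spark.authenticate,false)" in output
    return output


check_no_leak()
```

Asserting that the raw secret value is absent from the whole captured output, rather than checking individual lines, is what would let such a check catch a print statement added later.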
    
    Running unit tests to make sure nothing else is broken by this change.
    
    Using reference from Mark Grover <m...@apache.org>
    
    Closes #17047 for Spark version 2.1.2.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dmvieira/spark branch-2.1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18765.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18765
    
----
commit 9e757820af7990f37d1cb5f8cd9c989fcf815cdf
Author: Mark Grover <m...@apache.org>
Date:   2017-03-02T18:33:56Z

    [SPARK-19720][CORE] Redact sensitive information from SparkSubmit console
    

----


