Mark Grover created SPARK-20435:
-----------------------------------

             Summary: More thorough redaction of sensitive information from logs/UI, more unit tests
                 Key: SPARK-20435
                 URL: https://issues.apache.org/jira/browse/SPARK-20435
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.2.0
            Reporter: Mark Grover


SPARK-18535 and SPARK-19720 added redaction of sensitive information (e.g. 
Hadoop credential provider passwords, AWS access/secret keys) from event logs, 
YARN logs, and the UI, and from console output, respectively.

While some unit tests were added along with these changes, they only asserted 
that, when a sensitive key was found, redaction took place for that key. They 
didn't assert globally that, when running a full-fledged Spark app (whether on 
YARN or locally), sensitive information was absent from all logs and the UI. 
Such a test would also prevent future regressions if someone unknowingly adds 
extra logging that writes sensitive information to disk or the UI. A sketch of 
such a test follows.
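
Such an end-to-end check might look roughly like the sketch below, written 
against a local master with event logging pointed at a temp directory. The 
object name, the sentinel value, and the overall structure are illustrative 
assumptions, not the actual test added for this issue:

{code}
import java.nio.file.{Files, Paths}

import scala.collection.JavaConverters._

import org.apache.spark.{SparkConf, SparkContext}

object RedactionSmokeTest {
  def main(args: Array[String]): Unit = {
    val secret = "secret_password" // sentinel that must never reach disk
    val eventLogDir = Files.createTempDirectory("spark-events").toString

    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("redaction-smoke-test")
      .set("spark.eventLog.enabled", "true")
      .set("spark.eventLog.dir", eventLogDir)
      // A sensitive setting we expect to be redacted everywhere.
      .set("spark.executorEnv.HADOOP_CREDSTORE_PASSWORD", secret)

    val sc = new SparkContext(conf)
    sc.parallelize(1 to 100).count() // run a trivial job so events are emitted
    sc.stop()                        // finalizes and flushes the event log

    // Global assertion: the raw secret must not occur anywhere in the event log.
    val leaked = Files.list(Paths.get(eventLogDir)).iterator().asScala
      .map(p => new String(Files.readAllBytes(p), "UTF-8"))
      .exists(_.contains(secret))
    assert(!leaked, s"'$secret' leaked into event logs under $eventLogDir")
  }
}
{code}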

Consequently, it was found that in some Java configurations, sensitive 
information was still being leaked in the event logs under the 
{{SparkListenerEnvironmentUpdate}} event, like so:
{code}
"sun.java.command":"org.apache.spark.deploy.SparkSubmit ... --conf 
spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password ...
{code}

"secret_password" should have been redacted.

Moreover, the previous redaction logic only checked whether the key matched 
the secret regex pattern and, if so, redacted its value. That worked for most 
cases. However, in the case above, the key ({{sun.java.command}}) reveals 
nothing by itself, so the value must be searched too. The check therefore 
needs to be expanded to match against values as well, as in the sketch below.
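
A minimal sketch of the expanded check, assuming a pattern mirroring the 
{{spark.redaction.regex}} default of {{(?i)secret|password}}; the object and 
method names are illustrative:

{code}
import scala.util.matching.Regex

object Redactor {
  // Assumed pattern, mirroring the spark.redaction.regex default.
  private val redactionPattern: Regex = "(?i)secret|password".r
  private val replacement = "*********(redacted)"

  // Replace the value when either the key or the value matches. Replacing
  // the whole value is deliberately conservative: the pattern matches the
  // sensitive key name embedded inside the value (e.g. "...PASSWORD=..."),
  // not the secret itself, so replacing only the matched fragment would
  // leave the secret in place.
  def redact(key: String, value: String): (String, String) = {
    val matches = redactionPattern.findFirstIn(key).isDefined ||
      redactionPattern.findFirstIn(value).isDefined
    if (matches) (key, replacement) else (key, value)
  }
}
{code}

With this, {{Redactor.redact("sun.java.command", "... 
HADOOP_CREDSTORE_PASSWORD=secret_password ...")}} yields a redacted value, 
whereas a key-only check would have passed the command line through untouched.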


