GitHub user dmvieira opened a pull request: https://github.com/apache/spark/pull/18873
Fixing Python 2.6 tests for Jenkins

## What changes were proposed in this pull request?

While working on PR https://github.com/apache/spark/pull/18802, the tests always failed. This PR fixes the Jenkins tests that were failing with Python 2.6 and includes some backports for Python 2.6.

## How was this patch tested?

Tests passing at Jenkins

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dmvieira/spark fix-python-2.6-tests

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18873.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18873

----
commit 6905976d5fedd7e7dc9e6b578a8bbadfa675fd63
Author: Mark Grover <m...@apache.org>
Date: 2016-11-28T16:59:47Z

[SPARK-18535][UI][YARN] Redact sensitive information from Spark logs and UI

## What changes were proposed in this pull request?

This patch adds a new property called `spark.secret.redactionPattern` that allows users to specify a Scala regex to decide which Spark configuration properties and environment variables in driver and executor environments contain sensitive information. When this regex matches the property or environment variable name, its value is redacted from the environment UI and from various logs, such as YARN and event logs.

This change uses this property to redact information from event logs and YARN logs. It also updates the UI code to adhere to this property instead of hardcoding the logic that decides which properties are sensitive.
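As a rough illustration of the name-based redaction described above, here is a minimal Python sketch. It is not Spark's implementation (that lives in Scala); the function name `redact`, the pattern `secret|password`, the sample values, and the choice to match case-insensitively anywhere in the name are all assumptions made for this example. Only the placeholder string is taken from the redacted output quoted in this commit message.

```python
import re

# Placeholder text copied from the redacted output quoted in this commit message.
REDACTION_TEXT = "*********(redacted)"


def redact(pattern, properties):
    """Return (name, value) pairs, masking the value when the name matches.

    Hypothetical helper: the pattern is searched anywhere in the property
    or environment variable name, ignoring case.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    return [
        (name, REDACTION_TEXT if regex.search(name) else value)
        for name, value in properties
    ]


# Hypothetical pattern and values, chosen only for illustration.
props = [
    ("spark.executorEnv.HADOOP_CREDSTORE_PASSWORD", "hunter2"),
    ("spark.authenticate", "false"),
]
for name, value in redact("secret|password", props):
    print(name, "->", value)
```

Matching on the name rather than the value means a secret is masked wherever it is printed, as long as every code path that prints properties goes through the same helper.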
Here's an image of the UI post-redaction:

![image](https://cloud.githubusercontent.com/assets/1709451/20506215/4cc30654-b007-11e6-8aee-4cde253fba2f.png)

Here's the text in the YARN logs, post-redaction:

``HADOOP_CREDSTORE_PASSWORD -> *********(redacted)``

Here's the text in the event logs, post-redaction:

``...,"spark.executorEnv.HADOOP_CREDSTORE_PASSWORD":"*********(redacted)","spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD":"*********(redacted)",...``

## How was this patch tested?

1. Unit tests were added to ensure that redaction works.
2. A YARN job was run that reads data from S3, with confidential information (the Hadoop credential provider password) provided in the environment variables of the driver and executor. Afterwards, the logs were grepped to make sure the secret password was not mentioned. It was also ensured that the job could read the data from S3 correctly, confirming that the sensitive information still reached the places that need it to read the data.
3. The event logs were checked to make sure the secret password was not mentioned.
4. The UI environment tab was checked to make sure no secret information was displayed.

Author: Mark Grover <m...@apache.org>

Closes #15971 from markgrover/master_redaction.

commit 7b419b4a1dcad7be02441e5e3729540022b51b4a
Author: Mark Grover <m...@apache.org>
Date: 2017-03-02T18:33:56Z

[SPARK-19720][CORE] Redact sensitive information from SparkSubmit console

## What changes were proposed in this pull request?

This change redacts sensitive information (based on the `spark.redaction.regex` property) from the Spark Submit console logs. Such sensitive information is already being redacted from event logs, YARN logs, etc.

## How was this patch tested?

Testing was done manually to make sure that the console logs were not printing any sensitive information.
Here's some output from the console:

```
Spark properties used, including those specified through
 --conf and those from the properties file /etc/spark2/conf/spark-defaults.conf:
  (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
  (spark.authenticate,false)
  (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
```

```
System properties:
(spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
(spark.authenticate,false)
(spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
```

There is a risk that if new print statements are added to the console down the road, sensitive information may still leak, since no test asserts on the console log output. I considered it out of the scope of this JIRA to write an integration test to make sure new leaks don't happen in the future.

Running unit tests to make sure nothing else is broken by this change.

Author: Mark Grover <m...@apache.org>

Closes #17047 from markgrover/master_redaction.

commit 81dc26bd79dad088f533a6b8cc750e5c71abe378
Author: Diogo Munaro <diogo.mun...@corp.globo.com>
Date: 2017-08-02T17:49:38Z

Fixing tests for Jenkins

----