This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 8c0852c  [SPARK-36398][SQL] Redact sensitive information in Spark Thrift Server log
8c0852c is described below

commit 8c0852ca805a918cebe9f22166887128a03b3222
Author: Kousuke Saruta <saru...@oss.nttdata.com>
AuthorDate: Wed Aug 25 21:30:43 2021 +0900

    [SPARK-36398][SQL] Redact sensitive information in Spark Thrift Server log
    
    ### What changes were proposed in this pull request?
    
    This PR fixes an issue where there is no way to redact sensitive information in the Spark Thrift Server log.
    For example, a JDBC password can be exposed in the log.
    ```
    21/08/25 18:52:37 INFO SparkExecuteStatementOperation: Submitting query 'CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", password="abcde")' with ca14ae38-1aaf-4bf4-a099-06b8e5337613
    ```
    
    ### Why are the changes needed?
    
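    Bug fix. `SparkExecuteStatementOperation` logged the raw statement before redacting it, so `spark.sql.redaction.string.regex` could not mask secrets such as JDBC passwords in the `Submitting query` log message.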
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Ran the Thrift Server, connected to it, and executed `CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", password="abcde");` with `spark.sql.redaction.string.regex=((?i)(?<=password=))(".*")|('.*')`.
    Then confirmed the log.
    ```
    21/08/25 18:54:11 INFO SparkExecuteStatementOperation: Submitting query 'CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", password=*********(redacted))' with ffc627e2-b1a8-4d83-ab6d-d819b3ccd909
    ```
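    The test above relies on replacing every match of the configured regex with a fixed marker. The Scala sketch below reproduces that on the test statement. It is a standalone approximation, not Spark's code: `RedactionSketch` and its `redact` helper are hypothetical stand-ins for `SparkUtils.redact(sqlContext.conf.stringRedactionPattern, statement)`, assumed here to replace all matches with `*********(redacted)`, as the log output suggests.

    ```scala
    import scala.util.matching.Regex

    // Hypothetical, self-contained sketch of regex-based string redaction.
    // Assumption: mirrors the replace-all behaviour seen in the log above;
    // this is not Spark's implementation.
    object RedactionSketch {
      // Assumed replacement marker, copied from the redacted log output.
      val ReplacementText: String = "*********(redacted)"

      // Replace every match of `pattern` in `text`. With no pattern configured,
      // or for null/empty input, return the text unchanged.
      def redact(pattern: Option[Regex], text: String): String = pattern match {
        case Some(regex) if text != null && text.nonEmpty =>
          // quoteReplacement stops '$' or '\' in the marker from being read
          // as group references.
          regex.replaceAllIn(text, Regex.quoteReplacement(ReplacementText))
        case _ => text
      }

      def main(args: Array[String]): Unit = {
        // The spark.sql.redaction.string.regex value used in the test above.
        val pattern = Some("""((?i)(?<=password=))(".*")|('.*')""".r)
        val statement =
          """CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", """ +
            """driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", password="abcde")"""
        println(redact(pattern, statement))
      }
    }
    ```

    Running `main` prints the statement as it appears in the second log, ending in `password=*********(redacted))`. Because `|` has the lowest precedence in a regex, the `(?<=password=)` lookbehind only guards the first alternative. `('.*')` on its own redacts any single-quoted text, so `SELECT 'bob'` becomes `SELECT *********(redacted)`. Both alternatives also use a greedy `.*`, so a match extends to the last quote of that kind in the statement.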
    
    Closes #33832 from sarutak/fix-SPARK-36398.
    
    Authored-by: Kousuke Saruta <saru...@oss.nttdata.com>
    Signed-off-by: Kousuke Saruta <saru...@oss.nttdata.com>
    (cherry picked from commit b2ff01608f5ecdba19630e12478bd370f9766f7b)
    Signed-off-by: Kousuke Saruta <saru...@oss.nttdata.com>
---
 .../spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala    | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
index acb00e4..bb55bb0 100644
--- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
+++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
@@ -219,8 +219,8 @@ private[hive] class SparkExecuteStatementOperation(
 
   override def runInternal(): Unit = {
     setState(OperationState.PENDING)
-    logInfo(s"Submitting query '$statement' with $statementId")
     val redactedStatement = SparkUtils.redact(sqlContext.conf.stringRedactionPattern, statement)
+    logInfo(s"Submitting query '$redactedStatement' with $statementId")
     HiveThriftServer2.eventManager.onStatementStart(
       statementId,
       parentSession.getSessionHandle.getSessionId.toString,
