[jira] [Commented] (SPARK-32165) SessionState leaks SparkListener with multiple SparkSession
[ https://issues.apache.org/jira/browse/SPARK-32165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476442#comment-17476442 ]

Denis Krivenko commented on SPARK-32165:

The issue is still reproducible on Spark 3.2.0. [~Ngone51], could you please provide more details on why your PRs were not merged and were closed automatically? I think the Priority could be changed to Critical, because the definition "Crashes, loss of data, severe memory leak" applies. That is exactly what happens when running Spark Thrift Server.

> SessionState leaks SparkListener with multiple SparkSession
> ---
>
>                 Key: SPARK-32165
>                 URL: https://issues.apache.org/jira/browse/SPARK-32165
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Xianjin YE
>            Priority: Major
>
> Copied from [https://github.com/apache/spark/pull/28128#issuecomment-653102770]
> I'd like to point out that this PR (https://github.com/apache/spark/pull/28128) doesn't fix the memory leak completely. Once {{SessionState}} is touched, it adds two more listeners to the SparkContext, namely {{SQLAppStatusListener}} and {{ExecutionListenerBus}}.
> It can be reproduced easily as follows:
> {code:java}
> test("SPARK-31354: SparkContext only register one SparkSession ApplicationEnd listener") {
>   val conf = new SparkConf()
>     .setMaster("local")
>     .setAppName("test-app-SPARK-31354-1")
>   val context = new SparkContext(conf)
>   SparkSession
>     .builder()
>     .sparkContext(context)
>     .master("local")
>     .getOrCreate()
>     .sessionState // this touches the sessionState
>   val postFirstCreation = context.listenerBus.listeners.size()
>   SparkSession.clearActiveSession()
>   SparkSession.clearDefaultSession()
>   SparkSession
>     .builder()
>     .sparkContext(context)
>     .master("local")
>     .getOrCreate()
>     .sessionState // this touches the sessionState
>   val postSecondCreation = context.listenerBus.listeners.size()
>   SparkSession.clearActiveSession()
>   SparkSession.clearDefaultSession()
>   assert(postFirstCreation == postSecondCreation)
> }
> {code}
> The problem can be reproduced by the above code.
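Editor's note: until the listener cleanup lands, long-running services can avoid the accumulation by not creating fresh sessions against the same SparkContext in the first place. A minimal workaround sketch, assuming the application can route all work through one session (names are illustrative, not from the issue):

{code:scala}
// Workaround sketch, not the fix: keep one shared SparkSession so the
// SQLAppStatusListener / ExecutionListenerBus pair is registered only once,
// instead of once per getOrCreate() after clearActiveSession().
import org.apache.spark.sql.SparkSession

object SharedSession {
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local")
    .appName("shared-session")
    .getOrCreate()
}
{code}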
[jira] [Commented] (SPARK-35262) Memory leak when dataset is being persisted
[ https://issues.apache.org/jira/browse/SPARK-35262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472248#comment-17472248 ]

Denis Krivenko commented on SPARK-35262:

[~iamelin] Could you please check/confirm that the issue still exists in 3.2.0?

> Memory leak when dataset is being persisted
> ---
>
>                 Key: SPARK-35262
>                 URL: https://issues.apache.org/jira/browse/SPARK-35262
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.1
>            Reporter: Igor Amelin
>            Priority: Major
>
> If a Java or Scala application with a SparkSession runs for a long time and persists a lot of datasets, it can crash because of a memory leak.
> I've noticed the following: when we persist a dataset, the SparkSession used to load that dataset is cloned in CacheManager, and this clone is added as a listener to `listenersPlusTimers` in `ListenerBus`. The clone isn't removed from the list of listeners afterwards, e.g. when the dataset is unpersisted. If we persist a lot of datasets, the SparkSession is cloned and added to `ListenerBus` many times. This leads to a memory leak, since the `listenersPlusTimers` list becomes very large.
> I've found out that the SparkSession is cloned in CacheManager when the parameters `spark.sql.sources.bucketing.autoBucketedScan.enabled` and `spark.sql.adaptive.enabled` are true. The first one is true by default, and this default behavior leads to the problem. When auto bucketed scan is disabled, the SparkSession isn't cloned, there are no duplicates in ListenerBus, and the memory leak doesn't occur.
> Here is a small Java application to reproduce the memory leak: [https://github.com/iamelin/spark-memory-leak]
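Editor's note: based on the reporter's own observation above, one mitigation until a fix lands is to disable auto bucketed scan so that CacheManager never clones the session. A minimal sketch (trade-off: this gives up the bucketed-scan optimization; the config name is taken directly from the report):

{code:scala}
// Mitigation sketch: with autoBucketedScan disabled, CacheManager does not
// clone the SparkSession on persist, so no clones accumulate in ListenerBus.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local")
  .appName("persist-leak-mitigation")
  .config("spark.sql.sources.bucketing.autoBucketedScan.enabled", "false")
  .getOrCreate()
{code}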
[jira] [Updated] (SPARK-37856) Executor pods keep existing if driver container was restarted
[ https://issues.apache.org/jira/browse/SPARK-37856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Denis Krivenko updated SPARK-37856:
-----------------------------------
    Environment: 
Kubernetes 1.20 | Spark 3.1.2 | Hadoop 3.2.0 | Java 11 | Scala 2.12
Kubernetes 1.20 | Spark 3.2.0 | Hadoop 3.3.1 | Java 11 | Scala 2.12

  was:
* Kubernetes 1.20
* Spark 3.1.2
* Hadoop 3.2.0
* Java 11
* Scala 2.12

and

* Kubernetes 1.20
* Spark 3.2.0
* Hadoop 3.3.1
* Java 11
* Scala 2.12

> Executor pods keep existing if driver container was restarted
> ---
>
>                 Key: SPARK-37856
>                 URL: https://issues.apache.org/jira/browse/SPARK-37856
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.1.2, 3.2.0
>         Environment: Kubernetes 1.20 | Spark 3.1.2 | Hadoop 3.2.0 | Java 11 | Scala 2.12
>                      Kubernetes 1.20 | Spark 3.2.0 | Hadoop 3.3.1 | Java 11 | Scala 2.12
>            Reporter: Denis Krivenko
>            Priority: Minor
>
> I run Spark Thrift Server on a Kubernetes cluster, so the driver pod runs continuously and creates and manages executor pods. From time to time an OOM issue occurs on the driver pod or on executor pods. When it happens on:
> * an executor - the executor pod is deleted and the driver creates a new executor pod instead. This works as expected.
> * the driver - Kubernetes restarts the driver container and the driver creates new executor pods. All previous executors stop, but their pods still exist, with *Error* state for Spark 3.1.2 or with *Completed* state for Spark 3.2.0.
> The behavior can be reproduced by restarting a pod container with the command
> {code:java}
> kubectl exec POD_NAME -c CONTAINER_NAME -- /sbin/killall5{code}
> The property _spark.kubernetes.executor.deleteOnTermination_ is set to *true* by default.
> If I delete the driver pod, all executor pods (in any state) are also deleted completely.
> +Pod list+
> {code:java}
> NAME                                           READY   STATUS      RESTARTS   AGE
> spark-thrift-server-85cf5d689b-vvrwd           1/1     Running     1          3d15h
> spark-thrift-server-198cc57e3f9a7400-exec-10   1/1     Running     0          86m
> spark-thrift-server-198cc57e3f9a7400-exec-6    1/1     Running     0          12h
> spark-thrift-server-198cc57e3f9a7400-exec-8    1/1     Running     0          9h
> spark-thrift-server-198cc57e3f9a7400-exec-9    1/1     Running     0          3h12m
> spark-thrift-server-1a9aee7e31f36eea-exec-17   0/1     Completed   0          38h
> spark-thrift-server-1a9aee7e31f36eea-exec-18   0/1     Completed   0          38h
> spark-thrift-server-1a9aee7e31f36eea-exec-19   0/1     Completed   0          36h
> spark-thrift-server-1a9aee7e31f36eea-exec-21   0/1     Completed   0          24h
> {code}
> +Driver pod+
> {code:java}
> apiVersion: v1
> kind: Pod
> metadata:
>   name: spark-thrift-server-85cf5d689b-vvrwd
>   uid: b69a7c68-a767-4e3b-939c-061347b1c25e
> spec:
>   ...
> status:
>   containerStatuses:
>   - containerID: containerd://7206acf424aa30b6f8533c0e32c99ebfdc5ee80648e76289f6bd2f87460ddcd3
>     image: xxx/spark:3.2.0
>     lastState:
>       terminated:
>         containerID: containerd://fe3cacb8e6470ac37dcd50d525ae3d54c8b6bfef3558325bc22e7b40daab1703
>         exitCode: 143
>         finishedAt: "2022-01-09T16:09:50Z"
>         reason: OOMKilled
>         startedAt: "2022-01-07T00:32:21Z"
>     name: spark-thrift-server
>     ready: true
>     restartCount: 1
>     started: true
>     state:
>       running:
>         startedAt: "2022-01-09T16:09:51Z" {code}
> Executor pod
> {code:java}
> apiVersion: v1
> kind: Pod
> metadata:
>   name: spark-thrift-server-1a9aee7e31f36eea-exec-17
>   ownerReferences:
>   - apiVersion: v1
>     controller: true
>     kind: Pod
>     name: spark-thrift-server-85cf5d689b-vvrwd
>     uid: b69a7c68-a767-4e3b-939c-061347b1c25e
> spec:
>   ...
> status:
>   containerStatuses:
>   - containerID: containerd://75c68190147ba980f4b9014eef3989ddc2ee30de321fd1119957b6684a995c19
>     image: xxx/spark:3.2.0
>     lastState: {}
>     name: spark-kubernetes-executor
>     ready: false
>     restartCount: 0
>     started: false
>     state:
>       terminated:
>         containerID: containerd://75c68190147ba980f4b9014eef3989ddc2ee30de321fd1119957b6684a995c19
>         exitCode: 0
>         finishedAt: "2022-01-09T16:08:57Z"
>         reason: Completed
>         startedAt: "2022-01-09T01:39:15Z" {code}
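Editor's note: until the root cause is fixed, the leftover executor pods can be garbage-collected manually. A hedged sketch, assuming the default label Spark applies to executor pods (spark-role=executor); NAMESPACE is a placeholder for your deployment:

{code:sh}
# Delete executor pods left in Completed (Spark 3.2.0) or Error (Spark 3.1.2)
# state after a driver container restart.
kubectl delete pod -n NAMESPACE -l spark-role=executor \
  --field-selector=status.phase=Succeeded
kubectl delete pod -n NAMESPACE -l spark-role=executor \
  --field-selector=status.phase=Failed
{code}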
[jira] [Created] (SPARK-37856) Executor pods keep existing if driver container was restarted
Denis Krivenko created SPARK-37856:
--------------------------------------

             Summary: Executor pods keep existing if driver container was restarted
                 Key: SPARK-37856
                 URL: https://issues.apache.org/jira/browse/SPARK-37856
             Project: Spark
          Issue Type: Bug
          Components: Kubernetes
    Affects Versions: 3.2.0, 3.1.2
         Environment: 
* Kubernetes 1.20
* Spark 3.1.2
* Hadoop 3.2.0
* Java 11
* Scala 2.12

and

* Kubernetes 1.20
* Spark 3.2.0
* Hadoop 3.3.1
* Java 11
* Scala 2.12
            Reporter: Denis Krivenko


I run Spark Thrift Server on a Kubernetes cluster, so the driver pod runs continuously and creates and manages executor pods. From time to time an OOM issue occurs on the driver pod or on executor pods. When it happens on:
* an executor - the executor pod is deleted and the driver creates a new executor pod instead. This works as expected.
* the driver - Kubernetes restarts the driver container and the driver creates new executor pods. All previous executors stop, but their pods still exist, with *Error* state for Spark 3.1.2 or with *Completed* state for Spark 3.2.0.

The behavior can be reproduced by restarting a pod container with the command
{code:java}
kubectl exec POD_NAME -c CONTAINER_NAME -- /sbin/killall5{code}
The property _spark.kubernetes.executor.deleteOnTermination_ is set to *true* by default.

If I delete the driver pod, all executor pods (in any state) are also deleted completely.

+Pod list+
{code:java}
NAME                                           READY   STATUS      RESTARTS   AGE
spark-thrift-server-85cf5d689b-vvrwd           1/1     Running     1          3d15h
spark-thrift-server-198cc57e3f9a7400-exec-10   1/1     Running     0          86m
spark-thrift-server-198cc57e3f9a7400-exec-6    1/1     Running     0          12h
spark-thrift-server-198cc57e3f9a7400-exec-8    1/1     Running     0          9h
spark-thrift-server-198cc57e3f9a7400-exec-9    1/1     Running     0          3h12m
spark-thrift-server-1a9aee7e31f36eea-exec-17   0/1     Completed   0          38h
spark-thrift-server-1a9aee7e31f36eea-exec-18   0/1     Completed   0          38h
spark-thrift-server-1a9aee7e31f36eea-exec-19   0/1     Completed   0          36h
spark-thrift-server-1a9aee7e31f36eea-exec-21   0/1     Completed   0          24h
{code}
+Driver pod+
{code:java}
apiVersion: v1
kind: Pod
metadata:
  name: spark-thrift-server-85cf5d689b-vvrwd
  uid: b69a7c68-a767-4e3b-939c-061347b1c25e
spec:
  ...
status:
  containerStatuses:
  - containerID: containerd://7206acf424aa30b6f8533c0e32c99ebfdc5ee80648e76289f6bd2f87460ddcd3
    image: xxx/spark:3.2.0
    lastState:
      terminated:
        containerID: containerd://fe3cacb8e6470ac37dcd50d525ae3d54c8b6bfef3558325bc22e7b40daab1703
        exitCode: 143
        finishedAt: "2022-01-09T16:09:50Z"
        reason: OOMKilled
        startedAt: "2022-01-07T00:32:21Z"
    name: spark-thrift-server
    ready: true
    restartCount: 1
    started: true
    state:
      running:
        startedAt: "2022-01-09T16:09:51Z" {code}
Executor pod
{code:java}
apiVersion: v1
kind: Pod
metadata:
  name: spark-thrift-server-1a9aee7e31f36eea-exec-17
  ownerReferences:
  - apiVersion: v1
    controller: true
    kind: Pod
    name: spark-thrift-server-85cf5d689b-vvrwd
    uid: b69a7c68-a767-4e3b-939c-061347b1c25e
spec:
  ...
status:
  containerStatuses:
  - containerID: containerd://75c68190147ba980f4b9014eef3989ddc2ee30de321fd1119957b6684a995c19
    image: xxx/spark:3.2.0
    lastState: {}
    name: spark-kubernetes-executor
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: containerd://75c68190147ba980f4b9014eef3989ddc2ee30de321fd1119957b6684a995c19
        exitCode: 0
        finishedAt: "2022-01-09T16:08:57Z"
        reason: Completed
        startedAt: "2022-01-09T01:39:15Z" {code}
[jira] [Created] (SPARK-37132) Incorrect Spark 3.2.0 package names with included Hadoop binaries
Denis Krivenko created SPARK-37132:
--------------------------------------

             Summary: Incorrect Spark 3.2.0 package names with included Hadoop binaries
                 Key: SPARK-37132
                 URL: https://issues.apache.org/jira/browse/SPARK-37132
             Project: Spark
          Issue Type: Bug
          Components: Build, Documentation
    Affects Versions: 3.2.0
            Reporter: Denis Krivenko


The *Spark 3.2.0 + Hadoop* packages contain Hadoop 3.3 binaries, but the file names still refer to Hadoop 3.2, i.e. _spark-3.2.0-bin-*hadoop3.2*.tgz_

[https://dlcdn.apache.org/spark/spark-3.2.0/]
[https://dlcdn.apache.org/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2.tgz]
[https://dlcdn.apache.org/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2-scala2.13.tgz]
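Editor's note: the mismatch can be confirmed locally by inspecting the bundled jars. A sketch using the file names from the links above (the exact jar versions listed in the comment are an assumption based on the report that Hadoop 3.3 binaries are included):

{code:sh}
# Extract the archive and list the bundled Hadoop client jars.
tar -xzf spark-3.2.0-bin-hadoop3.2.tgz
ls spark-3.2.0-bin-hadoop3.2/jars/ | grep hadoop-client
# If the report is correct, this lists 3.3.x jars despite the "hadoop3.2"
# name, e.g. hadoop-client-api-3.3.1.jar and hadoop-client-runtime-3.3.1.jar.
{code}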
[jira] [Commented] (SPARK-36398) Redact sensitive information in Spark Thrift Server log
[ https://issues.apache.org/jira/browse/SPARK-36398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17399169#comment-17399169 ]

Denis Krivenko commented on SPARK-36398:

Should be fixed by [PR#33743|https://github.com/apache/spark/pull/33743]

> Redact sensitive information in Spark Thrift Server log
> ---
>
>                 Key: SPARK-36398
>                 URL: https://issues.apache.org/jira/browse/SPARK-36398
>             Project: Spark
>          Issue Type: Bug
>          Components: Security, SQL
>    Affects Versions: 3.1.2
>            Reporter: Denis Krivenko
>            Priority: Major
>
> Spark Thrift Server logs queries without redacting sensitive information in [org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.scala|https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala#L188]
> {code:scala}
> override def runInternal(): Unit = {
>   setState(OperationState.PENDING)
>   logInfo(s"Submitting query '$statement' with $statementId")
> {code}
> Logs:
> {code:sh}
> 21/08/03 20:49:46 INFO SparkExecuteStatementOperation: Submitting query 'CREATE OR REPLACE TEMPORARY VIEW test_view
> USING org.apache.spark.sql.jdbc
> OPTIONS (
>   url="jdbc:mysql://example.com:3306",
>   driver="com.mysql.jdbc.Driver",
>   dbtable="example.test",
>   user="my_username",
>   password="my_password"
> )' with 37e5d2cb-aa96-407e-b589-7cb212324100
> 21/08/03 20:49:46 INFO SparkExecuteStatementOperation: Running query with 37e5d2cb-aa96-407e-b589-7cb212324100
> {code}
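Editor's note: the shape of such a fix is to redact the statement before it reaches the log. A sketch using Spark's internal redaction helpers (the linked PR may differ in detail; {{Utils.redact}} and {{SQLConf.stringRedactionPattern}} are internal APIs):

{code:scala}
// Sketch: apply the SQL string-redaction pattern to the statement before
// logging it, falling back to the raw statement if no pattern is configured.
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.util.{Utils => SparkUtils}

object RedactedLogging {
  def redactedStatement(statement: String): String =
    SparkUtils.redact(SQLConf.get.stringRedactionPattern, statement)
}

// In runInternal():
// logInfo(s"Submitting query '${RedactedLogging.redactedStatement(statement)}' with $statementId")
{code}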
[jira] [Updated] (SPARK-36510) Missing spark.redaction.string.regex property
[ https://issues.apache.org/jira/browse/SPARK-36510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Denis Krivenko updated SPARK-36510:
-----------------------------------
    Description: 
The property *spark.redaction.string.regex* is missing in the [Runtime Environment|https://spark.apache.org/docs/3.1.2/configuration.html#runtime-environment] properties table but is referred to by the *spark.sql.redaction.string.regex* description as its default value

  (was: The property *spark.redaction.string.regex* is missing in [Runtime Environment|https://spark.apache.org/docs/3.1.2/configuration.html#runtime-environment] properties table but referred by spark.sql.redaction.string.regex ** description as its default value)

> Missing spark.redaction.string.regex property
> ---
>
>                 Key: SPARK-36510
>                 URL: https://issues.apache.org/jira/browse/SPARK-36510
>             Project: Spark
>          Issue Type: Documentation
>          Components: docs
>    Affects Versions: 3.1.2
>            Reporter: Denis Krivenko
>            Priority: Trivial
>
> The property *spark.redaction.string.regex* is missing in the [Runtime Environment|https://spark.apache.org/docs/3.1.2/configuration.html#runtime-environment] properties table but is referred to by the *spark.sql.redaction.string.regex* description as its default value
[jira] [Created] (SPARK-36510) Missing spark.redaction.string.regex property
Denis Krivenko created SPARK-36510:
--------------------------------------

             Summary: Missing spark.redaction.string.regex property
                 Key: SPARK-36510
                 URL: https://issues.apache.org/jira/browse/SPARK-36510
             Project: Spark
          Issue Type: Documentation
          Components: docs
    Affects Versions: 3.1.2
            Reporter: Denis Krivenko


The property *spark.redaction.string.regex* is missing in the [Runtime Environment|https://spark.apache.org/docs/3.1.2/configuration.html#runtime-environment] properties table but is referred to by the *spark.sql.redaction.string.regex* description as its default value
[jira] [Updated] (SPARK-36472) Improve SQL syntax for MERGE
[ https://issues.apache.org/jira/browse/SPARK-36472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Denis Krivenko updated SPARK-36472:
-----------------------------------
    Description: 
Existing SQL syntax for *MERGE* (see Delta Lake examples [here|https://docs.delta.io/latest/delta-update.html#upsert-into-a-table-using-merge] and [here|https://docs.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/language-manual/delta-merge-into]) could be improved by adding an alternative for {{<merge_condition>}}

*Main assumption*

In common cases the target and source tables have the same column names used in {{<merge_condition>}} as merge keys, for example:
{code:sql}
ON target.key1 = source.key1 AND target.key2 = source.key2{code}
It would be more convenient to use a syntax similar to:
{code:sql}
ON COLUMNS (key1, key2)
-- or
ON MATCHING (key1, key2)
{code}
The same approach is used for [JOIN|https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-join.html], where the {{join_criteria}} syntax is
{code:sql}
ON boolean_expression | USING ( column_name [ , ... ] )
{code}
*Improvement proposal*

Syntax
{code:sql}
MERGE INTO target_table_identifier [AS target_alias]
USING source_table_identifier [] [AS source_alias]
ON { <merge_condition> | COLUMNS ( column_name [ , ... ] ) }
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN NOT MATCHED [ AND <condition> ] THEN <not_matched_action> ]
{code}
Example
{code:sql}
MERGE INTO target
USING source
ON COLUMNS (key1, key2)
WHEN MATCHED THEN
  UPDATE SET *
WHEN NOT MATCHED THEN
  INSERT *
{code}

  was:
Existing SQL syntax for *MERGE* (see Delta Lake examples [here|https://docs.delta.io/latest/delta-update.html#upsert-into-a-table-using-merge] and [here|https://docs.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/language-manual/delta-merge-into]) could be improved by adding an alternative for {{<merge_condition>}}

*Main assumption*

In common cases the target and source tables have the same column names used in {{<merge_condition>}} as merge keys, for example:
{code:sql}
ON target.key1 = source.key1 AND target.key2 = source.key2{code}
It would be more convenient to use a syntax similar to:
{code:sql}
ON COLUMNS (key1, key2)
{code}
The same approach is used for [JOIN|https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-join.html], where the {{join_criteria}} syntax is
{code:sql}
ON boolean_expression | USING ( column_name [ , ... ] )
{code}
*Improvement proposal*

Syntax
{code:sql}
MERGE INTO target_table_identifier [AS target_alias]
USING source_table_identifier [] [AS source_alias]
ON { <merge_condition> | COLUMNS ( column_name [ , ... ] ) }
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN NOT MATCHED [ AND <condition> ] THEN <not_matched_action> ]
{code}
Example
{code:sql}
MERGE INTO target
USING source
ON COLUMNS (key1, key2)
WHEN MATCHED THEN
  UPDATE SET *
WHEN NOT MATCHED THEN
  INSERT *
{code}

> Improve SQL syntax for MERGE
>
>                 Key: SPARK-36472
>                 URL: https://issues.apache.org/jira/browse/SPARK-36472
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.2
>            Reporter: Denis Krivenko
>            Priority: Trivial
>
> Existing SQL syntax for *MERGE* (see Delta Lake examples [here|https://docs.delta.io/latest/delta-update.html#upsert-into-a-table-using-merge] and [here|https://docs.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/language-manual/delta-merge-into]) could be improved by adding an alternative for {{<merge_condition>}}
> *Main assumption*
> In common cases the target and source tables have the same column names used in {{<merge_condition>}} as merge keys, for example:
> {code:sql}
> ON target.key1 = source.key1 AND target.key2 = source.key2{code}
> It would be more convenient to use a syntax similar to:
> {code:sql}
> ON COLUMNS (key1, key2)
> -- or
> ON MATCHING (key1, key2)
> {code}
> The same approach is used for [JOIN|https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-join.html], where the {{join_criteria}} syntax is
> {code:sql}
> ON boolean_expression | USING ( column_name [ , ... ] )
> {code}
> *Improvement proposal*
> Syntax
> {code:sql}
> MERGE INTO target_table_identifier [AS target_alias]
> USING source_table_identifier [] [AS source_alias]
> ON { <merge_condition> | COLUMNS ( column_name [ , ... ] ) }
> [ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
> [ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
> [ WHEN NOT MATCHED [ AND <condition> ] THEN <not_matched_action> ]
> {code}
> Example
> {code:sql}
> MERGE INTO target
> USING source
> ON COLUMNS (key1, key2)
> WHEN MATCHED THEN
>   UPDATE SET *
> WHEN NOT MATCHED THEN
>   INSERT *
> {code}
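Editor's note: to make the proposal concrete, the shorthand is pure syntactic sugar for the equality conjunction. The two statements below would be equivalent (illustrative only; ON COLUMNS is proposed syntax, not implemented):

{code:sql}
-- Proposed shorthand:
MERGE INTO target USING source
ON COLUMNS (key1, key2)
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;

-- Desugared form, expressible today:
MERGE INTO target USING source
ON target.key1 = source.key1 AND target.key2 = source.key2
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
{code}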
[jira] [Created] (SPARK-36472) Improve SQL syntax for MERGE
Denis Krivenko created SPARK-36472:
--------------------------------------

             Summary: Improve SQL syntax for MERGE
                 Key: SPARK-36472
                 URL: https://issues.apache.org/jira/browse/SPARK-36472
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.1.2
            Reporter: Denis Krivenko


Existing SQL syntax for *MERGE* (see Delta Lake examples [here|https://docs.delta.io/latest/delta-update.html#upsert-into-a-table-using-merge] and [here|https://docs.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/language-manual/delta-merge-into]) could be improved by adding an alternative for {{<merge_condition>}}

*Main assumption*

In common cases the target and source tables have the same column names used in {{<merge_condition>}} as merge keys, for example:
{code:sql}
ON target.key1 = source.key1 AND target.key2 = source.key2{code}
It would be more convenient to use a syntax similar to:
{code:sql}
ON COLUMNS (key1, key2)
{code}
The same approach is used for [JOIN|https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-join.html], where the {{join_criteria}} syntax is
{code:sql}
ON boolean_expression | USING ( column_name [ , ... ] )
{code}
*Improvement proposal*

Syntax
{code:sql}
MERGE INTO target_table_identifier [AS target_alias]
USING source_table_identifier [] [AS source_alias]
ON { <merge_condition> | COLUMNS ( column_name [ , ... ] ) }
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN NOT MATCHED [ AND <condition> ] THEN <not_matched_action> ]
{code}
Example
{code:sql}
MERGE INTO target
USING source
ON COLUMNS (key1, key2)
WHEN MATCHED THEN
  UPDATE SET *
WHEN NOT MATCHED THEN
  INSERT *
{code}
[jira] [Updated] (SPARK-36400) Redact sensitive information in Spark Thrift Server UI
[ https://issues.apache.org/jira/browse/SPARK-36400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Denis Krivenko updated SPARK-36400:
-----------------------------------
    Attachment: SQL Statistics.png

> Redact sensitive information in Spark Thrift Server UI
> ---
>
>                 Key: SPARK-36400
>                 URL: https://issues.apache.org/jira/browse/SPARK-36400
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Web UI
>    Affects Versions: 3.1.2
>            Reporter: Denis Krivenko
>            Priority: Major
>         Attachments: SQL Statistics.png
>
> Spark UI displays sensitive information on the "JDBC/ODBC Server" tab.
> The cause of the issue is in the [org.apache.spark.sql.hive.thriftserver.ui.SqlStatsPagedTable|https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala#L166] class, [here|https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala#L266-L268]:
> {code:scala}
> {info.statement}
> {code}
[jira] [Updated] (SPARK-36400) Redact sensitive information in Spark Thrift Server UI
[ https://issues.apache.org/jira/browse/SPARK-36400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Denis Krivenko updated SPARK-36400:
-----------------------------------
    Description: 
Spark UI displays sensitive information on the "JDBC/ODBC Server" tab.

The cause of the issue is in the [org.apache.spark.sql.hive.thriftserver.ui.SqlStatsPagedTable|https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala#L166] class, [here|https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala#L266-L268]:
{code:scala}
{info.statement}
{code}

  was:
Spark UI displays sensitive information on "JDBC/ODBC Server" tab

!image-2021-08-04-01-02-27-593.png|width=594,height=272!

The reason of the issue is in [org.apache.spark.sql.hive.thriftserver.ui.SqlStatsPagedTable|https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala#L166] class [here|https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala#L266-L268]
{code:scala}
{info.statement}
{code}

> Redact sensitive information in Spark Thrift Server UI
> ---
>
>                 Key: SPARK-36400
>                 URL: https://issues.apache.org/jira/browse/SPARK-36400
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Web UI
>    Affects Versions: 3.1.2
>            Reporter: Denis Krivenko
>            Priority: Major
>         Attachments: SQL Statistics.png
>
> Spark UI displays sensitive information on the "JDBC/ODBC Server" tab.
> The cause of the issue is in the [org.apache.spark.sql.hive.thriftserver.ui.SqlStatsPagedTable|https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala#L166] class, [here|https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala#L266-L268]:
> {code:scala}
> {info.statement}
> {code}
[jira] [Created] (SPARK-36400) Redact sensitive information in Spark Thrift Server UI
Denis Krivenko created SPARK-36400:
--------------------------------------

             Summary: Redact sensitive information in Spark Thrift Server UI
                 Key: SPARK-36400
                 URL: https://issues.apache.org/jira/browse/SPARK-36400
             Project: Spark
          Issue Type: Bug
          Components: SQL, Web UI
    Affects Versions: 3.1.2
            Reporter: Denis Krivenko
         Attachments: SQL Statistics.png


Spark UI displays sensitive information on the "JDBC/ODBC Server" tab.

!image-2021-08-04-01-02-27-593.png|width=594,height=272!

The cause of the issue is in the [org.apache.spark.sql.hive.thriftserver.ui.SqlStatsPagedTable|https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala#L166] class, [here|https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala#L266-L268]:
{code:scala}
{info.statement}
{code}
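Editor's note: a plausible fix mirrors the log-side redaction of SPARK-36398: pass the statement through the string-redaction pattern before rendering it in the page. A sketch using Spark's internal helpers (assumption: the page can reach a SQLConf; the actual change may differ):

{code:scala}
// Sketch: redact the statement before it is rendered in the table cell.
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.util.{Utils => SparkUtils}

object UiRedaction {
  def redactStatement(statement: String): String =
    SparkUtils.redact(SQLConf.get.stringRedactionPattern, statement)
}

// In SqlStatsPagedTable, render {UiRedaction.redactStatement(info.statement)}
// instead of {info.statement}.
{code}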
[jira] [Created] (SPARK-36398) Redact sensitive information in Spark Thrift Server log
Denis Krivenko created SPARK-36398:
--------------------------------------

             Summary: Redact sensitive information in Spark Thrift Server log
                 Key: SPARK-36398
                 URL: https://issues.apache.org/jira/browse/SPARK-36398
             Project: Spark
          Issue Type: Bug
          Components: Security, SQL
    Affects Versions: 3.1.2
            Reporter: Denis Krivenko


Spark Thrift Server logs queries without redacting sensitive information in [org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.scala|https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala#L188]
{code:scala}
override def runInternal(): Unit = {
  setState(OperationState.PENDING)
  logInfo(s"Submitting query '$statement' with $statementId")
{code}
Logs:
{code:sh}
21/08/03 20:49:46 INFO SparkExecuteStatementOperation: Submitting query 'CREATE OR REPLACE TEMPORARY VIEW test_view
USING org.apache.spark.sql.jdbc
OPTIONS (
  url="jdbc:mysql://example.com:3306",
  driver="com.mysql.jdbc.Driver",
  dbtable="example.test",
  user="my_username",
  password="my_password"
)' with 37e5d2cb-aa96-407e-b589-7cb212324100
21/08/03 20:49:46 INFO SparkExecuteStatementOperation: Running query with 37e5d2cb-aa96-407e-b589-7cb212324100
{code}