[jira] [Assigned] (SPARK-40239) Remove duplicated 'fraction' validation in RDD.sample
[ https://issues.apache.org/jira/browse/SPARK-40239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng reassigned SPARK-40239:
-------------------------------------

    Assignee: Ruifeng Zheng

> Remove duplicated 'fraction' validation in RDD.sample
> -----------------------------------------------------
>
>                 Key: SPARK-40239
>                 URL: https://issues.apache.org/jira/browse/SPARK-40239
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Ruifeng Zheng
>            Assignee: Ruifeng Zheng
>            Priority: Trivial
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40239) Remove duplicated 'fraction' validation in RDD.sample
[ https://issues.apache.org/jira/browse/SPARK-40239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng resolved SPARK-40239.
-----------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 37682
[https://github.com/apache/spark/pull/37682]

> Remove duplicated 'fraction' validation in RDD.sample
> -----------------------------------------------------
>
>                 Key: SPARK-40239
>                 URL: https://issues.apache.org/jira/browse/SPARK-40239
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Ruifeng Zheng
>            Assignee: Ruifeng Zheng
>            Priority: Trivial
>             Fix For: 3.4.0
>
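The "duplicated validation" cleanup above can be illustrated with a small sketch. This is not Spark's actual code; the function names and the [0, 1] bound are hypothetical stand-ins, chosen only to show the pattern of validating an argument once at the public entry point rather than repeating the check in internal helpers:

```python
# Hypothetical sketch (NOT RDD.sample's real implementation): validate
# 'fraction' once at the public boundary, and let internal helpers assume
# a valid value instead of re-checking it.

def _do_sample(data, fraction):
    # Internal helper: trusts that the caller already validated 'fraction'.
    step = max(int(1 / fraction), 1)
    return data[::step]

def sample(data, fraction):
    # Single authoritative validation; the bound here is illustrative.
    if not 0.0 <= fraction <= 1.0:
        raise ValueError(f"Fraction must be in [0, 1] but got {fraction}")
    return _do_sample(data, fraction)
```

Keeping one check avoids the two copies drifting apart (different messages, different bounds) as the code evolves.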
[jira] [Updated] (SPARK-40149) Star expansion after outer join asymmetrically includes joining key
[ https://issues.apache.org/jira/browse/SPARK-40149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-40149:
----------------------------
    Priority: Blocker  (was: Major)

> Star expansion after outer join asymmetrically includes joining key
> -------------------------------------------------------------------
>
>                 Key: SPARK-40149
>                 URL: https://issues.apache.org/jira/browse/SPARK-40149
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2
>            Reporter: Otakar Truněček
>            Priority: Blocker
>
> When star expansion is used on the left side of a join, the result includes
> the joining key, while on the right side it doesn't. I would expect the
> behaviour to be symmetric (the key included on both sides or on neither).
> Example:
> {code:python}
> from pyspark.sql import SparkSession
> import pyspark.sql.functions as f
>
> spark = SparkSession.builder.getOrCreate()
> df_left = spark.range(5).withColumn('val', f.lit('left'))
> df_right = spark.range(3, 7).withColumn('val', f.lit('right'))
> df_merged = (
>     df_left
>     .alias('left')
>     .join(df_right.alias('right'), on='id', how='full_outer')
>     .withColumn('left_all', f.struct('left.*'))
>     .withColumn('right_all', f.struct('right.*'))
> )
> df_merged.show()
> {code}
> Result:
> {code:java}
> +---+----+-----+------------+---------+
> | id| val|  val|    left_all|right_all|
> +---+----+-----+------------+---------+
> |  0|left| null|   {0, left}|   {null}|
> |  1|left| null|   {1, left}|   {null}|
> |  2|left| null|   {2, left}|   {null}|
> |  3|left|right|   {3, left}|  {right}|
> |  4|left|right|   {4, left}|  {right}|
> |  5|null|right|{null, null}|  {right}|
> |  6|null|right|{null, null}|  {right}|
> +---+----+-----+------------+---------+
> {code}
> This behaviour started with release 3.2.0. Previously the key was not
> included on either side.
> Result from Spark 3.1.3:
> {code:java}
> +---+----+-----+--------+---------+
> | id| val|  val|left_all|right_all|
> +---+----+-----+--------+---------+
> |  0|left| null|  {left}|   {null}|
> |  6|null|right|  {null}|  {right}|
> |  5|null|right|  {null}|  {right}|
> |  1|left| null|  {left}|   {null}|
> |  3|left|right|  {left}|  {right}|
> |  2|left| null|  {left}|   {null}|
> |  4|left|right|  {left}|  {right}|
> +---+----+-----+--------+---------+
> {code}
> I have a gut feeling this is related to these issues:
> https://issues.apache.org/jira/browse/SPARK-39376
> https://issues.apache.org/jira/browse/SPARK-34527
> https://issues.apache.org/jira/browse/SPARK-38603
>
[jira] [Updated] (SPARK-40149) Star expansion after outer join asymmetrically includes joining key
[ https://issues.apache.org/jira/browse/SPARK-40149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-40149:
----------------------------
    Target Version/s: 3.4.0

> Star expansion after outer join asymmetrically includes joining key
> -------------------------------------------------------------------
>
>                 Key: SPARK-40149
>                 URL: https://issues.apache.org/jira/browse/SPARK-40149
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2
>            Reporter: Otakar Truněček
>            Priority: Blocker
>
> When star expansion is used on the left side of a join, the result includes
> the joining key, while on the right side it doesn't. I would expect the
> behaviour to be symmetric (the key included on both sides or on neither).
> Example:
> {code:python}
> from pyspark.sql import SparkSession
> import pyspark.sql.functions as f
>
> spark = SparkSession.builder.getOrCreate()
> df_left = spark.range(5).withColumn('val', f.lit('left'))
> df_right = spark.range(3, 7).withColumn('val', f.lit('right'))
> df_merged = (
>     df_left
>     .alias('left')
>     .join(df_right.alias('right'), on='id', how='full_outer')
>     .withColumn('left_all', f.struct('left.*'))
>     .withColumn('right_all', f.struct('right.*'))
> )
> df_merged.show()
> {code}
> Result:
> {code:java}
> +---+----+-----+------------+---------+
> | id| val|  val|    left_all|right_all|
> +---+----+-----+------------+---------+
> |  0|left| null|   {0, left}|   {null}|
> |  1|left| null|   {1, left}|   {null}|
> |  2|left| null|   {2, left}|   {null}|
> |  3|left|right|   {3, left}|  {right}|
> |  4|left|right|   {4, left}|  {right}|
> |  5|null|right|{null, null}|  {right}|
> |  6|null|right|{null, null}|  {right}|
> +---+----+-----+------------+---------+
> {code}
> This behaviour started with release 3.2.0. Previously the key was not
> included on either side.
> Result from Spark 3.1.3:
> {code:java}
> +---+----+-----+--------+---------+
> | id| val|  val|left_all|right_all|
> +---+----+-----+--------+---------+
> |  0|left| null|  {left}|   {null}|
> |  6|null|right|  {null}|  {right}|
> |  5|null|right|  {null}|  {right}|
> |  1|left| null|  {left}|   {null}|
> |  3|left|right|  {left}|  {right}|
> |  2|left| null|  {left}|   {null}|
> |  4|left|right|  {left}|  {right}|
> +---+----+-----+--------+---------+
> {code}
> I have a gut feeling this is related to these issues:
> https://issues.apache.org/jira/browse/SPARK-39376
> https://issues.apache.org/jira/browse/SPARK-34527
> https://issues.apache.org/jira/browse/SPARK-38603
>
[jira] [Commented] (SPARK-40156) url_decode() exposes a Java error
[ https://issues.apache.org/jira/browse/SPARK-40156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17586140#comment-17586140 ]

Apache Spark commented on SPARK-40156:
--------------------------------------

User 'ming95' has created a pull request for this issue:
https://github.com/apache/spark/pull/37695

> url_decode() exposes a Java error
> ---------------------------------
>
>                 Key: SPARK-40156
>                 URL: https://issues.apache.org/jira/browse/SPARK-40156
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Serge Rielau
>            Priority: Major
>
> Given a badly encoded string, Spark returns a raw Java error. It should
> instead return an ERROR_CLASS.
> {code:java}
> spark-sql> SELECT url_decode('http%3A%2F%2spark.apache.org');
> 22/08/20 17:17:20 ERROR SparkSQLDriver: Failed in [SELECT url_decode('http%3A%2F%2spark.apache.org')]
> java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in escape (%) pattern - Error at index 1 in: "2s"
>         at java.base/java.net.URLDecoder.decode(URLDecoder.java:232)
>         at java.base/java.net.URLDecoder.decode(URLDecoder.java:142)
>         at org.apache.spark.sql.catalyst.expressions.UrlCodec$.decode(urlExpressions.scala:113)
>         at org.apache.spark.sql.catalyst.expressions.UrlCodec.decode(urlExpressions.scala)
> {code}
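The fix the report asks for amounts to validating the input before (or while) decoding and surfacing a purpose-built error instead of the raw decoder exception. Here is a hedged plain-Python analogue of that idea; it is not Spark's implementation, and `InvalidUrlEncodingError` is a made-up stand-in for a Spark ERROR_CLASS:

```python
# Illustrative sketch only: check every percent-escape up front so the
# caller gets a descriptive, catchable error rather than a low-level
# decoder exception (the Python analogue of wrapping URLDecoder's
# IllegalArgumentException in a proper error class).
import re
from urllib.parse import unquote

class InvalidUrlEncodingError(ValueError):
    """Hypothetical stand-in for a user-facing ERROR_CLASS."""

def url_decode(s: str) -> str:
    # Each '%' must be followed by exactly two hex digits.
    for m in re.finditer(r"%(.{0,2})", s):
        if not re.fullmatch(r"[0-9A-Fa-f]{2}", m.group(1)):
            raise InvalidUrlEncodingError(
                f"Illegal hex characters in escape (%) pattern at index "
                f"{m.start()} in: {s!r}")
    return unquote(s)
```

With this shape, the malformed input from the report (`'http%3A%2F%2spark.apache.org'`, where `%2s` is not a valid escape) raises the typed error, while well-formed input decodes normally.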
[jira] [Assigned] (SPARK-40240) PySpark rdd.takeSample should validate `num > maxSampleSize` at first
[ https://issues.apache.org/jira/browse/SPARK-40240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng reassigned SPARK-40240:
-------------------------------------

    Assignee: Ruifeng Zheng

> PySpark rdd.takeSample should validate `num > maxSampleSize` at first
> ---------------------------------------------------------------------
>
>                 Key: SPARK-40240
>                 URL: https://issues.apache.org/jira/browse/SPARK-40240
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.4.0
>            Reporter: Ruifeng Zheng
>            Assignee: Ruifeng Zheng
>            Priority: Minor
>
[jira] [Resolved] (SPARK-40240) PySpark rdd.takeSample should validate `num > maxSampleSize` at first
[ https://issues.apache.org/jira/browse/SPARK-40240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruifeng Zheng resolved SPARK-40240.
-----------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 37683
[https://github.com/apache/spark/pull/37683]

> PySpark rdd.takeSample should validate `num > maxSampleSize` at first
> ---------------------------------------------------------------------
>
>                 Key: SPARK-40240
>                 URL: https://issues.apache.org/jira/browse/SPARK-40240
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.4.0
>            Reporter: Ruifeng Zheng
>            Assignee: Ruifeng Zheng
>            Priority: Minor
>             Fix For: 3.4.0
>
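The point of the change above is ordering: the `num > maxSampleSize` check needs no data scan, so it should fail fast, before any expensive work such as counting the RDD. A hedged sketch of that ordering in plain Python (the names and the exact bound are illustrative, not PySpark's internals):

```python
# Hypothetical sketch of takeSample-style validation ordering: cheap,
# deterministic argument checks run before anything that touches the data.
import random
import sys

MAX_SAMPLE_SIZE = sys.maxsize - 100  # illustrative bound, not PySpark's exact one

def take_sample(data, num, seed=None):
    if num < 0:
        raise ValueError("Sample size cannot be negative.")
    if num > MAX_SAMPLE_SIZE:
        # Fail fast: this check needs no scan of 'data', so it comes first.
        raise ValueError(f"Sample size cannot be greater than {MAX_SAMPLE_SIZE}.")
    if num == 0 or not data:
        return []
    rng = random.Random(seed)
    if num >= len(data):
        # Asking for at least everything: return all elements, shuffled.
        shuffled = list(data)
        rng.shuffle(shuffled)
        return shuffled
    return rng.sample(data, num)
```

In a distributed setting the payoff is larger than it looks here, because the work the early check skips is a full `count()` over the dataset.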
[jira] [Updated] (SPARK-40124) Update TPCDS v1.4 q32 for Plan Stability tests
[ https://issues.apache.org/jira/browse/SPARK-40124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-40124:
----------------------------------
    Fix Version/s: 3.2.3

> Update TPCDS v1.4 q32 for Plan Stability tests
> ----------------------------------------------
>
>                 Key: SPARK-40124
>                 URL: https://issues.apache.org/jira/browse/SPARK-40124
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Kapil Singh
>            Assignee: Kapil Singh
>            Priority: Major
>             Fix For: 3.4.0, 3.3.1, 3.2.3
>
[jira] [Resolved] (SPARK-40234) Clean only MDC items set by Spark
[ https://issues.apache.org/jira/browse/SPARK-40234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

L. C. Hsieh resolved SPARK-40234.
---------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 37680
[https://github.com/apache/spark/pull/37680]

> Clean only MDC items set by Spark
> ---------------------------------
>
>                 Key: SPARK-40234
>                 URL: https://issues.apache.org/jira/browse/SPARK-40234
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.4.0
>            Reporter: L. C. Hsieh
>            Assignee: L. C. Hsieh
>            Priority: Major
>             Fix For: 3.4.0
>
> Since SPARK-8981, the Spark executor has MDC support. Before setting MDC
> items, the executor cleans up all MDC items, but this also removes MDC items
> set not by Spark but by users elsewhere, so those custom MDC items do not
> show up in the executor log.
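The fix described above replaces a blanket "clear everything" with "remove only what you added". Spark's actual code uses slf4j's MDC from Scala; the following is a hedged plain-Python analogue of the pattern, with a dict standing in for the thread-local MDC map and made-up key names:

```python
# Illustrative sketch, not Spark's code: the framework records which MDC
# keys it set for a task and, on cleanup, removes only those, so entries
# installed by user code elsewhere survive.
mdc = {}  # stands in for the thread-local MDC map

def run_task(task_props):
    framework_keys = []
    try:
        for key, value in task_props.items():
            mdc[key] = value
            framework_keys.append(key)
        # ... the task body would execute here, logging with MDC context ...
    finally:
        for key in framework_keys:   # clean only what we set
            mdc.pop(key, None)       # user-provided entries are untouched
```

The buggy variant would call the equivalent of `mdc.clear()` in the `finally` block, which is exactly what wiped users' custom items.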
[jira] [Assigned] (SPARK-40246) Logging isn't configurable via log4j2 with hadoop-provided profile
[ https://issues.apache.org/jira/browse/SPARK-40246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40246:
------------------------------------

    Assignee:     (was: Apache Spark)

> Logging isn't configurable via log4j2 with hadoop-provided profile
> ------------------------------------------------------------------
>
>                 Key: SPARK-40246
>                 URL: https://issues.apache.org/jira/browse/SPARK-40246
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 3.3.0
>            Reporter: Adam Binford
>            Priority: Major
>
> When building Spark with -Phadoop-provided (or using the 3.3.0 build without
> Hadoop), there is no slf4j implementation provided for log4j2, so the default
> log4j2 properties are ignored and logging isn't configurable via
> SparkContext.setLogLevel.
> Reproduction on a fresh Ubuntu container:
> {noformat}
> apt-get update
> apt-get install -y wget
> wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
> wget https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-without-hadoop.tgz
> tar -xvf hadoop-3.3.4.tar.gz -C /opt
> tar -xvf spark-3.3.0-bin-without-hadoop.tgz -C /opt
> export HADOOP_HOME=/opt/hadoop-3.3.4/
> export SPARK_HOME=/opt/spark-3.3.0-bin-without-hadoop/
> apt install -y openjdk-11-jre-headless python3
> export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/
> export SPARK_DIST_CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath)
> $SPARK_HOME/bin/pyspark
> {noformat}
> The default log level starts at INFO and it cannot be changed with
> sc.setLogLevel.
[jira] [Commented] (SPARK-40246) Logging isn't configurable via log4j2 with hadoop-provided profile
[ https://issues.apache.org/jira/browse/SPARK-40246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17586104#comment-17586104 ]

Apache Spark commented on SPARK-40246:
--------------------------------------

User 'Kimahriman' has created a pull request for this issue:
https://github.com/apache/spark/pull/37694

> Logging isn't configurable via log4j2 with hadoop-provided profile
> ------------------------------------------------------------------
>
>                 Key: SPARK-40246
>                 URL: https://issues.apache.org/jira/browse/SPARK-40246
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 3.3.0
>            Reporter: Adam Binford
>            Priority: Major
>
> When building Spark with -Phadoop-provided (or using the 3.3.0 build without
> Hadoop), there is no slf4j implementation provided for log4j2, so the default
> log4j2 properties are ignored and logging isn't configurable via
> SparkContext.setLogLevel.
> Reproduction on a fresh Ubuntu container:
> {noformat}
> apt-get update
> apt-get install -y wget
> wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
> wget https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-without-hadoop.tgz
> tar -xvf hadoop-3.3.4.tar.gz -C /opt
> tar -xvf spark-3.3.0-bin-without-hadoop.tgz -C /opt
> export HADOOP_HOME=/opt/hadoop-3.3.4/
> export SPARK_HOME=/opt/spark-3.3.0-bin-without-hadoop/
> apt install -y openjdk-11-jre-headless python3
> export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/
> export SPARK_DIST_CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath)
> $SPARK_HOME/bin/pyspark
> {noformat}
> The default log level starts at INFO and it cannot be changed with
> sc.setLogLevel.
[jira] [Assigned] (SPARK-40246) Logging isn't configurable via log4j2 with hadoop-provided profile
[ https://issues.apache.org/jira/browse/SPARK-40246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40246:
------------------------------------

    Assignee: Apache Spark

> Logging isn't configurable via log4j2 with hadoop-provided profile
> ------------------------------------------------------------------
>
>                 Key: SPARK-40246
>                 URL: https://issues.apache.org/jira/browse/SPARK-40246
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 3.3.0
>            Reporter: Adam Binford
>            Assignee: Apache Spark
>            Priority: Major
>
> When building Spark with -Phadoop-provided (or using the 3.3.0 build without
> Hadoop), there is no slf4j implementation provided for log4j2, so the default
> log4j2 properties are ignored and logging isn't configurable via
> SparkContext.setLogLevel.
> Reproduction on a fresh Ubuntu container:
> {noformat}
> apt-get update
> apt-get install -y wget
> wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
> wget https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-without-hadoop.tgz
> tar -xvf hadoop-3.3.4.tar.gz -C /opt
> tar -xvf spark-3.3.0-bin-without-hadoop.tgz -C /opt
> export HADOOP_HOME=/opt/hadoop-3.3.4/
> export SPARK_HOME=/opt/spark-3.3.0-bin-without-hadoop/
> apt install -y openjdk-11-jre-headless python3
> export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/
> export SPARK_DIST_CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath)
> $SPARK_HOME/bin/pyspark
> {noformat}
> The default log level starts at INFO and it cannot be changed with
> sc.setLogLevel.
[jira] [Updated] (SPARK-40246) Logging isn't configurable via log4j2 with hadoop-provided profile
[ https://issues.apache.org/jira/browse/SPARK-40246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Binford updated SPARK-40246:
---------------------------------
    Component/s: Build
                     (was: Spark Core)

> Logging isn't configurable via log4j2 with hadoop-provided profile
> ------------------------------------------------------------------
>
>                 Key: SPARK-40246
>                 URL: https://issues.apache.org/jira/browse/SPARK-40246
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 3.3.0
>            Reporter: Adam Binford
>            Priority: Major
>
> When building Spark with -Phadoop-provided (or using the 3.3.0 build without
> Hadoop), there is no slf4j implementation provided for log4j2, so the default
> log4j2 properties are ignored and logging isn't configurable via
> SparkContext.setLogLevel.
> Reproduction on a fresh Ubuntu container:
> {noformat}
> apt-get update
> apt-get install -y wget
> wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
> wget https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-without-hadoop.tgz
> tar -xvf hadoop-3.3.4.tar.gz -C /opt
> tar -xvf spark-3.3.0-bin-without-hadoop.tgz -C /opt
> export HADOOP_HOME=/opt/hadoop-3.3.4/
> export SPARK_HOME=/opt/spark-3.3.0-bin-without-hadoop/
> apt install -y openjdk-11-jre-headless python3
> export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/
> export SPARK_DIST_CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath)
> $SPARK_HOME/bin/pyspark
> {noformat}
> The default log level starts at INFO and it cannot be changed with
> sc.setLogLevel.
[jira] [Created] (SPARK-40246) Logging isn't configurable via log4j2 with hadoop-provided profile
Adam Binford created SPARK-40246:
------------------------------------

             Summary: Logging isn't configurable via log4j2 with hadoop-provided profile
                 Key: SPARK-40246
                 URL: https://issues.apache.org/jira/browse/SPARK-40246
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Adam Binford


When building Spark with -Phadoop-provided (or using the 3.3.0 build without
Hadoop), there is no slf4j implementation provided for log4j2, so the default
log4j2 properties are ignored and logging isn't configurable via
SparkContext.setLogLevel.

Reproduction on a fresh Ubuntu container:

{noformat}
apt-get update
apt-get install -y wget
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
wget https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-without-hadoop.tgz
tar -xvf hadoop-3.3.4.tar.gz -C /opt
tar -xvf spark-3.3.0-bin-without-hadoop.tgz -C /opt
export HADOOP_HOME=/opt/hadoop-3.3.4/
export SPARK_HOME=/opt/spark-3.3.0-bin-without-hadoop/
apt install -y openjdk-11-jre-headless python3
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/
export SPARK_DIST_CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath)
$SPARK_HOME/bin/pyspark
{noformat}

The default log level starts at INFO and it cannot be changed with
sc.setLogLevel.
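The root cause described above is a classpath question: which slf4j binding (if any) is present decides whether log4j2 configuration takes effect. A quick way to investigate is to list binding-like jars on the directories that feed the effective classpath. This diagnostic is not part of Spark; the marker names below are common binding artifact prefixes and the helper itself is a hypothetical sketch:

```python
# Illustrative diagnostic, not Spark code: scan classpath directories for
# jars that look like slf4j-to-backend bindings. An empty result (as with a
# hadoop-provided build whose Hadoop classpath carries no log4j2 bridge)
# explains why log4j2 properties are silently ignored.
import os

# Common binding artifact prefixes; extend as needed for your environment.
BINDING_MARKERS = (
    "log4j-slf4j-impl",   # slf4j -> log4j2 bridge (what Spark's default build ships)
    "slf4j-log4j12",      # slf4j -> log4j 1.x
    "slf4j-reload4j",     # slf4j -> reload4j
    "logback-classic",    # slf4j -> logback
)

def find_slf4j_bindings(classpath_dirs):
    """Return names of jars that look like slf4j bindings, in sorted order."""
    hits = []
    for d in classpath_dirs:
        if not os.path.isdir(d):
            continue
        for name in sorted(os.listdir(d)):
            if name.endswith(".jar") and name.startswith(BINDING_MARKERS):
                hits.append(name)
    return hits
```

Run against `$SPARK_HOME/jars` plus the directories printed by `hadoop classpath`, the output shows which binding slf4j will pick up, or that none is available.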
[jira] [Assigned] (SPARK-40245) Fix FileScan equality check when partition or data filter columns are not read
[ https://issues.apache.org/jira/browse/SPARK-40245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40245:
------------------------------------

    Assignee: Apache Spark

> Fix FileScan equality check when partition or data filter columns are not read
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-40245
>                 URL: https://issues.apache.org/jira/browse/SPARK-40245
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Peter Toth
>            Assignee: Apache Spark
>            Priority: Major
>
[jira] [Assigned] (SPARK-40245) Fix FileScan equality check when partition or data filter columns are not read
[ https://issues.apache.org/jira/browse/SPARK-40245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40245:
------------------------------------

    Assignee:     (was: Apache Spark)

> Fix FileScan equality check when partition or data filter columns are not read
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-40245
>                 URL: https://issues.apache.org/jira/browse/SPARK-40245
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Peter Toth
>            Priority: Major
>
[jira] [Commented] (SPARK-40245) Fix FileScan equality check when partition or data filter columns are not read
[ https://issues.apache.org/jira/browse/SPARK-40245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17586097#comment-17586097 ]

Apache Spark commented on SPARK-40245:
--------------------------------------

User 'peter-toth' has created a pull request for this issue:
https://github.com/apache/spark/pull/37693

> Fix FileScan equality check when partition or data filter columns are not read
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-40245
>                 URL: https://issues.apache.org/jira/browse/SPARK-40245
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Peter Toth
>            Priority: Major
>
[jira] [Updated] (SPARK-40245) Fix FileScan equality check when partition or data filter columns are not read
[ https://issues.apache.org/jira/browse/SPARK-40245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Toth updated SPARK-40245:
-------------------------------
    Summary: Fix FileScan equality check when partition or data filter columns are not read  (was: Fix FileScan canonicalization when partition or data filter columns are not read)

> Fix FileScan equality check when partition or data filter columns are not read
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-40245
>                 URL: https://issues.apache.org/jira/browse/SPARK-40245
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Peter Toth
>            Priority: Major
>
[jira] [Created] (SPARK-40245) Fix FileScan canonicalization when partition or data filter columns are not read
Peter Toth created SPARK-40245:
----------------------------------

             Summary: Fix FileScan canonicalization when partition or data filter columns are not read
                 Key: SPARK-40245
                 URL: https://issues.apache.org/jira/browse/SPARK-40245
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: Peter Toth
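The bug class behind SPARK-40245 is easy to model: if a scan's equality is derived only from the columns it actually reads, two scans that differ solely in partition or data filters on unread columns compare equal, and plan caching or exchange reuse can wrongly substitute one for the other. Spark's `FileScan` is Scala; this is a hedged Python model of the correct shape, with made-up field names:

```python
# Hypothetical model, not Spark's FileScan: equality must take the
# partition/data filters into account even when the filtered columns are
# absent from the read schema. A frozen dataclass derives __eq__ and
# __hash__ from ALL fields, which is exactly the property we want.
from dataclasses import dataclass

@dataclass(frozen=True)
class FileScan:
    read_columns: tuple        # columns returned to the query
    partition_filters: tuple   # may reference columns NOT in read_columns
    data_filters: tuple

# Both scans return only 'val', but they prune different partitions on the
# unread 'id' column, so they must not be interchangeable.
scan_a = FileScan(("val",), ("id > 3",), ())
scan_b = FileScan(("val",), ("id > 5",), ())
assert scan_a != scan_b
```

The buggy variant corresponds to basing equality on `read_columns` alone, which would make `scan_a == scan_b` and let a reuse rule return rows pruned by the wrong filter.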
[jira] [Resolved] (SPARK-40241) Correct the link of GenericUDTF
[ https://issues.apache.org/jira/browse/SPARK-40241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang resolved SPARK-40241.
---------------------------------
    Fix Version/s: 3.3.1
                   3.1.4
                   3.2.3
                   3.4.0
       Resolution: Fixed

Issue resolved by pull request 37685
[https://github.com/apache/spark/pull/37685]

> Correct the link of GenericUDTF
> -------------------------------
>
>                 Key: SPARK-40241
>                 URL: https://issues.apache.org/jira/browse/SPARK-40241
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 3.4.0
>            Reporter: Ruifeng Zheng
>            Assignee: Ruifeng Zheng
>            Priority: Trivial
>             Fix For: 3.3.1, 3.1.4, 3.2.3, 3.4.0
>
[jira] [Assigned] (SPARK-40241) Correct the link of GenericUDTF
[ https://issues.apache.org/jira/browse/SPARK-40241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang reassigned SPARK-40241:
-----------------------------------

    Assignee: Ruifeng Zheng

> Correct the link of GenericUDTF
> -------------------------------
>
>                 Key: SPARK-40241
>                 URL: https://issues.apache.org/jira/browse/SPARK-40241
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 3.4.0
>            Reporter: Ruifeng Zheng
>            Assignee: Ruifeng Zheng
>            Priority: Trivial
>
[jira] [Commented] (SPARK-40244) Correct the property name of data source option for csv
[ https://issues.apache.org/jira/browse/SPARK-40244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585717#comment-17585717 ]

Apache Spark commented on SPARK-40244:
--------------------------------------

User 'mukever' has created a pull request for this issue:
https://github.com/apache/spark/pull/37692

> Correct the property name of data source option for csv
> -------------------------------------------------------
>
>                 Key: SPARK-40244
>                 URL: https://issues.apache.org/jira/browse/SPARK-40244
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 3.3.0
>            Reporter: 陈志祥
>            Priority: Trivial
>
[jira] [Commented] (SPARK-40244) Correct the property name of data source option for csv
[ https://issues.apache.org/jira/browse/SPARK-40244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585716#comment-17585716 ]

Apache Spark commented on SPARK-40244:
--------------------------------------

User 'mukever' has created a pull request for this issue:
https://github.com/apache/spark/pull/37692

> Correct the property name of data source option for csv
> -------------------------------------------------------
>
>                 Key: SPARK-40244
>                 URL: https://issues.apache.org/jira/browse/SPARK-40244
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 3.3.0
>            Reporter: 陈志祥
>            Priority: Trivial
>
[jira] [Commented] (SPARK-40244) Correct the property name of data source option for csv
[ https://issues.apache.org/jira/browse/SPARK-40244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585712#comment-17585712 ]

Apache Spark commented on SPARK-40244:
--------------------------------------

User 'mukever' has created a pull request for this issue:
https://github.com/apache/spark/pull/37691

> Correct the property name of data source option for csv
> -------------------------------------------------------
>
>                 Key: SPARK-40244
>                 URL: https://issues.apache.org/jira/browse/SPARK-40244
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 3.3.0
>            Reporter: 陈志祥
>            Priority: Trivial
>
[jira] [Commented] (SPARK-40244) Correct the property name of data source option for csv
[ https://issues.apache.org/jira/browse/SPARK-40244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585711#comment-17585711 ]

Apache Spark commented on SPARK-40244:
--------------------------------------

User 'mukever' has created a pull request for this issue:
https://github.com/apache/spark/pull/37690

> Correct the property name of data source option for csv
> -------------------------------------------------------
>
>                 Key: SPARK-40244
>                 URL: https://issues.apache.org/jira/browse/SPARK-40244
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 3.3.0
>            Reporter: 陈志祥
>            Priority: Trivial
>
[jira] [Commented] (SPARK-40244) Correct the property name of data source option for csv
[ https://issues.apache.org/jira/browse/SPARK-40244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585710#comment-17585710 ]

Apache Spark commented on SPARK-40244:
--------------------------------------

User 'mukever' has created a pull request for this issue:
https://github.com/apache/spark/pull/37690

> Correct the property name of data source option for csv
> -------------------------------------------------------
>
>                 Key: SPARK-40244
>                 URL: https://issues.apache.org/jira/browse/SPARK-40244
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 3.3.0
>            Reporter: 陈志祥
>            Priority: Trivial
>
[jira] [Assigned] (SPARK-40244) Correct the property name of data source option for csv
[ https://issues.apache.org/jira/browse/SPARK-40244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40244:
------------------------------------

    Assignee:     (was: Apache Spark)

> Correct the property name of data source option for csv
> -------------------------------------------------------
>
>                 Key: SPARK-40244
>                 URL: https://issues.apache.org/jira/browse/SPARK-40244
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 3.3.0
>            Reporter: 陈志祥
>            Priority: Trivial
>
[jira] [Assigned] (SPARK-40244) Correct the property name of data source option for csv
[ https://issues.apache.org/jira/browse/SPARK-40244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40244:
------------------------------------

    Assignee: Apache Spark

> Correct the property name of data source option for csv
> -------------------------------------------------------
>
>                 Key: SPARK-40244
>                 URL: https://issues.apache.org/jira/browse/SPARK-40244
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 3.3.0
>            Reporter: 陈志祥
>            Assignee: Apache Spark
>            Priority: Trivial
>
[jira] [Commented] (SPARK-40244) Correct the property name of data source option for csv
[ https://issues.apache.org/jira/browse/SPARK-40244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585706#comment-17585706 ]

Apache Spark commented on SPARK-40244:
--------------------------------------

User 'mukever' has created a pull request for this issue:
https://github.com/apache/spark/pull/37689

> Correct the property name of data source option for csv
> -------------------------------------------------------
>
>                 Key: SPARK-40244
>                 URL: https://issues.apache.org/jira/browse/SPARK-40244
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 3.3.0
>            Reporter: 陈志祥
>            Priority: Trivial
>
[jira] [Created] (SPARK-40244) Correct the property name of data source option for csv
陈志祥 created SPARK-40244: --- Summary: Correct the property name of data source option for csv Key: SPARK-40244 URL: https://issues.apache.org/jira/browse/SPARK-40244 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 3.3.0 Reporter: 陈志祥 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40243) Enhance Hive UDF support documentation
[ https://issues.apache.org/jira/browse/SPARK-40243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585704#comment-17585704 ] Apache Spark commented on SPARK-40243: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/37688 > Enhance Hive UDF support documentation > -- > > Key: SPARK-40243 > URL: https://issues.apache.org/jira/browse/SPARK-40243 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40243) Enhance Hive UDF support documentation
[ https://issues.apache.org/jira/browse/SPARK-40243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40243: Assignee: (was: Apache Spark) > Enhance Hive UDF support documentation > -- > > Key: SPARK-40243 > URL: https://issues.apache.org/jira/browse/SPARK-40243 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40243) Enhance Hive UDF support documentation
[ https://issues.apache.org/jira/browse/SPARK-40243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40243: Assignee: Apache Spark > Enhance Hive UDF support documentation > -- > > Key: SPARK-40243 > URL: https://issues.apache.org/jira/browse/SPARK-40243 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40039) Introducing a streaming checkpoint file manager based on Hadoop's Abortable interface
[ https://issues.apache.org/jira/browse/SPARK-40039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585701#comment-17585701 ] Apache Spark commented on SPARK-40039: -- User 'attilapiros' has created a pull request for this issue: https://github.com/apache/spark/pull/37687 > Introducing a streaming checkpoint file manager based on Hadoop's Abortable > interface > - > > Key: SPARK-40039 > URL: https://issues.apache.org/jira/browse/SPARK-40039 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Attila Zsolt Piros >Priority: Major > > Currently on S3 the checkpoint file manager (called > FileContextBasedCheckpointFileManager) is based on rename: when a file is > opened for an atomic stream, a temporary file is used instead, and when the > stream is committed the file is renamed. > But on S3 a rename is a file copy, so this has serious performance > implications. > On Hadoop 3, however, there is a new interface called *Abortable*, and > *S3AFileSystem* implements this capability on top of S3's multipart upload. > When the file is committed a POST is sent > ([https://docs.aws.amazon.com/AmazonS3/latest/API/API_CompleteMultipartUpload.html]) > and when it is aborted a DELETE is sent > ([https://docs.aws.amazon.com/AmazonS3/latest/API/API_AbortMultipartUpload.html]) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
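The rename-vs-abort distinction described in SPARK-40039 can be sketched with a minimal, dependency-free local-filesystem example (the function names below are hypothetical illustrations, not Spark or Hadoop APIs): the rename-based protocol stages a temporary file and renames it into place on commit (atomic and cheap on a local filesystem, but a full object copy on S3), while the Abortable-style protocol either completes the staged data (S3's CompleteMultipartUpload POST) or discards it (AbortMultipartUpload DELETE), with no rename at all.

```python
import os
import tempfile
from typing import Optional

def write_checkpoint_rename_based(directory: str, name: str, data: bytes) -> str:
    """Rename-based commit, as FileContextBasedCheckpointFileManager does:
    write to a temporary file, then rename it into place on commit."""
    tmp_path = os.path.join(directory, "." + name + ".tmp")
    final_path = os.path.join(directory, name)
    with open(tmp_path, "wb") as f:
        f.write(data)
    os.rename(tmp_path, final_path)  # commit: atomic locally, an object copy on S3
    return final_path

def write_checkpoint_abortable(directory: str, name: str, data: bytes,
                               abort: bool = False) -> Optional[str]:
    """Abortable-style commit: stage the data (standing in for uploaded
    multipart parts), then either complete (the POST) or abort (the DELETE)."""
    final_path = os.path.join(directory, name)
    if abort:
        return None  # abort: the staged parts are simply discarded
    with open(final_path, "wb") as f:
        f.write(data)  # complete: materialize the final object, no rename
    return final_path

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        write_checkpoint_rename_based(d, "offsets-0", b"batch-0")
        write_checkpoint_abortable(d, "offsets-1", b"batch-1")
        write_checkpoint_abortable(d, "offsets-2", b"batch-2", abort=True)
        print(sorted(os.listdir(d)))  # → ['offsets-0', 'offsets-1']
```

The aborted checkpoint leaves nothing behind in either protocol; the difference the ticket targets is purely the cost of the commit path on S3, where rename is not a metadata operation.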
[jira] [Created] (SPARK-40243) Enhance Hive UDF support documentation
Yuming Wang created SPARK-40243: --- Summary: Enhance Hive UDF support documentation Key: SPARK-40243 URL: https://issues.apache.org/jira/browse/SPARK-40243 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Yuming Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40242) Only return all physical plans after submitting pyspark script with several spark sql blocks inside
[ https://issues.apache.org/jira/browse/SPARK-40242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Fenjie updated SPARK-40242: - Description: Background: In our industry development environment, we often write several spark-sql blocks in one pyspark script. Without actually submitting the application to the cluster, it is hard to obtain all of its physical plans through a "spark-submit" parameter. Wish: I wish to add a "–genPlan" parameter to the "spark-submit" command, which can return all the physical plans of an application instead of actually running it. Other approaches to solving this are also welcome. was: Background: In our industry development environment, we often write several spark-sql blocks in one pyspark script. Without actually submitting the application to the cluster, it is hard to obtain all of its physical plans through a "spark-submit" parameter. Wish: I wish to add a "–genPlan" parameter to the "spark-submit" command, which can return all the physical plans of an application instead of actually running it. Other approaches to addressing this are also welcome. > Only return all physical plans after submitting pyspark script with several > spark sql blocks inside > --- > > Key: SPARK-40242 > URL: https://issues.apache.org/jira/browse/SPARK-40242 > Project: Spark > Issue Type: Wish > Components: Spark Submit, SQL >Affects Versions: 2.1.3 >Reporter: Liang Fenjie >Priority: Major > > Background: > In our industry development environment, we often write several > spark-sql blocks in one pyspark script. Without actually submitting the > application to the cluster, it is hard to obtain all of its physical plans > through a "spark-submit" parameter. 
> > Wish: > I wish to add a "–genPlan" parameter to the "spark-submit" command, which > can return all the physical plans of an application instead of actually > running it. Other approaches to solving this are also welcome. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
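No "–genPlan" flag exists in spark-submit today; as a rough, dependency-free sketch of the requested behaviour, one could run the user's script against a session stand-in whose sql() records each statement for plan generation instead of executing it (with a real SparkSession one would instead call spark.sql(statement).explain() per block). Everything below is a hypothetical illustration, not an existing Spark API:

```python
class PlanOnlySession:
    """Stands in for a SparkSession: collects each SQL statement rather than
    running it. A real implementation would return spark.sql(statement) and
    print its physical plan via DataFrame.explain()."""
    def __init__(self):
        self.collected = []

    def sql(self, statement: str) -> None:
        self.collected.append(statement)

def run_script_collecting_sql(script_source: str) -> list:
    """Execute a pyspark-style script with the stub session injected as `spark`,
    so every spark.sql(...) block is captured instead of submitted."""
    session = PlanOnlySession()
    exec(compile(script_source, "<script>", "exec"), {"spark": session})
    return session.collected

if __name__ == "__main__":
    script = 'spark.sql("SELECT 1")\nspark.sql("SELECT 2")\n'
    print(run_script_collecting_sql(script))  # → ['SELECT 1', 'SELECT 2']
```

This captures only statements issued through the injected `spark` name; a production version of the idea would have to hook the real session so DataFrame-API queries are planned as well.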
[jira] [Updated] (SPARK-40242) Only return all physical plans after submitting pyspark script with several spark sql blocks inside
[ https://issues.apache.org/jira/browse/SPARK-40242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Fenjie updated SPARK-40242: - Description: Background: In our industry development environment, we often write several spark-sql blocks in one pyspark script. Without actually submitting the application to the cluster, it is hard to obtain all of its physical plans through a "spark-submit" parameter. Wish: I wish to add a "–genPlan" parameter to the "spark-submit" command, which can return all the physical plans of an application instead of actually running it. Other approaches to addressing this are also welcome. was: Background: In our industry development environment, we often write several spark-sql blocks in one pyspark script. Without actually submitting the application to the cluster, it is hard to obtain all of its physical plans through a "spark-submit" parameter. Wish: I wish to add a "–genPlan" parameter to the "spark-submit" command, which can return all the physical plans of an application instead of actually running it. > Only return all physical plans after submitting pyspark script with several > spark sql blocks inside > --- > > Key: SPARK-40242 > URL: https://issues.apache.org/jira/browse/SPARK-40242 > Project: Spark > Issue Type: Wish > Components: Spark Submit, SQL >Affects Versions: 2.1.3 >Reporter: Liang Fenjie >Priority: Major > > Background: > In our industry development environment, we often write several > spark-sql blocks in one pyspark script. Without actually submitting the > application to the cluster, it is hard to obtain all of its physical plans > through a "spark-submit" parameter. > > Wish: > I wish to add a "–genPlan" parameter to the "spark-submit" command, which > can return all the physical plans of an application instead of actually > running it. 
Other approaches to addressing this are also welcome. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40142) Make pyspark.sql.functions examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585680#comment-17585680 ] Apache Spark commented on SPARK-40142: -- User 'khalidmammadov' has created a pull request for this issue: https://github.com/apache/spark/pull/37686 > Make pyspark.sql.functions examples self-contained > -- > > Key: SPARK-40142 > URL: https://issues.apache.org/jira/browse/SPARK-40142 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40242) Only return all physical plans after submitting pyspark script with several spark sql blocks inside
Liang Fenjie created SPARK-40242: Summary: Only return all physical plans after submitting pyspark script with several spark sql blocks inside Key: SPARK-40242 URL: https://issues.apache.org/jira/browse/SPARK-40242 Project: Spark Issue Type: Wish Components: Spark Submit, SQL Affects Versions: 2.1.3 Reporter: Liang Fenjie Background: In our industry development environment, we often write several spark-sql blocks in one pyspark script. Without actually submitting the application to the cluster, it is hard to obtain all of its physical plans through a "spark-submit" parameter. Wish: I wish to add a "–genPlan" parameter to the "spark-submit" command, which can return all the physical plans of an application instead of actually running it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org