[jira] [Commented] (SPARK-41875) Throw proper errors in Dataset.to()
[ https://issues.apache.org/jira/browse/SPARK-41875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654845#comment-17654845 ] jiaan.geng commented on SPARK-41875: It seems this isn't an issue with Connect. > Throw proper errors in Dataset.to() > --- > > Key: SPARK-41875 > URL: https://issues.apache.org/jira/browse/SPARK-41875 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > schema = StructType( > [StructField("i", StringType(), True), StructField("j", IntegerType(), > True)] > ) > df = self.spark.createDataFrame([("a", 1)], schema) > schema1 = StructType([StructField("j", StringType()), StructField("i", > StringType())]) > df1 = df.to(schema1) > self.assertEqual(schema1, df1.schema) > self.assertEqual(df.count(), df1.count()) > schema2 = StructType([StructField("j", LongType())]) > df2 = df.to(schema2) > self.assertEqual(schema2, df2.schema) > self.assertEqual(df.count(), df2.count()) > schema3 = StructType([StructField("struct", schema1, False)]) > df3 = df.select(struct("i", "j").alias("struct")).to(schema3) > self.assertEqual(schema3, df3.schema) > self.assertEqual(df.count(), df3.count()) > # incompatible field nullability > schema4 = StructType([StructField("j", LongType(), False)]) > self.assertRaisesRegex( > AnalysisException, "NULLABLE_COLUMN_OR_FIELD", lambda: df.to(schema4) > ){code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 1486, in test_to > self.assertRaisesRegex( > AssertionError: AnalysisException not raised by {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
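The expected failure in the test above can be illustrated outside Spark. The following is a minimal plain-Python sketch, under the assumption stated in the error class name, of the rule behind NULLABLE_COLUMN_OR_FIELD: Dataset.to() should refuse to coerce a nullable source field into a non-nullable target field. The names (Field, can_coerce) are hypothetical and are not Spark's internals.

```python
# Hypothetical sketch of the nullability rule behind NULLABLE_COLUMN_OR_FIELD:
# a nullable source field cannot be coerced to a non-nullable target field,
# because the source may produce nulls the target forbids.
from dataclasses import dataclass

@dataclass
class Field:
    name: str
    dtype: str
    nullable: bool = True

def can_coerce(source: Field, target: Field) -> bool:
    # Unsafe exactly when the source may be null but the target must not be.
    if source.nullable and not target.nullable:
        return False
    return True
```

In the test, df's column "j" is nullable while schema4 declares it non-nullable, so under this sketch the coercion is rejected, which is the AnalysisException the test expects.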
[jira] [Assigned] (SPARK-41889) Attach root cause to invalidPatternError
[ https://issues.apache.org/jira/browse/SPARK-41889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41889: Assignee: (was: Apache Spark) > Attach root cause to invalidPatternError > > > Key: SPARK-41889 > URL: https://issues.apache.org/jira/browse/SPARK-41889 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41889) Attach root cause to invalidPatternError
[ https://issues.apache.org/jira/browse/SPARK-41889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41889: Assignee: Apache Spark > Attach root cause to invalidPatternError > > > Key: SPARK-41889 > URL: https://issues.apache.org/jira/browse/SPARK-41889 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41889) Attach root cause to invalidPatternError
[ https://issues.apache.org/jira/browse/SPARK-41889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654826#comment-17654826 ] Apache Spark commented on SPARK-41889: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/39402 > Attach root cause to invalidPatternError > > > Key: SPARK-41889 > URL: https://issues.apache.org/jira/browse/SPARK-41889 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41893) Publish SBOM artifacts
[ https://issues.apache.org/jira/browse/SPARK-41893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41893: Assignee: (was: Apache Spark) > Publish SBOM artifacts > -- > > Key: SPARK-41893 > URL: https://issues.apache.org/jira/browse/SPARK-41893 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41893) Publish SBOM artifacts
[ https://issues.apache.org/jira/browse/SPARK-41893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654824#comment-17654824 ] Apache Spark commented on SPARK-41893: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/39401 > Publish SBOM artifacts > -- > > Key: SPARK-41893 > URL: https://issues.apache.org/jira/browse/SPARK-41893 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41893) Publish SBOM artifacts
[ https://issues.apache.org/jira/browse/SPARK-41893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41893: Assignee: Apache Spark > Publish SBOM artifacts > -- > > Key: SPARK-41893 > URL: https://issues.apache.org/jira/browse/SPARK-41893 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41894) sql/core module mvn clean failed
[ https://issues.apache.org/jira/browse/SPARK-41894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654823#comment-17654823 ] Yang Jie commented on SPARK-41894: -- The running environment is Linux. I haven't yet identified the specific case that generated this file; it needs more investigation. > sql/core module mvn clean failed > > > Key: SPARK-41894 > URL: https://issues.apache.org/jira/browse/SPARK-41894 > Project: Spark > Issue Type: Bug > Components: Structured Streaming, Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > run the following commands: > # mvn clean install -pl sql/core -am -DskipTests > # mvn test -pl sql/core > # mvn clean > > then following error: > > {code:java} > [INFO] Spark Project Parent POM ... SUCCESS [ 0.133 > s] > [INFO] Spark Project Tags . SUCCESS [ 0.008 > s] > [INFO] Spark Project Sketch ... SUCCESS [ 0.007 > s] > [INFO] Spark Project Local DB . SUCCESS [ 0.008 > s] > [INFO] Spark Project Networking ... SUCCESS [ 0.015 > s] > [INFO] Spark Project Shuffle Streaming Service SUCCESS [ 0.020 > s] > [INFO] Spark Project Unsafe ... SUCCESS [ 0.007 > s] > [INFO] Spark Project Launcher . SUCCESS [ 0.008 > s] > [INFO] Spark Project Core . SUCCESS [ 0.279 > s] > [INFO] Spark Project ML Local Library . SUCCESS [ 0.010 > s] > [INFO] Spark Project GraphX ... SUCCESS [ 0.016 > s] > [INFO] Spark Project Streaming SUCCESS [ 0.039 > s] > [INFO] Spark Project Catalyst . SUCCESS [ 0.262 > s] > [INFO] Spark Project SQL .. FAILURE [ 1.305 > s] > [INFO] Spark Project ML Library ... SKIPPED > [INFO] Spark Project Tools SKIPPED > [INFO] Spark Project Hive . SKIPPED > [INFO] Spark Project REPL . SKIPPED > [INFO] Spark Project YARN Shuffle Service . SKIPPED > [INFO] Spark Project YARN . SKIPPED > [INFO] Spark Project Mesos SKIPPED > [INFO] Spark Project Kubernetes ... SKIPPED > [INFO] Spark Project Hive Thrift Server ... SKIPPED > [INFO] Spark Ganglia Integration .. 
SKIPPED > [INFO] Spark Project Hadoop Cloud Integration . SKIPPED > [INFO] Spark Project Assembly . SKIPPED > [INFO] Kafka 0.10+ Token Provider for Streaming ... SKIPPED > [INFO] Spark Integration for Kafka 0.10 ... SKIPPED > [INFO] Kafka 0.10+ Source for Structured Streaming SKIPPED > [INFO] Spark Kinesis Integration .. SKIPPED > [INFO] Spark Project Examples . SKIPPED > [INFO] Spark Integration for Kafka 0.10 Assembly .. SKIPPED > [INFO] Spark Avro . SKIPPED > [INFO] Spark Project Connect Common ... SKIPPED > [INFO] Spark Project Connect Server ... SKIPPED > [INFO] Spark Project Connect Client ... SKIPPED > [INFO] Spark Protobuf . SKIPPED > [INFO] Spark Project Kinesis Assembly . SKIPPED > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 2.896 s > [INFO] Finished at: 2023-01-05T15:15:57+08:00 > [INFO] > > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-clean-plugin:3.1.0:clean (default-clean) on > project spark-sql_2.13: Failed to clean project: Failed to delete > /${basedir}/sql/core/target/tmp/streaming.metadata-1b8b16d8-c9ba-4c38-9ac0-94a39f583082/commits/.0.crc > -> [Help 1] > {code} > > > run : > * ll > /${basedir}/sql/core/target/tmp/streaming.metadata-1b8b16d8-c9ba-4c38-9ac0-94a39f583082/commits/.0.crc > > > {code:java} > -rw-r--r-- 1 work work 12 Dec 28 16:06 > /${basedir}/sql/core/target/tmp/streaming.metadata-1b8b16d8-c9ba-4c38-9ac0-94a39f583082/commits/.0.crc{code} > > > and current user(work) can't rm this file: > * rm > /${basedir}/sql/core/target/tmp/streaming.metadata-1b8b16d8-c9ba-4c38-9ac0-94a39f583082/commits/.0.crc > > {code:java} > rm: cannot remove >
[jira] [Created] (SPARK-41894) sql/core module mvn clean failed
Yang Jie created SPARK-41894: Summary: sql/core module mvn clean failed Key: SPARK-41894 URL: https://issues.apache.org/jira/browse/SPARK-41894 Project: Spark Issue Type: Bug Components: Structured Streaming, Tests Affects Versions: 3.4.0 Reporter: Yang Jie run the following commands: # mvn clean install -pl sql/core -am -DskipTests # mvn test -pl sql/core # mvn clean then following error: {code:java} [INFO] Spark Project Parent POM ... SUCCESS [ 0.133 s] [INFO] Spark Project Tags . SUCCESS [ 0.008 s] [INFO] Spark Project Sketch ... SUCCESS [ 0.007 s] [INFO] Spark Project Local DB . SUCCESS [ 0.008 s] [INFO] Spark Project Networking ... SUCCESS [ 0.015 s] [INFO] Spark Project Shuffle Streaming Service SUCCESS [ 0.020 s] [INFO] Spark Project Unsafe ... SUCCESS [ 0.007 s] [INFO] Spark Project Launcher . SUCCESS [ 0.008 s] [INFO] Spark Project Core . SUCCESS [ 0.279 s] [INFO] Spark Project ML Local Library . SUCCESS [ 0.010 s] [INFO] Spark Project GraphX ... SUCCESS [ 0.016 s] [INFO] Spark Project Streaming SUCCESS [ 0.039 s] [INFO] Spark Project Catalyst . SUCCESS [ 0.262 s] [INFO] Spark Project SQL .. FAILURE [ 1.305 s] [INFO] Spark Project ML Library ... SKIPPED [INFO] Spark Project Tools SKIPPED [INFO] Spark Project Hive . SKIPPED [INFO] Spark Project REPL . SKIPPED [INFO] Spark Project YARN Shuffle Service . SKIPPED [INFO] Spark Project YARN . SKIPPED [INFO] Spark Project Mesos SKIPPED [INFO] Spark Project Kubernetes ... SKIPPED [INFO] Spark Project Hive Thrift Server ... SKIPPED [INFO] Spark Ganglia Integration .. SKIPPED [INFO] Spark Project Hadoop Cloud Integration . SKIPPED [INFO] Spark Project Assembly . SKIPPED [INFO] Kafka 0.10+ Token Provider for Streaming ... SKIPPED [INFO] Spark Integration for Kafka 0.10 ... SKIPPED [INFO] Kafka 0.10+ Source for Structured Streaming SKIPPED [INFO] Spark Kinesis Integration .. SKIPPED [INFO] Spark Project Examples . SKIPPED [INFO] Spark Integration for Kafka 0.10 Assembly .. SKIPPED [INFO] Spark Avro . 
SKIPPED [INFO] Spark Project Connect Common ... SKIPPED [INFO] Spark Project Connect Server ... SKIPPED [INFO] Spark Project Connect Client ... SKIPPED [INFO] Spark Protobuf . SKIPPED [INFO] Spark Project Kinesis Assembly . SKIPPED [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 2.896 s [INFO] Finished at: 2023-01-05T15:15:57+08:00 [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-clean-plugin:3.1.0:clean (default-clean) on project spark-sql_2.13: Failed to clean project: Failed to delete /${basedir}/sql/core/target/tmp/streaming.metadata-1b8b16d8-c9ba-4c38-9ac0-94a39f583082/commits/.0.crc -> [Help 1] {code} run : * ll /${basedir}/sql/core/target/tmp/streaming.metadata-1b8b16d8-c9ba-4c38-9ac0-94a39f583082/commits/.0.crc {code:java} -rw-r--r-- 1 work work 12 Dec 28 16:06 /${basedir}/sql/core/target/tmp/streaming.metadata-1b8b16d8-c9ba-4c38-9ac0-94a39f583082/commits/.0.crc{code} and current user(work) can't rm this file: * rm /${basedir}/sql/core/target/tmp/streaming.metadata-1b8b16d8-c9ba-4c38-9ac0-94a39f583082/commits/.0.crc {code:java} rm: cannot remove `/${basedir}/sql/core/target/tmp/streaming.metadata-1b8b16d8-c9ba-4c38-9ac0-94a39f583082/commits/.0.crc': Permission denied {code} need to use root to clean this file -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
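A note on the failure mode reported above: on POSIX systems, deleting a file requires write and execute permission on its parent directory, not on the file itself, which is why `mvn clean` fails on the root-owned `commits` directory even though the `.0.crc` file is world-readable. The helper below is an illustrative sketch (the function name is hypothetical) for surfacing such entries before running the clean.

```python
# Sketch: list entries under `root` whose parent directory the current user
# cannot modify -- and therefore cannot delete entries from. Useful to run
# over sql/core/target before `mvn clean` to find root-owned leftovers.
import os

def undeletable_entries(root: str) -> list:
    blocked = []
    for dirpath, dirnames, filenames in os.walk(root):
        # os.access checks the *directory* permissions that deletion needs.
        if not os.access(dirpath, os.W_OK | os.X_OK):
            blocked.extend(os.path.join(dirpath, n) for n in dirnames + filenames)
    return blocked
```

Any paths this reports would need to be removed by the owning user (here, root), as the reporter concluded.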
[jira] [Resolved] (SPARK-41829) Implement Dataframe.sort,sortWithinPartitions Ordering
[ https://issues.apache.org/jira/browse/SPARK-41829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-41829. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39398 [https://github.com/apache/spark/pull/39398] > Implement Dataframe.sort,sortWithinPartitions Ordering > -- > > Key: SPARK-41829 > URL: https://issues.apache.org/jira/browse/SPARK-41829 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 422, in pyspark.sql.connect.dataframe.DataFrame.sort > Failed example: > df.orderBy(["age", "name"], ascending=[False, False]).show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line > 1, in > df.orderBy(["age", "name"], ascending=[False, False]).show() > TypeError: DataFrame.sort() got an unexpected keyword argument 'ascending' > ** > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 379, in pyspark.sql.connect.dataframe.DataFrame.sortWithinPartitions > Failed example: > df.sortWithinPartitions("age", ascending=False) > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File " pyspark.sql.connect.dataframe.DataFrame.sortWithinPartitions[1]>", line 1, in > > df.sortWithinPartitions("age", ascending=False) > TypeError: DataFrame.sortWithinPartitions() got an unexpected keyword > argument 'ascending'{code} -- This message was sent by 
Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
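The contract the fix above restores can be sketched in plain Python: in the classic `DataFrame.sort(*cols, ascending=...)` API, `ascending` may be a single bool applied to every column or a list matched one-to-one with the columns. This is an illustrative sketch with a hypothetical helper name, not the Connect client's actual implementation.

```python
# Sketch of resolving the `ascending` keyword that the doctests above
# exercise: a bool fans out to all columns, a list is zipped per column.
def resolve_sort_orders(cols, ascending=True):
    if isinstance(ascending, bool):
        return [(c, ascending) for c in cols]
    if isinstance(ascending, (list, tuple)):
        if len(ascending) != len(cols):
            raise ValueError("length of ascending must match length of cols")
        return list(zip(cols, ascending))
    raise TypeError(f"ascending can only be bool or list, but got {type(ascending).__name__}")
```

Under this sketch, `df.orderBy(["age", "name"], ascending=[False, False])` resolves both columns to descending order instead of raising the TypeError shown in the doctest failure.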
[jira] [Assigned] (SPARK-41829) Implement Dataframe.sort,sortWithinPartitions Ordering
[ https://issues.apache.org/jira/browse/SPARK-41829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-41829: - Assignee: Ruifeng Zheng > Implement Dataframe.sort,sortWithinPartitions Ordering > -- > > Key: SPARK-41829 > URL: https://issues.apache.org/jira/browse/SPARK-41829 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Ruifeng Zheng >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 422, in pyspark.sql.connect.dataframe.DataFrame.sort > Failed example: > df.orderBy(["age", "name"], ascending=[False, False]).show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line > 1, in > df.orderBy(["age", "name"], ascending=[False, False]).show() > TypeError: DataFrame.sort() got an unexpected keyword argument 'ascending' > ** > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 379, in pyspark.sql.connect.dataframe.DataFrame.sortWithinPartitions > Failed example: > df.sortWithinPartitions("age", ascending=False) > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File " pyspark.sql.connect.dataframe.DataFrame.sortWithinPartitions[1]>", line 1, in > > df.sortWithinPartitions("age", ascending=False) > TypeError: DataFrame.sortWithinPartitions() got an unexpected keyword > argument 'ascending'{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional 
commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41893) Publish SBOM artifacts
Dongjoon Hyun created SPARK-41893: - Summary: Publish SBOM artifacts Key: SPARK-41893 URL: https://issues.apache.org/jira/browse/SPARK-41893 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.4.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33772) Build and Run Spark on Java 17
[ https://issues.apache.org/jira/browse/SPARK-33772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654818#comment-17654818 ] Dongjoon Hyun commented on SPARK-33772: --- FYI, Apache Spark has had Java 17 SBT test coverage, [~jomach]. * Java 17 on Linux (GitHub Action) [https://github.com/apache/spark/actions/runs/3833322692] * Java 17 on Apple Silicon [https://apache-spark.s3.fr-par.scw.cloud/index.html] Please file a Jira with details like your environment information and reproducible commands. > Build and Run Spark on Java 17 > -- > > Key: SPARK-33772 > URL: https://issues.apache.org/jira/browse/SPARK-33772 > Project: Spark > Issue Type: New Feature > Components: Build >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Yang Jie >Priority: Major > Labels: releasenotes > Fix For: 3.3.0 > > > Apache Spark supports Java 8 and Java 11 (LTS). The next Java LTS version is > 17. > ||Version||Release Date|| > |Java 17 (LTS)|September 2021| > Apache Spark has a release plan and `Spark 3.2 Code freeze` was July along > with the release branch cut. > - https://spark.apache.org/versioning-policy.html > Supporting new Java version is considered as a new feature which we cannot > allow to backport. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41162) Anti-join must not be pushed below aggregation with ambiguous predicates
[ https://issues.apache.org/jira/browse/SPARK-41162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enrico Minack updated SPARK-41162: -- Affects Version/s: 3.0.3 > Anti-join must not be pushed below aggregation with ambiguous predicates > > > Key: SPARK-41162 > URL: https://issues.apache.org/jira/browse/SPARK-41162 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.3, 3.1.3, 3.3.1, 3.2.3, 3.4.0 >Reporter: Enrico Minack >Priority: Major > Labels: correctness > > The following query should return a single row as all values for {{id}} > except for the largest will be eliminated by the anti-join: > {code} > val ids = Seq(1, 2, 3).toDF("id").distinct() > val result = ids.withColumn("id", $"id" + 1).join(ids, "id", > "left_anti").collect() > assert(result.length == 1) > {code} > Without the {{distinct()}}, the assertion is true. With {{distinct()}}, the > assertion should still hold but is false. > Rule {{PushDownLeftSemiAntiJoin}} pushes the {{Join}} below the left > {{Aggregate}} with join condition {{(id#750 + 1) = id#750}}, which can never > be true. > {code} > === Applying Rule > org.apache.spark.sql.catalyst.optimizer.PushDownLeftSemiAntiJoin === > !Join LeftAnti, (id#752 = id#750) 'Aggregate [id#750], > [(id#750 + 1) AS id#752] > !:- Aggregate [id#750], [(id#750 + 1) AS id#752] +- 'Join LeftAnti, > ((id#750 + 1) = id#750) > !: +- LocalRelation [id#750] :- LocalRelation > [id#750] > !+- Aggregate [id#750], [id#750] +- Aggregate [id#750], > [id#750] > ! +- LocalRelation [id#750]+- LocalRelation > [id#750] > {code} > The optimizer then rightly removes the left-anti join altogether, returning > the left child only. > Rule {{PushDownLeftSemiAntiJoin}} should not push down predicates that > reference left *and* right child. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
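The correct left-anti-join semantics from the report can be modeled with plain Python sets: after `distinct()` and the `id -> id + 1` shift, only the shifted value with no match among the original ids should survive, so the result has exactly one row.

```python
# Plain-Python model of the query in the report: shifting distinct ids by 1
# and anti-joining against the originals must keep only the largest + 1.
ids = {1, 2, 3}                                 # ids.distinct()
shifted = {i + 1 for i in ids}                  # withColumn("id", $"id" + 1) -> {2, 3, 4}
result = [i for i in shifted if i not in ids]   # left_anti join on id
assert len(result) == 1 and result == [4]
```

The pushed-down condition `(id#750 + 1) = id#750` can never hold, which is why the optimizer then drops the join entirely and returns all three rows, violating this expected semantics.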
[jira] [Resolved] (SPARK-41580) Assign name to _LEGACY_ERROR_TEMP_2137
[ https://issues.apache.org/jira/browse/SPARK-41580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-41580. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39305 [https://github.com/apache/spark/pull/39305] > Assign name to _LEGACY_ERROR_TEMP_2137 > -- > > Key: SPARK-41580 > URL: https://issues.apache.org/jira/browse/SPARK-41580 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.4.0 > > > We should use proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`. > > *NOTE:* Please reply to this ticket before start working on it, to avoid > working on same ticket at a time -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41580) Assign name to _LEGACY_ERROR_TEMP_2137
[ https://issues.apache.org/jira/browse/SPARK-41580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-41580: Assignee: Haejoon Lee > Assign name to _LEGACY_ERROR_TEMP_2137 > -- > > Key: SPARK-41580 > URL: https://issues.apache.org/jira/browse/SPARK-41580 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > We should use proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`. > > *NOTE:* Please reply to this ticket before start working on it, to avoid > working on same ticket at a time -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41576) Assign name to _LEGACY_ERROR_TEMP_2051
[ https://issues.apache.org/jira/browse/SPARK-41576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-41576. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39281 [https://github.com/apache/spark/pull/39281] > Assign name to _LEGACY_ERROR_TEMP_2051 > -- > > Key: SPARK-41576 > URL: https://issues.apache.org/jira/browse/SPARK-41576 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.4.0 > > > We should use proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`. > > *NOTE:* Please reply to this ticket before start working on it, to avoid > working on same ticket at a time -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41576) Assign name to _LEGACY_ERROR_TEMP_2051
[ https://issues.apache.org/jira/browse/SPARK-41576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-41576: Assignee: Haejoon Lee > Assign name to _LEGACY_ERROR_TEMP_2051 > -- > > Key: SPARK-41576 > URL: https://issues.apache.org/jira/browse/SPARK-41576 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > We should use proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`. > > *NOTE:* Please reply to this ticket before start working on it, to avoid > working on same ticket at a time -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41821) Fix DataFrame.describe
[ https://issues.apache.org/jira/browse/SPARK-41821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41821: Assignee: jiaan.geng > Fix DataFrame.describe > -- > > Key: SPARK-41821 > URL: https://issues.apache.org/jira/browse/SPARK-41821 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: jiaan.geng >Priority: Major > Fix For: 3.4.0 > > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 898, in pyspark.sql.connect.dataframe.DataFrame.describe > Failed example: > df.describe(['age']).show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", > line 1, in > df.describe(['age']).show() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 832, in describe > raise TypeError(f"'cols' must be list[str], but got > {type(s).__name__}") > TypeError: 'cols' must be list[str], but got list {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41821) Fix DataFrame.describe
[ https://issues.apache.org/jira/browse/SPARK-41821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41821. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39378 [https://github.com/apache/spark/pull/39378] > Fix DataFrame.describe > -- > > Key: SPARK-41821 > URL: https://issues.apache.org/jira/browse/SPARK-41821 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > Fix For: 3.4.0 > > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 898, in pyspark.sql.connect.dataframe.DataFrame.describe > Failed example: > df.describe(['age']).show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", > line 1, in > df.describe(['age']).show() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 832, in describe > raise TypeError(f"'cols' must be list[str], but got > {type(s).__name__}") > TypeError: 'cols' must be list[str], but got list {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
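The traceback above shows `describe(['age'])` failing because a single list argument was type-checked element-by-element against the varargs tuple. A sketch of the argument normalization such an API needs, with a hypothetical helper name (not the Connect client's actual code): accept both `describe("age", "name")` and `describe(["age", "name"])` by unwrapping a lone list before validating each element.

```python
# Sketch: normalize varargs column arguments so that a single list argument
# is unwrapped, then validate every element is a string.
def normalize_cols(*cols):
    if len(cols) == 1 and isinstance(cols[0], list):
        cols = cols[0]
    for c in cols:
        if not isinstance(c, str):
            raise TypeError(f"'cols' must be list[str], but got {type(c).__name__}")
    return list(cols)
```

With this unwrapping in place, `describe(['age'])` no longer trips the "'cols' must be list[str], but got list" error from the report.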
[jira] [Assigned] (SPARK-41871) DataFrame hint parameter can be str, float or int
[ https://issues.apache.org/jira/browse/SPARK-41871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41871: Assignee: Sandeep Singh > DataFrame hint parameter can be str, float or int > - > > Key: SPARK-41871 > URL: https://issues.apache.org/jira/browse/SPARK-41871 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Sandeep Singh >Priority: Major > > {code:java} > df = self.spark.range(10e10).toDF("id") > such_a_nice_list = ["itworks1", "itworks2", "itworks3"] > hinted_df = df.hint("my awesome hint", 1.2345, "what", such_a_nice_list){code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 556, in test_extended_hint_types > hinted_df = df.hint("my awesome hint", 1.2345, "what", such_a_nice_list) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 482, in hint > raise TypeError( > TypeError: param should be a int or str, but got float 1.2345{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41871) DataFrame hint parameter can be str, float or int
[ https://issues.apache.org/jira/browse/SPARK-41871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41871. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39393 [https://github.com/apache/spark/pull/39393] > DataFrame hint parameter can be str, float or int > - > > Key: SPARK-41871 > URL: https://issues.apache.org/jira/browse/SPARK-41871 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Sandeep Singh >Priority: Major > Fix For: 3.4.0 > > > {code:java} > df = self.spark.range(10e10).toDF("id") > such_a_nice_list = ["itworks1", "itworks2", "itworks3"] > hinted_df = df.hint("my awesome hint", 1.2345, "what", such_a_nice_list){code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 556, in test_extended_hint_types > hinted_df = df.hint("my awesome hint", 1.2345, "what", such_a_nice_list) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 482, in hint > raise TypeError( > TypeError: param should be a int or str, but got float 1.2345{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
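Per the ticket title, hint parameters should accept str, float, int (and, as the failing test shows, lists of such values), not just int and str. A sketch of the widened validation, with hypothetical names rather than the Connect client's actual code:

```python
# Sketch: validate DataFrame.hint parameters against the widened set of
# allowed types implied by this ticket.
ALLOWED_HINT_PARAM_TYPES = (str, int, float, list)

def validate_hint_params(*params):
    for p in params:
        if not isinstance(p, ALLOWED_HINT_PARAM_TYPES):
            raise TypeError(
                f"param should be a str, int, float or list, but got {type(p).__name__}"
            )
    return list(params)
```

Under this sketch, the test's call with `1.2345`, `"what"`, and a list of strings passes validation instead of raising the TypeError shown in the traceback.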
[jira] [Commented] (SPARK-41891) Enable test_add_months_function, test_array_repeat, test_dayofweek, test_first_last_ignorenulls, test_function_parity, test_inline, test_window_time, test_reciprocal_t
[ https://issues.apache.org/jira/browse/SPARK-41891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654734#comment-17654734 ] Apache Spark commented on SPARK-41891: -- User 'techaddict' has created a pull request for this issue: https://github.com/apache/spark/pull/39400 > Enable test_add_months_function, test_array_repeat, test_dayofweek, > test_first_last_ignorenulls, test_function_parity, test_inline, > test_window_time, test_reciprocal_trig_functions > > > Key: SPARK-41891 > URL: https://issues.apache.org/jira/browse/SPARK-41891 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Sandeep Singh >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41891) Enable test_add_months_function, test_array_repeat, test_dayofweek, test_first_last_ignorenulls, test_function_parity, test_inline, test_window_time, test_reciprocal_trig_functions
[ https://issues.apache.org/jira/browse/SPARK-41891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41891: Assignee: Sandeep Singh (was: Apache Spark) > Enable test_add_months_function, test_array_repeat, test_dayofweek, > test_first_last_ignorenulls, test_function_parity, test_inline, > test_window_time, test_reciprocal_trig_functions > > > Key: SPARK-41891 > URL: https://issues.apache.org/jira/browse/SPARK-41891 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Sandeep Singh >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41891) Enable test_add_months_function, test_array_repeat, test_dayofweek, test_first_last_ignorenulls, test_function_parity, test_inline, test_window_time, test_reciprocal_trig_functions
[ https://issues.apache.org/jira/browse/SPARK-41891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41891: Assignee: Apache Spark (was: Sandeep Singh) > Enable test_add_months_function, test_array_repeat, test_dayofweek, > test_first_last_ignorenulls, test_function_parity, test_inline, > test_window_time, test_reciprocal_trig_functions > > > Key: SPARK-41891 > URL: https://issues.apache.org/jira/browse/SPARK-41891 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Apache Spark >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39318) Remove tpch-plan-stability WithStats golden files
[ https://issues.apache.org/jira/browse/SPARK-39318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-39318: --- Assignee: XiDuo You > Remove tpch-plan-stability WithStats golden files > - > > Key: SPARK-39318 > URL: https://issues.apache.org/jira/browse/SPARK-39318 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Major > Fix For: 3.4.0 > > > These are dead golden files, since we have no stats for TPCH and no check for > them.
[jira] [Resolved] (SPARK-39318) Remove tpch-plan-stability WithStats golden files
[ https://issues.apache.org/jira/browse/SPARK-39318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-39318. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36700 [https://github.com/apache/spark/pull/36700] > Remove tpch-plan-stability WithStats golden files > - > > Key: SPARK-39318 > URL: https://issues.apache.org/jira/browse/SPARK-39318 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: XiDuo You >Priority: Major > Fix For: 3.4.0 > > > These are dead golden files, since we have no stats for TPCH and no check for > them.
[jira] [Updated] (SPARK-41891) Enable test_add_months_function, test_array_repeat, test_dayofweek, test_first_last_ignorenulls, test_function_parity, test_inline, test_window_time, test_reciprocal_trig_functions
[ https://issues.apache.org/jira/browse/SPARK-41891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Singh updated SPARK-41891: -- Summary: Enable test_add_months_function, test_array_repeat, test_dayofweek, test_first_last_ignorenulls, test_function_parity, test_inline, test_window_time, test_reciprocal_trig_functions (was: Enable 8 tests) > Enable test_add_months_function, test_array_repeat, test_dayofweek, > test_first_last_ignorenulls, test_function_parity, test_inline, > test_window_time, test_reciprocal_trig_functions > > > Key: SPARK-41891 > URL: https://issues.apache.org/jira/browse/SPARK-41891 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Sandeep Singh >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41892) Add JIRAs or messages for skipped messages
Sandeep Singh created SPARK-41892: - Summary: Add JIRAs or messages for skipped messages Key: SPARK-41892 URL: https://issues.apache.org/jira/browse/SPARK-41892 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Sandeep Singh Assignee: Sandeep Singh Fix For: 3.4.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41878) Add JIRAs or messages for skipped tests
[ https://issues.apache.org/jira/browse/SPARK-41878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Singh updated SPARK-41878: -- Summary: Add JIRAs or messages for skipped tests (was: Add JIRAs or messages for skipped messages) > Add JIRAs or messages for skipped tests > --- > > Key: SPARK-41878 > URL: https://issues.apache.org/jira/browse/SPARK-41878 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Sandeep Singh >Priority: Major > Fix For: 3.4.0 > > > Add JIRAs or Messages for all the skipped messages. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41891) Enable 8 tests
Sandeep Singh created SPARK-41891: - Summary: Enable 8 tests Key: SPARK-41891 URL: https://issues.apache.org/jira/browse/SPARK-41891 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Sandeep Singh Assignee: Sandeep Singh Fix For: 3.4.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41694) Add new config to clean up `spark.ui.store.path` directory when SparkContext.stop()
[ https://issues.apache.org/jira/browse/SPARK-41694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-41694: -- Assignee: Yang Jie > Add new config to clean up `spark.ui.store.path` directory when > SparkContext.stop() > --- > > Key: SPARK-41694 > URL: https://issues.apache.org/jira/browse/SPARK-41694 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > > The {{spark.ui.store.path}} directory is not cleaned up when {{SparkContext.stop()}} is called. Currently: > # The disk space occupied by the {{spark.ui.store.path}} directory > will continue to grow. > # When submitting a new App and reusing the {{spark.ui.store.path}} directory, > we will see content related to the previous App, which is a bit strange.
[jira] [Resolved] (SPARK-41694) Add new config to clean up `spark.ui.store.path` directory when SparkContext.stop()
[ https://issues.apache.org/jira/browse/SPARK-41694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-41694. Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39226 [https://github.com/apache/spark/pull/39226] > Add new config to clean up `spark.ui.store.path` directory when > SparkContext.stop() > --- > > Key: SPARK-41694 > URL: https://issues.apache.org/jira/browse/SPARK-41694 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.4.0 > > > The {{spark.ui.store.path}} directory is not cleaned up when {{SparkContext.stop()}} is called. Currently: > # The disk space occupied by the {{spark.ui.store.path}} directory > will continue to grow. > # When submitting a new App and reusing the {{spark.ui.store.path}} directory, > we will see content related to the previous App, which is a bit strange.
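The behavior this new config enables can be sketched in a few lines. This is an illustrative Python model, not Spark's actual Scala implementation: when the (hypothetical) cleanup flag is set, the local store directory is removed on stop, so disk usage stops accumulating and a new App cannot see the previous App's content.

```python
# Illustrative sketch of config-gated cleanup of the UI store directory
# on stop(), as described in SPARK-41694. Function and flag names are
# hypothetical; Spark's real implementation lives in Scala.
import shutil
import tempfile
from pathlib import Path

def stop_and_maybe_cleanup(store_path: str, cleanup_on_stop: bool) -> bool:
    """Delete the store directory on stop when the cleanup flag is enabled.

    Returns True if the directory was removed, False otherwise."""
    if cleanup_on_stop and Path(store_path).is_dir():
        shutil.rmtree(store_path)
        return True
    return False

# demo: with the flag on, the directory is gone after stop
d = tempfile.mkdtemp()
stop_and_maybe_cleanup(d, cleanup_on_stop=True)
```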
[jira] [Commented] (SPARK-41890) Reduce `toSeq` in `RDDOperationGraphWrapperSerializer`/SparkPlanGraphWrapperSerializer` for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-41890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654723#comment-17654723 ] Apache Spark commented on SPARK-41890: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/39399 > Reduce `toSeq` in > `RDDOperationGraphWrapperSerializer`/SparkPlanGraphWrapperSerializer` for > Scala 2.13 > -- > > Key: SPARK-41890 > URL: https://issues.apache.org/jira/browse/SPARK-41890 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > Similar work as SPARK-41709 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41890) Reduce `toSeq` in `RDDOperationGraphWrapperSerializer`/SparkPlanGraphWrapperSerializer` for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-41890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41890: Assignee: (was: Apache Spark) > Reduce `toSeq` in > `RDDOperationGraphWrapperSerializer`/SparkPlanGraphWrapperSerializer` for > Scala 2.13 > -- > > Key: SPARK-41890 > URL: https://issues.apache.org/jira/browse/SPARK-41890 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > Similar work as SPARK-41709 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41890) Reduce `toSeq` in `RDDOperationGraphWrapperSerializer`/SparkPlanGraphWrapperSerializer` for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-41890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41890: Assignee: Apache Spark > Reduce `toSeq` in > `RDDOperationGraphWrapperSerializer`/SparkPlanGraphWrapperSerializer` for > Scala 2.13 > -- > > Key: SPARK-41890 > URL: https://issues.apache.org/jira/browse/SPARK-41890 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > Similar work as SPARK-41709 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41890) Reduce `toSeq` in `RDDOperationGraphWrapperSerializer`/SparkPlanGraphWrapperSerializer` for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-41890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-41890: - Description: Similar work to SPARK-41709 (was: Similar work as SPARK-41709) > Reduce `toSeq` in > `RDDOperationGraphWrapperSerializer`/SparkPlanGraphWrapperSerializer` for > Scala 2.13 > -- > > Key: SPARK-41890 > URL: https://issues.apache.org/jira/browse/SPARK-41890 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > Similar work to SPARK-41709 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41890) Reduce `toSeq` in `RDDOperationGraphWrapperSerializer`/SparkPlanGraphWrapperSerializer` for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-41890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-41890: - Summary: Reduce `toSeq` in `RDDOperationGraphWrapperSerializer`/SparkPlanGraphWrapperSerializer` for Scala 2.13 (was: Reduce `toSeq` in `RDDOperationGraphWrapperSerializer`/`sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SparkPlanGraphWrapperSerializer` for Scala 2.13) > Reduce `toSeq` in > `RDDOperationGraphWrapperSerializer`/SparkPlanGraphWrapperSerializer` for > Scala 2.13 > -- > > Key: SPARK-41890 > URL: https://issues.apache.org/jira/browse/SPARK-41890 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > Similar work as SPARK-41709 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41890) Reduce `toSeq` in `RDDOperationGraphWrapperSerializer`/`sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SparkPlanGraphWrapperSerializer` for Scala 2.13
Yang Jie created SPARK-41890: Summary: Reduce `toSeq` in `RDDOperationGraphWrapperSerializer`/`sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SparkPlanGraphWrapperSerializer` for Scala 2.13 Key: SPARK-41890 URL: https://issues.apache.org/jira/browse/SPARK-41890 Project: Spark Issue Type: Sub-task Components: Spark Core, SQL, Web UI Affects Versions: 3.4.0 Reporter: Yang Jie Similar work as SPARK-41709 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41829) Implement Dataframe.sort,sortWithinPartitions Ordering
[ https://issues.apache.org/jira/browse/SPARK-41829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41829: Assignee: (was: Apache Spark) > Implement Dataframe.sort,sortWithinPartitions Ordering > -- > > Key: SPARK-41829 > URL: https://issues.apache.org/jira/browse/SPARK-41829 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 422, in pyspark.sql.connect.dataframe.DataFrame.sort > Failed example: > df.orderBy(["age", "name"], ascending=[False, False]).show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line > 1, in > df.orderBy(["age", "name"], ascending=[False, False]).show() > TypeError: DataFrame.sort() got an unexpected keyword argument 'ascending' > ** > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 379, in pyspark.sql.connect.dataframe.DataFrame.sortWithinPartitions > Failed example: > df.sortWithinPartitions("age", ascending=False) > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File " pyspark.sql.connect.dataframe.DataFrame.sortWithinPartitions[1]>", line 1, in > > df.sortWithinPartitions("age", ascending=False) > TypeError: DataFrame.sortWithinPartitions() got an unexpected keyword > argument 'ascending'{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: 
issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41829) Implement Dataframe.sort,sortWithinPartitions Ordering
[ https://issues.apache.org/jira/browse/SPARK-41829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41829: Assignee: Apache Spark > Implement Dataframe.sort,sortWithinPartitions Ordering > -- > > Key: SPARK-41829 > URL: https://issues.apache.org/jira/browse/SPARK-41829 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Apache Spark >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 422, in pyspark.sql.connect.dataframe.DataFrame.sort > Failed example: > df.orderBy(["age", "name"], ascending=[False, False]).show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line > 1, in > df.orderBy(["age", "name"], ascending=[False, False]).show() > TypeError: DataFrame.sort() got an unexpected keyword argument 'ascending' > ** > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 379, in pyspark.sql.connect.dataframe.DataFrame.sortWithinPartitions > Failed example: > df.sortWithinPartitions("age", ascending=False) > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File " pyspark.sql.connect.dataframe.DataFrame.sortWithinPartitions[1]>", line 1, in > > df.sortWithinPartitions("age", ascending=False) > TypeError: DataFrame.sortWithinPartitions() got an unexpected keyword > argument 'ascending'{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, 
e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41829) Implement Dataframe.sort,sortWithinPartitions Ordering
[ https://issues.apache.org/jira/browse/SPARK-41829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654718#comment-17654718 ] Apache Spark commented on SPARK-41829: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39398 > Implement Dataframe.sort,sortWithinPartitions Ordering > -- > > Key: SPARK-41829 > URL: https://issues.apache.org/jira/browse/SPARK-41829 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 422, in pyspark.sql.connect.dataframe.DataFrame.sort > Failed example: > df.orderBy(["age", "name"], ascending=[False, False]).show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line > 1, in > df.orderBy(["age", "name"], ascending=[False, False]).show() > TypeError: DataFrame.sort() got an unexpected keyword argument 'ascending' > ** > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 379, in pyspark.sql.connect.dataframe.DataFrame.sortWithinPartitions > Failed example: > df.sortWithinPartitions("age", ascending=False) > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File " pyspark.sql.connect.dataframe.DataFrame.sortWithinPartitions[1]>", line 1, in > > df.sortWithinPartitions("age", ascending=False) > TypeError: DataFrame.sortWithinPartitions() got an unexpected keyword > argument 'ascending'{code} -- This message was sent by Atlassian Jira 
(v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
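The `ascending` keyword rejected in the tracebacks above is classic PySpark API surface: a single bool or a list of bools paired positionally with the sort columns. A small sketch of the normalization the Connect client would need, under hypothetical names:

```python
# Minimal sketch of handling the `ascending` keyword that Connect's
# DataFrame.sort/sortWithinPartitions rejected (SPARK-41829): normalize
# it against the column list and emit (column, direction) pairs.
# The function name is illustrative, not Spark's actual code.

def resolve_sort_order(cols, ascending=True):
    """Pair each column name with 'asc' or 'desc'.

    `ascending` may be a single bool applied to all columns, or a list
    of bools matching `cols` one-to-one."""
    if isinstance(ascending, bool):
        flags = [ascending] * len(cols)
    else:
        if len(ascending) != len(cols):
            raise ValueError("length of ascending must match number of columns")
        flags = list(ascending)
    return [(c, "asc" if f else "desc") for c, f in zip(cols, flags)]

resolve_sort_order(["age", "name"], ascending=[False, False])
# -> [('age', 'desc'), ('name', 'desc')]
```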
[jira] [Updated] (SPARK-41889) Attach root cause to invalidPatternError
[ https://issues.apache.org/jira/browse/SPARK-41889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-41889: Summary: Attach root cause to invalidPatternError (was: Attach root cause to INVALID_PARAMETER_VALUE) > Attach root cause to invalidPatternError > > > Key: SPARK-41889 > URL: https://issues.apache.org/jira/browse/SPARK-41889 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41889) Attach root cause to INVALID_PARAMETER_VALUE
[ https://issues.apache.org/jira/browse/SPARK-41889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654717#comment-17654717 ] BingKun Pan commented on SPARK-41889: - I work on it. > Attach root cause to INVALID_PARAMETER_VALUE > > > Key: SPARK-41889 > URL: https://issues.apache.org/jira/browse/SPARK-41889 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41825) DataFrame.show formatting int as double
[ https://issues.apache.org/jira/browse/SPARK-41825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41825: Assignee: Ruifeng Zheng > DataFrame.show formatting int as double > --- > > Key: SPARK-41825 > URL: https://issues.apache.org/jira/browse/SPARK-41825 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Ruifeng Zheng >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 650, in pyspark.sql.connect.dataframe.DataFrame.fillna > Failed example: > df.na.fill(50).show() > Expected: > +---+--+-++ > |age|height| name|bool| > +---+--+-++ > | 10| 80.5|Alice|null| > | 5| 50.0| Bob|null| > | 50| 50.0| Tom|null| > | 50| 50.0| null|true| > +---+--+-++ > Got: > ++--+-++ > | age|height| name|bool| > ++--+-++ > |10.0| 80.5|Alice|null| > | 5.0| 50.0| Bob|null| > |50.0| 50.0| Tom|null| > |50.0| 50.0| null|true| > ++--+-++ > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41825) DataFrame.show formatting int as double
[ https://issues.apache.org/jira/browse/SPARK-41825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41825. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39396 [https://github.com/apache/spark/pull/39396] > DataFrame.show formatting int as double > --- > > Key: SPARK-41825 > URL: https://issues.apache.org/jira/browse/SPARK-41825 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 650, in pyspark.sql.connect.dataframe.DataFrame.fillna > Failed example: > df.na.fill(50).show() > Expected: > +---+--+-++ > |age|height| name|bool| > +---+--+-++ > | 10| 80.5|Alice|null| > | 5| 50.0| Bob|null| > | 50| 50.0| Tom|null| > | 50| 50.0| null|true| > +---+--+-++ > Got: > ++--+-++ > | age|height| name|bool| > ++--+-++ > |10.0| 80.5|Alice|null| > | 5.0| 50.0| Bob|null| > |50.0| 50.0| Tom|null| > |50.0| 50.0| null|true| > ++--+-++ > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
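The diff in the expected vs. actual output above is a type-preservation bug: filling an int column with `50` widened the whole column to double. A toy sketch of the type-preserving fill the fix restores, with purely illustrative names:

```python
# Sketch of type-preserving na.fill as in SPARK-41825: the fill value is
# coerced to each column's declared type, so an int column filled with
# 50 stays int instead of being widened to double. Illustrative only.

def fill_na(rows, col_types, value):
    """Replace None cells with `value` cast to that column's type."""
    filled = []
    for row in rows:
        filled.append([
            (t(value) if v is None else v)
            for v, t in zip(row, col_types)
        ])
    return filled

rows = [[10, 80.5], [None, None]]
fill_na(rows, (int, float), 50)
# -> [[10, 80.5], [50, 50.0]]  (the age column stays int)
```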
[jira] [Created] (SPARK-41889) Attach root cause to INVALID_PARAMETER_VALUE
BingKun Pan created SPARK-41889: --- Summary: Attach root cause to INVALID_PARAMETER_VALUE Key: SPARK-41889 URL: https://issues.apache.org/jira/browse/SPARK-41889 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41888) Support StreamingQueryListener for DataFrame.observe
[ https://issues.apache.org/jira/browse/SPARK-41888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-41888: --- Summary: Support StreamingQueryListener for DataFrame.observe (was: Support StreamingQueryListener for connect) > Support StreamingQueryListener for DataFrame.observe > > > Key: SPARK-41888 > URL: https://issues.apache.org/jira/browse/SPARK-41888 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Priority: Major > > {code:java} > ** > 1334 > File "/__w/spark/spark/python/pyspark/sql/connect/dataframe.py", line 619, in > pyspark.sql.connect.dataframe.DataFrame.observe > 1335 > Failed example: > 1336 > observation.get > 1337 > Exception raised: > 1338 > Traceback (most recent call last): > 1339 > File "/usr/lib/python3.9/doctest.py", line 1336, in __run > 1340 > exec(compile(example.source, filename, "single", > 1341 > File "", > line 1, in > 1342 > observation.get > 1343 > File "/__w/spark/spark/python/pyspark/sql/utils.py", line 378, in > wrapped > 1344 > raise NotImplementedError() > 1345 > NotImplementedError > 1346 > ** > 1347 > File "/__w/spark/spark/python/pyspark/sql/connect/dataframe.py", line 642, in > pyspark.sql.connect.dataframe.DataFrame.observe > 1348 > Failed example: > 1349 > spark.streams.addListener(MyErrorListener()) > 1350 > Exception raised: > 1351 > Traceback (most recent call last): > 1352 > File "/usr/lib/python3.9/doctest.py", line 1336, in __run > 1353 > exec(compile(example.source, filename, "single", > 1354 > File "", > line 1, in > 1355 > spark.streams.addListener(MyErrorListener()) > 1356 > AttributeError: 'SparkSession' object has no attribute 'streams' > 1357 > ** > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41888) Support StreamingQueryListener for connect
[ https://issues.apache.org/jira/browse/SPARK-41888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-41888: --- Description: {code:java} ** 1334 File "/__w/spark/spark/python/pyspark/sql/connect/dataframe.py", line 619, in pyspark.sql.connect.dataframe.DataFrame.observe 1335 Failed example: 1336 observation.get 1337 Exception raised: 1338 Traceback (most recent call last): 1339 File "/usr/lib/python3.9/doctest.py", line 1336, in __run 1340 exec(compile(example.source, filename, "single", 1341 File "", line 1, in 1342 observation.get 1343 File "/__w/spark/spark/python/pyspark/sql/utils.py", line 378, in wrapped 1344 raise NotImplementedError() 1345 NotImplementedError 1346 ** 1347 File "/__w/spark/spark/python/pyspark/sql/connect/dataframe.py", line 642, in pyspark.sql.connect.dataframe.DataFrame.observe 1348 Failed example: 1349 spark.streams.addListener(MyErrorListener()) 1350 Exception raised: 1351 Traceback (most recent call last): 1352 File "/usr/lib/python3.9/doctest.py", line 1336, in __run 1353 exec(compile(example.source, filename, "single", 1354 File "", line 1, in 1355 spark.streams.addListener(MyErrorListener()) 1356 AttributeError: 'SparkSession' object has no attribute 'streams' 1357 ** {code} > Support StreamingQueryListener for connect > -- > > Key: SPARK-41888 > URL: https://issues.apache.org/jira/browse/SPARK-41888 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Priority: Major > > {code:java} > ** > 1334 > File "/__w/spark/spark/python/pyspark/sql/connect/dataframe.py", line 619, in > pyspark.sql.connect.dataframe.DataFrame.observe > 1335 > Failed example: > 1336 > observation.get > 1337 > Exception raised: > 1338 > Traceback (most recent call last): > 1339 > File "/usr/lib/python3.9/doctest.py", line 1336, in __run > 1340 > exec(compile(example.source, filename, "single", > 1341 > File "", > line 1, in > 1342 > observation.get > 1343 > File 
"/__w/spark/spark/python/pyspark/sql/utils.py", line 378, in > wrapped > 1344 > raise NotImplementedError() > 1345 > NotImplementedError > 1346 > ** > 1347 > File "/__w/spark/spark/python/pyspark/sql/connect/dataframe.py", line 642, in > pyspark.sql.connect.dataframe.DataFrame.observe > 1348 > Failed example: > 1349 > spark.streams.addListener(MyErrorListener()) > 1350 > Exception raised: > 1351 > Traceback (most recent call last): > 1352 > File "/usr/lib/python3.9/doctest.py", line 1336, in __run > 1353 > exec(compile(example.source, filename, "single", > 1354 > File "", > line 1, in > 1355 > spark.streams.addListener(MyErrorListener()) > 1356 > AttributeError: 'SparkSession' object has no attribute 'streams' > 1357 > ** > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41888) Support StreamingQueryListener for connect
jiaan.geng created SPARK-41888: -- Summary: Support StreamingQueryListener for connect Key: SPARK-41888 URL: https://issues.apache.org/jira/browse/SPARK-41888 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: jiaan.geng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41887) Support DataFrame hint parameter to be list
Sandeep Singh created SPARK-41887: - Summary: Support DataFrame hint parameter to be list Key: SPARK-41887 URL: https://issues.apache.org/jira/browse/SPARK-41887 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Sandeep Singh {code:java} df = self.spark.range(10e10).toDF("id") such_a_nice_list = ["itworks1", "itworks2", "itworks3"] hinted_df = df.hint("my awesome hint", 1.2345, "what", such_a_nice_list){code} {code:java} Traceback (most recent call last): File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", line 556, in test_extended_hint_types hinted_df = df.hint("my awesome hint", 1.2345, "what", such_a_nice_list) File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 482, in hint raise TypeError( TypeError: param should be a int or str, but got float 1.2345{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41887) Support DataFrame hint parameter to be list
[ https://issues.apache.org/jira/browse/SPARK-41887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Singh updated SPARK-41887: -- Description: {code:java} df = self.spark.range(10e10).toDF("id") such_a_nice_list = ["itworks1", "itworks2", "itworks3"] hinted_df = df.hint("my awesome hint", 1.2345, "what", such_a_nice_list){code} was: {code:java} df = self.spark.range(10e10).toDF("id") such_a_nice_list = ["itworks1", "itworks2", "itworks3"] hinted_df = df.hint("my awesome hint", 1.2345, "what", such_a_nice_list){code} {code:java} Traceback (most recent call last): File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", line 556, in test_extended_hint_types hinted_df = df.hint("my awesome hint", 1.2345, "what", such_a_nice_list) File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", line 482, in hint raise TypeError( TypeError: param should be a int or str, but got float 1.2345{code}
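For context, a minimal sketch of the kind of parameter check the traceback above is missing. The helper name `validate_hint_param` is hypothetical (this is not the actual Spark Connect code); it only illustrates accepting str, int, float, and flat lists of those as hint parameters:

```python
from typing import List, Union

# Hypothetical helper (not the actual pyspark.sql.connect code): a hint
# parameter check that admits str, int, float, and flat lists of those,
# mirroring what classic PySpark's DataFrame.hint accepts.
AllowedPrimitive = (str, int, float)

def validate_hint_param(param: Union[str, int, float, List]) -> None:
    if isinstance(param, AllowedPrimitive):
        return
    if isinstance(param, list) and all(isinstance(p, AllowedPrimitive) for p in param):
        return
    raise TypeError(
        f"param should be a str, int, float or list thereof, "
        f"but got {type(param).__name__} {param!r}"
    )

# Both of the values rejected in the report would pass this check:
validate_hint_param(1.2345)
validate_hint_param(["itworks1", "itworks2", "itworks3"])
```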
[jira] [Updated] (SPARK-41871) DataFrame hint parameter can be str, float or int
[ https://issues.apache.org/jira/browse/SPARK-41871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Singh updated SPARK-41871: -- Summary: DataFrame hint parameter can be str, float or int (was: DataFrame hint parameter can be str, list, float or int) > DataFrame hint parameter can be str, float or int > - > > Key: SPARK-41871 > URL: https://issues.apache.org/jira/browse/SPARK-41871 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > df = self.spark.range(10e10).toDF("id") > such_a_nice_list = ["itworks1", "itworks2", "itworks3"] > hinted_df = df.hint("my awesome hint", 1.2345, "what", such_a_nice_list){code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 556, in test_extended_hint_types > hinted_df = df.hint("my awesome hint", 1.2345, "what", such_a_nice_list) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 482, in hint > raise TypeError( > TypeError: param should be a int or str, but got float 1.2345{code}
[jira] [Commented] (SPARK-41825) DataFrame.show formatting int as double
[ https://issues.apache.org/jira/browse/SPARK-41825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654697#comment-17654697 ] Apache Spark commented on SPARK-41825: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39396 > DataFrame.show formatting int as double > --- > > Key: SPARK-41825 > URL: https://issues.apache.org/jira/browse/SPARK-41825 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 650, in pyspark.sql.connect.dataframe.DataFrame.fillna > Failed example: > df.na.fill(50).show() > Expected: > +---+--+-++ > |age|height| name|bool| > +---+--+-++ > | 10| 80.5|Alice|null| > | 5| 50.0| Bob|null| > | 50| 50.0| Tom|null| > | 50| 50.0| null|true| > +---+--+-++ > Got: > ++--+-++ > | age|height| name|bool| > ++--+-++ > |10.0| 80.5|Alice|null| > | 5.0| 50.0| Bob|null| > |50.0| 50.0| Tom|null| > |50.0| 50.0| null|true| > ++--+-++ > {code}
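The doctest above shows an integer column being upcast to double after `na.fill(50)`. A minimal sketch of the expected behavior, with a hypothetical `fill_nulls` helper standing in for the dtype-preserving fill (plain Python lists, not Spark columns):

```python
# Hypothetical helper illustrating the expected semantics: filling nulls in
# an integer column with an integer should leave the remaining values (and
# their integer type) untouched, rather than upcasting everything to float.
def fill_nulls(values, fill):
    # Replace only the missing entries; present values pass through as-is.
    return [fill if v is None else v for v in values]

ages = [10, 5, None, None]
filled = fill_nulls(ages, 50)
assert filled == [10, 5, 50, 50]
# No float upcast: every value is still an int, matching the "Expected" table.
assert all(isinstance(v, int) for v in filled)
```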
[jira] [Assigned] (SPARK-41825) DataFrame.show formatting int as double
[ https://issues.apache.org/jira/browse/SPARK-41825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41825: Assignee: Apache Spark
[jira] [Commented] (SPARK-41825) DataFrame.show formatting int as double
[ https://issues.apache.org/jira/browse/SPARK-41825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654698#comment-17654698 ] Apache Spark commented on SPARK-41825: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39396
[jira] [Assigned] (SPARK-41825) DataFrame.show formatting int as double
[ https://issues.apache.org/jira/browse/SPARK-41825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41825: Assignee: (was: Apache Spark)
[jira] [Created] (SPARK-41886) `DataFrame.intersect` doctest output has different order
Ruifeng Zheng created SPARK-41886: - Summary: `DataFrame.intersect` doctest output has different order Key: SPARK-41886 URL: https://issues.apache.org/jira/browse/SPARK-41886 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.4.0 Reporter: Ruifeng Zheng not sure whether this needs to be fixed: {code:java} File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/dataframe.py", line 609, in pyspark.sql.connect.dataframe.DataFrame.intersect Failed example: df1.intersect(df2).show() Expected: +---+---+ | C1| C2| +---+---+ | b| 3| | a| 1| +---+---+ Got: +---+---+ | C1| C2| +---+---+ | a| 1| | b| 3| +---+---+ ** 1 of 3 in pyspark.sql.connect.dataframe.DataFrame.intersect {code}
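Since `intersect()` does not guarantee an output order, one common way to make such a check order-insensitive is to sort both sides before comparing. A minimal sketch, with plain tuples standing in for the collected rows (hypothetical data mirroring the doctest above, not actual Spark output):

```python
# Plain tuples stand in for Row objects; the values mirror the C1/C2 table
# in the doctest above.
expected = [("b", 3), ("a", 1)]
got = [("a", 1), ("b", 3)]

# The raw lists differ only in order, so a direct comparison fails...
assert expected != got

# ...but sorting both sides makes the comparison independent of the
# (unspecified) row order that intersect() happens to produce.
assert sorted(expected) == sorted(got)
```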
[jira] [Updated] (SPARK-41053) Better Spark UI scalability and Driver stability for large applications
[ https://issues.apache.org/jira/browse/SPARK-41053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-41053: -- Labels: releasenotes (was: release-notes) > Better Spark UI scalability and Driver stability for large applications > --- > > Key: SPARK-41053 > URL: https://issues.apache.org/jira/browse/SPARK-41053 > Project: Spark > Issue Type: Umbrella > Components: Spark Core, Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Priority: Major > Labels: releasenotes > Attachments: Better Spark UI scalability and Driver stability for > large applications.pdf > > > After SPARK-18085, the Spark history server(SHS) becomes more scalable for > processing large applications by supporting a persistent > KV-store(LevelDB/RocksDB) as the storage layer. > As for the live Spark UI, all the data is still stored in memory, which can > bring memory pressures to the Spark driver for large applications. > For better Spark UI scalability and Driver stability, I propose to > * {*}Support storing all the UI data in a persistent KV store{*}. > RocksDB/LevelDB provides low memory overhead. Their write/read performance is > fast enough to serve the write/read workload for live UI. SHS can leverage > the persistent KV store to speed up its startup. > * *Support a new Protobuf serializer for all the UI data.* The new > serializer is supposed to be faster, according to benchmarks. It will be the > default serializer for the persistent KV store of live UI. As for event logs, > it is optional. The current serializer for UI data is JSON. When writing > persistent KV-store, there is GZip compression. Since there is compression > support in RocksDB/LevelDB, the new serializer won’t compress the output > before writing to the persistent KV store.
Here is a benchmark of > writing/reading 100,000 SQLExecutionUIData to/from RocksDB: > > |*Serializer*|*Avg Write time(μs)*|*Avg Read time(μs)*|*RocksDB File Total > Size(MB)*|*Result total size in memory(MB)*| > |*Spark’s KV Serializer(JSON+gzip)*|352.2|119.26|837|868| > |*Protobuf*|109.9|34.3|858|2105| > I am also proposing to support only RocksDB, instead of both LevelDB & RocksDB, in > the live UI. > SPIP: > [https://docs.google.com/document/d/1cuKnFwlTodyVhUQPMuakq2YDaLH05jaY9FRu_aD1zMo/edit?usp=sharing] > SPIP vote: https://lists.apache.org/thread/lom4zcob6237q6nnj46jylkzwmmsxvgj
[jira] [Resolved] (SPARK-41286) Build, package and infrastructure for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-41286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41286. -- Resolution: Done I am going to mark it as done for now. > Build, package and infrastructure for Spark Connect > --- > > Key: SPARK-41286 > URL: https://issues.apache.org/jira/browse/SPARK-41286 > Project: Spark > Issue Type: Umbrella > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Critical >
[jira] [Resolved] (SPARK-41841) Support PyPI packaging without JVM
[ https://issues.apache.org/jira/browse/SPARK-41841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41841. -- Resolution: Later > Support PyPI packaging without JVM > -- > > Key: SPARK-41841 > URL: https://issues.apache.org/jira/browse/SPARK-41841 > Project: Spark > Issue Type: Sub-task > Components: Build, Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Blocker > > We should support pip install pyspark without JVM so Spark Connect can be a > real lightweight library.
[jira] [Resolved] (SPARK-41878) Add JIRAs or messages for skipped messages
[ https://issues.apache.org/jira/browse/SPARK-41878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41878. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39382 [https://github.com/apache/spark/pull/39382] > Add JIRAs or messages for skipped messages > -- > > Key: SPARK-41878 > URL: https://issues.apache.org/jira/browse/SPARK-41878 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Sandeep Singh >Priority: Major > Fix For: 3.4.0 > > > Add JIRAs or Messages for all the skipped messages.
[jira] [Assigned] (SPARK-41881) `DataFrame.collect` should handle None/NaN properly
[ https://issues.apache.org/jira/browse/SPARK-41881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41881: Assignee: Ruifeng Zheng > `DataFrame.collect` should handle None/NaN properly > --- > > Key: SPARK-41881 > URL: https://issues.apache.org/jira/browse/SPARK-41881 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major >
[jira] [Assigned] (SPARK-41815) Column.isNull returns nan instead of None
[ https://issues.apache.org/jira/browse/SPARK-41815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41815: Assignee: Ruifeng Zheng > Column.isNull returns nan instead of None > - > > Key: SPARK-41815 > URL: https://issues.apache.org/jira/browse/SPARK-41815 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Ruifeng Zheng >Priority: Major > > {code} > File "/.../spark/python/pyspark/sql/connect/column.py", line 99, in > pyspark.sql.connect.column.Column.isNull > Failed example: > df.filter(df.height.isNull()).collect() > Expected: > [Row(name='Alice', height=None)] > Got: > [Row(name='Alice', height=nan)] > {code}
[jira] [Resolved] (SPARK-41815) Column.isNull returns nan instead of None
[ https://issues.apache.org/jira/browse/SPARK-41815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41815. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39386 [https://github.com/apache/spark/pull/39386]
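The parity issue behind SPARK-41815 is `float('nan')` surfacing where classic PySpark returns `None`. A minimal sketch of the normalization step involved, with a hypothetical `nan_to_none` helper (not the actual fix in pull request 39386):

```python
import math

# Hypothetical helper illustrating the expected mapping: a float NaN in a
# collected value should be surfaced as None, matching classic PySpark's
# representation of null fields.
def nan_to_none(value):
    if isinstance(value, float) and math.isnan(value):
        return None
    return value

# Stand-in for a collected row where height is null, as in the doctest above.
row = {"name": "Alice", "height": float("nan")}
normalized = {k: nan_to_none(v) for k, v in row.items()}
# normalized == {"name": "Alice", "height": None}
```

Note that a real fix has to distinguish genuine NaN values in float columns from NaN used as a null marker, which is why the conversion belongs where the column's nullability is known.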
[jira] [Resolved] (SPARK-41833) DataFrame.collect() output parity with pyspark
[ https://issues.apache.org/jira/browse/SPARK-41833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41833. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39386 [https://github.com/apache/spark/pull/39386] > DataFrame.collect() output parity with pyspark > -- > > Key: SPARK-41833 > URL: https://issues.apache.org/jira/browse/SPARK-41833 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > > {code:java} > ** > > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 1117, in pyspark.sql.connect.functions.array > Failed example: > df.select(array('age', 'age').alias("arr")).collect() > Expected: > [Row(arr=[2, 2]), Row(arr=[5, 5])] > Got: > [Row(arr=array([2, 2])), Row(arr=array([5, 5]))] > ** > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 1119, in pyspark.sql.connect.functions.array > Failed example: > df.select(array([df.age, df.age]).alias("arr")).collect() > Expected: > [Row(arr=[2, 2]), Row(arr=[5, 5])] > Got: > [Row(arr=array([2, 2])), Row(arr=array([5, 5]))] > ** > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 1124, in pyspark.sql.connect.functions.array_distinct > Failed example: > df.select(array_distinct(df.data)).collect() > Expected: > [Row(array_distinct(data)=[1, 2, 3]), Row(array_distinct(data)=[4, 5])] > Got: > [Row(array_distinct(data)=array([1, 2, 3])), > Row(array_distinct(data)=array([4, 5]))] > ** > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 1135, in pyspark.sql.connect.functions.array_except > Failed example: > df.select(array_except(df.c1, df.c2)).collect() > Expected: > [Row(array_except(c1, c2)=['b'])] > Got: > [Row(array_except(c1, c2)=array(['b'], dtype=object))] > ** > File > 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 1142, in pyspark.sql.connect.functions.array_intersect > Failed example: > df.select(array_intersect(df.c1, df.c2)).collect() > Expected: > [Row(array_intersect(c1, c2)=['a', 'c'])] > Got: > [Row(array_intersect(c1, c2)=array(['a', 'c'], dtype=object))] > ** > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 1180, in pyspark.sql.connect.functions.array_remove > Failed example: > df.select(array_remove(df.data, 1)).collect() > Expected: > [Row(array_remove(data, 1)=[2, 3]), Row(array_remove(data, 1)=[])] > Got: > [Row(array_remove(data, 1)=array([2, 3])), Row(array_remove(data, > 1)=array([], dtype=int64))] > ** > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 1187, in pyspark.sql.connect.functions.array_repeat > Failed example: > df.select(array_repeat(df.data, 3).alias('r')).collect() > Expected: > [Row(r=['ab', 'ab', 'ab'])] > Got: > [Row(r=array(['ab', 'ab', 'ab'], dtype=object))] > ** > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 1204, in pyspark.sql.connect.functions.array_sort > Failed example: > df.select(array_sort(df.data).alias('r')).collect() > Expected: > [Row(r=[1, 2, 3, None]), Row(r=[1]), Row(r=[])] > Got: > [Row(r=array([ 1., 2., 3., nan])), Row(r=array([1])), Row(r=array([], > dtype=int64))] > ** > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 1207, in pyspark.sql.connect.functions.array_sort > Failed example: > df.select(array_sort( > "data", > lambda x, y: when(x.isNull() | y.isNull(), > lit(0)).otherwise(length(y) - length(x)) > ).alias("r")).collect() > Expected: > [Row(r=['foobar', 'foo', None, 'bar']), Row(r=['foo']), Row(r=[])] > Got: > [Row(r=array(['foobar', 'foo', None, 'bar'], dtype=object)), > Row(r=array(['foo'], dtype=object)), Row(r=array([], dtype=object))] >
[jira] [Assigned] (SPARK-41833) DataFrame.collect() output parity with pyspark
[ https://issues.apache.org/jira/browse/SPARK-41833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41833: Assignee: Ruifeng Zheng
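The SPARK-41833 failures all follow one pattern: `collect()` returns numpy arrays (`array([2, 2])`) where classic PySpark returns plain lists (`[2, 2]`). A duck-typed sketch of the normalization involved, using a hypothetical `to_plain_list` helper and a fake ndarray so it runs without numpy (not the actual fix):

```python
# Hypothetical helper: normalize numpy-like array values returned by
# DataFrame.collect() into plain Python lists for parity with classic
# PySpark. numpy arrays expose .tolist(); plain values pass through.
def to_plain_list(value):
    if hasattr(value, "tolist"):
        return value.tolist()
    return value

class FakeNdarray:
    """Stand-in for numpy.ndarray so this sketch runs without numpy."""
    def __init__(self, items):
        self._items = list(items)
    def tolist(self):
        return list(self._items)

# array([2, 2]) -> [2, 2], matching the doctests' "Expected" output.
assert to_plain_list(FakeNdarray([2, 2])) == [2, 2]
# Values that are already plain lists are left unchanged.
assert to_plain_list([5, 5]) == [5, 5]
```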
[jira] [Resolved] (SPARK-41881) `DataFrame.collect` should handle None/NaN properly
[ https://issues.apache.org/jira/browse/SPARK-41881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41881. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39386 [https://github.com/apache/spark/pull/39386]
[jira] [Resolved] (SPARK-41846) DataFrame windowspec functions : unresolved columns
[ https://issues.apache.org/jira/browse/SPARK-41846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41846. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39392 [https://github.com/apache/spark/pull/39392] > DataFrame windowspec functions : unresolved columns > --- > > Key: SPARK-41846 > URL: https://issues.apache.org/jira/browse/SPARK-41846 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Sandeep Singh >Priority: Major > Fix For: 3.4.0 > > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 1098, in pyspark.sql.connect.functions.rank > Failed example: > df.withColumn("drank", rank().over(w)).show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line 1, in > > df.withColumn("drank", rank().over(w)).show() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 534, in show > print(self._show_string(n, truncate, vertical)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 423, in _show_string > ).toPandas() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1031, in toPandas > return self._session.client.to_pandas(query) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 413, in to_pandas > return self._execute_and_fetch(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 573, in _execute_and_fetch > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 619, in _handle_error > raise 
SparkConnectAnalysisException( > pyspark.sql.connect.client.SparkConnectAnalysisException: > [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name > `value` cannot be resolved. Did you mean one of the following? [`_1`] > Plan: 'Project [_1#4000L, rank() windowspecdefinition('value ASC NULLS > FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) > AS drank#4003] > +- Project [0#3998L AS _1#4000L] > +- LocalRelation [0#3998L] {code} > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 1032, in pyspark.sql.connect.functions.cume_dist > Failed example: > df.withColumn("cd", cume_dist().over(w)).show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line 1, in > > df.withColumn("cd", cume_dist().over(w)).show() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 534, in show > print(self._show_string(n, truncate, vertical)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 423, in _show_string > ).toPandas() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1031, in toPandas > return self._session.client.to_pandas(query) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 413, in to_pandas > return self._execute_and_fetch(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 573, in _execute_and_fetch > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 619, in _handle_error > raise SparkConnectAnalysisException( > pyspark.sql.connect.client.SparkConnectAnalysisException: > 
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name > `value` cannot be resolved. Did you mean one of the following? [`_1`] > Plan: 'Project [_1#2202L, cume_dist() windowspecdefinition('value ASC > NULLS FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), > currentrow$())) AS cd#2205] > +- Project [0#2200L AS _1#2202L] > +- LocalRelation [0#2200L] {code}
[jira] [Assigned] (SPARK-41846) DataFrame windowspec functions : unresolved columns
[ https://issues.apache.org/jira/browse/SPARK-41846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41846: Assignee: Ruifeng Zheng (was: Sandeep Singh)
[jira] [Assigned] (SPARK-41846) DataFrame windowspec functions : unresolved columns
[ https://issues.apache.org/jira/browse/SPARK-41846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41846: Assignee: Sandeep Singh > DataFrame windowspec functions : unresolved columns > --- > > Key: SPARK-41846 > URL: https://issues.apache.org/jira/browse/SPARK-41846 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 1098, in pyspark.sql.connect.functions.rank > Failed example: > df.withColumn("drank", rank().over(w)).show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line 1, in > > df.withColumn("drank", rank().over(w)).show() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 534, in show > print(self._show_string(n, truncate, vertical)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 423, in _show_string > ).toPandas() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1031, in toPandas > return self._session.client.to_pandas(query) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 413, in to_pandas > return self._execute_and_fetch(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 573, in _execute_and_fetch > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 619, in _handle_error > raise SparkConnectAnalysisException( > pyspark.sql.connect.client.SparkConnectAnalysisException: > [UNRESOLVED_COLUMN.WITH_SUGGESTION] A 
column or function parameter with name > `value` cannot be resolved. Did you mean one of the following? [`_1`] > Plan: 'Project [_1#4000L, rank() windowspecdefinition('value ASC NULLS > FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) > AS drank#4003] > +- Project [0#3998L AS _1#4000L] > +- LocalRelation [0#3998L] {code} > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 1032, in pyspark.sql.connect.functions.cume_dist > Failed example: > df.withColumn("cd", cume_dist().over(w)).show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line 1, in > > df.withColumn("cd", cume_dist().over(w)).show() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 534, in show > print(self._show_string(n, truncate, vertical)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 423, in _show_string > ).toPandas() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1031, in toPandas > return self._session.client.to_pandas(query) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 413, in to_pandas > return self._execute_and_fetch(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 573, in _execute_and_fetch > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 619, in _handle_error > raise SparkConnectAnalysisException( > pyspark.sql.connect.client.SparkConnectAnalysisException: > [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name > `value` cannot be resolved. Did you mean one of the following? 
[`_1`] > Plan: 'Project [_1#2202L, cume_dist() windowspecdefinition('value ASC > NULLS FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), > currentrow$())) AS cd#2205] > +- Project [0#2200L AS _1#2202L] > +- LocalRelation [0#2200L] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail:
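Both failing doctests above order the window by a column (`value`) that is absent from the plan's output (`[_1]`), which is what triggers `UNRESOLVED_COLUMN.WITH_SUGGESTION`. As a plain-Python illustration of that error's shape (a toy model only — `resolve_column` is a made-up helper, not Spark's actual analyzer):

```python
import difflib

def resolve_column(name, output_columns):
    """Toy resolver: accept a known column or raise with suggestions."""
    if name in output_columns:
        return name
    # Offer close matches if any exist, otherwise every available column.
    candidates = difflib.get_close_matches(name, output_columns) or output_columns
    suggestions = ", ".join(f"`{c}`" for c in candidates)
    raise ValueError(
        f"[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter "
        f"with name `{name}` cannot be resolved. Did you mean one of the "
        f"following? [{suggestions}]"
    )

# Mirrors ordering the window by 'value' when the only output column is '_1'.
try:
    resolve_column("value", ["_1"])
except ValueError as e:
    msg = str(e)
```

The fix in the doctests is on the caller's side: order the window by a column that actually exists in the DataFrame.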
[jira] [Assigned] (SPARK-41840) DataFrame.show(): 'Column' object is not callable
[ https://issues.apache.org/jira/browse/SPARK-41840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41840: Assignee: Ruifeng Zheng > DataFrame.show(): 'Column' object is not callable > - > > Key: SPARK-41840 > URL: https://issues.apache.org/jira/browse/SPARK-41840 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Ruifeng Zheng >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 855, in pyspark.sql.connect.functions.first > Failed example: > df.groupby("name").agg(first("age", > ignorenulls=True)).orderBy("name").show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line 1, in > > df.groupby("name").agg(first("age", > ignorenulls=True)).orderBy("name").show() > TypeError: 'Column' object is not callable{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41840) DataFrame.show(): 'Column' object is not callable
[ https://issues.apache.org/jira/browse/SPARK-41840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41840. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39390 [https://github.com/apache/spark/pull/39390] > DataFrame.show(): 'Column' object is not callable > - > > Key: SPARK-41840 > URL: https://issues.apache.org/jira/browse/SPARK-41840 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 855, in pyspark.sql.connect.functions.first > Failed example: > df.groupby("name").agg(first("age", > ignorenulls=True)).orderBy("name").show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line 1, in > > df.groupby("name").agg(first("age", > ignorenulls=True)).orderBy("name").show() > TypeError: 'Column' object is not callable{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
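For reference, the `TypeError` in this ticket is the generic Python failure mode of invoking an object that defines no `__call__`: a name the doctest expects to be the `first()` function is bound to a Column-like value instead. A minimal standalone reproduction (the `Column` class here is a stand-in, not pyspark's):

```python
class Column:
    """Stand-in for a Column-like object that defines no __call__."""
    pass

# A name expected to be a function is (wrongly) bound to a Column instance.
first = Column()

try:
    first("age", ignorenulls=True)
except TypeError as e:
    msg = str(e)

print(msg)  # → 'Column' object is not callable
```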
[jira] [Assigned] (SPARK-41677) Protobuf serializer for StreamingQueryProgressWrapper
[ https://issues.apache.org/jira/browse/SPARK-41677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-41677: -- Assignee: Yang Jie > Protobuf serializer for StreamingQueryProgressWrapper > - > > Key: SPARK-41677 > URL: https://issues.apache.org/jira/browse/SPARK-41677 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41677) Protobuf serializer for StreamingQueryProgressWrapper
[ https://issues.apache.org/jira/browse/SPARK-41677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-41677. Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39357 [https://github.com/apache/spark/pull/39357] > Protobuf serializer for StreamingQueryProgressWrapper > - > > Key: SPARK-41677 > URL: https://issues.apache.org/jira/browse/SPARK-41677 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41768) Refactor the definition of enum - `JobExecutionStatus` to follow with the code style
[ https://issues.apache.org/jira/browse/SPARK-41768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-41768. Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39286 [https://github.com/apache/spark/pull/39286] > Refactor the definition of enum - `JobExecutionStatus` to follow with the > code style > - > > Key: SPARK-41768 > URL: https://issues.apache.org/jira/browse/SPARK-41768 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41768) Refactor the definition of enum - `JobExecutionStatus` to follow with the code style
[ https://issues.apache.org/jira/browse/SPARK-41768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-41768: -- Assignee: BingKun Pan > Refactor the definition of enum - `JobExecutionStatus` to follow with the > code style > - > > Key: SPARK-41768 > URL: https://issues.apache.org/jira/browse/SPARK-41768 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41573) Assign name to _LEGACY_ERROR_TEMP_2136
[ https://issues.apache.org/jira/browse/SPARK-41573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-41573: Assignee: Haejoon Lee > Assign name to _LEGACY_ERROR_TEMP_2136 > -- > > Key: SPARK-41573 > URL: https://issues.apache.org/jira/browse/SPARK-41573 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > We should use proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`. > > *NOTE:* Please reply to this ticket before start working on it, to avoid > working on same ticket at a time -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41573) Assign name to _LEGACY_ERROR_TEMP_2136
[ https://issues.apache.org/jira/browse/SPARK-41573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-41573. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39284 [https://github.com/apache/spark/pull/39284] > Assign name to _LEGACY_ERROR_TEMP_2136 > -- > > Key: SPARK-41573 > URL: https://issues.apache.org/jira/browse/SPARK-41573 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.4.0 > > > We should use proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`. > > *NOTE:* Please reply to this ticket before start working on it, to avoid > working on same ticket at a time -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41497) Accumulator undercounting in the case of retry task with rdd cache
[ https://issues.apache.org/jira/browse/SPARK-41497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654590#comment-17654590 ] Mridul Muralidharan commented on SPARK-41497: - Sounds good [~Ngone51], thanks ! > Accumulator undercounting in the case of retry task with rdd cache > -- > > Key: SPARK-41497 > URL: https://issues.apache.org/jira/browse/SPARK-41497 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.8, 3.0.3, 3.1.3, 3.2.2, 3.3.1 >Reporter: wuyi >Priority: Major > > Accumulator could be undercounted when the retried task has rdd cache. See > the example below and you could also find the completed and reproducible > example at > [https://github.com/apache/spark/compare/master...Ngone51:spark:fix-acc] > > {code:scala} > test("SPARK-XXX") { > // Set up a cluster with 2 executors > val conf = new SparkConf() > .setMaster("local-cluster[2, 1, > 1024]").setAppName("TaskSchedulerImplSuite") > sc = new SparkContext(conf) > // Set up a custom task scheduler. The scheduler will fail the first task > attempt of the job > // submitted below. In particular, the failed first attempt task would > success on computation > // (accumulator accounting, result caching) but only fail to report its > success status due > // to the concurrent executor lost. The second task attempt would success. > taskScheduler = setupSchedulerWithCustomStatusUpdate(sc) > val myAcc = sc.longAccumulator("myAcc") > // Initiate a rdd with only one partition so there's only one task and > specify the storage level > // with MEMORY_ONLY_2 so that the rdd result will be cached on both two > executors. > val rdd = sc.parallelize(0 until 10, 1).mapPartitions { iter => > myAcc.add(100) > iter.map(x => x + 1) > }.persist(StorageLevel.MEMORY_ONLY_2) > // This will pass since the second task attempt will succeed > assert(rdd.count() === 10) > // This will fail due to `myAcc.add(100)` won't be executed during the > second task attempt's > // execution. 
Because the second task attempt will load the rdd cache > directly instead of > // executing the task function so `myAcc.add(100)` is skipped. > assert(myAcc.value === 100) > } {code} > > We could also hit this issue with decommission even if the rdd only has one > copy. For example, decommission could migrate the rdd cache block to another > executor (the result is actually the same with 2 copies) and the > decommissioned executor lost before the task reports its success status to > the driver. > > And the issue is a bit more complicated than expected to fix. I have tried to > give some fixes but all of them are not ideal: > Option 1: Clean up any rdd cache related to the failed task: in practice, > this option can already fix the issue in most cases. However, theoretically, > rdd cache could be reported to the driver right after the driver cleans up > the failed task's caches due to asynchronous communication. So this option > can’t resolve the issue thoroughly; > Option 2: Disallow rdd cache reuse across the task attempts for the same > task: this option can 100% fix the issue. The problem is this way can also > affect the case where rdd cache can be reused across the attempts (e.g., when > there is no accumulator operation in the task), which can have perf > regression; > Option 3: Introduce accumulator cache: first, this requires a new framework > for supporting accumulator cache; second, the driver should improve its logic > to distinguish whether the accumulator cache value should be reported to the > user to avoid overcounting. For example, in the case of task retry, the value > should be reported. However, in the case of rdd cache reuse, the value > shouldn’t be reported (should it?); > Option 4: Do task success validation when a task trying to load the rdd > cache: this way defines a rdd cache is only valid/accessible if the task has > succeeded. 
This way could be either overkill or a bit complex (because > currently Spark would clean up the task state once it’s finished. So we need > to maintain a structure to know if task once succeeded or not. ) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
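The skipped-update mechanism described above can be modeled without Spark at all. In this plain-Python sketch (hypothetical names, not Spark internals), the retried "task" is served from the block cache, so the task body that would have applied `acc.add(100)` never re-executes — which is exactly why a lost success report from the first attempt leaves the accumulator undercounted:

```python
class LongAccumulator:
    """Minimal stand-in for Spark's LongAccumulator."""
    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n

def run_task(acc, block_cache, block_id, task_fn):
    # A cache hit returns the stored partition and skips task_fn entirely,
    # so any acc.add() inside task_fn is not re-applied on retry.
    if block_id in block_cache:
        return block_cache[block_id]
    result = task_fn(acc)
    block_cache[block_id] = result
    return result

acc = LongAccumulator()
cache = {}

def task_fn(acc):
    acc.add(100)
    return [x + 1 for x in range(10)]

first_attempt = run_task(acc, cache, "rdd_0_0", task_fn)  # computes, acc -> 100
retry_attempt = run_task(acc, cache, "rdd_0_0", task_fn)  # cache hit, acc stays 100
```

If the first attempt's status update was lost (executor lost before reporting), the driver never counts the one update that did happen, matching the `assert(myAcc.value === 100)` failure in the ticket.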
[jira] [Updated] (SPARK-40497) Upgrade Scala to 2.13.11
[ https://issues.apache.org/jira/browse/SPARK-40497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-40497: -- Description: We tested and decided to skip the following releases. This issue aims to use 2.13.11. - 2022-09-21: v2.13.9 released [https://github.com/scala/scala/releases/tag/v2.13.9] - 2022-10-13: 2.13.10 released [https://github.com/scala/scala/releases/tag/v2.13.10] Scala 2.13.11 Milestone - https://github.com/scala/scala/milestone/100 was: We tested and decided to skip the following releases. This issue aims to use 2.13.11. - 2022-09-21: v2.13.9 released [https://github.com/scala/scala/releases/tag/v2.13.9] - 2022-10-13: 2.13.10 released [https://github.com/scala/scala/releases/tag/v2.13.10] > Upgrade Scala to 2.13.11 > > > Key: SPARK-40497 > URL: https://issues.apache.org/jira/browse/SPARK-40497 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > We tested and decided to skip the following releases. This issue aims to use > 2.13.11. > - 2022-09-21: v2.13.9 released > [https://github.com/scala/scala/releases/tag/v2.13.9] > - 2022-10-13: 2.13.10 released > [https://github.com/scala/scala/releases/tag/v2.13.10] > > Scala 2.13.11 Milestone > - https://github.com/scala/scala/milestone/100 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40497) Upgrade Scala to 2.13.11
[ https://issues.apache.org/jira/browse/SPARK-40497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-40497: -- Description: We tested and decided to skip the following releases. This issue aims to use 2.13.11. - 2022-09-21: v2.13.9 released [https://github.com/scala/scala/releases/tag/v2.13.9] - 2022-10-13: 2.13.10 released [https://github.com/scala/scala/releases/tag/v2.13.10] was: 2.13.9 released [https://github.com/scala/scala/releases/tag/v2.13.9] > Upgrade Scala to 2.13.11 > > > Key: SPARK-40497 > URL: https://issues.apache.org/jira/browse/SPARK-40497 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > We tested and decided to skip the following releases. This issue aims to use > 2.13.11. > - 2022-09-21: v2.13.9 released > [https://github.com/scala/scala/releases/tag/v2.13.9] > - 2022-10-13: 2.13.10 released > [https://github.com/scala/scala/releases/tag/v2.13.10] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41885) --packages may not work on Windows 11
Shixiong Zhu created SPARK-41885: Summary: --packages may not work on Windows 11 Key: SPARK-41885 URL: https://issues.apache.org/jira/browse/SPARK-41885 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.2.1 Environment: Hadoop 2.7 in windows 11 Reporter: Shixiong Zhu Gastón Ortiz reported an issue when using spark 3.2.1 and hadoop 2.7 in windows 11. See [https://github.com/delta-io/delta/issues/1059] Looks like executor cannot fetch the jar files. See the critical stack trace below (the full stack trace is in [https://github.com/delta-io/delta/issues/1059] ): {code:java} org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:366) at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:762) at org.apache.spark.util.Utils$.fetchFile(Utils.scala:549) at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13(Executor.scala:962) at org.apache.spark.executor.Executor.$anonfun$updateDependencies$13$adapted(Executor.scala:954) at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:985) at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149) at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237) at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44) at scala.collection.mutable.HashMap.foreach(HashMap.scala:149) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:984) at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:954) at org.apache.spark.executor.Executor.(Executor.scala:247) at {code} This is not a Delta Lake issue, as this can be reproduced by running `pyspark --packages org.apache.kafka:kafka-clients:2.8.1` as well. I don't have a Windows 11 environment to debug. Hence I help Gastón Ortiz create this ticket and it would be great if anyone who has a Windows 11 environment can help this. 
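One plausible Windows-specific contributor — stated here only as an assumption, not a diagnosis from the stack trace — is that drive-letter paths are easily mis-read as URIs when jar locations are round-tripped through URL parsing during dependency fetch:

```python
from urllib.parse import urlparse

# A Windows drive-letter path: the leading "C:" parses as a URI scheme,
# not as part of a local filesystem path.
parsed = urlparse(r"C:\Users\me\.ivy2\jars\kafka-clients-2.8.1.jar")
print(parsed.scheme)  # 'c' rather than 'file'
```

Whether this is the actual cause on Windows 11 would need confirmation by someone with that environment, as the reporter notes.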
[jira] [Assigned] (SPARK-41575) Assign name to _LEGACY_ERROR_TEMP_2054
[ https://issues.apache.org/jira/browse/SPARK-41575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41575: Assignee: Apache Spark > Assign name to _LEGACY_ERROR_TEMP_2054 > -- > > Key: SPARK-41575 > URL: https://issues.apache.org/jira/browse/SPARK-41575 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > > We should use proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`. > > *NOTE:* Please reply to this ticket before start working on it, to avoid > working on same ticket at a time -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41575) Assign name to _LEGACY_ERROR_TEMP_2054
[ https://issues.apache.org/jira/browse/SPARK-41575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654532#comment-17654532 ] Apache Spark commented on SPARK-41575: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/39394 > Assign name to _LEGACY_ERROR_TEMP_2054 > -- > > Key: SPARK-41575 > URL: https://issues.apache.org/jira/browse/SPARK-41575 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > We should use proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`. > > *NOTE:* Please reply to this ticket before start working on it, to avoid > working on same ticket at a time -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41575) Assign name to _LEGACY_ERROR_TEMP_2054
[ https://issues.apache.org/jira/browse/SPARK-41575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654533#comment-17654533 ] Apache Spark commented on SPARK-41575: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/39394 > Assign name to _LEGACY_ERROR_TEMP_2054 > -- > > Key: SPARK-41575 > URL: https://issues.apache.org/jira/browse/SPARK-41575 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > We should use proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`. > > *NOTE:* Please reply to this ticket before start working on it, to avoid > working on same ticket at a time -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41575) Assign name to _LEGACY_ERROR_TEMP_2054
[ https://issues.apache.org/jira/browse/SPARK-41575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41575: Assignee: (was: Apache Spark) > Assign name to _LEGACY_ERROR_TEMP_2054 > -- > > Key: SPARK-41575 > URL: https://issues.apache.org/jira/browse/SPARK-41575 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > We should use proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`. > > *NOTE:* Please reply to this ticket before start working on it, to avoid > working on same ticket at a time -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41871) DataFrame hint parameter can be str, list, float or int
[ https://issues.apache.org/jira/browse/SPARK-41871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654516#comment-17654516 ] Apache Spark commented on SPARK-41871: -- User 'techaddict' has created a pull request for this issue: https://github.com/apache/spark/pull/39393 > DataFrame hint parameter can be str, list, float or int > --- > > Key: SPARK-41871 > URL: https://issues.apache.org/jira/browse/SPARK-41871 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > df = self.spark.range(10e10).toDF("id") > such_a_nice_list = ["itworks1", "itworks2", "itworks3"] > hinted_df = df.hint("my awesome hint", 1.2345, "what", such_a_nice_list){code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 556, in test_extended_hint_types > hinted_df = df.hint("my awesome hint", 1.2345, "what", such_a_nice_list) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 482, in hint > raise TypeError( > TypeError: param should be a int or str, but got float 1.2345{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41871) DataFrame hint parameter can be str, list, float or int
[ https://issues.apache.org/jira/browse/SPARK-41871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41871: Assignee: Apache Spark > DataFrame hint parameter can be str, list, float or int > --- > > Key: SPARK-41871 > URL: https://issues.apache.org/jira/browse/SPARK-41871 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Apache Spark >Priority: Major > > {code:java} > df = self.spark.range(10e10).toDF("id") > such_a_nice_list = ["itworks1", "itworks2", "itworks3"] > hinted_df = df.hint("my awesome hint", 1.2345, "what", such_a_nice_list){code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 556, in test_extended_hint_types > hinted_df = df.hint("my awesome hint", 1.2345, "what", such_a_nice_list) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 482, in hint > raise TypeError( > TypeError: param should be a int or str, but got float 1.2345{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41871) DataFrame hint parameter can be str, list, float or int
[ https://issues.apache.org/jira/browse/SPARK-41871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41871: Assignee: (was: Apache Spark) > DataFrame hint parameter can be str, list, float or int > --- > > Key: SPARK-41871 > URL: https://issues.apache.org/jira/browse/SPARK-41871 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > df = self.spark.range(10e10).toDF("id") > such_a_nice_list = ["itworks1", "itworks2", "itworks3"] > hinted_df = df.hint("my awesome hint", 1.2345, "what", such_a_nice_list){code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 556, in test_extended_hint_types > hinted_df = df.hint("my awesome hint", 1.2345, "what", such_a_nice_list) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 482, in hint > raise TypeError( > TypeError: param should be a int or str, but got float 1.2345{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41871) DataFrame hint parameter can be str, list, float or int
[ https://issues.apache.org/jira/browse/SPARK-41871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654515#comment-17654515 ] Apache Spark commented on SPARK-41871: -- User 'techaddict' has created a pull request for this issue: https://github.com/apache/spark/pull/39393 > DataFrame hint parameter can be str, list, float or int > --- > > Key: SPARK-41871 > URL: https://issues.apache.org/jira/browse/SPARK-41871 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > df = self.spark.range(10e10).toDF("id") > such_a_nice_list = ["itworks1", "itworks2", "itworks3"] > hinted_df = df.hint("my awesome hint", 1.2345, "what", such_a_nice_list){code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 556, in test_extended_hint_types > hinted_df = df.hint("my awesome hint", 1.2345, "what", such_a_nice_list) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 482, in hint > raise TypeError( > TypeError: param should be a int or str, but got float 1.2345{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
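The fix tracked by SPARK-41871 widens the type check in the Spark Connect client's `DataFrame.hint` so that float (and list) parameters are accepted, matching classic PySpark. A minimal stdlib-only sketch of the widened check (the helper name `validate_hint_param` is hypothetical, not the actual Spark code):

```python
# Hypothetical sketch: accept str, list, float and int hint parameters,
# mirroring classic PySpark, instead of rejecting floats the way the
# Connect client did in connect/dataframe.py.
def validate_hint_param(param):
    allowed = (str, list, float, int)
    if not isinstance(param, allowed):
        raise TypeError(
            "param should be a str, list, float or int, but got "
            f"{type(param).__name__} {param!r}"
        )
    return param

validate_hint_param(1.2345)                               # accepted after the fix
validate_hint_param(["itworks1", "itworks2", "itworks3"])  # lists too
```

With this check in place, the reproduction in the issue (`df.hint("my awesome hint", 1.2345, "what", such_a_nice_list)`) would no longer raise.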
[jira] [Updated] (SPARK-41884) DataFrame `toPandas` parity in return types
[ https://issues.apache.org/jira/browse/SPARK-41884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Singh updated SPARK-41884: -- Description: {code:java} import numpy as np import pandas as pd df = self.spark.createDataFrame( [[[("a", 2, 3.0), ("a", 2, 3.0)]], [[("b", 5, 6.0), ("b", 5, 6.0)]]], "array_struct_col Array>", ) for is_arrow_enabled in [True, False]: with self.sql_conf({"spark.sql.execution.arrow.pyspark.enabled": is_arrow_enabled}): pdf = df.toPandas() self.assertEqual(type(pdf), pd.DataFrame) self.assertEqual(type(pdf["array_struct_col"]), pd.Series) if is_arrow_enabled: self.assertEqual(type(pdf["array_struct_col"][0]), np.ndarray) else: self.assertEqual(type(pdf["array_struct_col"][0]), list){code} {code:java} Traceback (most recent call last): 1415 File "/__w/spark/spark/python/pyspark/sql/tests/test_dataframe.py", line 1202, in test_to_pandas_for_array_of_struct 1416df = self.spark.createDataFrame( 1417 File "/__w/spark/spark/python/pyspark/sql/connect/session.py", line 264, in createDataFrame 1418table = pa.Table.from_pylist([dict(zip(_cols, list(item))) for item in _data]) 1419 File "pyarrow/table.pxi", line 3700, in pyarrow.lib.Table.from_pylist 1420 File "pyarrow/table.pxi", line 5221, in pyarrow.lib._from_pylist 1421 File "pyarrow/table.pxi", line 3575, in pyarrow.lib.Table.from_arrays 1422 File "pyarrow/table.pxi", line 1383, in pyarrow.lib._sanitize_arrays 1423 File "pyarrow/table.pxi", line 1364, in pyarrow.lib._schema_from_arrays 1424 File "pyarrow/array.pxi", line 320, in pyarrow.lib.array 1425 File "pyarrow/array.pxi", line 39, in pyarrow.lib._sequence_to_array 1426 File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status 1427 File "pyarrow/error.pxi", line 123, in pyarrow.lib.check_status 1428pyarrow.lib.ArrowTypeError: Expected bytes, got a 'int' object{code} {code:java} import numpy as np pdf = self._to_pandas() types = pdf.dtypes self.assertEqual(types[0], np.int32) 
self.assertEqual(types[1], np.object) self.assertEqual(types[2], np.bool) self.assertEqual(types[3], np.float32) self.assertEqual(types[4], np.object) # datetime.date self.assertEqual(types[5], "datetime64[ns]") self.assertEqual(types[6], "datetime64[ns]") self.assertEqual(types[7], "timedelta64[ns]") {code} {code:java} Traceback (most recent call last): 1434 File "/__w/spark/spark/python/pyspark/sql/tests/test_dataframe.py", line 1039, in test_to_pandas 1435 self.assertEqual(types[5], "datetime64[ns]") 1436AssertionError: datetime64[ns, Etc/UTC] != 'datetime64[ns]' 1437 {code} was: {code:java} schema = StructType( [StructField("i", StringType(), True), StructField("j", IntegerType(), True)] ) df = self.spark.createDataFrame([("a", 1)], schema) schema1 = StructType([StructField("j", StringType()), StructField("i", StringType())]) df1 = df.to(schema1) self.assertEqual(schema1, df1.schema) self.assertEqual(df.count(), df1.count()) schema2 = StructType([StructField("j", LongType())]) df2 = df.to(schema2) self.assertEqual(schema2, df2.schema) self.assertEqual(df.count(), df2.count()) schema3 = StructType([StructField("struct", schema1, False)]) df3 = df.select(struct("i", "j").alias("struct")).to(schema3) self.assertEqual(schema3, df3.schema) self.assertEqual(df.count(), df3.count()) # incompatible field nullability schema4 = StructType([StructField("j", LongType(), False)]) self.assertRaisesRegex( AnalysisException, "NULLABLE_COLUMN_OR_FIELD", lambda: df.to(schema4) ){code} {code:java} Traceback (most recent call last): File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", line 1486, in test_to self.assertRaisesRegex( AssertionError: AnalysisException not raised by {code} > DataFrame `toPandas` parity in return types > --- > > Key: SPARK-41884 > URL: https://issues.apache.org/jira/browse/SPARK-41884 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > 
{code:java} > import numpy as np > import pandas as pd > df = self.spark.createDataFrame( > [[[("a", 2, 3.0), ("a", 2, 3.0)]], [[("b", 5, 6.0), ("b", 5, 6.0)]]], > "array_struct_col Array>", > ) > for is_arrow_enabled in [True, False]: > with self.sql_conf({"spark.sql.execution.arrow.pyspark.enabled": > is_arrow_enabled}): > pdf = df.toPandas() > self.assertEqual(type(pdf), pd.DataFrame) > self.assertEqual(type(pdf["array_struct_col"]), pd.Series) > if is_arrow_enabled: > self.assertEqual(type(pdf["array_struct_col"][0]), np.ndarray) > else: > self.assertEqual(type(pdf["array_struct_col"][0]), list){code} > {code:java} >
[jira] [Created] (SPARK-41884) DataFrame `toPandas` parity in return types
Sandeep Singh created SPARK-41884: - Summary: DataFrame `toPandas` parity in return types Key: SPARK-41884 URL: https://issues.apache.org/jira/browse/SPARK-41884 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Sandeep Singh {code:java} schema = StructType( [StructField("i", StringType(), True), StructField("j", IntegerType(), True)] ) df = self.spark.createDataFrame([("a", 1)], schema) schema1 = StructType([StructField("j", StringType()), StructField("i", StringType())]) df1 = df.to(schema1) self.assertEqual(schema1, df1.schema) self.assertEqual(df.count(), df1.count()) schema2 = StructType([StructField("j", LongType())]) df2 = df.to(schema2) self.assertEqual(schema2, df2.schema) self.assertEqual(df.count(), df2.count()) schema3 = StructType([StructField("struct", schema1, False)]) df3 = df.select(struct("i", "j").alias("struct")).to(schema3) self.assertEqual(schema3, df3.schema) self.assertEqual(df.count(), df3.count()) # incompatible field nullability schema4 = StructType([StructField("j", LongType(), False)]) self.assertRaisesRegex( AnalysisException, "NULLABLE_COLUMN_OR_FIELD", lambda: df.to(schema4) ){code} {code:java} Traceback (most recent call last): File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", line 1486, in test_to self.assertRaisesRegex( AssertionError: AnalysisException not raised by {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
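The failing assertion above expects `df.to(schema4)` to raise an `AnalysisException` with `NULLABLE_COLUMN_OR_FIELD` when a nullable column is mapped onto a non-nullable target field. The underlying compatibility rule can be sketched as a one-line predicate (a simplified illustration, not Spark's actual analyzer code):

```python
# Hypothetical sketch of the nullability rule Dataset.to must enforce:
# a nullable source column may not be assigned to a non-nullable target
# field, while every other combination is allowed.
def nullable_compatible(source_nullable: bool, target_nullable: bool) -> bool:
    return target_nullable or not source_nullable

# In the test, column "j" is nullable, but schema4 declares
# StructField("j", LongType(), False) -- so the cast must be rejected.
assert not nullable_compatible(source_nullable=True, target_nullable=False)
assert nullable_compatible(source_nullable=False, target_nullable=False)
```

The bug being reported is that the Connect client silently succeeds instead of surfacing this analysis error.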
[jira] [Resolved] (SPARK-39304) ps.read_csv ignore double quotes.
[ https://issues.apache.org/jira/browse/SPARK-39304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bjørn Jørgensen resolved SPARK-39304. - Resolution: Won't Fix > ps.read_csv ignore double quotes. > - > > Key: SPARK-39304 > URL: https://issues.apache.org/jira/browse/SPARK-39304 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Priority: Major > Attachments: Untitled (4).ipynb, csvfile.csv > > > This one is coming from u...@spark.org mail list tittle "Complexity with the > data" and also on > [SO|https://stackoverflow.com/questions/72389385/how-to-load-complex-data-using-pyspark] > > Add a notebook and the sample data, where this error is tested. > Test data : > Some years,"If your job title needs additional context, please clarify > here:","If ""Other,"" please indicate the currency here: " > 5-7 years,"I started as the Marketing Coordinator, and was given the > ""Associate Product Manager"" title as a promotion. My duties remained mostly > the same and include graphic design work, marketing, and product management.", > 8 - 10 years,equivalent to Assistant Registrar, > 2 - 4 years,"I manage our fundraising department, primarily overseeing our > direct mail, planned giving, and grant writing programs. ", -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
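The sample data above uses RFC 4180-style escaping: a field containing commas is wrapped in double quotes, and a literal quote inside such a field is doubled (`""`). As a point of comparison for the reported `ps.read_csv` behaviour, Python's standard `csv` module parses the problematic header row into three fields:

```python
import csv
import io

# One of the rows from the attached sample data: embedded commas plus
# doubled quotes ("" escapes a quote inside a quoted field, per RFC 4180).
line = ('Some years,"If your job title needs additional context, '
        'please clarify here:","If ""Other,"" please indicate the currency here: "')

fields = next(csv.reader(io.StringIO(line)))
# Three fields, with the doubled quotes collapsed back to single quotes.
assert len(fields) == 3
assert fields[2] == 'If "Other," please indicate the currency here: '
```

The report is that pandas-on-Spark's `read_csv` did not honour this quoting, splitting on the commas inside the quoted fields instead.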
[jira] [Commented] (SPARK-41846) DataFrame windowspec functions : unresolved columns
[ https://issues.apache.org/jira/browse/SPARK-41846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654446#comment-17654446 ] Apache Spark commented on SPARK-41846: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39392 > DataFrame windowspec functions : unresolved columns > --- > > Key: SPARK-41846 > URL: https://issues.apache.org/jira/browse/SPARK-41846 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 1098, in pyspark.sql.connect.functions.rank > Failed example: > df.withColumn("drank", rank().over(w)).show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line 1, in > > df.withColumn("drank", rank().over(w)).show() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 534, in show > print(self._show_string(n, truncate, vertical)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 423, in _show_string > ).toPandas() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1031, in toPandas > return self._session.client.to_pandas(query) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 413, in to_pandas > return self._execute_and_fetch(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 573, in _execute_and_fetch > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 619, in _handle_error > raise SparkConnectAnalysisException( > 
pyspark.sql.connect.client.SparkConnectAnalysisException: > [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name > `value` cannot be resolved. Did you mean one of the following? [`_1`] > Plan: 'Project [_1#4000L, rank() windowspecdefinition('value ASC NULLS > FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) > AS drank#4003] > +- Project [0#3998L AS _1#4000L] > +- LocalRelation [0#3998L] {code} > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 1032, in pyspark.sql.connect.functions.cume_dist > Failed example: > df.withColumn("cd", cume_dist().over(w)).show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line 1, in > > df.withColumn("cd", cume_dist().over(w)).show() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 534, in show > print(self._show_string(n, truncate, vertical)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 423, in _show_string > ).toPandas() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1031, in toPandas > return self._session.client.to_pandas(query) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 413, in to_pandas > return self._execute_and_fetch(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 573, in _execute_and_fetch > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 619, in _handle_error > raise SparkConnectAnalysisException( > pyspark.sql.connect.client.SparkConnectAnalysisException: > [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or 
function parameter with name > `value` cannot be resolved. Did you mean one of the following? [`_1`] > Plan: 'Project [_1#2202L, cume_dist() windowspecdefinition('value ASC > NULLS FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), > currentrow$())) AS cd#2205] > +- Project [0#2200L AS _1#2202L] > +- LocalRelation [0#2200L] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail:
[jira] [Assigned] (SPARK-41846) DataFrame windowspec functions : unresolved columns
[ https://issues.apache.org/jira/browse/SPARK-41846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41846: Assignee: (was: Apache Spark) > DataFrame windowspec functions : unresolved columns > --- > > Key: SPARK-41846 > URL: https://issues.apache.org/jira/browse/SPARK-41846 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 1098, in pyspark.sql.connect.functions.rank > Failed example: > df.withColumn("drank", rank().over(w)).show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line 1, in > > df.withColumn("drank", rank().over(w)).show() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 534, in show > print(self._show_string(n, truncate, vertical)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 423, in _show_string > ).toPandas() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1031, in toPandas > return self._session.client.to_pandas(query) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 413, in to_pandas > return self._execute_and_fetch(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 573, in _execute_and_fetch > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 619, in _handle_error > raise SparkConnectAnalysisException( > pyspark.sql.connect.client.SparkConnectAnalysisException: > [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function 
parameter with name > `value` cannot be resolved. Did you mean one of the following? [`_1`] > Plan: 'Project [_1#4000L, rank() windowspecdefinition('value ASC NULLS > FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) > AS drank#4003] > +- Project [0#3998L AS _1#4000L] > +- LocalRelation [0#3998L] {code} > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 1032, in pyspark.sql.connect.functions.cume_dist > Failed example: > df.withColumn("cd", cume_dist().over(w)).show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line 1, in > > df.withColumn("cd", cume_dist().over(w)).show() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 534, in show > print(self._show_string(n, truncate, vertical)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 423, in _show_string > ).toPandas() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1031, in toPandas > return self._session.client.to_pandas(query) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 413, in to_pandas > return self._execute_and_fetch(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 573, in _execute_and_fetch > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 619, in _handle_error > raise SparkConnectAnalysisException( > pyspark.sql.connect.client.SparkConnectAnalysisException: > [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name > `value` cannot be resolved. Did you mean one of the following? 
[`_1`] > Plan: 'Project [_1#2202L, cume_dist() windowspecdefinition('value ASC > NULLS FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), > currentrow$())) AS cd#2205] > +- Project [0#2200L AS _1#2202L] > +- LocalRelation [0#2200L] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41846) DataFrame windowspec functions : unresolved columns
[ https://issues.apache.org/jira/browse/SPARK-41846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41846: Assignee: Apache Spark > DataFrame windowspec functions : unresolved columns > --- > > Key: SPARK-41846 > URL: https://issues.apache.org/jira/browse/SPARK-41846 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Apache Spark >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 1098, in pyspark.sql.connect.functions.rank > Failed example: > df.withColumn("drank", rank().over(w)).show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line 1, in > > df.withColumn("drank", rank().over(w)).show() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 534, in show > print(self._show_string(n, truncate, vertical)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 423, in _show_string > ).toPandas() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1031, in toPandas > return self._session.client.to_pandas(query) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 413, in to_pandas > return self._execute_and_fetch(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 573, in _execute_and_fetch > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 619, in _handle_error > raise SparkConnectAnalysisException( > pyspark.sql.connect.client.SparkConnectAnalysisException: > [UNRESOLVED_COLUMN.WITH_SUGGESTION] A 
column or function parameter with name > `value` cannot be resolved. Did you mean one of the following? [`_1`] > Plan: 'Project [_1#4000L, rank() windowspecdefinition('value ASC NULLS > FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) > AS drank#4003] > +- Project [0#3998L AS _1#4000L] > +- LocalRelation [0#3998L] {code} > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 1032, in pyspark.sql.connect.functions.cume_dist > Failed example: > df.withColumn("cd", cume_dist().over(w)).show() > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line 1, in > > df.withColumn("cd", cume_dist().over(w)).show() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 534, in show > print(self._show_string(n, truncate, vertical)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 423, in _show_string > ).toPandas() > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 1031, in toPandas > return self._session.client.to_pandas(query) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 413, in to_pandas > return self._execute_and_fetch(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 573, in _execute_and_fetch > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 619, in _handle_error > raise SparkConnectAnalysisException( > pyspark.sql.connect.client.SparkConnectAnalysisException: > [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name > `value` cannot be resolved. Did you mean one of the following? 
[`_1`] > Plan: 'Project [_1#2202L, cume_dist() windowspecdefinition('value ASC > NULLS FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), > currentrow$())) AS cd#2205] > +- Project [0#2200L AS _1#2202L] > +- LocalRelation [0#2200L] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail:
[jira] [Commented] (SPARK-41825) DataFrame.show formatting int as double
[ https://issues.apache.org/jira/browse/SPARK-41825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654440#comment-17654440 ] Ruifeng Zheng commented on SPARK-41825: --- I'll take this one > DataFrame.show formatting int as double > --- > > Key: SPARK-41825 > URL: https://issues.apache.org/jira/browse/SPARK-41825 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", > line 650, in pyspark.sql.connect.dataframe.DataFrame.fillna > Failed example: > df.na.fill(50).show() > Expected: > +---+--+-++ > |age|height| name|bool| > +---+--+-++ > | 10| 80.5|Alice|null| > | 5| 50.0| Bob|null| > | 50| 50.0| Tom|null| > | 50| 50.0| null|true| > +---+--+-++ > Got: > ++--+-++ > | age|height| name|bool| > ++--+-++ > |10.0| 80.5|Alice|null| > | 5.0| 50.0| Bob|null| > |50.0| 50.0| Tom|null| > |50.0| 50.0| null|true| > ++--+-++ > {code}
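The doctest diff above shows `df.na.fill(50)` promoting the int `age` column to double on Spark Connect. The expected behaviour is that the fill value is cast to each column's type, not the column to the fill value's type. A simplified, Spark-free sketch of that per-column rule (the helper `fill_column` is hypothetical):

```python
# Hypothetical sketch of the dtype-preserving behaviour df.na.fill(50)
# should have: cast the fill value into each column's type, so an int
# column stays int instead of being promoted to double.
def fill_column(values, fill, dtype):
    cast_fill = dtype(fill)
    return [cast_fill if v is None else v for v in values]

# age is an int column, height a double column; both are filled with 50.
assert fill_column([10, 5, None, None], 50, int) == [10, 5, 50, 50]
assert fill_column([80.5, None, None, None], 50, float) == [80.5, 50.0, 50.0, 50.0]
```

Under this rule the `age` column renders as `10, 5, 50, 50` rather than `10.0, 5.0, 50.0, 50.0` as in the "Got:" output.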
[jira] [Assigned] (SPARK-41883) Upgrade dropwizard metrics 4.2.15
[ https://issues.apache.org/jira/browse/SPARK-41883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41883: Assignee: (was: Apache Spark) > Upgrade dropwizard metrics 4.2.15 > -- > > Key: SPARK-41883 > URL: https://issues.apache.org/jira/browse/SPARK-41883 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor >
[jira] [Assigned] (SPARK-41883) Upgrade dropwizard metrics 4.2.15
[ https://issues.apache.org/jira/browse/SPARK-41883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41883: Assignee: Apache Spark > Upgrade dropwizard metrics 4.2.15 > -- > > Key: SPARK-41883 > URL: https://issues.apache.org/jira/browse/SPARK-41883 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor >