[jira] [Updated] (SPARK-42100) Protect null `SQLExecutionUIData#description` in `SQLExecutionUIDataSerializer`
[ https://issues.apache.org/jira/browse/SPARK-42100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Jie updated SPARK-42100:
-----------------------------
    Description: 

export LIVE_UI_LOCAL_STORE_DIR=/tmp/spark-ui
mvn clean install -pl sql/core -Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest -Dtest=none -DwildcardSuites=org.apache.spark.sql.DynamicPartitionPruningV1SuiteAEOff -am

No tests fail, but the following error message is logged:

{code:java}
14:46:44.514 ERROR org.apache.spark.scheduler.AsyncEventQueue: Listener SQLAppStatusListener threw an exception
java.lang.NullPointerException
	at org.apache.spark.status.protobuf.StoreTypes$SQLExecutionUIData$Builder.setDescription(StoreTypes.java:46500)
	at org.apache.spark.status.protobuf.sql.SQLExecutionUIDataSerializer.serialize(SQLExecutionUIDataSerializer.scala:34)
	at org.apache.spark.status.protobuf.sql.SQLExecutionUIDataSerializer.serialize(SQLExecutionUIDataSerializer.scala:28)
	at org.apache.spark.status.protobuf.KVStoreProtobufSerializer.serialize(KVStoreProtobufSerializer.scala:30)
	at org.apache.spark.util.kvstore.RocksDB.write(RocksDB.java:188)
	at org.apache.spark.status.ElementTrackingStore.write(ElementTrackingStore.scala:123)
	at org.apache.spark.status.ElementTrackingStore.write(ElementTrackingStore.scala:127)
	at org.apache.spark.status.LiveEntity.write(LiveEntity.scala:50)
	at org.apache.spark.sql.execution.ui.SQLAppStatusListener.update(SQLAppStatusListener.scala:456)
	at org.apache.spark.sql.execution.ui.SQLAppStatusListener.onJobStart(SQLAppStatusListener.scala:124)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:37)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
	at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
	at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
	at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
	at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1444)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
14:46:44.936 ERROR org.apache.spark.scheduler.AsyncEventQueue: Listener SQLAppStatusListener threw an exception
{code}

> Protect null `SQLExecutionUIData#description` in
> `SQLExecutionUIDataSerializer`
> ------------------------------------------------
>
>                 Key: SPARK-42100
>                 URL: https://issues.apache.org/jira/browse/SPARK-42100
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Yang Jie
>            Priority: Major
>
> export LIVE_UI_LOCAL_STORE_DIR=/tmp/spark-ui
> mvn clean install -pl sql/core -Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest -Dtest=none -DwildcardSuites=org.apache.spark.sql.DynamicPartitionPruningV1SuiteAEOff -am
>
> No tests fail, but the following error message is logged:
>
> {code:java}
> 14:46:44.514 ERROR org.apache.spark.scheduler.AsyncEventQueue: Listener SQLAppStatusListener threw an exception
> java.lang.NullPointerException
> at org.apache.spark.status.protobuf.StoreTypes$SQLExecutionUIData$Builder.setDescription(StoreTypes.java:46500)
> at org.apache.spark.status.protobuf.sql.SQLExecutionUIDataSerializer.serialize(SQLExecutionUIDataSerializer.scala:34)
> at org.apache.spark.status.protobuf.sql.SQLExecutionUIDataSerializer.serialize(SQLExecutionUIDataSerializer.scala:28)
> at org.apache.spark.status.protobuf.KVStoreProtobufSerializer.serialize(KVStoreProtobufSerializer.scala:30)
> at org.apache.spark.util.kvstore.RocksDB.write(RocksDB.java:188)
> at org.apache.spark.status.ElementTrackingStore.write(ElementTrackingStore.scala:123)
> at org.apache.spark.status.ElementTrackingStore.write(ElementTrackingStore.scala:127)
> at org.apache.spark.status.LiveEntity.write(LiveEntity.scala:50)
> at org.apache.spark.sql.execution.ui.SQLAppStatusListener.update(SQLAppStatusListener.scala:456)
> at org.apache.spark.sql.execution.ui.SQLAppStatusListener.onJobStart(SQLAppStatusListener.scala:124)
> at
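The NPE above comes from calling a generated protobuf builder setter with a null value; the protection this issue asks for is a null guard before the setter call. A minimal Python sketch of that guard pattern (the `Builder` class and field names here are illustrative stand-ins, not Spark's actual generated API):

```python
class Builder:
    """Stand-in for a generated protobuf builder: setters reject None."""
    def __init__(self):
        self.fields = {}

    def set_description(self, value):
        if value is None:
            # mirrors the NullPointerException thrown by the real setter
            raise TypeError("protobuf setters do not accept None")
        self.fields["description"] = value
        return self

def serialize(description):
    builder = Builder()
    # The fix in spirit: only call the setter when the value is present.
    if description is not None:
        builder.set_description(description)
    return builder.fields

print(serialize(None))     # {} -- no exception for a null description
print(serialize("query"))  # {'description': 'query'}
```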
[jira] [Assigned] (SPARK-41845) Fix `count(expr("*"))` function
[ https://issues.apache.org/jira/browse/SPARK-41845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41845:
------------------------------------

    Assignee: (was: Apache Spark)

> Fix `count(expr("*"))` function
> -------------------------------
>
>                 Key: SPARK-41845
>                 URL: https://issues.apache.org/jira/browse/SPARK-41845
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect, PySpark
>    Affects Versions: 3.4.0
>            Reporter: Sandeep Singh
>            Priority: Major
>
> {code:java}
> File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 801, in pyspark.sql.connect.functions.count
> Failed example:
>     df.select(count(expr("*")), count(df.alphabets)).show()
> Expected:
>     +--------+----------------+
>     |count(1)|count(alphabets)|
>     +--------+----------------+
>     |       4|               3|
>     +--------+----------------+
> Got:
>     +----------------+----------------+
>     |count(alphabets)|count(alphabets)|
>     +----------------+----------------+
>     |               3|               3|
>     +----------------+----------------+
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41845) Fix `count(expr("*"))` function
[ https://issues.apache.org/jira/browse/SPARK-41845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41845:
------------------------------------

    Assignee: Apache Spark

> Fix `count(expr("*"))` function
> -------------------------------
>
>                 Key: SPARK-41845
>                 URL: https://issues.apache.org/jira/browse/SPARK-41845
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect, PySpark
>    Affects Versions: 3.4.0
>            Reporter: Sandeep Singh
>            Assignee: Apache Spark
>            Priority: Major
>
> {code:java}
> File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 801, in pyspark.sql.connect.functions.count
> Failed example:
>     df.select(count(expr("*")), count(df.alphabets)).show()
> Expected:
>     +--------+----------------+
>     |count(1)|count(alphabets)|
>     +--------+----------------+
>     |       4|               3|
>     +--------+----------------+
> Got:
>     +----------------+----------------+
>     |count(alphabets)|count(alphabets)|
>     +----------------+----------------+
>     |               3|               3|
>     +----------------+----------------+
> {code}
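The failing doctest above shows the Connect client resolving `count(expr("*"))` as if `*` were an ordinary column name, so both output columns come out as `count(alphabets)`. One way to picture the intended behavior: a star argument to `count` must mean row counting, i.e. `count(1)`, not a lookup of a column literally named `*`. A toy Python sketch of that translation rule (this is an illustration of the semantics, not the actual Connect planner code):

```python
def resolve_count_arg(arg: str) -> str:
    # count(*) counts rows, so it is normalized to count(1)
    # instead of being resolved against the column list.
    if arg == "*":
        return "count(1)"
    return f"count({arg})"

print(resolve_count_arg("*"))          # count(1)
print(resolve_count_arg("alphabets"))  # count(alphabets)
```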
[jira] [Created] (SPARK-42100) Protect null `SQLExecutionUIData#description` in `SQLExecutionUIDataSerializer`
Yang Jie created SPARK-42100:
---------------------------------

             Summary: Protect null `SQLExecutionUIData#description` in `SQLExecutionUIDataSerializer`
                 Key: SPARK-42100
                 URL: https://issues.apache.org/jira/browse/SPARK-42100
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: Yang Jie
[jira] [Commented] (SPARK-41845) Fix `count(expr("*"))` function
[ https://issues.apache.org/jira/browse/SPARK-41845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677636#comment-17677636 ]

Apache Spark commented on SPARK-41845:
--------------------------------------

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39622

> Fix `count(expr("*"))` function
> -------------------------------
>
>                 Key: SPARK-41845
>                 URL: https://issues.apache.org/jira/browse/SPARK-41845
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect, PySpark
>    Affects Versions: 3.4.0
>            Reporter: Sandeep Singh
>            Priority: Major
>
> {code:java}
> File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", line 801, in pyspark.sql.connect.functions.count
> Failed example:
>     df.select(count(expr("*")), count(df.alphabets)).show()
> Expected:
>     +--------+----------------+
>     |count(1)|count(alphabets)|
>     +--------+----------------+
>     |       4|               3|
>     +--------+----------------+
> Got:
>     +----------------+----------------+
>     |count(alphabets)|count(alphabets)|
>     +----------------+----------------+
>     |               3|               3|
>     +----------------+----------------+
> {code}
[jira] [Commented] (SPARK-42099) Make `count(*)` work correctly
[ https://issues.apache.org/jira/browse/SPARK-42099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677635#comment-17677635 ] Apache Spark commented on SPARK-42099: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39622 > Make `count(*)` work correctly > -- > > Key: SPARK-42099 > URL: https://issues.apache.org/jira/browse/SPARK-42099 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > > cdf.select(CF.count("*"), CF.count(cdf.alphabets)).collect() > {code:java} > pyspark.sql.connect.client.SparkConnectAnalysisException: > [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name > `*` cannot be resolved. Did you mean one of the following? [`alphabets`] > Plan: 'Aggregate [unresolvedalias('count('*), None), count(alphabets#32) AS > count(alphabets)#35L] > +- Project [alphabets#30 AS alphabets#32] >+- LocalRelation [alphabets#30] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42099) Make `count(*)` work correctly
[ https://issues.apache.org/jira/browse/SPARK-42099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42099: Assignee: (was: Apache Spark) > Make `count(*)` work correctly > -- > > Key: SPARK-42099 > URL: https://issues.apache.org/jira/browse/SPARK-42099 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > > cdf.select(CF.count("*"), CF.count(cdf.alphabets)).collect() > {code:java} > pyspark.sql.connect.client.SparkConnectAnalysisException: > [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name > `*` cannot be resolved. Did you mean one of the following? [`alphabets`] > Plan: 'Aggregate [unresolvedalias('count('*), None), count(alphabets#32) AS > count(alphabets)#35L] > +- Project [alphabets#30 AS alphabets#32] >+- LocalRelation [alphabets#30] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42099) Make `count(*)` work correctly
[ https://issues.apache.org/jira/browse/SPARK-42099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42099: Assignee: Apache Spark > Make `count(*)` work correctly > -- > > Key: SPARK-42099 > URL: https://issues.apache.org/jira/browse/SPARK-42099 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major > > cdf.select(CF.count("*"), CF.count(cdf.alphabets)).collect() > {code:java} > pyspark.sql.connect.client.SparkConnectAnalysisException: > [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name > `*` cannot be resolved. Did you mean one of the following? [`alphabets`] > Plan: 'Aggregate [unresolvedalias('count('*), None), count(alphabets#32) AS > count(alphabets)#35L] > +- Project [alphabets#30 AS alphabets#32] >+- LocalRelation [alphabets#30] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42090) Introduce sasl retry count in RetryingBlockTransferor
[ https://issues.apache.org/jira/browse/SPARK-42090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-42090. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39611 [https://github.com/apache/spark/pull/39611] > Introduce sasl retry count in RetryingBlockTransferor > - > > Key: SPARK-42090 > URL: https://issues.apache.org/jira/browse/SPARK-42090 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Fix For: 3.4.0 > > > Previously a boolean variable, saslTimeoutSeen, was used in > RetryingBlockTransferor. However, the boolean variable wouldn't cover the > following scenario: > 1. SaslTimeoutException > 2. IOException > 3. SaslTimeoutException > 4. IOException > Even though IOException at #2 is retried (resulting in increment of > retryCount), the retryCount would be cleared at step #4. > Since the intention of saslTimeoutSeen is to undo the increment due to > retrying SaslTimeoutException, we should keep a counter for > SaslTimeoutException retries and subtract the value of this counter from > retryCount. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42090) Introduce sasl retry count in RetryingBlockTransferor
[ https://issues.apache.org/jira/browse/SPARK-42090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan reassigned SPARK-42090: --- Assignee: Ted Yu > Introduce sasl retry count in RetryingBlockTransferor > - > > Key: SPARK-42090 > URL: https://issues.apache.org/jira/browse/SPARK-42090 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > > Previously a boolean variable, saslTimeoutSeen, was used in > RetryingBlockTransferor. However, the boolean variable wouldn't cover the > following scenario: > 1. SaslTimeoutException > 2. IOException > 3. SaslTimeoutException > 4. IOException > Even though IOException at #2 is retried (resulting in increment of > retryCount), the retryCount would be cleared at step #4. > Since the intention of saslTimeoutSeen is to undo the increment due to > retrying SaslTimeoutException, we should keep a counter for > SaslTimeoutException retries and subtract the value of this counter from > retryCount. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
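The retry accounting described in SPARK-42090 above can be made concrete with a small model: keep a separate counter for SaslTimeoutException retries and subtract it from the total retry count, so that interleaved IOExceptions (steps 1-4 in the description) still consume the regular retry budget instead of having it cleared wholesale. This is a simplified sketch of the idea, not the actual RetryingBlockTransferor code:

```python
class RetryTracker:
    """Simplified model of the counter-based accounting this issue proposes."""
    def __init__(self, max_retries: int):
        self.max_retries = max_retries
        self.retry_count = 0       # every retry, including SASL timeouts
        self.sasl_retry_count = 0  # retries caused only by SaslTimeoutException

    def on_failure(self, is_sasl_timeout: bool) -> None:
        self.retry_count += 1
        if is_sasl_timeout:
            self.sasl_retry_count += 1

    def effective_retries(self) -> int:
        # SASL-timeout retries should not count against the regular budget:
        # subtract them instead of clearing the whole count, which is where
        # the old boolean flag went wrong for mixed SASL/IO failure sequences.
        return self.retry_count - self.sasl_retry_count

# The 4-step scenario from the description:
# SaslTimeout, IOException, SaslTimeout, IOException
t = RetryTracker(max_retries=3)
for is_sasl in (True, False, True, False):
    t.on_failure(is_sasl)
print(t.effective_retries())  # 2 -- only the two IOExceptions count
```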
[jira] [Created] (SPARK-42099) Make `count(*)` work correctly
Ruifeng Zheng created SPARK-42099:
-------------------------------------

             Summary: Make `count(*)` work correctly
                 Key: SPARK-42099
                 URL: https://issues.apache.org/jira/browse/SPARK-42099
             Project: Spark
          Issue Type: Sub-task
          Components: Connect, PySpark
    Affects Versions: 3.4.0
            Reporter: Ruifeng Zheng


cdf.select(CF.count("*"), CF.count(cdf.alphabets)).collect()

{code:java}
pyspark.sql.connect.client.SparkConnectAnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `*` cannot be resolved. Did you mean one of the following? [`alphabets`]
Plan: 'Aggregate [unresolvedalias('count('*), None), count(alphabets#32) AS count(alphabets)#35L]
+- Project [alphabets#30 AS alphabets#32]
   +- LocalRelation [alphabets#30]
{code}
[jira] [Resolved] (SPARK-42097) Register SerializedLambda and BitSet to KryoSerializer
[ https://issues.apache.org/jira/browse/SPARK-42097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42097. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39621 [https://github.com/apache/spark/pull/39621] > Register SerializedLambda and BitSet to KryoSerializer > -- > > Key: SPARK-42097 > URL: https://issues.apache.org/jira/browse/SPARK-42097 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42097) Register SerializedLambda and BitSet to KryoSerializer
[ https://issues.apache.org/jira/browse/SPARK-42097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-42097: - Assignee: Dongjoon Hyun > Register SerializedLambda and BitSet to KryoSerializer > -- > > Key: SPARK-42097 > URL: https://issues.apache.org/jira/browse/SPARK-42097 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42098) ResolveInlineTables should handle RuntimeReplaceable
Wenchen Fan created SPARK-42098:
-----------------------------------

             Summary: ResolveInlineTables should handle RuntimeReplaceable
                 Key: SPARK-42098
                 URL: https://issues.apache.org/jira/browse/SPARK-42098
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.3.1
            Reporter: Wenchen Fan


spark-sql> VALUES (try_divide(5, 0));
cannot evaluate expression try_divide(5, 0) in inline table definition; line 1 pos 8
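The failure above happens because an inline table (`VALUES ...`) evaluates its expressions eagerly, and a RuntimeReplaceable expression such as `try_divide` is only evaluable after being rewritten to its replacement. A toy Python sketch of that unwrap-before-evaluate idea (the class and function names are hypothetical, not Catalyst's API):

```python
def try_divide(a, b):
    # try_divide returns NULL (None) on division by zero instead of failing
    return None if b == 0 else a / b

class RuntimeReplaceable:
    """Wraps an expression that must be rewritten before evaluation."""
    def __init__(self, replacement):
        self.replacement = replacement  # a zero-arg callable, for simplicity

def eval_inline_table_cell(expr):
    # The fix in spirit: unwrap RuntimeReplaceable before evaluating the cell,
    # rather than failing with "cannot evaluate expression ... in inline
    # table definition".
    if isinstance(expr, RuntimeReplaceable):
        return expr.replacement()
    return expr

cell = RuntimeReplaceable(lambda: try_divide(5, 0))
print(eval_inline_table_cell(cell))  # None -- VALUES (try_divide(5, 0)) yields NULL
```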
[jira] [Assigned] (SPARK-42097) Register SerializedLambda and BitSet to KryoSerializer
[ https://issues.apache.org/jira/browse/SPARK-42097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42097: Assignee: Apache Spark > Register SerializedLambda and BitSet to KryoSerializer > -- > > Key: SPARK-42097 > URL: https://issues.apache.org/jira/browse/SPARK-42097 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42097) Register SerializedLambda and BitSet to KryoSerializer
[ https://issues.apache.org/jira/browse/SPARK-42097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677618#comment-17677618 ] Apache Spark commented on SPARK-42097: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/39621 > Register SerializedLambda and BitSet to KryoSerializer > -- > > Key: SPARK-42097 > URL: https://issues.apache.org/jira/browse/SPARK-42097 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42097) Register SerializedLambda and BitSet to KryoSerializer
[ https://issues.apache.org/jira/browse/SPARK-42097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42097: Assignee: (was: Apache Spark) > Register SerializedLambda and BitSet to KryoSerializer > -- > > Key: SPARK-42097 > URL: https://issues.apache.org/jira/browse/SPARK-42097 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41757) Compatibility of string representation in Column
[ https://issues.apache.org/jira/browse/SPARK-41757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41757: Assignee: Hyukjin Kwon > Compatibility of string representation in Column > > > Key: SPARK-41757 > URL: https://issues.apache.org/jira/browse/SPARK-41757 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Hyukjin Kwon >Priority: Major > > Doctest in pyspark.sql.connect.column.Columnfails with the error below: > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/column.py", > line 120, in pyspark.sql.connect.column.Column > Failed example: > df.name > Expected: > Column<'name'> > Got: > Column<'ColumnReference(name)'> > ** > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/column.py", > line 122, in pyspark.sql.connect.column.Column > Failed example: > df["name"] > Expected: > Column<'name'> > Got: > Column<'ColumnReference(name)'> > ** > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/column.py", > line 127, in pyspark.sql.connect.column.Column > Failed example: > df.age + 1 > Expected: > Column<'(age + 1)'> > Got: > Column<'+(ColumnReference(age), Literal(1))'> > ** > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/column.py", > line 129, in pyspark.sql.connect.column.Column > Failed example: > 1 / df.age > Expected: > Column<'(1 / age)'> > Got: > Column<'/(Literal(1), ColumnReference(age))'> {code} > > We should enable this back after fixing the issue in Spark Connect -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
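The doctest failures in SPARK-41757 above boil down to `__repr__` printing the internal expression tree (`ColumnReference(age)`, `Literal(1)`) instead of the user-facing SQL-ish form. A toy Python sketch showing a repr that builds the expected strings as operators compose (illustrative only, not the actual Connect Column implementation):

```python
class Column:
    """Toy column whose repr matches the user-facing form, e.g. Column<'(age + 1)'>."""
    def __init__(self, text: str):
        self._text = text

    def __add__(self, other):
        other_text = other._text if isinstance(other, Column) else str(other)
        return Column(f"({self._text} + {other_text})")

    def __rtruediv__(self, other):
        # covers expressions like 1 / df.age
        return Column(f"({other} / {self._text})")

    def __repr__(self):
        return f"Column<'{self._text}'>"

age = Column("age")
print(age + 1)  # Column<'(age + 1)'>
print(1 / age)  # Column<'(1 / age)'>
```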
[jira] [Assigned] (SPARK-41901) Parity in String representation of Column
[ https://issues.apache.org/jira/browse/SPARK-41901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41901: Assignee: Hyukjin Kwon > Parity in String representation of Column > - > > Key: SPARK-41901 > URL: https://issues.apache.org/jira/browse/SPARK-41901 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Hyukjin Kwon >Priority: Major > > {code:java} > from pyspark.sql import functions > funs = [ > (functions.acosh, "ACOSH"), > (functions.asinh, "ASINH"), > (functions.atanh, "ATANH"), > ] > cols = ["a", functions.col("a")] > for f, alias in funs: > for c in cols: > self.assertIn(f"{alias}(a)", repr(f(c))){code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py", > line 271, in test_inverse_trig_functions > self.assertIn(f"{alias}(a)", repr(f(c))) > AssertionError: 'ACOSH(a)' not found in > "Column<'acosh(ColumnReference(a))'>"{code} > > > {code:java} > from pyspark.sql.functions import col, lit, overlay > from itertools import chain > import re > actual = list( > chain.from_iterable( > [ > re.findall("(overlay\\(.*\\))", str(x)) > for x in [ > overlay(col("foo"), col("bar"), 1), > overlay("x", "y", 3), > overlay(col("x"), col("y"), 1, 3), > overlay("x", "y", 2, 5), > overlay("x", "y", lit(11)), > overlay("x", "y", lit(2), lit(5)), > ] > ] > ) > ) > expected = [ > "overlay(foo, bar, 1, -1)", > "overlay(x, y, 3, -1)", > "overlay(x, y, 1, 3)", > "overlay(x, y, 2, 5)", > "overlay(x, y, 11, -1)", > "overlay(x, y, 2, 5)", > ] > self.assertListEqual(actual, expected) > df = self.spark.createDataFrame([("SPARK_SQL", "CORE", 7, 0)], ("x", "y", > "pos", "len")) > exp = [Row(ol="SPARK_CORESQL")] > self.assertTrue( > all( > [ > df.select(overlay(df.x, df.y, 7, 0).alias("ol")).collect() == exp, > df.select(overlay(df.x, df.y, lit(7), > lit(0)).alias("ol")).collect() == 
exp, > df.select(overlay("x", "y", "pos", "len").alias("ol")).collect() > == exp, > ] > ) > ) {code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py", > line 675, in test_overlay > self.assertListEqual(actual, expected) > AssertionError: Lists differ: ['overlay(ColumnReference(foo), > ColumnReference(bar[402 chars]5))'] != ['overlay(foo, bar, 1, -1)', > 'overlay(x, y, 3, -1)'[90 chars] 5)'] > First differing element 0: > 'overlay(ColumnReference(foo), ColumnReference(bar), Literal(1), Literal(-1))' > 'overlay(foo, bar, 1, -1)' > - ['overlay(ColumnReference(foo), ColumnReference(bar), Literal(1), > Literal(-1))', > - 'overlay(ColumnReference(x), ColumnReference(y), Literal(3), Literal(-1))', > - 'overlay(ColumnReference(x), ColumnReference(y), Literal(1), Literal(3))', > - 'overlay(ColumnReference(x), ColumnReference(y), Literal(2), Literal(5))', > - 'overlay(ColumnReference(x), ColumnReference(y), Literal(11), > Literal(-1))', > - 'overlay(ColumnReference(x), ColumnReference(y), Literal(2), Literal(5))'] > + ['overlay(foo, bar, 1, -1)', > + 'overlay(x, y, 3, -1)', > + 'overlay(x, y, 1, 3)', > + 'overlay(x, y, 2, 5)', > + 'overlay(x, y, 11, -1)', > + 'overlay(x, y, 2, 5)'] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41901) Parity in String representation of Column
[ https://issues.apache.org/jira/browse/SPARK-41901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41901. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39616 [https://github.com/apache/spark/pull/39616] > Parity in String representation of Column > - > > Key: SPARK-41901 > URL: https://issues.apache.org/jira/browse/SPARK-41901 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > > {code:java} > from pyspark.sql import functions > funs = [ > (functions.acosh, "ACOSH"), > (functions.asinh, "ASINH"), > (functions.atanh, "ATANH"), > ] > cols = ["a", functions.col("a")] > for f, alias in funs: > for c in cols: > self.assertIn(f"{alias}(a)", repr(f(c))){code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py", > line 271, in test_inverse_trig_functions > self.assertIn(f"{alias}(a)", repr(f(c))) > AssertionError: 'ACOSH(a)' not found in > "Column<'acosh(ColumnReference(a))'>"{code} > > > {code:java} > from pyspark.sql.functions import col, lit, overlay > from itertools import chain > import re > actual = list( > chain.from_iterable( > [ > re.findall("(overlay\\(.*\\))", str(x)) > for x in [ > overlay(col("foo"), col("bar"), 1), > overlay("x", "y", 3), > overlay(col("x"), col("y"), 1, 3), > overlay("x", "y", 2, 5), > overlay("x", "y", lit(11)), > overlay("x", "y", lit(2), lit(5)), > ] > ] > ) > ) > expected = [ > "overlay(foo, bar, 1, -1)", > "overlay(x, y, 3, -1)", > "overlay(x, y, 1, 3)", > "overlay(x, y, 2, 5)", > "overlay(x, y, 11, -1)", > "overlay(x, y, 2, 5)", > ] > self.assertListEqual(actual, expected) > df = self.spark.createDataFrame([("SPARK_SQL", "CORE", 7, 0)], ("x", "y", > "pos", "len")) > exp = [Row(ol="SPARK_CORESQL")] > self.assertTrue( > all( > [ > df.select(overlay(df.x, 
df.y, 7, 0).alias("ol")).collect() == exp, > df.select(overlay(df.x, df.y, lit(7), > lit(0)).alias("ol")).collect() == exp, > df.select(overlay("x", "y", "pos", "len").alias("ol")).collect() > == exp, > ] > ) > ) {code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py", > line 675, in test_overlay > self.assertListEqual(actual, expected) > AssertionError: Lists differ: ['overlay(ColumnReference(foo), > ColumnReference(bar[402 chars]5))'] != ['overlay(foo, bar, 1, -1)', > 'overlay(x, y, 3, -1)'[90 chars] 5)'] > First differing element 0: > 'overlay(ColumnReference(foo), ColumnReference(bar), Literal(1), Literal(-1))' > 'overlay(foo, bar, 1, -1)' > - ['overlay(ColumnReference(foo), ColumnReference(bar), Literal(1), > Literal(-1))', > - 'overlay(ColumnReference(x), ColumnReference(y), Literal(3), Literal(-1))', > - 'overlay(ColumnReference(x), ColumnReference(y), Literal(1), Literal(3))', > - 'overlay(ColumnReference(x), ColumnReference(y), Literal(2), Literal(5))', > - 'overlay(ColumnReference(x), ColumnReference(y), Literal(11), > Literal(-1))', > - 'overlay(ColumnReference(x), ColumnReference(y), Literal(2), Literal(5))'] > + ['overlay(foo, bar, 1, -1)', > + 'overlay(x, y, 3, -1)', > + 'overlay(x, y, 1, 3)', > + 'overlay(x, y, 2, 5)', > + 'overlay(x, y, 11, -1)', > + 'overlay(x, y, 2, 5)'] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41757) Compatibility of string representation in Column
[ https://issues.apache.org/jira/browse/SPARK-41757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41757. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39616 [https://github.com/apache/spark/pull/39616] > Compatibility of string representation in Column > > > Key: SPARK-41757 > URL: https://issues.apache.org/jira/browse/SPARK-41757 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > > Doctest in pyspark.sql.connect.column.Columnfails with the error below: > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/column.py", > line 120, in pyspark.sql.connect.column.Column > Failed example: > df.name > Expected: > Column<'name'> > Got: > Column<'ColumnReference(name)'> > ** > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/column.py", > line 122, in pyspark.sql.connect.column.Column > Failed example: > df["name"] > Expected: > Column<'name'> > Got: > Column<'ColumnReference(name)'> > ** > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/column.py", > line 127, in pyspark.sql.connect.column.Column > Failed example: > df.age + 1 > Expected: > Column<'(age + 1)'> > Got: > Column<'+(ColumnReference(age), Literal(1))'> > ** > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/column.py", > line 129, in pyspark.sql.connect.column.Column > Failed example: > 1 / df.age > Expected: > Column<'(1 / age)'> > Got: > Column<'/(Literal(1), ColumnReference(age))'> {code} > > We should enable this back after fixing the issue in Spark Connect -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41775) Implement training functions as input
[ https://issues.apache.org/jira/browse/SPARK-41775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rithwik Ediga Lakhamsani updated SPARK-41775: - Description: Sidenote: make formatting updates described in https://github.com/apache/spark/pull/39188 Currently, `Distributor().run(...)` takes only files as input. Now we will add in additional functionality to take in functions as well. This will require us to go through the following process on each task in the executor nodes: 1. take the input function and args and pickle them 2. Create a temp train.py file that looks like {code:java} import cloudpickle import os if _name_ == "_main_": train, args = cloudpickle.load(f"{tempdir}/train_input.pkl") output = train(*args) if output and os.environ.get("RANK", "") == "0": # this is for partitionId == 0 cloudpickle.dump(f"{tempdir}/train_output.pkl") {code} 3. Run that train.py file with `torchrun` 4. Check if `train_output.pkl` has been created on process on partitionId == 0, if it has, then deserialize it and return that output through `.collect()` was: Currently, `Distributor().run(...)` takes only files as input. Now we will add in additional functionality to take in functions as well. This will require us to go through the following process on each task in the executor nodes: 1. take the input function and args and pickle them 2. Create a temp train.py file that looks like {code:java} import cloudpickle import os if _name_ == "_main_": train, args = cloudpickle.load(f"{tempdir}/train_input.pkl") output = train(*args) if output and os.environ.get("RANK", "") == "0": # this is for partitionId == 0 cloudpickle.dump(f"{tempdir}/train_output.pkl") {code} 3. Run that train.py file with `torchrun` 4. 
Check if `train_output.pkl` has been created on process on partitionId == 0, if it has, then deserialize it and return that output through `.collect()` > Implement training functions as input > - > > Key: SPARK-41775 > URL: https://issues.apache.org/jira/browse/SPARK-41775 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > Sidenote: make formatting updates described in > https://github.com/apache/spark/pull/39188 > > Currently, `Distributor().run(...)` takes only files as input. Now we will > add in additional functionality to take in functions as well. This will > require us to go through the following process on each task in the executor > nodes: > 1. take the input function and args and pickle them > 2. Create a temp train.py file that looks like > {code:java} > import cloudpickle > import os > if __name__ == "__main__": > train, args = cloudpickle.load(f"{tempdir}/train_input.pkl") > output = train(*args) > if output and os.environ.get("RANK", "") == "0": # this is for > partitionId == 0 > cloudpickle.dump(f"{tempdir}/train_output.pkl") {code} > 3. Run that train.py file with `torchrun` > 4. Check if `train_output.pkl` has been created on process on partitionId == > 0, if it has, then deserialize it and return that output through `.collect()` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
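The train.py snippet quoted in SPARK-41775 above is the ticket author's sketch; a self-contained, runnable version of the pickle round-trip it describes might look like the following. Plain `pickle` stands in for `cloudpickle` (both expose the same `dump`/`load` file API), the file names follow the ticket, and the `train` function plus the `RANK` default are illustrative assumptions so the sketch runs outside `torchrun`.

```python
import os
import pickle  # stand-in for cloudpickle, which exposes the same dump/load API
import tempfile

def train(x, y):
    # Hypothetical user training function standing in for the real workload.
    return x + y

tempdir = tempfile.mkdtemp()

# Step 1: pickle the user function and its args on the driver side.
with open(f"{tempdir}/train_input.pkl", "wb") as f:
    pickle.dump((train, (1, 2)), f)

# Step 2: what the generated train.py would do on each worker process.
with open(f"{tempdir}/train_input.pkl", "rb") as f:
    fn, args = pickle.load(f)
output = fn(*args)

# Only the rank-0 process (partitionId == 0) persists the result; defaulting
# RANK to "0" here is an assumption so the sketch runs standalone.
if output is not None and os.environ.get("RANK", "0") == "0":
    with open(f"{tempdir}/train_output.pkl", "wb") as f:
        pickle.dump(output, f)

# Step 4: the driver deserializes train_output.pkl if it was created.
with open(f"{tempdir}/train_output.pkl", "rb") as f:
    print(pickle.load(f))  # 3
```

Note the fix relative to the quoted sketch: `load`/`dump` operate on open file objects, not path strings, and `dump` needs the object being serialized as its first argument.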
[jira] [Commented] (SPARK-42066) The DATATYPE_MISMATCH error class contains inappropriate and duplicating subclasses
[ https://issues.apache.org/jira/browse/SPARK-42066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677608#comment-17677608 ] Haejoon Lee commented on SPARK-42066: - Let me take a look > The DATATYPE_MISMATCH error class contains inappropriate and duplicating > subclasses > --- > > Key: SPARK-42066 > URL: https://issues.apache.org/jira/browse/SPARK-42066 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Serge Rielau >Priority: Major > > subclass WRONG_NUM_ARGS (with suggestions) semantically does not belong in > DATATYPE_MISMATCH and there is an error class with that same name. > We should review the subclasses for this error class, which seems to have become > a bit of a dumping ground... -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41845) Fix `count(expr("*"))` function
[ https://issues.apache.org/jira/browse/SPARK-41845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677605#comment-17677605 ] Ruifeng Zheng commented on SPARK-41845: --- I will take a look > Fix `count(expr("*"))` function > --- > > Key: SPARK-41845 > URL: https://issues.apache.org/jira/browse/SPARK-41845 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 801, in pyspark.sql.connect.functions.count > Failed example: > df.select(count(expr("*")), count(df.alphabets)).show() > Expected: > +++ > |count(1)|count(alphabets)| > +++ > | 4| 3| > +++ > Got: > +++ > |count(alphabets)|count(alphabets)| > +++ > | 3| 3| > +++ > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42097) Register SerializedLambda and BitSet to KryoSerializer
Dongjoon Hyun created SPARK-42097: - Summary: Register SerializedLambda and BitSet to KryoSerializer Key: SPARK-42097 URL: https://issues.apache.org/jira/browse/SPARK-42097 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.4.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42096) Code cleanup for connect module
[ https://issues.apache.org/jira/browse/SPARK-42096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42096: Assignee: Apache Spark > Code cleanup for connect module > --- > > Key: SPARK-42096 > URL: https://issues.apache.org/jira/browse/SPARK-42096 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Trivial > > For example, functions that are currently only used inside the class > can have their access scope weakened from public to private > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42096) Code cleanup for connect module
[ https://issues.apache.org/jira/browse/SPARK-42096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42096: Assignee: (was: Apache Spark) > Code cleanup for connect module > --- > > Key: SPARK-42096 > URL: https://issues.apache.org/jira/browse/SPARK-42096 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Trivial > > For example, functions that are currently only used inside the class > can have their access scope weakened from public to private > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42096) Code cleanup for connect module
[ https://issues.apache.org/jira/browse/SPARK-42096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677602#comment-17677602 ] Apache Spark commented on SPARK-42096: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/39620 > Code cleanup for connect module > --- > > Key: SPARK-42096 > URL: https://issues.apache.org/jira/browse/SPARK-42096 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Trivial > > For example, functions that are currently only used inside the class > can have their access scope weakened from public to private > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42089) Different result in nested lambda function
[ https://issues.apache.org/jira/browse/SPARK-42089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-42089: - Assignee: Ruifeng Zheng > Different result in nested lambda function > -- > > Key: SPARK-42089 > URL: https://issues.apache.org/jira/browse/SPARK-42089 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > > test_nested_higher_order_function > {code:java} > Traceback (most recent call last): > File > "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/tests/test_functions.py", > line 814, in test_nested_higher_order_function > self.assertEquals(actual, expected) > AssertionError: Lists differ: [Row(n='a', l='a'), Row(n='b', l='b'), Row[124 > chars]'c')] != [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ([43 chars]'c')] > First differing element 0: > Row(n='a', l='a') > (1, 'a') > - [Row(n='a', l='a'), > - Row(n='b', l='b'), > - Row(n='c', l='c'), > - Row(n='a', l='a'), > - Row(n='b', l='b'), > - Row(n='c', l='c'), > - Row(n='a', l='a'), > - Row(n='b', l='b'), > - Row(n='c', l='c')] > + [(1, 'a'), > + (1, 'b'), > + (1, 'c'), > + (2, 'a'), > + (2, 'b'), > + (2, 'c'), > + (3, 'a'), > + (3, 'b'), > + (3, 'c')] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42089) Different result in nested lambda function
[ https://issues.apache.org/jira/browse/SPARK-42089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-42089. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39619 [https://github.com/apache/spark/pull/39619] > Different result in nested lambda function > -- > > Key: SPARK-42089 > URL: https://issues.apache.org/jira/browse/SPARK-42089 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > > test_nested_higher_order_function > {code:java} > Traceback (most recent call last): > File > "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/tests/test_functions.py", > line 814, in test_nested_higher_order_function > self.assertEquals(actual, expected) > AssertionError: Lists differ: [Row(n='a', l='a'), Row(n='b', l='b'), Row[124 > chars]'c')] != [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ([43 chars]'c')] > First differing element 0: > Row(n='a', l='a') > (1, 'a') > - [Row(n='a', l='a'), > - Row(n='b', l='b'), > - Row(n='c', l='c'), > - Row(n='a', l='a'), > - Row(n='b', l='b'), > - Row(n='c', l='c'), > - Row(n='a', l='a'), > - Row(n='b', l='b'), > - Row(n='c', l='c')] > + [(1, 'a'), > + (1, 'b'), > + (1, 'c'), > + (2, 'a'), > + (2, 'b'), > + (2, 'c'), > + (3, 'a'), > + (3, 'b'), > + (3, 'c')] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
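The expected output in SPARK-42089 above is the full cross product of the outer and inner lambda variables. A plain-Python analogue (illustrative only, not the Spark Connect code path) of how inner-variable shadowing collapses the pairs into the observed `Row(n='a', l='a')` shape:

```python
# Each nesting level must bind its own lambda variable; if name resolution
# confuses the levels, the outer reference collapses onto the inner one.
letters = ["a", "b", "c"]
numbers = [1, 2, 3]

# Correct resolution: outer `n` and inner `l` stay distinct -> cross product.
correct = [(n, l) for n in numbers for l in letters]

# Shadowed resolution: the outer reference resolves against the inner
# variable, so every pair degenerates to (inner, inner).
shadowed = [(l, l) for _ in numbers for l in letters]

print(correct[:3])   # [(1, 'a'), (1, 'b'), (1, 'c')]
print(shadowed[:3])  # [('a', 'a'), ('b', 'b'), ('c', 'c')]
```

The `shadowed` list mirrors the failing test output, where both fields of every `Row` carried the inner lambda's value.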
[jira] [Created] (SPARK-42096) Code cleanup for connect module
Yang Jie created SPARK-42096: Summary: Code cleanup for connect module Key: SPARK-42096 URL: https://issues.apache.org/jira/browse/SPARK-42096 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.4.0 Reporter: Yang Jie For example, functions that are currently only used inside the class can have their access scope weakened from public to private -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41982) When the inserted partition type is of string type, similar `dt=01` will be converted to `dt=1`
[ https://issues.apache.org/jira/browse/SPARK-41982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-41982. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39558 [https://github.com/apache/spark/pull/39558] > When the inserted partition type is of string type, similar `dt=01` will be > converted to `dt=1` > --- > > Key: SPARK-41982 > URL: https://issues.apache.org/jira/browse/SPARK-41982 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: jingxiong zhong >Priority: Critical > Fix For: 3.4.0 > > > At present, during the process of upgrading Spark 2.4 to Spark 3.2, we > carefully read the migration document and found a situation it does not > cover: > {code:java} > create table if not exists test_90(a string, b string) partitioned by (dt > string); > desc formatted test_90; > // case1 > insert into table test_90 partition (dt=05) values("1","2"); > // case2 > insert into table test_90 partition (dt='05') values("1","2"); > drop table test_90;{code} > in Spark 2.4.3, it generates a single path: > {code:java} > // the path > hdfs://test5/user/hive/db1/test_90/dt=05 > //result > spark-sql> select * from test_90; > 1 2 05 > 1 2 05 > Time taken: 1.316 seconds, Fetched 2 row(s) > spark-sql> show partitions test_90; > dt=05 > Time taken: 0.201 seconds, Fetched 1 row(s) > spark-sql> select * from test_90 where dt='05'; > 1 2 05 > 1 2 05 > Time taken: 0.212 seconds, Fetched 2 row(s) > spark-sql> explain insert into table test_90 partition (dt=05) > values("1","2"); > == Physical Plan == > Execute InsertIntoHiveTable InsertIntoHiveTable `db1`.`test_90`, > org.apache.hadoop.hive.ql.io.orc.OrcSerde, Map(dt -> Some(05)), false, false, > [a, b] > +- LocalTableScan [a#116, b#117] > Time taken: 1.145 seconds, Fetched 1 row(s){code} > in Spark 3.2.0, it generates two paths: > {code:java} > // the path > hdfs://test5/user/hive/db1/test_90/dt=05 > 
hdfs://test5/user/hive/db1/test_90/dt=5 > // result > spark-sql> select * from test_90; > 1 2 05 > 1 2 5 > Time taken: 2.119 seconds, Fetched 2 row(s) > spark-sql> show partitions test_90; > dt=05 > dt=5 > Time taken: 0.161 seconds, Fetched 2 row(s) > spark-sql> select * from test_90 where dt='05'; > 1 2 05 > Time taken: 0.252 seconds, Fetched 1 row(s) > spark-sql> explain insert into table test_90 partition (dt=05) > values("1","2"); > plan > == Physical Plan == > Execute InsertIntoHiveTable `db1`.`test_90`, > org.apache.hadoop.hive.ql.io.orc.OrcSerde, [dt=Some(5)], false, false, [a, b] > +- LocalTableScan [a#109, b#110]{code} > This will cause problems reading data after the user switches to Spark 3. > The root cause is that during partition field resolution, Spark 3 > forcibly casts the partition value, which causes > partition `05` to lose its leading `0`. > So I think we have two solutions: > one is to document the risk clearly in the migration guide, and the other is > to fix this case, because internally we keep a string-typed partition > as a string, regardless of whether single or double quotation marks are > added. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
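The leading-zero loss described in SPARK-41982 above is the ordinary effect of parsing an unquoted partition value as a numeric literal and stringifying it again. A minimal plain-Python illustration (not Spark's actual resolution code):

```python
# An unquoted partition spec such as dt=05 is parsed as a numeric literal;
# round-tripping it through int drops the leading zero.
raw_spec = "05"

kept_as_string = raw_spec        # Spark 2.4 behavior: "05" survives as-is
coerced = str(int(raw_spec))     # Spark 3 behavior: int("05") -> 5 -> "5"

print(kept_as_string)  # 05
print(coerced)         # 5

# The two spellings name different directories, which is why both dt=05
# and dt=5 appear under the table path.
assert kept_as_string != coerced
```

Quoting the value (`dt='05'`) sidesteps the coercion, which is why only the unquoted case diverges between the two versions.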
[jira] [Assigned] (SPARK-41982) When the inserted partition type is of string type, similar `dt=01` will be converted to `dt=1`
[ https://issues.apache.org/jira/browse/SPARK-41982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-41982: --- Assignee: jingxiong zhong > When the inserted partition type is of string type, similar `dt=01` will be > converted to `dt=1` > --- > > Key: SPARK-41982 > URL: https://issues.apache.org/jira/browse/SPARK-41982 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: jingxiong zhong >Assignee: jingxiong zhong >Priority: Critical > Fix For: 3.4.0 > > > At present, during the process of upgrading Spark 2.4 to Spark 3.2, we > carefully read the migration document and found a situation it does not > cover: > {code:java} > create table if not exists test_90(a string, b string) partitioned by (dt > string); > desc formatted test_90; > // case1 > insert into table test_90 partition (dt=05) values("1","2"); > // case2 > insert into table test_90 partition (dt='05') values("1","2"); > drop table test_90;{code} > in Spark 2.4.3, it generates a single path: > {code:java} > // the path > hdfs://test5/user/hive/db1/test_90/dt=05 > //result > spark-sql> select * from test_90; > 1 2 05 > 1 2 05 > Time taken: 1.316 seconds, Fetched 2 row(s) > spark-sql> show partitions test_90; > dt=05 > Time taken: 0.201 seconds, Fetched 1 row(s) > spark-sql> select * from test_90 where dt='05'; > 1 2 05 > 1 2 05 > Time taken: 0.212 seconds, Fetched 2 row(s) > spark-sql> explain insert into table test_90 partition (dt=05) > values("1","2"); > == Physical Plan == > Execute InsertIntoHiveTable InsertIntoHiveTable `db1`.`test_90`, > org.apache.hadoop.hive.ql.io.orc.OrcSerde, Map(dt -> Some(05)), false, false, > [a, b] > +- LocalTableScan [a#116, b#117] > Time taken: 1.145 seconds, Fetched 1 row(s){code} > in Spark 3.2.0, it generates two paths: > {code:java} > // the path > hdfs://test5/user/hive/db1/test_90/dt=05 > hdfs://test5/user/hive/db1/test_90/dt=5 > // result > spark-sql> select * from 
test_90; > 1 2 05 > 1 2 5 > Time taken: 2.119 seconds, Fetched 2 row(s) > spark-sql> show partitions test_90; > dt=05 > dt=5 > Time taken: 0.161 seconds, Fetched 2 row(s) > spark-sql> select * from test_90 where dt='05'; > 1 2 05 > Time taken: 0.252 seconds, Fetched 1 row(s) > spark-sql> explain insert into table test_90 partition (dt=05) > values("1","2"); > plan > == Physical Plan == > Execute InsertIntoHiveTable `db1`.`test_90`, > org.apache.hadoop.hive.ql.io.orc.OrcSerde, [dt=Some(5)], false, false, [a, b] > +- LocalTableScan [a#109, b#110]{code} > This will cause problems reading data after the user switches to Spark 3. > The root cause is that during partition field resolution, Spark 3 > forcibly casts the partition value, which causes > partition `05` to lose its leading `0`. > So I think we have two solutions: > one is to document the risk clearly in the migration guide, and the other is > to fix this case, because internally we keep a string-typed partition > as a string, regardless of whether single or double quotation marks are > added. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41866) Make `createDataFrame` support array
[ https://issues.apache.org/jira/browse/SPARK-41866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-41866: - Assignee: Hyukjin Kwon > Make `createDataFrame` support array > > > Key: SPARK-41866 > URL: https://issues.apache.org/jira/browse/SPARK-41866 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > > {code:java} > import array > data = [Row(longarray=array.array("l", [-9223372036854775808, 0, > 9223372036854775807]))] > df = self.spark.createDataFrame(data) {code} > Error: > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 1220, in test_create_dataframe_from_array_of_long > df = self.spark.createDataFrame(data) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", > line 260, in createDataFrame > table = pa.Table.from_pylist([row.asDict(recursive=True) for row in > _data]) > File "pyarrow/table.pxi", line 3700, in pyarrow.lib.Table.from_pylist > File "pyarrow/table.pxi", line 5221, in pyarrow.lib._from_pylist > File "pyarrow/table.pxi", line 3575, in pyarrow.lib.Table.from_arrays > File "pyarrow/table.pxi", line 1383, in pyarrow.lib._sanitize_arrays > File "pyarrow/table.pxi", line 1364, in pyarrow.lib._schema_from_arrays > File "pyarrow/array.pxi", line 320, in pyarrow.lib.array > File "pyarrow/array.pxi", line 39, in pyarrow.lib._sequence_to_array > File "pyarrow/error.pxi", line 144, in > pyarrow.lib.pyarrow_internal_check_status > File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: Could not convert array('l', [-9223372036854775808, > 0, 9223372036854775807]) with type array.array: did not recognize Python > value type when inferring an Arrow data type{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To 
unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41866) Make `createDataFrame` support array
[ https://issues.apache.org/jira/browse/SPARK-41866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-41866. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39617 [https://github.com/apache/spark/pull/39617] > Make `createDataFrame` support array > > > Key: SPARK-41866 > URL: https://issues.apache.org/jira/browse/SPARK-41866 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > Fix For: 3.4.0 > > > {code:java} > import array > data = [Row(longarray=array.array("l", [-9223372036854775808, 0, > 9223372036854775807]))] > df = self.spark.createDataFrame(data) {code} > Error: > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 1220, in test_create_dataframe_from_array_of_long > df = self.spark.createDataFrame(data) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", > line 260, in createDataFrame > table = pa.Table.from_pylist([row.asDict(recursive=True) for row in > _data]) > File "pyarrow/table.pxi", line 3700, in pyarrow.lib.Table.from_pylist > File "pyarrow/table.pxi", line 5221, in pyarrow.lib._from_pylist > File "pyarrow/table.pxi", line 3575, in pyarrow.lib.Table.from_arrays > File "pyarrow/table.pxi", line 1383, in pyarrow.lib._sanitize_arrays > File "pyarrow/table.pxi", line 1364, in pyarrow.lib._schema_from_arrays > File "pyarrow/array.pxi", line 320, in pyarrow.lib.array > File "pyarrow/array.pxi", line 39, in pyarrow.lib._sequence_to_array > File "pyarrow/error.pxi", line 144, in > pyarrow.lib.pyarrow_internal_check_status > File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: Could not convert array('l', [-9223372036854775808, > 0, 9223372036854775807]) with type array.array: did not recognize Python > value type when inferring an Arrow data 
type{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
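The pyarrow failure in SPARK-41866 above comes from type inference on `array.array` values. Until the fix, a client-side workaround is to widen such values to plain lists, which pyarrow infers without trouble; `widen_arrays` below is a hypothetical helper for illustration, not part of the actual Spark Connect fix (and it uses typecode `"q"`, which is 64-bit on every platform, where the ticket used `"l"`).

```python
import array

def widen_arrays(row: dict) -> dict:
    """Replace array.array values with plain lists, which pyarrow can infer."""
    return {k: list(v) if isinstance(v, array.array) else v
            for k, v in row.items()}

# Same extreme 64-bit values as the failing test in the ticket.
row = {"longarray": array.array("q", [-9223372036854775808, 0,
                                      9223372036854775807])}
widened = widen_arrays(row)
print(widened["longarray"])  # [-9223372036854775808, 0, 9223372036854775807]
```

Passing `widened`-style dicts to `Table.from_pylist` avoids the "did not recognize Python value type" error because list elements are plain Python ints.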
[jira] [Resolved] (SPARK-42072) `core` module requires `javax.servlet-api`
[ https://issues.apache.org/jira/browse/SPARK-42072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42072. --- Resolution: Cannot Reproduce I verified on a clean Apple Silicon machine and the master branch works fine. I'm closing this issue for now. > `core` module requires `javax.servlet-api` > -- > > Key: SPARK-42072 > URL: https://issues.apache.org/jira/browse/SPARK-42072 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42095) Fix gRPC check in tests
[ https://issues.apache.org/jira/browse/SPARK-42095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-42095: - Assignee: Xinrong Meng > Fix gRPC check in tests > --- > > Key: SPARK-42095 > URL: https://issues.apache.org/jira/browse/SPARK-42095 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > > Fix gRPC check in tests, including variables and error messages. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42095) Fix gRPC check in tests
[ https://issues.apache.org/jira/browse/SPARK-42095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-42095. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39618 [https://github.com/apache/spark/pull/39618] > Fix gRPC check in tests > --- > > Key: SPARK-42095 > URL: https://issues.apache.org/jira/browse/SPARK-42095 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.4.0 > > > Fix gRPC check in tests, including variables and error messages. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42089) Different result in nested lambda function
[ https://issues.apache.org/jira/browse/SPARK-42089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42089: Assignee: Apache Spark > Different result in nested lambda function > -- > > Key: SPARK-42089 > URL: https://issues.apache.org/jira/browse/SPARK-42089 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major > > test_nested_higher_order_function > {code:java} > Traceback (most recent call last): > File > "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/tests/test_functions.py", > line 814, in test_nested_higher_order_function > self.assertEquals(actual, expected) > AssertionError: Lists differ: [Row(n='a', l='a'), Row(n='b', l='b'), Row[124 > chars]'c')] != [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ([43 chars]'c')] > First differing element 0: > Row(n='a', l='a') > (1, 'a') > - [Row(n='a', l='a'), > - Row(n='b', l='b'), > - Row(n='c', l='c'), > - Row(n='a', l='a'), > - Row(n='b', l='b'), > - Row(n='c', l='c'), > - Row(n='a', l='a'), > - Row(n='b', l='b'), > - Row(n='c', l='c')] > + [(1, 'a'), > + (1, 'b'), > + (1, 'c'), > + (2, 'a'), > + (2, 'b'), > + (2, 'c'), > + (3, 'a'), > + (3, 'b'), > + (3, 'c')] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42089) Different result in nested lambda function
[ https://issues.apache.org/jira/browse/SPARK-42089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677591#comment-17677591 ] Apache Spark commented on SPARK-42089: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/39619 > Different result in nested lambda function > -- > > Key: SPARK-42089 > URL: https://issues.apache.org/jira/browse/SPARK-42089 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > > test_nested_higher_order_function > {code:java} > Traceback (most recent call last): > File > "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/tests/test_functions.py", > line 814, in test_nested_higher_order_function > self.assertEquals(actual, expected) > AssertionError: Lists differ: [Row(n='a', l='a'), Row(n='b', l='b'), Row[124 > chars]'c')] != [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ([43 chars]'c')] > First differing element 0: > Row(n='a', l='a') > (1, 'a') > - [Row(n='a', l='a'), > - Row(n='b', l='b'), > - Row(n='c', l='c'), > - Row(n='a', l='a'), > - Row(n='b', l='b'), > - Row(n='c', l='c'), > - Row(n='a', l='a'), > - Row(n='b', l='b'), > - Row(n='c', l='c')] > + [(1, 'a'), > + (1, 'b'), > + (1, 'c'), > + (2, 'a'), > + (2, 'b'), > + (2, 'c'), > + (3, 'a'), > + (3, 'b'), > + (3, 'c')] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42089) Different result in nested lambda function
[ https://issues.apache.org/jira/browse/SPARK-42089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42089: Assignee: (was: Apache Spark) > Different result in nested lambda function > -- > > Key: SPARK-42089 > URL: https://issues.apache.org/jira/browse/SPARK-42089 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > > test_nested_higher_order_function > {code:java} > Traceback (most recent call last): > File > "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/tests/test_functions.py", > line 814, in test_nested_higher_order_function > self.assertEquals(actual, expected) > AssertionError: Lists differ: [Row(n='a', l='a'), Row(n='b', l='b'), Row[124 > chars]'c')] != [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ([43 chars]'c')] > First differing element 0: > Row(n='a', l='a') > (1, 'a') > - [Row(n='a', l='a'), > - Row(n='b', l='b'), > - Row(n='c', l='c'), > - Row(n='a', l='a'), > - Row(n='b', l='b'), > - Row(n='c', l='c'), > - Row(n='a', l='a'), > - Row(n='b', l='b'), > - Row(n='c', l='c')] > + [(1, 'a'), > + (1, 'b'), > + (1, 'c'), > + (2, 'a'), > + (2, 'b'), > + (2, 'c'), > + (3, 'a'), > + (3, 'b'), > + (3, 'c')] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42058) Harden SQLSTATE usage for error classes (2)
[ https://issues.apache.org/jira/browse/SPARK-42058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-42058. - Fix Version/s: 3.4.0 Resolution: Fixed > Harden SQLSTATE usage for error classes (2) > --- > > Key: SPARK-42058 > URL: https://issues.apache.org/jira/browse/SPARK-42058 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Serge Rielau >Assignee: Serge Rielau >Priority: Major > Fix For: 3.4.0 > > > Error classes are great, but for JDBC, ODBC etc the SQLSTATEs of the standard > reign. > We have started adding SQLSTATEs but have not really paid attention to their > correctness. > Follow up to: https://issues.apache.org/jira/browse/SPARK-41994 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42058) Harden SQLSTATE usage for error classes (2)
[ https://issues.apache.org/jira/browse/SPARK-42058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-42058: --- Assignee: Serge Rielau > Harden SQLSTATE usage for error classes (2) > --- > > Key: SPARK-42058 > URL: https://issues.apache.org/jira/browse/SPARK-42058 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Serge Rielau >Assignee: Serge Rielau >Priority: Major > > Error classes are great, but for JDBC, ODBC etc the SQLSTATEs of the standard > reign. > We have started adding SQLSTATEs but have not really paid attention to their > correctness. > Follow up to: https://issues.apache.org/jira/browse/SPARK-41994 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42089) Different result in nested lambda function
[ https://issues.apache.org/jira/browse/SPARK-42089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677579#comment-17677579 ] Ruifeng Zheng commented on SPARK-42089: --- I am working on this one > Different result in nested lambda function > -- > > Key: SPARK-42089 > URL: https://issues.apache.org/jira/browse/SPARK-42089 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > > test_nested_higher_order_function > {code:java} > Traceback (most recent call last): > File > "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/tests/test_functions.py", > line 814, in test_nested_higher_order_function > self.assertEquals(actual, expected) > AssertionError: Lists differ: [Row(n='a', l='a'), Row(n='b', l='b'), Row[124 > chars]'c')] != [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ([43 chars]'c')] > First differing element 0: > Row(n='a', l='a') > (1, 'a') > - [Row(n='a', l='a'), > - Row(n='b', l='b'), > - Row(n='c', l='c'), > - Row(n='a', l='a'), > - Row(n='b', l='b'), > - Row(n='c', l='c'), > - Row(n='a', l='a'), > - Row(n='b', l='b'), > - Row(n='c', l='c')] > + [(1, 'a'), > + (1, 'b'), > + (1, 'c'), > + (2, 'a'), > + (2, 'b'), > + (2, 'c'), > + (3, 'a'), > + (3, 'b'), > + (3, 'c')] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42095) Fix gRPC check in tests
[ https://issues.apache.org/jira/browse/SPARK-42095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42095: Assignee: Apache Spark > Fix gRPC check in tests > --- > > Key: SPARK-42095 > URL: https://issues.apache.org/jira/browse/SPARK-42095 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > > Fix gRPC check in tests, including variables and error messages.
[jira] [Commented] (SPARK-42095) Fix gRPC check in tests
[ https://issues.apache.org/jira/browse/SPARK-42095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677570#comment-17677570 ] Apache Spark commented on SPARK-42095: -- User 'xinrong-meng' has created a pull request for this issue: https://github.com/apache/spark/pull/39618 > Fix gRPC check in tests > --- > > Key: SPARK-42095 > URL: https://issues.apache.org/jira/browse/SPARK-42095 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Fix gRPC check in tests, including variables and error messages.
[jira] [Assigned] (SPARK-42095) Fix gRPC check in tests
[ https://issues.apache.org/jira/browse/SPARK-42095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42095: Assignee: (was: Apache Spark) > Fix gRPC check in tests > --- > > Key: SPARK-42095 > URL: https://issues.apache.org/jira/browse/SPARK-42095 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Fix gRPC check in tests, including variables and error messages.
[jira] [Commented] (SPARK-41901) Parity in String representation of Column
[ https://issues.apache.org/jira/browse/SPARK-41901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677569#comment-17677569 ] Apache Spark commented on SPARK-41901: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/39616 > Parity in String representation of Column > - > > Key: SPARK-41901 > URL: https://issues.apache.org/jira/browse/SPARK-41901 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > from pyspark.sql import functions > funs = [ > (functions.acosh, "ACOSH"), > (functions.asinh, "ASINH"), > (functions.atanh, "ATANH"), > ] > cols = ["a", functions.col("a")] > for f, alias in funs: > for c in cols: > self.assertIn(f"{alias}(a)", repr(f(c))){code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py", > line 271, in test_inverse_trig_functions > self.assertIn(f"{alias}(a)", repr(f(c))) > AssertionError: 'ACOSH(a)' not found in > "Column<'acosh(ColumnReference(a))'>"{code} > > > {code:java} > from pyspark.sql.functions import col, lit, overlay > from itertools import chain > import re > actual = list( > chain.from_iterable( > [ > re.findall("(overlay\\(.*\\))", str(x)) > for x in [ > overlay(col("foo"), col("bar"), 1), > overlay("x", "y", 3), > overlay(col("x"), col("y"), 1, 3), > overlay("x", "y", 2, 5), > overlay("x", "y", lit(11)), > overlay("x", "y", lit(2), lit(5)), > ] > ] > ) > ) > expected = [ > "overlay(foo, bar, 1, -1)", > "overlay(x, y, 3, -1)", > "overlay(x, y, 1, 3)", > "overlay(x, y, 2, 5)", > "overlay(x, y, 11, -1)", > "overlay(x, y, 2, 5)", > ] > self.assertListEqual(actual, expected) > df = self.spark.createDataFrame([("SPARK_SQL", "CORE", 7, 0)], ("x", "y", > "pos", "len")) > exp = [Row(ol="SPARK_CORESQL")] > self.assertTrue( > all( > [ > df.select(overlay(df.x, df.y, 7, 0).alias("ol")).collect() == exp, > df.select(overlay(df.x, df.y, lit(7), > lit(0)).alias("ol")).collect() == exp, > df.select(overlay("x", "y", "pos", "len").alias("ol")).collect() > == exp, > ] > ) > ) {code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py", > line 675, in test_overlay > self.assertListEqual(actual, expected) > AssertionError: Lists differ: ['overlay(ColumnReference(foo), > ColumnReference(bar[402 chars]5))'] != ['overlay(foo, bar, 1, -1)', > 'overlay(x, y, 3, -1)'[90 chars] 5)'] > First differing element 0: > 'overlay(ColumnReference(foo), ColumnReference(bar), Literal(1), Literal(-1))' > 'overlay(foo, bar, 1, -1)' > - ['overlay(ColumnReference(foo), ColumnReference(bar), Literal(1), > Literal(-1))', > - 'overlay(ColumnReference(x), ColumnReference(y), Literal(3), Literal(-1))', > - 'overlay(ColumnReference(x), ColumnReference(y), Literal(1), Literal(3))', > - 'overlay(ColumnReference(x), ColumnReference(y), Literal(2), Literal(5))', > - 'overlay(ColumnReference(x), ColumnReference(y), Literal(11), > Literal(-1))', > - 'overlay(ColumnReference(x), ColumnReference(y), Literal(2), Literal(5))'] > + ['overlay(foo, bar, 1, -1)', > + 'overlay(x, y, 3, -1)', > + 'overlay(x, y, 1, 3)', > + 'overlay(x, y, 2, 5)', > + 'overlay(x, y, 11, -1)', > + 'overlay(x, y, 2, 5)'] > {code}
[jira] [Assigned] (SPARK-41901) Parity in String representation of Column
[ https://issues.apache.org/jira/browse/SPARK-41901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41901: Assignee: (was: Apache Spark) > Parity in String representation of Column > - > > Key: SPARK-41901 > URL: https://issues.apache.org/jira/browse/SPARK-41901 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > from pyspark.sql import functions > funs = [ > (functions.acosh, "ACOSH"), > (functions.asinh, "ASINH"), > (functions.atanh, "ATANH"), > ] > cols = ["a", functions.col("a")] > for f, alias in funs: > for c in cols: > self.assertIn(f"{alias}(a)", repr(f(c))){code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py", > line 271, in test_inverse_trig_functions > self.assertIn(f"{alias}(a)", repr(f(c))) > AssertionError: 'ACOSH(a)' not found in > "Column<'acosh(ColumnReference(a))'>"{code} > > > {code:java} > from pyspark.sql.functions import col, lit, overlay > from itertools import chain > import re > actual = list( > chain.from_iterable( > [ > re.findall("(overlay\\(.*\\))", str(x)) > for x in [ > overlay(col("foo"), col("bar"), 1), > overlay("x", "y", 3), > overlay(col("x"), col("y"), 1, 3), > overlay("x", "y", 2, 5), > overlay("x", "y", lit(11)), > overlay("x", "y", lit(2), lit(5)), > ] > ] > ) > ) > expected = [ > "overlay(foo, bar, 1, -1)", > "overlay(x, y, 3, -1)", > "overlay(x, y, 1, 3)", > "overlay(x, y, 2, 5)", > "overlay(x, y, 11, -1)", > "overlay(x, y, 2, 5)", > ] > self.assertListEqual(actual, expected) > df = self.spark.createDataFrame([("SPARK_SQL", "CORE", 7, 0)], ("x", "y", > "pos", "len")) > exp = [Row(ol="SPARK_CORESQL")] > self.assertTrue( > all( > [ > df.select(overlay(df.x, df.y, 7, 0).alias("ol")).collect() == exp, > df.select(overlay(df.x, df.y, lit(7), > lit(0)).alias("ol")).collect() == exp, > df.select(overlay("x", "y", "pos", "len").alias("ol")).collect() > == exp, > ] > ) > ) {code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py", > line 675, in test_overlay > self.assertListEqual(actual, expected) > AssertionError: Lists differ: ['overlay(ColumnReference(foo), > ColumnReference(bar[402 chars]5))'] != ['overlay(foo, bar, 1, -1)', > 'overlay(x, y, 3, -1)'[90 chars] 5)'] > First differing element 0: > 'overlay(ColumnReference(foo), ColumnReference(bar), Literal(1), Literal(-1))' > 'overlay(foo, bar, 1, -1)' > - ['overlay(ColumnReference(foo), ColumnReference(bar), Literal(1), > Literal(-1))', > - 'overlay(ColumnReference(x), ColumnReference(y), Literal(3), Literal(-1))', > - 'overlay(ColumnReference(x), ColumnReference(y), Literal(1), Literal(3))', > - 'overlay(ColumnReference(x), ColumnReference(y), Literal(2), Literal(5))', > - 'overlay(ColumnReference(x), ColumnReference(y), Literal(11), > Literal(-1))', > - 'overlay(ColumnReference(x), ColumnReference(y), Literal(2), Literal(5))'] > + ['overlay(foo, bar, 1, -1)', > + 'overlay(x, y, 3, -1)', > + 'overlay(x, y, 1, 3)', > + 'overlay(x, y, 2, 5)', > + 'overlay(x, y, 11, -1)', > + 'overlay(x, y, 2, 5)'] > {code}
[jira] [Assigned] (SPARK-41901) Parity in String representation of Column
[ https://issues.apache.org/jira/browse/SPARK-41901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41901: Assignee: Apache Spark > Parity in String representation of Column > - > > Key: SPARK-41901 > URL: https://issues.apache.org/jira/browse/SPARK-41901 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Apache Spark >Priority: Major > > {code:java} > from pyspark.sql import functions > funs = [ > (functions.acosh, "ACOSH"), > (functions.asinh, "ASINH"), > (functions.atanh, "ATANH"), > ] > cols = ["a", functions.col("a")] > for f, alias in funs: > for c in cols: > self.assertIn(f"{alias}(a)", repr(f(c))){code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py", > line 271, in test_inverse_trig_functions > self.assertIn(f"{alias}(a)", repr(f(c))) > AssertionError: 'ACOSH(a)' not found in > "Column<'acosh(ColumnReference(a))'>"{code} > > > {code:java} > from pyspark.sql.functions import col, lit, overlay > from itertools import chain > import re > actual = list( > chain.from_iterable( > [ > re.findall("(overlay\\(.*\\))", str(x)) > for x in [ > overlay(col("foo"), col("bar"), 1), > overlay("x", "y", 3), > overlay(col("x"), col("y"), 1, 3), > overlay("x", "y", 2, 5), > overlay("x", "y", lit(11)), > overlay("x", "y", lit(2), lit(5)), > ] > ] > ) > ) > expected = [ > "overlay(foo, bar, 1, -1)", > "overlay(x, y, 3, -1)", > "overlay(x, y, 1, 3)", > "overlay(x, y, 2, 5)", > "overlay(x, y, 11, -1)", > "overlay(x, y, 2, 5)", > ] > self.assertListEqual(actual, expected) > df = self.spark.createDataFrame([("SPARK_SQL", "CORE", 7, 0)], ("x", "y", > "pos", "len")) > exp = [Row(ol="SPARK_CORESQL")] > self.assertTrue( > all( > [ > df.select(overlay(df.x, df.y, 7, 0).alias("ol")).collect() == exp, > df.select(overlay(df.x, df.y, lit(7), > lit(0)).alias("ol")).collect() == exp, > df.select(overlay("x", "y", "pos", "len").alias("ol")).collect() > == exp, > ] > ) > ) {code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py", > line 675, in test_overlay > self.assertListEqual(actual, expected) > AssertionError: Lists differ: ['overlay(ColumnReference(foo), > ColumnReference(bar[402 chars]5))'] != ['overlay(foo, bar, 1, -1)', > 'overlay(x, y, 3, -1)'[90 chars] 5)'] > First differing element 0: > 'overlay(ColumnReference(foo), ColumnReference(bar), Literal(1), Literal(-1))' > 'overlay(foo, bar, 1, -1)' > - ['overlay(ColumnReference(foo), ColumnReference(bar), Literal(1), > Literal(-1))', > - 'overlay(ColumnReference(x), ColumnReference(y), Literal(3), Literal(-1))', > - 'overlay(ColumnReference(x), ColumnReference(y), Literal(1), Literal(3))', > - 'overlay(ColumnReference(x), ColumnReference(y), Literal(2), Literal(5))', > - 'overlay(ColumnReference(x), ColumnReference(y), Literal(11), > Literal(-1))', > - 'overlay(ColumnReference(x), ColumnReference(y), Literal(2), Literal(5))'] > + ['overlay(foo, bar, 1, -1)', > + 'overlay(x, y, 3, -1)', > + 'overlay(x, y, 1, 3)', > + 'overlay(x, y, 2, 5)', > + 'overlay(x, y, 11, -1)', > + 'overlay(x, y, 2, 5)'] > {code}
[jira] [Updated] (SPARK-42095) Fix gRPC check in tests
[ https://issues.apache.org/jira/browse/SPARK-42095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-42095: - Description: Fix gRPC check in tests, including variables and error messages. > Fix gRPC check in tests > --- > > Key: SPARK-42095 > URL: https://issues.apache.org/jira/browse/SPARK-42095 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Fix gRPC check in tests, including variables and error messages.
[jira] [Updated] (SPARK-42095) Fix gRPC check in tests
[ https://issues.apache.org/jira/browse/SPARK-42095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-42095: - Summary: Fix gRPC check in tests (was: gRPC check in tests) > Fix gRPC check in tests > --- > > Key: SPARK-42095 > URL: https://issues.apache.org/jira/browse/SPARK-42095 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major >
[jira] [Created] (SPARK-42095) gRPC check in tests
Xinrong Meng created SPARK-42095: Summary: gRPC check in tests Key: SPARK-42095 URL: https://issues.apache.org/jira/browse/SPARK-42095 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.4.0 Reporter: Xinrong Meng
[jira] [Created] (SPARK-42094) Support `fill_value` for `ps.Series.add`
Haejoon Lee created SPARK-42094: --- Summary: Support `fill_value` for `ps.Series.add` Key: SPARK-42094 URL: https://issues.apache.org/jira/browse/SPARK-42094 Project: Spark Issue Type: Bug Components: Pandas API on Spark Affects Versions: 3.4.0 Reporter: Haejoon Lee For pandas function parity: https://pandas.pydata.org/docs/reference/api/pandas.Series.add.html
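The pandas `fill_value` contract being targeted can be modeled in plain Python (a simplified sketch of the documented behavior, with dicts standing in for indexed Series and `None` for a missing value: a value missing on either side is replaced by `fill_value` before adding, and the result stays missing only when both sides are missing):

```python
def add_with_fill(left, right, fill_value):
    """Model of Series.add(other, fill_value=...) over index-aligned dicts."""
    out = {}
    for key in sorted(set(left) | set(right)):
        a = left.get(key)
        b = right.get(key)
        if a is None and b is None:
            out[key] = None  # missing on both sides: result stays missing
        else:
            a = fill_value if a is None else a
            b = fill_value if b is None else b
            out[key] = a + b
    return out

result = add_with_fill({"a": 1, "b": None}, {"a": 2, "c": 4}, fill_value=0)
```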
[jira] [Commented] (SPARK-41866) Make `createDataFrame` support array
[ https://issues.apache.org/jira/browse/SPARK-41866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677565#comment-17677565 ] Apache Spark commented on SPARK-41866: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/39617 > Make `createDataFrame` support array > > > Key: SPARK-41866 > URL: https://issues.apache.org/jira/browse/SPARK-41866 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > import array > data = [Row(longarray=array.array("l", [-9223372036854775808, 0, > 9223372036854775807]))] > df = self.spark.createDataFrame(data) {code} > Error: > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 1220, in test_create_dataframe_from_array_of_long > df = self.spark.createDataFrame(data) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", > line 260, in createDataFrame > table = pa.Table.from_pylist([row.asDict(recursive=True) for row in > _data]) > File "pyarrow/table.pxi", line 3700, in pyarrow.lib.Table.from_pylist > File "pyarrow/table.pxi", line 5221, in pyarrow.lib._from_pylist > File "pyarrow/table.pxi", line 3575, in pyarrow.lib.Table.from_arrays > File "pyarrow/table.pxi", line 1383, in pyarrow.lib._sanitize_arrays > File "pyarrow/table.pxi", line 1364, in pyarrow.lib._schema_from_arrays > File "pyarrow/array.pxi", line 320, in pyarrow.lib.array > File "pyarrow/array.pxi", line 39, in pyarrow.lib._sequence_to_array > File "pyarrow/error.pxi", line 144, in > pyarrow.lib.pyarrow_internal_check_status > File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: Could not convert array('l', [-9223372036854775808, > 0, 9223372036854775807]) with type array.array: did not recognize Python > value type when inferring an Arrow data type{code}
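Until the fix lands, a common client-side workaround (an assumption about user code, not part of the patch under review) is to convert `array.array` values to plain lists, which Arrow's type inference does recognize:

```python
import array

# array.array is what pyarrow's inference fails on; tolist() yields ordinary
# Python ints, which infer cleanly as an int64 column. Note the "l" typecode
# is platform-dependent in width (8 bytes on most 64-bit Unix platforms),
# which is why these extreme long values fit here.
data = array.array("l", [-9223372036854775808, 0, 9223372036854775807])
as_list = data.tolist()
```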
[jira] [Assigned] (SPARK-41866) Make `createDataFrame` support array
[ https://issues.apache.org/jira/browse/SPARK-41866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41866: Assignee: Apache Spark > Make `createDataFrame` support array > > > Key: SPARK-41866 > URL: https://issues.apache.org/jira/browse/SPARK-41866 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Apache Spark >Priority: Major > > {code:java} > import array > data = [Row(longarray=array.array("l", [-9223372036854775808, 0, > 9223372036854775807]))] > df = self.spark.createDataFrame(data) {code} > Error: > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 1220, in test_create_dataframe_from_array_of_long > df = self.spark.createDataFrame(data) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", > line 260, in createDataFrame > table = pa.Table.from_pylist([row.asDict(recursive=True) for row in > _data]) > File "pyarrow/table.pxi", line 3700, in pyarrow.lib.Table.from_pylist > File "pyarrow/table.pxi", line 5221, in pyarrow.lib._from_pylist > File "pyarrow/table.pxi", line 3575, in pyarrow.lib.Table.from_arrays > File "pyarrow/table.pxi", line 1383, in pyarrow.lib._sanitize_arrays > File "pyarrow/table.pxi", line 1364, in pyarrow.lib._schema_from_arrays > File "pyarrow/array.pxi", line 320, in pyarrow.lib.array > File "pyarrow/array.pxi", line 39, in pyarrow.lib._sequence_to_array > File "pyarrow/error.pxi", line 144, in > pyarrow.lib.pyarrow_internal_check_status > File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: Could not convert array('l', [-9223372036854775808, > 0, 9223372036854775807]) with type array.array: did not recognize Python > value type when inferring an Arrow data type{code}
[jira] [Assigned] (SPARK-41866) Make `createDataFrame` support array
[ https://issues.apache.org/jira/browse/SPARK-41866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41866: Assignee: (was: Apache Spark) > Make `createDataFrame` support array > > > Key: SPARK-41866 > URL: https://issues.apache.org/jira/browse/SPARK-41866 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > {code:java} > import array > data = [Row(longarray=array.array("l", [-9223372036854775808, 0, > 9223372036854775807]))] > df = self.spark.createDataFrame(data) {code} > Error: > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py", > line 1220, in test_create_dataframe_from_array_of_long > df = self.spark.createDataFrame(data) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", > line 260, in createDataFrame > table = pa.Table.from_pylist([row.asDict(recursive=True) for row in > _data]) > File "pyarrow/table.pxi", line 3700, in pyarrow.lib.Table.from_pylist > File "pyarrow/table.pxi", line 5221, in pyarrow.lib._from_pylist > File "pyarrow/table.pxi", line 3575, in pyarrow.lib.Table.from_arrays > File "pyarrow/table.pxi", line 1383, in pyarrow.lib._sanitize_arrays > File "pyarrow/table.pxi", line 1364, in pyarrow.lib._schema_from_arrays > File "pyarrow/array.pxi", line 320, in pyarrow.lib.array > File "pyarrow/array.pxi", line 39, in pyarrow.lib._sequence_to_array > File "pyarrow/error.pxi", line 144, in > pyarrow.lib.pyarrow_internal_check_status > File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: Could not convert array('l', [-9223372036854775808, > 0, 9223372036854775807]) with type array.array: did not recognize Python > value type when inferring an Arrow data type{code}
[jira] [Commented] (SPARK-41757) Compatibility of string representation in Column
[ https://issues.apache.org/jira/browse/SPARK-41757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677564#comment-17677564 ] Apache Spark commented on SPARK-41757: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/39616 > Compatibility of string representation in Column > > > Key: SPARK-41757 > URL: https://issues.apache.org/jira/browse/SPARK-41757 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > > Doctest in pyspark.sql.connect.column.Columnfails with the error below: > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/column.py", > line 120, in pyspark.sql.connect.column.Column > Failed example: > df.name > Expected: > Column<'name'> > Got: > Column<'ColumnReference(name)'> > ** > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/column.py", > line 122, in pyspark.sql.connect.column.Column > Failed example: > df["name"] > Expected: > Column<'name'> > Got: > Column<'ColumnReference(name)'> > ** > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/column.py", > line 127, in pyspark.sql.connect.column.Column > Failed example: > df.age + 1 > Expected: > Column<'(age + 1)'> > Got: > Column<'+(ColumnReference(age), Literal(1))'> > ** > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/column.py", > line 129, in pyspark.sql.connect.column.Column > Failed example: > 1 / df.age > Expected: > Column<'(1 / age)'> > Got: > Column<'/(Literal(1), ColumnReference(age))'> {code} > > We should enable this back after fixing the issue in Spark Connect
[jira] [Assigned] (SPARK-42067) Upgrade buf from 1.11.0 to 1.12.0
[ https://issues.apache.org/jira/browse/SPARK-42067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-42067: - Assignee: BingKun Pan > Upgrade buf from 1.11.0 to 1.12.0 > - > > Key: SPARK-42067 > URL: https://issues.apache.org/jira/browse/SPARK-42067 > Project: Spark > Issue Type: Improvement > Components: Build, Connect >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor >
[jira] [Resolved] (SPARK-42067) Upgrade buf from 1.11.0 to 1.12.0
[ https://issues.apache.org/jira/browse/SPARK-42067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-42067. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39576 [https://github.com/apache/spark/pull/39576] > Upgrade buf from 1.11.0 to 1.12.0 > - > > Key: SPARK-42067 > URL: https://issues.apache.org/jira/browse/SPARK-42067 > Project: Spark > Issue Type: Improvement > Components: Build, Connect >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Fix For: 3.4.0 > >
[jira] [Assigned] (SPARK-42088) Running python3 setup.py sdist on windows reports a permission error
[ https://issues.apache.org/jira/browse/SPARK-42088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42088: Assignee: zheju_he > Running python3 setup.py sdist on windows reports a permission error > > > Key: SPARK-42088 > URL: https://issues.apache.org/jira/browse/SPARK-42088 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.0 >Reporter: zheju_he >Assignee: zheju_he >Priority: Minor > > My system version is windows 10, and I can run setup.py with administrator > permissions, so there will be no error. However, it may be troublesome for us > to upgrade permissions with Windows Server, so we need to modify the code of > setup.py to ensure no error. To avoid the hassle of compiling for the user, I > suggest modifying the following code to enable the out-of-the-box effect > {code:python} > def _supports_symlinks(): > """Check if the system supports symlinks (e.g. *nix) or not.""" > return getattr(os, "symlink", None) is not None and > ctypes.windll.shell32.IsUserAnAdmin() != 0 if sys.platform == "win32" else > True > {code}
[jira] [Resolved] (SPARK-42088) Running python3 setup.py sdist on windows reports a permission error
[ https://issues.apache.org/jira/browse/SPARK-42088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42088. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39603 [https://github.com/apache/spark/pull/39603] > Running python3 setup.py sdist on windows reports a permission error > > > Key: SPARK-42088 > URL: https://issues.apache.org/jira/browse/SPARK-42088 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.0 >Reporter: zheju_he >Assignee: zheju_he >Priority: Minor > Fix For: 3.4.0 > > > My system version is windows 10, and I can run setup.py with administrator > permissions, so there will be no error. However, it may be troublesome for us > to upgrade permissions with Windows Server, so we need to modify the code of > setup.py to ensure no error. To avoid the hassle of compiling for the user, I > suggest modifying the following code to enable the out-of-the-box effect > {code:python} > def _supports_symlinks(): > """Check if the system supports symlinks (e.g. *nix) or not.""" > return getattr(os, "symlink", None) is not None and > ctypes.windll.shell32.IsUserAnAdmin() != 0 if sys.platform == "win32" else > True > {code}
[jira] [Resolved] (SPARK-42091) Upgrade jetty to 9.4.50.v20221201
[ https://issues.apache.org/jira/browse/SPARK-42091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42091. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39612 [https://github.com/apache/spark/pull/39612] > Upgrade jetty to 9.4.50.v20221201 > - > > Key: SPARK-42091 > URL: https://issues.apache.org/jira/browse/SPARK-42091 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.4.0 > > > https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.50.v20221201
[jira] [Assigned] (SPARK-42091) Upgrade jetty to 9.4.50.v20221201
[ https://issues.apache.org/jira/browse/SPARK-42091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42091: Assignee: Yang Jie > Upgrade jetty to 9.4.50.v20221201 > - > > Key: SPARK-42091 > URL: https://issues.apache.org/jira/browse/SPARK-42091 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > > https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.50.v20221201
[jira] [Assigned] (SPARK-42021) createDataFrame with array.array
[ https://issues.apache.org/jira/browse/SPARK-42021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42021: Assignee: Hyukjin Kwon > createDataFrame with array.array > > > Key: SPARK-42021 > URL: https://issues.apache.org/jira/browse/SPARK-42021 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > > {code} > pyspark/sql/tests/test_types.py:964 (TypesParityTests.test_array_types) > self = testMethod=test_array_types> > def test_array_types(self): > # This test need to make sure that the Scala type selected is at least > # as large as the python's types. This is necessary because python's > # array types depend on C implementation on the machine. Therefore > there > # is no machine independent correspondence between python's array > types > # and Scala types. > # See: https://docs.python.org/2/library/array.html > > def assertCollectSuccess(typecode, value): > row = Row(myarray=array.array(typecode, [value])) > df = self.spark.createDataFrame([row]) > self.assertEqual(df.first()["myarray"][0], value) > > # supported string types > # > # String types in python's array are "u" for Py_UNICODE and "c" for > char. > # "u" will be removed in python 4, and "c" is not supported in python > 3. > supported_string_types = [] > if sys.version_info[0] < 4: > supported_string_types += ["u"] > # test unicode > > assertCollectSuccess("u", "a") > ../test_types.py:986: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > ../test_types.py:975: in assertCollectSuccess > df = self.spark.createDataFrame([row]) > ../../connect/session.py:278: in createDataFrame > _table = pa.Table.from_pylist([row.asDict(recursive=True) for row in > _data]) > pyarrow/table.pxi:3700: in pyarrow.lib.Table.from_pylist > ??? > pyarrow/table.pxi:5221: in pyarrow.lib._from_pylist > ??? 
> pyarrow/table.pxi:3575: in pyarrow.lib.Table.from_arrays > ??? > pyarrow/table.pxi:1383: in pyarrow.lib._sanitize_arrays > ??? > pyarrow/table.pxi:1364: in pyarrow.lib._schema_from_arrays > ??? > pyarrow/array.pxi:320: in pyarrow.lib.array > ??? > pyarrow/array.pxi:39: in pyarrow.lib._sequence_to_array > ??? > pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status > ??? > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > > ??? > E pyarrow.lib.ArrowInvalid: Could not convert array('u', 'a') with type > array.array: did not recognize Python value type when inferring an Arrow data > type > pyarrow/error.pxi:100: ArrowInvalid > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
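The ArrowInvalid above arises because Arrow's type inference does not recognize Python's `array.array` (here with the `'u'` unicode typecode) as a convertible value type. A minimal sketch of a client-side workaround — converting `array.array` values to plain lists before the row data reaches Arrow — using only the standard library (the helper names are hypothetical, not part of the eventual SPARK-42021 fix):

```python
import array

def normalize_value(value):
    """Convert array.array values to plain lists; pass everything else through."""
    if isinstance(value, array.array):
        # tolist() yields native Python ints/floats/strings that Arrow can infer
        return value.tolist()
    return value

def normalize_row(row_dict):
    """Apply normalize_value to every field of a row dict."""
    return {k: normalize_value(v) for k, v in row_dict.items()}

row = {"myarray": array.array("u", "a")}
print(normalize_row(row))  # {'myarray': ['a']}
```

Applying such a conversion in `createDataFrame` before `pa.Table.from_pylist` would sidestep the inference failure, at the cost of losing the original typecode information.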
[jira] [Assigned] (SPARK-41902) Parity in String representation of higher_order_function's output
[ https://issues.apache.org/jira/browse/SPARK-41902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41902: Assignee: Ruifeng Zheng > Parity in String representation of higher_order_function's output > - > > Key: SPARK-41902 > URL: https://issues.apache.org/jira/browse/SPARK-41902 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > > {code:java} > from pyspark.sql.functions import flatten, struct, transform > df = self.spark.sql("SELECT array(1, 2, 3) as numbers, array('a', 'b', 'c') > as letters") > actual = df.select( > flatten( > transform( > "numbers", > lambda number: transform( > "letters", lambda letter: struct(number.alias("n"), > letter.alias("l")) > ), > ) > ) > ).first()[0] > expected = [ > (1, "a"), > (1, "b"), > (1, "c"), > (2, "a"), > (2, "b"), > (2, "c"), > (3, "a"), > (3, "b"), > (3, "c"), > ] > self.assertEquals(actual, expected){code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py", > line 809, in test_nested_higher_order_function > self.assertEquals(actual, expected) > AssertionError: Lists differ: [{'n': 'a', 'l': 'a'}, {'n': 'b', 'l': 'b'[151 > chars]'c'}] != [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ([43 chars]'c')] > First differing element 0: > {'n': 'a', 'l': 'a'} > (1, 'a') > - [{'l': 'a', 'n': 'a'}, > - {'l': 'b', 'n': 'b'}, > - {'l': 'c', 'n': 'c'}, > - {'l': 'a', 'n': 'a'}, > - {'l': 'b', 'n': 'b'}, > - {'l': 'c', 'n': 'c'}, > - {'l': 'a', 'n': 'a'}, > - {'l': 'b', 'n': 'b'}, > - {'l': 'c', 'n': 'c'}] > + [(1, 'a'), > + (1, 'b'), > + (1, 'c'), > + (2, 'a'), > + (2, 'b'), > + (2, 'c'), > + (3, 'a'), > + (3, 'b'), > + (3, 'c')] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, 
e-mail: issues-h...@spark.apache.org
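The parity failure above is largely a representation mismatch: classic PySpark returns nested structs as `Row` objects, which are tuple-like and compare equal to plain tuples, while Spark Connect (before this fix) surfaced them as plain dicts, which do not. A small self-contained sketch of that difference, using `namedtuple` as a stand-in for `pyspark.sql.Row` and simplified hypothetical data:

```python
from collections import namedtuple

Struct = namedtuple("Struct", ["n", "l"])  # stands in for pyspark.sql.Row

connect_style = [{"n": 1, "l": "a"}, {"n": 1, "l": "b"}]  # dict representation
classic_style = [Struct(1, "a"), Struct(1, "b")]          # tuple-like representation
expected = [(1, "a"), (1, "b")]

# namedtuple instances (like Row) compare equal to plain tuples; dicts do not.
print(classic_style == expected)   # True
print(connect_style == expected)   # False
```

This is why `assertEquals(actual, expected)` passes against the classic API but reports "Lists differ" against Connect's dict-shaped output.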
[jira] [Assigned] (SPARK-42021) createDataFrame with array.array
[ https://issues.apache.org/jira/browse/SPARK-42021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42021: Assignee: Ruifeng Zheng (was: Hyukjin Kwon) > createDataFrame with array.array > > > Key: SPARK-42021 > URL: https://issues.apache.org/jira/browse/SPARK-42021 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > > {code} > pyspark/sql/tests/test_types.py:964 (TypesParityTests.test_array_types) > self = testMethod=test_array_types> > def test_array_types(self): > # This test need to make sure that the Scala type selected is at least > # as large as the python's types. This is necessary because python's > # array types depend on C implementation on the machine. Therefore > there > # is no machine independent correspondence between python's array > types > # and Scala types. > # See: https://docs.python.org/2/library/array.html > > def assertCollectSuccess(typecode, value): > row = Row(myarray=array.array(typecode, [value])) > df = self.spark.createDataFrame([row]) > self.assertEqual(df.first()["myarray"][0], value) > > # supported string types > # > # String types in python's array are "u" for Py_UNICODE and "c" for > char. > # "u" will be removed in python 4, and "c" is not supported in python > 3. > supported_string_types = [] > if sys.version_info[0] < 4: > supported_string_types += ["u"] > # test unicode > > assertCollectSuccess("u", "a") > ../test_types.py:986: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > ../test_types.py:975: in assertCollectSuccess > df = self.spark.createDataFrame([row]) > ../../connect/session.py:278: in createDataFrame > _table = pa.Table.from_pylist([row.asDict(recursive=True) for row in > _data]) > pyarrow/table.pxi:3700: in pyarrow.lib.Table.from_pylist > ??? > pyarrow/table.pxi:5221: in pyarrow.lib._from_pylist > ??? 
> pyarrow/table.pxi:3575: in pyarrow.lib.Table.from_arrays > ??? > pyarrow/table.pxi:1383: in pyarrow.lib._sanitize_arrays > ??? > pyarrow/table.pxi:1364: in pyarrow.lib._schema_from_arrays > ??? > pyarrow/array.pxi:320: in pyarrow.lib.array > ??? > pyarrow/array.pxi:39: in pyarrow.lib._sequence_to_array > ??? > pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status > ??? > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > > ??? > E pyarrow.lib.ArrowInvalid: Could not convert array('u', 'a') with type > array.array: did not recognize Python value type when inferring an Arrow data > type > pyarrow/error.pxi:100: ArrowInvalid > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41902) Parity in String representation of higher_order_function's output
[ https://issues.apache.org/jira/browse/SPARK-41902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41902. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39607 [https://github.com/apache/spark/pull/39607] > Parity in String representation of higher_order_function's output > - > > Key: SPARK-41902 > URL: https://issues.apache.org/jira/browse/SPARK-41902 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > Fix For: 3.4.0 > > > {code:java} > from pyspark.sql.functions import flatten, struct, transform > df = self.spark.sql("SELECT array(1, 2, 3) as numbers, array('a', 'b', 'c') > as letters") > actual = df.select( > flatten( > transform( > "numbers", > lambda number: transform( > "letters", lambda letter: struct(number.alias("n"), > letter.alias("l")) > ), > ) > ) > ).first()[0] > expected = [ > (1, "a"), > (1, "b"), > (1, "c"), > (2, "a"), > (2, "b"), > (2, "c"), > (3, "a"), > (3, "b"), > (3, "c"), > ] > self.assertEquals(actual, expected){code} > {code:java} > Traceback (most recent call last): > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py", > line 809, in test_nested_higher_order_function > self.assertEquals(actual, expected) > AssertionError: Lists differ: [{'n': 'a', 'l': 'a'}, {'n': 'b', 'l': 'b'[151 > chars]'c'}] != [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ([43 chars]'c')] > First differing element 0: > {'n': 'a', 'l': 'a'} > (1, 'a') > - [{'l': 'a', 'n': 'a'}, > - {'l': 'b', 'n': 'b'}, > - {'l': 'c', 'n': 'c'}, > - {'l': 'a', 'n': 'a'}, > - {'l': 'b', 'n': 'b'}, > - {'l': 'c', 'n': 'c'}, > - {'l': 'a', 'n': 'a'}, > - {'l': 'b', 'n': 'b'}, > - {'l': 'c', 'n': 'c'}] > + [(1, 'a'), > + (1, 'b'), > + (1, 'c'), > + (2, 'a'), > + (2, 'b'), > + (2, 'c'), > + (3, 'a'), > + (3, 'b'), > + (3, 'c')] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To 
unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42021) createDataFrame with array.array
[ https://issues.apache.org/jira/browse/SPARK-42021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42021. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39602 [https://github.com/apache/spark/pull/39602] > createDataFrame with array.array > > > Key: SPARK-42021 > URL: https://issues.apache.org/jira/browse/SPARK-42021 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > > {code} > pyspark/sql/tests/test_types.py:964 (TypesParityTests.test_array_types) > self = testMethod=test_array_types> > def test_array_types(self): > # This test need to make sure that the Scala type selected is at least > # as large as the python's types. This is necessary because python's > # array types depend on C implementation on the machine. Therefore > there > # is no machine independent correspondence between python's array > types > # and Scala types. > # See: https://docs.python.org/2/library/array.html > > def assertCollectSuccess(typecode, value): > row = Row(myarray=array.array(typecode, [value])) > df = self.spark.createDataFrame([row]) > self.assertEqual(df.first()["myarray"][0], value) > > # supported string types > # > # String types in python's array are "u" for Py_UNICODE and "c" for > char. > # "u" will be removed in python 4, and "c" is not supported in python > 3. > supported_string_types = [] > if sys.version_info[0] < 4: > supported_string_types += ["u"] > # test unicode > > assertCollectSuccess("u", "a") > ../test_types.py:986: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > ../test_types.py:975: in assertCollectSuccess > df = self.spark.createDataFrame([row]) > ../../connect/session.py:278: in createDataFrame > _table = pa.Table.from_pylist([row.asDict(recursive=True) for row in > _data]) > pyarrow/table.pxi:3700: in pyarrow.lib.Table.from_pylist > ??? 
> pyarrow/table.pxi:5221: in pyarrow.lib._from_pylist > ??? > pyarrow/table.pxi:3575: in pyarrow.lib.Table.from_arrays > ??? > pyarrow/table.pxi:1383: in pyarrow.lib._sanitize_arrays > ??? > pyarrow/table.pxi:1364: in pyarrow.lib._schema_from_arrays > ??? > pyarrow/array.pxi:320: in pyarrow.lib.array > ??? > pyarrow/array.pxi:39: in pyarrow.lib._sequence_to_array > ??? > pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status > ??? > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > > ??? > E pyarrow.lib.ArrowInvalid: Could not convert array('u', 'a') with type > array.array: did not recognize Python value type when inferring an Arrow data > type > pyarrow/error.pxi:100: ArrowInvalid > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42079) Rename proto messages for `toDF` and `withColumnsRenamed`
[ https://issues.apache.org/jira/browse/SPARK-42079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-42079.
Fix Version/s: 3.4.0
Assignee: Ruifeng Zheng
Resolution: Fixed

> Rename proto messages for `toDF` and `withColumnsRenamed`
> Key: SPARK-42079
> URL: https://issues.apache.org/jira/browse/SPARK-42079
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Fix For: 3.4.0
[jira] [Commented] (SPARK-42079) Rename proto messages for `toDF` and `withColumnsRenamed`
[ https://issues.apache.org/jira/browse/SPARK-42079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677549#comment-17677549 ]

Hyukjin Kwon commented on SPARK-42079:
Fixed in https://github.com/apache/spark/pull/39590

> Rename proto messages for `toDF` and `withColumnsRenamed`
> Key: SPARK-42079
> URL: https://issues.apache.org/jira/browse/SPARK-42079
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Priority: Major
[jira] [Updated] (SPARK-42068) Implicit conversion is not working with parallelization in scala with java 11 and spark3
[ https://issues.apache.org/jira/browse/SPARK-42068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srinivas Rishindra Pothireddi updated SPARK-42068: -- Summary: Implicit conversion is not working with parallelization in scala with java 11 and spark3 (was: Parallelization in Scala is not working with Java 11 and spark3) > Implicit conversion is not working with parallelization in scala with java 11 > and spark3 > > > Key: SPARK-42068 > URL: https://issues.apache.org/jira/browse/SPARK-42068 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.1, 3.2.3, 3.4.0 > Environment: spark version 3.3.1 Using Scala version 2.12.15 (OpenJDK > 64-Bit Server VM, Java 11.0.17) >Reporter: Srinivas Rishindra Pothireddi >Priority: Major > > The following code snippet fails with java 11 with spark3, but works with > java 8. It also works with spark2 and java 11. > {code:java} > import scala.collection.mutable > import scala.collection.parallel.{ExecutionContextTaskSupport, > ForkJoinTaskSupport} > case class Person(name: String, age: Int) > val pc = List(1, 2, 3).par > val forkJoinPool = new java.util.concurrent.ForkJoinPool(2) > pc.tasksupport = new ForkJoinTaskSupport(forkJoinPool) > pc.map { x => > val personList: Array[Person] = (1 to 999).map(value => Person("p" + > value, value)).toArray > //creating RDD of Person > val rddPerson = spark.sparkContext.parallelize(personList, 5) > val evenAgePerson = rddPerson.filter(_.age % 2 == 0) > import spark.implicits._ > val evenAgePersonDF = evenAgePerson.toDF("Name", "Age") > } {code} > The error is as follows. > {code:java} > scala.ScalaReflectionException: object $read not found. 
> at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:185) > at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:29) > at $typecreator6$1.apply(:37) > at > scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:237) > at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:237) > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:52) > at org.apache.spark.sql.Encoders$.product(Encoders.scala:300) > at > org.apache.spark.sql.LowPrioritySQLImplicits.newProductEncoder(SQLImplicits.scala:261) > at > org.apache.spark.sql.LowPrioritySQLImplicits.newProductEncoder$(SQLImplicits.scala:261) > at > org.apache.spark.sql.SQLImplicits.newProductEncoder(SQLImplicits.scala:32) > at $anonfun$res0$1(:37) > at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23) > at > scala.collection.parallel.AugmentedIterableIterator.map2combiner(RemainsIterator.scala:116) > at > scala.collection.parallel.AugmentedIterableIterator.map2combiner$(RemainsIterator.scala:113) > at > scala.collection.parallel.immutable.ParVector$ParVectorIterator.map2combiner(ParVector.scala:66) > at > scala.collection.parallel.ParIterableLike$Map.leaf(ParIterableLike.scala:1064) > at scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:53) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:67) > at scala.collection.parallel.Task.tryLeaf(Tasks.scala:56) > at scala.collection.parallel.Task.tryLeaf$(Tasks.scala:50) > at > scala.collection.parallel.ParIterableLike$Map.tryLeaf(ParIterableLike.scala:1061) > at > scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.internal(Tasks.scala:160) > at > scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.internal$(Tasks.scala:157) > at > scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.internal(Tasks.scala:440) > at 
> scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute(Tasks.scala:150) > at > scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute$(Tasks.scala:149) > at > scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:440) > at > java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189) > at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) > at java.base/java.util.concurrent.ForkJoinTask.doJoin(ForkJoinTask.java:396) > at java.base/java.util.concurrent.ForkJoinTask.join(ForkJoinTask.java:721) > at scala.collection.parallel.ForkJoinTasks$WrappedTask.sync(Tasks.scala:379) > at > scala.collection.parallel.ForkJoinTasks$WrappedTask.sync$(Tasks.scala:379) > at > scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.sync(Tasks.scala:440) > at >
[jira] [Commented] (SPARK-42093) Move JavaTypeInference to AgnosticEncoders
[ https://issues.apache.org/jira/browse/SPARK-42093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677488#comment-17677488 ]

Apache Spark commented on SPARK-42093:
User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/39615

> Move JavaTypeInference to AgnosticEncoders
> Key: SPARK-42093
> URL: https://issues.apache.org/jira/browse/SPARK-42093
> Project: Spark
> Issue Type: Task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Herman van Hövell
> Assignee: Herman van Hövell
> Priority: Major
[jira] [Assigned] (SPARK-42093) Move JavaTypeInference to AgnosticEncoders
[ https://issues.apache.org/jira/browse/SPARK-42093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42093: Assignee: Herman van Hövell (was: Apache Spark) > Move JavaTypeInference to AgnosticEncoders > -- > > Key: SPARK-42093 > URL: https://issues.apache.org/jira/browse/SPARK-42093 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42093) Move JavaTypeInference to AgnosticEncoders
[ https://issues.apache.org/jira/browse/SPARK-42093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42093: Assignee: Apache Spark (was: Herman van Hövell) > Move JavaTypeInference to AgnosticEncoders > -- > > Key: SPARK-42093 > URL: https://issues.apache.org/jira/browse/SPARK-42093 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42093) Move JavaTypeInference to AgnosticEncoders
Herman van Hövell created SPARK-42093:

Summary: Move JavaTypeInference to AgnosticEncoders
Key: SPARK-42093
URL: https://issues.apache.org/jira/browse/SPARK-42093
Project: Spark
Issue Type: Task
Components: SQL
Affects Versions: 3.4.0
Reporter: Herman van Hövell
Assignee: Herman van Hövell
[jira] [Assigned] (SPARK-42002) Implement DataFrameWriterV2 (ReadwriterV2Tests)
[ https://issues.apache.org/jira/browse/SPARK-42002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42002: Assignee: (was: Apache Spark) > Implement DataFrameWriterV2 (ReadwriterV2Tests) > --- > > Key: SPARK-42002 > URL: https://issues.apache.org/jira/browse/SPARK-42002 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > pyspark/sql/tests/test_readwriter.py:182 (ReadwriterV2ParityTests.test_api) > self = > testMethod=test_api> > def test_api(self): > df = self.df > > writer = df.writeTo("testcat.t") > ../test_readwriter.py:185: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > self = DataFrame[key: bigint, value: string], args = ('testcat.t',), kwargs = > {} > def writeTo(self, *args: Any, **kwargs: Any) -> None: > > raise NotImplementedError("writeTo() is not implemented.") > E NotImplementedError: writeTo() is not implemented. > ../../connect/dataframe.py:1529: NotImplementedError > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42002) Implement DataFrameWriterV2 (ReadwriterV2Tests)
[ https://issues.apache.org/jira/browse/SPARK-42002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677487#comment-17677487 ] Apache Spark commented on SPARK-42002: -- User 'techaddict' has created a pull request for this issue: https://github.com/apache/spark/pull/39614 > Implement DataFrameWriterV2 (ReadwriterV2Tests) > --- > > Key: SPARK-42002 > URL: https://issues.apache.org/jira/browse/SPARK-42002 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > pyspark/sql/tests/test_readwriter.py:182 (ReadwriterV2ParityTests.test_api) > self = > testMethod=test_api> > def test_api(self): > df = self.df > > writer = df.writeTo("testcat.t") > ../test_readwriter.py:185: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > self = DataFrame[key: bigint, value: string], args = ('testcat.t',), kwargs = > {} > def writeTo(self, *args: Any, **kwargs: Any) -> None: > > raise NotImplementedError("writeTo() is not implemented.") > E NotImplementedError: writeTo() is not implemented. > ../../connect/dataframe.py:1529: NotImplementedError > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42002) Implement DataFrameWriterV2 (ReadwriterV2Tests)
[ https://issues.apache.org/jira/browse/SPARK-42002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42002: Assignee: Apache Spark > Implement DataFrameWriterV2 (ReadwriterV2Tests) > --- > > Key: SPARK-42002 > URL: https://issues.apache.org/jira/browse/SPARK-42002 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > > {code} > pyspark/sql/tests/test_readwriter.py:182 (ReadwriterV2ParityTests.test_api) > self = > testMethod=test_api> > def test_api(self): > df = self.df > > writer = df.writeTo("testcat.t") > ../test_readwriter.py:185: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > self = DataFrame[key: bigint, value: string], args = ('testcat.t',), kwargs = > {} > def writeTo(self, *args: Any, **kwargs: Any) -> None: > > raise NotImplementedError("writeTo() is not implemented.") > E NotImplementedError: writeTo() is not implemented. > ../../connect/dataframe.py:1529: NotImplementedError > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42092) Upgrade RoaringBitmap to 0.9.38
[ https://issues.apache.org/jira/browse/SPARK-42092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677465#comment-17677465 ] Apache Spark commented on SPARK-42092: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/39613 > Upgrade RoaringBitmap to 0.9.38 > --- > > Key: SPARK-42092 > URL: https://issues.apache.org/jira/browse/SPARK-42092 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.36...0.9.38 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42092) Upgrade RoaringBitmap to 0.9.38
[ https://issues.apache.org/jira/browse/SPARK-42092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42092: Assignee: (was: Apache Spark) > Upgrade RoaringBitmap to 0.9.38 > --- > > Key: SPARK-42092 > URL: https://issues.apache.org/jira/browse/SPARK-42092 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.36...0.9.38 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42092) Upgrade RoaringBitmap to 0.9.38
[ https://issues.apache.org/jira/browse/SPARK-42092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42092: Assignee: Apache Spark > Upgrade RoaringBitmap to 0.9.38 > --- > > Key: SPARK-42092 > URL: https://issues.apache.org/jira/browse/SPARK-42092 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.36...0.9.38 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42092) Upgrade RoaringBitmap to 0.9.38
Yang Jie created SPARK-42092:

Summary: Upgrade RoaringBitmap to 0.9.38
Key: SPARK-42092
URL: https://issues.apache.org/jira/browse/SPARK-42092
Project: Spark
Issue Type: Improvement
Components: Build
Affects Versions: 3.4.0
Reporter: Yang Jie

https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.36...0.9.38
[jira] [Assigned] (SPARK-42091) Upgrade jetty to 9.4.50.v20221201
[ https://issues.apache.org/jira/browse/SPARK-42091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42091: Assignee: (was: Apache Spark) > Upgrade jetty to 9.4.50.v20221201 > - > > Key: SPARK-42091 > URL: https://issues.apache.org/jira/browse/SPARK-42091 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.50.v20221201 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42091) Upgrade jetty to 9.4.50.v20221201
[ https://issues.apache.org/jira/browse/SPARK-42091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677460#comment-17677460 ] Apache Spark commented on SPARK-42091: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/39612 > Upgrade jetty to 9.4.50.v20221201 > - > > Key: SPARK-42091 > URL: https://issues.apache.org/jira/browse/SPARK-42091 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.50.v20221201 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42091) Upgrade jetty to 9.4.50.v20221201
[ https://issues.apache.org/jira/browse/SPARK-42091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42091: Assignee: Apache Spark > Upgrade jetty to 9.4.50.v20221201 > - > > Key: SPARK-42091 > URL: https://issues.apache.org/jira/browse/SPARK-42091 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.50.v20221201 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42091) Upgrade jetty to 9.4.50.v20221201
Yang Jie created SPARK-42091:

Summary: Upgrade jetty to 9.4.50.v20221201
Key: SPARK-42091
URL: https://issues.apache.org/jira/browse/SPARK-42091
Project: Spark
Issue Type: Improvement
Components: Build
Affects Versions: 3.4.0
Reporter: Yang Jie

https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.50.v20221201
[jira] [Commented] (SPARK-41708) Pull v1write information to WriteFiles
[ https://issues.apache.org/jira/browse/SPARK-41708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677405#comment-17677405 ]

Apache Spark commented on SPARK-41708:
User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/39610

> Pull v1write information to WriteFiles
> Key: SPARK-41708
> URL: https://issues.apache.org/jira/browse/SPARK-41708
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: XiDuo You
> Assignee: XiDuo You
> Priority: Major
> Fix For: 3.4.0
>
> Make WriteFiles hold v1 write information
[jira] [Assigned] (SPARK-42090) Introduce sasl retry count in RetryingBlockTransferor
[ https://issues.apache.org/jira/browse/SPARK-42090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42090:
------------------------------------

    Assignee: (was: Apache Spark)

> Introduce sasl retry count in RetryingBlockTransferor
> -----------------------------------------------------
>
> Key: SPARK-42090
> URL: https://issues.apache.org/jira/browse/SPARK-42090
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Ted Yu
> Priority: Major
>
> Previously a boolean variable, saslTimeoutSeen, was used in
> RetryingBlockTransferor. However, the boolean variable wouldn't cover the
> following scenario:
> 1. SaslTimeoutException
> 2. IOException
> 3. SaslTimeoutException
> 4. IOException
> Even though the IOException at #2 is retried (resulting in an increment of
> retryCount), the retryCount would be cleared at step #4.
> Since the intention of saslTimeoutSeen is to undo the increment due to
> retrying SaslTimeoutException, we should keep a counter for
> SaslTimeoutException retries and subtract the value of this counter from
> retryCount.
[jira] [Assigned] (SPARK-42090) Introduce sasl retry count in RetryingBlockTransferor
[ https://issues.apache.org/jira/browse/SPARK-42090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42090:
------------------------------------

    Assignee: Apache Spark

> Introduce sasl retry count in RetryingBlockTransferor
> -----------------------------------------------------
>
> Key: SPARK-42090
> URL: https://issues.apache.org/jira/browse/SPARK-42090
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Ted Yu
> Assignee: Apache Spark
> Priority: Major
>
> Previously a boolean variable, saslTimeoutSeen, was used in
> RetryingBlockTransferor. However, the boolean variable wouldn't cover the
> following scenario:
> 1. SaslTimeoutException
> 2. IOException
> 3. SaslTimeoutException
> 4. IOException
> Even though the IOException at #2 is retried (resulting in an increment of
> retryCount), the retryCount would be cleared at step #4.
> Since the intention of saslTimeoutSeen is to undo the increment due to
> retrying SaslTimeoutException, we should keep a counter for
> SaslTimeoutException retries and subtract the value of this counter from
> retryCount.
[jira] [Commented] (SPARK-42090) Introduce sasl retry count in RetryingBlockTransferor
[ https://issues.apache.org/jira/browse/SPARK-42090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17677404#comment-17677404 ]

Apache Spark commented on SPARK-42090:
--------------------------------------

User 'tedyu' has created a pull request for this issue:
https://github.com/apache/spark/pull/39611

> Introduce sasl retry count in RetryingBlockTransferor
> -----------------------------------------------------
>
> Key: SPARK-42090
> URL: https://issues.apache.org/jira/browse/SPARK-42090
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Ted Yu
> Priority: Major
>
> Previously a boolean variable, saslTimeoutSeen, was used in
> RetryingBlockTransferor. However, the boolean variable wouldn't cover the
> following scenario:
> 1. SaslTimeoutException
> 2. IOException
> 3. SaslTimeoutException
> 4. IOException
> Even though the IOException at #2 is retried (resulting in an increment of
> retryCount), the retryCount would be cleared at step #4.
> Since the intention of saslTimeoutSeen is to undo the increment due to
> retrying SaslTimeoutException, we should keep a counter for
> SaslTimeoutException retries and subtract the value of this counter from
> retryCount.
[jira] [Updated] (SPARK-42090) Introduce sasl retry count in RetryingBlockTransferor
[ https://issues.apache.org/jira/browse/SPARK-42090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated SPARK-42090:
---------------------------
    Description: 
Previously a boolean variable, saslTimeoutSeen, was used in RetryingBlockTransferor. However, the boolean variable wouldn't cover the following scenario:
1. SaslTimeoutException
2. IOException
3. SaslTimeoutException
4. IOException
Even though the IOException at #2 is retried (resulting in an increment of retryCount), the retryCount would be cleared at step #4.
Since the intention of saslTimeoutSeen is to undo the increment due to retrying SaslTimeoutException, we should keep a counter for SaslTimeoutException retries and subtract the value of this counter from retryCount.

  was:
Previously a boolean variable, saslTimeoutSeen, was used. However, the boolean variable wouldn't cover the following scenario:
1. SaslTimeoutException
2. IOException
3. SaslTimeoutException
4. IOException
Even though the IOException at #2 is retried (resulting in an increment of retryCount), the retryCount would be cleared at step #4.
Since the intention of saslTimeoutSeen is to undo the increment due to retrying SaslTimeoutException, we should keep a counter for SaslTimeoutException retries and subtract the value of this counter from retryCount.

> Introduce sasl retry count in RetryingBlockTransferor
> -----------------------------------------------------
>
> Key: SPARK-42090
> URL: https://issues.apache.org/jira/browse/SPARK-42090
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Ted Yu
> Priority: Major
>
> Previously a boolean variable, saslTimeoutSeen, was used in
> RetryingBlockTransferor. However, the boolean variable wouldn't cover the
> following scenario:
> 1. SaslTimeoutException
> 2. IOException
> 3. SaslTimeoutException
> 4. IOException
> Even though the IOException at #2 is retried (resulting in an increment of
> retryCount), the retryCount would be cleared at step #4.
> Since the intention of saslTimeoutSeen is to undo the increment due to
> retrying SaslTimeoutException, we should keep a counter for
> SaslTimeoutException retries and subtract the value of this counter from
> retryCount.
[jira] [Created] (SPARK-42090) Introduce sasl retry count in RetryingBlockTransferor
Ted Yu created SPARK-42090:
------------------------------

    Summary: Introduce sasl retry count in RetryingBlockTransferor
    Key: SPARK-42090
    URL: https://issues.apache.org/jira/browse/SPARK-42090
    Project: Spark
    Issue Type: Bug
    Components: Spark Core
    Affects Versions: 3.4.0
    Reporter: Ted Yu

Previously a boolean variable, saslTimeoutSeen, was used. However, the boolean variable wouldn't cover the following scenario:
1. SaslTimeoutException
2. IOException
3. SaslTimeoutException
4. IOException
Even though the IOException at #2 is retried (resulting in an increment of retryCount), the retryCount would be cleared at step #4.
Since the intention of saslTimeoutSeen is to undo the increment due to retrying SaslTimeoutException, we should keep a counter for SaslTimeoutException retries and subtract the value of this counter from retryCount.
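The retry-accounting problem described in SPARK-42090 can be sketched in code. Below is a minimal, hypothetical Python model (the real fix lives in the Scala class RetryingBlockTransferor; the class and method names here are illustrative only) showing why a boolean flag loses the IOException retry from step #2, while a dedicated SASL retry counter preserves it:

```python
class RetryTracker:
    """Toy model of retry accounting; not Spark's actual implementation."""

    def __init__(self):
        self.retry_count = 0    # total retries performed
        self.sasl_retries = 0   # retries caused by SASL timeouts

    def on_sasl_timeout(self):
        # SASL timeouts are retried, but should not count against the
        # IOException retry budget.
        self.retry_count += 1
        self.sasl_retries += 1

    def on_io_exception(self):
        self.retry_count += 1

    def effective_io_retries(self):
        # The fix proposed in the issue: subtract SASL-caused retries
        # instead of resetting the whole count via a boolean flag.
        return self.retry_count - self.sasl_retries

t = RetryTracker()
t.on_sasl_timeout()   # 1. SaslTimeoutException
t.on_io_exception()   # 2. IOException
t.on_sasl_timeout()   # 3. SaslTimeoutException
t.on_io_exception()   # 4. IOException
print(t.effective_io_retries())  # 2
```

With the original boolean, step #3 would mark the SASL timeout as "seen" and step #4 would clear the count, forgetting the IOException retry from step #2; the counter keeps both IOException retries visible.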
[jira] [Assigned] (SPARK-41994) Harden SQLSTATE usage for error classes
[ https://issues.apache.org/jira/browse/SPARK-41994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-41994:
-----------------------------------
    Assignee: Serge Rielau

> Harden SQLSTATE usage for error classes
> ---------------------------------------
>
> Key: SPARK-41994
> URL: https://issues.apache.org/jira/browse/SPARK-41994
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Serge Rielau
> Assignee: Serge Rielau
> Priority: Major
> Fix For: 3.4.0
>
> Error classes are great, but for JDBC, ODBC etc the SQLSTATEs of the standard
> reign.
> We have started adding SQLSTATEs but have not really paid attention to their
> correctness.
> Here is a unified view of SQLSTATE's used in the
> [Industry.|https://docs.google.com/spreadsheets/d/1hrQBSuHooiozUNAQTHiYq3WidS1uliHpl9cYfWpig1c/edit?usp=sharing]
[jira] [Resolved] (SPARK-41994) Harden SQLSTATE usage for error classes
[ https://issues.apache.org/jira/browse/SPARK-41994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-41994.
---------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 39537
[https://github.com/apache/spark/pull/39537]

> Harden SQLSTATE usage for error classes
> ---------------------------------------
>
> Key: SPARK-41994
> URL: https://issues.apache.org/jira/browse/SPARK-41994
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Serge Rielau
> Priority: Major
> Fix For: 3.4.0
>
> Error classes are great, but for JDBC, ODBC etc the SQLSTATEs of the standard
> reign.
> We have started adding SQLSTATEs but have not really paid attention to their
> correctness.
> Here is a unified view of SQLSTATE's used in the
> [Industry.|https://docs.google.com/spreadsheets/d/1hrQBSuHooiozUNAQTHiYq3WidS1uliHpl9cYfWpig1c/edit?usp=sharing]
[jira] [Commented] (SPARK-41896) Filtering by row_index always returns empty results
[ https://issues.apache.org/jira/browse/SPARK-41896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17677344#comment-17677344 ]

Apache Spark commented on SPARK-41896:
--------------------------------------

User 'olaky' has created a pull request for this issue:
https://github.com/apache/spark/pull/39608

> Filtering by row_index always returns empty results
> ---------------------------------------------------
>
> Key: SPARK-41896
> URL: https://issues.apache.org/jira/browse/SPARK-41896
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Jan-Ole Sasse
> Assignee: Jan-Ole Sasse
> Priority: Critical
> Fix For: 3.4.0
>
> Queries that include a filter with row_index currently always return an empty
> result. This is because we consider all metadata attributes constant per file
> [here|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala#L76]
> and the filter then always evaluates to false.
> This should be fixed as a follow up to SPARK-41791
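The pruning bug described in SPARK-41896 can be illustrated with a toy simulation. This is hypothetical pure Python, not Spark's actual PartitioningAwareFileIndex logic: it shows how evaluating a row_index predicate once per file against a constant placeholder value makes the predicate false, so the whole file is pruned and the query returns nothing.

```python
# Hypothetical simulation of per-file pruning that wrongly treats
# row_index as a constant metadata attribute.
rows = [{"row_index": i, "value": v} for i, v in enumerate("abcd")]

def scan_with_constant_metadata(rows, predicate, placeholder=-1):
    # Bug: the predicate is evaluated once per file with a placeholder
    # row_index, so a filter like `row_index == 2` is always false and
    # the entire file is skipped.
    if not predicate({"row_index": placeholder}):
        return []
    return [r for r in rows if predicate(r)]

def scan_per_row(rows, predicate):
    # Correct behavior: evaluate the predicate against each row, since
    # row_index varies within a file.
    return [r for r in rows if predicate(r)]

pred = lambda r: r["row_index"] == 2
print(scan_with_constant_metadata(rows, pred))  # [] (the bug)
print(scan_per_row(rows, pred))  # [{'row_index': 2, 'value': 'c'}]
```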
[jira] [Assigned] (SPARK-41902) Parity in String representation of higher_order_function's output
[ https://issues.apache.org/jira/browse/SPARK-41902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41902:
------------------------------------

    Assignee: (was: Apache Spark)

> Parity in String representation of higher_order_function's output
> -----------------------------------------------------------------
>
> Key: SPARK-41902
> URL: https://issues.apache.org/jira/browse/SPARK-41902
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Sandeep Singh
> Priority: Major
>
> {code:java}
> from pyspark.sql.functions import flatten, struct, transform
> df = self.spark.sql("SELECT array(1, 2, 3) as numbers, array('a', 'b', 'c') as letters")
> actual = df.select(
>     flatten(
>         transform(
>             "numbers",
>             lambda number: transform(
>                 "letters", lambda letter: struct(number.alias("n"), letter.alias("l"))
>             ),
>         )
>     )
> ).first()[0]
> expected = [
>     (1, "a"), (1, "b"), (1, "c"),
>     (2, "a"), (2, "b"), (2, "c"),
>     (3, "a"), (3, "b"), (3, "c"),
> ]
> self.assertEquals(actual, expected){code}
> {code:java}
> Traceback (most recent call last):
>   File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py", line 809, in test_nested_higher_order_function
>     self.assertEquals(actual, expected)
> AssertionError: Lists differ: [{'n': 'a', 'l': 'a'}, {'n': 'b', 'l': 'b'[151 chars]'c'}] != [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ([43 chars]'c')]
> First differing element 0:
> {'n': 'a', 'l': 'a'}
> (1, 'a')
> - [{'l': 'a', 'n': 'a'},
> -  {'l': 'b', 'n': 'b'},
> -  {'l': 'c', 'n': 'c'},
> -  {'l': 'a', 'n': 'a'},
> -  {'l': 'b', 'n': 'b'},
> -  {'l': 'c', 'n': 'c'},
> -  {'l': 'a', 'n': 'a'},
> -  {'l': 'b', 'n': 'b'},
> -  {'l': 'c', 'n': 'c'}]
> + [(1, 'a'),
> +  (1, 'b'),
> +  (1, 'c'),
> +  (2, 'a'),
> +  (2, 'b'),
> +  (2, 'c'),
> +  (3, 'a'),
> +  (3, 'b'),
> +  (3, 'c')]
> {code}
[jira] [Assigned] (SPARK-41902) Parity in String representation of higher_order_function's output
[ https://issues.apache.org/jira/browse/SPARK-41902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41902:
------------------------------------

    Assignee: Apache Spark

> Parity in String representation of higher_order_function's output
> -----------------------------------------------------------------
>
> Key: SPARK-41902
> URL: https://issues.apache.org/jira/browse/SPARK-41902
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Sandeep Singh
> Assignee: Apache Spark
> Priority: Major
>
> {code:java}
> from pyspark.sql.functions import flatten, struct, transform
> df = self.spark.sql("SELECT array(1, 2, 3) as numbers, array('a', 'b', 'c') as letters")
> actual = df.select(
>     flatten(
>         transform(
>             "numbers",
>             lambda number: transform(
>                 "letters", lambda letter: struct(number.alias("n"), letter.alias("l"))
>             ),
>         )
>     )
> ).first()[0]
> expected = [
>     (1, "a"), (1, "b"), (1, "c"),
>     (2, "a"), (2, "b"), (2, "c"),
>     (3, "a"), (3, "b"), (3, "c"),
> ]
> self.assertEquals(actual, expected){code}
> {code:java}
> Traceback (most recent call last):
>   File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py", line 809, in test_nested_higher_order_function
>     self.assertEquals(actual, expected)
> AssertionError: Lists differ: [{'n': 'a', 'l': 'a'}, {'n': 'b', 'l': 'b'[151 chars]'c'}] != [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ([43 chars]'c')]
> First differing element 0:
> {'n': 'a', 'l': 'a'}
> (1, 'a')
> - [{'l': 'a', 'n': 'a'},
> -  {'l': 'b', 'n': 'b'},
> -  {'l': 'c', 'n': 'c'},
> -  {'l': 'a', 'n': 'a'},
> -  {'l': 'b', 'n': 'b'},
> -  {'l': 'c', 'n': 'c'},
> -  {'l': 'a', 'n': 'a'},
> -  {'l': 'b', 'n': 'b'},
> -  {'l': 'c', 'n': 'c'}]
> + [(1, 'a'),
> +  (1, 'b'),
> +  (1, 'c'),
> +  (2, 'a'),
> +  (2, 'b'),
> +  (2, 'c'),
> +  (3, 'a'),
> +  (3, 'b'),
> +  (3, 'c')]
> {code}
[jira] [Commented] (SPARK-41902) Parity in String representation of higher_order_function's output
[ https://issues.apache.org/jira/browse/SPARK-41902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17677335#comment-17677335 ]

Apache Spark commented on SPARK-41902:
--------------------------------------

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39607

> Parity in String representation of higher_order_function's output
> -----------------------------------------------------------------
>
> Key: SPARK-41902
> URL: https://issues.apache.org/jira/browse/SPARK-41902
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Sandeep Singh
> Priority: Major
>
> {code:java}
> from pyspark.sql.functions import flatten, struct, transform
> df = self.spark.sql("SELECT array(1, 2, 3) as numbers, array('a', 'b', 'c') as letters")
> actual = df.select(
>     flatten(
>         transform(
>             "numbers",
>             lambda number: transform(
>                 "letters", lambda letter: struct(number.alias("n"), letter.alias("l"))
>             ),
>         )
>     )
> ).first()[0]
> expected = [
>     (1, "a"), (1, "b"), (1, "c"),
>     (2, "a"), (2, "b"), (2, "c"),
>     (3, "a"), (3, "b"), (3, "c"),
> ]
> self.assertEquals(actual, expected){code}
> {code:java}
> Traceback (most recent call last):
>   File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py", line 809, in test_nested_higher_order_function
>     self.assertEquals(actual, expected)
> AssertionError: Lists differ: [{'n': 'a', 'l': 'a'}, {'n': 'b', 'l': 'b'[151 chars]'c'}] != [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ([43 chars]'c')]
> First differing element 0:
> {'n': 'a', 'l': 'a'}
> (1, 'a')
> - [{'l': 'a', 'n': 'a'},
> -  {'l': 'b', 'n': 'b'},
> -  {'l': 'c', 'n': 'c'},
> -  {'l': 'a', 'n': 'a'},
> -  {'l': 'b', 'n': 'b'},
> -  {'l': 'c', 'n': 'c'},
> -  {'l': 'a', 'n': 'a'},
> -  {'l': 'b', 'n': 'b'},
> -  {'l': 'c', 'n': 'c'}]
> + [(1, 'a'),
> +  (1, 'b'),
> +  (1, 'c'),
> +  (2, 'a'),
> +  (2, 'b'),
> +  (2, 'c'),
> +  (3, 'a'),
> +  (3, 'b'),
> +  (3, 'c')]
> {code}
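The SPARK-41902 assertion failure above boils down to a representation mismatch: Spark Connect returned struct values as dicts, while classic PySpark returns Row objects that compare equal to plain tuples. A small hypothetical sketch of that gap, normalizing dict-shaped structs into the tuple shape the test expects (the sample data and the as_tuples helper are illustrative, not PySpark API):

```python
# Dict-shaped structs, as the Connect client returned them.
connect_style = [{"n": 1, "l": "a"}, {"n": 1, "l": "b"}]
# Tuple-shaped structs, as classic PySpark Rows compare.
expected = [(1, "a"), (1, "b")]

def as_tuples(structs, field_order=("n", "l")):
    # Normalize each dict into a tuple in a fixed field order, so the
    # two representations become directly comparable.
    return [tuple(s[f] for f in field_order) for s in structs]

print(as_tuples(connect_style) == expected)  # True
```

The actual fix tracked by the issue is to make both clients produce the same representation, so no such normalization is needed in user tests.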