[jira] [Updated] (SPARK-42100) Protect null `SQLExecutionUIData#description` in `SQLExecutionUIDataSerializer`

2023-01-16 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-42100:
-
Description: 
export LIVE_UI_LOCAL_STORE_DIR=/tmp/spark-ui

mvn clean install -pl sql/core 
-Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest -Dtest=none 
-DwildcardSuites=org.apache.spark.sql.DynamicPartitionPruningV1SuiteAEOff -am 

 

No test failed, but the following error messages were logged:

 
{code:java}
14:46:44.514 ERROR org.apache.spark.scheduler.AsyncEventQueue: Listener 
SQLAppStatusListener threw an exception
java.lang.NullPointerException
    at 
org.apache.spark.status.protobuf.StoreTypes$SQLExecutionUIData$Builder.setDescription(StoreTypes.java:46500)
    at 
org.apache.spark.status.protobuf.sql.SQLExecutionUIDataSerializer.serialize(SQLExecutionUIDataSerializer.scala:34)
    at 
org.apache.spark.status.protobuf.sql.SQLExecutionUIDataSerializer.serialize(SQLExecutionUIDataSerializer.scala:28)
    at 
org.apache.spark.status.protobuf.KVStoreProtobufSerializer.serialize(KVStoreProtobufSerializer.scala:30)
    at org.apache.spark.util.kvstore.RocksDB.write(RocksDB.java:188)
    at 
org.apache.spark.status.ElementTrackingStore.write(ElementTrackingStore.scala:123)
    at 
org.apache.spark.status.ElementTrackingStore.write(ElementTrackingStore.scala:127)
    at org.apache.spark.status.LiveEntity.write(LiveEntity.scala:50)
    at 
org.apache.spark.sql.execution.ui.SQLAppStatusListener.update(SQLAppStatusListener.scala:456)
    at 
org.apache.spark.sql.execution.ui.SQLAppStatusListener.onJobStart(SQLAppStatusListener.scala:124)
    at 
org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:37)
    at 
org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
    at 
org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at 
org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
    at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
    at 
org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
    at 
org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
    at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at 
org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
    at 
org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1444)
    at 
org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
14:46:44.936 ERROR org.apache.spark.scheduler.AsyncEventQueue: Listener 
SQLAppStatusListener threw an exception {code}
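
The NPE comes from `SQLExecutionUIData.description` being null when the serializer calls the generated protobuf builder's `setDescription`, which rejects null. A minimal runnable sketch of the guard the title proposes, using a toy builder rather than Spark's generated classes:

{code:python}
# Toy model (not Spark's generated code) of why serialize() throws and how a
# null guard fixes it: protobuf builders reject null field values.
class Builder:
    def set_description(self, value):
        if value is None:
            raise TypeError("field must not be None")  # mirrors the Java NPE
        self.description = value
        return self

def serialize(description):
    builder = Builder()
    # The guard the issue title asks for: only call the setter when the
    # optional field is actually present.
    if description is not None:
        builder.set_description(description)
    return builder

serialize(None)                    # previously crashed, now a no-op
serialize("collect at <console>")  # normal path
{code}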

> Protect null `SQLExecutionUIData#description` in 
> `SQLExecutionUIDataSerializer`
> ---
>
> Key: SPARK-42100
> URL: https://issues.apache.org/jira/browse/SPARK-42100
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> export LIVE_UI_LOCAL_STORE_DIR=/tmp/spark-ui
> mvn clean install -pl sql/core 
> -Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest -Dtest=none 
> -DwildcardSuites=org.apache.spark.sql.DynamicPartitionPruningV1SuiteAEOff -am 
>  
> No test failed, but the following error messages were logged:
>  
> {code:java}
> 14:46:44.514 ERROR org.apache.spark.scheduler.AsyncEventQueue: Listener 
> SQLAppStatusListener threw an exception
> java.lang.NullPointerException
>     at 
> org.apache.spark.status.protobuf.StoreTypes$SQLExecutionUIData$Builder.setDescription(StoreTypes.java:46500)
>     at 
> org.apache.spark.status.protobuf.sql.SQLExecutionUIDataSerializer.serialize(SQLExecutionUIDataSerializer.scala:34)
>     at 
> org.apache.spark.status.protobuf.sql.SQLExecutionUIDataSerializer.serialize(SQLExecutionUIDataSerializer.scala:28)
>     at 
> org.apache.spark.status.protobuf.KVStoreProtobufSerializer.serialize(KVStoreProtobufSerializer.scala:30)
>     at org.apache.spark.util.kvstore.RocksDB.write(RocksDB.java:188)
>     at 
> org.apache.spark.status.ElementTrackingStore.write(ElementTrackingStore.scala:123)
>     at 
> org.apache.spark.status.ElementTrackingStore.write(ElementTrackingStore.scala:127)
>     at org.apache.spark.status.LiveEntity.write(LiveEntity.scala:50)
>     at 
> org.apache.spark.sql.execution.ui.SQLAppStatusListener.update(SQLAppStatusListener.scala:456)
>     at 
> org.apache.spark.sql.execution.ui.SQLAppStatusListener.onJobStart(SQLAppStatusListener.scala:124)
>     at 
> 

[jira] [Assigned] (SPARK-41845) Fix `count(expr("*"))` function

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41845:


Assignee: (was: Apache Spark)

> Fix `count(expr("*"))` function
> ---
>
> Key: SPARK-41845
> URL: https://issues.apache.org/jira/browse/SPARK-41845
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 801, in pyspark.sql.connect.functions.count
> Failed example:
>     df.select(count(expr("*")), count(df.alphabets)).show()
> Expected:
>     +--------+----------------+
>     |count(1)|count(alphabets)|
>     +--------+----------------+
>     |       4|               3|
>     +--------+----------------+
> Got:
>     +----------------+----------------+
>     |count(alphabets)|count(alphabets)|
>     +----------------+----------------+
>     |               3|               3|
>     +----------------+----------------+
>      {code}






[jira] [Assigned] (SPARK-41845) Fix `count(expr("*"))` function

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41845:


Assignee: Apache Spark

> Fix `count(expr("*"))` function
> ---
>
> Key: SPARK-41845
> URL: https://issues.apache.org/jira/browse/SPARK-41845
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 801, in pyspark.sql.connect.functions.count
> Failed example:
>     df.select(count(expr("*")), count(df.alphabets)).show()
> Expected:
>     +--------+----------------+
>     |count(1)|count(alphabets)|
>     +--------+----------------+
>     |       4|               3|
>     +--------+----------------+
> Got:
>     +----------------+----------------+
>     |count(alphabets)|count(alphabets)|
>     +----------------+----------------+
>     |               3|               3|
>     +----------------+----------------+
>      {code}






[jira] [Created] (SPARK-42100) Protect null `SQLExecutionUIData#description` in `SQLExecutionUIDataSerializer`

2023-01-16 Thread Yang Jie (Jira)
Yang Jie created SPARK-42100:


 Summary: Protect null `SQLExecutionUIData#description` in 
`SQLExecutionUIDataSerializer`
 Key: SPARK-42100
 URL: https://issues.apache.org/jira/browse/SPARK-42100
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Yang Jie









[jira] [Commented] (SPARK-41845) Fix `count(expr("*"))` function

2023-01-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677636#comment-17677636
 ] 

Apache Spark commented on SPARK-41845:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39622

> Fix `count(expr("*"))` function
> ---
>
> Key: SPARK-41845
> URL: https://issues.apache.org/jira/browse/SPARK-41845
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 801, in pyspark.sql.connect.functions.count
> Failed example:
>     df.select(count(expr("*")), count(df.alphabets)).show()
> Expected:
>     +--------+----------------+
>     |count(1)|count(alphabets)|
>     +--------+----------------+
>     |       4|               3|
>     +--------+----------------+
> Got:
>     +----------------+----------------+
>     |count(alphabets)|count(alphabets)|
>     +----------------+----------------+
>     |               3|               3|
>     +----------------+----------------+
>      {code}






[jira] [Commented] (SPARK-42099) Make `count(*)` work correctly

2023-01-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677635#comment-17677635
 ] 

Apache Spark commented on SPARK-42099:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39622

> Make `count(*)` work correctly
> --
>
> Key: SPARK-42099
> URL: https://issues.apache.org/jira/browse/SPARK-42099
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> cdf.select(CF.count("*"), CF.count(cdf.alphabets)).collect()
> {code:java}
> pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name 
> `*` cannot be resolved. Did you mean one of the following? [`alphabets`]
> Plan: 'Aggregate [unresolvedalias('count('*), None), count(alphabets#32) AS 
> count(alphabets)#35L]
> +- Project [alphabets#30 AS alphabets#32]
>+- LocalRelation [alphabets#30]
> {code}






[jira] [Assigned] (SPARK-42099) Make `count(*)` work correctly

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42099:


Assignee: (was: Apache Spark)

> Make `count(*)` work correctly
> --
>
> Key: SPARK-42099
> URL: https://issues.apache.org/jira/browse/SPARK-42099
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> cdf.select(CF.count("*"), CF.count(cdf.alphabets)).collect()
> {code:java}
> pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name 
> `*` cannot be resolved. Did you mean one of the following? [`alphabets`]
> Plan: 'Aggregate [unresolvedalias('count('*), None), count(alphabets#32) AS 
> count(alphabets)#35L]
> +- Project [alphabets#30 AS alphabets#32]
>+- LocalRelation [alphabets#30]
> {code}






[jira] [Assigned] (SPARK-42099) Make `count(*)` work correctly

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42099:


Assignee: Apache Spark

> Make `count(*)` work correctly
> --
>
> Key: SPARK-42099
> URL: https://issues.apache.org/jira/browse/SPARK-42099
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>
> cdf.select(CF.count("*"), CF.count(cdf.alphabets)).collect()
> {code:java}
> pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name 
> `*` cannot be resolved. Did you mean one of the following? [`alphabets`]
> Plan: 'Aggregate [unresolvedalias('count('*), None), count(alphabets#32) AS 
> count(alphabets)#35L]
> +- Project [alphabets#30 AS alphabets#32]
>+- LocalRelation [alphabets#30]
> {code}






[jira] [Resolved] (SPARK-42090) Introduce sasl retry count in RetryingBlockTransferor

2023-01-16 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan resolved SPARK-42090.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39611
[https://github.com/apache/spark/pull/39611]

> Introduce sasl retry count in RetryingBlockTransferor
> -
>
> Key: SPARK-42090
> URL: https://issues.apache.org/jira/browse/SPARK-42090
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.4.0
>
>
> Previously a boolean variable, saslTimeoutSeen, was used in 
> RetryingBlockTransferor. However, the boolean variable wouldn't cover the 
> following scenario:
> 1. SaslTimeoutException
> 2. IOException
> 3. SaslTimeoutException
> 4. IOException
> Even though IOException at #2 is retried (resulting in increment of 
> retryCount), the retryCount would be cleared at step #4.
> Since the intention of saslTimeoutSeen is to undo the increment due to 
> retrying SaslTimeoutException, we should keep a counter for 
> SaslTimeoutException retries and subtract the value of this counter from 
> retryCount.
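
A runnable toy model of the proposed bookkeeping (illustrative names, not the actual Java class): SASL-timeout retries are counted separately and subtracted from `retryCount`, instead of clearing `retryCount` outright.

{code:python}
# Toy model of the retry bookkeeping described above (not the real
# RetryingBlockTransferor, which is Java).
def replay(events, max_retries=3):
    retry_count = 0
    sasl_retry_count = 0
    for exc in events:
        if exc == "SaslTimeoutException":
            sasl_retry_count += 1
            retry_count += 1
        else:  # e.g. IOException
            # Undo only the increments that came from SASL timeouts,
            # keeping credit for genuine IOException retries.
            retry_count -= sasl_retry_count
            sasl_retry_count = 0
            retry_count += 1
        if retry_count > max_retries:
            return "give up"
    return f"retryCount={retry_count}"

# The four-step scenario from the description: both IOExceptions still count.
print(replay(["SaslTimeoutException", "IOException",
              "SaslTimeoutException", "IOException"]))  # retryCount=2
{code}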






[jira] [Assigned] (SPARK-42090) Introduce sasl retry count in RetryingBlockTransferor

2023-01-16 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan reassigned SPARK-42090:
---

Assignee: Ted Yu

> Introduce sasl retry count in RetryingBlockTransferor
> -
>
> Key: SPARK-42090
> URL: https://issues.apache.org/jira/browse/SPARK-42090
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
>
> Previously a boolean variable, saslTimeoutSeen, was used in 
> RetryingBlockTransferor. However, the boolean variable wouldn't cover the 
> following scenario:
> 1. SaslTimeoutException
> 2. IOException
> 3. SaslTimeoutException
> 4. IOException
> Even though IOException at #2 is retried (resulting in increment of 
> retryCount), the retryCount would be cleared at step #4.
> Since the intention of saslTimeoutSeen is to undo the increment due to 
> retrying SaslTimeoutException, we should keep a counter for 
> SaslTimeoutException retries and subtract the value of this counter from 
> retryCount.






[jira] [Created] (SPARK-42099) Make `count(*)` work correctly

2023-01-16 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-42099:
-

 Summary: Make `count(*)` work correctly
 Key: SPARK-42099
 URL: https://issues.apache.org/jira/browse/SPARK-42099
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng


cdf.select(CF.count("*"), CF.count(cdf.alphabets)).collect()


{code:java}
pyspark.sql.connect.client.SparkConnectAnalysisException: 
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name 
`*` cannot be resolved. Did you mean one of the following? [`alphabets`]
Plan: 'Aggregate [unresolvedalias('count('*), None), count(alphabets#32) AS 
count(alphabets)#35L]
+- Project [alphabets#30 AS alphabets#32]
   +- LocalRelation [alphabets#30]

{code}
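
A hedged workaround sketch until `*` is handled: count rows via a literal, which resolves without star expansion (assumes the same Connect session, `cdf`, and `CF` alias as in the report):

{code:python}
from pyspark.sql.connect import functions as CF

# count(lit(1)) is what count(*) desugars to in SQL, and it resolves
# without star expansion. `cdf` is the Connect DataFrame from above.
cdf.select(CF.count(CF.lit(1)), CF.count(cdf.alphabets)).collect()
{code}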







[jira] [Resolved] (SPARK-42097) Register SerializedLambda and BitSet to KryoSerializer

2023-01-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42097.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39621
[https://github.com/apache/spark/pull/39621]

> Register SerializedLambda and BitSet to KryoSerializer
> --
>
> Key: SPARK-42097
> URL: https://issues.apache.org/jira/browse/SPARK-42097
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.4.0
>
>
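
The issue body is empty. As a hedged illustration only, an application on a version without this fix could register such classes itself through the standard Kryo conf keys; the fully qualified class names below are assumptions inferred from the title:

{code:python}
from pyspark import SparkConf
from pyspark.sql import SparkSession

# spark.serializer and spark.kryo.classesToRegister are standard Spark conf
# keys; the class list is a guess from the title (java.util.BitSet is a
# stand-in -- the exact BitSet class is not stated here).
conf = (
    SparkConf()
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.kryo.classesToRegister",
         "java.lang.invoke.SerializedLambda,java.util.BitSet")
)
spark = SparkSession.builder.config(conf=conf).getOrCreate()
{code}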







[jira] [Assigned] (SPARK-42097) Register SerializedLambda and BitSet to KryoSerializer

2023-01-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42097:
-

Assignee: Dongjoon Hyun

> Register SerializedLambda and BitSet to KryoSerializer
> --
>
> Key: SPARK-42097
> URL: https://issues.apache.org/jira/browse/SPARK-42097
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>







[jira] [Created] (SPARK-42098) ResolveInlineTables should handle RuntimeReplaceable

2023-01-16 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-42098:
---

 Summary: ResolveInlineTables should handle RuntimeReplaceable
 Key: SPARK-42098
 URL: https://issues.apache.org/jira/browse/SPARK-42098
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.1
Reporter: Wenchen Fan


spark-sql> VALUES (try_divide(5, 0));
cannot evaluate expression try_divide(5, 0) in inline table definition; line 1 
pos 8
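
A hedged contrast sketch, assuming a running session: the same expression evaluates fine outside an inline table, which points at inline-table resolution (`ResolveInlineTables`) eagerly evaluating expressions before `RuntimeReplaceable` ones like `try_divide` are rewritten:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("SELECT try_divide(5, 0)").show()    # works, returns NULL
spark.sql("VALUES (try_divide(5, 0))").show()  # fails before this fix
{code}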






[jira] [Assigned] (SPARK-42097) Register SerializedLambda and BitSet to KryoSerializer

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42097:


Assignee: Apache Spark

> Register SerializedLambda and BitSet to KryoSerializer
> --
>
> Key: SPARK-42097
> URL: https://issues.apache.org/jira/browse/SPARK-42097
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Commented] (SPARK-42097) Register SerializedLambda and BitSet to KryoSerializer

2023-01-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677618#comment-17677618
 ] 

Apache Spark commented on SPARK-42097:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/39621

> Register SerializedLambda and BitSet to KryoSerializer
> --
>
> Key: SPARK-42097
> URL: https://issues.apache.org/jira/browse/SPARK-42097
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>







[jira] [Assigned] (SPARK-42097) Register SerializedLambda and BitSet to KryoSerializer

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42097:


Assignee: (was: Apache Spark)

> Register SerializedLambda and BitSet to KryoSerializer
> --
>
> Key: SPARK-42097
> URL: https://issues.apache.org/jira/browse/SPARK-42097
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>







[jira] [Assigned] (SPARK-41757) Compatibility of string representation in Column

2023-01-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41757:


Assignee: Hyukjin Kwon

> Compatibility of string representation in Column
> 
>
> Key: SPARK-41757
> URL: https://issues.apache.org/jira/browse/SPARK-41757
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Hyukjin Kwon
>Priority: Major
>
> Doctest in pyspark.sql.connect.column.Column fails with the error below:
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/column.py", 
> line 120, in pyspark.sql.connect.column.Column
> Failed example:
>     df.name
> Expected:
>     Column<'name'>
> Got:
>     Column<'ColumnReference(name)'>
> **
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/column.py", 
> line 122, in pyspark.sql.connect.column.Column
> Failed example:
>     df["name"]
> Expected:
>     Column<'name'>
> Got:
>     Column<'ColumnReference(name)'>
> **
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/column.py", 
> line 127, in pyspark.sql.connect.column.Column
> Failed example:
>     df.age + 1
> Expected:
>     Column<'(age + 1)'>
> Got:
>     Column<'+(ColumnReference(age), Literal(1))'>
> **
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/column.py", 
> line 129, in pyspark.sql.connect.column.Column
> Failed example:
>     1 / df.age
> Expected:
>     Column<'(1 / age)'>
> Got:
>     Column<'/(Literal(1), ColumnReference(age))'> {code}
>  
> We should re-enable this test after fixing the issue in Spark Connect






[jira] [Assigned] (SPARK-41901) Parity in String representation of Column

2023-01-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41901:


Assignee: Hyukjin Kwon

> Parity in String representation of Column
> -
>
> Key: SPARK-41901
> URL: https://issues.apache.org/jira/browse/SPARK-41901
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Hyukjin Kwon
>Priority: Major
>
> {code:java}
> from pyspark.sql import functions
> funs = [
> (functions.acosh, "ACOSH"),
> (functions.asinh, "ASINH"),
> (functions.atanh, "ATANH"),
> ]
> cols = ["a", functions.col("a")]
> for f, alias in funs:
> for c in cols:
> self.assertIn(f"{alias}(a)", repr(f(c))){code}
> {code:java}
>  Traceback (most recent call last):
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py",
>  line 271, in test_inverse_trig_functions
> self.assertIn(f"{alias}(a)", repr(f(c)))
> AssertionError: 'ACOSH(a)' not found in 
> "Column<'acosh(ColumnReference(a))'>"{code}
>  
>  
> {code:java}
> from pyspark.sql.functions import col, lit, overlay
> from itertools import chain
> import re
> actual = list(
> chain.from_iterable(
> [
> re.findall("(overlay\\(.*\\))", str(x))
> for x in [
> overlay(col("foo"), col("bar"), 1),
> overlay("x", "y", 3),
> overlay(col("x"), col("y"), 1, 3),
> overlay("x", "y", 2, 5),
> overlay("x", "y", lit(11)),
> overlay("x", "y", lit(2), lit(5)),
> ]
> ]
> )
> )
> expected = [
> "overlay(foo, bar, 1, -1)",
> "overlay(x, y, 3, -1)",
> "overlay(x, y, 1, 3)",
> "overlay(x, y, 2, 5)",
> "overlay(x, y, 11, -1)",
> "overlay(x, y, 2, 5)",
> ]
> self.assertListEqual(actual, expected)
> df = self.spark.createDataFrame([("SPARK_SQL", "CORE", 7, 0)], ("x", "y", 
> "pos", "len"))
> exp = [Row(ol="SPARK_CORESQL")]
> self.assertTrue(
> all(
> [
> df.select(overlay(df.x, df.y, 7, 0).alias("ol")).collect() == exp,
> df.select(overlay(df.x, df.y, lit(7), 
> lit(0)).alias("ol")).collect() == exp,
> df.select(overlay("x", "y", "pos", "len").alias("ol")).collect() 
> == exp,
> ]
> )
> ) {code}
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py",
>  line 675, in test_overlay
> self.assertListEqual(actual, expected)
> AssertionError: Lists differ: ['overlay(ColumnReference(foo), 
> ColumnReference(bar[402 chars]5))'] != ['overlay(foo, bar, 1, -1)', 
> 'overlay(x, y, 3, -1)'[90 chars] 5)']
> First differing element 0:
> 'overlay(ColumnReference(foo), ColumnReference(bar), Literal(1), Literal(-1))'
> 'overlay(foo, bar, 1, -1)'
> - ['overlay(ColumnReference(foo), ColumnReference(bar), Literal(1), 
> Literal(-1))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(3), Literal(-1))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(1), Literal(3))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(2), Literal(5))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(11), 
> Literal(-1))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(2), Literal(5))']
> + ['overlay(foo, bar, 1, -1)',
> +  'overlay(x, y, 3, -1)',
> +  'overlay(x, y, 1, 3)',
> +  'overlay(x, y, 2, 5)',
> +  'overlay(x, y, 11, -1)',
> +  'overlay(x, y, 2, 5)']
>  {code}






[jira] [Resolved] (SPARK-41901) Parity in String representation of Column

2023-01-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41901.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39616
[https://github.com/apache/spark/pull/39616]

> Parity in String representation of Column
> -
>
> Key: SPARK-41901
> URL: https://issues.apache.org/jira/browse/SPARK-41901
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> from pyspark.sql import functions
> funs = [
> (functions.acosh, "ACOSH"),
> (functions.asinh, "ASINH"),
> (functions.atanh, "ATANH"),
> ]
> cols = ["a", functions.col("a")]
> for f, alias in funs:
> for c in cols:
> self.assertIn(f"{alias}(a)", repr(f(c))){code}
> {code:java}
>  Traceback (most recent call last):
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py",
>  line 271, in test_inverse_trig_functions
> self.assertIn(f"{alias}(a)", repr(f(c)))
> AssertionError: 'ACOSH(a)' not found in 
> "Column<'acosh(ColumnReference(a))'>"{code}
>  
>  
> {code:java}
> from pyspark.sql.functions import col, lit, overlay
> from itertools import chain
> import re
> actual = list(
> chain.from_iterable(
> [
> re.findall("(overlay\\(.*\\))", str(x))
> for x in [
> overlay(col("foo"), col("bar"), 1),
> overlay("x", "y", 3),
> overlay(col("x"), col("y"), 1, 3),
> overlay("x", "y", 2, 5),
> overlay("x", "y", lit(11)),
> overlay("x", "y", lit(2), lit(5)),
> ]
> ]
> )
> )
> expected = [
> "overlay(foo, bar, 1, -1)",
> "overlay(x, y, 3, -1)",
> "overlay(x, y, 1, 3)",
> "overlay(x, y, 2, 5)",
> "overlay(x, y, 11, -1)",
> "overlay(x, y, 2, 5)",
> ]
> self.assertListEqual(actual, expected)
> df = self.spark.createDataFrame([("SPARK_SQL", "CORE", 7, 0)], ("x", "y", 
> "pos", "len"))
> exp = [Row(ol="SPARK_CORESQL")]
> self.assertTrue(
> all(
> [
> df.select(overlay(df.x, df.y, 7, 0).alias("ol")).collect() == exp,
> df.select(overlay(df.x, df.y, lit(7), 
> lit(0)).alias("ol")).collect() == exp,
> df.select(overlay("x", "y", "pos", "len").alias("ol")).collect() 
> == exp,
> ]
> )
> ) {code}
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py",
>  line 675, in test_overlay
> self.assertListEqual(actual, expected)
> AssertionError: Lists differ: ['overlay(ColumnReference(foo), 
> ColumnReference(bar[402 chars]5))'] != ['overlay(foo, bar, 1, -1)', 
> 'overlay(x, y, 3, -1)'[90 chars] 5)']
> First differing element 0:
> 'overlay(ColumnReference(foo), ColumnReference(bar), Literal(1), Literal(-1))'
> 'overlay(foo, bar, 1, -1)'
> - ['overlay(ColumnReference(foo), ColumnReference(bar), Literal(1), 
> Literal(-1))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(3), Literal(-1))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(1), Literal(3))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(2), Literal(5))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(11), 
> Literal(-1))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(2), Literal(5))']
> + ['overlay(foo, bar, 1, -1)',
> +  'overlay(x, y, 3, -1)',
> +  'overlay(x, y, 1, 3)',
> +  'overlay(x, y, 2, 5)',
> +  'overlay(x, y, 11, -1)',
> +  'overlay(x, y, 2, 5)']
>  {code}






[jira] [Resolved] (SPARK-41757) Compatibility of string representation in Column

2023-01-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41757.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39616
[https://github.com/apache/spark/pull/39616]

> Compatibility of string representation in Column
> 
>
> Key: SPARK-41757
> URL: https://issues.apache.org/jira/browse/SPARK-41757
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> Doctest in pyspark.sql.connect.column.Column fails with the error below:
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/column.py", 
> line 120, in pyspark.sql.connect.column.Column
> Failed example:
>     df.name
> Expected:
>     Column<'name'>
> Got:
>     Column<'ColumnReference(name)'>
> **
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/column.py", 
> line 122, in pyspark.sql.connect.column.Column
> Failed example:
>     df["name"]
> Expected:
>     Column<'name'>
> Got:
>     Column<'ColumnReference(name)'>
> **
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/column.py", 
> line 127, in pyspark.sql.connect.column.Column
> Failed example:
>     df.age + 1
> Expected:
>     Column<'(age + 1)'>
> Got:
>     Column<'+(ColumnReference(age), Literal(1))'>
> **
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/column.py", 
> line 129, in pyspark.sql.connect.column.Column
> Failed example:
>     1 / df.age
> Expected:
>     Column<'(1 / age)'>
> Got:
>     Column<'/(Literal(1), ColumnReference(age))'> {code}
>  
> We should re-enable this test after fixing the issue in Spark Connect






[jira] [Updated] (SPARK-41775) Implement training functions as input

2023-01-16 Thread Rithwik Ediga Lakhamsani (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rithwik Ediga Lakhamsani updated SPARK-41775:
-
Description: 
Sidenote: make formatting updates described in 
https://github.com/apache/spark/pull/39188

 

Currently, `Distributor().run(...)` takes only files as input. We will add 
functionality to take in functions as well. This requires the following process 
on each task on the executor nodes:
1. Take the input function and args and pickle them.
2. Create a temp train.py file that looks like:
{code:java}
import cloudpickle
import os

if __name__ == "__main__":
    # Reload the pickled (function, args) pair written by the driver.
    with open(f"{tempdir}/train_input.pkl", "rb") as f:
        train, args = cloudpickle.load(f)
    output = train(*args)
    if output and os.environ.get("RANK", "") == "0":  # this is for partitionId == 0
        # Persist the result so the driver can collect it from partition 0.
        with open(f"{tempdir}/train_output.pkl", "wb") as f:
            cloudpickle.dump(output, f) {code}
3. Run that train.py file with `torchrun`

4. Check if `train_output.pkl` has been created on the process with partitionId == 0; 
if it has, deserialize it and return that output through `.collect()`
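
For context, a hedged sketch of the driver side of step 1 (the helper name and file layout are illustrative, not the actual Distributor internals):

{code:python}
import os
import tempfile

import cloudpickle

def write_train_input(train, args, tempdir):
    # Step 1: pickle the user's function together with its args so the
    # generated train.py (launched by torchrun) can reload and call them.
    with open(os.path.join(tempdir, "train_input.pkl"), "wb") as f:
        cloudpickle.dump((train, args), f)

tempdir = tempfile.mkdtemp()
write_train_input(lambda base_lr: base_lr * 2, (0.05,), tempdir)
{code}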

  was:
Currently, `Distributor().run(...)` takes only files as input. We will add 
functionality to take in functions as well. This requires the following process 
on each task on the executor nodes:
1. Take the input function and args and pickle them.
2. Create a temp train.py file that looks like:
{code:java}
import cloudpickle
import os

if __name__ == "__main__":
    # Reload the pickled (function, args) pair written by the driver.
    with open(f"{tempdir}/train_input.pkl", "rb") as f:
        train, args = cloudpickle.load(f)
    output = train(*args)
    if output and os.environ.get("RANK", "") == "0":  # this is for partitionId == 0
        # Persist the result so the driver can collect it from partition 0.
        with open(f"{tempdir}/train_output.pkl", "wb") as f:
            cloudpickle.dump(output, f) {code}
3. Run that train.py file with `torchrun`

4. Check if `train_output.pkl` has been created on the process with partitionId == 0; 
if it has, deserialize it and return that output through `.collect()`


> Implement training functions as input
> -
>
> Key: SPARK-41775
> URL: https://issues.apache.org/jira/browse/SPARK-41775
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Priority: Major
>
> Sidenote: make formatting updates described in 
> https://github.com/apache/spark/pull/39188
>  
> Currently, `Distributor().run(...)` takes only files as input. We will add 
> functionality to take in functions as well. This requires the following 
> process on each task on the executor nodes:
> 1. Take the input function and args and pickle them.
> 2. Create a temp train.py file that looks like:
> {code:java}
> import cloudpickle
> import os
>
> if __name__ == "__main__":
>     # Reload the pickled (function, args) pair written by the driver.
>     with open(f"{tempdir}/train_input.pkl", "rb") as f:
>         train, args = cloudpickle.load(f)
>     output = train(*args)
>     if output and os.environ.get("RANK", "") == "0":  # partitionId == 0
>         # Persist the result so the driver can collect it from partition 0.
>         with open(f"{tempdir}/train_output.pkl", "wb") as f:
>             cloudpickle.dump(output, f) {code}
> 3. Run that train.py file with `torchrun`
> 4. Check if `train_output.pkl` has been created on the process with partitionId 
> == 0; if it has, deserialize it and return that output through `.collect()`






[jira] [Commented] (SPARK-42066) The DATATYPE_MISMATCH error class contains inappropriate and duplicating subclasses

2023-01-16 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677608#comment-17677608
 ] 

Haejoon Lee commented on SPARK-42066:
-

Let me take a look

> The DATATYPE_MISMATCH error class contains inappropriate and duplicating 
> subclasses
> ---
>
> Key: SPARK-42066
> URL: https://issues.apache.org/jira/browse/SPARK-42066
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Major
>
> The subclass WRONG_NUM_ARGS (with suggestions) semantically does not belong in 
> DATATYPE_MISMATCH, and there is already a top-level error class with that same 
> name. We should review the subclasses of this error class, which seems to have 
> become a bit of a dumping ground...






[jira] [Commented] (SPARK-41845) Fix `count(expr("*"))` function

2023-01-16 Thread Ruifeng Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677605#comment-17677605
 ] 

Ruifeng Zheng commented on SPARK-41845:
---

I will take a look

> Fix `count(expr("*"))` function
> ---
>
> Key: SPARK-41845
> URL: https://issues.apache.org/jira/browse/SPARK-41845
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 801, in pyspark.sql.connect.functions.count
> Failed example:
>     df.select(count(expr("*")), count(df.alphabets)).show()
> Expected:
>     +--------+----------------+
>     |count(1)|count(alphabets)|
>     +--------+----------------+
>     |       4|               3|
>     +--------+----------------+
> Got:
>     +----------------+----------------+
>     |count(alphabets)|count(alphabets)|
>     +----------------+----------------+
>     |               3|               3|
>     +----------------+----------------+
>      {code}






[jira] [Created] (SPARK-42097) Register SerializedLambda and BitSet to KryoSerializer

2023-01-16 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-42097:
-

 Summary: Register SerializedLambda and BitSet to KryoSerializer
 Key: SPARK-42097
 URL: https://issues.apache.org/jira/browse/SPARK-42097
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Dongjoon Hyun









[jira] [Assigned] (SPARK-42096) Code cleanup for connect module

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42096:


Assignee: Apache Spark

> Code cleanup for connect module
> ---
>
> Key: SPARK-42096
> URL: https://issues.apache.org/jira/browse/SPARK-42096
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Trivial
>
> For example, weaken the access scope of functions that are currently only 
> used inside their class from public to private.
>  






[jira] [Assigned] (SPARK-42096) Code cleanup for connect module

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42096:


Assignee: (was: Apache Spark)

> Code cleanup for connect module
> ---
>
> Key: SPARK-42096
> URL: https://issues.apache.org/jira/browse/SPARK-42096
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Trivial
>
> For example, weaken the access scope of functions that are currently only 
> used inside their class from public to private.
>  






[jira] [Commented] (SPARK-42096) Code cleanup for connect module

2023-01-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677602#comment-17677602
 ] 

Apache Spark commented on SPARK-42096:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39620

> Code cleanup for connect module
> ---
>
> Key: SPARK-42096
> URL: https://issues.apache.org/jira/browse/SPARK-42096
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Trivial
>
> For example, weaken the access scope of functions that are currently only 
> used inside their class from public to private.
>  






[jira] [Assigned] (SPARK-42089) Different result in nested lambda function

2023-01-16 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-42089:
-

Assignee: Ruifeng Zheng

> Different result in nested lambda function
> --
>
> Key: SPARK-42089
> URL: https://issues.apache.org/jira/browse/SPARK-42089
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>
> test_nested_higher_order_function
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/tests/test_functions.py", 
> line 814, in test_nested_higher_order_function
> self.assertEquals(actual, expected)
> AssertionError: Lists differ: [Row(n='a', l='a'), Row(n='b', l='b'), Row[124 
> chars]'c')] != [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ([43 chars]'c')]
> First differing element 0:
> Row(n='a', l='a')
> (1, 'a')
> - [Row(n='a', l='a'),
> -  Row(n='b', l='b'),
> -  Row(n='c', l='c'),
> -  Row(n='a', l='a'),
> -  Row(n='b', l='b'),
> -  Row(n='c', l='c'),
> -  Row(n='a', l='a'),
> -  Row(n='b', l='b'),
> -  Row(n='c', l='c')]
> + [(1, 'a'),
> +  (1, 'b'),
> +  (1, 'c'),
> +  (2, 'a'),
> +  (2, 'b'),
> +  (2, 'c'),
> +  (3, 'a'),
> +  (3, 'b'),
> +  (3, 'c')]
> {code}






[jira] [Resolved] (SPARK-42089) Different result in nested lambda function

2023-01-16 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-42089.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39619
[https://github.com/apache/spark/pull/39619]

> Different result in nested lambda function
> --
>
> Key: SPARK-42089
> URL: https://issues.apache.org/jira/browse/SPARK-42089
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>
> test_nested_higher_order_function
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/tests/test_functions.py", 
> line 814, in test_nested_higher_order_function
> self.assertEquals(actual, expected)
> AssertionError: Lists differ: [Row(n='a', l='a'), Row(n='b', l='b'), Row[124 
> chars]'c')] != [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ([43 chars]'c')]
> First differing element 0:
> Row(n='a', l='a')
> (1, 'a')
> - [Row(n='a', l='a'),
> -  Row(n='b', l='b'),
> -  Row(n='c', l='c'),
> -  Row(n='a', l='a'),
> -  Row(n='b', l='b'),
> -  Row(n='c', l='c'),
> -  Row(n='a', l='a'),
> -  Row(n='b', l='b'),
> -  Row(n='c', l='c')]
> + [(1, 'a'),
> +  (1, 'b'),
> +  (1, 'c'),
> +  (2, 'a'),
> +  (2, 'b'),
> +  (2, 'c'),
> +  (3, 'a'),
> +  (3, 'b'),
> +  (3, 'c')]
> {code}






[jira] [Created] (SPARK-42096) Code cleanup for connect module

2023-01-16 Thread Yang Jie (Jira)
Yang Jie created SPARK-42096:


 Summary: Code cleanup for connect module
 Key: SPARK-42096
 URL: https://issues.apache.org/jira/browse/SPARK-42096
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Yang Jie


For example, weaken the access scope of functions that are currently only used 
inside their class from public to private.

 






[jira] [Resolved] (SPARK-41982) When the inserted partition type is of string type, similar `dt=01` will be converted to `dt=1`

2023-01-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-41982.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39558
[https://github.com/apache/spark/pull/39558]

> When the inserted partition type is of string type, similar `dt=01` will be 
> converted to `dt=1`
> ---
>
> Key: SPARK-41982
> URL: https://issues.apache.org/jira/browse/SPARK-41982
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: jingxiong zhong
>Priority: Critical
> Fix For: 3.4.0
>
>
> While upgrading from Spark 2.4 to Spark 3.2, we carefully read the migration 
> document and found a case that is not covered:
> {code:java}
> create table if not exists test_90(a string, b string) partitioned by (dt 
> string);
> desc formatted test_90;
> // case1
> insert into table test_90 partition (dt=05) values("1","2");
> // case2
> insert into table test_90 partition (dt='05') values("1","2");
> drop table test_90;{code}
> In Spark 2.4.3, it generates the following path:
> {code:java}
> // the path
> hdfs://test5/user/hive/db1/test_90/dt=05 
> //result
> spark-sql> select * from test_90;
> 1       2       05
> 1       2       05
> Time taken: 1.316 seconds, Fetched 2 row(s)
> spark-sql> show partitions test_90; 
> dt=05 
> Time taken: 0.201 seconds, Fetched 1 row(s)
> spark-sql> select * from test_90 where dt='05';
> 1       2       05
> 1       2       05
> Time taken: 0.212 seconds, Fetched 2 row(s)
> spark-sql> explain insert into table test_90 partition (dt=05) 
> values("1","2");
> == Physical Plan ==
> Execute InsertIntoHiveTable InsertIntoHiveTable `db1`.`test_90`, 
> org.apache.hadoop.hive.ql.io.orc.OrcSerde, Map(dt -> Some(05)), false, false, 
> [a, b]
> +- LocalTableScan [a#116, b#117]
> Time taken: 1.145 seconds, Fetched 1 row(s){code}
> In Spark 3.2.0, it generates two paths:
> {code:java}
> // the path
> hdfs://test5/user/hive/db1/test_90/dt=05 
> hdfs://test5/user/hive/db1/test_90/dt=5 
> // result
> spark-sql> select * from test_90;
> 1       2       05
> 1       2       5
> Time taken: 2.119 seconds, Fetched 2 row(s)
> spark-sql> show partitions test_90;
> dt=05
> dt=5
> Time taken: 0.161 seconds, Fetched 2 row(s)
> spark-sql> select * from test_90 where dt='05';
> 1       2       05
> Time taken: 0.252 seconds, Fetched 1 row(s)
> spark-sql> explain insert into table test_90 partition (dt=05) 
> values("1","2");
> plan
> == Physical Plan ==
> Execute InsertIntoHiveTable `db1`.`test_90`, 
> org.apache.hadoop.hive.ql.io.orc.OrcSerde, [dt=Some(5)], false, false, [a, b]
> +- LocalTableScan [a#109, b#110]{code}
> This will cause problems reading data after the user switches to Spark 3. The 
> root cause is that during partition field resolution, Spark 3 forcibly coerces 
> the string-typed partition value, which causes partition `05` to lose its 
> leading `0`.
> So I think we have two solutions: one is to record the risk clearly in the 
> migration document, and the other is to fix this case, because we internally 
> keep string-typed partitions as strings regardless of whether single or double 
> quotation marks are added.
>  
>  






[jira] [Assigned] (SPARK-41982) When the inserted partition type is of string type, similar `dt=01` will be converted to `dt=1`

2023-01-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-41982:
---

Assignee: jingxiong zhong

> When the inserted partition type is of string type, similar `dt=01` will be 
> converted to `dt=1`
> ---
>
> Key: SPARK-41982
> URL: https://issues.apache.org/jira/browse/SPARK-41982
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: jingxiong zhong
>Assignee: jingxiong zhong
>Priority: Critical
> Fix For: 3.4.0
>
>
> While upgrading from Spark 2.4 to Spark 3.2, we carefully read the migration 
> document and found a case that is not covered:
> {code:java}
> create table if not exists test_90(a string, b string) partitioned by (dt 
> string);
> desc formatted test_90;
> // case1
> insert into table test_90 partition (dt=05) values("1","2");
> // case2
> insert into table test_90 partition (dt='05') values("1","2");
> drop table test_90;{code}
> In Spark 2.4.3, it generates the following path:
> {code:java}
> // the path
> hdfs://test5/user/hive/db1/test_90/dt=05 
> //result
> spark-sql> select * from test_90;
> 1       2       05
> 1       2       05
> Time taken: 1.316 seconds, Fetched 2 row(s)
> spark-sql> show partitions test_90; 
> dt=05 
> Time taken: 0.201 seconds, Fetched 1 row(s)
> spark-sql> select * from test_90 where dt='05';
> 1       2       05
> 1       2       05
> Time taken: 0.212 seconds, Fetched 2 row(s)
> spark-sql> explain insert into table test_90 partition (dt=05) 
> values("1","2");
> == Physical Plan ==
> Execute InsertIntoHiveTable InsertIntoHiveTable `db1`.`test_90`, 
> org.apache.hadoop.hive.ql.io.orc.OrcSerde, Map(dt -> Some(05)), false, false, 
> [a, b]
> +- LocalTableScan [a#116, b#117]
> Time taken: 1.145 seconds, Fetched 1 row(s){code}
> In Spark 3.2.0, it generates two paths:
> {code:java}
> // the path
> hdfs://test5/user/hive/db1/test_90/dt=05 
> hdfs://test5/user/hive/db1/test_90/dt=5 
> // result
> spark-sql> select * from test_90;
> 1       2       05
> 1       2       5
> Time taken: 2.119 seconds, Fetched 2 row(s)
> spark-sql> show partitions test_90;
> dt=05
> dt=5
> Time taken: 0.161 seconds, Fetched 2 row(s)
> spark-sql> select * from test_90 where dt='05';
> 1       2       05
> Time taken: 0.252 seconds, Fetched 1 row(s)
> spark-sql> explain insert into table test_90 partition (dt=05) 
> values("1","2");
> plan
> == Physical Plan ==
> Execute InsertIntoHiveTable `db1`.`test_90`, 
> org.apache.hadoop.hive.ql.io.orc.OrcSerde, [dt=Some(5)], false, false, [a, b]
> +- LocalTableScan [a#109, b#110]{code}
> This will cause problems reading data after the user switches to Spark 3. The 
> root cause is that during partition field resolution, Spark 3 forcibly coerces 
> the string-typed partition value, which causes partition `05` to lose its 
> leading `0`.
> So I think we have two solutions: one is to record the risk clearly in the 
> migration document, and the other is to fix this case, because we internally 
> keep string-typed partitions as strings regardless of whether single or double 
> quotation marks are added.
>  
>  






[jira] [Assigned] (SPARK-41866) Make `createDataFrame` support array

2023-01-16 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-41866:
-

Assignee: Hyukjin Kwon

> Make `createDataFrame` support array
> 
>
> Key: SPARK-41866
> URL: https://issues.apache.org/jira/browse/SPARK-41866
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> import array
> data = [Row(longarray=array.array("l", [-9223372036854775808, 0, 
> 9223372036854775807]))]
> df = self.spark.createDataFrame(data) {code}
> Error:
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py",
>  line 1220, in test_create_dataframe_from_array_of_long
>     df = self.spark.createDataFrame(data)
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", 
> line 260, in createDataFrame
>     table = pa.Table.from_pylist([row.asDict(recursive=True) for row in 
> _data])
>   File "pyarrow/table.pxi", line 3700, in pyarrow.lib.Table.from_pylist
>   File "pyarrow/table.pxi", line 5221, in pyarrow.lib._from_pylist
>   File "pyarrow/table.pxi", line 3575, in pyarrow.lib.Table.from_arrays
>   File "pyarrow/table.pxi", line 1383, in pyarrow.lib._sanitize_arrays
>   File "pyarrow/table.pxi", line 1364, in pyarrow.lib._schema_from_arrays
>   File "pyarrow/array.pxi", line 320, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 39, in pyarrow.lib._sequence_to_array
>   File "pyarrow/error.pxi", line 144, in 
> pyarrow.lib.pyarrow_internal_check_status
>   File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Could not convert array('l', [-9223372036854775808, 
> 0, 9223372036854775807]) with type array.array: did not recognize Python 
> value type when inferring an Arrow data type{code}






[jira] [Resolved] (SPARK-41866) Make `createDataFrame` support array

2023-01-16 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-41866.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39617
[https://github.com/apache/spark/pull/39617]

> Make `createDataFrame` support array
> 
>
> Key: SPARK-41866
> URL: https://issues.apache.org/jira/browse/SPARK-41866
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> import array
> data = [Row(longarray=array.array("l", [-9223372036854775808, 0, 
> 9223372036854775807]))]
> df = self.spark.createDataFrame(data) {code}
> Error:
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py",
>  line 1220, in test_create_dataframe_from_array_of_long
>     df = self.spark.createDataFrame(data)
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", 
> line 260, in createDataFrame
>     table = pa.Table.from_pylist([row.asDict(recursive=True) for row in 
> _data])
>   File "pyarrow/table.pxi", line 3700, in pyarrow.lib.Table.from_pylist
>   File "pyarrow/table.pxi", line 5221, in pyarrow.lib._from_pylist
>   File "pyarrow/table.pxi", line 3575, in pyarrow.lib.Table.from_arrays
>   File "pyarrow/table.pxi", line 1383, in pyarrow.lib._sanitize_arrays
>   File "pyarrow/table.pxi", line 1364, in pyarrow.lib._schema_from_arrays
>   File "pyarrow/array.pxi", line 320, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 39, in pyarrow.lib._sequence_to_array
>   File "pyarrow/error.pxi", line 144, in 
> pyarrow.lib.pyarrow_internal_check_status
>   File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Could not convert array('l', [-9223372036854775808, 
> 0, 9223372036854775807]) with type array.array: did not recognize Python 
> value type when inferring an Arrow data type{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42072) `core` module requires `javax.servlet-api`

2023-01-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42072.
---
Resolution: Cannot Reproduce

I verified on a clean Apple Silicon machine that the master branch works 
fine. I'm closing this issue for now.

> `core` module requires `javax.servlet-api`
> --
>
> Key: SPARK-42072
> URL: https://issues.apache.org/jira/browse/SPARK-42072
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42095) Fix gRPC check in tests

2023-01-16 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-42095:
-

Assignee: Xinrong Meng

> Fix gRPC check in tests
> ---
>
> Key: SPARK-42095
> URL: https://issues.apache.org/jira/browse/SPARK-42095
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> Fix gRPC check in tests, including variables and error messages.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42095) Fix gRPC check in tests

2023-01-16 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-42095.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39618
[https://github.com/apache/spark/pull/39618]

> Fix gRPC check in tests
> ---
>
> Key: SPARK-42095
> URL: https://issues.apache.org/jira/browse/SPARK-42095
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.4.0
>
>
> Fix gRPC check in tests, including variables and error messages.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42089) Different result in nested lambda function

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42089:


Assignee: Apache Spark

> Different result in nested lambda function
> --
>
> Key: SPARK-42089
> URL: https://issues.apache.org/jira/browse/SPARK-42089
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>
> test_nested_higher_order_function
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/tests/test_functions.py", 
> line 814, in test_nested_higher_order_function
> self.assertEquals(actual, expected)
> AssertionError: Lists differ: [Row(n='a', l='a'), Row(n='b', l='b'), Row[124 
> chars]'c')] != [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ([43 chars]'c')]
> First differing element 0:
> Row(n='a', l='a')
> (1, 'a')
> - [Row(n='a', l='a'),
> -  Row(n='b', l='b'),
> -  Row(n='c', l='c'),
> -  Row(n='a', l='a'),
> -  Row(n='b', l='b'),
> -  Row(n='c', l='c'),
> -  Row(n='a', l='a'),
> -  Row(n='b', l='b'),
> -  Row(n='c', l='c')]
> + [(1, 'a'),
> +  (1, 'b'),
> +  (1, 'c'),
> +  (2, 'a'),
> +  (2, 'b'),
> +  (2, 'c'),
> +  (3, 'a'),
> +  (3, 'b'),
> +  (3, 'c')]
> {code}
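> For context, a condensed sketch of the nested transform that triggers this 
> (adapted from the test code quoted in SPARK-41902; assumes an active 
> `spark` session):
> {code:python}
> from pyspark.sql.functions import flatten, struct, transform
> 
> df = spark.sql("SELECT array(1, 2, 3) as numbers, array('a', 'b', 'c') as letters")
> # The inner lambda must capture the outer lambda's argument `number`; the
> # mismatch above suggests that capture is resolved differently in Connect.
> actual = df.select(
>     flatten(
>         transform(
>             "numbers",
>             lambda number: transform(
>                 "letters", lambda letter: struct(number.alias("n"), letter.alias("l"))
>             ),
>         )
>     )
> ).first()[0]
> {code}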



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42089) Different result in nested lambda function

2023-01-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677591#comment-17677591
 ] 

Apache Spark commented on SPARK-42089:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39619

> Different result in nested lambda function
> --
>
> Key: SPARK-42089
> URL: https://issues.apache.org/jira/browse/SPARK-42089
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> test_nested_higher_order_function
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/tests/test_functions.py", 
> line 814, in test_nested_higher_order_function
> self.assertEquals(actual, expected)
> AssertionError: Lists differ: [Row(n='a', l='a'), Row(n='b', l='b'), Row[124 
> chars]'c')] != [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ([43 chars]'c')]
> First differing element 0:
> Row(n='a', l='a')
> (1, 'a')
> - [Row(n='a', l='a'),
> -  Row(n='b', l='b'),
> -  Row(n='c', l='c'),
> -  Row(n='a', l='a'),
> -  Row(n='b', l='b'),
> -  Row(n='c', l='c'),
> -  Row(n='a', l='a'),
> -  Row(n='b', l='b'),
> -  Row(n='c', l='c')]
> + [(1, 'a'),
> +  (1, 'b'),
> +  (1, 'c'),
> +  (2, 'a'),
> +  (2, 'b'),
> +  (2, 'c'),
> +  (3, 'a'),
> +  (3, 'b'),
> +  (3, 'c')]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42089) Different result in nested lambda function

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42089:


Assignee: (was: Apache Spark)

> Different result in nested lambda function
> --
>
> Key: SPARK-42089
> URL: https://issues.apache.org/jira/browse/SPARK-42089
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> test_nested_higher_order_function
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/tests/test_functions.py", 
> line 814, in test_nested_higher_order_function
> self.assertEquals(actual, expected)
> AssertionError: Lists differ: [Row(n='a', l='a'), Row(n='b', l='b'), Row[124 
> chars]'c')] != [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ([43 chars]'c')]
> First differing element 0:
> Row(n='a', l='a')
> (1, 'a')
> - [Row(n='a', l='a'),
> -  Row(n='b', l='b'),
> -  Row(n='c', l='c'),
> -  Row(n='a', l='a'),
> -  Row(n='b', l='b'),
> -  Row(n='c', l='c'),
> -  Row(n='a', l='a'),
> -  Row(n='b', l='b'),
> -  Row(n='c', l='c')]
> + [(1, 'a'),
> +  (1, 'b'),
> +  (1, 'c'),
> +  (2, 'a'),
> +  (2, 'b'),
> +  (2, 'c'),
> +  (3, 'a'),
> +  (3, 'b'),
> +  (3, 'c')]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42058) Harden SQLSTATE usage for error classes (2)

2023-01-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-42058.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Harden SQLSTATE usage for error classes (2)
> ---
>
> Key: SPARK-42058
> URL: https://issues.apache.org/jira/browse/SPARK-42058
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Assignee: Serge Rielau
>Priority: Major
> Fix For: 3.4.0
>
>
> Error classes are great, but for JDBC, ODBC, and similar interfaces, the 
> SQLSTATEs of the SQL standard reign.
> We have started adding SQLSTATEs, but have not really paid attention to 
> their correctness.
> Follow up to: https://issues.apache.org/jira/browse/SPARK-41994



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42058) Harden SQLSTATE usage for error classes (2)

2023-01-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-42058:
---

Assignee: Serge Rielau

> Harden SQLSTATE usage for error classes (2)
> ---
>
> Key: SPARK-42058
> URL: https://issues.apache.org/jira/browse/SPARK-42058
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Assignee: Serge Rielau
>Priority: Major
>
> Error classes are great, but for JDBC, ODBC, and similar interfaces, the 
> SQLSTATEs of the SQL standard reign.
> We have started adding SQLSTATEs, but have not really paid attention to 
> their correctness.
> Follow up to: https://issues.apache.org/jira/browse/SPARK-41994



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42089) Different result in nested lambda function

2023-01-16 Thread Ruifeng Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677579#comment-17677579
 ] 

Ruifeng Zheng commented on SPARK-42089:
---

I am working on this one

> Different result in nested lambda function
> --
>
> Key: SPARK-42089
> URL: https://issues.apache.org/jira/browse/SPARK-42089
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> test_nested_higher_order_function
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/tests/test_functions.py", 
> line 814, in test_nested_higher_order_function
> self.assertEquals(actual, expected)
> AssertionError: Lists differ: [Row(n='a', l='a'), Row(n='b', l='b'), Row[124 
> chars]'c')] != [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ([43 chars]'c')]
> First differing element 0:
> Row(n='a', l='a')
> (1, 'a')
> - [Row(n='a', l='a'),
> -  Row(n='b', l='b'),
> -  Row(n='c', l='c'),
> -  Row(n='a', l='a'),
> -  Row(n='b', l='b'),
> -  Row(n='c', l='c'),
> -  Row(n='a', l='a'),
> -  Row(n='b', l='b'),
> -  Row(n='c', l='c')]
> + [(1, 'a'),
> +  (1, 'b'),
> +  (1, 'c'),
> +  (2, 'a'),
> +  (2, 'b'),
> +  (2, 'c'),
> +  (3, 'a'),
> +  (3, 'b'),
> +  (3, 'c')]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42095) Fix gRPC check in tests

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42095:


Assignee: Apache Spark

> Fix gRPC check in tests
> ---
>
> Key: SPARK-42095
> URL: https://issues.apache.org/jira/browse/SPARK-42095
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> Fix gRPC check in tests, including variables and error messages.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42095) Fix gRPC check in tests

2023-01-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677570#comment-17677570
 ] 

Apache Spark commented on SPARK-42095:
--

User 'xinrong-meng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39618

> Fix gRPC check in tests
> ---
>
> Key: SPARK-42095
> URL: https://issues.apache.org/jira/browse/SPARK-42095
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Fix gRPC check in tests, including variables and error messages.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42095) Fix gRPC check in tests

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42095:


Assignee: (was: Apache Spark)

> Fix gRPC check in tests
> ---
>
> Key: SPARK-42095
> URL: https://issues.apache.org/jira/browse/SPARK-42095
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Fix gRPC check in tests, including variables and error messages.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41901) Parity in String representation of Column

2023-01-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677569#comment-17677569
 ] 

Apache Spark commented on SPARK-41901:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39616

> Parity in String representation of Column
> -
>
> Key: SPARK-41901
> URL: https://issues.apache.org/jira/browse/SPARK-41901
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> from pyspark.sql import functions
> funs = [
>     (functions.acosh, "ACOSH"),
>     (functions.asinh, "ASINH"),
>     (functions.atanh, "ATANH"),
> ]
> cols = ["a", functions.col("a")]
> for f, alias in funs:
>     for c in cols:
>         self.assertIn(f"{alias}(a)", repr(f(c))){code}
> {code:java}
>  Traceback (most recent call last):
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py",
>  line 271, in test_inverse_trig_functions
> self.assertIn(f"{alias}(a)", repr(f(c)))
> AssertionError: 'ACOSH(a)' not found in 
> "Column<'acosh(ColumnReference(a))'>"{code}
>  
>  
> {code:java}
> from pyspark.sql.functions import col, lit, overlay
> from itertools import chain
> import re
> actual = list(
>     chain.from_iterable(
>         [
>             re.findall("(overlay\\(.*\\))", str(x))
>             for x in [
>                 overlay(col("foo"), col("bar"), 1),
>                 overlay("x", "y", 3),
>                 overlay(col("x"), col("y"), 1, 3),
>                 overlay("x", "y", 2, 5),
>                 overlay("x", "y", lit(11)),
>                 overlay("x", "y", lit(2), lit(5)),
>             ]
>         ]
>     )
> )
> expected = [
>     "overlay(foo, bar, 1, -1)",
>     "overlay(x, y, 3, -1)",
>     "overlay(x, y, 1, 3)",
>     "overlay(x, y, 2, 5)",
>     "overlay(x, y, 11, -1)",
>     "overlay(x, y, 2, 5)",
> ]
> self.assertListEqual(actual, expected)
> df = self.spark.createDataFrame([("SPARK_SQL", "CORE", 7, 0)], ("x", "y", "pos", "len"))
> exp = [Row(ol="SPARK_CORESQL")]
> self.assertTrue(
>     all(
>         [
>             df.select(overlay(df.x, df.y, 7, 0).alias("ol")).collect() == exp,
>             df.select(overlay(df.x, df.y, lit(7), lit(0)).alias("ol")).collect() == exp,
>             df.select(overlay("x", "y", "pos", "len").alias("ol")).collect() == exp,
>         ]
>     )
> ) {code}
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py",
>  line 675, in test_overlay
> self.assertListEqual(actual, expected)
> AssertionError: Lists differ: ['overlay(ColumnReference(foo), 
> ColumnReference(bar[402 chars]5))'] != ['overlay(foo, bar, 1, -1)', 
> 'overlay(x, y, 3, -1)'[90 chars] 5)']
> First differing element 0:
> 'overlay(ColumnReference(foo), ColumnReference(bar), Literal(1), Literal(-1))'
> 'overlay(foo, bar, 1, -1)'
> - ['overlay(ColumnReference(foo), ColumnReference(bar), Literal(1), 
> Literal(-1))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(3), Literal(-1))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(1), Literal(3))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(2), Literal(5))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(11), 
> Literal(-1))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(2), Literal(5))']
> + ['overlay(foo, bar, 1, -1)',
> +  'overlay(x, y, 3, -1)',
> +  'overlay(x, y, 1, 3)',
> +  'overlay(x, y, 2, 5)',
> +  'overlay(x, y, 11, -1)',
> +  'overlay(x, y, 2, 5)']
>  {code}
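> For reference, a condensed sketch of the intended parity (classic PySpark 
> behavior, which Connect should match):
> {code:python}
> from pyspark.sql.functions import acosh, col
> 
> # Classic PySpark renders the canonical SQL expression string:
> repr(acosh(col("a")))   # Column<'ACOSH(a)'>
> # Spark Connect currently renders the internal expression tree instead:
> # Column<'acosh(ColumnReference(a))'>
> {code}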



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41901) Parity in String representation of Column

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41901:


Assignee: (was: Apache Spark)

> Parity in String representation of Column
> -
>
> Key: SPARK-41901
> URL: https://issues.apache.org/jira/browse/SPARK-41901
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> from pyspark.sql import functions
> funs = [
>     (functions.acosh, "ACOSH"),
>     (functions.asinh, "ASINH"),
>     (functions.atanh, "ATANH"),
> ]
> cols = ["a", functions.col("a")]
> for f, alias in funs:
>     for c in cols:
>         self.assertIn(f"{alias}(a)", repr(f(c))){code}
> {code:java}
>  Traceback (most recent call last):
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py",
>  line 271, in test_inverse_trig_functions
> self.assertIn(f"{alias}(a)", repr(f(c)))
> AssertionError: 'ACOSH(a)' not found in 
> "Column<'acosh(ColumnReference(a))'>"{code}
>  
>  
> {code:java}
> from pyspark.sql.functions import col, lit, overlay
> from itertools import chain
> import re
> actual = list(
>     chain.from_iterable(
>         [
>             re.findall("(overlay\\(.*\\))", str(x))
>             for x in [
>                 overlay(col("foo"), col("bar"), 1),
>                 overlay("x", "y", 3),
>                 overlay(col("x"), col("y"), 1, 3),
>                 overlay("x", "y", 2, 5),
>                 overlay("x", "y", lit(11)),
>                 overlay("x", "y", lit(2), lit(5)),
>             ]
>         ]
>     )
> )
> expected = [
>     "overlay(foo, bar, 1, -1)",
>     "overlay(x, y, 3, -1)",
>     "overlay(x, y, 1, 3)",
>     "overlay(x, y, 2, 5)",
>     "overlay(x, y, 11, -1)",
>     "overlay(x, y, 2, 5)",
> ]
> self.assertListEqual(actual, expected)
> df = self.spark.createDataFrame([("SPARK_SQL", "CORE", 7, 0)], ("x", "y", "pos", "len"))
> exp = [Row(ol="SPARK_CORESQL")]
> self.assertTrue(
>     all(
>         [
>             df.select(overlay(df.x, df.y, 7, 0).alias("ol")).collect() == exp,
>             df.select(overlay(df.x, df.y, lit(7), lit(0)).alias("ol")).collect() == exp,
>             df.select(overlay("x", "y", "pos", "len").alias("ol")).collect() == exp,
>         ]
>     )
> ) {code}
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py",
>  line 675, in test_overlay
> self.assertListEqual(actual, expected)
> AssertionError: Lists differ: ['overlay(ColumnReference(foo), 
> ColumnReference(bar[402 chars]5))'] != ['overlay(foo, bar, 1, -1)', 
> 'overlay(x, y, 3, -1)'[90 chars] 5)']
> First differing element 0:
> 'overlay(ColumnReference(foo), ColumnReference(bar), Literal(1), Literal(-1))'
> 'overlay(foo, bar, 1, -1)'
> - ['overlay(ColumnReference(foo), ColumnReference(bar), Literal(1), 
> Literal(-1))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(3), Literal(-1))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(1), Literal(3))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(2), Literal(5))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(11), 
> Literal(-1))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(2), Literal(5))']
> + ['overlay(foo, bar, 1, -1)',
> +  'overlay(x, y, 3, -1)',
> +  'overlay(x, y, 1, 3)',
> +  'overlay(x, y, 2, 5)',
> +  'overlay(x, y, 11, -1)',
> +  'overlay(x, y, 2, 5)']
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41901) Parity in String representation of Column

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41901:


Assignee: Apache Spark

> Parity in String representation of Column
> -
>
> Key: SPARK-41901
> URL: https://issues.apache.org/jira/browse/SPARK-41901
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> from pyspark.sql import functions
> funs = [
>     (functions.acosh, "ACOSH"),
>     (functions.asinh, "ASINH"),
>     (functions.atanh, "ATANH"),
> ]
> cols = ["a", functions.col("a")]
> for f, alias in funs:
>     for c in cols:
>         self.assertIn(f"{alias}(a)", repr(f(c))){code}
> {code:java}
>  Traceback (most recent call last):
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py",
>  line 271, in test_inverse_trig_functions
> self.assertIn(f"{alias}(a)", repr(f(c)))
> AssertionError: 'ACOSH(a)' not found in 
> "Column<'acosh(ColumnReference(a))'>"{code}
>  
>  
> {code:java}
> from pyspark.sql.functions import col, lit, overlay
> from itertools import chain
> import re
> actual = list(
>     chain.from_iterable(
>         [
>             re.findall("(overlay\\(.*\\))", str(x))
>             for x in [
>                 overlay(col("foo"), col("bar"), 1),
>                 overlay("x", "y", 3),
>                 overlay(col("x"), col("y"), 1, 3),
>                 overlay("x", "y", 2, 5),
>                 overlay("x", "y", lit(11)),
>                 overlay("x", "y", lit(2), lit(5)),
>             ]
>         ]
>     )
> )
> expected = [
>     "overlay(foo, bar, 1, -1)",
>     "overlay(x, y, 3, -1)",
>     "overlay(x, y, 1, 3)",
>     "overlay(x, y, 2, 5)",
>     "overlay(x, y, 11, -1)",
>     "overlay(x, y, 2, 5)",
> ]
> self.assertListEqual(actual, expected)
> df = self.spark.createDataFrame([("SPARK_SQL", "CORE", 7, 0)], ("x", "y", "pos", "len"))
> exp = [Row(ol="SPARK_CORESQL")]
> self.assertTrue(
>     all(
>         [
>             df.select(overlay(df.x, df.y, 7, 0).alias("ol")).collect() == exp,
>             df.select(overlay(df.x, df.y, lit(7), lit(0)).alias("ol")).collect() == exp,
>             df.select(overlay("x", "y", "pos", "len").alias("ol")).collect() == exp,
>         ]
>     )
> ) {code}
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py",
>  line 675, in test_overlay
> self.assertListEqual(actual, expected)
> AssertionError: Lists differ: ['overlay(ColumnReference(foo), 
> ColumnReference(bar[402 chars]5))'] != ['overlay(foo, bar, 1, -1)', 
> 'overlay(x, y, 3, -1)'[90 chars] 5)']
> First differing element 0:
> 'overlay(ColumnReference(foo), ColumnReference(bar), Literal(1), Literal(-1))'
> 'overlay(foo, bar, 1, -1)'
> - ['overlay(ColumnReference(foo), ColumnReference(bar), Literal(1), 
> Literal(-1))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(3), Literal(-1))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(1), Literal(3))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(2), Literal(5))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(11), 
> Literal(-1))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(2), Literal(5))']
> + ['overlay(foo, bar, 1, -1)',
> +  'overlay(x, y, 3, -1)',
> +  'overlay(x, y, 1, 3)',
> +  'overlay(x, y, 2, 5)',
> +  'overlay(x, y, 11, -1)',
> +  'overlay(x, y, 2, 5)']
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42095) Fix gRPC check in tests

2023-01-16 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-42095:
-
Description: Fix gRPC check in tests, including variables and error 
messages.

> Fix gRPC check in tests
> ---
>
> Key: SPARK-42095
> URL: https://issues.apache.org/jira/browse/SPARK-42095
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Fix gRPC check in tests, including variables and error messages.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42095) Fix gRPC check in tests

2023-01-16 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-42095:
-
Summary: Fix gRPC check in tests  (was: gRPC check in tests)

> Fix gRPC check in tests
> ---
>
> Key: SPARK-42095
> URL: https://issues.apache.org/jira/browse/SPARK-42095
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42095) gRPC check in tests

2023-01-16 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-42095:


 Summary: gRPC check in tests
 Key: SPARK-42095
 URL: https://issues.apache.org/jira/browse/SPARK-42095
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.4.0
Reporter: Xinrong Meng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42094) Support `fill_value` for `ps.Series.add`

2023-01-16 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-42094:
---

 Summary: Support `fill_value` for `ps.Series.add`
 Key: SPARK-42094
 URL: https://issues.apache.org/jira/browse/SPARK-42094
 Project: Spark
  Issue Type: Bug
  Components: Pandas API on Spark
Affects Versions: 3.4.0
Reporter: Haejoon Lee


For pandas function parity: 
https://pandas.pydata.org/docs/reference/api/pandas.Series.add.html
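A short sketch of the pandas semantics to match (plain pandas, shown for 
reference):

{code:python}
import pandas as pd

a = pd.Series([1.0, 2.0, None])
b = pd.Series([None, 2.0, 3.0])
# With fill_value, a value missing on only one side is replaced by 0 before
# adding; positions missing on both sides stay NaN.
print(a.add(b, fill_value=0))   # -> 1.0, 4.0, 3.0
{code}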



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41866) Make `createDataFrame` support array

2023-01-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677565#comment-17677565
 ] 

Apache Spark commented on SPARK-41866:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39617

> Make `createDataFrame` support array
> 
>
> Key: SPARK-41866
> URL: https://issues.apache.org/jira/browse/SPARK-41866
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> import array
> data = [Row(longarray=array.array("l", [-9223372036854775808, 0, 
> 9223372036854775807]))]
> df = self.spark.createDataFrame(data) {code}
> Error:
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py",
>  line 1220, in test_create_dataframe_from_array_of_long
>     df = self.spark.createDataFrame(data)
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", 
> line 260, in createDataFrame
>     table = pa.Table.from_pylist([row.asDict(recursive=True) for row in 
> _data])
>   File "pyarrow/table.pxi", line 3700, in pyarrow.lib.Table.from_pylist
>   File "pyarrow/table.pxi", line 5221, in pyarrow.lib._from_pylist
>   File "pyarrow/table.pxi", line 3575, in pyarrow.lib.Table.from_arrays
>   File "pyarrow/table.pxi", line 1383, in pyarrow.lib._sanitize_arrays
>   File "pyarrow/table.pxi", line 1364, in pyarrow.lib._schema_from_arrays
>   File "pyarrow/array.pxi", line 320, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 39, in pyarrow.lib._sequence_to_array
>   File "pyarrow/error.pxi", line 144, in 
> pyarrow.lib.pyarrow_internal_check_status
>   File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Could not convert array('l', [-9223372036854775808, 
> 0, 9223372036854775807]) with type array.array: did not recognize Python 
> value type when inferring an Arrow data type{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41866) Make `createDataFrame` support array

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41866:


Assignee: Apache Spark

> Make `createDataFrame` support array
> 
>
> Key: SPARK-41866
> URL: https://issues.apache.org/jira/browse/SPARK-41866
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> import array
> data = [Row(longarray=array.array("l", [-9223372036854775808, 0, 
> 9223372036854775807]))]
> df = self.spark.createDataFrame(data) {code}
> Error:
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py",
>  line 1220, in test_create_dataframe_from_array_of_long
>     df = self.spark.createDataFrame(data)
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", 
> line 260, in createDataFrame
>     table = pa.Table.from_pylist([row.asDict(recursive=True) for row in 
> _data])
>   File "pyarrow/table.pxi", line 3700, in pyarrow.lib.Table.from_pylist
>   File "pyarrow/table.pxi", line 5221, in pyarrow.lib._from_pylist
>   File "pyarrow/table.pxi", line 3575, in pyarrow.lib.Table.from_arrays
>   File "pyarrow/table.pxi", line 1383, in pyarrow.lib._sanitize_arrays
>   File "pyarrow/table.pxi", line 1364, in pyarrow.lib._schema_from_arrays
>   File "pyarrow/array.pxi", line 320, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 39, in pyarrow.lib._sequence_to_array
>   File "pyarrow/error.pxi", line 144, in 
> pyarrow.lib.pyarrow_internal_check_status
>   File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Could not convert array('l', [-9223372036854775808, 
> 0, 9223372036854775807]) with type array.array: did not recognize Python 
> value type when inferring an Arrow data type{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41866) Make `createDataFrame` support array

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41866:


Assignee: (was: Apache Spark)

> Make `createDataFrame` support array
> 
>
> Key: SPARK-41866
> URL: https://issues.apache.org/jira/browse/SPARK-41866
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> import array
> data = [Row(longarray=array.array("l", [-9223372036854775808, 0, 
> 9223372036854775807]))]
> df = self.spark.createDataFrame(data) {code}
> Error:
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py",
>  line 1220, in test_create_dataframe_from_array_of_long
>     df = self.spark.createDataFrame(data)
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", 
> line 260, in createDataFrame
>     table = pa.Table.from_pylist([row.asDict(recursive=True) for row in 
> _data])
>   File "pyarrow/table.pxi", line 3700, in pyarrow.lib.Table.from_pylist
>   File "pyarrow/table.pxi", line 5221, in pyarrow.lib._from_pylist
>   File "pyarrow/table.pxi", line 3575, in pyarrow.lib.Table.from_arrays
>   File "pyarrow/table.pxi", line 1383, in pyarrow.lib._sanitize_arrays
>   File "pyarrow/table.pxi", line 1364, in pyarrow.lib._schema_from_arrays
>   File "pyarrow/array.pxi", line 320, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 39, in pyarrow.lib._sequence_to_array
>   File "pyarrow/error.pxi", line 144, in 
> pyarrow.lib.pyarrow_internal_check_status
>   File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Could not convert array('l', [-9223372036854775808, 
> 0, 9223372036854775807]) with type array.array: did not recognize Python 
> value type when inferring an Arrow data type{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41757) Compatibility of string representation in Column

2023-01-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677564#comment-17677564
 ] 

Apache Spark commented on SPARK-41757:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39616

> Compatibility of string representation in Column
> 
>
> Key: SPARK-41757
> URL: https://issues.apache.org/jira/browse/SPARK-41757
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> The doctest in pyspark.sql.connect.column.Column fails with the error below:
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/column.py", 
> line 120, in pyspark.sql.connect.column.Column
> Failed example:
>     df.name
> Expected:
>     Column<'name'>
> Got:
>     Column<'ColumnReference(name)'>
> **
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/column.py", 
> line 122, in pyspark.sql.connect.column.Column
> Failed example:
>     df["name"]
> Expected:
>     Column<'name'>
> Got:
>     Column<'ColumnReference(name)'>
> **
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/column.py", 
> line 127, in pyspark.sql.connect.column.Column
> Failed example:
>     df.age + 1
> Expected:
>     Column<'(age + 1)'>
> Got:
>     Column<'+(ColumnReference(age), Literal(1))'>
> **
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/column.py", 
> line 129, in pyspark.sql.connect.column.Column
> Failed example:
>     1 / df.age
> Expected:
>     Column<'(1 / age)'>
> Got:
>     Column<'/(Literal(1), ColumnReference(age))'> {code}
>  
> We should re-enable this after fixing the issue in Spark Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42067) Upgrade buf from 1.11.0 to 1.12.0

2023-01-16 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-42067:
-

Assignee: BingKun Pan

> Upgrade buf from 1.11.0 to 1.12.0
> -
>
> Key: SPARK-42067
> URL: https://issues.apache.org/jira/browse/SPARK-42067
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Connect
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42067) Upgrade buf from 1.11.0 to 1.12.0

2023-01-16 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-42067.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39576
[https://github.com/apache/spark/pull/39576]

> Upgrade buf from 1.11.0 to 1.12.0
> -
>
> Key: SPARK-42067
> URL: https://issues.apache.org/jira/browse/SPARK-42067
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Connect
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42088) Running python3 setup.py sdist on windows reports a permission error

2023-01-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42088:


Assignee: zheju_he

> Running python3 setup.py sdist on windows reports a permission error
> 
>
> Key: SPARK-42088
> URL: https://issues.apache.org/jira/browse/SPARK-42088
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: zheju_he
>Assignee: zheju_he
>Priority: Minor
>
> My system is Windows 10, and running setup.py with administrator 
> permissions works without error. However, elevating permissions can be 
> troublesome on Windows Server, so the code in setup.py should be changed to 
> avoid the error. To spare users that hassle, I suggest modifying the 
> following code so it works out of the box:
> {code:python}
> def _supports_symlinks():
>     """Check if the system supports symlinks (e.g. *nix) or not."""
>     if sys.platform == "win32":
>         # On Windows, creating symlinks requires administrator rights.
>         return getattr(os, "symlink", None) is not None and ctypes.windll.shell32.IsUserAnAdmin() != 0
>     return True
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42088) Running python3 setup.py sdist on windows reports a permission error

2023-01-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42088.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39603
[https://github.com/apache/spark/pull/39603]

> Running python3 setup.py sdist on windows reports a permission error
> 
>
> Key: SPARK-42088
> URL: https://issues.apache.org/jira/browse/SPARK-42088
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: zheju_he
>Assignee: zheju_he
>Priority: Minor
> Fix For: 3.4.0
>
>
> My system is Windows 10, and running setup.py with administrator 
> permissions works without error. However, elevating permissions can be 
> troublesome on Windows Server, so the code in setup.py should be changed to 
> avoid the error. To spare users that hassle, I suggest modifying the 
> following code so it works out of the box:
> {code:python}
> def _supports_symlinks():
>     """Check if the system supports symlinks (e.g. *nix) or not."""
>     if sys.platform == "win32":
>         # On Windows, creating symlinks requires administrator rights.
>         return getattr(os, "symlink", None) is not None and ctypes.windll.shell32.IsUserAnAdmin() != 0
>     return True
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42091) Upgrade jetty to 9.4.50.v20221201

2023-01-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42091.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39612
[https://github.com/apache/spark/pull/39612]

> Upgrade jetty to 9.4.50.v20221201
> -
>
> Key: SPARK-42091
> URL: https://issues.apache.org/jira/browse/SPARK-42091
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.4.0
>
>
> https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.50.v20221201



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42091) Upgrade jetty to 9.4.50.v20221201

2023-01-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42091:


Assignee: Yang Jie

> Upgrade jetty to 9.4.50.v20221201
> -
>
> Key: SPARK-42091
> URL: https://issues.apache.org/jira/browse/SPARK-42091
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>
> https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.50.v20221201



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42021) createDataFrame with array.array

2023-01-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42021:


Assignee: Hyukjin Kwon

> createDataFrame with array.array
> 
>
> Key: SPARK-42021
> URL: https://issues.apache.org/jira/browse/SPARK-42021
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> {code}
> pyspark/sql/tests/test_types.py:964 (TypesParityTests.test_array_types)
> self = <TypesParityTests testMethod=test_array_types>
> def test_array_types(self):
> # This test need to make sure that the Scala type selected is at least
> # as large as the python's types. This is necessary because python's
> # array types depend on C implementation on the machine. Therefore 
> there
> # is no machine independent correspondence between python's array 
> types
> # and Scala types.
> # See: https://docs.python.org/2/library/array.html
> 
> def assertCollectSuccess(typecode, value):
> row = Row(myarray=array.array(typecode, [value]))
> df = self.spark.createDataFrame([row])
> self.assertEqual(df.first()["myarray"][0], value)
> 
> # supported string types
> #
> # String types in python's array are "u" for Py_UNICODE and "c" for 
> char.
> # "u" will be removed in python 4, and "c" is not supported in python 
> 3.
> supported_string_types = []
> if sys.version_info[0] < 4:
> supported_string_types += ["u"]
> # test unicode
> >   assertCollectSuccess("u", "a")
> ../test_types.py:986: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> ../test_types.py:975: in assertCollectSuccess
> df = self.spark.createDataFrame([row])
> ../../connect/session.py:278: in createDataFrame
> _table = pa.Table.from_pylist([row.asDict(recursive=True) for row in 
> _data])
> pyarrow/table.pxi:3700: in pyarrow.lib.Table.from_pylist
> ???
> pyarrow/table.pxi:5221: in pyarrow.lib._from_pylist
> ???
> pyarrow/table.pxi:3575: in pyarrow.lib.Table.from_arrays
> ???
> pyarrow/table.pxi:1383: in pyarrow.lib._sanitize_arrays
> ???
> pyarrow/table.pxi:1364: in pyarrow.lib._schema_from_arrays
> ???
> pyarrow/array.pxi:320: in pyarrow.lib.array
> ???
> pyarrow/array.pxi:39: in pyarrow.lib._sequence_to_array
> ???
> pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
> ???
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> >   ???
> E   pyarrow.lib.ArrowInvalid: Could not convert array('u', 'a') with type 
> array.array: did not recognize Python value type when inferring an Arrow data 
> type
> pyarrow/error.pxi:100: ArrowInvalid
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41902) Parity in String representation of higher_order_function's output

2023-01-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41902:


Assignee: Ruifeng Zheng

> Parity in String representation of higher_order_function's output
> -
>
> Key: SPARK-41902
> URL: https://issues.apache.org/jira/browse/SPARK-41902
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> from pyspark.sql.functions import flatten, struct, transform
> df = self.spark.sql("SELECT array(1, 2, 3) as numbers, array('a', 'b', 'c') 
> as letters")
> actual = df.select(
>     flatten(
>         transform(
>             "numbers",
>             lambda number: transform(
>                 "letters", lambda letter: struct(number.alias("n"), letter.alias("l"))
>             ),
>         )
>     )
> ).first()[0]
> expected = [
>     (1, "a"),
>     (1, "b"),
>     (1, "c"),
>     (2, "a"),
>     (2, "b"),
>     (2, "c"),
>     (3, "a"),
>     (3, "b"),
>     (3, "c"),
> ]
> self.assertEquals(actual, expected){code}
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py",
>  line 809, in test_nested_higher_order_function
> self.assertEquals(actual, expected)
> AssertionError: Lists differ: [{'n': 'a', 'l': 'a'}, {'n': 'b', 'l': 'b'[151 
> chars]'c'}] != [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ([43 chars]'c')]
> First differing element 0:
> {'n': 'a', 'l': 'a'}
> (1, 'a')
> - [{'l': 'a', 'n': 'a'},
> -  {'l': 'b', 'n': 'b'},
> -  {'l': 'c', 'n': 'c'},
> -  {'l': 'a', 'n': 'a'},
> -  {'l': 'b', 'n': 'b'},
> -  {'l': 'c', 'n': 'c'},
> -  {'l': 'a', 'n': 'a'},
> -  {'l': 'b', 'n': 'b'},
> -  {'l': 'c', 'n': 'c'}]
> + [(1, 'a'),
> +  (1, 'b'),
> +  (1, 'c'),
> +  (2, 'a'),
> +  (2, 'b'),
> +  (2, 'c'),
> +  (3, 'a'),
> +  (3, 'b'),
> +  (3, 'c')]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42021) createDataFrame with array.array

2023-01-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42021:


Assignee: Ruifeng Zheng  (was: Hyukjin Kwon)

> createDataFrame with array.array
> 
>
> Key: SPARK-42021
> URL: https://issues.apache.org/jira/browse/SPARK-42021
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>
> {code}
> pyspark/sql/tests/test_types.py:964 (TypesParityTests.test_array_types)
> self = <TypesParityTests testMethod=test_array_types>
> def test_array_types(self):
> # This test need to make sure that the Scala type selected is at least
> # as large as the python's types. This is necessary because python's
> # array types depend on C implementation on the machine. Therefore 
> there
> # is no machine independent correspondence between python's array 
> types
> # and Scala types.
> # See: https://docs.python.org/2/library/array.html
> 
> def assertCollectSuccess(typecode, value):
> row = Row(myarray=array.array(typecode, [value]))
> df = self.spark.createDataFrame([row])
> self.assertEqual(df.first()["myarray"][0], value)
> 
> # supported string types
> #
> # String types in python's array are "u" for Py_UNICODE and "c" for 
> char.
> # "u" will be removed in python 4, and "c" is not supported in python 
> 3.
> supported_string_types = []
> if sys.version_info[0] < 4:
> supported_string_types += ["u"]
> # test unicode
> >   assertCollectSuccess("u", "a")
> ../test_types.py:986: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> ../test_types.py:975: in assertCollectSuccess
> df = self.spark.createDataFrame([row])
> ../../connect/session.py:278: in createDataFrame
> _table = pa.Table.from_pylist([row.asDict(recursive=True) for row in 
> _data])
> pyarrow/table.pxi:3700: in pyarrow.lib.Table.from_pylist
> ???
> pyarrow/table.pxi:5221: in pyarrow.lib._from_pylist
> ???
> pyarrow/table.pxi:3575: in pyarrow.lib.Table.from_arrays
> ???
> pyarrow/table.pxi:1383: in pyarrow.lib._sanitize_arrays
> ???
> pyarrow/table.pxi:1364: in pyarrow.lib._schema_from_arrays
> ???
> pyarrow/array.pxi:320: in pyarrow.lib.array
> ???
> pyarrow/array.pxi:39: in pyarrow.lib._sequence_to_array
> ???
> pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
> ???
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> >   ???
> E   pyarrow.lib.ArrowInvalid: Could not convert array('u', 'a') with type 
> array.array: did not recognize Python value type when inferring an Arrow data 
> type
> pyarrow/error.pxi:100: ArrowInvalid
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41902) Parity in String representation of higher_order_function's output

2023-01-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41902.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39607
[https://github.com/apache/spark/pull/39607]

> Parity in String representation of higher_order_function's output
> -
>
> Key: SPARK-41902
> URL: https://issues.apache.org/jira/browse/SPARK-41902
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> from pyspark.sql.functions import flatten, struct, transform
> df = self.spark.sql("SELECT array(1, 2, 3) as numbers, array('a', 'b', 'c') 
> as letters")
> actual = df.select(
>     flatten(
>         transform(
>             "numbers",
>             lambda number: transform(
>                 "letters", lambda letter: struct(number.alias("n"), letter.alias("l"))
>             ),
>         )
>     )
> ).first()[0]
> expected = [
>     (1, "a"),
>     (1, "b"),
>     (1, "c"),
>     (2, "a"),
>     (2, "b"),
>     (2, "c"),
>     (3, "a"),
>     (3, "b"),
>     (3, "c"),
> ]
> self.assertEquals(actual, expected){code}
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py",
>  line 809, in test_nested_higher_order_function
> self.assertEquals(actual, expected)
> AssertionError: Lists differ: [{'n': 'a', 'l': 'a'}, {'n': 'b', 'l': 'b'[151 
> chars]'c'}] != [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ([43 chars]'c')]
> First differing element 0:
> {'n': 'a', 'l': 'a'}
> (1, 'a')
> - [{'l': 'a', 'n': 'a'},
> -  {'l': 'b', 'n': 'b'},
> -  {'l': 'c', 'n': 'c'},
> -  {'l': 'a', 'n': 'a'},
> -  {'l': 'b', 'n': 'b'},
> -  {'l': 'c', 'n': 'c'},
> -  {'l': 'a', 'n': 'a'},
> -  {'l': 'b', 'n': 'b'},
> -  {'l': 'c', 'n': 'c'}]
> + [(1, 'a'),
> +  (1, 'b'),
> +  (1, 'c'),
> +  (2, 'a'),
> +  (2, 'b'),
> +  (2, 'c'),
> +  (3, 'a'),
> +  (3, 'b'),
> +  (3, 'c')]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42021) createDataFrame with array.array

2023-01-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42021.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39602
[https://github.com/apache/spark/pull/39602]

> createDataFrame with array.array
> 
>
> Key: SPARK-42021
> URL: https://issues.apache.org/jira/browse/SPARK-42021
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> {code}
> pyspark/sql/tests/test_types.py:964 (TypesParityTests.test_array_types)
> self = <TypesParityTests testMethod=test_array_types>
> def test_array_types(self):
> # This test need to make sure that the Scala type selected is at least
> # as large as the python's types. This is necessary because python's
> # array types depend on C implementation on the machine. Therefore 
> there
> # is no machine independent correspondence between python's array 
> types
> # and Scala types.
> # See: https://docs.python.org/2/library/array.html
> 
> def assertCollectSuccess(typecode, value):
> row = Row(myarray=array.array(typecode, [value]))
> df = self.spark.createDataFrame([row])
> self.assertEqual(df.first()["myarray"][0], value)
> 
> # supported string types
> #
> # String types in python's array are "u" for Py_UNICODE and "c" for 
> char.
> # "u" will be removed in python 4, and "c" is not supported in python 
> 3.
> supported_string_types = []
> if sys.version_info[0] < 4:
> supported_string_types += ["u"]
> # test unicode
> >   assertCollectSuccess("u", "a")
> ../test_types.py:986: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> ../test_types.py:975: in assertCollectSuccess
> df = self.spark.createDataFrame([row])
> ../../connect/session.py:278: in createDataFrame
> _table = pa.Table.from_pylist([row.asDict(recursive=True) for row in 
> _data])
> pyarrow/table.pxi:3700: in pyarrow.lib.Table.from_pylist
> ???
> pyarrow/table.pxi:5221: in pyarrow.lib._from_pylist
> ???
> pyarrow/table.pxi:3575: in pyarrow.lib.Table.from_arrays
> ???
> pyarrow/table.pxi:1383: in pyarrow.lib._sanitize_arrays
> ???
> pyarrow/table.pxi:1364: in pyarrow.lib._schema_from_arrays
> ???
> pyarrow/array.pxi:320: in pyarrow.lib.array
> ???
> pyarrow/array.pxi:39: in pyarrow.lib._sequence_to_array
> ???
> pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
> ???
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> >   ???
> E   pyarrow.lib.ArrowInvalid: Could not convert array('u', 'a') with type 
> array.array: did not recognize Python value type when inferring an Arrow data 
> type
> pyarrow/error.pxi:100: ArrowInvalid
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42079) Rename proto messages for `toDF` and `withColumnsRenamed`

2023-01-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42079.
--
Fix Version/s: 3.4.0
 Assignee: Ruifeng Zheng
   Resolution: Fixed

> Rename proto messages for `toDF` and `withColumnsRenamed`
> -
>
> Key: SPARK-42079
> URL: https://issues.apache.org/jira/browse/SPARK-42079
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42079) Rename proto messages for `toDF` and `withColumnsRenamed`

2023-01-16 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677549#comment-17677549
 ] 

Hyukjin Kwon commented on SPARK-42079:
--

Fixed in https://github.com/apache/spark/pull/39590

> Rename proto messages for `toDF` and `withColumnsRenamed`
> -
>
> Key: SPARK-42079
> URL: https://issues.apache.org/jira/browse/SPARK-42079
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42068) Implicit conversion is not working with parallelization in scala with java 11 and spark3

2023-01-16 Thread Srinivas Rishindra Pothireddi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas Rishindra Pothireddi updated SPARK-42068:
--
Summary: Implicit conversion is not working with parallelization in scala 
with java 11 and spark3  (was: Parallelization in Scala is not working with 
Java 11 and spark3)

> Implicit conversion is not working with parallelization in scala with java 11 
> and spark3
> 
>
> Key: SPARK-42068
> URL: https://issues.apache.org/jira/browse/SPARK-42068
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.1, 3.2.3, 3.4.0
> Environment: spark version 3.3.1 Using Scala version 2.12.15 (OpenJDK 
> 64-Bit Server VM, Java 11.0.17)
>Reporter: Srinivas Rishindra Pothireddi
>Priority: Major
>
> The following code snippet fails with java 11 with spark3, but works with 
> java 8. It also works with spark2 and java 11. 
> {code:java}
> import scala.collection.mutable
> import scala.collection.parallel.{ExecutionContextTaskSupport, ForkJoinTaskSupport}
> case class Person(name: String, age: Int)
> val pc = List(1, 2, 3).par
> val forkJoinPool = new java.util.concurrent.ForkJoinPool(2)
> pc.tasksupport = new ForkJoinTaskSupport(forkJoinPool)
> pc.map { x =>
>     val personList: Array[Person] = (1 to 999).map(value => Person("p" + value, value)).toArray
>     // creating RDD of Person
>     val rddPerson = spark.sparkContext.parallelize(personList, 5)
>     val evenAgePerson = rddPerson.filter(_.age % 2 == 0)
>     import spark.implicits._
>     val evenAgePersonDF = evenAgePerson.toDF("Name", "Age")
> } {code}
> The error is as follows.
> {code:java}
> scala.ScalaReflectionException: object $read not found.
>   at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:185)
>   at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:29)
>   at $typecreator6$1.apply(<console>:37)
>   at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:237)
>   at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:237)
>   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:52)
>   at org.apache.spark.sql.Encoders$.product(Encoders.scala:300)
>   at org.apache.spark.sql.LowPrioritySQLImplicits.newProductEncoder(SQLImplicits.scala:261)
>   at org.apache.spark.sql.LowPrioritySQLImplicits.newProductEncoder$(SQLImplicits.scala:261)
>   at org.apache.spark.sql.SQLImplicits.newProductEncoder(SQLImplicits.scala:32)
>   at $anonfun$res0$1(<console>:37)
>   at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
>   at scala.collection.parallel.AugmentedIterableIterator.map2combiner(RemainsIterator.scala:116)
>   at scala.collection.parallel.AugmentedIterableIterator.map2combiner$(RemainsIterator.scala:113)
>   at scala.collection.parallel.immutable.ParVector$ParVectorIterator.map2combiner(ParVector.scala:66)
>   at scala.collection.parallel.ParIterableLike$Map.leaf(ParIterableLike.scala:1064)
>   at scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:53)
>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:67)
>   at scala.collection.parallel.Task.tryLeaf(Tasks.scala:56)
>   at scala.collection.parallel.Task.tryLeaf$(Tasks.scala:50)
>   at scala.collection.parallel.ParIterableLike$Map.tryLeaf(ParIterableLike.scala:1061)
>   at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.internal(Tasks.scala:160)
>   at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.internal$(Tasks.scala:157)
>   at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.internal(Tasks.scala:440)
>   at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute(Tasks.scala:150)
>   at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute$(Tasks.scala:149)
>   at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:440)
>   at java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
>   at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
>   at java.base/java.util.concurrent.ForkJoinTask.doJoin(ForkJoinTask.java:396)
>   at java.base/java.util.concurrent.ForkJoinTask.join(ForkJoinTask.java:721)
>   at scala.collection.parallel.ForkJoinTasks$WrappedTask.sync(Tasks.scala:379)
>   at scala.collection.parallel.ForkJoinTasks$WrappedTask.sync$(Tasks.scala:379)
>   at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.sync(Tasks.scala:440)
>   at 
> 
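A minimal workaround sketch (not from this ticket): assuming the failure comes from resolving the implicit Encoder through REPL reflection inside the ForkJoin worker threads, derive the encoder once on the main thread and pass it explicitly. `spark` is assumed to be the usual spark-shell session.

{code:java}
// Hypothetical workaround sketch, not the fix adopted for this issue:
// build the Encoder eagerly on the main/driver thread so no REPL
// reflection runs inside the ForkJoin worker threads.
import org.apache.spark.sql.{Encoder, Encoders}
import scala.collection.parallel.ForkJoinTaskSupport

case class Person(name: String, age: Int)

// Derived here, outside the parallel closure.
val personEncoder: Encoder[Person] = Encoders.product[Person]

val pc = List(1, 2, 3).par
pc.tasksupport = new ForkJoinTaskSupport(new java.util.concurrent.ForkJoinPool(2))
pc.map { _ =>
  val people = (1 to 999).map(v => Person("p" + v, v)).toArray
  val evenAged = spark.sparkContext.parallelize(people, 5).filter(_.age % 2 == 0)
  // createDataset takes the pre-built encoder explicitly instead of summoning
  // one through spark.implicits._ on the worker thread.
  spark.createDataset(evenAged)(personEncoder).toDF("Name", "Age")
}
{code}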

[jira] [Commented] (SPARK-42093) Move JavaTypeInference to AgnosticEncoders

2023-01-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677488#comment-17677488
 ] 

Apache Spark commented on SPARK-42093:
--

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/39615

> Move JavaTypeInference to AgnosticEncoders
> --
>
> Key: SPARK-42093
> URL: https://issues.apache.org/jira/browse/SPARK-42093
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42093) Move JavaTypeInference to AgnosticEncoders

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42093:


Assignee: Herman van Hövell  (was: Apache Spark)

> Move JavaTypeInference to AgnosticEncoders
> --
>
> Key: SPARK-42093
> URL: https://issues.apache.org/jira/browse/SPARK-42093
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42093) Move JavaTypeInference to AgnosticEncoders

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42093:


Assignee: Apache Spark  (was: Herman van Hövell)

> Move JavaTypeInference to AgnosticEncoders
> --
>
> Key: SPARK-42093
> URL: https://issues.apache.org/jira/browse/SPARK-42093
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42093) Move JavaTypeInference to AgnosticEncoders

2023-01-16 Thread Jira
Herman van Hövell created SPARK-42093:
-

 Summary: Move JavaTypeInference to AgnosticEncoders
 Key: SPARK-42093
 URL: https://issues.apache.org/jira/browse/SPARK-42093
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Herman van Hövell
Assignee: Herman van Hövell






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42002) Implement DataFrameWriterV2 (ReadwriterV2Tests)

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42002:


Assignee: (was: Apache Spark)

> Implement DataFrameWriterV2 (ReadwriterV2Tests)
> ---
>
> Key: SPARK-42002
> URL: https://issues.apache.org/jira/browse/SPARK-42002
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> pyspark/sql/tests/test_readwriter.py:182 (ReadwriterV2ParityTests.test_api)
> self = <...ReadwriterV2ParityTests testMethod=test_api>
> def test_api(self):
>     df = self.df
> >   writer = df.writeTo("testcat.t")
> ../test_readwriter.py:185: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> self = DataFrame[key: bigint, value: string], args = ('testcat.t',), kwargs = {}
> def writeTo(self, *args: Any, **kwargs: Any) -> None:
> >   raise NotImplementedError("writeTo() is not implemented.")
> E   NotImplementedError: writeTo() is not implemented.
> ../../connect/dataframe.py:1529: NotImplementedError
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42002) Implement DataFrameWriterV2 (ReadwriterV2Tests)

2023-01-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677487#comment-17677487
 ] 

Apache Spark commented on SPARK-42002:
--

User 'techaddict' has created a pull request for this issue:
https://github.com/apache/spark/pull/39614

> Implement DataFrameWriterV2 (ReadwriterV2Tests)
> ---
>
> Key: SPARK-42002
> URL: https://issues.apache.org/jira/browse/SPARK-42002
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> pyspark/sql/tests/test_readwriter.py:182 (ReadwriterV2ParityTests.test_api)
> self = <...ReadwriterV2ParityTests testMethod=test_api>
> def test_api(self):
>     df = self.df
> >   writer = df.writeTo("testcat.t")
> ../test_readwriter.py:185: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> self = DataFrame[key: bigint, value: string], args = ('testcat.t',), kwargs = {}
> def writeTo(self, *args: Any, **kwargs: Any) -> None:
> >   raise NotImplementedError("writeTo() is not implemented.")
> E   NotImplementedError: writeTo() is not implemented.
> ../../connect/dataframe.py:1529: NotImplementedError
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42002) Implement DataFrameWriterV2 (ReadwriterV2Tests)

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42002:


Assignee: Apache Spark

> Implement DataFrameWriterV2 (ReadwriterV2Tests)
> ---
>
> Key: SPARK-42002
> URL: https://issues.apache.org/jira/browse/SPARK-42002
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> {code}
> pyspark/sql/tests/test_readwriter.py:182 (ReadwriterV2ParityTests.test_api)
> self = <...ReadwriterV2ParityTests testMethod=test_api>
> def test_api(self):
>     df = self.df
> >   writer = df.writeTo("testcat.t")
> ../test_readwriter.py:185: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> self = DataFrame[key: bigint, value: string], args = ('testcat.t',), kwargs = {}
> def writeTo(self, *args: Any, **kwargs: Any) -> None:
> >   raise NotImplementedError("writeTo() is not implemented.")
> E   NotImplementedError: writeTo() is not implemented.
> ../../connect/dataframe.py:1529: NotImplementedError
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42092) Upgrade RoaringBitmap to 0.9.38

2023-01-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677465#comment-17677465
 ] 

Apache Spark commented on SPARK-42092:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39613

> Upgrade RoaringBitmap to 0.9.38
> ---
>
> Key: SPARK-42092
> URL: https://issues.apache.org/jira/browse/SPARK-42092
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.36...0.9.38



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42092) Upgrade RoaringBitmap to 0.9.38

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42092:


Assignee: (was: Apache Spark)

> Upgrade RoaringBitmap to 0.9.38
> ---
>
> Key: SPARK-42092
> URL: https://issues.apache.org/jira/browse/SPARK-42092
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.36...0.9.38



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42092) Upgrade RoaringBitmap to 0.9.38

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42092:


Assignee: Apache Spark

> Upgrade RoaringBitmap to 0.9.38
> ---
>
> Key: SPARK-42092
> URL: https://issues.apache.org/jira/browse/SPARK-42092
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.36...0.9.38



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42092) Upgrade RoaringBitmap to 0.9.38

2023-01-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677464#comment-17677464
 ] 

Apache Spark commented on SPARK-42092:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39613

> Upgrade RoaringBitmap to 0.9.38
> ---
>
> Key: SPARK-42092
> URL: https://issues.apache.org/jira/browse/SPARK-42092
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.36...0.9.38



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42092) Upgrade RoaringBitmap to 0.9.38

2023-01-16 Thread Yang Jie (Jira)
Yang Jie created SPARK-42092:


 Summary: Upgrade RoaringBitmap to 0.9.38
 Key: SPARK-42092
 URL: https://issues.apache.org/jira/browse/SPARK-42092
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.4.0
Reporter: Yang Jie


https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.36...0.9.38



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42091) Upgrade jetty to 9.4.50.v20221201

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42091:


Assignee: (was: Apache Spark)

> Upgrade jetty to 9.4.50.v20221201
> -
>
> Key: SPARK-42091
> URL: https://issues.apache.org/jira/browse/SPARK-42091
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.50.v20221201



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42091) Upgrade jetty to 9.4.50.v20221201

2023-01-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677460#comment-17677460
 ] 

Apache Spark commented on SPARK-42091:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39612

> Upgrade jetty to 9.4.50.v20221201
> -
>
> Key: SPARK-42091
> URL: https://issues.apache.org/jira/browse/SPARK-42091
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.50.v20221201



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42091) Upgrade jetty to 9.4.50.v20221201

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42091:


Assignee: Apache Spark

> Upgrade jetty to 9.4.50.v20221201
> -
>
> Key: SPARK-42091
> URL: https://issues.apache.org/jira/browse/SPARK-42091
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.50.v20221201



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42091) Upgrade jetty to 9.4.50.v20221201

2023-01-16 Thread Yang Jie (Jira)
Yang Jie created SPARK-42091:


 Summary: Upgrade jetty to 9.4.50.v20221201
 Key: SPARK-42091
 URL: https://issues.apache.org/jira/browse/SPARK-42091
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.4.0
Reporter: Yang Jie


https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.50.v20221201



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41708) Pull v1write information to WriteFiles

2023-01-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677405#comment-17677405
 ] 

Apache Spark commented on SPARK-41708:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39610

> Pull v1write information to WriteFiles
> --
>
> Key: SPARK-41708
> URL: https://issues.apache.org/jira/browse/SPARK-41708
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.4.0
>
>
> Make WriteFiles hold v1 write information



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42090) Introduce sasl retry count in RetryingBlockTransferor

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42090:


Assignee: (was: Apache Spark)

> Introduce sasl retry count in RetryingBlockTransferor
> -
>
> Key: SPARK-42090
> URL: https://issues.apache.org/jira/browse/SPARK-42090
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Ted Yu
>Priority: Major
>
> Previously a boolean variable, saslTimeoutSeen, was used in 
> RetryingBlockTransferor. However, the boolean variable wouldn't cover the 
> following scenario:
> 1. SaslTimeoutException
> 2. IOException
> 3. SaslTimeoutException
> 4. IOException
> Even though IOException at #2 is retried (resulting in increment of 
> retryCount), the retryCount would be cleared at step #4.
> Since the intention of saslTimeoutSeen is to undo the increment due to 
> retrying SaslTimeoutException, we should keep a counter for 
> SaslTimeoutException retries and subtract the value of this counter from 
> retryCount.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42090) Introduce sasl retry count in RetryingBlockTransferor

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42090:


Assignee: Apache Spark

> Introduce sasl retry count in RetryingBlockTransferor
> -
>
> Key: SPARK-42090
> URL: https://issues.apache.org/jira/browse/SPARK-42090
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Ted Yu
>Assignee: Apache Spark
>Priority: Major
>
> Previously a boolean variable, saslTimeoutSeen, was used in 
> RetryingBlockTransferor. However, the boolean variable wouldn't cover the 
> following scenario:
> 1. SaslTimeoutException
> 2. IOException
> 3. SaslTimeoutException
> 4. IOException
> Even though IOException at #2 is retried (resulting in increment of 
> retryCount), the retryCount would be cleared at step #4.
> Since the intention of saslTimeoutSeen is to undo the increment due to 
> retrying SaslTimeoutException, we should keep a counter for 
> SaslTimeoutException retries and subtract the value of this counter from 
> retryCount.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42090) Introduce sasl retry count in RetryingBlockTransferor

2023-01-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677404#comment-17677404
 ] 

Apache Spark commented on SPARK-42090:
--

User 'tedyu' has created a pull request for this issue:
https://github.com/apache/spark/pull/39611

> Introduce sasl retry count in RetryingBlockTransferor
> -
>
> Key: SPARK-42090
> URL: https://issues.apache.org/jira/browse/SPARK-42090
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Ted Yu
>Priority: Major
>
> Previously a boolean variable, saslTimeoutSeen, was used in 
> RetryingBlockTransferor. However, the boolean variable wouldn't cover the 
> following scenario:
> 1. SaslTimeoutException
> 2. IOException
> 3. SaslTimeoutException
> 4. IOException
> Even though IOException at #2 is retried (resulting in increment of 
> retryCount), the retryCount would be cleared at step #4.
> Since the intention of saslTimeoutSeen is to undo the increment due to 
> retrying SaslTimeoutException, we should keep a counter for 
> SaslTimeoutException retries and subtract the value of this counter from 
> retryCount.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42090) Introduce sasl retry count in RetryingBlockTransferor

2023-01-16 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-42090:
---
Description: 
Previously a boolean variable, saslTimeoutSeen, was used in 
RetryingBlockTransferor. However, the boolean variable wouldn't cover the 
following scenario:

1. SaslTimeoutException
2. IOException
3. SaslTimeoutException
4. IOException

Even though IOException at #2 is retried (resulting in increment of 
retryCount), the retryCount would be cleared at step #4.
Since the intention of saslTimeoutSeen is to undo the increment due to retrying 
SaslTimeoutException, we should keep a counter for SaslTimeoutException retries 
and subtract the value of this counter from retryCount.

  was:
Previously a boolean variable, saslTimeoutSeen, was used. However, the boolean 
variable wouldn't cover the following scenario:

1. SaslTimeoutException
2. IOException
3. SaslTimeoutException
4. IOException

Even though IOException at #2 is retried (resulting in increment of 
retryCount), the retryCount would be cleared at step #4.
Since the intention of saslTimeoutSeen is to undo the increment due to retrying 
SaslTimeoutException, we should keep a counter for SaslTimeoutException retries 
and subtract the value of this counter from retryCount.


> Introduce sasl retry count in RetryingBlockTransferor
> -
>
> Key: SPARK-42090
> URL: https://issues.apache.org/jira/browse/SPARK-42090
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Ted Yu
>Priority: Major
>
> Previously a boolean variable, saslTimeoutSeen, was used in 
> RetryingBlockTransferor. However, the boolean variable wouldn't cover the 
> following scenario:
> 1. SaslTimeoutException
> 2. IOException
> 3. SaslTimeoutException
> 4. IOException
> Even though IOException at #2 is retried (resulting in increment of 
> retryCount), the retryCount would be cleared at step #4.
> Since the intention of saslTimeoutSeen is to undo the increment due to 
> retrying SaslTimeoutException, we should keep a counter for 
> SaslTimeoutException retries and subtract the value of this counter from 
> retryCount.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42090) Introduce sasl retry count in RetryingBlockTransferor

2023-01-16 Thread Ted Yu (Jira)
Ted Yu created SPARK-42090:
--

 Summary: Introduce sasl retry count in RetryingBlockTransferor
 Key: SPARK-42090
 URL: https://issues.apache.org/jira/browse/SPARK-42090
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Ted Yu


Previously a boolean variable, saslTimeoutSeen, was used. However, the boolean 
variable wouldn't cover the following scenario:

1. SaslTimeoutException
2. IOException
3. SaslTimeoutException
4. IOException

Even though IOException at #2 is retried (resulting in increment of 
retryCount), the retryCount would be cleared at step #4.
Since the intention of saslTimeoutSeen is to undo the increment due to retrying 
SaslTimeoutException, we should keep a counter for SaslTimeoutException retries 
and subtract the value of this counter from retryCount.
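A minimal Scala sketch of the counting scheme described above. The names are hypothetical; the real change belongs to the Java class RetryingBlockTransferor, and nothing below is its actual API.

{code:java}
// Illustrative sketch: track SASL-timeout retries in their own counter so a
// later IOException deducts exactly those retries from retryCount instead of
// clearing it outright.

// Local stand-in for Spark's SASL timeout exception type.
class SaslTimeoutException(msg: String) extends Exception(msg)

class RetryBookkeeping(maxRetries: Int) {
  private var retryCount = 0      // retries counted toward the limit
  private var saslRetryCount = 0  // retries caused only by SASL timeouts

  // Returns true if another retry is allowed after failure `t`.
  def onFailure(t: Throwable): Boolean = {
    t match {
      case _: SaslTimeoutException =>
        saslRetryCount += 1
      case _ =>
        // Undo only the SASL-attributed increments; earlier IOException
        // retries (step #2 in the scenario above) keep counting.
        retryCount -= saslRetryCount
        saslRetryCount = 0
    }
    retryCount += 1
    retryCount <= maxRetries
  }
}
{code}

In the SaslTimeout/IO/SaslTimeout/IO scenario above, this leaves retryCount at 2 (the two IOException retries) rather than wiping out the retry history at step #4.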



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41994) Harden SQLSTATE usage for error classes

2023-01-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-41994:
---

Assignee: Serge Rielau

> Harden SQLSTATE usage for error classes
> ---
>
> Key: SPARK-41994
> URL: https://issues.apache.org/jira/browse/SPARK-41994
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Assignee: Serge Rielau
>Priority: Major
> Fix For: 3.4.0
>
>
> Error classes are great, but for JDBC, ODBC, etc. the SQLSTATEs of the standard reign.
> We have started adding SQLSTATEs but have not really paid attention to their correctness.
> Here is a unified view of SQLSTATEs used in the 
> [industry|https://docs.google.com/spreadsheets/d/1hrQBSuHooiozUNAQTHiYq3WidS1uliHpl9cYfWpig1c/edit?usp=sharing].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41994) Harden SQLSTATE usage for error classes

2023-01-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-41994.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39537
[https://github.com/apache/spark/pull/39537]

> Harden SQLSTATE usage for error classes
> ---
>
> Key: SPARK-41994
> URL: https://issues.apache.org/jira/browse/SPARK-41994
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Major
> Fix For: 3.4.0
>
>
> Error classes are great, but for JDBC, ODBC, etc. the SQLSTATEs of the standard reign.
> We have started adding SQLSTATEs but have not really paid attention to their correctness.
> Here is a unified view of SQLSTATEs used in the 
> [industry|https://docs.google.com/spreadsheets/d/1hrQBSuHooiozUNAQTHiYq3WidS1uliHpl9cYfWpig1c/edit?usp=sharing].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41896) Filtering by row_index always returns empty results

2023-01-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677344#comment-17677344
 ] 

Apache Spark commented on SPARK-41896:
--

User 'olaky' has created a pull request for this issue:
https://github.com/apache/spark/pull/39608

> Filtering by row_index always returns empty results
> ---
>
> Key: SPARK-41896
> URL: https://issues.apache.org/jira/browse/SPARK-41896
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Jan-Ole Sasse
>Assignee: Jan-Ole Sasse
>Priority: Critical
> Fix For: 3.4.0
>
>
> Queries that include a filter on row_index currently always return an empty 
> result. This is because we consider all metadata attributes constant per file 
> [here|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala#L76],
>  so the filter always evaluates to false.
> This should be fixed as a follow-up to SPARK-41791.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41902) Parity in String representation of higher_order_function's output

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41902:


Assignee: (was: Apache Spark)

> Parity in String representation of higher_order_function's output
> -
>
> Key: SPARK-41902
> URL: https://issues.apache.org/jira/browse/SPARK-41902
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> from pyspark.sql.functions import flatten, struct, transform
> df = self.spark.sql("SELECT array(1, 2, 3) as numbers, array('a', 'b', 'c') as letters")
> actual = df.select(
>     flatten(
>         transform(
>             "numbers",
>             lambda number: transform(
>                 "letters", lambda letter: struct(number.alias("n"), letter.alias("l"))
>             ),
>         )
>     )
> ).first()[0]
> expected = [
>     (1, "a"),
>     (1, "b"),
>     (1, "c"),
>     (2, "a"),
>     (2, "b"),
>     (2, "c"),
>     (3, "a"),
>     (3, "b"),
>     (3, "c"),
> ]
> self.assertEquals(actual, expected){code}
> {code:java}
> Traceback (most recent call last):
>   File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py", line 809, in test_nested_higher_order_function
>     self.assertEquals(actual, expected)
> AssertionError: Lists differ: [{'n': 'a', 'l': 'a'}, {'n': 'b', 'l': 'b'[151 chars]'c'}] != [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ([43 chars]'c')]
> First differing element 0:
> {'n': 'a', 'l': 'a'}
> (1, 'a')
> - [{'l': 'a', 'n': 'a'},
> -  {'l': 'b', 'n': 'b'},
> -  {'l': 'c', 'n': 'c'},
> -  {'l': 'a', 'n': 'a'},
> -  {'l': 'b', 'n': 'b'},
> -  {'l': 'c', 'n': 'c'},
> -  {'l': 'a', 'n': 'a'},
> -  {'l': 'b', 'n': 'b'},
> -  {'l': 'c', 'n': 'c'}]
> + [(1, 'a'),
> +  (1, 'b'),
> +  (1, 'c'),
> +  (2, 'a'),
> +  (2, 'b'),
> +  (2, 'c'),
> +  (3, 'a'),
> +  (3, 'b'),
> +  (3, 'c')]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41902) Parity in String representation of higher_order_function's output

2023-01-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41902:


Assignee: Apache Spark

> Parity in String representation of higher_order_function's output
> -
>
> Key: SPARK-41902
> URL: https://issues.apache.org/jira/browse/SPARK-41902
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> from pyspark.sql.functions import flatten, struct, transform
> df = self.spark.sql("SELECT array(1, 2, 3) as numbers, array('a', 'b', 'c') as letters")
> actual = df.select(
>     flatten(
>         transform(
>             "numbers",
>             lambda number: transform(
>                 "letters", lambda letter: struct(number.alias("n"), letter.alias("l"))
>             ),
>         )
>     )
> ).first()[0]
> expected = [
>     (1, "a"),
>     (1, "b"),
>     (1, "c"),
>     (2, "a"),
>     (2, "b"),
>     (2, "c"),
>     (3, "a"),
>     (3, "b"),
>     (3, "c"),
> ]
> self.assertEquals(actual, expected){code}
> {code:java}
> Traceback (most recent call last):
>   File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py", line 809, in test_nested_higher_order_function
>     self.assertEquals(actual, expected)
> AssertionError: Lists differ: [{'n': 'a', 'l': 'a'}, {'n': 'b', 'l': 'b'[151 chars]'c'}] != [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ([43 chars]'c')]
> First differing element 0:
> {'n': 'a', 'l': 'a'}
> (1, 'a')
> - [{'l': 'a', 'n': 'a'},
> -  {'l': 'b', 'n': 'b'},
> -  {'l': 'c', 'n': 'c'},
> -  {'l': 'a', 'n': 'a'},
> -  {'l': 'b', 'n': 'b'},
> -  {'l': 'c', 'n': 'c'},
> -  {'l': 'a', 'n': 'a'},
> -  {'l': 'b', 'n': 'b'},
> -  {'l': 'c', 'n': 'c'}]
> + [(1, 'a'),
> +  (1, 'b'),
> +  (1, 'c'),
> +  (2, 'a'),
> +  (2, 'b'),
> +  (2, 'c'),
> +  (3, 'a'),
> +  (3, 'b'),
> +  (3, 'c')]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41902) Parity in String representation of higher_order_function's output

2023-01-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17677335#comment-17677335
 ] 

Apache Spark commented on SPARK-41902:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39607

> Parity in String representation of higher_order_function's output
> -
>
> Key: SPARK-41902
> URL: https://issues.apache.org/jira/browse/SPARK-41902
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> from pyspark.sql.functions import flatten, struct, transform
> df = self.spark.sql("SELECT array(1, 2, 3) as numbers, array('a', 'b', 'c') as letters")
> actual = df.select(
>     flatten(
>         transform(
>             "numbers",
>             lambda number: transform(
>                 "letters", lambda letter: struct(number.alias("n"), letter.alias("l"))
>             ),
>         )
>     )
> ).first()[0]
> expected = [
>     (1, "a"),
>     (1, "b"),
>     (1, "c"),
>     (2, "a"),
>     (2, "b"),
>     (2, "c"),
>     (3, "a"),
>     (3, "b"),
>     (3, "c"),
> ]
> self.assertEquals(actual, expected){code}
> {code:java}
> Traceback (most recent call last):
>   File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py", line 809, in test_nested_higher_order_function
>     self.assertEquals(actual, expected)
> AssertionError: Lists differ: [{'n': 'a', 'l': 'a'}, {'n': 'b', 'l': 'b'[151 chars]'c'}] != [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ([43 chars]'c')]
> First differing element 0:
> {'n': 'a', 'l': 'a'}
> (1, 'a')
> - [{'l': 'a', 'n': 'a'},
> -  {'l': 'b', 'n': 'b'},
> -  {'l': 'c', 'n': 'c'},
> -  {'l': 'a', 'n': 'a'},
> -  {'l': 'b', 'n': 'b'},
> -  {'l': 'c', 'n': 'c'},
> -  {'l': 'a', 'n': 'a'},
> -  {'l': 'b', 'n': 'b'},
> -  {'l': 'c', 'n': 'c'}]
> + [(1, 'a'),
> +  (1, 'b'),
> +  (1, 'c'),
> +  (2, 'a'),
> +  (2, 'b'),
> +  (2, 'c'),
> +  (3, 'a'),
> +  (3, 'b'),
> +  (3, 'c')]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


