[jira] [Updated] (SPARK-41857) Enable test_between_function, test_datetime_functions, test_expr, test_math_functions, test_window_functions_cumulative_sum, test_corr, test_cov, test_crosstab, test_app

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41857:
--
Summary: Enable test_between_function, test_datetime_functions, test_expr, 
test_math_functions, test_window_functions_cumulative_sum, test_corr, test_cov, 
test_crosstab, test_approxQuantile  (was: Enable test_between_function, 
test_datetime_functions, test_expr, test_function_parity, test_math_functions, 
test_window_functions_cumulative_sum, test_corr, test_cov, test_crosstab, 
test_approxQuantile)

> Enable test_between_function, test_datetime_functions, test_expr, 
> test_math_functions, test_window_functions_cumulative_sum, test_corr, 
> test_cov, test_crosstab, test_approxQuantile
> 
>
> Key: SPARK-41857
> URL: https://issues.apache.org/jira/browse/SPARK-41857
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41857) Enable test_between_function, test_datetime_functions, test_expr, test_function_parity, test_math_functions, test_window_functions_cumulative_sum, test_corr, test_cov,

2023-01-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41857:


Assignee: Apache Spark  (was: Hyukjin Kwon)

> Enable test_between_function, test_datetime_functions, test_expr, 
> test_function_parity, test_math_functions, 
> test_window_functions_cumulative_sum, test_corr, test_cov, test_crosstab, 
> test_approxQuantile
> --
>
> Key: SPARK-41857
> URL: https://issues.apache.org/jira/browse/SPARK-41857
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-41857) Enable test_between_function, test_datetime_functions, test_expr, test_function_parity, test_math_functions, test_window_functions_cumulative_sum, test_corr, test_cov,

2023-01-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653829#comment-17653829
 ] 

Apache Spark commented on SPARK-41857:
--

User 'techaddict' has created a pull request for this issue:
https://github.com/apache/spark/pull/39359

> Enable test_between_function, test_datetime_functions, test_expr, 
> test_function_parity, test_math_functions, 
> test_window_functions_cumulative_sum, test_corr, test_cov, test_crosstab, 
> test_approxQuantile
> --
>
> Key: SPARK-41857
> URL: https://issues.apache.org/jira/browse/SPARK-41857
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-41857) Enable test_between_function, test_datetime_functions, test_expr, test_function_parity, test_math_functions, test_window_functions_cumulative_sum, test_corr, test_cov,

2023-01-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41857:


Assignee: Hyukjin Kwon  (was: Apache Spark)

> Enable test_between_function, test_datetime_functions, test_expr, 
> test_function_parity, test_math_functions, 
> test_window_functions_cumulative_sum, test_corr, test_cov, test_crosstab, 
> test_approxQuantile
> --
>
> Key: SPARK-41857
> URL: https://issues.apache.org/jira/browse/SPARK-41857
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Updated] (SPARK-41857) Enable test_between_function, test_datetime_functions, test_expr, test_function_parity, test_math_functions, test_window_functions_cumulative_sum, test_corr, test_cov, t

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41857:
--
Summary: Enable test_between_function, test_datetime_functions, test_expr, 
test_function_parity, test_math_functions, 
test_window_functions_cumulative_sum, test_corr, test_cov, test_crosstab, 
test_approxQuantile  (was: Enable 10 tests that pass)

> Enable test_between_function, test_datetime_functions, test_expr, 
> test_function_parity, test_math_functions, 
> test_window_functions_cumulative_sum, test_corr, test_cov, test_crosstab, 
> test_approxQuantile
> --
>
> Key: SPARK-41857
> URL: https://issues.apache.org/jira/browse/SPARK-41857
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Created] (SPARK-41857) Enable 10 tests that pass

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41857:
-

 Summary: Enable 10 tests that pass
 Key: SPARK-41857
 URL: https://issues.apache.org/jira/browse/SPARK-41857
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Sandeep Singh
Assignee: Hyukjin Kwon
 Fix For: 3.4.0









[jira] [Commented] (SPARK-41856) Enable test_create_nan_decimal_dataframe, test_freqItems, test_input_files, test_toDF_with_schema_string, test_to_pandas_required_pandas_not_found

2023-01-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653819#comment-17653819
 ] 

Apache Spark commented on SPARK-41856:
--

User 'techaddict' has created a pull request for this issue:
https://github.com/apache/spark/pull/39358

> Enable test_create_nan_decimal_dataframe, test_freqItems, test_input_files, 
> test_toDF_with_schema_string, test_to_pandas_required_pandas_not_found
> --
>
> Key: SPARK-41856
> URL: https://issues.apache.org/jira/browse/SPARK-41856
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> 5 tests pass now. Should enable them.






[jira] [Assigned] (SPARK-41856) Enable test_create_nan_decimal_dataframe, test_freqItems, test_input_files, test_toDF_with_schema_string, test_to_pandas_required_pandas_not_found

2023-01-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41856:


Assignee: Apache Spark  (was: Hyukjin Kwon)

> Enable test_create_nan_decimal_dataframe, test_freqItems, test_input_files, 
> test_toDF_with_schema_string, test_to_pandas_required_pandas_not_found
> --
>
> Key: SPARK-41856
> URL: https://issues.apache.org/jira/browse/SPARK-41856
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.4.0
>
>
> 5 tests pass now. Should enable them.






[jira] [Commented] (SPARK-41856) Enable test_create_nan_decimal_dataframe, test_freqItems, test_input_files, test_toDF_with_schema_string, test_to_pandas_required_pandas_not_found

2023-01-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653818#comment-17653818
 ] 

Apache Spark commented on SPARK-41856:
--

User 'techaddict' has created a pull request for this issue:
https://github.com/apache/spark/pull/39358

> Enable test_create_nan_decimal_dataframe, test_freqItems, test_input_files, 
> test_toDF_with_schema_string, test_to_pandas_required_pandas_not_found
> --
>
> Key: SPARK-41856
> URL: https://issues.apache.org/jira/browse/SPARK-41856
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> 5 tests pass now. Should enable them.






[jira] [Assigned] (SPARK-41856) Enable test_create_nan_decimal_dataframe, test_freqItems, test_input_files, test_toDF_with_schema_string, test_to_pandas_required_pandas_not_found

2023-01-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41856:


Assignee: Hyukjin Kwon  (was: Apache Spark)

> Enable test_create_nan_decimal_dataframe, test_freqItems, test_input_files, 
> test_toDF_with_schema_string, test_to_pandas_required_pandas_not_found
> --
>
> Key: SPARK-41856
> URL: https://issues.apache.org/jira/browse/SPARK-41856
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> 5 tests pass now. Should enable them.






[jira] [Updated] (SPARK-41856) Enable test_create_nan_decimal_dataframe, test_freqItems, test_input_files, test_toDF_with_schema_string, test_to_pandas_required_pandas_not_found

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41856:
--
Description: 5 tests pass now. Should enable them.  (was: These tests pass 
now. Should enable them.)

> Enable test_create_nan_decimal_dataframe, test_freqItems, test_input_files, 
> test_toDF_with_schema_string, test_to_pandas_required_pandas_not_found
> --
>
> Key: SPARK-41856
> URL: https://issues.apache.org/jira/browse/SPARK-41856
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> 5 tests pass now. Should enable them.






[jira] [Created] (SPARK-41856) Enable test_create_nan_decimal_dataframe, test_freqItems, test_input_files, test_toDF_with_schema_string, test_to_pandas_required_pandas_not_found

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41856:
-

 Summary: Enable test_create_nan_decimal_dataframe, test_freqItems, 
test_input_files, test_toDF_with_schema_string, 
test_to_pandas_required_pandas_not_found
 Key: SPARK-41856
 URL: https://issues.apache.org/jira/browse/SPARK-41856
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, Tests
Affects Versions: 3.4.0
Reporter: Sandeep Singh
Assignee: Hyukjin Kwon
 Fix For: 3.4.0


These tests pass now. Should enable them.






[jira] [Commented] (SPARK-41677) Protobuf serializer for StreamingQueryProgressWrapper

2023-01-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653802#comment-17653802
 ] 

Apache Spark commented on SPARK-41677:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39357

> Protobuf serializer for StreamingQueryProgressWrapper
> -
>
> Key: SPARK-41677
> URL: https://issues.apache.org/jira/browse/SPARK-41677
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>







[jira] [Assigned] (SPARK-41677) Protobuf serializer for StreamingQueryProgressWrapper

2023-01-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41677:


Assignee: (was: Apache Spark)

> Protobuf serializer for StreamingQueryProgressWrapper
> -
>
> Key: SPARK-41677
> URL: https://issues.apache.org/jira/browse/SPARK-41677
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>







[jira] [Assigned] (SPARK-41677) Protobuf serializer for StreamingQueryProgressWrapper

2023-01-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41677:


Assignee: Apache Spark

> Protobuf serializer for StreamingQueryProgressWrapper
> -
>
> Key: SPARK-41677
> URL: https://issues.apache.org/jira/browse/SPARK-41677
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-41677) Protobuf serializer for StreamingQueryProgressWrapper

2023-01-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653803#comment-17653803
 ] 

Apache Spark commented on SPARK-41677:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39357

> Protobuf serializer for StreamingQueryProgressWrapper
> -
>
> Key: SPARK-41677
> URL: https://issues.apache.org/jira/browse/SPARK-41677
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>







[jira] [Commented] (SPARK-41423) Protobuf serializer for StageDataWrapper

2023-01-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653798#comment-17653798
 ] 

Apache Spark commented on SPARK-41423:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39356

> Protobuf serializer for StageDataWrapper
> 
>
> Key: SPARK-41423
> URL: https://issues.apache.org/jira/browse/SPARK-41423
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: BingKun Pan
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-40263) Use interruptible lock instead of synchronized in TransportClientFactory.createClient()

2023-01-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653795#comment-17653795
 ] 

Apache Spark commented on SPARK-40263:
--

User 'techaddict' has created a pull request for this issue:
https://github.com/apache/spark/pull/39355

> Use interruptible lock instead of synchronized in 
> TransportClientFactory.createClient()
> ---
>
> Key: SPARK-40263
> URL: https://issues.apache.org/jira/browse/SPARK-40263
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Josh Rosen
>Priority: Major
>
> Followup to SPARK-40235: we should apply a similar fix in 
> TransportClientFactory.createClient
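As a rough sketch of the semantics being asked for (illustrative Python, not the actual Java change; the real fix would presumably use an explicit lock that supports interruptible acquisition, e.g. java.util.concurrent.locks.ReentrantLock.lockInterruptibly(), as SPARK-40235 did):

{code:python}
# Illustration only: a thread waiting for the client-pool lock stays
# responsive to cancellation, unlike a thread blocked entering a JVM
# synchronized block, which ignores Thread.interrupt() while waiting.
import threading

client_pool_lock = threading.Lock()
cancelled = threading.Event()   # stands in for Thread.interrupt()

def create_client_interruptibly(build_client):
    # Poll with a timeout instead of blocking forever on acquire().
    while not client_pool_lock.acquire(timeout=0.1):
        if cancelled.is_set():
            raise InterruptedError("interrupted while waiting for client lock")
    try:
        return build_client()   # create or fetch the cached TransportClient
    finally:
        client_pool_lock.release()
{code}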






[jira] [Assigned] (SPARK-40263) Use interruptible lock instead of synchronized in TransportClientFactory.createClient()

2023-01-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40263:


Assignee: (was: Apache Spark)

> Use interruptible lock instead of synchronized in 
> TransportClientFactory.createClient()
> ---
>
> Key: SPARK-40263
> URL: https://issues.apache.org/jira/browse/SPARK-40263
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Josh Rosen
>Priority: Major
>
> Followup to SPARK-40235: we should apply a similar fix in 
> TransportClientFactory.createClient






[jira] [Assigned] (SPARK-40263) Use interruptible lock instead of synchronized in TransportClientFactory.createClient()

2023-01-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40263:


Assignee: Apache Spark

> Use interruptible lock instead of synchronized in 
> TransportClientFactory.createClient()
> ---
>
> Key: SPARK-40263
> URL: https://issues.apache.org/jira/browse/SPARK-40263
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Josh Rosen
>Assignee: Apache Spark
>Priority: Major
>
> Followup to SPARK-40235: we should apply a similar fix in 
> TransportClientFactory.createClient






[jira] [Commented] (SPARK-40263) Use interruptible lock instead of synchronized in TransportClientFactory.createClient()

2023-01-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653790#comment-17653790
 ] 

Apache Spark commented on SPARK-40263:
--

User 'techaddict' has created a pull request for this issue:
https://github.com/apache/spark/pull/39355

> Use interruptible lock instead of synchronized in 
> TransportClientFactory.createClient()
> ---
>
> Key: SPARK-40263
> URL: https://issues.apache.org/jira/browse/SPARK-40263
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Josh Rosen
>Priority: Major
>
> Followup to SPARK-40235: we should apply a similar fix in 
> TransportClientFactory.createClient






[jira] [Commented] (SPARK-41656) Enable doctests in pyspark.sql.connect.dataframe

2023-01-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653787#comment-17653787
 ] 

Apache Spark commented on SPARK-41656:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39354

> Enable doctests in pyspark.sql.connect.dataframe
> 
>
> Key: SPARK-41656
> URL: https://issues.apache.org/jira/browse/SPARK-41656
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Resolved] (SPARK-41839) Implement SparkSession.sparkContext

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41839.
--
Resolution: Invalid

I am resolving this because Spark Connect is not designed to support the 
SparkContext or RDD APIs.

> Implement SparkSession.sparkContext
> ---
>
> Key: SPARK-41839
> URL: https://issues.apache.org/jira/browse/SPARK-41839
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>







[jira] [Commented] (SPARK-41656) Enable doctests in pyspark.sql.connect.dataframe

2023-01-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653786#comment-17653786
 ] 

Apache Spark commented on SPARK-41656:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39354

> Enable doctests in pyspark.sql.connect.dataframe
> 
>
> Key: SPARK-41656
> URL: https://issues.apache.org/jira/browse/SPARK-41656
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Resolved] (SPARK-41819) Implement Dataframe.rdd getNumPartitions

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41819.
--
Resolution: Invalid

I am resolving this because Spark Connect is not designed to support the 
SparkContext or RDD APIs.

> Implement Dataframe.rdd getNumPartitions
> 
>
> Key: SPARK-41819
> URL: https://issues.apache.org/jira/browse/SPARK-41819
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 243, in pyspark.sql.connect.dataframe.DataFrame.coalesce
> Failed example:
>     df.coalesce(1).rdd.getNumPartitions()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.dataframe.DataFrame.coalesce[0]>", line 1, in <module>
>         df.coalesce(1).rdd.getNumPartitions()
>     AttributeError: 'function' object has no attribute 
> 'getNumPartitions'{code}






[jira] [Commented] (SPARK-41658) Enable doctests in pyspark.sql.connect.functions

2023-01-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653784#comment-17653784
 ] 

Apache Spark commented on SPARK-41658:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39354

> Enable doctests in pyspark.sql.connect.functions
> 
>
> Key: SPARK-41658
> URL: https://issues.apache.org/jira/browse/SPARK-41658
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Sandeep Singh
>Priority: Major
>







[jira] [Commented] (SPARK-41497) Accumulator undercounting in the case of retry task with rdd cache

2023-01-02 Thread wuyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653781#comment-17653781
 ] 

wuyi commented on SPARK-41497:
--

> If I am not wrong, SQL makes very heavy use of accumulators, and so most 
> stages will end up having them anyway - right ?

Right.

> I would expect this scenario (even without accumulator) to be fairly low 
> frequency enough that the cost of extra recomputation might be fine.

Agree. So shall we proceed with the improved Option 4 that you proposed, 
[~mridulm80]? [~ivoson] can help with the fix.
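For reference, a rough conceptual sketch of what Option 4 (described in the 
quoted issue below) amounts to. This is illustrative Python pseudocode, not 
Spark's actual Scala internals, and all names here are hypothetical: a cached 
partition is served only once some attempt of the producing task is known to 
have succeeded, so a retried attempt recomputes and re-applies its accumulator 
updates instead of silently skipping them.

{code:python}
# Illustrative sketch only: names and structures are hypothetical.
succeeded_tasks: set[tuple[int, int]] = set()   # (stage_id, partition_id)
cached_blocks: dict[tuple[int, int], list] = {}

def report_task_success(stage_id: int, partition_id: int) -> None:
    # Called when the driver records a successful status update for a task.
    succeeded_tasks.add((stage_id, partition_id))

def get_or_compute(stage_id: int, partition_id: int, compute):
    key = (stage_id, partition_id)
    # Reuse the cache only if some attempt of this task already succeeded;
    # otherwise recompute, so accumulator updates are not silently skipped.
    if key in cached_blocks and key in succeeded_tasks:
        return cached_blocks[key]
    result = compute()   # runs the task body, including accumulator adds
    cached_blocks[key] = result
    return result
{code}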
 

> Accumulator undercounting in the case of retry task with rdd cache
> --
>
> Key: SPARK-41497
> URL: https://issues.apache.org/jira/browse/SPARK-41497
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.8, 3.0.3, 3.1.3, 3.2.2, 3.3.1
>Reporter: wuyi
>Priority: Major
>
> An accumulator can be undercounted when a retried task has an rdd cache. See 
> the example below; a complete and reproducible example is at 
> [https://github.com/apache/spark/compare/master...Ngone51:spark:fix-acc]
>   
> {code:scala}
> test("SPARK-XXX") {
>   // Set up a cluster with 2 executors
>   val conf = new SparkConf()
> .setMaster("local-cluster[2, 1, 
> 1024]").setAppName("TaskSchedulerImplSuite")
>   sc = new SparkContext(conf)
>   // Set up a custom task scheduler. The scheduler will fail the first task 
> attempt of the job
>   // submitted below. In particular, the first attempt would succeed in its 
> computation
>   // (accumulator accounting, result caching) but fail to report its success 
> status due
>   // to a concurrent executor loss. The second task attempt would succeed.
>   taskScheduler = setupSchedulerWithCustomStatusUpdate(sc)
>   val myAcc = sc.longAccumulator("myAcc")
>   // Initiate an rdd with only one partition so there's only one task, and 
> specify the storage level
>   // as MEMORY_ONLY_2 so that the rdd result will be cached on both 
> executors.
>   val rdd = sc.parallelize(0 until 10, 1).mapPartitions { iter =>
> myAcc.add(100)
> iter.map(x => x + 1)
>   }.persist(StorageLevel.MEMORY_ONLY_2)
>   // This will pass since the second task attempt will succeed
>   assert(rdd.count() === 10)
>   // This will fail because `myAcc.add(100)` won't be executed during the 
> second task attempt:
>   // the second attempt loads the rdd cache directly instead of executing the 
> task function,
>   // so `myAcc.add(100)` is skipped.
>   assert(myAcc.value === 100)
> } {code}
>  
> We could also hit this issue with decommissioning even if the rdd has only 
> one copy. For example, decommissioning could migrate the rdd cache block to 
> another executor (the result is effectively the same as having 2 copies), and 
> the decommissioned executor could be lost before the task reports its success 
> status to the driver.
>  
> And the issue is more complicated to fix than expected. I have tried several 
> fixes, but none of them is ideal:
> Option 1: Clean up any rdd cache related to the failed task. In practice, 
> this option already fixes the issue in most cases. However, theoretically, an 
> rdd cache could be reported to the driver right after the driver cleans up 
> the failed task's caches, due to asynchronous communication, so this option 
> can’t resolve the issue thoroughly;
> Option 2: Disallow rdd cache reuse across task attempts for the same task. 
> This option fixes the issue completely. The problem is that it also affects 
> cases where the rdd cache could safely be reused across attempts (e.g., when 
> there is no accumulator operation in the task), which can cause a perf 
> regression;
> Option 3: Introduce an accumulator cache. First, this requires a new 
> framework for supporting accumulator caching; second, the driver would need 
> to distinguish whether a cached accumulator value should be reported to the 
> user, to avoid overcounting. For example, in the case of a task retry, the 
> value should be reported; however, in the case of rdd cache reuse, the value 
> shouldn’t be reported (should it?);
> Option 4: Do task-success validation when a task tries to load the rdd 
> cache. This defines an rdd cache as valid/accessible only if the producing 
> task has succeeded. It could be either overkill or a bit complex, because 
> Spark currently cleans up task state once the task finishes, so we would 
> need to maintain a structure recording whether a task ever succeeded.




[jira] [Resolved] (SPARK-41826) Implement Dataframe.readStream

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41826.
--
Resolution: Duplicate

> Implement Dataframe.readStream
> --
>
> Key: SPARK-41826
> URL: https://issues.apache.org/jira/browse/SPARK-41826
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line ?, in pyspark.sql.connect.dataframe.DataFrame.isStreaming
> Failed example:
>     df = spark.readStream.format("rate").load()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.dataframe.DataFrame.isStreaming[0]>", line 1, in <module>
>         df = spark.readStream.format("rate").load()
>     AttributeError: 'SparkSession' object has no attribute 'readStream'{code}






[jira] [Commented] (SPARK-41234) High-order function: array_insert

2023-01-02 Thread Ruifeng Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653778#comment-17653778
 ] 

Ruifeng Zheng commented on SPARK-41234:
---


{code:sql}

+-------------------------------------------------+
| ARRAY_INSERT(ARRAY_CONSTRUCT(1,2,3), 0, 4)      |
|-------------------------------------------------|
| [                                               |
|   4,                                            |
|   1,                                            |
|   2,                                            |
|   3                                             |
| ]                                               |
+-------------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.139s
+-------------------------------------------------+
| ARRAY_INSERT(ARRAY_CONSTRUCT(1,2,3), 2, 4)      |
|-------------------------------------------------|
| [                                               |
|   1,                                            |
|   2,                                            |
|   4,                                            |
|   3                                             |
| ]                                               |
+-------------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.116s
+-------------------------------------------------+
| ARRAY_INSERT(ARRAY_CONSTRUCT(1,2,3), 2, NULL)   |
|-------------------------------------------------|
| [                                               |
|   1,                                            |
|   2,                                            |
|   undefined,                                    |
|   3                                             |
| ]                                               |
+-------------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.130s
+-------------------------------------------------+
| ARRAY_INSERT(NULL, 2, 1)                        |
|-------------------------------------------------|
| NULL                                            |
+-------------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.106s
+-------------------------------------------------+
| ARRAY_INSERT(NULL, 2, NULL)                     |
|-------------------------------------------------|
| NULL                                            |
+-------------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.113s

+-------------------------------------------------+
| ARRAY_INSERT(ARRAY_CONSTRUCT(1,2,3), 10, 1)     |
|-------------------------------------------------|
| [                                               |
|   1,                                            |
|   2,                                            |
|   3,                                            |
|   undefined,                                    |
|   undefined,                                    |
|   undefined,                                    |
|   undefined,                                    |
|   undefined,                                    |
|   undefined,                                    |
|   undefined,                                    |
|   1                                             |
| ]                                               |
+-------------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.116s
+-------------------------------------------------+
| ARRAY_INSERT(ARRAY_CONSTRUCT(1,2,3), -10, 1)    |
|-------------------------------------------------|
| [                                               |
|   1,                                            |
|   undefined,                                    |
|   undefined,                                    |
|   undefined,                                    |
|   undefined,                                    |
|   undefined,                                    |
|   undefined,                                    |
|   undefined,                                    |
|   1,                                            |
|   2,                                            |
|   3                                             |
| ]                                               |
+-------------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.111s
+-------------------------------------------------+
| ARRAY_INSERT(ARRAY_CONSTRUCT(1,2,3), 10, NULL)  |
|-------------------------------------------------|
| [                                               |
|   1,                                            |
|   2,                                            |
|   3,                                            |
|   undefined,                                    |
|   undefined,                                    |
|   undefined,                                    |
|   undefined,                                    |
|   undefined,                                    |
|   undefined,                                    |
|   undefined,                                    |
|   undefined                                     |
| ]                                               |
+-------------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.109s
+-------------------------------------------------+
| ARRAY_INSERT(ARRAY_CONSTRUCT(1,2,3), -10, NULL) |
|-------------------------------------------------|
| [

[jira] [Assigned] (SPARK-41311) Rewrite test RENAME_SRC_PATH_NOT_FOUND to trigger the error from user space

2023-01-02 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-41311:


Assignee: Immanuel Buder

> Rewrite test RENAME_SRC_PATH_NOT_FOUND to trigger the error from user space
> ---
>
> Key: SPARK-41311
> URL: https://issues.apache.org/jira/browse/SPARK-41311
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Immanuel Buder
>Assignee: Immanuel Buder
>Priority: Minor
>
> Rewrite the test for error class *RENAME_SRC_PATH_NOT_FOUND* in 
> [QueryExecutionErrorsSuite.scala|https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18be]
>  to trigger the error from user space. The current test uses the 
> non-user-facing class FileSystemBasedCheckpointFileManager directly to 
> trigger the error. 
> (see 
> [https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18beR680]
>  )
> Done when: the test uses user-facing APIs as much as possible.
> Proposed solution: rewrite the test following the example of 
> [https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18beR641]
> See [https://github.com/apache/spark/pull/38782#discussion_r1033013064] for 
> more context






[jira] [Resolved] (SPARK-41311) Rewrite test RENAME_SRC_PATH_NOT_FOUND to trigger the error from user space

2023-01-02 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-41311.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39348
[https://github.com/apache/spark/pull/39348]

> Rewrite test RENAME_SRC_PATH_NOT_FOUND to trigger the error from user space
> ---
>
> Key: SPARK-41311
> URL: https://issues.apache.org/jira/browse/SPARK-41311
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Immanuel Buder
>Assignee: Immanuel Buder
>Priority: Minor
> Fix For: 3.4.0
>
>
> Rewrite the test for error class *RENAME_SRC_PATH_NOT_FOUND* in 
> [QueryExecutionErrorsSuite.scala|https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18be]
>  to trigger the error from user space. The current test uses the 
> non-user-facing class FileSystemBasedCheckpointFileManager directly to 
> trigger the error. 
> (see 
> [https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18beR680]
>  )
> Done when: the test uses user-facing APIs as much as possible.
> Proposed solution: rewrite the test following the example of 
> [https://github.com/apache/spark/pull/38782/files/bea17e4fa61a06ff566b0ff1c3fcc39fa1100912#diff-b1989c7e0e4b50291fb7bdd2993da387102c669a47fe6e2077b23b37d78b18beR641]
> See [https://github.com/apache/spark/pull/38782#discussion_r1033013064] for 
> more context






[jira] [Assigned] (SPARK-41661) Support for Python UDFs

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41661:


Assignee: Xinrong Meng

> Support for Python UDFs
> ---
>
> Key: SPARK-41661
> URL: https://issues.apache.org/jira/browse/SPARK-41661
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Xinrong Meng
>Priority: Major
>
> Spark Connect should support Python UDFs






[jira] [Commented] (SPARK-41651) Test parity: pyspark.sql.tests.test_dataframe

2023-01-02 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653775#comment-17653775
 ] 

Hyukjin Kwon commented on SPARK-41651:
--

cc [~techaddict] in case you're interested in this. We can take a similar 
approach by enabling some tests and/or fixing the skip messages, with new 
JIRAs.

> Test parity: pyspark.sql.tests.test_dataframe
> -
>
> Key: SPARK-41651
> URL: https://issues.apache.org/jira/browse/SPARK-41651
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> After https://github.com/apache/spark/pull/39041 (SPARK-41528), we now reuse 
> the same test cases; see 
> {{python/pyspark/sql/tests/connect/test_parity_dataframe.py}}.
> We should remove all the test cases defined there, and fix Spark Connect 
> behaviours accordingly.






[jira] [Commented] (SPARK-41652) Test parity: pyspark.sql.tests.test_functions

2023-01-02 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653776#comment-17653776
 ] 

Hyukjin Kwon commented on SPARK-41652:
--

cc [~techaddict] in case you're interested in this. We can take a similar 
approach by enabling some tests and/or fixing the skip messages, with new 
JIRAs.

> Test parity: pyspark.sql.tests.test_functions
> -
>
> Key: SPARK-41652
> URL: https://issues.apache.org/jira/browse/SPARK-41652
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> After https://github.com/apache/spark/pull/39041 (SPARK-41528), we now reuse 
> the same test cases; see 
> {{python/pyspark/sql/tests/connect/test_parity_functions.py}}.
> We should remove all the test cases defined there, and fix Spark Connect 
> behaviours accordingly.






[jira] [Commented] (SPARK-41658) Enable doctests in pyspark.sql.connect.functions

2023-01-02 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653774#comment-17653774
 ] 

Hyukjin Kwon commented on SPARK-41658:
--

Fixed in https://github.com/apache/spark/pull/39347

> Enable doctests in pyspark.sql.connect.functions
> 
>
> Key: SPARK-41658
> URL: https://issues.apache.org/jira/browse/SPARK-41658
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Sandeep Singh
>Priority: Major
>







[jira] [Resolved] (SPARK-41658) Enable doctests in pyspark.sql.connect.functions

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41658.
--
Resolution: Fixed

> Enable doctests in pyspark.sql.connect.functions
> 
>
> Key: SPARK-41658
> URL: https://issues.apache.org/jira/browse/SPARK-41658
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Sandeep Singh
>Priority: Major
>







[jira] [Resolved] (SPARK-41653) Test parity: enable doctests in Spark Connect

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41653.
--
Resolution: Done

> Test parity: enable doctests in Spark Connect
> -
>
> Key: SPARK-41653
> URL: https://issues.apache.org/jira/browse/SPARK-41653
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Sandeep Singh
>Priority: Major
>
> We should actually run the doctests of Spark Connect.
> We should add something like 
> https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L1227-L1247
>  to Spark Connect modules, and add the module into 
> https://github.com/apache/spark/blob/master/dev/sparktestsupport/modules.py#L507
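A minimal sketch of the kind of runner block referenced above, modeled on the 
pyspark/sql/column.py pattern. The module under test and the session setup 
here are assumptions for illustration, not the exact code to add:

{code:python}
import doctest
import sys


def _test() -> None:
    # Hypothetical target module; each Connect module would get its own block.
    import pyspark.sql.connect.functions as mod
    from pyspark.sql import SparkSession

    globs = mod.__dict__.copy()
    globs["spark"] = (
        SparkSession.builder.master("local[4]").appName("connect doctests").getOrCreate()
    )
    failure_count, test_count = doctest.testmod(
        mod, globs=globs, optionflags=doctest.ELLIPSIS | doctest.NORMALIZE_WHITESPACE
    )
    globs["spark"].stop()
    if failure_count:
        sys.exit(-1)


if __name__ == "__main__":
    _test()
{code}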






[jira] [Assigned] (SPARK-41658) Enable doctests in pyspark.sql.connect.functions

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41658:


Assignee: Sandeep Singh

> Enable doctests in pyspark.sql.connect.functions
> 
>
> Key: SPARK-41658
> URL: https://issues.apache.org/jira/browse/SPARK-41658
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Sandeep Singh
>Priority: Major
>







[jira] [Assigned] (SPARK-41807) Remove non-existent error class: UNSUPPORTED_FEATURE.DISTRIBUTE_BY

2023-01-02 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-41807:


Assignee: BingKun Pan

> Remove non-existent error class: UNSUPPORTED_FEATURE.DISTRIBUTE_BY
> --
>
> Key: SPARK-41807
> URL: https://issues.apache.org/jira/browse/SPARK-41807
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>







[jira] [Resolved] (SPARK-41807) Remove non-existent error class: UNSUPPORTED_FEATURE.DISTRIBUTE_BY

2023-01-02 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-41807.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39335
[https://github.com/apache/spark/pull/39335]

> Remove non-existent error class: UNSUPPORTED_FEATURE.DISTRIBUTE_BY
> --
>
> Key: SPARK-41807
> URL: https://issues.apache.org/jira/browse/SPARK-41807
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-41841) Support PyPI packaging without JVM

2023-01-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41841:


Assignee: Apache Spark

> Support PyPI packaging without JVM
> --
>
> Key: SPARK-41841
> URL: https://issues.apache.org/jira/browse/SPARK-41841
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Blocker
>
> We should support pip install pyspark without the JVM so Spark Connect can 
> be a truly lightweight library.






[jira] [Commented] (SPARK-41841) Support PyPI packaging without JVM

2023-01-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653770#comment-17653770
 ] 

Apache Spark commented on SPARK-41841:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39353

> Support PyPI packaging without JVM
> --
>
> Key: SPARK-41841
> URL: https://issues.apache.org/jira/browse/SPARK-41841
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Blocker
>
> We should support pip install pyspark without the JVM so Spark Connect can 
> be a truly lightweight library.






[jira] [Assigned] (SPARK-41841) Support PyPI packaging without JVM

2023-01-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41841:


Assignee: (was: Apache Spark)

> Support PyPI packaging without JVM
> --
>
> Key: SPARK-41841
> URL: https://issues.apache.org/jira/browse/SPARK-41841
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Blocker
>
> We should support pip install pyspark without the JVM so Spark Connect can 
> be a truly lightweight library.






[jira] [Resolved] (SPARK-41854) Automatic reformat/check python/setup.py

2023-01-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41854.
--
Fix Version/s: 3.4.0
 Assignee: Hyukjin Kwon
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/39352

> Automatic reformat/check python/setup.py 
> -
>
> Key: SPARK-41854
> URL: https://issues.apache.org/jira/browse/SPARK-41854
> Project: Spark
>  Issue Type: Test
>  Components: Build, PySpark
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> python/setup.py should also be reformatted via ./dev/reformat-python






[jira] [Updated] (SPARK-41855) `createDataFrame` doesn't handle None/NaN properly

2023-01-02 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-41855:
--
Description: 
{code:python}
data = [Row(id=1, value=float("NaN")), Row(id=2, value=42.0), Row(id=3, 
value=None)]

# +---+-----+
# | id|value|
# +---+-----+
# |  1|  NaN|
# |  2| 42.0|
# |  3| null|
# +---+-----+

cdf = self.connect.createDataFrame(data)
sdf = self.spark.createDataFrame(data)

print()
print()
print(cdf._show_string(100, 100, False))
print()
print(cdf.schema)
print()
print(sdf._jdf.showString(100, 100, False))
print()
print(sdf.schema)

self.compare_by_show(cdf, sdf)
{code}



{code:java}
+---+-----+
| id|value|
+---+-----+
|  1| null|
|  2| 42.0|
|  3| null|
+---+-----+


StructType([StructField('id', LongType(), True), StructField('value', 
DoubleType(), True)])

+---+-----+
| id|value|
+---+-----+
|  1|  NaN|
|  2| 42.0|
|  3| null|
+---+-----+


StructType([StructField('id', LongType(), True), StructField('value', 
DoubleType(), True)])

{code}



This issue is due to `createDataFrame` not handling None/NaN properly:

1. in the conversion from local data to pd.DataFrame, it automatically converts 
both None and NaN to NaN;
2. then, in the conversion from pd.DataFrame to pa.Table, it always converts NaN 
to null.

  was:
{code:python}
data = [Row(id=1, value=float("NaN")), Row(id=2, value=42.0), Row(id=3, 
value=None)]

# +---+-----+
# | id|value|
# +---+-----+
# |  1|  NaN|
# |  2| 42.0|
# |  3| null|
# +---+-----+

cdf = self.connect.createDataFrame(data)
sdf = self.spark.createDataFrame(data)

print()
print()
print(cdf._show_string(100, 100, False))
print()
print(cdf.schema)
print()
print(sdf._jdf.showString(100, 100, False))
print()
print(sdf.schema)

self.compare_by_show(cdf, sdf)
{code}



{code:java}
+---+-----+
| id|value|
+---+-----+
|  1| null|
|  2| 42.0|
|  3| null|
+---+-----+


StructType([StructField('id', LongType(), True), StructField('value', 
DoubleType(), True)])

+---+-----+
| id|value|
+---+-----+
|  1|  NaN|
|  2| 42.0|
|  3| null|
+---+-----+


StructType([StructField('id', LongType(), True), StructField('value', 
DoubleType(), True)])

{code}



This issue is due to `createDataFrame` not handling None/NaN properly:

1. in the conversion from local data to pd.DataFrame, it automatically converts 
None to NaN;
2. then, in the conversion from pd.DataFrame to pa.Table, it always converts NaN 
to null.


> `createDataFrame` doesn't handle None/NaN properly
> --
>
> Key: SPARK-41855
> URL: https://issues.apache.org/jira/browse/SPARK-41855
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> {code:python}
> data = [Row(id=1, value=float("NaN")), Row(id=2, value=42.0), 
> Row(id=3, value=None)]
> # +---+-----+
> # | id|value|
> # +---+-----+
> # |  1|  NaN|
> # |  2| 42.0|
> # |  3| null|
> # +---+-----+
> cdf = self.connect.createDataFrame(data)
> sdf = self.spark.createDataFrame(data)
> print()
> print()
> print(cdf._show_string(100, 100, False))
> print()
> print(cdf.schema)
> print()
> print(sdf._jdf.showString(100, 100, False))
> print()
> print(sdf.schema)
> self.compare_by_show(cdf, sdf)
> {code}
> {code:java}
> +---+-----+
> | id|value|
> +---+-----+
> |  1| null|
> |  2| 42.0|
> |  3| null|
> +---+-----+
> StructType([StructField('id', LongType(), True), StructField('value', 
> DoubleType(), True)])
> +---+-----+
> | id|value|
> +---+-----+
> |  1|  NaN|
> |  2| 42.0|
> |  3| null|
> +---+-----+
> StructType([StructField('id', LongType(), True), StructField('value', 
> DoubleType(), True)])
> {code}
> This issue is due to `createDataFrame` not handling None/NaN properly:
> 1. in the conversion from local data to pd.DataFrame, it automatically 
> converts both None and NaN to NaN;
> 2. then, in the conversion from pd.DataFrame to pa.Table, it always converts 
> NaN to null.
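The two conversion steps can be reproduced outside Spark with plain pandas and 
pyarrow. A minimal standalone sketch of where the None/NaN distinction is lost 
(assuming only that pandas and pyarrow are installed):

{code:python}
import pandas as pd
import pyarrow as pa

rows = [(1, float("nan")), (2, 42.0), (3, None)]

# Step 1: local data -> pd.DataFrame. pandas stores the column as float64
# and represents both NaN and None as NaN, so the distinction is gone here.
pdf = pd.DataFrame(rows, columns=["id", "value"])
print(pdf["value"].tolist())   # [nan, 42.0, nan]

# Step 2: pd.DataFrame -> pa.Table. Arrow follows pandas' NaN-as-missing
# convention by default, so both NaN cells come out as null.
tbl = pa.Table.from_pandas(pdf)
print(tbl.column("value"))     # both row 1 and row 3 print as null
{code}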






[jira] [Updated] (SPARK-41855) `createDataFrame` doesn't handle None/NaN properly

2023-01-02 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-41855:
--
Description: 
{code:python}
data = [Row(id=1, value=float("NaN")), Row(id=2, value=42.0), Row(id=3, 
value=None)]

# +---+-----+
# | id|value|
# +---+-----+
# |  1|  NaN|
# |  2| 42.0|
# |  3| null|
# +---+-----+

cdf = self.connect.createDataFrame(data)
sdf = self.spark.createDataFrame(data)

print()
print()
print(cdf._show_string(100, 100, False))
print()
print(cdf.schema)
print()
print(sdf._jdf.showString(100, 100, False))
print()
print(sdf.schema)

self.compare_by_show(cdf, sdf)
{code}



{code:java}
+---+-----+
| id|value|
+---+-----+
|  1| null|
|  2| 42.0|
|  3| null|
+---+-----+


StructType([StructField('id', LongType(), True), StructField('value', 
DoubleType(), True)])

+---+-----+
| id|value|
+---+-----+
|  1|  NaN|
|  2| 42.0|
|  3| null|
+---+-----+


StructType([StructField('id', LongType(), True), StructField('value', 
DoubleType(), True)])

{code}



This issue is due to `createDataFrame` not handling None/NaN properly:

1. in the conversion from local data to pd.DataFrame, it automatically converts 
None to NaN;
2. then, in the conversion from pd.DataFrame to pa.Table, it always converts NaN 
to null.

  was:

{code:python}
data = [Row(id=1, value=float("NaN")), Row(id=2, value=42.0), Row(id=3, 
value=None)]

# +---+-----+
# | id|value|
# +---+-----+
# |  1|  NaN|
# |  2| 42.0|
# |  3| null|
# +---+-----+

cdf = self.connect.createDataFrame(data)
sdf = self.spark.createDataFrame(data)

print()
print()
print(cdf._show_string(100, 100, False))
print()
print(cdf.schema)
print()
print(sdf._jdf.showString(100, 100, False))
print()
print(sdf.schema)

self.compare_by_show(cdf, sdf)
{code}



{code:java}
+---+-+
| id|value|
+---+-+
|  1| null|
|  2| 42.0|
|  3| null|
+---+-+


StructType([StructField('id', LongType(), True), StructField('value', 
DoubleType(), True)])

+---+-+
| id|value|
+---+-+
|  1|  NaN|
|  2| 42.0|
|  3| null|
+---+-+


StructType([StructField('id', LongType(), True), StructField('value', 
DoubleType(), True)])

{code}



this issue arises because `createDataFrame` can't handle None properly:

1. in the conversion from local data to pd.DataFrame, it automatically converts 
None to NaN
2. then, in the conversion from pd.DataFrame to pa.Table, it always converts NaN 
to null


> `createDataFrame` doesn't handle None/NaN properly
> --
>
> Key: SPARK-41855
> URL: https://issues.apache.org/jira/browse/SPARK-41855
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> {code:python}
> data = [Row(id=1, value=float("NaN")), Row(id=2, value=42.0), 
> Row(id=3, value=None)]
> # +---+-+
> # | id|value|
> # +---+-+
> # |  1|  NaN|
> # |  2| 42.0|
> # |  3| null|
> # +---+-+
> cdf = self.connect.createDataFrame(data)
> sdf = self.spark.createDataFrame(data)
> print()
> print()
> print(cdf._show_string(100, 100, False))
> print()
> print(cdf.schema)
> print()
> print(sdf._jdf.showString(100, 100, False))
> print()
> print(sdf.schema)
> self.compare_by_show(cdf, sdf)
> {code}
> {code:java}
> +---+-+
> | id|value|
> +---+-+
> |  1| null|
> |  2| 42.0|
> |  3| null|
> +---+-+
> StructType([StructField('id', LongType(), True), StructField('value', 
> DoubleType(), True)])
> +---+-+
> | id|value|
> +---+-+
> |  1|  NaN|
> |  2| 42.0|
> |  3| null|
> +---+-+
> StructType([StructField('id', LongType(), True), StructField('value', 
> DoubleType(), True)])
> {code}
> this issue arises because `createDataFrame` can't handle None/NaN properly:
> 1. in the conversion from local data to pd.DataFrame, it automatically 
> converts None to NaN
> 2. then, in the conversion from pd.DataFrame to pa.Table, it always converts 
> NaN to null






[jira] [Updated] (SPARK-41855) `createDataFrame` doesn't handle None/NaN properly

2023-01-02 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-41855:
--
Summary: `createDataFrame` doesn't handle None/NaN properly  (was: 
`createDataFrame` doesn't handle None properly)

> `createDataFrame` doesn't handle None/NaN properly
> --
>
> Key: SPARK-41855
> URL: https://issues.apache.org/jira/browse/SPARK-41855
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> {code:python}
> data = [Row(id=1, value=float("NaN")), Row(id=2, value=42.0), 
> Row(id=3, value=None)]
> # +---+-+
> # | id|value|
> # +---+-+
> # |  1|  NaN|
> # |  2| 42.0|
> # |  3| null|
> # +---+-+
> cdf = self.connect.createDataFrame(data)
> sdf = self.spark.createDataFrame(data)
> print()
> print()
> print(cdf._show_string(100, 100, False))
> print()
> print(cdf.schema)
> print()
> print(sdf._jdf.showString(100, 100, False))
> print()
> print(sdf.schema)
> self.compare_by_show(cdf, sdf)
> {code}
> {code:java}
> +---+-+
> | id|value|
> +---+-+
> |  1| null|
> |  2| 42.0|
> |  3| null|
> +---+-+
> StructType([StructField('id', LongType(), True), StructField('value', 
> DoubleType(), True)])
> +---+-+
> | id|value|
> +---+-+
> |  1|  NaN|
> |  2| 42.0|
> |  3| null|
> +---+-+
> StructType([StructField('id', LongType(), True), StructField('value', 
> DoubleType(), True)])
> {code}
> this issue arises because `createDataFrame` can't handle None properly:
> 1. in the conversion from local data to pd.DataFrame, it automatically 
> converts None to NaN
> 2. then, in the conversion from pd.DataFrame to pa.Table, it always converts 
> NaN to null






[jira] [Created] (SPARK-41855) `createDataFrame` doesn't handle None properly

2023-01-02 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-41855:
-

 Summary: `createDataFrame` doesn't handle None properly
 Key: SPARK-41855
 URL: https://issues.apache.org/jira/browse/SPARK-41855
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng



{code:python}
data = [Row(id=1, value=float("NaN")), Row(id=2, value=42.0), Row(id=3, 
value=None)]

# +---+-+
# | id|value|
# +---+-+
# |  1|  NaN|
# |  2| 42.0|
# |  3| null|
# +---+-+

cdf = self.connect.createDataFrame(data)
sdf = self.spark.createDataFrame(data)

print()
print()
print(cdf._show_string(100, 100, False))
print()
print(cdf.schema)
print()
print(sdf._jdf.showString(100, 100, False))
print()
print(sdf.schema)

self.compare_by_show(cdf, sdf)
{code}



{code:java}
+---+-+
| id|value|
+---+-+
|  1| null|
|  2| 42.0|
|  3| null|
+---+-+


StructType([StructField('id', LongType(), True), StructField('value', 
DoubleType(), True)])

+---+-+
| id|value|
+---+-+
|  1|  NaN|
|  2| 42.0|
|  3| null|
+---+-+


StructType([StructField('id', LongType(), True), StructField('value', 
DoubleType(), True)])

{code}



this issue arises because `createDataFrame` can't handle None properly:

1. in the conversion from local data to pd.DataFrame, it automatically converts 
None to NaN
2. then, in the conversion from pd.DataFrame to pa.Table, it always converts NaN 
to null






[jira] [Commented] (SPARK-41854) Automatic reformat/check python/setup.py

2023-01-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653756#comment-17653756
 ] 

Apache Spark commented on SPARK-41854:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39352

> Automatic reformat/check python/setup.py 
> -
>
> Key: SPARK-41854
> URL: https://issues.apache.org/jira/browse/SPARK-41854
> Project: Spark
>  Issue Type: Test
>  Components: Build, PySpark
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> python/setup.py should also be reformatted via ./dev/reformat-python






[jira] [Assigned] (SPARK-41854) Automatic reformat/check python/setup.py

2023-01-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41854:


Assignee: Apache Spark

> Automatic reformat/check python/setup.py 
> -
>
> Key: SPARK-41854
> URL: https://issues.apache.org/jira/browse/SPARK-41854
> Project: Spark
>  Issue Type: Test
>  Components: Build, PySpark
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> python/setup.py should also be reformatted via ./dev/reformat-python






[jira] [Assigned] (SPARK-41854) Automatic reformat/check python/setup.py

2023-01-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41854:


Assignee: (was: Apache Spark)

> Automatic reformat/check python/setup.py 
> -
>
> Key: SPARK-41854
> URL: https://issues.apache.org/jira/browse/SPARK-41854
> Project: Spark
>  Issue Type: Test
>  Components: Build, PySpark
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> python/setup.py should also be reformatted via ./dev/reformat-python






[jira] [Created] (SPARK-41854) Automatic reformat/check python/setup.py

2023-01-02 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-41854:


 Summary: Automatic reformat/check python/setup.py 
 Key: SPARK-41854
 URL: https://issues.apache.org/jira/browse/SPARK-41854
 Project: Spark
  Issue Type: Test
  Components: Build, PySpark
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon


python/setup.py should also be reformatted via ./dev/reformat-python






[jira] [Commented] (SPARK-39995) PySpark installation doesn't support Scala 2.13 binaries

2023-01-02 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653753#comment-17653753
 ] 

Hyukjin Kwon commented on SPARK-39995:
--

I think I will be able to pick this up before Spark 3.4.

> PySpark installation doesn't support Scala 2.13 binaries
> 
>
> Key: SPARK-39995
> URL: https://issues.apache.org/jira/browse/SPARK-39995
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Oleksandr Shevchenko
>Priority: Major
>
> [PyPI|https://pypi.org/project/pyspark/] doesn't support Spark binary 
> [installation|https://spark.apache.org/docs/latest/api/python/getting_started/install.html#using-pypi]
>  for Scala 2.13.
> Currently, the setup 
> [script|https://github.com/apache/spark/blob/master/python/pyspark/install.py]
>  allows setting the versions of Spark, Hadoop (PYSPARK_HADOOP_VERSION), and 
> mirror (PYSPARK_RELEASE_MIRROR) to download the needed Spark binaries, but 
> they are always Scala 2.12-compatible binaries. There is no parameter to 
> download "spark-3.3.0-bin-hadoop3-scala2.13.tgz".
> It's possible to download Spark manually and set the needed SPARK_HOME, but 
> it's hard to use with pip or Poetry.
> Also, env vars (e.g. PYSPARK_HADOOP_VERSION) are easy to use with pip and CLI 
> but not possible with package managers like Poetry.






[jira] [Commented] (SPARK-39995) PySpark installation doesn't support Scala 2.13 binaries

2023-01-02 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653752#comment-17653752
 ] 

Hyukjin Kwon commented on SPARK-39995:
--

For:

{quote}
Also, env vars (e.g. PYSPARK_HADOOP_VERSION) are easy to use with pip and CLI 
but not possible with package managers like Poetry.
{quote}

We can't do this because of an issue in pip itself; see SPARK-32837

> PySpark installation doesn't support Scala 2.13 binaries
> 
>
> Key: SPARK-39995
> URL: https://issues.apache.org/jira/browse/SPARK-39995
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Oleksandr Shevchenko
>Priority: Major
>
> [PyPI|https://pypi.org/project/pyspark/] doesn't support Spark binary 
> [installation|https://spark.apache.org/docs/latest/api/python/getting_started/install.html#using-pypi]
>  for Scala 2.13.
> Currently, the setup 
> [script|https://github.com/apache/spark/blob/master/python/pyspark/install.py]
>  allows setting the versions of Spark, Hadoop (PYSPARK_HADOOP_VERSION), and 
> mirror (PYSPARK_RELEASE_MIRROR) to download the needed Spark binaries, but 
> they are always Scala 2.12-compatible binaries. There is no parameter to 
> download "spark-3.3.0-bin-hadoop3-scala2.13.tgz".
> It's possible to download Spark manually and set the needed SPARK_HOME, but 
> it's hard to use with pip or Poetry.
> Also, env vars (e.g. PYSPARK_HADOOP_VERSION) are easy to use with pip and CLI 
> but not possible with package managers like Poetry.






[jira] [Commented] (SPARK-41852) Fix `pmod` function

2023-01-02 Thread Sandeep Singh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653750#comment-17653750
 ] 

Sandeep Singh commented on SPARK-41852:
---

[~podongfeng] these are from the doctests:
{code:java}
>>> from pyspark.sql.functions import pmod
>>> df = spark.createDataFrame([
... (1.0, float('nan')), (float('nan'), 2.0), (10.0, 3.0),
... (float('nan'), float('nan')), (-3.0, 4.0), (-10.0, 3.0),
... (-5.0, -6.0), (7.0, -8.0), (1.0, 2.0)],
... ("a", "b"))
>>> df.select(pmod("a", "b")).show() {code}

> Fix `pmod` function
> ---
>
> Key: SPARK-41852
> URL: https://issues.apache.org/jira/browse/SPARK-41852
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 622, in pyspark.sql.connect.functions.pmod
> Failed example:
>     df.select(pmod("a", "b")).show()
> Expected:
>     +--+
>     |pmod(a, b)|
>     +--+
>     |       NaN|
>     |       NaN|
>     |       1.0|
>     |       NaN|
>     |       1.0|
>     |       2.0|
>     |      -5.0|
>     |       7.0|
>     |       1.0|
>     +--+
> Got:
>     +--+
>     |pmod(a, b)|
>     +--+
>     |      null|
>     |      null|
>     |       1.0|
>     |      null|
>     |       1.0|
>     |       2.0|
>     |      -5.0|
>     |       7.0|
>     |       1.0|
>     +--+
>     {code}
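For reference, the "Expected" column follows a positive-modulus rule
(r = a % b, then (r + b) % b when r is negative, with NaN propagating),
matching the behavior shown above. A plain-Python sketch of that rule, not
Spark code (the helper name `pmod` here is ours):

{code:python}
import math

def pmod(a: float, b: float) -> float:
    # NaN in either operand propagates, matching the Expected column
    if math.isnan(a) or math.isnan(b):
        return float("nan")
    r = math.fmod(a, b)  # C-style remainder; sign follows the dividend
    return math.fmod(r + b, b) if r < 0 else r

pairs = [(1.0, float("nan")), (float("nan"), 2.0), (10.0, 3.0),
         (float("nan"), float("nan")), (-3.0, 4.0), (-10.0, 3.0),
         (-5.0, -6.0), (7.0, -8.0), (1.0, 2.0)]
print([pmod(a, b) for a, b in pairs])
# [nan, nan, 1.0, nan, 1.0, 2.0, -5.0, 7.0, 1.0]  <- the Expected column
{code}

Note that the Connect failure above returns null exactly on the rows where a
NaN operand went through createDataFrame.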






[jira] [Commented] (SPARK-41851) Fix `nanvl` function

2023-01-02 Thread Sandeep Singh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653751#comment-17653751
 ] 

Sandeep Singh commented on SPARK-41851:
---

[~podongfeng] 
{code:java}
>>> df = spark.createDataFrame([(1.0, float('nan')), (float('nan'), 2.0)],
...                            ("a", "b"))
>>> df.select(nanvl("a", "b").alias("r1"),
...           nanvl(df.a, df.b).alias("r2")).collect() {code}

> Fix `nanvl` function
> 
>
> Key: SPARK-41851
> URL: https://issues.apache.org/jira/browse/SPARK-41851
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 313, in pyspark.sql.connect.functions.nanvl
> Failed example:
>     df.select(nanvl("a", "b").alias("r1"), nanvl(df.a, 
> df.b).alias("r2")).collect()
> Expected:
>     [Row(r1=1.0, r2=1.0), Row(r1=2.0, r2=2.0)]
> Got:
>     [Row(r1=1.0, r2=1.0), Row(r1=nan, r2=nan)]{code}
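nanvl's contract is "return the first column if it is not NaN, otherwise the
second". A plain-Python sketch of that contract against the two rows above
(not Spark code):

{code:python}
import math

def nanvl(a: float, b: float) -> float:
    # fall back to b only when a is NaN
    return b if math.isnan(a) else a

rows = [(1.0, float("nan")), (float("nan"), 2.0)]
print([nanvl(a, b) for a, b in rows])  # [1.0, 2.0] -- the Expected r1/r2 values
{code}

The Got row `Row(r1=nan, r2=nan)` is consistent with the NaN input having
become null on the Connect path: nanvl passes a null through untouched, and
toPandas then renders that null as nan.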






[jira] [Commented] (SPARK-41815) Column.isNull returns nan instead of None

2023-01-02 Thread Ruifeng Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653748#comment-17653748
 ] 

Ruifeng Zheng commented on SPARK-41815:
---

similar to the issue in `createDataFrame`: 
https://issues.apache.org/jira/browse/SPARK-41814



> Column.isNull returns nan instead of None
> -
>
> Key: SPARK-41815
> URL: https://issues.apache.org/jira/browse/SPARK-41815
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> File "/.../spark/python/pyspark/sql/connect/column.py", line 99, in 
> pyspark.sql.connect.column.Column.isNull
> Failed example:
> df.filter(df.height.isNull()).collect()
> Expected:
> [Row(name='Alice', height=None)]
> Got:
> [Row(name='Alice', height=nan)]
> {code}






[jira] [Comment Edited] (SPARK-41814) Column.eqNullSafe fails on NaN comparison

2023-01-02 Thread Ruifeng Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653736#comment-17653736
 ] 

Ruifeng Zheng edited comment on SPARK-41814 at 1/3/23 3:12 AM:
---

this issue arises because `createDataFrame` can't handle NaN/None properly:
1. the conversion from rows to pd.DataFrame automatically converts None 
to NaN
2. then the conversion from pd.DataFrame to pa.Table converts NaN to null


was (Author: podongfeng):
this issue arises because `createDataFrame` can't handle NaN/None properly:
1. the conversion from rows to pd.DataFrame automatically converts null 
to NaN
2. then the conversion from pd.DataFrame to pa.Table converts NaN to null

> Column.eqNullSafe fails on NaN comparison
> -
>
> Key: SPARK-41814
> URL: https://issues.apache.org/jira/browse/SPARK-41814
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> File "/.../spark/python/pyspark/sql/connect/column.py", line 115, in 
> pyspark.sql.connect.column.Column.eqNullSafe
> Failed example:
> df2.select(
> df2['value'].eqNullSafe(None),
> df2['value'].eqNullSafe(float('NaN')),
> df2['value'].eqNullSafe(42.0)
> ).show()
> Expected:
> ++---++
> |(value <=> NULL)|(value <=> NaN)|(value <=> 42.0)|
> ++---++
> |   false|   true|   false|
> |   false|  false|true|
> |true|  false|   false|
> ++---++
> Got:
> ++---++
> |(value <=> NULL)|(value <=> NaN)|(value <=> 42.0)|
> ++---++
> |true|  false|   false|
> |   false|  false|true|
> |true|  false|   false|
> ++---++
> {code}
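For contrast, the Expected table is what classic PySpark produces. A sketch
(assuming a plain SparkSession bound to `spark`, with `df2` reconstructed from
the expected output: one NaN row, one 42.0 row, one NULL row):

{code:python}
from pyspark.sql import Row

df2 = spark.createDataFrame(
    [Row(value=float("NaN")), Row(value=42.0), Row(value=None)]
)
df2.select(
    df2["value"].eqNullSafe(None),          # true only on the NULL row
    df2["value"].eqNullSafe(float("NaN")),  # true only on the NaN row
    df2["value"].eqNullSafe(42.0),          # true only on the 42.0 row
).show()
{code}

In the Got table the NaN row answers true to `<=> NULL`, i.e. the NaN was
already a null by the time it reached the server.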






[jira] [Commented] (SPARK-41851) Fix `nanvl` function

2023-01-02 Thread Ruifeng Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653746#comment-17653746
 ] 

Ruifeng Zheng commented on SPARK-41851:
---

Could you please also provide the code used to create the dataframe?

A known issue is that `session.createDataFrame` doesn't handle NaN/None 
correctly.

https://issues.apache.org/jira/browse/SPARK-41814

> Fix `nanvl` function
> 
>
> Key: SPARK-41851
> URL: https://issues.apache.org/jira/browse/SPARK-41851
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 313, in pyspark.sql.connect.functions.nanvl
> Failed example:
>     df.select(nanvl("a", "b").alias("r1"), nanvl(df.a, 
> df.b).alias("r2")).collect()
> Expected:
>     [Row(r1=1.0, r2=1.0), Row(r1=2.0, r2=2.0)]
> Got:
>     [Row(r1=1.0, r2=1.0), Row(r1=nan, r2=nan)]{code}






[jira] [Comment Edited] (SPARK-41814) Column.eqNullSafe fails on NaN comparison

2023-01-02 Thread Ruifeng Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653736#comment-17653736
 ] 

Ruifeng Zheng edited comment on SPARK-41814 at 1/3/23 3:06 AM:
---

this issue arises because `createDataFrame` can't handle NaN/None properly:
1. the conversion from rows to pd.DataFrame automatically converts null 
to NaN
2. then the conversion from pd.DataFrame to pa.Table converts NaN to null


was (Author: podongfeng):
this issue is due to:
1. the conversion from rows to pd.DataFrame, which automatically converts null 
to NaN
2. then the conversion from pd.DataFrame to pa.Table, which converts NaN to null

> Column.eqNullSafe fails on NaN comparison
> -
>
> Key: SPARK-41814
> URL: https://issues.apache.org/jira/browse/SPARK-41814
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> File "/.../spark/python/pyspark/sql/connect/column.py", line 115, in 
> pyspark.sql.connect.column.Column.eqNullSafe
> Failed example:
> df2.select(
> df2['value'].eqNullSafe(None),
> df2['value'].eqNullSafe(float('NaN')),
> df2['value'].eqNullSafe(42.0)
> ).show()
> Expected:
> ++---++
> |(value <=> NULL)|(value <=> NaN)|(value <=> 42.0)|
> ++---++
> |   false|   true|   false|
> |   false|  false|true|
> |true|  false|   false|
> ++---++
> Got:
> ++---++
> |(value <=> NULL)|(value <=> NaN)|(value <=> 42.0)|
> ++---++
> |true|  false|   false|
> |   false|  false|true|
> |true|  false|   false|
> ++---++
> {code}






[jira] [Commented] (SPARK-41852) Fix `pmod` function

2023-01-02 Thread Ruifeng Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653743#comment-17653743
 ] 

Ruifeng Zheng commented on SPARK-41852:
---

Could you please also provide the code used to create the dataframe?

A known issue is that `session.createDataFrame` doesn't handle NaN/None 
correctly.

> Fix `pmod` function
> ---
>
> Key: SPARK-41852
> URL: https://issues.apache.org/jira/browse/SPARK-41852
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 622, in pyspark.sql.connect.functions.pmod
> Failed example:
>     df.select(pmod("a", "b")).show()
> Expected:
>     +--+
>     |pmod(a, b)|
>     +--+
>     |       NaN|
>     |       NaN|
>     |       1.0|
>     |       NaN|
>     |       1.0|
>     |       2.0|
>     |      -5.0|
>     |       7.0|
>     |       1.0|
>     +--+
> Got:
>     +--+
>     |pmod(a, b)|
>     +--+
>     |      null|
>     |      null|
>     |       1.0|
>     |      null|
>     |       1.0|
>     |       2.0|
>     |      -5.0|
>     |       7.0|
>     |       1.0|
>     +--+
>     {code}






[jira] [Commented] (SPARK-41853) Use Map in place of SortedMap for ErrorClassesJsonReader

2023-01-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653741#comment-17653741
 ] 

Apache Spark commented on SPARK-41853:
--

User 'tedyu' has created a pull request for this issue:
https://github.com/apache/spark/pull/39351

> Use Map in place of SortedMap for ErrorClassesJsonReader
> 
>
> Key: SPARK-41853
> URL: https://issues.apache.org/jira/browse/SPARK-41853
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 3.2.3
>Reporter: Ted Yu
>Priority: Minor
>
> The use of SortedMap in ErrorClassesJsonReader was mostly for making tests 
> easier to write.
> This PR replaces SortedMap with Map, since SortedMap is slower than Map.






[jira] [Updated] (SPARK-41847) DataFrame mapfield,structlist invalid type

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41847:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1270, in pyspark.sql.connect.functions.explode
Failed example:
    eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: 
[INVALID_COLUMN_OR_FIELD_DATA_TYPE] Column or field `mapfield` is of type 
"STRUCT" while it's required to be "MAP".
    Plan:  {code}
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1364, in pyspark.sql.connect.functions.inline
Failed example:
    df.select(inline(df.structlist)).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df.select(inline(df.structlist)).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: 
[INVALID_COLUMN_OR_FIELD_DATA_TYPE] Column or field `structlist`.`element` is 
of type "ARRAY" while it's required to be "STRUCT".
    Plan:  {code}
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1411, in pyspark.sql.connect.functions.map_filter
Failed example:
    df.select(map_filter(
        "data", lambda _, v: v > 30.0).alias("data_filtered")
    ).show(truncate=False)
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df.select(map_filter(
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
 

[jira] [Commented] (SPARK-41853) Use Map in place of SortedMap for ErrorClassesJsonReader

2023-01-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653740#comment-17653740
 ] 

Apache Spark commented on SPARK-41853:
--

User 'tedyu' has created a pull request for this issue:
https://github.com/apache/spark/pull/39351

> Use Map in place of SortedMap for ErrorClassesJsonReader
> 
>
> Key: SPARK-41853
> URL: https://issues.apache.org/jira/browse/SPARK-41853
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 3.2.3
>Reporter: Ted Yu
>Priority: Minor
>
> The use of SortedMap in ErrorClassesJsonReader was mostly for making tests 
> easier to write.
> This PR replaces SortedMap with Map, since SortedMap is slower than Map.






[jira] [Assigned] (SPARK-41853) Use Map in place of SortedMap for ErrorClassesJsonReader

2023-01-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41853:


Assignee: Apache Spark

> Use Map in place of SortedMap for ErrorClassesJsonReader
> 
>
> Key: SPARK-41853
> URL: https://issues.apache.org/jira/browse/SPARK-41853
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 3.2.3
>Reporter: Ted Yu
>Assignee: Apache Spark
>Priority: Minor
>
> The use of SortedMap in ErrorClassesJsonReader was mostly for making tests 
> easier to write.
> This PR replaces SortedMap with Map, since SortedMap is slower than Map.






[jira] [Assigned] (SPARK-41853) Use Map in place of SortedMap for ErrorClassesJsonReader

2023-01-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41853:


Assignee: (was: Apache Spark)

> Use Map in place of SortedMap for ErrorClassesJsonReader
> 
>
> Key: SPARK-41853
> URL: https://issues.apache.org/jira/browse/SPARK-41853
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 3.2.3
>Reporter: Ted Yu
>Priority: Minor
>
> The use of SortedMap in ErrorClassesJsonReader was mostly for making tests 
> easier to write.
> This PR replaces SortedMap with Map, since SortedMap is slower than Map.






[jira] [Created] (SPARK-41853) Use Map in place of SortedMap for ErrorClassesJsonReader

2023-01-02 Thread Ted Yu (Jira)
Ted Yu created SPARK-41853:
--

 Summary: Use Map in place of SortedMap for ErrorClassesJsonReader
 Key: SPARK-41853
 URL: https://issues.apache.org/jira/browse/SPARK-41853
 Project: Spark
  Issue Type: Task
  Components: Spark Core
Affects Versions: 3.2.3
Reporter: Ted Yu


The use of SortedMap in ErrorClassesJsonReader was mostly for making tests 
easier to write.

This PR replaces SortedMap with Map, since SortedMap is slower than Map.






[jira] [Created] (SPARK-41852) Fix `pmod` function

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41852:
-

 Summary: Fix `pmod` function
 Key: SPARK-41852
 URL: https://issues.apache.org/jira/browse/SPARK-41852
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Sandeep Singh
 Fix For: 3.4.0


{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 313, in pyspark.sql.connect.functions.nanvl
Failed example:
    df.select(nanvl("a", "b").alias("r1"), nanvl(df.a, 
df.b).alias("r2")).collect()
Expected:
    [Row(r1=1.0, r2=1.0), Row(r1=2.0, r2=2.0)]
Got:
    [Row(r1=1.0, r2=1.0), Row(r1=nan, r2=nan)]{code}






[jira] [Updated] (SPARK-41852) Fix `pmod` function

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41852:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 622, in pyspark.sql.connect.functions.pmod
Failed example:
    df.select(pmod("a", "b")).show()
Expected:
    +--+
    |pmod(a, b)|
    +--+
    |       NaN|
    |       NaN|
    |       1.0|
    |       NaN|
    |       1.0|
    |       2.0|
    |      -5.0|
    |       7.0|
    |       1.0|
    +--+
Got:
    +--+
    |pmod(a, b)|
    +--+
    |      null|
    |      null|
    |       1.0|
    |      null|
    |       1.0|
    |       2.0|
    |      -5.0|
    |       7.0|
    |       1.0|
    +--+
    {code}

  was:
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 313, in pyspark.sql.connect.functions.nanvl
Failed example:
    df.select(nanvl("a", "b").alias("r1"), nanvl(df.a, 
df.b).alias("r2")).collect()
Expected:
    [Row(r1=1.0, r2=1.0), Row(r1=2.0, r2=2.0)]
Got:
    [Row(r1=1.0, r2=1.0), Row(r1=nan, r2=nan)]{code}


> Fix `pmod` function
> ---
>
> Key: SPARK-41852
> URL: https://issues.apache.org/jira/browse/SPARK-41852
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 622, in pyspark.sql.connect.functions.pmod
> Failed example:
>     df.select(pmod("a", "b")).show()
> Expected:
>     +--+
>     |pmod(a, b)|
>     +--+
>     |       NaN|
>     |       NaN|
>     |       1.0|
>     |       NaN|
>     |       1.0|
>     |       2.0|
>     |      -5.0|
>     |       7.0|
>     |       1.0|
>     +--+
> Got:
>     +--+
>     |pmod(a, b)|
>     +--+
>     |      null|
>     |      null|
>     |       1.0|
>     |      null|
>     |       1.0|
>     |       2.0|
>     |      -5.0|
>     |       7.0|
>     |       1.0|
>     +--+
>     {code}






[jira] [Updated] (SPARK-41848) Tasks are over-scheduled with TaskResourceProfile

2023-01-02 Thread wuyi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuyi updated SPARK-41848:
-
Priority: Blocker  (was: Major)

> Tasks are over-scheduled with TaskResourceProfile
> -
>
> Key: SPARK-41848
> URL: https://issues.apache.org/jira/browse/SPARK-41848
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: wuyi
>Priority: Blocker
>
> {code:java}
> test("SPARK-XXX") {
>   val conf = new 
> SparkConf().setAppName("test").setMaster("local-cluster[1,4,1024]")
>   sc = new SparkContext(conf)
>   val req = new TaskResourceRequests().cpus(3)
>   val rp = new ResourceProfileBuilder().require(req).build()
>   val res = sc.parallelize(Seq(0, 1), 2).withResources(rp).map { x =>
> Thread.sleep(5000)
> x * 2
>   }.collect()
>   assert(res === Array(0, 2))
> } {code}
> In this test, tasks are supposed to be scheduled sequentially, since each 
> task requires 3 cores and the executor only has 4 cores. However, the logs 
> show that 2 tasks are launched concurrently.
> It turns out that we used the TaskResourceProfile (taskCpus=3) of the taskset 
> for task scheduling:
> {code:java}
> val rpId = taskSet.taskSet.resourceProfileId
> val taskSetProf = sc.resourceProfileManager.resourceProfileFromId(rpId)
> val taskCpus = ResourceProfile.getTaskCpusOrDefaultForProfile(taskSetProf, 
> conf) {code}
> but the ResourceProfile (taskCpus=1) of the executor for updating the free 
> cores in ExecutorData:
> {code:java}
> val rpId = executorData.resourceProfileId
> val prof = scheduler.sc.resourceProfileManager.resourceProfileFromId(rpId)
> val taskCpus = ResourceProfile.getTaskCpusOrDefaultForProfile(prof, conf)
> executorData.freeCores -= taskCpus {code}
> which makes the accounting of available cores inconsistent.
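The effect of mixing the two profiles is easy to see in a toy model of the
bookkeeping (a deliberately simplified sketch, not Spark code; all names here
are illustrative):

{code:python}
EXECUTOR_CORES = 4
taskset_profile_cpus = 3   # TaskResourceProfile: used for the scheduling check
executor_profile_cpus = 1  # executor's ResourceProfile: used to debit freeCores

free_cores = EXECUTOR_CORES
launched = 0
while free_cores >= taskset_profile_cpus:  # "does this executor have room?"
    launched += 1
    free_cores -= executor_profile_cpus    # bug: debited with the wrong profile
print(launched)  # 2 -- two 3-core tasks launch concurrently on 4 cores
{code}

With a consistent profile (debiting 3 cores per task), only one task would fit
at a time.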






[jira] [Commented] (SPARK-41848) Tasks are over-scheduled with TaskResourceProfile

2023-01-02 Thread wuyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653739#comment-17653739
 ] 

wuyi commented on SPARK-41848:
--

cc [~ivoson] 

> Tasks are over-scheduled with TaskResourceProfile
> -
>
> Key: SPARK-41848
> URL: https://issues.apache.org/jira/browse/SPARK-41848
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: wuyi
>Priority: Major
>
> {code:java}
> test("SPARK-XXX") {
>   val conf = new 
> SparkConf().setAppName("test").setMaster("local-cluster[1,4,1024]")
>   sc = new SparkContext(conf)
>   val req = new TaskResourceRequests().cpus(3)
>   val rp = new ResourceProfileBuilder().require(req).build()
>   val res = sc.parallelize(Seq(0, 1), 2).withResources(rp).map { x =>
> Thread.sleep(5000)
> x * 2
>   }.collect()
>   assert(res === Array(0, 2))
> } {code}
> In this test, tasks are supposed to be scheduled sequentially, since each 
> task requires 3 cores and the executor only has 4 cores. However, the logs 
> show that 2 tasks are launched concurrently.
> It turns out that we used the TaskResourceProfile (taskCpus=3) of the taskset 
> for task scheduling:
> {code:java}
> val rpId = taskSet.taskSet.resourceProfileId
> val taskSetProf = sc.resourceProfileManager.resourceProfileFromId(rpId)
> val taskCpus = ResourceProfile.getTaskCpusOrDefaultForProfile(taskSetProf, 
> conf) {code}
> but the ResourceProfile (taskCpus=1) of the executor for updating the free 
> cores in ExecutorData:
> {code:java}
> val rpId = executorData.resourceProfileId
> val prof = scheduler.sc.resourceProfileManager.resourceProfileFromId(rpId)
> val taskCpus = ResourceProfile.getTaskCpusOrDefaultForProfile(prof, conf)
> executorData.freeCores -= taskCpus {code}
> which makes the accounting of available cores inconsistent.






[jira] [Created] (SPARK-41851) Fix `nanvl` function

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41851:
-

 Summary: Fix `nanvl` function
 Key: SPARK-41851
 URL: https://issues.apache.org/jira/browse/SPARK-41851
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Sandeep Singh
 Fix For: 3.4.0


{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 801, in pyspark.sql.connect.functions.count
Failed example:
    df.select(count(expr("*")), count(df.alphabets)).show()
Expected:
    +++
    |count(1)|count(alphabets)|
    +++
    |       4|               3|
    +++
Got:
    +++
    |count(alphabets)|count(alphabets)|
    +++
    |               3|               3|
    +++
     {code}






[jira] [Updated] (SPARK-41851) Fix `nanvl` function

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41851:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 313, in pyspark.sql.connect.functions.nanvl
Failed example:
    df.select(nanvl("a", "b").alias("r1"), nanvl(df.a, 
df.b).alias("r2")).collect()
Expected:
    [Row(r1=1.0, r2=1.0), Row(r1=2.0, r2=2.0)]
Got:
    [Row(r1=1.0, r2=1.0), Row(r1=nan, r2=nan)]{code}

  was:
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 801, in pyspark.sql.connect.functions.count
Failed example:
    df.select(count(expr("*")), count(df.alphabets)).show()
Expected:
    +++
    |count(1)|count(alphabets)|
    +++
    |       4|               3|
    +++
Got:
    +++
    |count(alphabets)|count(alphabets)|
    +++
    |               3|               3|
    +++
     {code}


> Fix `nanvl` function
> 
>
> Key: SPARK-41851
> URL: https://issues.apache.org/jira/browse/SPARK-41851
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 313, in pyspark.sql.connect.functions.nanvl
> Failed example:
>     df.select(nanvl("a", "b").alias("r1"), nanvl(df.a, 
> df.b).alias("r2")).collect()
> Expected:
>     [Row(r1=1.0, r2=1.0), Row(r1=2.0, r2=2.0)]
> Got:
>     [Row(r1=1.0, r2=1.0), Row(r1=nan, r2=nan)]{code}






[jira] [Updated] (SPARK-41847) DataFrame mapfield,structlist invalid type

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41847:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1270, in pyspark.sql.connect.functions.explode
Failed example:
    eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: 
[INVALID_COLUMN_OR_FIELD_DATA_TYPE] Column or field `mapfield` is of type 
"STRUCT" while it's required to be "MAP".
    Plan:  {code}
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1364, in pyspark.sql.connect.functions.inline
Failed example:
    df.select(inline(df.structlist)).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df.select(inline(df.structlist)).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: 
[INVALID_COLUMN_OR_FIELD_DATA_TYPE] Column or field `structlist`.`element` is 
of type "ARRAY" while it's required to be "STRUCT".
    Plan:  {code}
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1411, in pyspark.sql.connect.functions.map_filter
Failed example:
    df.select(map_filter(
        "data", lambda _, v: v > 30.0).alias("data_filtered")
    ).show(truncate=False)
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df.select(map_filter(
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
 

[jira] [Commented] (SPARK-41850) Fix `isnan` function

2023-01-02 Thread Sandeep Singh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653738#comment-17653738
 ] 

Sandeep Singh commented on SPARK-41850:
---

This should be moved under SPARK-41283

> Fix `isnan` function
> 
>
> Key: SPARK-41850
> URL: https://issues.apache.org/jira/browse/SPARK-41850
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 288, in pyspark.sql.connect.functions.isnan
> Failed example:
>     df.select("a", "b", isnan("a").alias("r1"), 
> isnan(df.b).alias("r2")).show()
> Expected:
>     +---+---+-+-+
>     |  a|  b|   r1|   r2|
>     +---+---+-+-+
>     |1.0|NaN|false| true|
>     |NaN|2.0| true|false|
>     +---+---+-+-+
> Got:
>     +++-+-+
>     |   a|   b|   r1|   r2|
>     +++-+-+
>     | 1.0|null|false|false|
>     |null| 2.0|false|false|
>     +++-+-+
>     {code}
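For context, the failing doctest builds its input the same way as the nanvl
example in SPARK-41851 (reconstructed here; a sketch assuming a classic
SparkSession bound to `spark`):

{code:python}
from pyspark.sql.functions import isnan

# one NaN in each column, so r1/r2 should flag exactly one row each
df = spark.createDataFrame([(1.0, float("nan")), (float("nan"), 2.0)], ("a", "b"))
df.select("a", "b", isnan("a").alias("r1"), isnan(df.b).alias("r2")).show()
{code}

The Got table shows the NaNs arriving as nulls, so isnan returns false
everywhere -- the same createDataFrame None/NaN issue again.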






[jira] [Updated] (SPARK-41850) Fix `isnan` function

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41850:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 288, in pyspark.sql.connect.functions.isnan
Failed example:
    df.select("a", "b", isnan("a").alias("r1"), isnan(df.b).alias("r2")).show()
Expected:
    +---+---+-+-+
    |  a|  b|   r1|   r2|
    +---+---+-+-+
    |1.0|NaN|false| true|
    |NaN|2.0| true|false|
    +---+---+-+-+
Got:
    +++-+-+
    |   a|   b|   r1|   r2|
    +++-+-+
    | 1.0|null|false|false|
    |null| 2.0|false|false|
    +++-+-+
    {code}

  was:
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 276, in pyspark.sql.connect.functions.input_file_name
Failed example:
    df = spark.read.text(path)
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 
1, in 
        df = spark.read.text(path)
    AttributeError: 'DataFrameReader' object has no attribute 'text'{code}


> Fix `isnan` function
> 
>
> Key: SPARK-41850
> URL: https://issues.apache.org/jira/browse/SPARK-41850
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 288, in pyspark.sql.connect.functions.isnan
> Failed example:
>     df.select("a", "b", isnan("a").alias("r1"), 
> isnan(df.b).alias("r2")).show()
> Expected:
>     +---+---+-+-+
>     |  a|  b|   r1|   r2|
>     +---+---+-+-+
>     |1.0|NaN|false| true|
>     |NaN|2.0| true|false|
>     +---+---+-+-+
> Got:
>     +++-+-+
>     |   a|   b|   r1|   r2|
>     +++-+-+
>     | 1.0|null|false|false|
>     |null| 2.0|false|false|
>     +++-+-+
>     {code}






[jira] [Updated] (SPARK-41850) Fix `isnan` function

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41850:
--
Summary: Fix `isnan` function  (was: Fix DataFrameReader.isnan)

> Fix `isnan` function
> 
>
> Key: SPARK-41850
> URL: https://issues.apache.org/jira/browse/SPARK-41850
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 276, in pyspark.sql.connect.functions.input_file_name
> Failed example:
>     df = spark.read.text(path)
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 
> 1, in 
>         df = spark.read.text(path)
>     AttributeError: 'DataFrameReader' object has no attribute 'text'{code}






[jira] [Created] (SPARK-41850) Fix DataFrameReader.isnan

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41850:
-

 Summary: Fix DataFrameReader.isnan
 Key: SPARK-41850
 URL: https://issues.apache.org/jira/browse/SPARK-41850
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Sandeep Singh


{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 276, in pyspark.sql.connect.functions.input_file_name
Failed example:
    df = spark.read.text(path)
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 
1, in 
        df = spark.read.text(path)
    AttributeError: 'DataFrameReader' object has no attribute 'text'{code}






[jira] [Commented] (SPARK-41814) Column.eqNullSafe fails on NaN comparison

2023-01-02 Thread Ruifeng Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653736#comment-17653736
 ] 

Ruifeng Zheng commented on SPARK-41814:
---

this issue is due to:
1. the conversion from rows to pd.DataFrame, which automatically converts null 
to NaN
2. then the conversion from pd.DataFrame to pa.Table, which converts NaN to null

> Column.eqNullSafe fails on NaN comparison
> -
>
> Key: SPARK-41814
> URL: https://issues.apache.org/jira/browse/SPARK-41814
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> File "/.../spark/python/pyspark/sql/connect/column.py", line 115, in 
> pyspark.sql.connect.column.Column.eqNullSafe
> Failed example:
> df2.select(
> df2['value'].eqNullSafe(None),
> df2['value'].eqNullSafe(float('NaN')),
> df2['value'].eqNullSafe(42.0)
> ).show()
> Expected:
> +----------------+---------------+----------------+
> |(value <=> NULL)|(value <=> NaN)|(value <=> 42.0)|
> +----------------+---------------+----------------+
> |           false|           true|           false|
> |           false|          false|            true|
> |            true|          false|           false|
> +----------------+---------------+----------------+
> Got:
> +----------------+---------------+----------------+
> |(value <=> NULL)|(value <=> NaN)|(value <=> 42.0)|
> +----------------+---------------+----------------+
> |            true|          false|           false|
> |           false|          false|            true|
> |            true|          false|           false|
> +----------------+---------------+----------------+
> {code}






[jira] [Created] (SPARK-41849) Implement DataFrameReader.text

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41849:
-

 Summary: Implement DataFrameReader.text
 Key: SPARK-41849
 URL: https://issues.apache.org/jira/browse/SPARK-41849
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Sandeep Singh


{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1270, in pyspark.sql.connect.functions.explode
Failed example:
    eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: 
[INVALID_COLUMN_OR_FIELD_DATA_TYPE] Column or field `mapfield` is of type 
"STRUCT" while it's required to be "MAP".
    Plan:  {code}
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1364, in pyspark.sql.connect.functions.inline
Failed example:
    df.select(inline(df.structlist)).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df.select(inline(df.structlist)).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: 
[INVALID_COLUMN_OR_FIELD_DATA_TYPE] Column or field `structlist`.`element` is 
of type "ARRAY" while it's required to be "STRUCT".
    Plan:  {code}






[jira] [Updated] (SPARK-41849) Implement DataFrameReader.text

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41849:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 276, in pyspark.sql.connect.functions.input_file_name
Failed example:
    df = spark.read.text(path)
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 
1, in 
        df = spark.read.text(path)
    AttributeError: 'DataFrameReader' object has no attribute 'text'{code}

  was:
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1270, in pyspark.sql.connect.functions.explode
Failed example:
    eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: 
[INVALID_COLUMN_OR_FIELD_DATA_TYPE] Column or field `mapfield` is of type 
"STRUCT" while it's required to be "MAP".
    Plan:  {code}
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1364, in pyspark.sql.connect.functions.inline
Failed example:
    df.select(inline(df.structlist)).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df.select(inline(df.structlist)).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: 
[INVALID_COLUMN_OR_FIELD_DATA_TYPE] Column or field `structlist`.`element` is 
of type "ARRAY" while it's required to be "STRUCT".
    Plan:  {code}


> Implement DataFrameReader.text
> --
>
> Key: SPARK-41849
> URL: https://issues.apache.org/jira/browse/SPARK-41849
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 276, in pyspark.sql.connect.functions.input_file_name
> Failed example:
>     df = spark.read.text(path)
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  

[jira] [Created] (SPARK-41848) Tasks are over-scheduled with TaskResourceProfile

2023-01-02 Thread wuyi (Jira)
wuyi created SPARK-41848:


 Summary: Tasks are over-scheduled with TaskResourceProfile
 Key: SPARK-41848
 URL: https://issues.apache.org/jira/browse/SPARK-41848
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: wuyi


{code:java}
test("SPARK-XXX") {
  val conf = new 
SparkConf().setAppName("test").setMaster("local-cluster[1,4,1024]")
  sc = new SparkContext(conf)
  val req = new TaskResourceRequests().cpus(3)
  val rp = new ResourceProfileBuilder().require(req).build()

  val res = sc.parallelize(Seq(0, 1), 2).withResources(rp).map { x =>
Thread.sleep(5000)
x * 2
  }.collect()
  assert(res === Array(0, 2))
} {code}
In this test, tasks are supposed to be scheduled one at a time, since each task 
requires 3 cores and the executor only has 4. However, the logs show that 2 tasks 
are launched concurrently.

It turns out that the scheduler uses the task set's TaskResourceProfile 
(taskCpus=3) when deciding whether a task can be launched:
{code:java}
val rpId = taskSet.taskSet.resourceProfileId
val taskSetProf = sc.resourceProfileManager.resourceProfileFromId(rpId)
val taskCpus = ResourceProfile.getTaskCpusOrDefaultForProfile(taskSetProf, 
conf) {code}
but the executor's ResourceProfile (taskCpus=1) when updating the free 
cores in ExecutorData:
{code:java}
val rpId = executorData.resourceProfileId
val prof = scheduler.sc.resourceProfileManager.resourceProfileFromId(rpId)
val taskCpus = ResourceProfile.getTaskCpusOrDefaultForProfile(prof, conf)
executorData.freeCores -= taskCpus {code}
which leaves the two views of the available cores inconsistent.
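
A toy walk-through of the numbers above (a sketch of the accounting bug, not Spark code; the variable names are illustrative):

{code:python}
free_cores = 4             # one executor with 4 cores
scheduling_task_cpus = 3   # gate: from the task set's TaskResourceProfile
accounting_task_cpus = 1   # debit: from the executor's default ResourceProfile

launched = 0
while free_cores >= scheduling_task_cpus:
    free_cores -= accounting_task_cpus  # wrong debit: should subtract 3
    launched += 1

print(launched)  # 2: both tasks launch concurrently on a 4-core executor
{code}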






[jira] [Updated] (SPARK-41847) DataFrame mapfield,structlist invalid type

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41847:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1270, in pyspark.sql.connect.functions.explode
Failed example:
    eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: 
[INVALID_COLUMN_OR_FIELD_DATA_TYPE] Column or field `mapfield` is of type 
"STRUCT" while it's required to be "MAP".
    Plan:  {code}
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1364, in pyspark.sql.connect.functions.inline
Failed example:
    df.select(inline(df.structlist)).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df.select(inline(df.structlist)).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: 
[INVALID_COLUMN_OR_FIELD_DATA_TYPE] Column or field `structlist`.`element` is 
of type "ARRAY" while it's required to be "STRUCT".
    Plan:  {code}

  was:
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1270, in pyspark.sql.connect.functions.explode
Failed example:
    eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
   

[jira] [Updated] (SPARK-41847) DataFrame mapfield,structlist invalid type

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41847:
--
Summary: DataFrame mapfield,structlist invalid type  (was: DataFrame 
mapfield invalid type)

> DataFrame mapfield,structlist invalid type
> --
>
> Key: SPARK-41847
> URL: https://issues.apache.org/jira/browse/SPARK-41847
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1270, in pyspark.sql.connect.functions.explode
> Failed example:
>     eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [INVALID_COLUMN_OR_FIELD_DATA_TYPE] Column or field `mapfield` is of type 
> "STRUCT" while it's required to be "MAP".
>     Plan:  {code}






[jira] [Updated] (SPARK-41847) DataFrame mapfield invalid type

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41847:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1270, in pyspark.sql.connect.functions.explode
Failed example:
    eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        eDF.select(explode(eDF.mapfield).alias("key", "value")).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: 
[INVALID_COLUMN_OR_FIELD_DATA_TYPE] Column or field `mapfield` is of type 
"STRUCT" while it's required to be "MAP".
    Plan:  {code}

  was:
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1098, in pyspark.sql.connect.functions.rank
Failed example:
    df.withColumn("drank", rank().over(w)).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df.withColumn("drank", rank().over(w)).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: 
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name 
`value` cannot be resolved. Did you mean one of the following? [`_1`]
    Plan: 'Project [_1#4000L, rank() windowspecdefinition('value ASC NULLS 
FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS 
drank#4003]
    +- Project [0#3998L AS _1#4000L]
       +- LocalRelation [0#3998L] {code}
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1032, in pyspark.sql.connect.functions.cume_dist
Failed example:
    df.withColumn("cd", cume_dist().over(w)).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df.withColumn("cd", cume_dist().over(w)).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 

[jira] [Created] (SPARK-41847) DataFrame mapfield invalid type

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41847:
-

 Summary: DataFrame mapfield invalid type
 Key: SPARK-41847
 URL: https://issues.apache.org/jira/browse/SPARK-41847
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Sandeep Singh


{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1098, in pyspark.sql.connect.functions.rank
Failed example:
    df.withColumn("drank", rank().over(w)).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df.withColumn("drank", rank().over(w)).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: 
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name 
`value` cannot be resolved. Did you mean one of the following? [`_1`]
    Plan: 'Project [_1#4000L, rank() windowspecdefinition('value ASC NULLS 
FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS 
drank#4003]
    +- Project [0#3998L AS _1#4000L]
       +- LocalRelation [0#3998L] {code}
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1032, in pyspark.sql.connect.functions.cume_dist
Failed example:
    df.withColumn("cd", cume_dist().over(w)).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df.withColumn("cd", cume_dist().over(w)).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: 
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name 
`value` cannot be resolved. Did you mean one of the following? [`_1`]
    Plan: 'Project [_1#2202L, cume_dist() windowspecdefinition('value ASC NULLS 
FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), currentrow$())) 
AS cd#2205]
    +- Project [0#2200L AS _1#2202L]
       +- LocalRelation [0#2200L] {code}






[jira] [Updated] (SPARK-41846) DataFrame windowspec functions : unresolved columns

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41846:
--
Summary: DataFrame windowspec functions : unresolved columns  (was: 
DataFrame aggregation functions : unresolved columns)

> DataFrame windowspec functions : unresolved columns
> ---
>
> Key: SPARK-41846
> URL: https://issues.apache.org/jira/browse/SPARK-41846
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1098, in pyspark.sql.connect.functions.rank
> Failed example:
>     df.withColumn("drank", rank().over(w)).show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         df.withColumn("drank", rank().over(w)).show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name 
> `value` cannot be resolved. Did you mean one of the following? [`_1`]
>     Plan: 'Project [_1#4000L, rank() windowspecdefinition('value ASC NULLS 
> FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) 
> AS drank#4003]
>     +- Project [0#3998L AS _1#4000L]
>        +- LocalRelation [0#3998L] {code}
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1032, in pyspark.sql.connect.functions.cume_dist
> Failed example:
>     df.withColumn("cd", cume_dist().over(w)).show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         df.withColumn("cd", cume_dist().over(w)).show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name 
> `value` cannot be resolved. Did you mean one of the following? [`_1`]
>     Plan: 'Project [_1#2202L, cume_dist() windowspecdefinition('value ASC 
> NULLS FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), 
> currentrow$())) AS cd#2205]
>     +- Project [0#2200L AS _1#2202L]
>        +- LocalRelation [0#2200L] {code}
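
For context, a self-contained PySpark sketch of the pattern these doctests exercise (a rank over a window spec; the data and column names here are illustrative, not the exact doctest setup):

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import rank
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a", 1), ("a", 2), ("b", 2)], ["key", "value"])

# Against a regular Spark session this resolves fine; the failure above
# suggests the Connect plan loses track of the window's 'value' column.
w = Window.partitionBy("key").orderBy("value")
df.withColumn("drank", rank().over(w)).show()
{code}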




[jira] [Updated] (SPARK-41846) DataFrame aggregation functions : unresolved columns

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41846:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1098, in pyspark.sql.connect.functions.rank
Failed example:
    df.withColumn("drank", rank().over(w)).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df.withColumn("drank", rank().over(w)).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: 
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name 
`value` cannot be resolved. Did you mean one of the following? [`_1`]
    Plan: 'Project [_1#4000L, rank() windowspecdefinition('value ASC NULLS 
FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS 
drank#4003]
    +- Project [0#3998L AS _1#4000L]
       +- LocalRelation [0#3998L] {code}
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1032, in pyspark.sql.connect.functions.cume_dist
Failed example:
    df.withColumn("cd", cume_dist().over(w)).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df.withColumn("cd", cume_dist().over(w)).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: 
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name 
`value` cannot be resolved. Did you mean one of the following? [`_1`]
    Plan: 'Project [_1#2202L, cume_dist() windowspecdefinition('value ASC NULLS 
FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), currentrow$())) 
AS cd#2205]
    +- Project [0#2200L AS _1#2202L]
       +- LocalRelation [0#2200L] {code}

  was:
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1098, in pyspark.sql.connect.functions.rank
Failed example:
    df.withColumn("drank", rank().over(w)).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df.withColumn("drank", rank().over(w)).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 

[jira] [Updated] (SPARK-39853) Support stage level schedule for standalone cluster when dynamic allocation is disabled

2023-01-02 Thread wuyi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuyi updated SPARK-39853:
-
Fix Version/s: 3.4.0

> Support stage level schedule for standalone cluster when dynamic allocation 
> is disabled
> ---
>
> Key: SPARK-39853
> URL: https://issues.apache.org/jira/browse/SPARK-39853
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: huangtengfei
>Assignee: huangtengfei
>Priority: Major
> Fix For: 3.4.0
>
>
> [SPARK-39062|https://issues.apache.org/jira/browse/SPARK-39062] added stage-level 
> scheduling support for standalone clusters when dynamic allocation is enabled: 
> Spark requests executors for the different resource profiles.
> When dynamic allocation is disabled, we can also leverage stage-level scheduling 
> to schedule tasks, based on their resource profiles (task resource requests), 
> onto executors with the default resource profile.
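
A minimal PySpark sketch of the task-only resource profile this improvement targets (assuming a standalone-style local cluster; it mirrors the Scala test quoted earlier in this digest):

{code:python}
from pyspark import SparkContext
from pyspark.resource import ResourceProfileBuilder, TaskResourceRequests

# One executor with 4 cores; each task asks for 3 of them.
sc = SparkContext("local-cluster[1,4,1024]", "stage-level-demo")
reqs = TaskResourceRequests().cpus(3)
profile = ResourceProfileBuilder().require(reqs).build  # build is a property in PySpark

doubled = sc.parallelize([0, 1], 2).withResources(profile).map(lambda x: x * 2)
print(doubled.collect())  # [0, 2]
{code}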






[jira] [Updated] (SPARK-41846) DataFrame aggregation functions : unresolved columns

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41846:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1098, in pyspark.sql.connect.functions.rank
Failed example:
    df.withColumn("drank", rank().over(w)).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df.withColumn("drank", rank().over(w)).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: 
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name 
`value` cannot be resolved. Did you mean one of the following? [`_1`]
    Plan: 'Project [_1#4000L, rank() windowspecdefinition('value ASC NULLS 
FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS 
drank#4003]
    +- Project [0#3998L AS _1#4000L]
       +- LocalRelation [0#3998L] {code}

  was:
{code}
File "/.../spark/python/pyspark/sql/connect/column.py", line 106, in 
pyspark.sql.connect.column.Column.eqNullSafe
Failed example:
df1.join(df2, df1["value"] == df2["value"]).count()
Exception raised:
Traceback (most recent call last):
  File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 
1336, in __run
exec(compile(example.source, filename, "single",
  File "", line 1, 
in 
df1.join(df2, df1["value"] == df2["value"]).count()
  File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 151, in 
count
pdd = self.agg(_invoke_function("count", lit(1))).toPandas()
  File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1031, in 
toPandas
return self._session.client.to_pandas(query)
  File "/.../spark/python/pyspark/sql/connect/client.py", line 413, in 
to_pandas
return self._execute_and_fetch(req)
  File "/.../spark/python/pyspark/sql/connect/client.py", line 573, in 
_execute_and_fetch
self._handle_error(rpc_error)
  File "/.../spark/python/pyspark/sql/connect/client.py", line 619, in 
_handle_error
raise SparkConnectAnalysisException(
pyspark.sql.connect.client.SparkConnectAnalysisException: 
[AMBIGUOUS_REFERENCE] Reference `value` is ambiguous, could be: [`value`, 
`value`].
{code}


> DataFrame aggregation functions : unresolved columns
> 
>
> Key: SPARK-41846
> URL: https://issues.apache.org/jira/browse/SPARK-41846
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1098, in pyspark.sql.connect.functions.rank
> Failed example:
>     df.withColumn("drank", rank().over(w)).show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         df.withColumn("drank", rank().over(w)).show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 

[jira] [Created] (SPARK-41846) DataFrame aggregation functions : unresolved columns

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41846:
-

 Summary: DataFrame aggregation functions : unresolved columns
 Key: SPARK-41846
 URL: https://issues.apache.org/jira/browse/SPARK-41846
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Sandeep Singh


{code}
File "/.../spark/python/pyspark/sql/connect/column.py", line 106, in 
pyspark.sql.connect.column.Column.eqNullSafe
Failed example:
df1.join(df2, df1["value"] == df2["value"]).count()
Exception raised:
Traceback (most recent call last):
  File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 
1336, in __run
exec(compile(example.source, filename, "single",
  File "", line 1, 
in 
df1.join(df2, df1["value"] == df2["value"]).count()
  File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 151, in 
count
pdd = self.agg(_invoke_function("count", lit(1))).toPandas()
  File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1031, in 
toPandas
return self._session.client.to_pandas(query)
  File "/.../spark/python/pyspark/sql/connect/client.py", line 413, in 
to_pandas
return self._execute_and_fetch(req)
  File "/.../spark/python/pyspark/sql/connect/client.py", line 573, in 
_execute_and_fetch
self._handle_error(rpc_error)
  File "/.../spark/python/pyspark/sql/connect/client.py", line 619, in 
_handle_error
raise SparkConnectAnalysisException(
pyspark.sql.connect.client.SparkConnectAnalysisException: 
[AMBIGUOUS_REFERENCE] Reference `value` is ambiguous, could be: [`value`, 
`value`].
{code}






[jira] [Commented] (SPARK-37677) spark on k8s, when the user want to push python3.6.6.zip to the pod , but no permission to execute

2023-01-02 Thread jingxiong zhong (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653733#comment-17653733
 ] 

jingxiong zhong commented on SPARK-37677:
-

At present, I have fixed this for Hadoop 3.3.5, but that version has not been 
released yet. Spark will then need to upgrade its Hadoop dependency to pick up 
the fix. [~valux]

> spark on k8s, when the user want to push python3.6.6.zip to the pod , but no 
> permission to execute
> --
>
> Key: SPARK-37677
> URL: https://issues.apache.org/jira/browse/SPARK-37677
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: jingxiong zhong
>Priority: Major
>
> In cluster mode, I have another question: after unzipping python3.6.6.zip in 
> the pod, the extracted files have no execute permission. My submit command is 
> as follows:
> {code:sh}
> spark-submit \
> --archives ./python3.6.6.zip#python3.6.6 \
> --conf "spark.pyspark.python=python3.6.6/python3.6.6/bin/python3" \
> --conf "spark.pyspark.driver.python=python3.6.6/python3.6.6/bin/python3" \
> --conf spark.kubernetes.container.image.pullPolicy=Always \
> ./examples/src/main/python/pi.py 100
> {code}






[jira] [Resolved] (SPARK-37521) insert overwrite table but the partition information stored in Metastore was not changed

2023-01-02 Thread jingxiong zhong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jingxiong zhong resolved SPARK-37521.
-
Resolution: Won't Fix

> insert overwrite table but the partition information stored in Metastore was 
> not changed
> 
>
> Key: SPARK-37521
> URL: https://issues.apache.org/jira/browse/SPARK-37521
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
> Environment: spark3.2.0
> hive2.3.9
> metastore2.3.9
>Reporter: jingxiong zhong
>Priority: Major
>
> I create a partitioned table in SparkSQL, insert a row, add a regular column, 
> and finally insert a new row into a partition. The query works in SparkSQL, 
> but the newly added column returns NULL in Hive 2.3.9.
> For example:
> create table updata_col_test1(a int) partitioned by (dt string);
> insert overwrite table updata_col_test1 partition(dt='20200101') values(1);
> insert overwrite table updata_col_test1 partition(dt='20200102') values(1);
> insert overwrite table updata_col_test1 partition(dt='20200103') values(1);
> alter table updata_col_test1 add columns (b int);
> insert overwrite table updata_col_test1 partition(dt) values(1, 2, '20200101'); -- fails
> insert overwrite table updata_col_test1 partition(dt='20200101') values(1, 2); -- fails
> insert overwrite table updata_col_test1 partition(dt='20200104') values(1, 2); -- succeeds






[jira] [Resolved] (SPARK-41823) DataFrame.join creating ambiguous column names

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh resolved SPARK-41823.
---
Resolution: Duplicate

> DataFrame.join creating ambiguous column names
> --
>
> Key: SPARK-41823
> URL: https://issues.apache.org/jira/browse/SPARK-41823
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 254, in pyspark.sql.connect.dataframe.DataFrame.drop
> Failed example:
>     df.join(df2, df.name == df2.name, 'inner').drop('name').show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 
> 1, in 
>         df.join(df2, df.name == df2.name, 'inner').drop('name').show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, 
> `name`].
>     Plan: {code}






[jira] [Updated] (SPARK-41845) Fix `count(expr("*"))` function

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41845:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 801, in pyspark.sql.connect.functions.count
Failed example:
    df.select(count(expr("*")), count(df.alphabets)).show()
Expected:
    +--------+----------------+
    |count(1)|count(alphabets)|
    +--------+----------------+
    |       4|               3|
    +--------+----------------+
Got:
    +----------------+----------------+
    |count(alphabets)|count(alphabets)|
    +----------------+----------------+
    |               3|               3|
    +----------------+----------------+
     {code}

  was:
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 2332, in pyspark.sql.connect.functions.call_udf
Failed example:
    df.select(call_udf("intX2", "id")).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df.select(call_udf("intX2", "id")).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: 
[UNRESOLVED_ROUTINE] Cannot resolve function `intX2` on search path 
[`system`.`builtin`, `system`.`session`, `spark_catalog`.`default`].
    Plan: {code}


> Fix `count(expr("*"))` function
> ---
>
> Key: SPARK-41845
> URL: https://issues.apache.org/jira/browse/SPARK-41845
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 801, in pyspark.sql.connect.functions.count
> Failed example:
>     df.select(count(expr("*")), count(df.alphabets)).show()
> Expected:
>     +--------+----------------+
>     |count(1)|count(alphabets)|
>     +--------+----------------+
>     |       4|               3|
>     +--------+----------------+
> Got:
>     +----------------+----------------+
>     |count(alphabets)|count(alphabets)|
>     +----------------+----------------+
>     |               3|               3|
>     +----------------+----------------+
>      {code}






[jira] [Created] (SPARK-41845) Fix `count(expr("*"))` function

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41845:
-

 Summary: Fix `count(expr("*"))` function
 Key: SPARK-41845
 URL: https://issues.apache.org/jira/browse/SPARK-41845
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Sandeep Singh
 Fix For: 3.4.0


{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 2332, in pyspark.sql.connect.functions.call_udf
Failed example:
    df.select(call_udf("intX2", "id")).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df.select(call_udf("intX2", "id")).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: 
[UNRESOLVED_ROUTINE] Cannot resolve function `intX2` on search path 
[`system`.`builtin`, `system`.`session`, `spark_catalog`.`default`].
    Plan: {code}






[jira] [Resolved] (SPARK-41844) Implement `intX2` function

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh resolved SPARK-41844.
---
Resolution: Invalid

> Implement `intX2` function
> --
>
> Key: SPARK-41844
> URL: https://issues.apache.org/jira/browse/SPARK-41844
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 2332, in pyspark.sql.connect.functions.call_udf
> Failed example:
>     df.select(call_udf("intX2", "id")).show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         df.select(call_udf("intX2", "id")).show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [UNRESOLVED_ROUTINE] Cannot resolve function `intX2` on search path 
> [`system`.`builtin`, `system`.`session`, `spark_catalog`.`default`].
>     Plan: {code}
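
The UNRESOLVED_ROUTINE error means no function named `intX2` exists on the
catalog search path; `call_udf` resolves UDFs by name, so the function has to
be registered first. A hedged sketch against classic PySpark (assuming
PySpark >= 3.4, where `call_udf` is available; the lambda and return type are
illustrative):
{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import call_udf, col
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.master("local[1]").getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b"), (3, "c")], ["id", "name"])

# call_udf looks the name up on the search path quoted in the error
# ([system.builtin, system.session, ...]), so "intX2" must be registered
# in the session before it can be called by name.
spark.udf.register("intX2", lambda i: i * 2, IntegerType())

df.select(call_udf("intX2", col("id"))).show()
{code}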






[jira] [Created] (SPARK-41844) Implement `intX2` function

2023-01-02 Thread Sandeep Singh (Jira)
Sandeep Singh created SPARK-41844:
-

 Summary: Implement `intX2` function
 Key: SPARK-41844
 URL: https://issues.apache.org/jira/browse/SPARK-41844
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Sandeep Singh
 Fix For: 3.4.0


{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1611, in pyspark.sql.connect.functions.transform_keys
Failed example:
    df.select(transform_keys(
        "data", lambda k, _: upper(k)).alias("data_upper")
    ).show(truncate=False)
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, 
in 
        df.select(transform_keys(
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: 
[DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "transform_keys(data, 
lambdafunction(upper(x_11), x_11, y_12))" due to data type mismatch: Parameter 
1 requires the "MAP" type, however "data" has the type "STRUCT".
    Plan: 'Project [transform_keys(data#4493, lambdafunction('upper(lambda 
'x_11), lambda 'x_11, lambda 'y_12, false)) AS data_upper#4496]
    +- Project [0#4488L AS id#4492L, 1#4489 AS data#4493]
       +- LocalRelation [0#4488L, 1#4489] {code}






[jira] [Updated] (SPARK-41844) Implement `intX2` function

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41844:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 2332, in pyspark.sql.connect.functions.call_udf
Failed example:
    df.select(call_udf("intX2", "id")).show()
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, in 

        df.select(call_udf("intX2", "id")).show()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: 
[UNRESOLVED_ROUTINE] Cannot resolve function `intX2` on search path 
[`system`.`builtin`, `system`.`session`, `spark_catalog`.`default`].
    Plan: {code}

  was:
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1611, in pyspark.sql.connect.functions.transform_keys
Failed example:
    df.select(transform_keys(
        "data", lambda k, _: upper(k)).alias("data_upper")
    ).show(truncate=False)
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, 
in 
        df.select(transform_keys(
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: 
[DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "transform_keys(data, 
lambdafunction(upper(x_11), x_11, y_12))" due to data type mismatch: Parameter 
1 requires the "MAP" type, however "data" has the type "STRUCT".
    Plan: 'Project [transform_keys(data#4493, lambdafunction('upper(lambda 
'x_11), lambda 'x_11, lambda 'y_12, false)) AS data_upper#4496]
    +- Project [0#4488L AS id#4492L, 1#4489 AS data#4493]
       +- LocalRelation [0#4488L, 1#4489] {code}


> Implement `intX2` function
> --
>
> Key: SPARK-41844
> URL: https://issues.apache.org/jira/browse/SPARK-41844
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 2332, in pyspark.sql.connect.functions.call_udf
> Failed example:
>     df.select(call_udf("intX2", "id")).show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         df.select(call_udf("intX2", "id")).show()
>       File 
> 

[jira] [Updated] (SPARK-41835) Implement `transform_keys` function

2023-01-02 Thread Sandeep Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Singh updated SPARK-41835:
--
Description: 
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
line 1611, in pyspark.sql.connect.functions.transform_keys
Failed example:
    df.select(transform_keys(
        "data", lambda k, _: upper(k)).alias("data_upper")
    ).show(truncate=False)
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", line 1, 
in 
        df.select(transform_keys(
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 534, in show
        print(self._show_string(n, truncate, vertical))
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 423, in _show_string
        ).toPandas()
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1031, in toPandas
        return self._session.client.to_pandas(query)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
413, in to_pandas
        return self._execute_and_fetch(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
573, in _execute_and_fetch
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
619, in _handle_error
        raise SparkConnectAnalysisException(
    pyspark.sql.connect.client.SparkConnectAnalysisException: 
[DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "transform_keys(data, 
lambdafunction(upper(x_11), x_11, y_12))" due to data type mismatch: Parameter 
1 requires the "MAP" type, however "data" has the type "STRUCT".
    Plan: 'Project [transform_keys(data#4493, lambdafunction('upper(lambda 
'x_11), lambda 'x_11, lambda 'y_12, false)) AS data_upper#4496]
    +- Project [0#4488L AS id#4492L, 1#4489 AS data#4493]
       +- LocalRelation [0#4488L, 1#4489] {code}

> Implement `transform_keys` function
> ---
>
> Key: SPARK-41835
> URL: https://issues.apache.org/jira/browse/SPARK-41835
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 1611, in pyspark.sql.connect.functions.transform_keys
> Failed example:
>     df.select(transform_keys(
>         "data", lambda k, _: upper(k)).alias("data_upper")
>     ).show(truncate=False)
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 
> 1, in 
>         df.select(transform_keys(
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve 
> "transform_keys(data, lambdafunction(upper(x_11), x_11, y_12))" due to data 
> type mismatch: Parameter 1 requires the "MAP" type, however "data" has the 
> type "STRUCT".
>     Plan: 'Project [transform_keys(data#4493, lambdafunction('upper(lambda 
> 'x_11), lambda 'x_11, lambda 'y_12, false)) AS data_upper#4496]
>     +- Project [0#4488L AS id#4492L, 1#4489 AS data#4493]
>        +- LocalRelation [0#4488L, 1#4489] {code}
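
The DATATYPE_MISMATCH indicates that `data` reached the analyzer as a STRUCT
rather than a MAP, which suggests the dict literal was inferred as a struct on
the Connect path. Declaring the schema explicitly sidesteps the inference; a
minimal sketch against classic PySpark (assuming a local session):
{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import transform_keys, upper

spark = SparkSession.builder.master("local[1]").getOrCreate()

# An explicit DDL schema guarantees "data" is a MAP column, which is what
# the first parameter of transform_keys requires.
df = spark.createDataFrame(
    [(1, {"foo": -2.0, "bar": 2.0})],
    "id INT, data MAP<STRING, DOUBLE>",
)

df.select(
    transform_keys("data", lambda k, _: upper(k)).alias("data_upper")
).show(truncate=False)
{code}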





  1   2   3   >