[jira] [Assigned] (SPARK-42179) Upgrade ORC to 1.7.8

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42179:


Assignee: (was: Apache Spark)

> Upgrade ORC to 1.7.8
> 
>
> Key: SPARK-42179
> URL: https://issues.apache.org/jira/browse/SPARK-42179
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SQL
>Affects Versions: 3.3.2
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Assigned] (SPARK-42179) Upgrade ORC to 1.7.8

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42179:


Assignee: Apache Spark

> Upgrade ORC to 1.7.8
> 
>
> Key: SPARK-42179
> URL: https://issues.apache.org/jira/browse/SPARK-42179
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SQL
>Affects Versions: 3.3.2
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-42179) Upgrade ORC to 1.7.8

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680476#comment-17680476
 ] 

Apache Spark commented on SPARK-42179:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/39735

> Upgrade ORC to 1.7.8
> 
>
> Key: SPARK-42179
> URL: https://issues.apache.org/jira/browse/SPARK-42179
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SQL
>Affects Versions: 3.3.2
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Updated] (SPARK-42179) Upgrade ORC to 1.7.8

2023-01-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-42179:
--
Component/s: Build

> Upgrade ORC to 1.7.8
> 
>
> Key: SPARK-42179
> URL: https://issues.apache.org/jira/browse/SPARK-42179
> Project: Spark
>  Issue Type: Bug
>  Components: Build, SQL
>Affects Versions: 3.3.2
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Created] (SPARK-42179) Upgrade ORC to 1.7.8

2023-01-24 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-42179:
-

 Summary: Upgrade ORC to 1.7.8
 Key: SPARK-42179
 URL: https://issues.apache.org/jira/browse/SPARK-42179
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.2
Reporter: Dongjoon Hyun









[jira] [Commented] (SPARK-41812) DataFrame.join: ambiguous column

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680467#comment-17680467
 ] 

Apache Spark commented on SPARK-41812:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39734

> DataFrame.join: ambiguous column
> 
>
> Key: SPARK-41812
> URL: https://issues.apache.org/jira/browse/SPARK-41812
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> File "/.../spark/python/pyspark/sql/connect/column.py", line 106, in 
> pyspark.sql.connect.column.Column.eqNullSafe
> Failed example:
> df1.join(df2, df1["value"] == df2["value"]).count()
> Exception raised:
> Traceback (most recent call last):
>   File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 
> 1336, in __run
> exec(compile(example.source, filename, "single",
>   File "", line 
> 1, in 
> df1.join(df2, df1["value"] == df2["value"]).count()
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 151, in 
> count
> pdd = self.agg(_invoke_function("count", lit(1))).toPandas()
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1031, 
> in toPandas
> return self._session.client.to_pandas(query)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 413, in 
> to_pandas
> return self._execute_and_fetch(req)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 573, in 
> _execute_and_fetch
> self._handle_error(rpc_error)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 619, in 
> _handle_error
> raise SparkConnectAnalysisException(
> pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [AMBIGUOUS_REFERENCE] Reference `value` is ambiguous, could be: [`value`, 
> `value`].
> {code}
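
For reference, a minimal sketch of the failing pattern above and two common workarounds, assuming an active SparkSession `spark` and two illustrative frames sharing a `value` column (these names are not from the report):

{code:python}
from pyspark.sql.functions import col

df1 = spark.range(3).withColumnRenamed("id", "value")
df2 = spark.range(3).withColumnRenamed("id", "value")

# The pattern from the doctest above; under Spark Connect it raises
# AMBIGUOUS_REFERENCE because both sides resolve to a bare `value`.
# df1.join(df2, df1["value"] == df2["value"]).count()

# Workaround 1: join on the column name, which also deduplicates it.
df1.join(df2, "value").count()

# Workaround 2: alias each side and qualify the references.
a, b = df1.alias("a"), df2.alias("b")
a.join(b, col("a.value") == col("b.value")).count()
{code}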






[jira] [Commented] (SPARK-41823) DataFrame.join creating ambiguous column names

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680468#comment-17680468
 ] 

Apache Spark commented on SPARK-41823:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39734

> DataFrame.join creating ambiguous column names
> --
>
> Key: SPARK-41823
> URL: https://issues.apache.org/jira/browse/SPARK-41823
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 254, in pyspark.sql.connect.dataframe.DataFrame.drop
> Failed example:
>     df.join(df2, df.name == df2.name, 'inner').drop('name').show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 
> 1, in 
>         df.join(df2, df.name == df2.name, 'inner').drop('name').show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, 
> `name`].
>     Plan: {code}
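
A hedged sketch of one way around the {{drop}} variant above, assuming frames `df` and `df2` that share a `name` column: aliasing the inputs lets both the join condition and the drop target be qualified explicitly.

{code:python}
from pyspark.sql.functions import col

a, b = df.alias("a"), df2.alias("b")
(a.join(b, col("a.name") == col("b.name"), "inner")
 .drop(col("b.name"))   # drop the right-hand copy by Column, not by name
 .show())
{code}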






[jira] [Assigned] (SPARK-41812) DataFrame.join: ambiguous column

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41812:


Assignee: (was: Apache Spark)

> DataFrame.join: ambiguous column
> 
>
> Key: SPARK-41812
> URL: https://issues.apache.org/jira/browse/SPARK-41812
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> File "/.../spark/python/pyspark/sql/connect/column.py", line 106, in 
> pyspark.sql.connect.column.Column.eqNullSafe
> Failed example:
> df1.join(df2, df1["value"] == df2["value"]).count()
> Exception raised:
> Traceback (most recent call last):
>   File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 
> 1336, in __run
> exec(compile(example.source, filename, "single",
>   File "", line 
> 1, in 
> df1.join(df2, df1["value"] == df2["value"]).count()
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 151, in 
> count
> pdd = self.agg(_invoke_function("count", lit(1))).toPandas()
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1031, 
> in toPandas
> return self._session.client.to_pandas(query)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 413, in 
> to_pandas
> return self._execute_and_fetch(req)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 573, in 
> _execute_and_fetch
> self._handle_error(rpc_error)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 619, in 
> _handle_error
> raise SparkConnectAnalysisException(
> pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [AMBIGUOUS_REFERENCE] Reference `value` is ambiguous, could be: [`value`, 
> `value`].
> {code}






[jira] [Assigned] (SPARK-41812) DataFrame.join: ambiguous column

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41812:


Assignee: Apache Spark

> DataFrame.join: ambiguous column
> 
>
> Key: SPARK-41812
> URL: https://issues.apache.org/jira/browse/SPARK-41812
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> {code}
> File "/.../spark/python/pyspark/sql/connect/column.py", line 106, in 
> pyspark.sql.connect.column.Column.eqNullSafe
> Failed example:
> df1.join(df2, df1["value"] == df2["value"]).count()
> Exception raised:
> Traceback (most recent call last):
>   File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 
> 1336, in __run
> exec(compile(example.source, filename, "single",
>   File "", line 
> 1, in 
> df1.join(df2, df1["value"] == df2["value"]).count()
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 151, in 
> count
> pdd = self.agg(_invoke_function("count", lit(1))).toPandas()
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1031, 
> in toPandas
> return self._session.client.to_pandas(query)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 413, in 
> to_pandas
> return self._execute_and_fetch(req)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 573, in 
> _execute_and_fetch
> self._handle_error(rpc_error)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 619, in 
> _handle_error
> raise SparkConnectAnalysisException(
> pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [AMBIGUOUS_REFERENCE] Reference `value` is ambiguous, could be: [`value`, 
> `value`].
> {code}






[jira] [Commented] (SPARK-42178) Handle remaining null string values in ui protobuf serializer and add tests

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680455#comment-17680455
 ] 

Apache Spark commented on SPARK-42178:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39732

> Handle remaining null string values in ui protobuf serializer and add tests
> ---
>
> Key: SPARK-42178
> URL: https://issues.apache.org/jira/browse/SPARK-42178
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>







[jira] [Commented] (SPARK-42178) Handle remaining null string values in ui protobuf serializer and add tests

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680454#comment-17680454
 ] 

Apache Spark commented on SPARK-42178:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39732

> Handle remaining null string values in ui protobuf serializer and add tests
> ---
>
> Key: SPARK-42178
> URL: https://issues.apache.org/jira/browse/SPARK-42178
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-42178) Handle remaining null string values in ui protobuf serializer and add tests

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42178:


Assignee: Gengliang Wang  (was: Apache Spark)

> Handle remaining null string values in ui protobuf serializer and add tests
> ---
>
> Key: SPARK-42178
> URL: https://issues.apache.org/jira/browse/SPARK-42178
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-42178) Handle remaining null string values in ui protobuf serializer and add tests

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42178:


Assignee: Apache Spark  (was: Gengliang Wang)

> Handle remaining null string values in ui protobuf serializer and add tests
> ---
>
> Key: SPARK-42178
> URL: https://issues.apache.org/jira/browse/SPARK-42178
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Created] (SPARK-42178) Handle remaining null string values in ui protobuf serializer and add tests

2023-01-24 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-42178:
--

 Summary: Handle remaining null string values in ui protobuf 
serializer and add tests
 Key: SPARK-42178
 URL: https://issues.apache.org/jira/browse/SPARK-42178
 Project: Spark
  Issue Type: Sub-task
  Components: Web UI
Affects Versions: 3.4.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang









[jira] [Commented] (SPARK-42177) Change master to branch-3.4 in GitHub Actions

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680438#comment-17680438
 ] 

Apache Spark commented on SPARK-42177:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39731

> Change master to branch-3.4 in GitHub Actions
> 
>
> Key: SPARK-42177
> URL: https://issues.apache.org/jira/browse/SPARK-42177
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> See https://github.com/apache/spark/actions/runs/4002380215/jobs/6869886029






[jira] [Commented] (SPARK-42177) Change master to branch-3.4 in GitHub Actions

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680437#comment-17680437
 ] 

Apache Spark commented on SPARK-42177:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39731

> Change master to branch-3.4 in GitHub Actions
> 
>
> Key: SPARK-42177
> URL: https://issues.apache.org/jira/browse/SPARK-42177
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> See https://github.com/apache/spark/actions/runs/4002380215/jobs/6869886029






[jira] [Commented] (SPARK-42177) Change master to branch-3.4 in GitHub Actions

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680435#comment-17680435
 ] 

Apache Spark commented on SPARK-42177:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39730

> Change master to branch-3.4 in GitHub Actions
> 
>
> Key: SPARK-42177
> URL: https://issues.apache.org/jira/browse/SPARK-42177
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> See https://github.com/apache/spark/actions/runs/4002380215/jobs/6869886029






[jira] [Assigned] (SPARK-42177) Change master to branch-3.4 in GitHub Actions

2023-01-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42177:


Assignee: Hyukjin Kwon

> Change master to branch-3.4 in GitHub Actions
> 
>
> Key: SPARK-42177
> URL: https://issues.apache.org/jira/browse/SPARK-42177
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> See https://github.com/apache/spark/actions/runs/4002380215/jobs/6869886029






[jira] [Resolved] (SPARK-42177) Change master to branch-3.4 in GitHub Actions

2023-01-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42177.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39731
[https://github.com/apache/spark/pull/39731]

> Change master to branch-3.4 in GitHub Actions
> 
>
> Key: SPARK-42177
> URL: https://issues.apache.org/jira/browse/SPARK-42177
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> See https://github.com/apache/spark/actions/runs/4002380215/jobs/6869886029






[jira] [Updated] (SPARK-42176) Cast boolean to timestamp fails with ClassCastException

2023-01-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-42176:
-
Affects Version/s: 3.5.0

> Cast boolean to timestamp fails with ClassCastException
> ---
>
> Key: SPARK-42176
> URL: https://issues.apache.org/jira/browse/SPARK-42176
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.1, 3.4.0, 3.5.0
>Reporter: Ivan Sadikov
>Assignee: Ivan Sadikov
>Priority: Major
> Fix For: 3.3.2, 3.4.0, 3.5.0
>
>
> When casting a boolean value to timestamp, the following error is thrown:
> {code:java}
> [info]   java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.Long
> [info]   at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107)
> [info]   at 
> org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5(InternalRow.scala:178)
> [info]   at 
> org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5$adapted(InternalRow.scala:178)
>  {code}
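
A minimal repro sketch, assuming an active SparkSession; whether the error surfaces can depend on which cast path (interpreted vs. codegen) is taken on the affected versions:

{code:python}
from pyspark.sql.functions import lit

# Boolean-to-timestamp casts are legal in Spark SQL (true maps to a
# microsecond offset); on affected versions this raised the
# ClassCastException quoted above.
spark.sql("SELECT CAST(true AS TIMESTAMP)").show()

# Equivalent DataFrame form.
spark.range(1).select(lit(True).cast("timestamp")).show()
{code}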






[jira] [Updated] (SPARK-42176) Cast boolean to timestamp fails with ClassCastException

2023-01-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-42176:
-
Fix Version/s: 3.5.0

> Cast boolean to timestamp fails with ClassCastException
> ---
>
> Key: SPARK-42176
> URL: https://issues.apache.org/jira/browse/SPARK-42176
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Ivan Sadikov
>Assignee: Ivan Sadikov
>Priority: Major
> Fix For: 3.3.2, 3.4.0, 3.5.0
>
>
> When casting a boolean value to timestamp, the following error is thrown:
> {code:java}
> [info]   java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.Long
> [info]   at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107)
> [info]   at 
> org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5(InternalRow.scala:178)
> [info]   at 
> org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5$adapted(InternalRow.scala:178)
>  {code}






[jira] [Assigned] (SPARK-42176) Cast boolean to timestamp fails with ClassCastException

2023-01-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42176:


Assignee: Ivan Sadikov

> Cast boolean to timestamp fails with ClassCastException
> ---
>
> Key: SPARK-42176
> URL: https://issues.apache.org/jira/browse/SPARK-42176
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Ivan Sadikov
>Assignee: Ivan Sadikov
>Priority: Major
>
> When casting a boolean value to timestamp, the following error is thrown:
> {code:java}
> [info]   java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.Long
> [info]   at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107)
> [info]   at 
> org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5(InternalRow.scala:178)
> [info]   at 
> org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5$adapted(InternalRow.scala:178)
>  {code}






[jira] [Resolved] (SPARK-42176) Cast boolean to timestamp fails with ClassCastException

2023-01-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42176.
--
Fix Version/s: 3.3.2
   3.4.0
   Resolution: Fixed

Issue resolved by pull request 39729
[https://github.com/apache/spark/pull/39729]

> Cast boolean to timestamp fails with ClassCastException
> ---
>
> Key: SPARK-42176
> URL: https://issues.apache.org/jira/browse/SPARK-42176
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Ivan Sadikov
>Assignee: Ivan Sadikov
>Priority: Major
> Fix For: 3.3.2, 3.4.0
>
>
> When casting a boolean value to timestamp, the following error is thrown:
> {code:java}
> [info]   java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.Long
> [info]   at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107)
> [info]   at 
> org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5(InternalRow.scala:178)
> [info]   at 
> org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5$adapted(InternalRow.scala:178)
>  {code}






[jira] [Created] (SPARK-42177) Change master to branch-3.4 in GitHub Actions

2023-01-24 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-42177:


 Summary: Change master to branch-3.4 in GitHub Actions
 Key: SPARK-42177
 URL: https://issues.apache.org/jira/browse/SPARK-42177
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon


See https://github.com/apache/spark/actions/runs/4002380215/jobs/6869886029






[jira] [Assigned] (SPARK-42176) Cast boolean to timestamp fails with ClassCastException

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42176:


Assignee: (was: Apache Spark)

> Cast boolean to timestamp fails with ClassCastException
> ---
>
> Key: SPARK-42176
> URL: https://issues.apache.org/jira/browse/SPARK-42176
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Ivan Sadikov
>Priority: Major
>
> When casting a boolean value to timestamp, the following error is thrown:
> {code:java}
> [info]   java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.Long
> [info]   at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107)
> [info]   at 
> org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5(InternalRow.scala:178)
> [info]   at 
> org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5$adapted(InternalRow.scala:178)
>  {code}






[jira] [Assigned] (SPARK-42176) Cast boolean to timestamp fails with ClassCastException

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42176:


Assignee: Apache Spark

> Cast boolean to timestamp fails with ClassCastException
> ---
>
> Key: SPARK-42176
> URL: https://issues.apache.org/jira/browse/SPARK-42176
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Ivan Sadikov
>Assignee: Apache Spark
>Priority: Major
>
> When casting a boolean value to timestamp, the following error is thrown:
> {code:java}
> [info]   java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.Long
> [info]   at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107)
> [info]   at 
> org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5(InternalRow.scala:178)
> [info]   at 
> org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5$adapted(InternalRow.scala:178)
>  {code}






[jira] [Commented] (SPARK-42176) Cast boolean to timestamp fails with ClassCastException

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680427#comment-17680427
 ] 

Apache Spark commented on SPARK-42176:
--

User 'sadikovi' has created a pull request for this issue:
https://github.com/apache/spark/pull/39729

> Cast boolean to timestamp fails with ClassCastException
> ---
>
> Key: SPARK-42176
> URL: https://issues.apache.org/jira/browse/SPARK-42176
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Ivan Sadikov
>Priority: Major
>
> When casting a boolean value to timestamp, the following error is thrown:
> {code:java}
> [info]   java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.Long
> [info]   at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107)
> [info]   at 
> org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5(InternalRow.scala:178)
> [info]   at 
> org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5$adapted(InternalRow.scala:178)
>  {code}






[jira] [Commented] (SPARK-42107) Spark 3.3.0 binary breaking change missing from release notes

2023-01-24 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680420#comment-17680420
 ] 

Hyukjin Kwon commented on SPARK-42107:
--

cc [~cloud_fan] and [~dchvn]. I think we should at least add them to the release 
notes.

> Spark 3.3.0 binary breaking change missing from release notes
> -
>
> Key: SPARK-42107
> URL: https://issues.apache.org/jira/browse/SPARK-42107
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 3.3.0
>Reporter: Ross Lawley
>Priority: Major
>
> SPARK-37929 contains a binary breaking change in the SupportsNamespaces API
> See: [https://github.com/apache/spark/pull/35246/files#r792289685]
>  
> There is no mention in the [release 
> notes|https://spark.apache.org/releases/spark-release-3-3-0.html]






[jira] [Commented] (SPARK-42118) Wrong result when parsing a multiline JSON file with differing types for same column

2023-01-24 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680419#comment-17680419
 ] 

Hyukjin Kwon commented on SPARK-42118:
--

As a workaround you can do:

{code}
val df = spark.read.format("json").option("multiLine", true).load("/tmp/json")
val newDF = spark.createDataFrame(df.rdd, df.schema)
{code}

then

{code}
newDF.show(false)
newDF.count
{code}

will show consistent output.

> Wrong result when parsing a multiline JSON file with differing types for same 
> column
> 
>
> Key: SPARK-42118
> URL: https://issues.apache.org/jira/browse/SPARK-42118
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: Dilip Biswal
>Priority: Major
>
> Here is a simple reproduction of the problem. We have a JSON file whose 
> content looks like the following and is in multiLine format.
> {code}
> [{"name":""},{"name":123.34}]
> {code}
> Here is the result of the Spark query when we read the above content.
> {code}
> scala> val df = spark.read.format("json").option("multiLine", 
> true).load("/tmp/json")
> df: org.apache.spark.sql.DataFrame = [name: double]
>
> scala> df.show(false)
> +----+
> |name|
> +----+
> |null|
> +----+
>
> scala> df.count
> res5: Long = 2
> {code}
> This is quite a serious problem for us, as it is causing us to persist corrupt 
> data in the lake. If there is some issue with parsing the input, we expect 
> Spark to set "_corrupt_record" so that we can act on it. Please note that 
> df.count reports 2 rows, whereas df.show only reports 1 row, with a null value.
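
One hedged way to surface the bad record rather than a silent null is to pass an explicit schema that includes the corrupt-record column; the exact behavior with multiLine input varies by version, so treat this as a sketch:

{code:python}
from pyspark.sql.types import StructType, StructField, DoubleType, StringType

schema = StructType([
    StructField("name", DoubleType(), True),
    StructField("_corrupt_record", StringType(), True),
])

df = (spark.read.schema(schema)
      .option("multiLine", True)
      .json("/tmp/json"))
df.cache()  # querying only the corrupt column on raw files is restricted
df.show(truncate=False)
{code}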






[jira] [Commented] (SPARK-42175) Implement more methods in the Scala Client Dataset API

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680418#comment-17680418
 ] 

Apache Spark commented on SPARK-42175:
--

User 'sadikovi' has created a pull request for this issue:
https://github.com/apache/spark/pull/39729

> Implement more methods in the Scala Client Dataset API
> --
>
> Key: SPARK-42175
> URL: https://issues.apache.org/jira/browse/SPARK-42175
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> Also fix the TODOs in the MiMa compatibility test. 
> https://github.com/apache/spark/pull/39712






[jira] [Assigned] (SPARK-42175) Implement more methods in the Scala Client Dataset API

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42175:


Assignee: (was: Apache Spark)

> Implement more methods in the Scala Client Dataset API
> --
>
> Key: SPARK-42175
> URL: https://issues.apache.org/jira/browse/SPARK-42175
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> Also fix the TODOs in the MiMa compatibility test. 
> https://github.com/apache/spark/pull/39712






[jira] [Commented] (SPARK-42175) Implement more methods in the Scala Client Dataset API

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680417#comment-17680417
 ] 

Apache Spark commented on SPARK-42175:
--

User 'sadikovi' has created a pull request for this issue:
https://github.com/apache/spark/pull/39729

> Implement more methods in the Scala Client Dataset API
> --
>
> Key: SPARK-42175
> URL: https://issues.apache.org/jira/browse/SPARK-42175
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> Also fix the TODOs in the MiMa compatibility test. 
> https://github.com/apache/spark/pull/39712






[jira] [Assigned] (SPARK-42175) Implement more methods in the Scala Client Dataset API

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42175:


Assignee: Apache Spark

> Implement more methods in the Scala Client Dataset API
> --
>
> Key: SPARK-42175
> URL: https://issues.apache.org/jira/browse/SPARK-42175
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Assignee: Apache Spark
>Priority: Major
>
> Also fix the TODOs in the MiMa compatibility test. 
> https://github.com/apache/spark/pull/39712






[jira] [Comment Edited] (SPARK-42127) Spark 3.3.0, Error with java.io.IOException: Mkdirs failed to create file

2023-01-24 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680414#comment-17680414
 ] 

Hyukjin Kwon edited comment on SPARK-42127 at 1/25/23 12:55 AM:


[~shamim_er123] how did you face this error? It would be great if there were 
steps to reproduce it.


was (Author: gurwls223):
[~shamim_er123] how did you reproduce this?

> Spark 3.3.0, Error with java.io.IOException: Mkdirs failed to create file
> -
>
> Key: SPARK-42127
> URL: https://issues.apache.org/jira/browse/SPARK-42127
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: shamim
>Priority: Major
>
> 23/01/18 20:23:24 WARN TaskSetManager: Lost task 4.0 in stage 0.0 (TID 4) 
> (10.64.109.72 executor 0): java.io.IOException: Mkdirs failed to create 
> file:/var/backup/_temporary/0/_temporary/attempt_202301182023173234741341853025716_0005_m_04_0
>  (exists=false, cwd=file:/opt/spark-3.3.0/work/app-20230118202317-0001/0)
>         at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:515)
>         at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:500)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1081)
>         at 
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:113)
>         at 
> org.apache.spark.internal.io.HadoopMapRedWriteConfigUtil.initWriter(SparkHadoopWriter.scala:238)
>         at 
> org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:126)
>         at 
> org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$write$1(SparkHadoopWriter.scala:88)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>         at org.apache.spark.scheduler.Task.run(Task.scala:136)
>         at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
>         at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
>         at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:750) 






[jira] [Commented] (SPARK-42127) Spark 3.3.0, Error with java.io.IOException: Mkdirs failed to create file

2023-01-24 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680414#comment-17680414
 ] 

Hyukjin Kwon commented on SPARK-42127:
--

[~shamim_er123] how did you reproduce this?

> Spark 3.3.0, Error with java.io.IOException: Mkdirs failed to create file
> -
>
> Key: SPARK-42127
> URL: https://issues.apache.org/jira/browse/SPARK-42127
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: shamim
>Priority: Major
>
> 23/01/18 20:23:24 WARN TaskSetManager: Lost task 4.0 in stage 0.0 (TID 4) 
> (10.64.109.72 executor 0): java.io.IOException: Mkdirs failed to create 
> file:/var/backup/_temporary/0/_temporary/attempt_202301182023173234741341853025716_0005_m_04_0
>  (exists=false, cwd=file:/opt/spark-3.3.0/work/app-20230118202317-0001/0)
>         at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:515)
>         at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:500)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195)
>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1081)
>         at 
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:113)
>         at 
> org.apache.spark.internal.io.HadoopMapRedWriteConfigUtil.initWriter(SparkHadoopWriter.scala:238)
>         at 
> org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:126)
>         at 
> org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$write$1(SparkHadoopWriter.scala:88)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>         at org.apache.spark.scheduler.Task.run(Task.scala:136)
>         at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
>         at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
>         at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:750) 






[jira] [Commented] (SPARK-42033) Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline

2023-01-24 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680413#comment-17680413
 ] 

Hyukjin Kwon commented on SPARK-42033:
--

How is this a Spark issue?

> Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline
> 
>
> Key: SPARK-42033
> URL: https://issues.apache.org/jira/browse/SPARK-42033
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.1.1
>Reporter: Pankaj Nagla
>Priority: Major
>
> I'm going through the "Scalable FastAPI Application on AWS" course. My 
> gitlab-ci.yml file is below.
> {code:yaml}
> stages:
>   - docker
>
> variables:
>   DOCKER_DRIVER: overlay2
>   DOCKER_TLS_CERTDIR: "/certs"
>
> cache:
>   key: ${CI_JOB_NAME}
>   paths:
>     - ${CI_PROJECT_DIR}/services/talk_booking/.venv/
>
> build-python-ci-image:
>   image: docker:19.03.0
>   services:
>     - docker:19.03.0-dind
>   stage: docker
>   before_script:
>     - cd ci_cd/python/
>   script:
>     - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" $CI_REGISTRY
>     - docker build -t registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
>     - docker push registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim
> {code}
> My pipeline fails with this error:
> {code}
> See https://docs.docker.com/engine/reference/commandline/login/#credentials-store
> Login Succeeded
> $ docker build -t registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
> invalid argument "registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim" for "-t, --tag" flag: invalid reference format
> See 'docker build --help'.
> Cleaning up project directory and file based variables
> ERROR: Job failed: exit code 125
> {code}
> It may or may not be relevant, but the Container Registry for the GitLab 
> project says there's a Docker connection error. All these problems have been 
> discussed on this [AWS SysOps Training 
> |https://www.igmguru.com/cloud-computing/aws-sysops-certification-training/] page.
> Thanks






[jira] [Commented] (SPARK-42034) QueryExecutionListener and Observation API, df.observe do not work with `foreach` action.

2023-01-24 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680412#comment-17680412
 ] 

Hyukjin Kwon commented on SPARK-42034:
--

Please go ahead and open a PR if you're interested in doing that!

> QueryExecutionListener and Observation API, df.observe do not work with 
> `foreach` action.
> -
>
> Key: SPARK-42034
> URL: https://issues.apache.org/jira/browse/SPARK-42034
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.3, 3.2.2, 3.3.1
> Environment: I test it locally and on YARN in cluster mode.
> Spark 3.3.1 and 3.2.2 and 3.1.1.
> Yarn 2.9.2 and 3.2.1.
>Reporter: Nick Hryhoriev
>Priority: Major
>  Labels: sql-api
>
> The Observation API, the {{observe}} dataframe transformation, and custom 
> QueryExecutionListeners do not work with the {{foreach}} or 
> {{foreachPartition}} actions.
> This is because QueryExecutionListener functions are not triggered for queries 
> whose action is {{foreach}} or {{foreachPartition}}.
> However, the Spark UI SQL tab still recognizes such a query as a SQL query and 
> shows its query plans, etc.
> Here is the code to reproduce it:
> https://gist.github.com/GrigorievNick/e7cf9ec5584b417d9719e2812722e6d3






[jira] [Commented] (SPARK-42034) QueryExecutionListener and Observation API, df.observe do not work with `foreach` action.

2023-01-24 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680411#comment-17680411
 ] 

Hyukjin Kwon commented on SPARK-42034:
--

I think the action has to be triggered by using `withAction` in the code.
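
For context, a hedged PySpark sketch of the reported behavior using the Observation API, assuming a frame `df` (the metric and observation names are illustrative):

{code:python}
from pyspark.sql import Observation
from pyspark.sql.functions import count, lit

obs = Observation("stats")
df.observe(obs, count(lit(1)).alias("rows")).collect()
print(obs.get)  # collect() goes through withAction, so the metric arrives

obs2 = Observation("stats2")
df.observe(obs2, count(lit(1)).alias("rows")).foreach(lambda r: None)
# Per this report, obs2.get may never be populated because foreach()
# does not fire QueryExecutionListener callbacks.
{code}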

> QueryExecutionListener and Observation API, df.observe do not work with 
> `foreach` action.
> -
>
> Key: SPARK-42034
> URL: https://issues.apache.org/jira/browse/SPARK-42034
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.3, 3.2.2, 3.3.1
> Environment: I test it locally and on YARN in cluster mode.
> Spark 3.3.1 and 3.2.2 and 3.1.1.
> Yarn 2.9.2 and 3.2.1.
>Reporter: Nick Hryhoriev
>Priority: Major
>  Labels: sql-api
>
> The Observation API, the {{observe}} dataframe transformation, and custom 
> QueryExecutionListeners do not work with the {{foreach}} or 
> {{foreachPartition}} actions.
> This is because QueryExecutionListener functions are not triggered for queries 
> whose action is {{foreach}} or {{foreachPartition}}.
> However, the Spark UI SQL tab still recognizes such a query as a SQL query and 
> shows its query plans, etc.
> Here is the code to reproduce it:
> https://gist.github.com/GrigorievNick/e7cf9ec5584b417d9719e2812722e6d3






[jira] [Updated] (SPARK-42033) Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline

2023-01-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-42033:
-
Target Version/s:   (was: 1.5.2)

> Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline
> 
>
> Key: SPARK-42033
> URL: https://issues.apache.org/jira/browse/SPARK-42033
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.1.1
>Reporter: Pankaj Nagla
>Priority: Major
> Fix For: 1.6.2
>
>
> I'm going through the "Scalable FastAPI Application on AWS" course. My 
> gitlab-ci.yml file is below.
> {code:yaml}
> stages:
>   - docker
>
> variables:
>   DOCKER_DRIVER: overlay2
>   DOCKER_TLS_CERTDIR: "/certs"
>
> cache:
>   key: ${CI_JOB_NAME}
>   paths:
>     - ${CI_PROJECT_DIR}/services/talk_booking/.venv/
>
> build-python-ci-image:
>   image: docker:19.03.0
>   services:
>     - docker:19.03.0-dind
>   stage: docker
>   before_script:
>     - cd ci_cd/python/
>   script:
>     - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" $CI_REGISTRY
>     - docker build -t registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
>     - docker push registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim
> {code}
> My pipeline fails with this error:
> {code}
> See https://docs.docker.com/engine/reference/commandline/login/#credentials-store
> Login Succeeded
> $ docker build -t registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
> invalid argument "registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim" for "-t, --tag" flag: invalid reference format
> See 'docker build --help'.
> Cleaning up project directory and file based variables
> ERROR: Job failed: exit code 125
> {code}
> It may or may not be relevant, but the Container Registry for the GitLab 
> project says there's a Docker connection error. All these problems have been 
> discussed on this [AWS SysOps Training 
> |https://www.igmguru.com/cloud-computing/aws-sysops-certification-training/] page.
> Thanks






[jira] [Updated] (SPARK-42033) Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline

2023-01-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-42033:
-
Fix Version/s: (was: 1.6.2)

> Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline
> 
>
> Key: SPARK-42033
> URL: https://issues.apache.org/jira/browse/SPARK-42033
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.1.1
>Reporter: Pankaj Nagla
>Priority: Major
>
> I'm going through the "Scalable FastAPI Application on AWS" course. My 
> gitlab-ci.yml file is below.
> {code:yaml}
> stages:
>   - docker
>
> variables:
>   DOCKER_DRIVER: overlay2
>   DOCKER_TLS_CERTDIR: "/certs"
>
> cache:
>   key: ${CI_JOB_NAME}
>   paths:
>     - ${CI_PROJECT_DIR}/services/talk_booking/.venv/
>
> build-python-ci-image:
>   image: docker:19.03.0
>   services:
>     - docker:19.03.0-dind
>   stage: docker
>   before_script:
>     - cd ci_cd/python/
>   script:
>     - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" $CI_REGISTRY
>     - docker build -t registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
>     - docker push registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim
> {code}
> My pipeline fails with this error:
> {code}
> See https://docs.docker.com/engine/reference/commandline/login/#credentials-store
> Login Succeeded
> $ docker build -t registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim .
> invalid argument "registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim" for "-t, --tag" flag: invalid reference format
> See 'docker build --help'.
> Cleaning up project directory and file based variables
> ERROR: Job failed: exit code 125
> {code}
> It may or may not be relevant, but the Container Registry for the GitLab 
> project says there's a Docker connection error. All these problems have been 
> discussed on this [AWS SysOps Training 
> |https://www.igmguru.com/cloud-computing/aws-sysops-certification-training/] page.
> Thanks






[jira] [Created] (SPARK-42176) Cast boolean to timestamp fails with ClassCastException

2023-01-24 Thread Ivan Sadikov (Jira)
Ivan Sadikov created SPARK-42176:


 Summary: Cast boolean to timestamp fails with ClassCastException
 Key: SPARK-42176
 URL: https://issues.apache.org/jira/browse/SPARK-42176
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.1, 3.4.0
Reporter: Ivan Sadikov









[jira] [Updated] (SPARK-42176) Cast boolean to timestamp fails with ClassCastException

2023-01-24 Thread Ivan Sadikov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Sadikov updated SPARK-42176:
-
Description: 
When casting a boolean value to timestamp, the following error is thrown:
{code:java}
[info]   java.lang.ClassCastException: java.lang.Integer cannot be cast to 
java.lang.Long
[info]   at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107)
[info]   at 
org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5(InternalRow.scala:178)
[info]   at 
org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5$adapted(InternalRow.scala:178)
 {code}

> Cast boolean to timestamp fails with ClassCastException
> ---
>
> Key: SPARK-42176
> URL: https://issues.apache.org/jira/browse/SPARK-42176
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Ivan Sadikov
>Priority: Major
>
> When casting a boolean value to timestamp, the following error is thrown:
> {code:java}
> [info]   java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.Long
> [info]   at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107)
> [info]   at 
> org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5(InternalRow.scala:178)
> [info]   at 
> org.apache.spark.sql.catalyst.InternalRow$.$anonfun$getWriter$5$adapted(InternalRow.scala:178)
>  {code}






[jira] [Updated] (SPARK-42175) Implement more methods in the Scala Client Dataset API

2023-01-24 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li updated SPARK-42175:

Description: Also fix the TODOs in the MiMa compatibility test. 
https://github.com/apache/spark/pull/39712

> Implement more methods in the Scala Client Dataset API
> --
>
> Key: SPARK-42175
> URL: https://issues.apache.org/jira/browse/SPARK-42175
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> Also fix the TODOs in the MiMa compatibility test. 
> https://github.com/apache/spark/pull/39712



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42175) Implement more methods in the Scala Client Dataset API

2023-01-24 Thread Zhen Li (Jira)
Zhen Li created SPARK-42175:
---

 Summary: Implement more methods in the Scala Client Dataset API
 Key: SPARK-42175
 URL: https://issues.apache.org/jira/browse/SPARK-42175
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Zhen Li






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42173) IPv6 address mapping can fail with sparse addresses

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680410#comment-17680410
 ] 

Apache Spark commented on SPARK-42173:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/39728

> IPv6 address mapping can fail with sparse addresses
> ---
>
> Key: SPARK-42173
> URL: https://issues.apache.org/jira/browse/SPARK-42173
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
> Environment: I've only observed this on Kube, but it could also 
> happen on YARN since the mapping is in core.
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
>
> e.g. the `addressToExecutorId` ends up storing `2602:fcb1::1337:12` but when 
> it comes time to handle the `onDisconnect` all of the implicit zeros are 
> expanded out.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42173) IPv6 address mapping can fail with sparse addresses

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42173:


Assignee: Holden Karau  (was: Apache Spark)

> IPv6 address mapping can fail with sparse addresses
> ---
>
> Key: SPARK-42173
> URL: https://issues.apache.org/jira/browse/SPARK-42173
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
> Environment: I've only observed this on Kube, but it could also 
> happen on YARN since the mapping is in core.
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
>
> e.g. the `addressToExecutorId` ends up storing `2602:fcb1::1337:12` but when 
> it comes time to handle the `onDisconnect` all of the implicit zeros are 
> expanded out.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42173) IPv6 address mapping can fail with sparse addresses

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680409#comment-17680409
 ] 

Apache Spark commented on SPARK-42173:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/39728

> IPv6 address mapping can fail with sparse addresses
> ---
>
> Key: SPARK-42173
> URL: https://issues.apache.org/jira/browse/SPARK-42173
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
> Environment: I've only observed this on Kube, but it could also 
> happen on YARN since the mapping is in core.
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
>
> e.g. the `addressToExecutorId` ends up storing `2602:fcb1::1337:12` but when 
> it comes time to handle the `onDisconnect` all of the implicit zeros are 
> expanded out.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42173) IPv6 address mapping can fail with sparse addresses

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42173:


Assignee: Apache Spark  (was: Holden Karau)

> IPv6 address mapping can fail with sparse addresses
> ---
>
> Key: SPARK-42173
> URL: https://issues.apache.org/jira/browse/SPARK-42173
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
> Environment: I've only observed this on Kube, but it could also 
> happen on YARN since the mapping is in core.
>Reporter: Holden Karau
>Assignee: Apache Spark
>Priority: Major
>
> e.g. the `addressToExecutorId` ends up storing `2602:fcb1::1337:12` but when 
> it comes time to handle the `onDisconnect` all of the implicit zeros are 
> expanded out.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42119) Add built-in table-valued functions inline and inline_outer

2023-01-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42119:


Assignee: Allison Wang

> Add built-in table-valued functions inline and inline_outer
> ---
>
> Key: SPARK-42119
> URL: https://issues.apache.org/jira/browse/SPARK-42119
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>
> Add `inline` and `inline_outer` to the built-in table function registry.
> Add new SQL tests in `table-valued-functions.sql` and `join-lateral.sql`.
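For illustration, this is the kind of query the registration enables (a sketch,
not taken from the ticket):

{code:scala}
// inline explodes an array of structs into one row per struct;
// inline_outer additionally emits a row of NULLs for an empty or NULL array.
spark.sql("SELECT * FROM inline(array(struct(1, 'a'), struct(2, 'b')))").show()
{code}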



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42119) Add built-in table-valued functions inline and inline_outer

2023-01-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42119.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39656
[https://github.com/apache/spark/pull/39656]

> Add built-in table-valued functions inline and inline_outer
> ---
>
> Key: SPARK-42119
> URL: https://issues.apache.org/jira/browse/SPARK-42119
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
> Fix For: 3.4.0
>
>
> Add `inline` and `inline_outer` to the built-in table function registry.
> Add new SQL tests in `table-valued-functions.sql` and `join-lateral.sql`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42174) Use scikit-learn instead of sklearn

2023-01-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42174:


Assignee: Dongjoon Hyun

> Use scikit-learn instead of sklearn
> ---
>
> Key: SPARK-42174
> URL: https://issues.apache.org/jira/browse/SPARK-42174
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra, PySpark
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42174) Use scikit-learn instead of sklearn

2023-01-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42174.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39727
[https://github.com/apache/spark/pull/39727]

> Use scikit-learn instead of sklearn
> ---
>
> Key: SPARK-42174
> URL: https://issues.apache.org/jira/browse/SPARK-42174
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra, PySpark
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36124) Support set operators to be on correlation paths

2023-01-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-36124:


Assignee: Allison Wang

> Support set operators to be on correlation paths
> 
>
> Key: SPARK-36124
> URL: https://issues.apache.org/jira/browse/SPARK-36124
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>
> A correlation path is defined as the sub-tree of all the operators that are 
> on the path from the operator hosting the correlated expressions up to the 
> operator producing the correlated values. 
> We want to support set operators such as union and intersect to be on 
> correlation paths by adding them in DecorrelateInnerQuery. Please see page 
> 391 for more details: 
> [https://dl.gi.de/bitstream/handle/20.500.12116/2418/383.pdf] 
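As a sketch of what this enables, assuming hypothetical tables t1, t2, and t3
(not taken from the ticket), a set operator can now sit on the path between the
correlated predicates and the outer query:

{code:scala}
// The UNION node lies on the correlation path from t1.a and t1.b
// up to the outer query, which DecorrelateInnerQuery must now handle.
spark.sql("""
  SELECT * FROM t1
  WHERE EXISTS (
    SELECT 1 FROM t2 WHERE t2.a = t1.a
    UNION
    SELECT 1 FROM t3 WHERE t3.b = t1.b
  )
""").show()
{code}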



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36124) Support set operators to be on correlation paths

2023-01-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-36124.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39375
[https://github.com/apache/spark/pull/39375]

> Support set operators to be on correlation paths
> 
>
> Key: SPARK-36124
> URL: https://issues.apache.org/jira/browse/SPARK-36124
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
> Fix For: 3.4.0
>
>
> A correlation path is defined as the sub-tree of all the operators that are 
> on the path from the operator hosting the correlated expressions up to the 
> operator producing the correlated values. 
> We want to support set operators such as union and intersect to be on 
> correlation paths by adding them in DecorrelateInnerQuery. Please see page 
> 391 for more details: 
> [https://dl.gi.de/bitstream/handle/20.500.12116/2418/383.pdf] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42174) Use scikit-learn instead of sklearn

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680379#comment-17680379
 ] 

Apache Spark commented on SPARK-42174:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/39727

> Use scikit-learn instead of sklearn
> ---
>
> Key: SPARK-42174
> URL: https://issues.apache.org/jira/browse/SPARK-42174
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra, PySpark
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42174) Use scikit-learn instead of sklearn

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42174:


Assignee: Apache Spark

> Use scikit-learn instead of sklearn
> ---
>
> Key: SPARK-42174
> URL: https://issues.apache.org/jira/browse/SPARK-42174
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra, PySpark
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42174) Use scikit-learn instead of sklearn

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42174:


Assignee: (was: Apache Spark)

> Use scikit-learn instead of sklearn
> ---
>
> Key: SPARK-42174
> URL: https://issues.apache.org/jira/browse/SPARK-42174
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra, PySpark
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42174) Use scikit-learn instead of sklearn

2023-01-24 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-42174:
-

 Summary: Use scikit-learn instead of sklearn
 Key: SPARK-42174
 URL: https://issues.apache.org/jira/browse/SPARK-42174
 Project: Spark
  Issue Type: Bug
  Components: Project Infra, PySpark
Affects Versions: 3.4.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42123) Include column default values in DESCRIBE output

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680362#comment-17680362
 ] 

Apache Spark commented on SPARK-42123:
--

User 'dtenedor' has created a pull request for this issue:
https://github.com/apache/spark/pull/39726

> Include column default values in DESCRIBE output
> 
>
> Key: SPARK-42123
> URL: https://issues.apache.org/jira/browse/SPARK-42123
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42171) Fix `pyspark-errors` module and enable it in GitHub Action

2023-01-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42171.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39721
[https://github.com/apache/spark/pull/39721]

> Fix `pyspark-errors` module and enable it in GitHub Action
> --
>
> Key: SPARK-42171
> URL: https://issues.apache.org/jira/browse/SPARK-42171
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Blocker
>  Labels: 3.4.0
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42171) Fix `pyspark-errors` module and enable it in GitHub Action

2023-01-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42171:
-

Assignee: Dongjoon Hyun

> Fix `pyspark-errors` module and enable it in GitHub Action
> --
>
> Key: SPARK-42171
> URL: https://issues.apache.org/jira/browse/SPARK-42171
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Blocker
>  Labels: 3.4.0
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33573) Server side metrics related to push-based shuffle

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680350#comment-17680350
 ] 

Apache Spark commented on SPARK-33573:
--

User 'rmcyang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39725

> Server side metrics related to push-based shuffle
> -
>
> Key: SPARK-33573
> URL: https://issues.apache.org/jira/browse/SPARK-33573
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Min Shen
>Assignee: Minchu Yang
>Priority: Major
> Fix For: 3.4.0
>
>
> Server-side shuffle metrics for push-based shuffle.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42173) IPv6 address mapping can fail with sparse addresses

2023-01-24 Thread Holden Karau (Jira)
Holden Karau created SPARK-42173:


 Summary: IPv6 address mapping can fail with sparse addresses
 Key: SPARK-42173
 URL: https://issues.apache.org/jira/browse/SPARK-42173
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
 Environment: I've only observed this on Kube, but it could also 
happen on YARN since the mapping is in core.
Reporter: Holden Karau
Assignee: Holden Karau


e.g. the `addressToExecutorId` ends up storing `2602:fcb1::1337:12` but when it 
comes time to handle the `onDisconnect` all of the implicit zeros are expanded 
out.
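A small sketch of the mismatch using plain JDK address parsing (illustrative
only; the actual lookup code in Spark may differ):

{code:scala}
import java.net.InetAddress

// The compressed literal is stored as the map key...
val stored = "2602:fcb1::1337:12"
// ...but re-resolving the address yields the fully expanded form,
// so the later lookup no longer matches the stored key.
val expanded = InetAddress.getByName(stored).getHostAddress
// expanded == "2602:fcb1:0:0:0:0:1337:12"
assert(stored != expanded)
{code}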

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42103) Add Instrumentation

2023-01-24 Thread Rithwik Ediga Lakhamsani (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rithwik Ediga Lakhamsani resolved SPARK-42103.
--
Resolution: Not A Problem

> Add Instrumentation
> ---
>
> Key: SPARK-42103
> URL: https://issues.apache.org/jira/browse/SPARK-42103
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Priority: Major
>
> Adding instrumentation



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41590) Implement Baseline API Code

2023-01-24 Thread Rithwik Ediga Lakhamsani (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rithwik Ediga Lakhamsani resolved SPARK-41590.
--
Resolution: Fixed

> Implement Baseline API Code
> ---
>
> Key: SPARK-41590
> URL: https://issues.apache.org/jira/browse/SPARK-41590
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Priority: Major
>
> Creating a baseline API so that we can agree on how the users will interact 
> with the code. This was determined in this [Design 
> Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit]
>  and can be updated as necessary.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41775) Implement training functions as input

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680337#comment-17680337
 ] 

Apache Spark commented on SPARK-41775:
--

User 'rithwik-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/39724

> Implement training functions as input
> -
>
> Key: SPARK-41775
> URL: https://issues.apache.org/jira/browse/SPARK-41775
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Assignee: Rithwik Ediga Lakhamsani
>Priority: Major
> Fix For: 3.4.0
>
>
> Sidenote: make formatting updates described in 
> https://github.com/apache/spark/pull/39188
>  
> Currently, `Distributor().run(...)` takes only files as input. Now we will 
> add functionality to take in functions as well. This will 
> require us to go through the following process on each task in the executor 
> nodes:
> 1. take the input function and args and pickle them
> 2. Create a temp train.py file that looks like
> {code:python}
> import cloudpickle
> import os
> 
> if __name__ == "__main__":
>     # "tempdir" is substituted by the code that generates this file
>     with open(f"{tempdir}/train_input.pkl", "rb") as f:
>         train, args = cloudpickle.load(f)
>     output = train(*args)
>     if output and os.environ.get("RANK", "") == "0":  # partitionId == 0
>         with open(f"{tempdir}/train_output.pkl", "wb") as f:
>             cloudpickle.dump(output, f)
> {code}
> 3. Run that train.py file with `torchrun`
> 4. Check if `train_output.pkl` has been created by the process with 
> partitionId == 0; if it has, deserialize it and return that output through 
> `.collect()`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41775) Implement training functions as input

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680336#comment-17680336
 ] 

Apache Spark commented on SPARK-41775:
--

User 'rithwik-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/39724

> Implement training functions as input
> -
>
> Key: SPARK-41775
> URL: https://issues.apache.org/jira/browse/SPARK-41775
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Assignee: Rithwik Ediga Lakhamsani
>Priority: Major
> Fix For: 3.4.0
>
>
> Sidenote: make formatting updates described in 
> https://github.com/apache/spark/pull/39188
>  
> Currently, `Distributor().run(...)` takes only files as input. Now we will 
> add functionality to take in functions as well. This will 
> require us to go through the following process on each task in the executor 
> nodes:
> 1. take the input function and args and pickle them
> 2. Create a temp train.py file that looks like
> {code:python}
> import cloudpickle
> import os
> 
> if __name__ == "__main__":
>     # "tempdir" is substituted by the code that generates this file
>     with open(f"{tempdir}/train_input.pkl", "rb") as f:
>         train, args = cloudpickle.load(f)
>     output = train(*args)
>     if output and os.environ.get("RANK", "") == "0":  # partitionId == 0
>         with open(f"{tempdir}/train_output.pkl", "wb") as f:
>             cloudpickle.dump(output, f)
> {code}
> 3. Run that train.py file with `torchrun`
> 4. Check if `train_output.pkl` has been created by the process with 
> partitionId == 0; if it has, deserialize it and return that output through 
> `.collect()`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41302) Assign a name to the error class _LEGACY_ERROR_TEMP_1185

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41302:


Assignee: Apache Spark

> Assign a name to the error class _LEGACY_ERROR_TEMP_1185
> 
>
> Key: SPARK-41302
> URL: https://issues.apache.org/jira/browse/SPARK-41302
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: Apache Spark
>Priority: Minor
>
> Assign a name to the legacy error class _LEGACY_ERROR_TEMP_1185, improve 
> error message and tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41302) Assign a name to the error class _LEGACY_ERROR_TEMP_1185

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680324#comment-17680324
 ] 

Apache Spark commented on SPARK-41302:
--

User 'NarekDW' has created a pull request for this issue:
https://github.com/apache/spark/pull/39723

> Assign a name to the error class _LEGACY_ERROR_TEMP_1185
> 
>
> Key: SPARK-41302
> URL: https://issues.apache.org/jira/browse/SPARK-41302
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Priority: Minor
>
> Assign a name to the legacy error class _LEGACY_ERROR_TEMP_1185, improve 
> error message and tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41302) Assign a name to the error class _LEGACY_ERROR_TEMP_1185

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41302:


Assignee: (was: Apache Spark)

> Assign a name to the error class _LEGACY_ERROR_TEMP_1185
> 
>
> Key: SPARK-41302
> URL: https://issues.apache.org/jira/browse/SPARK-41302
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Priority: Minor
>
> Assign a name to the legacy error class _LEGACY_ERROR_TEMP_1185, improve 
> error message and tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42172) Compatibility check for Scala Client

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680321#comment-17680321
 ] 

Apache Spark commented on SPARK-42172:
--

User 'zhenlineo' has created a pull request for this issue:
https://github.com/apache/spark/pull/39712

> Compatibility check for Scala Client
> 
>
> Key: SPARK-42172
> URL: https://issues.apache.org/jira/browse/SPARK-42172
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> Adding compatibility checks for the Scala client to ensure the Scala Client 
> API is binary compatible with the existing Spark SQL API (Dataset, 
> SparkSession, etc.).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42172) Compatibility check for Scala Client

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42172:


Assignee: (was: Apache Spark)

> Compatibility check for Scala Client
> 
>
> Key: SPARK-42172
> URL: https://issues.apache.org/jira/browse/SPARK-42172
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> Adding compatibility checks for the Scala client to ensure the Scala Client 
> API is binary compatible with the existing Spark SQL API (Dataset, 
> SparkSession, etc.).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42172) Compatibility check for Scala Client

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42172:


Assignee: Apache Spark

> Compatibility check for Scala Client
> 
>
> Key: SPARK-42172
> URL: https://issues.apache.org/jira/browse/SPARK-42172
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Assignee: Apache Spark
>Priority: Major
>
> Adding compatibility checks for the Scala client to ensure the Scala Client 
> API is binary compatible with the existing Spark SQL API (Dataset, 
> SparkSession, etc.).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42172) Compatibility check for Scala Client

2023-01-24 Thread Zhen Li (Jira)
Zhen Li created SPARK-42172:
---

 Summary: Compatibility check for Scala Client
 Key: SPARK-42172
 URL: https://issues.apache.org/jira/browse/SPARK-42172
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Zhen Li


Adding compatibility checks for the Scala client to ensure the Scala Client API 
is binary compatible with the existing Spark SQL API (Dataset, SparkSession, etc.).
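A minimal sketch of such a check with the sbt-mima-plugin (the plugin choice and
the artifact coordinates below are assumptions for illustration, not the actual
Spark build change):

{code:scala}
// In the client module's sbt settings: compare the current API against a
// previously released artifact and fail the build on binary incompatibilities.
mimaPreviousArtifacts := Set("org.apache.spark" %% "spark-sql" % "3.3.1")
{code}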



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42162) Memory usage on executors increased drastically for a complex query with large number of addition operations

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680317#comment-17680317
 ] 

Apache Spark commented on SPARK-42162:
--

User 'db-scnakandala' has created a pull request for this issue:
https://github.com/apache/spark/pull/39722

> Memory usage on executors increased drastically for a complex query with 
> large number of addition operations
> 
>
> Key: SPARK-42162
> URL: https://issues.apache.org/jira/browse/SPARK-42162
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Supun Nakandala
>Priority: Major
>
> With the [recent changes|https://github.com/apache/spark/pull/37851]  in the 
> expression canonicalization, a complex query with a large number of Add 
> operations ends up consuming 10x more memory on the executors.
> The reason for this issue is that with the new changes the canonicalization 
> process ends up generating a lot of intermediate objects, especially for 
> complex queries with a large number of commutative operators. In this 
> specific case, a heap histogram analysis shows that a large number of Add 
> objects use the extra memory.
> This issue does not happen before PR 
> [#37851.|https://github.com/apache/spark/pull/37851]
> The high memory usage causes the executors to lose heartbeat signals and 
> results in task failures.
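To make the query shape concrete, here is a hypothetical expression tree of the
kind that stresses canonicalization (an illustration, not the reporter's query):

{code:scala}
import org.apache.spark.sql.functions._

// One projection with thousands of nested commutative Add nodes;
// canonicalizing such a chain can materialize many intermediate Add objects.
val bigSum = (1 to 5000).map(i => col("x") + i).reduce(_ + _)
val out = spark.range(10).withColumnRenamed("id", "x").select(bigSum.as("s"))
{code}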



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42162) Memory usage on executors increased drastically for a complex query with large number of addition operations

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680316#comment-17680316
 ] 

Apache Spark commented on SPARK-42162:
--

User 'db-scnakandala' has created a pull request for this issue:
https://github.com/apache/spark/pull/39722

> Memory usage on executors increased drastically for a complex query with 
> large number of addition operations
> 
>
> Key: SPARK-42162
> URL: https://issues.apache.org/jira/browse/SPARK-42162
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Supun Nakandala
>Priority: Major
>
> With the [recent changes|https://github.com/apache/spark/pull/37851]  in the 
> expression canonicalization, a complex query with a large number of Add 
> operations ends up consuming 10x more memory on the executors.
> The reason for this issue is that with the new changes the canonicalization 
> process ends up generating a lot of intermediate objects, especially for 
> complex queries with a large number of commutative operators. In this 
> specific case, a heap histogram analysis shows that a large number of Add 
> objects use the extra memory.
> This issue does not happen before PR 
> [#37851.|https://github.com/apache/spark/pull/37851]
> The high memory usage causes the executors to lose heartbeat signals and 
> results in task failures.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42162) Memory usage on executors increased drastically for a complex query with large number of addition operations

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42162:


Assignee: (was: Apache Spark)

> Memory usage on executors increased drastically for a complex query with 
> large number of addition operations
> 
>
> Key: SPARK-42162
> URL: https://issues.apache.org/jira/browse/SPARK-42162
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Supun Nakandala
>Priority: Major
>
> With the [recent changes|https://github.com/apache/spark/pull/37851]  in the 
> expression canonicalization, a complex query with a large number of Add 
> operations ends up consuming 10x more memory on the executors.
> The reason for this issue is that with the new changes the canonicalization 
> process ends up generating a lot of intermediate objects, especially for 
> complex queries with a large number of commutative operators. In this 
> specific case, a heap histogram analysis shows that a large number of Add 
> objects use the extra memory.
> This issue does not happen before PR 
> [#37851.|https://github.com/apache/spark/pull/37851]
> The high memory usage causes the executors to lose heartbeat signals and 
> results in task failures.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42162) Memory usage on executors increased drastically for a complex query with large number of addition operations

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42162:


Assignee: Apache Spark

> Memory usage on executors increased drastically for a complex query with 
> large number of addition operations
> 
>
> Key: SPARK-42162
> URL: https://issues.apache.org/jira/browse/SPARK-42162
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Supun Nakandala
>Assignee: Apache Spark
>Priority: Major
>
> With the [recent changes|https://github.com/apache/spark/pull/37851]  in the 
> expression canonicalization, a complex query with a large number of Add 
> operations ends up consuming 10x more memory on the executors.
> The reason for this issue is that with the new changes the canonicalization 
> process ends up generating a lot of intermediate objects, especially for 
> complex queries with a large number of commutative operators. In this 
> specific case, a heap histogram analysis shows that a large number of Add 
> objects use the extra memory.
> This issue does not happen before PR 
> [#37851.|https://github.com/apache/spark/pull/37851]
> The high memory usage causes the executors to lose heartbeat signals and 
> results in task failures.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42171) Fix `pyspark-errors` module and enable it in GitHub Action

2023-01-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-42171:
--
Summary: Fix `pyspark-errors` module and enable it in GitHub Action  (was: 
Enable `pyspark-errors` module test in GitHub Action)

> Fix `pyspark-errors` module and enable it in GitHub Action
> --
>
> Key: SPARK-42171
> URL: https://issues.apache.org/jira/browse/SPARK-42171
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>  Labels: 3.4.0
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42171) Enable `pyspark-errors` module test in GitHub Action

2023-01-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-42171:
--
Labels: 3.4.0  (was: )

> Enable `pyspark-errors` module test in GitHub Action
> 
>
> Key: SPARK-42171
> URL: https://issues.apache.org/jira/browse/SPARK-42171
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>  Labels: 3.4.0
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42171) Enable `pyspark-errors` module test in GitHub Action

2023-01-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-42171:
--
Priority: Blocker  (was: Major)

> Enable `pyspark-errors` module test in GitHub Action
> 
>
> Key: SPARK-42171
> URL: https://issues.apache.org/jira/browse/SPARK-42171
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42171) Enable `pyspark-errors` module test in GitHub Action

2023-01-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-42171:
--
Component/s: Tests

> Enable `pyspark-errors` module test in GitHub Action
> 
>
> Key: SPARK-42171
> URL: https://issues.apache.org/jira/browse/SPARK-42171
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>  Labels: 3.4.0
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42090) Introduce sasl retry count in RetryingBlockTransferor

2023-01-24 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan updated SPARK-42090:

Fix Version/s: 3.2.4
   3.3.2

> Introduce sasl retry count in RetryingBlockTransferor
> -
>
> Key: SPARK-42090
> URL: https://issues.apache.org/jira/browse/SPARK-42090
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.2.4, 3.3.2, 3.4.0
>
>
> Previously a boolean variable, saslTimeoutSeen, was used in 
> RetryingBlockTransferor. However, the boolean variable wouldn't cover the 
> following scenario:
> 1. SaslTimeoutException
> 2. IOException
> 3. SaslTimeoutException
> 4. IOException
> Even though IOException at #2 is retried (resulting in increment of 
> retryCount), the retryCount would be cleared at step #4.
> Since the intention of saslTimeoutSeen is to undo the increment due to 
> retrying SaslTimeoutException, we should keep a counter for 
> SaslTimeoutException retries and subtract the value of this counter from 
> retryCount.
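A sketch of the counter idea (the names below are assumptions for illustration,
not the actual RetryingBlockTransferor fields):

{code:scala}
// Count SASL-timeout retries separately so that undoing their effect
// does not also discard retries consumed by ordinary IOExceptions.
var retryCount = 0
var saslRetryCount = 0

def onRetriedFailure(isSaslTimeout: Boolean): Unit = {
  retryCount += 1
  if (isSaslTimeout) {
    saslRetryCount += 1
  } else {
    // Subtract only the SASL-timeout retries, instead of resetting
    // retryCount entirely as the old boolean flag did.
    retryCount -= saslRetryCount
    saslRetryCount = 0
  }
}
{code}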



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42171) Enable `pyspark-errors` module test in GitHub Action

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42171:


Assignee: Apache Spark

> Enable `pyspark-errors` module test in GitHub Action
> 
>
> Key: SPARK-42171
> URL: https://issues.apache.org/jira/browse/SPARK-42171
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42171) Enable `pyspark-errors` module test in GitHub Action

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42171:


Assignee: (was: Apache Spark)

> Enable `pyspark-errors` module test in GitHub Action
> 
>
> Key: SPARK-42171
> URL: https://issues.apache.org/jira/browse/SPARK-42171
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42171) Enable `pyspark-errors` module test in GitHub Action

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680307#comment-17680307
 ] 

Apache Spark commented on SPARK-42171:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/39721

> Enable `pyspark-errors` module test in GitHub Action
> 
>
> Key: SPARK-42171
> URL: https://issues.apache.org/jira/browse/SPARK-42171
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42171) Enable `pyspark-errors` module test in GitHub Action

2023-01-24 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-42171:
-

 Summary: Enable `pyspark-errors` module test in GitHub Action
 Key: SPARK-42171
 URL: https://issues.apache.org/jira/browse/SPARK-42171
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.4.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42167) Improve GitHub Action `lint` job to stop on failures earlier

2023-01-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42167:
-

Assignee: Dongjoon Hyun

> Improve GitHub Action `lint` job to stop on failures earlier
> 
>
> Key: SPARK-42167
> URL: https://issues.apache.org/jira/browse/SPARK-42167
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42167) Improve GitHub Action `lint` job to stop on failures earlier

2023-01-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42167.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39716
[https://github.com/apache/spark/pull/39716]

> Improve GitHub Action `lint` job to stop on failures earlier
> 
>
> Key: SPARK-42167
> URL: https://issues.apache.org/jira/browse/SPARK-42167
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41500) auto generate concat as Double when string minus an INTERVAL type

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680279#comment-17680279
 ] 

Apache Spark commented on SPARK-41500:
--

User 'NarekDW' has created a pull request for this issue:
https://github.com/apache/spark/pull/39720

> auto generate concat as Double when string minus an INTERVAL type
> -
>
> Key: SPARK-41500
> URL: https://issues.apache.org/jira/browse/SPARK-41500
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.2.1, 3.2.2
>Reporter: JacobZheng
>Priority: Major
>
> h2. *Describe the bug*
> Here is a SQL query.
> {code:sql}
> select '2022-02-01'- INTERVAL 1 year
> {code}
> Spark automatically generates cast('2022-02-01' as double) - INTERVAL 1 year, 
> and a type mismatch occurs.
> h2. *To Reproduce*
> On Spark 3.0.1 using spark-shell
> {code:java}
> scala> spark.sql("select '2022-02-01'- interval 1 year").show
> +------------------------------------------------------------------+
> |CAST(CAST(2022-02-01 AS TIMESTAMP) - INTERVAL '1 years' AS STRING)|
> +------------------------------------------------------------------+
> |                                            2021-02-01 00:00:00|
> +------------------------------------------------------------------+
> {code}
> On Spark 3.2.1 using spark-shell
> {code:java}
> scala> spark.sql("select '2022-02-01'- interval 1 year").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(CAST('2022-02-01' AS 
> DOUBLE) - INTERVAL '1' YEAR)' due to data type mismatch: differing types in 
> '(CAST('2022-02-01' AS DOUBLE) - INTERVAL '1' YEAR)' (double and interval 
> year).; line 1 pos 7;
> 'Project [unresolvedalias((cast(2022-02-01 as double) - INTERVAL '1' YEAR), 
> None)]
> +- OneRowRelation
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:190)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:175)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:535)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:535)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:532)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1128)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1127)
>   at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:467)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:532)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUpWithPruning$1(QueryPlan.scala:181)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:209)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at scala.collection.immutable.List.foreach(List.scala:431)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>   at scala.collection.immutable.List.map(List.scala:305)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:209)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:214)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:214)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUpWithPruning(QueryPlan.scala:181)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:161)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:175)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94)
>   at 
> org.apache.s

[jira] [Commented] (SPARK-41500) auto generate concat as Double when string minus an INTERVAL type

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680278#comment-17680278
 ] 

Apache Spark commented on SPARK-41500:
--

User 'NarekDW' has created a pull request for this issue:
https://github.com/apache/spark/pull/39720

> auto generate concat as Double when string minus an INTERVAL type
> -
>
> Key: SPARK-41500
> URL: https://issues.apache.org/jira/browse/SPARK-41500
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.2.1, 3.2.2
>Reporter: JacobZheng
>Priority: Major
>
> h2. *Describe the bug*
> Here is a SQL query.
> {code:sql}
> select '2022-02-01'- INTERVAL 1 year
> {code}
> Spark automatically generates cast('2022-02-01' as double) - INTERVAL 1 year, 
> and a type mismatch occurs.
> h2. *To Reproduce*
> On Spark 3.0.1 using spark-shell
> {code:java}
> scala> spark.sql("select '2022-02-01'- interval 1 year").show
> +------------------------------------------------------------------+
> |CAST(CAST(2022-02-01 AS TIMESTAMP) - INTERVAL '1 years' AS STRING)|
> +------------------------------------------------------------------+
> |                                            2021-02-01 00:00:00|
> +------------------------------------------------------------------+
> {code}
> On Spark 3.2.1 using spark-shell
> {code:java}
> scala> spark.sql("select '2022-02-01'- interval 1 year").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(CAST('2022-02-01' AS 
> DOUBLE) - INTERVAL '1' YEAR)' due to data type mismatch: differing types in 
> '(CAST('2022-02-01' AS DOUBLE) - INTERVAL '1' YEAR)' (double and interval 
> year).; line 1 pos 7;
> 'Project [unresolvedalias((cast(2022-02-01 as double) - INTERVAL '1' YEAR), 
> None)]
> +- OneRowRelation
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:190)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:175)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:535)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:535)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:532)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1128)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1127)
>   at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:467)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:532)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUpWithPruning$1(QueryPlan.scala:181)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:209)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at scala.collection.immutable.List.foreach(List.scala:431)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>   at scala.collection.immutable.List.map(List.scala:305)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:209)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:214)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:214)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUpWithPruning(QueryPlan.scala:181)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:161)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:175)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94)
>   at 
> org.apache.s
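
The root cause is the analyzer's general string-to-numeric coercion for binary
arithmetic: a string operand of '-' is implicitly cast to DOUBLE. A quick
illustration of that coercion on its own (a behavior sketch, assuming Spark
3.2.x with ANSI mode off):

{code:sql}
-- '2' is implicitly cast to DOUBLE for the numeric '-', so this yields 1.0
select '2' - 1;
{code}

In 3.0.x the string was instead cast to TIMESTAMP when the other operand was an
interval, as the quoted output above shows, which is why the same query
succeeded there.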

[jira] [Assigned] (SPARK-41500) auto generate concat as Double when string minus an INTERVAL type

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41500:


Assignee: (was: Apache Spark)

> auto generate concat as Double when string minus an INTERVAL type
> -
>
> Key: SPARK-41500
> URL: https://issues.apache.org/jira/browse/SPARK-41500
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.2.1, 3.2.2
>Reporter: JacobZheng
>Priority: Major
>
> h2. *Describe the bug*
> Here is a SQL query.
> {code:sql}
> select '2022-02-01'- INTERVAL 1 year
> {code}
> Spark automatically generates cast('2022-02-01' as double) - INTERVAL 1 year 
> and a type mismatch error occurs.
> h2. *To Reproduce*
> On Spark 3.0.1 using spark-shell
> {code:java}
> scala> spark.sql("select '2022-02-01'- interval 1 year").show
> +------------------------------------------------------------------+
> |CAST(CAST(2022-02-01 AS TIMESTAMP) - INTERVAL '1 years' AS STRING)|
> +------------------------------------------------------------------+
> |                                               2021-02-01 00:00:00|
> +------------------------------------------------------------------+
> {code}
> On Spark 3.2.1 using spark-shell
> {code:java}
> scala> spark.sql("select '2022-02-01'- interval 1 year").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(CAST('2022-02-01' AS 
> DOUBLE) - INTERVAL '1' YEAR)' due to data type mismatch: differing types in 
> '(CAST('2022-02-01' AS DOUBLE) - INTERVAL '1' YEAR)' (double and interval 
> year).; line 1 pos 7;
> 'Project [unresolvedalias((cast(2022-02-01 as double) - INTERVAL '1' YEAR), 
> None)]
> +- OneRowRelation
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:190)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:175)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:535)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:535)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:532)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1128)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1127)
>   at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:467)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:532)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUpWithPruning$1(QueryPlan.scala:181)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:209)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at scala.collection.immutable.List.foreach(List.scala:431)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>   at scala.collection.immutable.List.map(List.scala:305)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:209)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:214)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:214)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUpWithPruning(QueryPlan.scala:181)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:161)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:175)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:263)
>   at 
> org.apache.spark.sql.catalyst.analysis.Ch

[jira] [Assigned] (SPARK-41500) auto generate concat as Double when string minus an INTERVAL type

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41500:


Assignee: Apache Spark

> auto generate concat as Double when string minus an INTERVAL type
> -
>
> Key: SPARK-41500
> URL: https://issues.apache.org/jira/browse/SPARK-41500
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.2.1, 3.2.2
>Reporter: JacobZheng
>Assignee: Apache Spark
>Priority: Major
>
> h2. *Describe the bug*
> Here is a SQL query.
> {code:sql}
> select '2022-02-01'- INTERVAL 1 year
> {code}
> Spark automatically generates cast('2022-02-01' as double) - INTERVAL 1 year 
> and a type mismatch error occurs.
> h2. *To Reproduce*
> On Spark 3.0.1 using spark-shell
> {code:java}
> scala> spark.sql("select '2022-02-01'- interval 1 year").show
> +------------------------------------------------------------------+
> |CAST(CAST(2022-02-01 AS TIMESTAMP) - INTERVAL '1 years' AS STRING)|
> +------------------------------------------------------------------+
> |                                               2021-02-01 00:00:00|
> +------------------------------------------------------------------+
> {code}
> On Spark 3.2.1 using spark-shell
> {code:java}
> scala> spark.sql("select '2022-02-01'- interval 1 year").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(CAST('2022-02-01' AS 
> DOUBLE) - INTERVAL '1' YEAR)' due to data type mismatch: differing types in 
> '(CAST('2022-02-01' AS DOUBLE) - INTERVAL '1' YEAR)' (double and interval 
> year).; line 1 pos 7;
> 'Project [unresolvedalias((cast(2022-02-01 as double) - INTERVAL '1' YEAR), 
> None)]
> +- OneRowRelation
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:190)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:175)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:535)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:535)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:532)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1128)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1127)
>   at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:467)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:532)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUpWithPruning$1(QueryPlan.scala:181)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:209)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at scala.collection.immutable.List.foreach(List.scala:431)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>   at scala.collection.immutable.List.map(List.scala:305)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:209)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:214)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:214)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUpWithPruning(QueryPlan.scala:181)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:161)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:175)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:263)
>   at 
> org.apache.spark
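
Until this is resolved, a workaround sketch (assuming the intent is date/time
arithmetic, which the implicit DOUBLE cast breaks): cast the string literal
explicitly, or use a typed literal, so the analyzer resolves datetime
subtraction instead of numeric subtraction:

{code:sql}
-- Minimal sketch: explicit types avoid the implicit CAST(... AS DOUBLE)
select cast('2022-02-01' as date) - interval 1 year;
select timestamp'2022-02-01' - interval 1 year;
{code}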

[jira] [Commented] (SPARK-42169) Implement code generation for `to_csv` function (StructsToCsv)

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680254#comment-17680254
 ] 

Apache Spark commented on SPARK-42169:
--

User 'NarekDW' has created a pull request for this issue:
https://github.com/apache/spark/pull/39719

> Implement code generation for `to_csv` function (StructsToCsv)
> --
>
> Key: SPARK-42169
> URL: https://issues.apache.org/jira/browse/SPARK-42169
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Narek Karapetian
>Priority: Minor
>  Labels: csv, sql
> Fix For: 3.4.0
>
>
> Implement code generation for the `to_csv` function instead of extending it 
> from the CodegenFallback trait.
> {code:java}
> org.apache.spark.sql.catalyst.expressions.StructsToCsv.doGenCode(...){code}
>  
> This is good to have from a performance point of view.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
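
For context, a minimal usage sketch of the function being optimized (the
`to_csv` API from org.apache.spark.sql.functions is standard; the example data
is made up), runnable in spark-shell:

{code:java}
import org.apache.spark.sql.functions.{col, lit, struct, to_csv}

// to_csv serializes a struct column to a CSV string. StructsToCsv is the
// underlying expression; today it is evaluated interpretively via
// CodegenFallback rather than through generated code.
val df = spark.range(3).select(struct(col("id"), lit("x").as("tag")).as("value"))
df.select(to_csv(col("value")).as("csv")).show()
// e.g. the row with id = 0 produces the CSV string "0,x"
{code}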



[jira] [Assigned] (SPARK-42169) Implement code generation for `to_csv` function (StructsToCsv)

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42169:


Assignee: Apache Spark

> Implement code generation for `to_csv` function (StructsToCsv)
> --
>
> Key: SPARK-42169
> URL: https://issues.apache.org/jira/browse/SPARK-42169
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Narek Karapetian
>Assignee: Apache Spark
>Priority: Minor
>  Labels: csv, sql
> Fix For: 3.4.0
>
>
> Implement code generation for the `to_csv` function instead of extending it 
> from the CodegenFallback trait.
> {code:java}
> org.apache.spark.sql.catalyst.expressions.StructsToCsv.doGenCode(...){code}
>  
> This is good to have from a performance point of view.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
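
A rough idea of the shape of such a change, as a hypothetical sketch only (the
actual pull request may differ; `converter` stands in for StructsToCsv's
internal row-to-CSV function, and the real implementation's null handling is
elided): instead of inheriting the interpreted path from CodegenFallback, the
expression can register its converter as a reference object and emit a direct
call:

{code:java}
// Hypothetical sketch, not the merged implementation.
override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
  // Expose the converter function to the generated code as a reference object.
  val csvConverter = ctx.addReferenceObj("csvConverter", converter)
  // Emit a direct call instead of falling back to interpreted eval().
  defineCodeGen(ctx, ev, input => s"(UTF8String) $csvConverter.apply($input)")
}
{code}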



[jira] [Assigned] (SPARK-42169) Implement code generation for `to_csv` function (StructsToCsv)

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42169:


Assignee: (was: Apache Spark)

> Implement code generation for `to_csv` function (StructsToCsv)
> --
>
> Key: SPARK-42169
> URL: https://issues.apache.org/jira/browse/SPARK-42169
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Narek Karapetian
>Priority: Minor
>  Labels: csv, sql
> Fix For: 3.4.0
>
>
> Implement code generation for the `to_csv` function instead of extending it 
> from the CodegenFallback trait.
> {code:java}
> org.apache.spark.sql.catalyst.expressions.StructsToCsv.doGenCode(...){code}
>  
> This is good to have from a performance point of view.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42169) Implement code generation for `to_csv` function (StructsToCsv)

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680251#comment-17680251
 ] 

Apache Spark commented on SPARK-42169:
--

User 'NarekDW' has created a pull request for this issue:
https://github.com/apache/spark/pull/39097

> Implement code generation for `to_csv` function (StructsToCsv)
> --
>
> Key: SPARK-42169
> URL: https://issues.apache.org/jira/browse/SPARK-42169
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Narek Karapetian
>Priority: Minor
>  Labels: csv, sql
> Fix For: 3.4.0
>
>
> Implement code generation for the `to_csv` function instead of extending it 
> from the CodegenFallback trait.
> {code:java}
> org.apache.spark.sql.catalyst.expressions.StructsToCsv.doGenCode(...){code}
>  
> This is good to have from a performance point of view.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42163) Schema pruning fails on non-foldable array index or map key

2023-01-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680248#comment-17680248
 ] 

Apache Spark commented on SPARK-42163:
--

User 'cashmand' has created a pull request for this issue:
https://github.com/apache/spark/pull/39718

> Schema pruning fails on non-foldable array index or map key
> ---
>
> Key: SPARK-42163
> URL: https://issues.apache.org/jira/browse/SPARK-42163
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 3.2.3
>Reporter: David Cashman
>Priority: Major
>
> Schema pruning tries to extract selected fields from struct extractors. It 
> looks through GetArrayItem/GetMapItem, but when doing so, it ignores the 
> index/key, which may itself be a struct field. If it is a struct field that 
> is not otherwise selected, and some other field of the same attribute is 
> selected, then pruning will drop the field, resulting in an optimizer error.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
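
To make the failure mode concrete, a hypothetical repro sketch (schema, path,
and names are invented for illustration): suppose column `s` is a struct
holding both an array of structs and the index used to subscript it. In
`s.items[s.idx].a`, the field `s.idx` is referenced only inside the extractor,
so pruning that ignores the index can drop `idx` from the pruned schema while
keeping `items`, yielding an invalid plan:

{code:java}
// Hypothetical repro sketch; names and schema are invented.
import org.apache.spark.sql.functions.col

// s: struct<items: array<struct<a: int, b: int>>, idx: int>
val df = spark.read.parquet("/path/to/data")  // placeholder path

// The array index s.idx is non-foldable and belongs to the same
// attribute as s.items; only s.items.a is otherwise selected.
df.select(col("s.items")(col("s.idx"))("a")).explain(true)
{code}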



[jira] [Assigned] (SPARK-42163) Schema pruning fails on non-foldable array index or map key

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42163:


Assignee: Apache Spark

> Schema pruning fails on non-foldable array index or map key
> ---
>
> Key: SPARK-42163
> URL: https://issues.apache.org/jira/browse/SPARK-42163
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 3.2.3
>Reporter: David Cashman
>Assignee: Apache Spark
>Priority: Major
>
> Schema pruning tries to extract selected fields from struct extractors. It 
> looks through GetArrayItem/GetMapItem, but when doing so, it ignores the 
> index/key, which may itself be a struct field. If it is a struct field that 
> is not otherwise selected, and some other field of the same attribute is 
> selected, then pruning will drop the field, resulting in an optimizer error.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42163) Schema pruning fails on non-foldable array index or map key

2023-01-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42163:


Assignee: (was: Apache Spark)

> Schema pruning fails on non-foldable array index or map key
> ---
>
> Key: SPARK-42163
> URL: https://issues.apache.org/jira/browse/SPARK-42163
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 3.2.3
>Reporter: David Cashman
>Priority: Major
>
> Schema pruning tries to extract selected fields from struct extractors. It 
> looks through GetArrayItem/GetMapItem, but when doing so, it ignores the 
> index/key, which may itself be a struct field. If it is a struct field that 
> is not otherwise selected, and some other field of the same attribute is 
> selected, then pruning will drop the field, resulting in an optimizer error.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


