[jira] [Resolved] (SPARK-41735) Any SparkThrowable (with an error class) not in error-classes.json is masked in SQLExecution.withNewExecutionId and end-user will see "org.apache.spark.SparkException: [INTERNAL_ERROR]"

2023-01-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-41735.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39794
[https://github.com/apache/spark/pull/39794]

> Any SparkThrowable (with an error class) not in error-classes.json is masked 
> in SQLExecution.withNewExecutionId and end-user will see 
> "org.apache.spark.SparkException: [INTERNAL_ERROR]" 
> --
>
> Key: SPARK-41735
> URL: https://issues.apache.org/jira/browse/SPARK-41735
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Portis
>Priority: Major
> Fix For: 3.4.0
>
>
> This change 
> [here|https://github.com/apache/spark/pull/38302/files#diff-fdd1e9e26aa1ba9d1cc923ee7c84a1935dcc285502330a471f1ade7f3ad08bf9]
> means that every error encountered is passed to 
> SparkThrowableHelper.getMessage(...). Any SparkThrowable with an error class 
> (for example, from a connector that uses the Spark error format; see 
> ErrorClassesJsonReader) will be masked as 
> {code:java}
> org.apache.spark.SparkException: [INTERNAL_ERROR] Cannot find main error 
> class 'SOME_ERROR_CLASS'{code}
> in SparkThrowableHelper.getMessage, because 
> errorReader.getMessageTemplate(errorClass) will fail for any error class not 
> defined in error-classes.json.
>  
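>
> As a minimal sketch of the failure mode above (a toy Python model added for 
> illustration only, not Spark's actual Scala implementation), the template 
> lookup itself fails for an unregistered class, so the original error is 
> replaced:
> {code:python}
> # Toy model: a message lookup that only knows the classes in
> # error-classes.json (stand-in dict below).
> templates = {"DIVIDE_BY_ZERO": "Division by zero."}
>
> def get_message(error_class: str) -> str:
>     if error_class not in templates:
>         # An unknown class makes the lookup itself fail, so the user sees
>         # an INTERNAL_ERROR instead of the connector's original message.
>         return ("org.apache.spark.SparkException: [INTERNAL_ERROR] "
>                 f"Cannot find main error class '{error_class}'")
>     return f"[{error_class}] {templates[error_class]}"
>
> print(get_message("SOME_ERROR_CLASS"))  # the connector's error is masked
> {code}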



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41735) Any SparkThrowable (with an error class) not in error-classes.json is masked in SQLExecution.withNewExecutionId and end-user will see "org.apache.spark.SparkException: [INTERNAL_ERROR]"

2023-01-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-41735:
---

Assignee: XiDuo You

> Any SparkThrowable (with an error class) not in error-classes.json is masked 
> in SQLExecution.withNewExecutionId and end-user will see 
> "org.apache.spark.SparkException: [INTERNAL_ERROR]" 
> --
>
> Key: SPARK-41735
> URL: https://issues.apache.org/jira/browse/SPARK-41735
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Portis
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.4.0
>
>
> This change 
> [here|https://github.com/apache/spark/pull/38302/files#diff-fdd1e9e26aa1ba9d1cc923ee7c84a1935dcc285502330a471f1ade7f3ad08bf9]
> means that every error encountered is passed to 
> SparkThrowableHelper.getMessage(...). Any SparkThrowable with an error class 
> (for example, from a connector that uses the Spark error format; see 
> ErrorClassesJsonReader) will be masked as 
> {code:java}
> org.apache.spark.SparkException: [INTERNAL_ERROR] Cannot find main error 
> class 'SOME_ERROR_CLASS'{code}
> in SparkThrowableHelper.getMessage, because 
> errorReader.getMessageTemplate(errorClass) will fail for any error class not 
> defined in error-classes.json.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42232) Rename error class: UNSUPPORTED_FEATURE.JDBC_TRANSACTION

2023-01-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42232:


Assignee: (was: Apache Spark)

> Rename error class: UNSUPPORTED_FEATURE.JDBC_TRANSACTION
> 
>
> Key: SPARK-42232
> URL: https://issues.apache.org/jira/browse/SPARK-42232
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42232) Rename error class: UNSUPPORTED_FEATURE.JDBC_TRANSACTION

2023-01-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42232:


Assignee: Apache Spark

> Rename error class: UNSUPPORTED_FEATURE.JDBC_TRANSACTION
> 
>
> Key: SPARK-42232
> URL: https://issues.apache.org/jira/browse/SPARK-42232
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42232) Rename error class: UNSUPPORTED_FEATURE.JDBC_TRANSACTION

2023-01-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17681967#comment-17681967
 ] 

Apache Spark commented on SPARK-42232:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/39799

> Rename error class: UNSUPPORTED_FEATURE.JDBC_TRANSACTION
> 
>
> Key: SPARK-42232
> URL: https://issues.apache.org/jira/browse/SPARK-42232
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41855) `createDataFrame` doesn't handle None/NaN properly

2023-01-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17681974#comment-17681974
 ] 

Apache Spark commented on SPARK-41855:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39800

> `createDataFrame` doesn't handle None/NaN properly
> --
>
> Key: SPARK-41855
> URL: https://issues.apache.org/jira/browse/SPARK-41855
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:python}
> data = [Row(id=1, value=float("NaN")), Row(id=2, value=42.0), 
> Row(id=3, value=None)]
> # +---+-----+
> # | id|value|
> # +---+-----+
> # |  1|  NaN|
> # |  2| 42.0|
> # |  3| null|
> # +---+-----+
> cdf = self.connect.createDataFrame(data)
> sdf = self.spark.createDataFrame(data)
> print()
> print()
> print(cdf._show_string(100, 100, False))
> print()
> print(cdf.schema)
> print()
> print(sdf._jdf.showString(100, 100, False))
> print()
> print(sdf.schema)
> self.compare_by_show(cdf, sdf)
> {code}
> {code:java}
> +---+-----+
> | id|value|
> +---+-----+
> |  1| null|
> |  2| 42.0|
> |  3| null|
> +---+-----+
> StructType([StructField('id', LongType(), True), StructField('value', 
> DoubleType(), True)])
> +---+-----+
> | id|value|
> +---+-----+
> |  1|  NaN|
> |  2| 42.0|
> |  3| null|
> +---+-----+
> StructType([StructField('id', LongType(), True), StructField('value', 
> DoubleType(), True)])
> {code}
> This issue is because `createDataFrame` doesn't handle None/NaN properly:
> 1. in the conversion from local data to pd.DataFrame, it automatically 
> converts both None and NaN to NaN;
> 2. then in the conversion from pd.DataFrame to pa.Table, it always converts 
> NaN to null.
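>
> As a standalone sketch of the two conversions just described (plain 
> pandas/pyarrow outside Spark; the NaN-as-null mapping is pyarrow's default 
> pandas-conversion semantics):
> {code:python}
> import pandas as pd
> import pyarrow as pa
>
> # Step 1: building a pandas DataFrame from local rows turns both None and
> # float("NaN") into NaN, so the None/NaN distinction is already lost here.
> pdf = pd.DataFrame({"value": [float("NaN"), 42.0, None]})
> print(pdf["value"].tolist())  # [nan, 42.0, nan]
>
> # Step 2: the pandas -> Arrow conversion then treats NaN as the null
> # sentinel, so every NaN becomes null in the Arrow table.
> tbl = pa.Table.from_pandas(pdf)
> print(tbl.column("value").null_count)  # 2
> {code}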



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42236) Refine `NULLABLE_ARRAY_OR_MAP_ELEMENT`

2023-01-30 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-42236:
---

 Summary: Refine `NULLABLE_ARRAY_OR_MAP_ELEMENT`
 Key: SPARK-42236
 URL: https://issues.apache.org/jira/browse/SPARK-42236
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Haejoon Lee






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42237) change binary to unsupported dataType in csv format

2023-01-30 Thread Wei Guo (Jira)
Wei Guo created SPARK-42237:
---

 Summary: change binary to unsupported dataType in csv format
 Key: SPARK-42237
 URL: https://issues.apache.org/jira/browse/SPARK-42237
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.1, 2.4.8
Reporter: Wei Guo
 Fix For: 3.4.0


When a binary column is written to CSV files, the actual content of the 
column is {*}object.toString(){*}, which is meaningless.
{code:java}
val df = Seq(Array[Byte](1,2)).toDF
df.write.csv("/Users/guowei19/Desktop/binary_csv")
{code}
The csv file's content is as follows:
!image-2023-01-30-17-18-16-372.png|width=104,height=21!
Meanwhile, if a binary column is saved as a table with the CSV file format, 
the table can't be read back successfully.
{code:java}
val df = Seq((1, Array[Byte](1,2))).toDF
df.write.format("csv").saveAsTable("binaryDataTable")
spark.sql("select * from binaryDataTable").show()
{code}
!https://rte.weiyun.baidu.com/wiki/attach/image/api/imageDownloadAddress?attachId=82da0afc444c41bdaac34418a1c89963&docGuid=Eiscz4oMI45Sfp&sign=eyJhbGciOiJkaXIiLCJlbmMiOiJBMjU2R0NNIiwiYXBwSWQiOjEsInVpZCI6IjgtVWkzU0lMY2wiLCJkb2NJZCI6IkVpc2N6NG9NSTQ1U2ZwIn0..z1O-00hE1tTua9co.RmL0GxEQyNVQbIMYOvyAmQY18NMCxHdGdEPtulFiV3BuqsVlJODgA9-xFY9H9yer_Ckpbt4aG2ZrqgohIq43_ywzj-8u8SKKZnnzm7Dt-EhQBwrA7EhwUveE4-MRcAmsgqRKneN0gUJIu78ogR-M5-GAYqiyd-C-PH0LTaHDhNBWFBkF01kVOLJ18c2VTT6_lbc9j9Drmxj56ouymFgfhdUtpA.cTYqsEvvnKDcIPiah99f_A!
So I think it's better to make binary an unsupported data type in the CSV 
format, both for datasource v1 (CSVFileFormat) and v2 (CSVTable).
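
A PySpark sketch of the same repro (written against the affected versions as 
an illustration; releases that include the proposed change would reject the 
binary column instead):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A single binary column; on the affected versions the CSV writer falls back
# to the JVM object's toString, producing output like "[B@6b4a4e18".
df = spark.createDataFrame([(bytearray(b"\x01\x02"),)], "b binary")
df.write.csv("/tmp/binary_csv")
{code}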



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42237) change binary to unsupported dataType in csv format

2023-01-30 Thread Wei Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Guo updated SPARK-42237:

Attachment: image-2023-01-30-17-21-09-212.png

> change binary to unsupported dataType in csv format
> ---
>
> Key: SPARK-42237
> URL: https://issues.apache.org/jira/browse/SPARK-42237
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.8, 3.3.1
>Reporter: Wei Guo
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: image-2023-01-30-17-21-09-212.png
>
>
> When a binary column is written to CSV files, the actual content of the 
> column is {*}object.toString(){*}, which is meaningless.
> {code:java}
> val df = Seq(Array[Byte](1,2)).toDF
> df.write.csv("/Users/guowei19/Desktop/binary_csv")
> {code}
> The csv file's content is as follows:
> !image-2023-01-30-17-18-16-372.png|width=104,height=21!
> Meanwhile, if a binary column is saved as a table with the CSV file format, 
> the table can't be read back successfully.
> {code:java}
> val df = Seq((1, Array[Byte](1,2))).toDF
> df.write.format("csv").saveAsTable("binaryDataTable")
> spark.sql("select * from binaryDataTable").show()
> {code}
> !https://rte.weiyun.baidu.com/wiki/attach/image/api/imageDownloadAddress?attachId=82da0afc444c41bdaac34418a1c89963&docGuid=Eiscz4oMI45Sfp&sign=eyJhbGciOiJkaXIiLCJlbmMiOiJBMjU2R0NNIiwiYXBwSWQiOjEsInVpZCI6IjgtVWkzU0lMY2wiLCJkb2NJZCI6IkVpc2N6NG9NSTQ1U2ZwIn0..z1O-00hE1tTua9co.RmL0GxEQyNVQbIMYOvyAmQY18NMCxHdGdEPtulFiV3BuqsVlJODgA9-xFY9H9yer_Ckpbt4aG2ZrqgohIq43_ywzj-8u8SKKZnnzm7Dt-EhQBwrA7EhwUveE4-MRcAmsgqRKneN0gUJIu78ogR-M5-GAYqiyd-C-PH0LTaHDhNBWFBkF01kVOLJ18c2VTT6_lbc9j9Drmxj56ouymFgfhdUtpA.cTYqsEvvnKDcIPiah99f_A!
> So I think it's better to make binary an unsupported data type in the CSV 
> format, both for datasource v1 (CSVFileFormat) and v2 (CSVTable).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42237) change binary to unsupported dataType in csv format

2023-01-30 Thread Wei Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Guo updated SPARK-42237:

Description: 
When a binary column is written to CSV files, the actual content of the 
column is {*}object.toString(){*}, which is meaningless.
{code:java}
val df = Seq(Array[Byte](1,2)).toDF
df.write.csv("/Users/guowei19/Desktop/binary_csv")
{code}
The csv file's content is as follows:
!image-2023-01-30-17-21-09-212.png|width=141,height=29!
Meanwhile, if a binary column is saved as a table with the CSV file format, 
the table can't be read back successfully.
{code:java}
val df = Seq((1, Array[Byte](1,2))).toDF
df.write.format("csv").saveAsTable("binaryDataTable")
spark.sql("select * from binaryDataTable").show()
{code}
!https://rte.weiyun.baidu.com/wiki/attach/image/api/imageDownloadAddress?attachId=82da0afc444c41bdaac34418a1c89963&docGuid=Eiscz4oMI45Sfp&sign=eyJhbGciOiJkaXIiLCJlbmMiOiJBMjU2R0NNIiwiYXBwSWQiOjEsInVpZCI6IjgtVWkzU0lMY2wiLCJkb2NJZCI6IkVpc2N6NG9NSTQ1U2ZwIn0..z1O-00hE1tTua9co.RmL0GxEQyNVQbIMYOvyAmQY18NMCxHdGdEPtulFiV3BuqsVlJODgA9-xFY9H9yer_Ckpbt4aG2ZrqgohIq43_ywzj-8u8SKKZnnzm7Dt-EhQBwrA7EhwUveE4-MRcAmsgqRKneN0gUJIu78ogR-M5-GAYqiyd-C-PH0LTaHDhNBWFBkF01kVOLJ18c2VTT6_lbc9j9Drmxj56ouymFgfhdUtpA.cTYqsEvvnKDcIPiah99f_A!
So I think it's better to make binary an unsupported data type in the CSV 
format, both for datasource v1 (CSVFileFormat) and v2 (CSVTable).

  was:
When a binary column is written to CSV files, the actual content of the 
column is {*}object.toString(){*}, which is meaningless.
{code:java}
val df = Seq(Array[Byte](1,2)).toDF
df.write.csv("/Users/guowei19/Desktop/binary_csv")
{code}
The csv file's content is as follows:
!image-2023-01-30-17-18-16-372.png|width=104,height=21!
Meanwhile, if a binary column is saved as a table with the CSV file format, 
the table can't be read back successfully.
{code:java}
val df = Seq((1, Array[Byte](1,2))).toDF
df.write.format("csv").saveAsTable("binaryDataTable")
spark.sql("select * from binaryDataTable").show()
{code}
!https://rte.weiyun.baidu.com/wiki/attach/image/api/imageDownloadAddress?attachId=82da0afc444c41bdaac34418a1c89963&docGuid=Eiscz4oMI45Sfp&sign=eyJhbGciOiJkaXIiLCJlbmMiOiJBMjU2R0NNIiwiYXBwSWQiOjEsInVpZCI6IjgtVWkzU0lMY2wiLCJkb2NJZCI6IkVpc2N6NG9NSTQ1U2ZwIn0..z1O-00hE1tTua9co.RmL0GxEQyNVQbIMYOvyAmQY18NMCxHdGdEPtulFiV3BuqsVlJODgA9-xFY9H9yer_Ckpbt4aG2ZrqgohIq43_ywzj-8u8SKKZnnzm7Dt-EhQBwrA7EhwUveE4-MRcAmsgqRKneN0gUJIu78ogR-M5-GAYqiyd-C-PH0LTaHDhNBWFBkF01kVOLJ18c2VTT6_lbc9j9Drmxj56ouymFgfhdUtpA.cTYqsEvvnKDcIPiah99f_A!
So I think it's better to make binary an unsupported data type in the CSV 
format, both for datasource v1 (CSVFileFormat) and v2 (CSVTable).


> change binary to unsupported dataType in csv format
> ---
>
> Key: SPARK-42237
> URL: https://issues.apache.org/jira/browse/SPARK-42237
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.8, 3.3.1
>Reporter: Wei Guo
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: image-2023-01-30-17-21-09-212.png
>
>
> When a binary column is written to CSV files, the actual content of the 
> column is {*}object.toString(){*}, which is meaningless.
> {code:java}
> val df = Seq(Array[Byte](1,2)).toDF
> df.write.csv("/Users/guowei19/Desktop/binary_csv")
> {code}
> The csv file's content is as follows:
> !image-2023-01-30-17-21-09-212.png|width=141,height=29!
> Meanwhile, if a binary column is saved as a table with the CSV file format, 
> the table can't be read back successfully.
> {code:java}
> val df = Seq((1, Array[Byte](1,2))).toDF
> df.write.format("csv").saveAsTable("binaryDataTable")
> spark.sql("select * from binaryDataTable").show()
> {code}
> !https://rte.weiyun.baidu.com/wiki/attach/image/api/imageDownloadAddress?attachId=82da0afc444c41bdaac34418a1c89963&docGuid=Eiscz4oMI45Sfp&sign=eyJhbGciOiJkaXIiLCJlbmMiOiJBMjU2R0NNIiwiYXBwSWQiOjEsInVpZCI6IjgtVWkzU0lMY2wiLCJkb2NJZCI6IkVpc2N6NG9NSTQ1U2ZwIn0..z1O-00hE1tTua9co.RmL0GxEQyNVQbIMYOvyAmQY18NMCxHdGdEPtulFiV3BuqsVlJODgA9-xFY9H9yer_Ckpbt4aG2ZrqgohIq43_ywzj-8u8SKKZnnzm7Dt-EhQBwrA7EhwUveE4-MRcAmsgqRKneN0gUJIu78ogR-M5-GAYqiyd-C-PH0LTaHDhNBWFBkF01kVOLJ18c2VTT6_lbc9j9Drmxj56ouymFgfhdUtpA.cTYqsEvvnKDcIPiah99f_A!
> So I think it's better to make binary an unsupported data type in the CSV 
> format, both for datasource v1 (CSVFileFormat) and v2 (CSVTable).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42237) change binary to unsupported dataType in csv format

2023-01-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17681979#comment-17681979
 ] 

Apache Spark commented on SPARK-42237:
--

User 'weiyuyilia' has created a pull request for this issue:
https://github.com/apache/spark/pull/39802

> change binary to unsupported dataType in csv format
> ---
>
> Key: SPARK-42237
> URL: https://issues.apache.org/jira/browse/SPARK-42237
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.8, 3.3.1
>Reporter: Wei Guo
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: image-2023-01-30-17-21-09-212.png
>
>
> When a binary column is written to CSV files, the actual content of the 
> column is {*}object.toString(){*}, which is meaningless.
> {code:java}
> val df = Seq(Array[Byte](1,2)).toDF
> df.write.csv("/Users/guowei19/Desktop/binary_csv")
> {code}
> The csv file's content is as follows:
> !image-2023-01-30-17-21-09-212.png|width=141,height=29!
> Meanwhile, if a binary column is saved as a table with the CSV file format, 
> the table can't be read back successfully.
> {code:java}
> val df = Seq((1, Array[Byte](1,2))).toDF
> df.write.format("csv").saveAsTable("binaryDataTable")
> spark.sql("select * from binaryDataTable").show()
> {code}
> !https://rte.weiyun.baidu.com/wiki/attach/image/api/imageDownloadAddress?attachId=82da0afc444c41bdaac34418a1c89963&docGuid=Eiscz4oMI45Sfp&sign=eyJhbGciOiJkaXIiLCJlbmMiOiJBMjU2R0NNIiwiYXBwSWQiOjEsInVpZCI6IjgtVWkzU0lMY2wiLCJkb2NJZCI6IkVpc2N6NG9NSTQ1U2ZwIn0..z1O-00hE1tTua9co.RmL0GxEQyNVQbIMYOvyAmQY18NMCxHdGdEPtulFiV3BuqsVlJODgA9-xFY9H9yer_Ckpbt4aG2ZrqgohIq43_ywzj-8u8SKKZnnzm7Dt-EhQBwrA7EhwUveE4-MRcAmsgqRKneN0gUJIu78ogR-M5-GAYqiyd-C-PH0LTaHDhNBWFBkF01kVOLJ18c2VTT6_lbc9j9Drmxj56ouymFgfhdUtpA.cTYqsEvvnKDcIPiah99f_A!
> So I think it's better to make binary an unsupported data type in the CSV 
> format, both for datasource v1 (CSVFileFormat) and v2 (CSVTable).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42237) change binary to unsupported dataType in csv format

2023-01-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17681977#comment-17681977
 ] 

Apache Spark commented on SPARK-42237:
--

User 'weiyuyilia' has created a pull request for this issue:
https://github.com/apache/spark/pull/39802

> change binary to unsupported dataType in csv format
> ---
>
> Key: SPARK-42237
> URL: https://issues.apache.org/jira/browse/SPARK-42237
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.8, 3.3.1
>Reporter: Wei Guo
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: image-2023-01-30-17-21-09-212.png
>
>
> When a binary column is written to CSV files, the actual content of the 
> column is {*}object.toString(){*}, which is meaningless.
> {code:java}
> val df = Seq(Array[Byte](1,2)).toDF
> df.write.csv("/Users/guowei19/Desktop/binary_csv")
> {code}
> The csv file's content is as follows:
> !image-2023-01-30-17-21-09-212.png|width=141,height=29!
> Meanwhile, if a binary column is saved as a table with the CSV file format, 
> the table can't be read back successfully.
> {code:java}
> val df = Seq((1, Array[Byte](1,2))).toDF
> df.write.format("csv").saveAsTable("binaryDataTable")
> spark.sql("select * from binaryDataTable").show()
> {code}
> !https://rte.weiyun.baidu.com/wiki/attach/image/api/imageDownloadAddress?attachId=82da0afc444c41bdaac34418a1c89963&docGuid=Eiscz4oMI45Sfp&sign=eyJhbGciOiJkaXIiLCJlbmMiOiJBMjU2R0NNIiwiYXBwSWQiOjEsInVpZCI6IjgtVWkzU0lMY2wiLCJkb2NJZCI6IkVpc2N6NG9NSTQ1U2ZwIn0..z1O-00hE1tTua9co.RmL0GxEQyNVQbIMYOvyAmQY18NMCxHdGdEPtulFiV3BuqsVlJODgA9-xFY9H9yer_Ckpbt4aG2ZrqgohIq43_ywzj-8u8SKKZnnzm7Dt-EhQBwrA7EhwUveE4-MRcAmsgqRKneN0gUJIu78ogR-M5-GAYqiyd-C-PH0LTaHDhNBWFBkF01kVOLJ18c2VTT6_lbc9j9Drmxj56ouymFgfhdUtpA.cTYqsEvvnKDcIPiah99f_A!
> So I think it's better to make binary an unsupported data type in the CSV 
> format, both for datasource v1 (CSVFileFormat) and v2 (CSVTable).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42237) change binary to unsupported dataType in csv format

2023-01-30 Thread Wei Guo (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17681978#comment-17681978
 ] 

Wei Guo commented on SPARK-42237:
-

a pr is ready~

> change binary to unsupported dataType in csv format
> ---
>
> Key: SPARK-42237
> URL: https://issues.apache.org/jira/browse/SPARK-42237
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.8, 3.3.1
>Reporter: Wei Guo
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: image-2023-01-30-17-21-09-212.png
>
>
> When a binary column is written to CSV files, the actual content of the 
> column is {*}object.toString(){*}, which is meaningless.
> {code:java}
> val df = Seq(Array[Byte](1,2)).toDF
> df.write.csv("/Users/guowei19/Desktop/binary_csv")
> {code}
> The csv file's content is as follows:
> !image-2023-01-30-17-21-09-212.png|width=141,height=29!
> Meanwhile, if a binary column is saved as a table with the CSV file format, 
> the table can't be read back successfully.
> {code:java}
> val df = Seq((1, Array[Byte](1,2))).toDF
> df.write.format("csv").saveAsTable("binaryDataTable")
> spark.sql("select * from binaryDataTable").show()
> {code}
> !https://rte.weiyun.baidu.com/wiki/attach/image/api/imageDownloadAddress?attachId=82da0afc444c41bdaac34418a1c89963&docGuid=Eiscz4oMI45Sfp&sign=eyJhbGciOiJkaXIiLCJlbmMiOiJBMjU2R0NNIiwiYXBwSWQiOjEsInVpZCI6IjgtVWkzU0lMY2wiLCJkb2NJZCI6IkVpc2N6NG9NSTQ1U2ZwIn0..z1O-00hE1tTua9co.RmL0GxEQyNVQbIMYOvyAmQY18NMCxHdGdEPtulFiV3BuqsVlJODgA9-xFY9H9yer_Ckpbt4aG2ZrqgohIq43_ywzj-8u8SKKZnnzm7Dt-EhQBwrA7EhwUveE4-MRcAmsgqRKneN0gUJIu78ogR-M5-GAYqiyd-C-PH0LTaHDhNBWFBkF01kVOLJ18c2VTT6_lbc9j9Drmxj56ouymFgfhdUtpA.cTYqsEvvnKDcIPiah99f_A!
> So I think it's better to make binary an unsupported data type in the CSV 
> format, both for datasource v1 (CSVFileFormat) and v2 (CSVTable).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] (SPARK-42237) change binary to unsupported dataType in csv format

2023-01-30 Thread Wei Guo (Jira)


[ https://issues.apache.org/jira/browse/SPARK-42237 ]


Wei Guo deleted comment on SPARK-42237:
-

was (Author: wayne guo):
a pr is ready~

> change binary to unsupported dataType in csv format
> ---
>
> Key: SPARK-42237
> URL: https://issues.apache.org/jira/browse/SPARK-42237
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.8, 3.3.1
>Reporter: Wei Guo
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: image-2023-01-30-17-21-09-212.png
>
>
> When a binary column is written to CSV files, the actual content of the 
> column is {*}object.toString(){*}, which is meaningless.
> {code:java}
> val df = Seq(Array[Byte](1,2)).toDF
> df.write.csv("/Users/guowei19/Desktop/binary_csv")
> {code}
> The csv file's content is as follows:
> !image-2023-01-30-17-21-09-212.png|width=141,height=29!
> Meanwhile, if a binary column is saved as a table with the CSV file format, 
> the table can't be read back successfully.
> {code:java}
> val df = Seq((1, Array[Byte](1,2))).toDF
> df.write.format("csv").saveAsTable("binaryDataTable")
> spark.sql("select * from binaryDataTable").show()
> {code}
> !https://rte.weiyun.baidu.com/wiki/attach/image/api/imageDownloadAddress?attachId=82da0afc444c41bdaac34418a1c89963&docGuid=Eiscz4oMI45Sfp&sign=eyJhbGciOiJkaXIiLCJlbmMiOiJBMjU2R0NNIiwiYXBwSWQiOjEsInVpZCI6IjgtVWkzU0lMY2wiLCJkb2NJZCI6IkVpc2N6NG9NSTQ1U2ZwIn0..z1O-00hE1tTua9co.RmL0GxEQyNVQbIMYOvyAmQY18NMCxHdGdEPtulFiV3BuqsVlJODgA9-xFY9H9yer_Ckpbt4aG2ZrqgohIq43_ywzj-8u8SKKZnnzm7Dt-EhQBwrA7EhwUveE4-MRcAmsgqRKneN0gUJIu78ogR-M5-GAYqiyd-C-PH0LTaHDhNBWFBkF01kVOLJ18c2VTT6_lbc9j9Drmxj56ouymFgfhdUtpA.cTYqsEvvnKDcIPiah99f_A!
> So I think it's better to make binary an unsupported data type in the CSV 
> format, both for datasource v1 (CSVFileFormat) and v2 (CSVTable).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42237) change binary to unsupported dataType in csv format

2023-01-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42237:


Assignee: (was: Apache Spark)

> change binary to unsupported dataType in csv format
> ---
>
> Key: SPARK-42237
> URL: https://issues.apache.org/jira/browse/SPARK-42237
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.8, 3.3.1
>Reporter: Wei Guo
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: image-2023-01-30-17-21-09-212.png
>
>
> When a binary column is written to CSV files, the actual content of the 
> column is {*}object.toString(){*}, which is meaningless.
> {code:java}
> val df = Seq(Array[Byte](1,2)).toDF
> df.write.csv("/Users/guowei19/Desktop/binary_csv")
> {code}
> The csv file's content is as follows:
> !image-2023-01-30-17-21-09-212.png|width=141,height=29!
> Meanwhile, if a binary column is saved as a table with the CSV file format, 
> the table can't be read back successfully.
> {code:java}
> val df = Seq((1, Array[Byte](1,2))).toDF
> df.write.format("csv").saveAsTable("binaryDataTable")
> spark.sql("select * from binaryDataTable").show()
> {code}
> !https://rte.weiyun.baidu.com/wiki/attach/image/api/imageDownloadAddress?attachId=82da0afc444c41bdaac34418a1c89963&docGuid=Eiscz4oMI45Sfp&sign=eyJhbGciOiJkaXIiLCJlbmMiOiJBMjU2R0NNIiwiYXBwSWQiOjEsInVpZCI6IjgtVWkzU0lMY2wiLCJkb2NJZCI6IkVpc2N6NG9NSTQ1U2ZwIn0..z1O-00hE1tTua9co.RmL0GxEQyNVQbIMYOvyAmQY18NMCxHdGdEPtulFiV3BuqsVlJODgA9-xFY9H9yer_Ckpbt4aG2ZrqgohIq43_ywzj-8u8SKKZnnzm7Dt-EhQBwrA7EhwUveE4-MRcAmsgqRKneN0gUJIu78ogR-M5-GAYqiyd-C-PH0LTaHDhNBWFBkF01kVOLJ18c2VTT6_lbc9j9Drmxj56ouymFgfhdUtpA.cTYqsEvvnKDcIPiah99f_A!
> So I think it's better to make binary an unsupported data type in the CSV 
> format, both for datasource v1 (CSVFileFormat) and v2 (CSVTable).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42237) change binary to unsupported dataType in csv format

2023-01-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42237:


Assignee: Apache Spark

> change binary to unsupported dataType in csv format
> ---
>
> Key: SPARK-42237
> URL: https://issues.apache.org/jira/browse/SPARK-42237
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.8, 3.3.1
>Reporter: Wei Guo
>Assignee: Apache Spark
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: image-2023-01-30-17-21-09-212.png
>
>
> When a binary column is written to CSV files, the actual content of the 
> column is {*}object.toString(){*}, which is meaningless.
> {code:java}
> val df = Seq(Array[Byte](1,2)).toDF
> df.write.csv("/Users/guowei19/Desktop/binary_csv")
> {code}
> The csv file's content is as follows:
> !image-2023-01-30-17-21-09-212.png|width=141,height=29!
> Meanwhile, if a binary column is saved as a table with the CSV file format, 
> the table can't be read back successfully.
> {code:java}
> val df = Seq((1, Array[Byte](1,2))).toDF
> df.write.format("csv").saveAsTable("binaryDataTable")
> spark.sql("select * from binaryDataTable").show()
> {code}
> !https://rte.weiyun.baidu.com/wiki/attach/image/api/imageDownloadAddress?attachId=82da0afc444c41bdaac34418a1c89963&docGuid=Eiscz4oMI45Sfp&sign=eyJhbGciOiJkaXIiLCJlbmMiOiJBMjU2R0NNIiwiYXBwSWQiOjEsInVpZCI6IjgtVWkzU0lMY2wiLCJkb2NJZCI6IkVpc2N6NG9NSTQ1U2ZwIn0..z1O-00hE1tTua9co.RmL0GxEQyNVQbIMYOvyAmQY18NMCxHdGdEPtulFiV3BuqsVlJODgA9-xFY9H9yer_Ckpbt4aG2ZrqgohIq43_ywzj-8u8SKKZnnzm7Dt-EhQBwrA7EhwUveE4-MRcAmsgqRKneN0gUJIu78ogR-M5-GAYqiyd-C-PH0LTaHDhNBWFBkF01kVOLJ18c2VTT6_lbc9j9Drmxj56ouymFgfhdUtpA.cTYqsEvvnKDcIPiah99f_A!
> So I think it's better to make binary an unsupported data type in the CSV 
> format, both for datasource v1 (CSVFileFormat) and v2 (CSVTable).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42237) change binary to unsupported dataType in csv format

2023-01-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-42237:
-
Fix Version/s: (was: 3.4.0)

> change binary to unsupported dataType in csv format
> ---
>
> Key: SPARK-42237
> URL: https://issues.apache.org/jira/browse/SPARK-42237
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.8, 3.3.1
>Reporter: Wei Guo
>Priority: Minor
> Attachments: image-2023-01-30-17-21-09-212.png
>
>
> When a binary column is written to CSV files, the actual content of the 
> column is {*}object.toString(){*}, which is meaningless.
> {code:java}
> val df = Seq(Array[Byte](1,2)).toDF
> df.write.csv("/Users/guowei19/Desktop/binary_csv")
> {code}
> The csv file's content is as follows:
> !image-2023-01-30-17-21-09-212.png|width=141,height=29!
> Meanwhile, if a binary column is saved as a table with the CSV file format, 
> the table can't be read back successfully.
> {code:java}
> val df = Seq((1, Array[Byte](1,2))).toDF
> df.write.format("csv").saveAsTable("binaryDataTable")
> spark.sql("select * from binaryDataTable").show()
> {code}
> !https://rte.weiyun.baidu.com/wiki/attach/image/api/imageDownloadAddress?attachId=82da0afc444c41bdaac34418a1c89963&docGuid=Eiscz4oMI45Sfp&sign=eyJhbGciOiJkaXIiLCJlbmMiOiJBMjU2R0NNIiwiYXBwSWQiOjEsInVpZCI6IjgtVWkzU0lMY2wiLCJkb2NJZCI6IkVpc2N6NG9NSTQ1U2ZwIn0..z1O-00hE1tTua9co.RmL0GxEQyNVQbIMYOvyAmQY18NMCxHdGdEPtulFiV3BuqsVlJODgA9-xFY9H9yer_Ckpbt4aG2ZrqgohIq43_ywzj-8u8SKKZnnzm7Dt-EhQBwrA7EhwUveE4-MRcAmsgqRKneN0gUJIu78ogR-M5-GAYqiyd-C-PH0LTaHDhNBWFBkF01kVOLJ18c2VTT6_lbc9j9Drmxj56ouymFgfhdUtpA.cTYqsEvvnKDcIPiah99f_A!
> So I think it's better to make binary an unsupported data type in the CSV 
> format, both for datasource v1 (CSVFileFormat) and v2 (CSVTable).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42237) change binary to unsupported dataType in csv format

2023-01-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-42237:
-
Target Version/s:   (was: 3.4.0)

> change binary to unsupported dataType in csv format
> ---
>
> Key: SPARK-42237
> URL: https://issues.apache.org/jira/browse/SPARK-42237
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.8, 3.3.1
>Reporter: Wei Guo
>Priority: Minor
> Attachments: image-2023-01-30-17-21-09-212.png
>
>
> When a binary column is written to CSV files, the actual content of the 
> column is {*}object.toString(){*}, which is meaningless.
> {code:java}
> val df = Seq(Array[Byte](1,2)).toDF
> df.write.csv("/Users/guowei19/Desktop/binary_csv")
> {code}
> The csv file's content is as follows:
> !image-2023-01-30-17-21-09-212.png|width=141,height=29!
> Meanwhile, if a binary column is saved as a table with the CSV file format, 
> the table can't be read back successfully.
> {code:java}
> val df = Seq((1, Array[Byte](1,2))).toDF
> df.write.format("csv").saveAsTable("binaryDataTable")
> spark.sql("select * from binaryDataTable").show()
> {code}
> !https://rte.weiyun.baidu.com/wiki/attach/image/api/imageDownloadAddress?attachId=82da0afc444c41bdaac34418a1c89963&docGuid=Eiscz4oMI45Sfp&sign=eyJhbGciOiJkaXIiLCJlbmMiOiJBMjU2R0NNIiwiYXBwSWQiOjEsInVpZCI6IjgtVWkzU0lMY2wiLCJkb2NJZCI6IkVpc2N6NG9NSTQ1U2ZwIn0..z1O-00hE1tTua9co.RmL0GxEQyNVQbIMYOvyAmQY18NMCxHdGdEPtulFiV3BuqsVlJODgA9-xFY9H9yer_Ckpbt4aG2ZrqgohIq43_ywzj-8u8SKKZnnzm7Dt-EhQBwrA7EhwUveE4-MRcAmsgqRKneN0gUJIu78ogR-M5-GAYqiyd-C-PH0LTaHDhNBWFBkF01kVOLJ18c2VTT6_lbc9j9Drmxj56ouymFgfhdUtpA.cTYqsEvvnKDcIPiah99f_A!
> So I think it's better to make binary an unsupported data type in the CSV 
> format, both for datasource v1 (CSVFileFormat) and v2 (CSVTable).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42237) change binary to unsupported dataType in csv format

2023-01-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-42237:
-
Description: 
When a binary column is written to CSV files, the actual content of the 
column is {*}object.toString(){*}, which is meaningless.
{code:java}
val df = Seq(Array[Byte](1,2)).toDF
df.write.csv("/Users/guowei19/Desktop/binary_csv")
{code}

The csv file's content is as follows:

!image-2023-01-30-17-21-09-212.png|width=141,height=29!

Meanwhile, if a binary column is saved as a table with the CSV file format, 
the table can't be read back successfully.

{code:java}
val df = Seq((1, Array[Byte](1,2))).toDF
df.write.format("csv").saveAsTable("binaryDataTable")spark.sql("select * from 
binaryDataTable").show()
{code}

!https://rte.weiyun.baidu.com/wiki/attach/image/api/imageDownloadAddress?attachId=82da0afc444c41bdaac34418a1c89963&docGuid=Eiscz4oMI45Sfp&sign=eyJhbGciOiJkaXIiLCJlbmMiOiJBMjU2R0NNIiwiYXBwSWQiOjEsInVpZCI6IjgtVWkzU0lMY2wiLCJkb2NJZCI6IkVpc2N6NG9NSTQ1U2ZwIn0..z1O-00hE1tTua9co.RmL0GxEQyNVQbIMYOvyAmQY18NMCxHdGdEPtulFiV3BuqsVlJODgA9-xFY9H9yer_Ckpbt4aG2ZrqgohIq43_ywzj-8u8SKKZnnzm7Dt-EhQBwrA7EhwUveE4-MRcAmsgqRKneN0gUJIu78ogR-M5-GAYqiyd-C-PH0LTaHDhNBWFBkF01kVOLJ18c2VTT6_lbc9j9Drmxj56ouymFgfhdUtpA.cTYqsEvvnKDcIPiah99f_A!

So I think it's better to make binary an unsupported data type in the CSV 
format, both for datasource v1 (CSVFileFormat) and v2 (CSVTable).

  was:
When a binary colunm is written into csv files, actual content of this colunm 
is {*}object.toString(){*}, which is meaningless.
{code:java}
val df = 
Seq(Array[Byte](1,2)).toDFdf.write.csv("/Users/guowei19/Desktop/binary_csv") 
{code}
The csv file's content is as follows:
!image-2023-01-30-17-21-09-212.png|width=141,height=29!
Meanwhile, if a binary colunm saved as table with csv fileformat, the table 
can't be read back successfully.
{code:java}
val df = Seq((1, 
Array[Byte](1,2))).toDFdf.write.format("csv").saveAsTable("binaryDataTable")spark.sql("select
 * from binaryDataTable").show() {code}
!https://rte.weiyun.baidu.com/wiki/attach/image/api/imageDownloadAddress?attachId=82da0afc444c41bdaac34418a1c89963&docGuid=Eiscz4oMI45Sfp&sign=eyJhbGciOiJkaXIiLCJlbmMiOiJBMjU2R0NNIiwiYXBwSWQiOjEsInVpZCI6IjgtVWkzU0lMY2wiLCJkb2NJZCI6IkVpc2N6NG9NSTQ1U2ZwIn0..z1O-00hE1tTua9co.RmL0GxEQyNVQbIMYOvyAmQY18NMCxHdGdEPtulFiV3BuqsVlJODgA9-xFY9H9yer_Ckpbt4aG2ZrqgohIq43_ywzj-8u8SKKZnnzm7Dt-EhQBwrA7EhwUveE4-MRcAmsgqRKneN0gUJIu78ogR-M5-GAYqiyd-C-PH0LTaHDhNBWFBkF01kVOLJ18c2VTT6_lbc9j9Drmxj56ouymFgfhdUtpA.cTYqsEvvnKDcIPiah99f_A!
So I think it' better to change binary to unsupported dataType in csv format, 
both for datasource v1(CSVFileFormat) and v2(CSVTable).


> change binary to unsupported dataType in csv format
> ---
>
> Key: SPARK-42237
> URL: https://issues.apache.org/jira/browse/SPARK-42237
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.8, 3.3.1
>Reporter: Wei Guo
>Priority: Minor
> Attachments: image-2023-01-30-17-21-09-212.png
>
>
> When a binary column is written to CSV files, the actual content of the 
> column is {*}object.toString(){*}, which is meaningless.
> {code:java}
> val df = Seq(Array[Byte](1,2)).toDF
> df.write.csv("/Users/guowei19/Desktop/binary_csv")
> {code}
> The csv file's content is as follows:
> !image-2023-01-30-17-21-09-212.png|width=141,height=29!
> Meanwhile, if a binary column is saved as a table with the CSV file format, 
> the table can't be read back successfully.
> {code:java}
> val df = Seq((1, Array[Byte](1,2))).toDF
> df.write.format("csv").saveAsTable("binaryDataTable")spark.sql("select * from 
> binaryDataTable").show()
> {code}
> !https://rte.weiyun.baidu.com/wiki/attach/image/api/imageDownloadAddress?attachId=82da0afc444c41bdaac34418a1c89963&docGuid=Eiscz4oMI45Sfp&sign=eyJhbGciOiJkaXIiLCJlbmMiOiJBMjU2R0NNIiwiYXBwSWQiOjEsInVpZCI6IjgtVWkzU0lMY2wiLCJkb2NJZCI6IkVpc2N6NG9NSTQ1U2ZwIn0..z1O-00hE1tTua9co.RmL0GxEQyNVQbIMYOvyAmQY18NMCxHdGdEPtulFiV3BuqsVlJODgA9-xFY9H9yer_Ckpbt4aG2ZrqgohIq43_ywzj-8u8SKKZnnzm7Dt-EhQBwrA7EhwUveE4-MRcAmsgqRKneN0gUJIu78ogR-M5-GAYqiyd-C-PH0LTaHDhNBWFBkF01kVOLJ18c2VTT6_lbc9j9Drmxj56ouymFgfhdUtpA.cTYqsEvvnKDcIPiah99f_A!
> So I think it's better to make binary an unsupported data type in the CSV 
> format, both for datasource v1 (CSVFileFormat) and v2 (CSVTable).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42168) CoGroup with window function returns incorrect result when partition keys differ in order

2023-01-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682009#comment-17682009
 ] 

Apache Spark commented on SPARK-42168:
--

User 'EnricoMi' has created a pull request for this issue:
https://github.com/apache/spark/pull/39803

> CoGroup with window function returns incorrect result when partition keys 
> differ in order
> -
>
> Key: SPARK-42168
> URL: https://issues.apache.org/jira/browse/SPARK-42168
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.0.3, 3.1.3, 3.2.3
>Reporter: Enrico Minack
>Assignee: Enrico Minack
>Priority: Major
>  Labels: correctness
> Fix For: 3.2.4
>
>
> The following example returns an incorrect result:
> {code:java}
> import pandas as pd
> from pyspark.sql import SparkSession, Window
> from pyspark.sql.functions import col, lit, sum
> spark = SparkSession \
> .builder \
> .getOrCreate()
> ids = 1000
> days = 1000
> parts = 10
> id_df = spark.range(ids)
> day_df = spark.range(days).withColumnRenamed("id", "day")
> id_day_df = id_df.join(day_df)
> left_df = id_day_df.select(col("id").alias("id"), col("day").alias("day"), 
> lit("left").alias("side")).repartition(parts).cache()
> right_df = id_day_df.select(col("id").alias("id"), col("day").alias("day"), 
> lit("right").alias("side")).repartition(parts).cache()  
> #.withColumnRenamed("id", "id2")
> # note the column order is different to the groupBy("id", "day") column order 
> below
> window = Window.partitionBy("day", "id")
> left_grouped_df = left_df.groupBy("id", "day")
> right_grouped_df = right_df.withColumn("day_sum", 
> sum(col("day")).over(window)).groupBy("id", "day")
> def cogroup(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
>     return pd.DataFrame([{
>         "id": left["id"][0] if not left.empty else (right["id"][0] if not right.empty else None),
>         "day": left["day"][0] if not left.empty else (right["day"][0] if not right.empty else None),
>         "lefts": len(left.index),
>         "rights": len(right.index)
>     }])
> df = left_grouped_df.cogroup(right_grouped_df) \
>     .applyInPandas(cogroup, schema="id long, day long, lefts integer, rights integer")
> df.explain()
> df.show(5)
> {code}
> Output is
> {code}
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- FlatMapCoGroupsInPandas [id#8L, day#9L], [id#29L, day#30L], cogroup(id#8L, 
> day#9L, side#10, id#29L, day#30L, side#31, day_sum#54L), [id#64L, day#65L, 
> lefts#66, rights#67]
>:- Sort [id#8L ASC NULLS FIRST, day#9L ASC NULLS FIRST], false, 0
>:  +- Exchange hashpartitioning(id#8L, day#9L, 200), ENSURE_REQUIREMENTS, 
> [plan_id=117]
>: +- ...
>+- Sort [id#29L ASC NULLS FIRST, day#30L ASC NULLS FIRST], false, 0
>   +- Project [id#29L, day#30L, id#29L, day#30L, side#31, day_sum#54L]
>  +- Window [sum(day#30L) windowspecdefinition(day#30L, id#29L, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) 
> AS day_sum#54L], [day#30L, id#29L]
> +- Sort [day#30L ASC NULLS FIRST, id#29L ASC NULLS FIRST], false, 0
>+- Exchange hashpartitioning(day#30L, id#29L, 200), 
> ENSURE_REQUIREMENTS, [plan_id=112]
>   +- ...
> +---+---+-----+------+
> | id|day|lefts|rights|
> +---+---+-----+------+
> |  0|  3|    0|     1|
> |  0|  4|    0|     1|
> |  0| 13|    1|     0|
> |  0| 27|    0|     1|
> |  0| 31|    0|     1|
> +---+---+-----+------+
> only showing top 5 rows
> {code}
> The first child is hash-partitioned by {{id}} and {{day}}, while the 
> second child is hash-partitioned by {{day}} and {{id}} (required by the 
> window function). Therefore, rows end up in different partitions.
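>
> As a toy illustration of why the key order matters (Python's built-in hash 
> standing in for Spark's hashpartitioning; illustration only, not Spark 
> code):
> {code:python}
> def partition(row: dict, keys: list, num_partitions: int = 200) -> int:
>     # Hash partitioning hashes the key values in the given order, so the
>     # same row generally lands in different partitions for ("id", "day")
>     # versus ("day", "id").
>     return hash(tuple(row[k] for k in keys)) % num_partitions
>
> row = {"id": 0, "day": 13}
> print(partition(row, ["id", "day"]))  # generally differs from the next line
> print(partition(row, ["day", "id"]))
> {code}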
> This has been fixed in Spark 3.3 by 
> [#32875|https://github.com/apache/spark/pull/32875/files#diff-e938569a4ca4eba8f7e10fe473d4f9c306ea253df151405bcaba880a601f075fR75-R76]:
> {code}
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- FlatMapCoGroupsInPandas [id#8L, day#9L], [id#29L, day#30L], cogroup(id#8L, 
> day#9L, side#10, id#29L, day#30L, side#31, day_sum#54L)#63, [id#64L, day#65L, 
> lefts#66, rights#67]
>:- Sort [id#8L ASC NULLS FIRST, day#9L ASC NULLS FIRST], false, 0
>:  +- Exchange hashpartitioning(id#8L, day#9L, 200), ENSURE_REQUIREMENTS, 
> [plan_id=117]
>: +- ...
>+- Sort [id#29L ASC NULLS FIRST, day#30L ASC NULLS FIRST], false, 0
>   +- Exchange hashpartitioning(id#29L, day#30L, 200), 
> ENSURE_REQUIREMENTS, [plan_id=118]
>  +- Project [id#29L, day#30L, id#29L, day#30L, side#31, day_sum#54L]
> +- Window [sum(day#30L) windowspecdefinition(day#30L, id#29L, 
> specifiedwindowframe(RowF

[jira] [Assigned] (SPARK-42066) The DATATYPE_MISMATCH error class contains inappropriate and duplicating subclasses

2023-01-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-42066:
---

Assignee: Haejoon Lee

> The DATATYPE_MISMATCH error class contains inappropriate and duplicating 
> subclasses
> ---
>
> Key: SPARK-42066
> URL: https://issues.apache.org/jira/browse/SPARK-42066
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Assignee: Haejoon Lee
>Priority: Major
>
> The subclass WRONG_NUM_ARGS (with suggestions) semantically does not belong 
> in DATATYPE_MISMATCH, and there is already an error class with that same name.
> We should review the subclasses of this error class, which seems to have 
> become a bit of a dumping ground...



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42066) The DATATYPE_MISMATCH error class contains inappropriate and duplicating subclasses

2023-01-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-42066.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39625
[https://github.com/apache/spark/pull/39625]

> The DATATYPE_MISMATCH error class contains inappropriate and duplicating 
> subclasses
> ---
>
> Key: SPARK-42066
> URL: https://issues.apache.org/jira/browse/SPARK-42066
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.4.0
>
>
> The subclass WRONG_NUM_ARGS (with suggestions) semantically does not belong 
> in DATATYPE_MISMATCH, and there is already an error class with that same name.
> We should review the subclasses of this error class, which seems to have 
> become a bit of a dumping ground...



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42198) spark.read fails to read filenames with accented characters

2023-01-30 Thread Tarique Anwer (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tarique Anwer updated SPARK-42198:
--
Description: 
Unable to read files with accented characters in the filename.

*Sample error:*
{code:java}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 43 in 
stage 1.0 failed 4 times, most recent failure: Lost task 43.3 in stage 1.0 (TID 
105) (10.139.64.5 executor 0): java.io.FileNotFoundException: 
/4842022074360943/user/hive/warehouse/hls_cms_source.db/raw_files/synthea_mass/ccda/Amalia471_Magaña874_3912696a-0aef-492e-83ef-468262b82966.xml{code}
 

*{{Steps to reproduce error:}}*
{code:java}
%sh
mkdir -p /dbfs/user/hive/warehouse/hls_cms_source.db/raw_files/synthea_mass
wget \
  https://synthetichealth.github.io/synthea-sample-data/downloads/synthea_sample_data_ccda_sep2019.zip \
  -O ./synthea_sample_data_ccda_sep2019.zip
unzip ./synthea_sample_data_ccda_sep2019.zip -d \
  /dbfs/user/hive/warehouse/hls_cms_source.db/raw_files/synthea_mass/
{code}
 
{code:java}
spark.conf.set("spark.sql.caseSensitive", "true")
df = (
  spark.read.format('xml')
   .option("rowTag", "ClinicalDocument")
  .load('/user/hive/warehouse/hls_cms_source.db/raw_files/synthea_mass/ccda/')
){code}
Is there a way to deal with this situation where I don't have control over the 
file names for some reason?
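
One plausible cause, offered here purely as an assumption (it is not confirmed 
in this thread), is a Unicode normalization mismatch: a directory listing may 
return NFD-encoded names while the open-by-name path uses NFC, and the two 
forms are different byte sequences even though they render identically:
{code:python}
import unicodedata

name = "Magaña874.xml"
nfc = unicodedata.normalize("NFC", name)   # ñ as a single code point
nfd = unicodedata.normalize("NFD", name)   # n followed by a combining tilde
print(nfc == nfd)          # False: same-looking names, different bytes
print(len(nfc), len(nfd))  # NFD is one code point longer
{code}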

  was:
Unable to read files with accented characters in the filename.

*Sample error:*
{code:java}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 43 in 
stage 1.0 failed 4 times, most recent failure: Lost task 43.3 in stage 1.0 (TID 
105) (10.139.64.5 executor 0): java.io.FileNotFoundException: 
/4842022074360943/user/hive/warehouse/hls_cms_source.db/raw_files/synthea_mass/ccda/Amalia471_Magaña874_3912696a-0aef-492e-83ef-468262b82966.xml{code}
 

*{{Steps to reproduce error:}}*
{code:java}
%sh
mkdir -p /dbfs/user/hive/warehouse/hls_cms_source.db/raw_files/synthea_mass
wget \
  https://synthetichealth.github.io/synthea-sample-data/downloads/synthea_sample_data_ccda_sep2019.zip \
  -O ./synthea_sample_data_ccda_sep2019.zip
unzip ./synthea_sample_data_ccda_sep2019.zip -d \
  /dbfs/user/hive/warehouse/hls_cms_source.db/raw_files/synthea_mass/
{code}
 
{code:java}
spark.conf.set("spark.sql.caseSensitive", "true")
df = (
  spark.read.format('xml')
   .option("rowTag", "ClinicalDocument")
  
.load('/user/hive/warehouse/hls_cms_source.db/raw_files/synthea_mass/ccda/José_Emilio366_Macías944_1e740307-8780-4542-abeb-7037a2557a0e.xml')
){code}
Is there a way to deal with this situation where I don't have control over the 
file names for some reason?


> spark.read fails to read filenames with accented characters
> ---
>
> Key: SPARK-42198
> URL: https://issues.apache.org/jira/browse/SPARK-42198
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.1
>Reporter: Tarique Anwer
>Priority: Major
>
> Unable to read files with accented characters in the filename.
> *Sample error:*
> {code:java}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 43 in 
> stage 1.0 failed 4 times, most recent failure: Lost task 43.3 in stage 1.0 
> (TID 105) (10.139.64.5 executor 0): java.io.FileNotFoundException: 
> /4842022074360943/user/hive/warehouse/hls_cms_source.db/raw_files/synthea_mass/ccda/Amalia471_Magaña874_3912696a-0aef-492e-83ef-468262b82966.xml{code}
>  
> *{{Steps to reproduce error:}}*
> {code:java}
> %sh
> mkdir -p /dbfs/user/hive/warehouse/hls_cms_source.db/raw_files/synthea_mass
> wget \
>   https://synthetichealth.github.io/synthea-sample-data/downloads/synthea_sample_data_ccda_sep2019.zip \
>   -O ./synthea_sample_data_ccda_sep2019.zip
> unzip ./synthea_sample_data_ccda_sep2019.zip -d \
>   /dbfs/user/hive/warehouse/hls_cms_source.db/raw_files/synthea_mass/
> {code}
>  
> {code:java}
> spark.conf.set("spark.sql.caseSensitive", "true")
> df = (
>   spark.read.format('xml')
>    .option("rowTag", "ClinicalDocument")
>   .load('/user/hive/warehouse/hls_cms_source.db/raw_files/synthea_mass/ccda/')
> ){code}
> Is there a way to deal with this situation where I don't have control over 
> the file names for some reason?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-42198) spark.read fails to read filenames with accented characters

2023-01-30 Thread Tarique Anwer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682041#comment-17682041
 ] 

Tarique Anwer edited comment on SPARK-42198 at 1/30/23 12:15 PM:
-

I have updated the original comment to remove the specific file name. I'm 
trying to read all the XML files in the folder together. While it works just 
fine for files without accented characters in their filenames, I start getting 
an error as soon as one is mixed into the lot.

Even if I try to read a single file with the accented character, as in the 
comment above, I get an error.
{code:java}
spark.conf.set("spark.sql.caseSensitive", "true")
df = (
  spark.read.format('xml')
   .option("rowTag", "ClinicalDocument")
  
.load('/dbfs/user/hive/warehouse/hls_cms_source.db/raw_files/synthea_mass/ccda/José_Emilio366_Macías944_1e740307-8780-4542-abeb-7037a2557a0e.xml')
){code}
 
Error:

 
{code:java}
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does 
not exist: 
dbfs:/dbfs/user/hive/warehouse/hls_cms_source.db/raw_files/synthea_mass/ccda/José_Emilio366_Macías944_1e740307-8780-4542-abeb-7037a2557a0e.xml{code}
 

 


was (Author: JIRAUSER296223):
I have updated the original comment to remove the specific file name. I'm 
trying to read all the XML files in the folder together. While it works just 
fine for files without accented characters in their filenames, I start getting 
an error as soon as one is mixed into the lot.

Even if I try to read a single file with the accented character, as in the 
comment above, I get an error.
spark.conf.set("spark.sql.caseSensitive", "true")
df = (
  spark.read.format('xml')
   .option("rowTag", "ClinicalDocument")
  
.load('/dbfs/user/hive/warehouse/hls_cms_source.db/raw_files/synthea_mass/ccda/José_Emilio366_Macías944_1e740307-8780-4542-abeb-7037a2557a0e.xml')
) 
Error:

 
{code:java}
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does 
not exist: 
dbfs:/dbfs/user/hive/warehouse/hls_cms_source.db/raw_files/synthea_mass/ccda/José_Emilio366_Macías944_1e740307-8780-4542-abeb-7037a2557a0e.xml{code}
 

 

> spark.read fails to read filenames with accented characters
> ---
>
> Key: SPARK-42198
> URL: https://issues.apache.org/jira/browse/SPARK-42198
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.1
>Reporter: Tarique Anwer
>Priority: Major
>
> Unable to read files with accented characters in the filename.
> *Sample error:*
> {code:java}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 43 in 
> stage 1.0 failed 4 times, most recent failure: Lost task 43.3 in stage 1.0 
> (TID 105) (10.139.64.5 executor 0): java.io.FileNotFoundException: 
> /4842022074360943/user/hive/warehouse/hls_cms_source.db/raw_files/synthea_mass/ccda/Amalia471_Magaña874_3912696a-0aef-492e-83ef-468262b82966.xml{code}
>  
> *{{Steps to reproduce error:}}*
> {code:java}
> %sh
> mkdir -p /dbfs/user/hive/warehouse/hls_cms_source.db/raw_files/synthea_mass
> wget \
>   https://synthetichealth.github.io/synthea-sample-data/downloads/synthea_sample_data_ccda_sep2019.zip \
>   -O ./synthea_sample_data_ccda_sep2019.zip
> unzip ./synthea_sample_data_ccda_sep2019.zip -d \
>   /dbfs/user/hive/warehouse/hls_cms_source.db/raw_files/synthea_mass/
> {code}
>  
> {code:java}
> spark.conf.set("spark.sql.caseSensitive", "true")
> df = (
>   spark.read.format('xml')
>    .option("rowTag", "ClinicalDocument")
>   .load('/user/hive/warehouse/hls_cms_source.db/raw_files/synthea_mass/ccda/')
> ){code}
> Is there a way to deal with this situation where I don't have control over 
> the file names for some reason?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42198) spark.read fails to read filenames with accented characters

2023-01-30 Thread Tarique Anwer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682041#comment-17682041
 ] 

Tarique Anwer commented on SPARK-42198:
---

I have updated the original comment to remove the specific file name. I'm 
trying to read all the XML files in the folder together. While it works just 
fine for files without accented characters in their filenames, I start getting 
an error as soon as one is mixed into the lot.

Even if I try to read a single file with the accented character, as in the 
comment above, I get an error.
spark.conf.set("spark.sql.caseSensitive", "true")
df = (
  spark.read.format('xml')
   .option("rowTag", "ClinicalDocument")
  
.load('/dbfs/user/hive/warehouse/hls_cms_source.db/raw_files/synthea_mass/ccda/José_Emilio366_Macías944_1e740307-8780-4542-abeb-7037a2557a0e.xml')
) 
Error:

 
{code:java}
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does 
not exist: 
dbfs:/dbfs/user/hive/warehouse/hls_cms_source.db/raw_files/synthea_mass/ccda/José_Emilio366_Macías944_1e740307-8780-4542-abeb-7037a2557a0e.xml{code}
 

 

> spark.read fails to read filenames with accented characters
> ---
>
> Key: SPARK-42198
> URL: https://issues.apache.org/jira/browse/SPARK-42198
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.1
>Reporter: Tarique Anwer
>Priority: Major
>
> Unable to read files with accented characters in the filename.
> *Sample error:*
> {code:java}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 43 in 
> stage 1.0 failed 4 times, most recent failure: Lost task 43.3 in stage 1.0 
> (TID 105) (10.139.64.5 executor 0): java.io.FileNotFoundException: 
> /4842022074360943/user/hive/warehouse/hls_cms_source.db/raw_files/synthea_mass/ccda/Amalia471_Magaña874_3912696a-0aef-492e-83ef-468262b82966.xml{code}
>  
> *{{Steps to reproduce error:}}*
> {code:java}
> %sh
> mkdir -p /dbfs/user/hive/warehouse/hls_cms_source.db/raw_files/synthea_mass
> wget  
> https://synthetichealth.github.io/synthea-sample-data/downloads/synthea_sample_data_ccda_sep2019.zip
>  -O ./synthea_sample_data_ccda_sep2019.zip 
> unzip ./synthea_sample_data_ccda_sep2019.zip -d 
> /dbfs/user/hive/warehouse/hls_cms_source.db/raw_files/synthea_mass/
> {code}
>  
> {code:java}
> spark.conf.set("spark.sql.caseSensitive", "true")
> df = (
>   spark.read.format('xml')
>    .option("rowTag", "ClinicalDocument")
>   .load('/user/hive/warehouse/hls_cms_source.db/raw_files/synthea_mass/ccda/')
> ){code}
> Is there a way to deal with this situation when I don't have control over 
> the file names?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42214) Branch-3.4 daily test failed

2023-01-30 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang resolved SPARK-42214.
-
  Assignee: Yikun Jiang
Resolution: Fixed

Branch-3.4 scheduled job recovered:

https://github.com/apache/spark/pull/39778#issuecomment-1408528171

> Branch-3.4 daily test failed
> 
>
> Key: SPARK-42214
> URL: https://issues.apache.org/jira/browse/SPARK-42214
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yikun Jiang
>Priority: Major
>
> https://github.com/apache/spark/actions/runs/4023012095/jobs/6913400923



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42236) Refine `NULLABLE_ARRAY_OR_MAP_ELEMENT`

2023-01-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682045#comment-17682045
 ] 

Apache Spark commented on SPARK-42236:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/39804

> Refine `NULLABLE_ARRAY_OR_MAP_ELEMENT`
> --
>
> Key: SPARK-42236
> URL: https://issues.apache.org/jira/browse/SPARK-42236
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42236) Refine `NULLABLE_ARRAY_OR_MAP_ELEMENT`

2023-01-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42236:


Assignee: Apache Spark

> Refine `NULLABLE_ARRAY_OR_MAP_ELEMENT`
> --
>
> Key: SPARK-42236
> URL: https://issues.apache.org/jira/browse/SPARK-42236
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42236) Refine `NULLABLE_ARRAY_OR_MAP_ELEMENT`

2023-01-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42236:


Assignee: (was: Apache Spark)

> Refine `NULLABLE_ARRAY_OR_MAP_ELEMENT`
> --
>
> Key: SPARK-42236
> URL: https://issues.apache.org/jira/browse/SPARK-42236
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42236) Refine `NULLABLE_ARRAY_OR_MAP_ELEMENT`

2023-01-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682046#comment-17682046
 ] 

Apache Spark commented on SPARK-42236:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/39804

> Refine `NULLABLE_ARRAY_OR_MAP_ELEMENT`
> --
>
> Key: SPARK-42236
> URL: https://issues.apache.org/jira/browse/SPARK-42236
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42238) Rename UNSUPPORTED_FEATURE.NATURAL_CROSS_JOIN

2023-01-30 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-42238:
---

 Summary: Rename UNSUPPORTED_FEATURE.NATURAL_CROSS_JOIN
 Key: SPARK-42238
 URL: https://issues.apache.org/jira/browse/SPARK-42238
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Haejoon Lee






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41490) Assign name to _LEGACY_ERROR_TEMP_2441

2023-01-30 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-41490.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 39700
[https://github.com/apache/spark/pull/39700]

> Assign name to _LEGACY_ERROR_TEMP_2441
> --
>
> Key: SPARK-41490
> URL: https://issues.apache.org/jira/browse/SPARK-41490
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.5.0
>
>
> We should assign a proper name to each LEGACY temp error class.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41490) Assign name to _LEGACY_ERROR_TEMP_2441

2023-01-30 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-41490:


Assignee: Haejoon Lee

> Assign name to _LEGACY_ERROR_TEMP_2441
> --
>
> Key: SPARK-41490
> URL: https://issues.apache.org/jira/browse/SPARK-41490
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> We should assign a proper name to each LEGACY temp error class.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42238) Introduce `INCOMPATIBLE_JOIN_TYPES`

2023-01-30 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42238:

Summary: Introduce `INCOMPATIBLE_JOIN_TYPES`  (was: Rename 
UNSUPPORTED_FEATURE.NATURAL_CROSS_JOIN)

> Introduce `INCOMPATIBLE_JOIN_TYPES`
> ---
>
> Key: SPARK-42238
> URL: https://issues.apache.org/jira/browse/SPARK-42238
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42238) Introduce `INCOMPATIBLE_JOIN_TYPES`

2023-01-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682090#comment-17682090
 ] 

Apache Spark commented on SPARK-42238:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/39805

> Introduce `INCOMPATIBLE_JOIN_TYPES`
> ---
>
> Key: SPARK-42238
> URL: https://issues.apache.org/jira/browse/SPARK-42238
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42238) Introduce `INCOMPATIBLE_JOIN_TYPES`

2023-01-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42238:


Assignee: Apache Spark

> Introduce `INCOMPATIBLE_JOIN_TYPES`
> ---
>
> Key: SPARK-42238
> URL: https://issues.apache.org/jira/browse/SPARK-42238
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42238) Introduce `INCOMPATIBLE_JOIN_TYPES`

2023-01-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42238:


Assignee: (was: Apache Spark)

> Introduce `INCOMPATIBLE_JOIN_TYPES`
> ---
>
> Key: SPARK-42238
> URL: https://issues.apache.org/jira/browse/SPARK-42238
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42233) Improve error message for PIVOT_AFTER_GROUP_BY

2023-01-30 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-42233.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39793
[https://github.com/apache/spark/pull/39793]

> Improve error message for PIVOT_AFTER_GROUP_BY
> --
>
> Key: SPARK-42233
> URL: https://issues.apache.org/jira/browse/SPARK-42233
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42239) Integrate MUST_AGGREGATE_CORRELATED_SCALAR_SUBQUERY

2023-01-30 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-42239:
---

 Summary: Integrate MUST_AGGREGATE_CORRELATED_SCALAR_SUBQUERY
 Key: SPARK-42239
 URL: https://issues.apache.org/jira/browse/SPARK-42239
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Haejoon Lee






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42239) Integrate MUST_AGGREGATE_CORRELATED_SCALAR_SUBQUERY

2023-01-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682108#comment-17682108
 ] 

Apache Spark commented on SPARK-42239:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/39806

> Integrate MUST_AGGREGATE_CORRELATED_SCALAR_SUBQUERY
> ---
>
> Key: SPARK-42239
> URL: https://issues.apache.org/jira/browse/SPARK-42239
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42239) Integrate MUST_AGGREGATE_CORRELATED_SCALAR_SUBQUERY

2023-01-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42239:


Assignee: Apache Spark

> Integrate MUST_AGGREGATE_CORRELATED_SCALAR_SUBQUERY
> ---
>
> Key: SPARK-42239
> URL: https://issues.apache.org/jira/browse/SPARK-42239
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42239) Integrate MUST_AGGREGATE_CORRELATED_SCALAR_SUBQUERY

2023-01-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42239:


Assignee: (was: Apache Spark)

> Integrate MUST_AGGREGATE_CORRELATED_SCALAR_SUBQUERY
> ---
>
> Key: SPARK-42239
> URL: https://issues.apache.org/jira/browse/SPARK-42239
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41500) auto generate concat as Double when string minus an INTERVAL type

2023-01-30 Thread Narek Karapetian (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682141#comment-17682141
 ] 

Narek Karapetian commented on SPARK-41500:
--

It is by design...

You can set {{spark.sql.legacy.interval.enabled}} to {{true}} to restore 
the old behaviour.
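
A minimal sketch of what that workaround looks like in practice, assuming a 
PySpark session (the config name is the one given above):

{code:python}
# Restore the legacy (pre-ANSI) interval behaviour, per the comment above,
# then re-run the failing query from the ticket.
spark.conf.set("spark.sql.legacy.interval.enabled", "true")
spark.sql("select '2022-02-01' - interval 1 year").show()
{code}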

> auto generate concat as Double when string minus an INTERVAL type
> -
>
> Key: SPARK-41500
> URL: https://issues.apache.org/jira/browse/SPARK-41500
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.2.1, 3.2.2
>Reporter: JacobZheng
>Priority: Major
>
> h2. *Describe the bug*
> Here is a SQL query.
> {code:sql}
> select '2022-02-01'- INTERVAL 1 year
> {code}
> Spark automatically generates cast('2022-02-01' as double) - INTERVAL 1 year, 
> and a type mismatch happens.
> h2. *To Reproduce*
> On Spark 3.0.1 using spark-shell
> {code:java}
> scala> spark.sql("select '2022-02-01'- interval 1 year").show
> +--+  
>   
> |CAST(CAST(2022-02-01 AS TIMESTAMP) - INTERVAL '1 years' AS STRING)|
> +--+
> |   2021-02-01 00:00:00|
> +--+
> {code}
> On Spark 3.2.1 using spark-shell
> {code:java}
> scala> spark.sql("select '2022-02-01'- interval 1 year").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(CAST('2022-02-01' AS 
> DOUBLE) - INTERVAL '1' YEAR)' due to data type mismatch: differing types in 
> '(CAST('2022-02-01' AS DOUBLE) - INTERVAL '1' YEAR)' (double and interval 
> year).; line 1 pos 7;
> 'Project [unresolvedalias((cast(2022-02-01 as double) - INTERVAL '1' YEAR), 
> None)]
> +- OneRowRelation
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:190)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:175)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:535)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:535)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:532)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1128)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1127)
>   at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:467)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:532)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUpWithPruning$1(QueryPlan.scala:181)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:209)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at scala.collection.immutable.List.foreach(List.scala:431)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>   at scala.collection.immutable.List.map(List.scala:305)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:209)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:214)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:323)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:214)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUpWithPruning(QueryPlan.scala:181)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:161)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:175)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94

[jira] [Created] (SPARK-42240) Move ClientE2ETestSuite into a separate module to test shaded jvm client

2023-01-30 Thread Yang Jie (Jira)
Yang Jie created SPARK-42240:


 Summary: Move ClientE2ETestSuite into a separate module  to test 
shaded jvm client
 Key: SPARK-42240
 URL: https://issues.apache.org/jira/browse/SPARK-42240
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0, 3.5.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42240) Move ClientE2ETestSuite into a separate module to test shaded jvm client

2023-01-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42240:


Assignee: Apache Spark

> Move ClientE2ETestSuite into a separate module  to test shaded jvm client
> -
>
> Key: SPARK-42240
> URL: https://issues.apache.org/jira/browse/SPARK-42240
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42240) Move ClientE2ETestSuite into a separate module to test shaded jvm client

2023-01-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682158#comment-17682158
 ] 

Apache Spark commented on SPARK-42240:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39807

> Move ClientE2ETestSuite into a separate module  to test shaded jvm client
> -
>
> Key: SPARK-42240
> URL: https://issues.apache.org/jira/browse/SPARK-42240
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42240) Move ClientE2ETestSuite into a separate module to test shaded jvm client

2023-01-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42240:


Assignee: (was: Apache Spark)

> Move ClientE2ETestSuite into a separate module  to test shaded jvm client
> -
>
> Key: SPARK-42240
> URL: https://issues.apache.org/jira/browse/SPARK-42240
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41970) SparkPath

2023-01-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682163#comment-17682163
 ] 

Apache Spark commented on SPARK-41970:
--

User 'databricks-david-lewis' has created a pull request for this issue:
https://github.com/apache/spark/pull/39808

> SparkPath
> -
>
> Key: SPARK-41970
> URL: https://issues.apache.org/jira/browse/SPARK-41970
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: David Lewis
>Assignee: David Lewis
>Priority: Major
> Fix For: 3.4.0
>
>
> Today, Spark represents file paths in various ways. Sometimes they are Hadoop 
> `Path`s, sometimes they are `Path.toString`s, and sometimes they are 
> `Path.toUri.toString`s.
> This discrepancy means that Spark does not always work when user-provided 
> strings have special characters. Sometimes Spark will try to create a URI 
> with an unescaped string; sometimes Spark will double-escape a path and try 
> to access the wrong file.
>  
> This issue proposes a new `SparkPath` class meant to provide type safety when 
> Spark is dealing with paths.
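
For illustration only (this is not Spark's code), the double-escaping failure 
mode can be reproduced with nothing but the Python standard library; the path 
below is made up:

{code:python}
# A path escaped twice no longer round-trips: decoding once yields the
# singly-escaped string, which points at a different (nonexistent) file.
from urllib.parse import quote, unquote

raw = "/data/Magaña 874.xml"   # a path with an accent and a space
once = quote(raw)              # '/data/Maga%C3%B1a%20874.xml'
twice = quote(once)            # '/data/Maga%25C3%25B1a%2520874.xml'
assert unquote(once) == raw
assert unquote(twice) != raw
{code}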



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42241) Correct the condition for `SparkConnectServerUtils#findSparkConnectJar` to find the correct connect server jar for maven

2023-01-30 Thread Yang Jie (Jira)
Yang Jie created SPARK-42241:


 Summary:  Correct the condition for 
`SparkConnectServerUtils#findSparkConnectJar` to find the correct connect 
server jar for maven
 Key: SPARK-42241
 URL: https://issues.apache.org/jira/browse/SPARK-42241
 Project: Spark
  Issue Type: Bug
  Components: Connect, Tests
Affects Versions: 3.4.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42230) Improve `lint` job by skipping PySpark and SparkR docs if unchanged

2023-01-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682212#comment-17682212
 ] 

Apache Spark commented on SPARK-42230:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/39809

> Improve `lint` job by skipping PySpark and SparkR docs if unchanged
> ---
>
> Key: SPARK-42230
> URL: https://issues.apache.org/jira/browse/SPARK-42230
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.3.2, 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42241) Correct the condition for `SparkConnectServerUtils#findSparkConnectJar` to find the correct connect server jar for maven

2023-01-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42241:


Assignee: Apache Spark

>  Correct the condition for `SparkConnectServerUtils#findSparkConnectJar` to 
> find the correct connect server jar for maven
> -
>
> Key: SPARK-42241
> URL: https://issues.apache.org/jira/browse/SPARK-42241
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42241) Correct the condition for `SparkConnectServerUtils#findSparkConnectJar` to find the correct connect server jar for maven

2023-01-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42241:


Assignee: (was: Apache Spark)

>  Correct the condition for `SparkConnectServerUtils#findSparkConnectJar` to 
> find the correct connect server jar for maven
> -
>
> Key: SPARK-42241
> URL: https://issues.apache.org/jira/browse/SPARK-42241
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42241) Correct the condition for `SparkConnectServerUtils#findSparkConnectJar` to find the correct connect server jar for maven

2023-01-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682213#comment-17682213
 ] 

Apache Spark commented on SPARK-42241:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39810

>  Correct the condition for `SparkConnectServerUtils#findSparkConnectJar` to 
> find the correct connect server jar for maven
> -
>
> Key: SPARK-42241
> URL: https://issues.apache.org/jira/browse/SPARK-42241
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42241) Correct the condition for `SparkConnectServerUtils#findSparkConnectJar` to find the correct connect server jar for maven

2023-01-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682214#comment-17682214
 ] 

Apache Spark commented on SPARK-42241:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39810

>  Correct the condition for `SparkConnectServerUtils#findSparkConnectJar` to 
> find the correct connect server jar for maven
> -
>
> Key: SPARK-42241
> URL: https://issues.apache.org/jira/browse/SPARK-42241
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42239) Integrate MUST_AGGREGATE_CORRELATED_SCALAR_SUBQUERY

2023-01-30 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-42239.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 39806
[https://github.com/apache/spark/pull/39806]

> Integrate MUST_AGGREGATE_CORRELATED_SCALAR_SUBQUERY
> ---
>
> Key: SPARK-42239
> URL: https://issues.apache.org/jira/browse/SPARK-42239
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42239) Integrate MUST_AGGREGATE_CORRELATED_SCALAR_SUBQUERY

2023-01-30 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-42239:


Assignee: Haejoon Lee

> Integrate MUST_AGGREGATE_CORRELATED_SCALAR_SUBQUERY
> ---
>
> Key: SPARK-42239
> URL: https://issues.apache.org/jira/browse/SPARK-42239
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40684) Update embedded vis-timeline javascript resources

2023-01-30 Thread Eugene Shinn (Truveta) (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682272#comment-17682272
 ] 

Eugene Shinn (Truveta) commented on SPARK-40684:


I think I filed SPARK-39740 ("vis-timeline @ 4.2.1 vulnerable to XSS attacks") 
for the same issue, but I haven't seen any updates on that ticket either.

> Update embedded vis-timeline javascript resources
> -
>
> Key: SPARK-40684
> URL: https://issues.apache.org/jira/browse/SPARK-40684
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> Spark 3.3 currently ships with embedded vis-timeline javascript resources 
> subject to CVE-2020-28487, detected as a minor problem by several static 
> vulnerability assessment tools.
> https://nvd.nist.gov/vuln/detail/CVE-2020-28487: 
> bq. This affects the package vis-timeline before 7.4.4. An attacker with the 
> ability to control the items of a Timeline element can inject additional 
> script code into the generated application.
> This issue is not meant to imply a security problem in Spark itself. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42221) Introduce a new conf for TimestampNTZ schema inference in JSON/CSV

2023-01-30 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-42221.

Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39777
[https://github.com/apache/spark/pull/39777]

> Introduce a new conf for TimestampNTZ schema inference in JSON/CSV
> --
>
> Key: SPARK-42221
> URL: https://issues.apache.org/jira/browse/SPARK-42221
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.4.0
>
>
> Introduce a new conf "spark.sql.inferTimestampNTZInDataSources.enabled" for 
> TimestampNTZ schema inference in JSON/CSV
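
A hedged sketch of how the new flag would be used (the file path and data are 
illustrative; the conf name is the one from the description):

{code:python}
# With the flag on, a zone-less timestamp string in CSV should be inferred as
# TIMESTAMP_NTZ rather than the session-zoned TIMESTAMP.
spark.conf.set("spark.sql.inferTimestampNTZInDataSources.enabled", "true")
df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("/tmp/events.csv")  # e.g. a column of values like "2023-01-30 12:34:56"
)
df.printSchema()
{code}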



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42242) Upgrade snappy-java to 1.1.9.0

2023-01-30 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-42242:
-

 Summary: Upgrade snappy-java to 1.1.9.0
 Key: SPARK-42242
 URL: https://issues.apache.org/jira/browse/SPARK-42242
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.4.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42242) Upgrade snappy-java to 1.1.9.0

2023-01-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42242:


Assignee: (was: Apache Spark)

> Upgrade snappy-java to 1.1.9.0
> --
>
> Key: SPARK-42242
> URL: https://issues.apache.org/jira/browse/SPARK-42242
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42242) Upgrade snappy-java to 1.1.9.0

2023-01-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682306#comment-17682306
 ] 

Apache Spark commented on SPARK-42242:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/39811

> Upgrade snappy-java to 1.1.9.0
> --
>
> Key: SPARK-42242
> URL: https://issues.apache.org/jira/browse/SPARK-42242
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42242) Upgrade snappy-java to 1.1.9.0

2023-01-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42242:


Assignee: Apache Spark

> Upgrade snappy-java to 1.1.9.0
> --
>
> Key: SPARK-42242
> URL: https://issues.apache.org/jira/browse/SPARK-42242
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42243) Use `spark.sql.inferTimestampNTZInDataSources.enabled` to infer timestamp type on partition columns

2023-01-30 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-42243:
--

 Summary: Use `spark.sql.inferTimestampNTZInDataSources.enabled` to 
infer timestamp type on partition columns
 Key: SPARK-42243
 URL: https://issues.apache.org/jira/browse/SPARK-42243
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42243) Use `spark.sql.inferTimestampNTZInDataSources.enabled` to infer timestamp type on partition columns

2023-01-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682311#comment-17682311
 ] 

Apache Spark commented on SPARK-42243:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39812

> Use `spark.sql.inferTimestampNTZInDataSources.enabled` to infer timestamp 
> type on partition columns
> ---
>
> Key: SPARK-42243
> URL: https://issues.apache.org/jira/browse/SPARK-42243
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42243) Use `spark.sql.inferTimestampNTZInDataSources.enabled` to infer timestamp type on partition columns

2023-01-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42243:


Assignee: Apache Spark  (was: Gengliang Wang)

> Use `spark.sql.inferTimestampNTZInDataSources.enabled` to infer timestamp 
> type on partition columns
> ---
>
> Key: SPARK-42243
> URL: https://issues.apache.org/jira/browse/SPARK-42243
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42243) Use `spark.sql.inferTimestampNTZInDataSources.enabled` to infer timestamp type on partition columns

2023-01-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42243:


Assignee: Gengliang Wang  (was: Apache Spark)

> Use `spark.sql.inferTimestampNTZInDataSources.enabled` to infer timestamp 
> type on partition columns
> ---
>
> Key: SPARK-42243
> URL: https://issues.apache.org/jira/browse/SPARK-42243
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42192) Migrate the `TypeError` from `pyspark/sql/dataframe.py` into `PySparkTypeError`.

2023-01-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42192:


Assignee: Haejoon Lee

> Migrate the `TypeError` from `pyspark/sql/dataframe.py` into 
> `PySparkTypeError`.
> 
>
> Key: SPARK-42192
> URL: https://issues.apache.org/jira/browse/SPARK-42192
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> Migrate the existing errors into new PySpark error framework.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42192) Migrate the `TypeError` from `pyspark/sql/dataframe.py` into `PySparkTypeError`.

2023-01-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42192.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39785
[https://github.com/apache/spark/pull/39785]

> Migrate the `TypeError` from `pyspark/sql/dataframe.py` into 
> `PySparkTypeError`.
> 
>
> Key: SPARK-42192
> URL: https://issues.apache.org/jira/browse/SPARK-42192
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.4.0
>
>
> Migrate the existing errors into new PySpark error framework.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42241) Correct the condition for `SparkConnectServerUtils#findSparkConnectJar` to find the correct connect server jar for maven

2023-01-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42241:


Assignee: Yang Jie

>  Correct the condition for `SparkConnectServerUtils#findSparkConnectJar` to 
> find the correct connect server jar for maven
> -
>
> Key: SPARK-42241
> URL: https://issues.apache.org/jira/browse/SPARK-42241
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42241) Correct the condition for `SparkConnectServerUtils#findSparkConnectJar` to find the correct connect server jar for maven

2023-01-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42241.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39810
[https://github.com/apache/spark/pull/39810]

>  Correct the condition for `SparkConnectServerUtils#findSparkConnectJar` to 
> find the correct connect server jar for maven
> -
>
> Key: SPARK-42241
> URL: https://issues.apache.org/jira/browse/SPARK-42241
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41970) Introduce SparkPath to address paths and URIs

2023-01-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41970:
-
Summary: Introduce SparkPath to address paths and URIs  (was: SparkPath)

> Introduce SparkPath to address paths and URIs
> -
>
> Key: SPARK-41970
> URL: https://issues.apache.org/jira/browse/SPARK-41970
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: David Lewis
>Assignee: David Lewis
>Priority: Major
> Fix For: 3.4.0
>
>
> Today, Spark represents file paths in various ways. Sometimes they are Hadoop 
> `Path`s, sometimes they are `Path.toString`s, and sometimes they are 
> `Path.toUri.toString`s.
> This discrepancy means that Spark does not always work when user-provided 
> strings have special characters. Sometimes Spark will try to create a URI 
> with an unescaped string; sometimes Spark will double-escape a path and try 
> to access the wrong file.
>  
> This issue proposes a new `SparkPath` class meant to provide type safety when 
> Spark is dealing with paths.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37936) Use error classes in the parsing errors of intervals

2023-01-30 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee resolved SPARK-37936.
-
Resolution: Fixed

> Use error classes in the parsing errors of intervals
> 
>
> Key: SPARK-37936
> URL: https://issues.apache.org/jira/browse/SPARK-37936
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> Modify the following methods in QueryParsingErrors:
>  * moreThanOneFromToUnitInIntervalLiteralError
>  * invalidIntervalLiteralError
>  * invalidIntervalFormError
>  * invalidFromToUnitValueError
>  * fromToIntervalUnsupportedError
>  * mixedIntervalUnitsError
> to use error classes. Throw an implementation of SparkThrowable. Also write 
> a test for every error in QueryParsingErrorsSuite.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42004) Migrate "XX000" sqlState onto `INTERNAL_ERROR`

2023-01-30 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee resolved SPARK-42004.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Done with https://github.com/apache/spark/pull/39537

> Migrate "XX000" sqlState onto `INTERNAL_ERROR`
> --
>
> Key: SPARK-42004
> URL: https://issues.apache.org/jira/browse/SPARK-42004
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
> Fix For: 3.4.0
>
>
> We should migrate "sqlState" : "XX000" onto INTERNAL_ERROR to follow the 
> standard (this is what PostgreSQL does).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41554) Decimal.changePrecision produces ArrayIndexOutOfBoundsException

2023-01-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682333#comment-17682333
 ] 

Apache Spark commented on SPARK-41554:
--

User 'fe2s' has created a pull request for this issue:
https://github.com/apache/spark/pull/39813

> Decimal.changePrecision produces ArrayIndexOutOfBoundsException
> ---
>
> Key: SPARK-41554
> URL: https://issues.apache.org/jira/browse/SPARK-41554
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Oleksiy Dyagilev
>Priority: Major
>
> Reducing {{Decimal}} scale by more than 18 produces an exception.
> {code:java}
> Decimal(1, 38, 19).changePrecision(38, 0){code}
> {code:java}
> java.lang.ArrayIndexOutOfBoundsException: 19
>     at org.apache.spark.sql.types.Decimal.changePrecision(Decimal.scala:377)
>     at 
> org.apache.spark.sql.types.Decimal.changePrecision(Decimal.scala:328){code}
> Reproducing with SQL query:
> {code:java}
> sql("select cast(cast(cast(cast(id as decimal(38,15)) as decimal(38,30)) as 
> decimal(38,37)) as decimal(38,17)) from range(3)").show{code}
> The bug exists only for a {{Decimal}} that is stored using the compact long; it 
> works fine with a {{Decimal}} that uses {{scala.math.BigDecimal}} internally.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42244) Refine error message by using Python types.

2023-01-30 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-42244:
---

 Summary: Refine error message by using Python types.
 Key: SPARK-42244
 URL: https://issues.apache.org/jira/browse/SPARK-42244
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.4.0
Reporter: Haejoon Lee


Currently, the type names used in error messages are inconsistent, e.g. `string` vs. `str`.

We might need to consolidate them under one rule.
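
One possible shape of such a rule, sketched here (the helper name is 
hypothetical, not an existing PySpark API): derive the displayed name from the 
Python type itself, so messages consistently say `str` rather than ad-hoc 
spellings like `string`.

{code:python}
# Hypothetical helper: one rule for rendering type names in error messages.
def type_name(value):
    return type(value).__name__

print(type_name("abc"))   # str
print(type_name(42))      # int
print(type_name([1, 2]))  # list
{code}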



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42208) Reuse UDF test cases under `pyspark.sql.tests`

2023-01-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682337#comment-17682337
 ] 

Apache Spark commented on SPARK-42208:
--

User 'xinrong-meng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39814

> Reuse UDF test cases under `pyspark.sql.tests`
> --
>
> Key: SPARK-42208
> URL: https://issues.apache.org/jira/browse/SPARK-42208
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42208) Reuse UDF test cases under `pyspark.sql.tests`

2023-01-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42208:


Assignee: (was: Apache Spark)

> Reuse UDF test cases under `pyspark.sql.tests`
> --
>
> Key: SPARK-42208
> URL: https://issues.apache.org/jira/browse/SPARK-42208
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42208) Reuse UDF test cases under `pyspark.sql.tests`

2023-01-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42208:


Assignee: Apache Spark

> Reuse UDF test cases under `pyspark.sql.tests`
> --
>
> Key: SPARK-42208
> URL: https://issues.apache.org/jira/browse/SPARK-42208
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42244) Refine error message by using Python types.

2023-01-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682338#comment-17682338
 ] 

Apache Spark commented on SPARK-42244:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/39815

> Refine error message by using Python types.
> ---
>
> Key: SPARK-42244
> URL: https://issues.apache.org/jira/browse/SPARK-42244
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently, the type names used in error messages are inconsistent, e.g. `string` vs. `str`.
> We might need to consolidate them under one rule.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42244) Refine error message by using Python types.

2023-01-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42244:


Assignee: Apache Spark

> Refine error message by using Python types.
> ---
>
> Key: SPARK-42244
> URL: https://issues.apache.org/jira/browse/SPARK-42244
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> Currently, the type names used in error messages are inconsistent, e.g. `string` vs. `str`.
> We might need to consolidate them under one rule.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42244) Refine error message by using Python types.

2023-01-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42244:


Assignee: (was: Apache Spark)

> Refine error message by using Python types.
> ---
>
> Key: SPARK-42244
> URL: https://issues.apache.org/jira/browse/SPARK-42244
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently, the type names used in error messages are inconsistent, e.g. `string` vs. `str`.
> We might need to consolidate them under one rule.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42244) Refine error message by using Python types.

2023-01-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682339#comment-17682339
 ] 

Apache Spark commented on SPARK-42244:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/39815

> Refine error message by using Python types.
> ---
>
> Key: SPARK-42244
> URL: https://issues.apache.org/jira/browse/SPARK-42244
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently, the type names used in error messages are inconsistent, e.g. `string` vs. `str`.
> We might need to consolidate them under one rule.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42245) Upgrade scalafmt from 3.6.1 to 3.7.1

2023-01-30 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-42245:
---

 Summary: Upgrade scalafmt from 3.6.1 to 3.7.1
 Key: SPARK-42245
 URL: https://issues.apache.org/jira/browse/SPARK-42245
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.5.0
Reporter: BingKun Pan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42202) Scala Client E2E test stop the server gracefully

2023-01-30 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-42202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-42202.
---
Fix Version/s: 3.4.0
 Assignee: Zhen Li
   Resolution: Fixed

> Scala Client E2E test stop the server gracefully
> 
>
> Key: SPARK-42202
> URL: https://issues.apache.org/jira/browse/SPARK-42202
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Assignee: Zhen Li
>Priority: Minor
> Fix For: 3.4.0
>
>
> The current solution kills the Spark Connect server process, which may result
> in errors on the command line.
> Suggest a minor fix that closes the server process gracefully.
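
For illustration, the usual graceful-stop pattern, sketched here in Python (the actual E2E test harness is Scala; names are illustrative, not the actual change):

{code}
import subprocess

def stop_gracefully(proc: subprocess.Popen, timeout: float = 10.0) -> None:
    # Ask the server process to shut down cleanly (SIGTERM on POSIX).
    proc.terminate()
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Force-kill (SIGKILL) only as a last resort if it did not exit in time.
        proc.kill()
        proc.wait()
{code}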



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42245) Upgrade scalafmt from 3.6.1 to 3.7.1

2023-01-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682341#comment-17682341
 ] 

Apache Spark commented on SPARK-42245:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/39816

> Upgrade scalafmt from 3.6.1 to 3.7.1
> 
>
> Key: SPARK-42245
> URL: https://issues.apache.org/jira/browse/SPARK-42245
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42245) Upgrade scalafmt from 3.6.1 to 3.7.1

2023-01-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42245:


Assignee: (was: Apache Spark)

> Upgrade scalafmt from 3.6.1 to 3.7.1
> 
>
> Key: SPARK-42245
> URL: https://issues.apache.org/jira/browse/SPARK-42245
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42245) Upgrade scalafmt from 3.6.1 to 3.7.1

2023-01-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42245:


Assignee: Apache Spark

> Upgrade scalafmt from 3.6.1 to 3.7.1
> 
>
> Key: SPARK-42245
> URL: https://issues.apache.org/jira/browse/SPARK-42245
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42245) Upgrade scalafmt from 3.6.1 to 3.7.1

2023-01-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682342#comment-17682342
 ] 

Apache Spark commented on SPARK-42245:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/39816

> Upgrade scalafmt from 3.6.1 to 3.7.1
> 
>
> Key: SPARK-42245
> URL: https://issues.apache.org/jira/browse/SPARK-42245
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42246) Reach Full Parity with Vanilla PySpark's UDF in Python

2023-01-30 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-42246:


 Summary: Reach Full Parity with Vanilla PySpark's UDF in Python
 Key: SPARK-42246
 URL: https://issues.apache.org/jira/browse/SPARK-42246
 Project: Spark
  Issue Type: Umbrella
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Xinrong Meng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42163) Schema pruning fails on non-foldable array index or map key

2023-01-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-42163:
---

Assignee: David Cashman

> Schema pruning fails on non-foldable array index or map key
> ---
>
> Key: SPARK-42163
> URL: https://issues.apache.org/jira/browse/SPARK-42163
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 3.2.3
>Reporter: David Cashman
>Assignee: David Cashman
>Priority: Major
> Fix For: 3.4.0
>
>
> Schema pruning tries to extract selected fields from struct extractors. It 
> looks through GetArrayItem/GetMapValue, but when doing so, it ignores the 
> index/key, which may itself be a struct field. If it is a struct field that 
> is not otherwise selected, and some other field of the same attribute is 
> selected, then pruning will drop the field, resulting in an optimizer error.
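
A hedged illustration of the query shape described above (column names are made up; schema pruning only applies to file sources such as Parquet, so the data is read back from disk first):

{code}
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
spark.createDataFrame(
    [(("k1", "x"), {"k1": 1})],
    "s struct<key:string, other:string>, m map<string,int>",
).write.mode("overwrite").parquet("/tmp/spark42163_demo")

df = spark.read.parquet("/tmp/spark42163_demo")
# m[s.key] uses s.key as a non-foldable map key; only s.other is selected
# elsewhere, so buggy pruning could drop s.key and break the plan.
df.select(col("s.other"), col("m")[col("s.key")]).show()
{code}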



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42163) Schema pruning fails on non-foldable array index or map key

2023-01-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-42163.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39718
[https://github.com/apache/spark/pull/39718]

> Schema pruning fails on non-foldable array index or map key
> ---
>
> Key: SPARK-42163
> URL: https://issues.apache.org/jira/browse/SPARK-42163
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 3.2.3
>Reporter: David Cashman
>Priority: Major
> Fix For: 3.4.0
>
>
> Schema pruning tries to extract selected fields from struct extractors. It 
> looks through GetArrayItem/GetMapValue, but when doing so, it ignores the 
> index/key, which may itself be a struct field. If it is a struct field that 
> is not otherwise selected, and some other field of the same attribute is 
> selected, then pruning will drop the field, resulting in an optimizer error.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42247) `returnType` attribute of UDF when the user-specified return type has column name embedded

2023-01-30 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-42247:


 Summary: `returnType` attribute of UDF when the user-specified 
return type has column name embedded
 Key: SPARK-42247
 URL: https://issues.apache.org/jira/browse/SPARK-42247
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Xinrong Meng


The inconsistency can be reproduced as shown below:

{code}
# connect
>>> pandas_udf(lambda x : x + 1, "id int").returnType
IntegerType()

# vanilla PySpark
>>> pandas_udf(lambda x : x + 1, "id int").returnType
StructType([StructField('id', IntegerType(), True)])
{code}
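
For context, a DDL string that embeds a column name parses to a struct, which is why vanilla PySpark reports a StructType here. A quick check with PySpark's internal parser helper (internal API, shown only for illustration):

{code}
from pyspark.sql.types import _parse_datatype_string

print(_parse_datatype_string("id int"))
# StructType([StructField('id', IntegerType(), True)])
{code}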




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42231) Rename error class: MISSING_STATIC_PARTITION_COLUMN

2023-01-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-42231:
---

Assignee: Haejoon Lee

> Rename error class: MISSING_STATIC_PARTITION_COLUMN
> ---
>
> Key: SPARK-42231
> URL: https://issues.apache.org/jira/browse/SPARK-42231
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42231) Rename error class: MISSING_STATIC_PARTITION_COLUMN

2023-01-30 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-42231.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39797
[https://github.com/apache/spark/pull/39797]

> Rename error class: MISSING_STATIC_PARTITION_COLUMN
> ---
>
> Key: SPARK-42231
> URL: https://issues.apache.org/jira/browse/SPARK-42231
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42248) Assign name to _LEGACY_ERROR_TEMP_2141

2023-01-30 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-42248:
---

 Summary: Assign name to _LEGACY_ERROR_TEMP_2141
 Key: SPARK-42248
 URL: https://issues.apache.org/jira/browse/SPARK-42248
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Haejoon Lee






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42249) Refining html strings in error messages

2023-01-30 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-42249:
---

 Summary: Refining html strings in error messages
 Key: SPARK-42249
 URL: https://issues.apache.org/jira/browse/SPARK-42249
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Haejoon Lee


Use relative paths for HTML strings in error messages.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42248) Assign name to _LEGACY_ERROR_TEMP_2141

2023-01-30 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee resolved SPARK-42248.
-
Resolution: Duplicate

> Assign name to _LEGACY_ERROR_TEMP_2141
> --
>
> Key: SPARK-42248
> URL: https://issues.apache.org/jira/browse/SPARK-42248
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42250) batch_infer_udf with float fails when the batch size consists of single value

2023-01-30 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-42250:


 Summary: batch_infer_udf with float fails when the batch size 
consists of single value
 Key: SPARK-42250
 URL: https://issues.apache.org/jira/browse/SPARK-42250
 Project: Spark
  Issue Type: Bug
  Components: ML, PySpark
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon


{code}
import numpy as np
import pandas as pd
from pyspark.ml.functions import predict_batch_udf
from pyspark.sql.types import ArrayType, FloatType, StructType, StructField
from typing import Mapping

df = spark.createDataFrame(
    [[[0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0]], [[4.0, 5.0, 6.0, 7.0], [4.0, 5.0, 6.0]]],
    schema=["t1", "t2"],
)

def make_multi_sum_fn():
    def predict(x1: np.ndarray, x2: np.ndarray) -> np.ndarray:
        return np.sum(x1, axis=1) + np.sum(x2, axis=1)
    return predict

multi_sum_udf = predict_batch_udf(
    make_multi_sum_fn,
    return_type=FloatType(),
    batch_size=1,
    input_tensor_shapes=[[4], [3]],
)

df.select(multi_sum_udf("t1", "t2")).collect()
{code}

fails as below:

{code}
 File "/.../spark/python/lib/pyspark.zip/pyspark/worker.py", line 829, in main
process()
  File "/.../spark/python/lib/pyspark.zip/pyspark/worker.py", line 821, in 
process
serializer.dump_stream(out_iter, outfile)
  File "/.../spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
line 345, in dump_stream
return ArrowStreamSerializer.dump_stream(self, init_stream_yield_batches(), 
stream)
  File "/.../spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
line 86, in dump_stream
for batch in iterator:
  File "/.../spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
line 339, in init_stream_yield_batches
batch = self._create_batch(series)
  File "/.../spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
line 275, in _create_batch
arrs.append(create_array(s, t))
  File "/.../spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
line 245, in create_array
raise e
  File "/.../spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
line 233, in create_array
array = pa.Array.from_pandas(s, mask=mask, type=t, safe=self._safecheck)
  File "pyarrow/array.pxi", line 1044, in pyarrow.lib.Array.from_pandas
  File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Could not convert array(569.) with type 
numpy.ndarray: tried to convert to float32

at 
org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:554)
at 
org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:118)
at 
org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:507)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:391)
at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888)
at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
at 
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1520)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

{code}
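
A hedged minimal sketch of the presumed failure mode: with batch_size=1 a per-batch result can end up as a 0-d NumPy array inside an object Series, which pyarrow refuses to coerce to float32, matching the last frames of the traceback above:

{code}
import numpy as np
import pandas as pd
import pyarrow as pa

# An object Series holding a 0-d ndarray, as a size-1 batch can produce.
s = pd.Series([np.array(569.0)])

# Raises pyarrow.lib.ArrowInvalid: Could not convert array(569.) ...
pa.Array.from_pandas(s, type=pa.float32())
{code}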



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-42250) predict_batch_udf with float fails when the batch size consists of single value

2023-01-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-42250:
-
Summary: predict_batch_udf with float fails when the batch size consists of 
single value  (was: batch_infer_udf with float fails when the batch size 
consists of single value)

> predict_batch_udf with float fails when the batch size consists of single 
> value
> ---
>
> Key: SPARK-42250
> URL: https://issues.apache.org/jira/browse/SPARK-42250
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> import numpy as np
> import pandas as pd
> from pyspark.ml.functions import predict_batch_udf
> from pyspark.sql.types import ArrayType, FloatType, StructType, StructField
> from typing import Mapping
>
> df = spark.createDataFrame(
>     [[[0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0]], [[4.0, 5.0, 6.0, 7.0], [4.0, 5.0, 6.0]]],
>     schema=["t1", "t2"],
> )
>
> def make_multi_sum_fn():
>     def predict(x1: np.ndarray, x2: np.ndarray) -> np.ndarray:
>         return np.sum(x1, axis=1) + np.sum(x2, axis=1)
>     return predict
>
> multi_sum_udf = predict_batch_udf(
>     make_multi_sum_fn,
>     return_type=FloatType(),
>     batch_size=1,
>     input_tensor_shapes=[[4], [3]],
> )
>
> df.select(multi_sum_udf("t1", "t2")).collect()
> {code}
> fails as below:
> {code}
>  File "/.../spark/python/lib/pyspark.zip/pyspark/worker.py", line 829, in main
> process()
>   File "/.../spark/python/lib/pyspark.zip/pyspark/worker.py", line 821, in 
> process
> serializer.dump_stream(out_iter, outfile)
>   File "/.../spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 345, in dump_stream
> return ArrowStreamSerializer.dump_stream(self, 
> init_stream_yield_batches(), stream)
>   File "/.../spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 86, in dump_stream
> for batch in iterator:
>   File "/.../spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 339, in init_stream_yield_batches
> batch = self._create_batch(series)
>   File "/.../spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 275, in _create_batch
> arrs.append(create_array(s, t))
>   File "/.../spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 245, in create_array
> raise e
>   File "/.../spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 233, in create_array
> array = pa.Array.from_pandas(s, mask=mask, type=t, safe=self._safecheck)
>   File "pyarrow/array.pxi", line 1044, in pyarrow.lib.Array.from_pandas
>   File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
>   File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Could not convert array(569.) with type 
> numpy.ndarray: tried to convert to float32
>   at 
> org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:554)
>   at 
> org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:118)
>   at 
> org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:507)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:391)
>   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888)
>   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
>   at 
> org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
>   at org.apache.spark.scheduler.Task.run(Task.scala:139)
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1520)

[jira] [Commented] (SPARK-42250) predict_batch_udf with float fails when the batch size consists of single value

2023-01-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682368#comment-17682368
 ] 

Apache Spark commented on SPARK-42250:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39817

> predict_batch_udf with float fails when the batch size consists of single 
> value
> ---
>
> Key: SPARK-42250
> URL: https://issues.apache.org/jira/browse/SPARK-42250
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> import numpy as np
> import pandas as pd
> from pyspark.ml.functions import predict_batch_udf
> from pyspark.sql.types import ArrayType, FloatType, StructType, StructField
> from typing import Mapping
>
> df = spark.createDataFrame(
>     [[[0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0]], [[4.0, 5.0, 6.0, 7.0], [4.0, 5.0, 6.0]]],
>     schema=["t1", "t2"],
> )
>
> def make_multi_sum_fn():
>     def predict(x1: np.ndarray, x2: np.ndarray) -> np.ndarray:
>         return np.sum(x1, axis=1) + np.sum(x2, axis=1)
>     return predict
>
> multi_sum_udf = predict_batch_udf(
>     make_multi_sum_fn,
>     return_type=FloatType(),
>     batch_size=1,
>     input_tensor_shapes=[[4], [3]],
> )
>
> df.select(multi_sum_udf("t1", "t2")).collect()
> {code}
> fails as below:
> {code}
>  File "/.../spark/python/lib/pyspark.zip/pyspark/worker.py", line 829, in main
> process()
>   File "/.../spark/python/lib/pyspark.zip/pyspark/worker.py", line 821, in 
> process
> serializer.dump_stream(out_iter, outfile)
>   File "/.../spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 345, in dump_stream
> return ArrowStreamSerializer.dump_stream(self, 
> init_stream_yield_batches(), stream)
>   File "/.../spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 86, in dump_stream
> for batch in iterator:
>   File "/.../spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 339, in init_stream_yield_batches
> batch = self._create_batch(series)
>   File "/.../spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 275, in _create_batch
> arrs.append(create_array(s, t))
>   File "/.../spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 245, in create_array
> raise e
>   File "/.../spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py", 
> line 233, in create_array
> array = pa.Array.from_pandas(s, mask=mask, type=t, safe=self._safecheck)
>   File "pyarrow/array.pxi", line 1044, in pyarrow.lib.Array.from_pandas
>   File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
>   File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Could not convert array(569.) with type 
> numpy.ndarray: tried to convert to float32
>   at 
> org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:554)
>   at 
> org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:118)
>   at 
> org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:507)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:391)
>   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888)
>   at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
>   at 
> org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
>   at org.apache.spark.scheduler.Task.run(Task.scala:139)
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala
