[jira] [Assigned] (SPARK-37945) Use error classes in the execution errors of arithmetic ops

2022-10-22 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-37945:


Assignee: Khalid Mammadov

> Use error classes in the execution errors of arithmetic ops
> ---
>
> Key: SPARK-37945
> URL: https://issues.apache.org/jira/browse/SPARK-37945
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Khalid Mammadov
>Priority: Major
>
> Migrate the following errors in QueryExecutionErrors onto error classes:
> * overflowInSumOfDecimalError
> * overflowInIntegralDivideError
> * arithmeticOverflowError
> * unaryMinusCauseOverflowError
> * binaryArithmeticCauseOverflowError
> * unscaledValueTooLargeForPrecisionError
> * decimalPrecisionExceedsMaxPrecisionError
> * outOfDecimalTypeRangeError
> * integerOverflowError
> Throw an implementation of SparkThrowable. Also write a test for each error 
> in QueryExecutionErrorsSuite.
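
A minimal, self-contained Scala sketch of the migration pattern (the trait and
class names below are stand-ins for illustration, not Spark's actual internal
API):

{code:scala}
// Stand-in for SparkThrowable: the error is identified by a stable error
// class plus named message parameters instead of a free-form message string.
trait ThrowableWithErrorClass {
  def getErrorClass: String
  def getMessageParameters: Map[String, String]
}

class ArithmeticOverflowException(
    errorClass: String,
    params: Map[String, String])
  extends ArithmeticException(s"[$errorClass] " + params.mkString(", "))
  with ThrowableWithErrorClass {
  override def getErrorClass: String = errorClass
  override def getMessageParameters: Map[String, String] = params
}

object QueryExecutionErrorsSketch {
  // Before: a bare ArithmeticException with an ad-hoc message.
  // After: an exception carrying the error class (class name illustrative).
  def overflowInIntegralDivideError(): ArithmeticException =
    new ArithmeticOverflowException(
      errorClass = "ARITHMETIC_OVERFLOW",
      params = Map("alternative" -> "try_divide"))
}
{code}

A test in QueryExecutionErrorsSuite would then intercept the exception and
assert on getErrorClass rather than on the message text.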






[jira] [Resolved] (SPARK-37945) Use error classes in the execution errors of arithmetic ops

2022-10-22 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-37945.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38273
[https://github.com/apache/spark/pull/38273]

> Use error classes in the execution errors of arithmetic ops
> ---
>
> Key: SPARK-37945
> URL: https://issues.apache.org/jira/browse/SPARK-37945
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Khalid Mammadov
>Priority: Major
> Fix For: 3.4.0
>
>
> Migrate the following errors in QueryExecutionErrors onto error classes:
> * overflowInSumOfDecimalError
> * overflowInIntegralDivideError
> * arithmeticOverflowError
> * unaryMinusCauseOverflowError
> * binaryArithmeticCauseOverflowError
> * unscaledValueTooLargeForPrecisionError
> * decimalPrecisionExceedsMaxPrecisionError
> * outOfDecimalTypeRangeError
> * integerOverflowError
> Throw an implementation of SparkThrowable. Also write a test for each error 
> in QueryExecutionErrorsSuite.






[jira] [Updated] (SPARK-40885) Spark will filter out data field sorting when dynamic partitions and data fields are sorted at the same time

2022-10-22 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-40885:

Fix Version/s: (was: 3.4.0)

> Spark will filter out data field sorting when dynamic partitions and data 
> fields are sorted at the same time
> 
>
> Key: SPARK-40885
> URL: https://issues.apache.org/jira/browse/SPARK-40885
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2, 3.3.0, 3.2.2
>Reporter: zzzzming95
>Priority: Major
> Attachments: 1666494504884.jpg
>
>
> When using dynamic partitions to write data and sorting by both partition
> and data fields, Spark filters out the sort on the data fields.
>  
> SQL to reproduce:
> {code:java}
> CREATE TABLE `sort_table`(
>   `id` int,
>   `name` string
>   )
> PARTITIONED BY (
>   `dt` string)
> stored as textfile
> LOCATION 'sort_table';
> 
> CREATE TABLE `test_table`(
>   `id` int,
>   `name` string)
> PARTITIONED BY (
>   `dt` string)
> stored as textfile
> LOCATION 'test_table';
> 
> -- generate test data
> insert into test_table partition(dt=20221011)
> select 10,"15" union all select 1,"10" union all select 5,"50"
> union all select 20,"2" union all select 30,"14";
> 
> set spark.hadoop.hive.exec.dynamic.partition=true;
> set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict;
> 
> -- this SQL sorts by the partition field (`dt`) and the data field (`name`),
> -- but the sort on `name` does not take effect
> insert overwrite table sort_table partition(dt)
> select id,name,dt from test_table order by name,dt;
> {code}
>  
> The Sort operator in the DAG has only one sort field, but the SQL specifies
> two. (See the attached drawing.)
>  
> Related issue: https://issues.apache.org/jira/browse/SPARK-40588
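
One way to observe the reported behavior is to print the plan of the failing
INSERT and check how many ordering expressions the Sort node carries (a
spark-shell style sketch, assuming the tables above exist in the current
session):

{code:scala}
// Show the plan of the failing INSERT. With the reported bug, the Sort node
// lists only the partition field `dt` instead of both `name` and `dt`.
spark.sql(
  """explain formatted
    |insert overwrite table sort_table partition(dt)
    |select id, name, dt from test_table order by name, dt""".stripMargin)
  .show(truncate = false)
{code}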






[jira] [Updated] (SPARK-40885) Spark will filter out data field sorting when dynamic partitions and data fields are sorted at the same time

2022-10-22 Thread zzzzming95 (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zzzzming95 updated SPARK-40885:
---
Description: 
When using dynamic partitions to write data and sorting by both partition and
data fields, Spark filters out the sort on the data fields.

 

SQL to reproduce:
{code:java}
CREATE TABLE `sort_table`(
  `id` int,
  `name` string
  )
PARTITIONED BY (
  `dt` string)
stored as textfile
LOCATION 'sort_table';

CREATE TABLE `test_table`(
  `id` int,
  `name` string)
PARTITIONED BY (
  `dt` string)
stored as textfile
LOCATION 'test_table';

-- generate test data
insert into test_table partition(dt=20221011)
select 10,"15" union all select 1,"10" union all select 5,"50"
union all select 20,"2" union all select 30,"14";

set spark.hadoop.hive.exec.dynamic.partition=true;
set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict;

-- this SQL sorts by the partition field (`dt`) and the data field (`name`),
-- but the sort on `name` does not take effect
insert overwrite table sort_table partition(dt)
select id,name,dt from test_table order by name,dt;
 {code}
 

The Sort operator in the DAG has only one sort field, but the SQL specifies
two. (See the attached drawing.)

 

Related issue: https://issues.apache.org/jira/browse/SPARK-40588

  was:
When using dynamic partitions to write data and sorting by both partition and
data fields, Spark filters out the sort on the data fields.

 

SQL to reproduce:
{code:java}
CREATE TABLE `sort_table`(
  `id` int,
  `name` string
  )
PARTITIONED BY (
  `dt` string)
stored as textfile
LOCATION 'sort_table';

CREATE TABLE `test_table`(
  `id` int,
  `name` string)
PARTITIONED BY (
  `dt` string)
stored as textfile
LOCATION 'test_table';

-- generate test data
insert into test_table partition(dt=20221011)
select 10,"15" union all select 1,"10" union all select 5,"50"
union all select 20,"2" union all select 30,"14";

set spark.hadoop.hive.exec.dynamic.partition=true
set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict

insert overwrite table sort_table partition(dt)
select id,name,dt from test_table order by name,dt;
 {code}
 

The Sort operator in the DAG has only one sort field, but the SQL specifies
two. (See the attached drawing.)

 

Related issue: https://issues.apache.org/jira/browse/SPARK-40588


> Spark will filter out data field sorting when dynamic partitions and data 
> fields are sorted at the same time
> 
>
> Key: SPARK-40885
> URL: https://issues.apache.org/jira/browse/SPARK-40885
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2, 3.3.0, 3.2.2
>Reporter: zzzzming95
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: 1666494504884.jpg
>
>
> When using dynamic partitions to write data and sorting by both partition
> and data fields, Spark filters out the sort on the data fields.
>  
> SQL to reproduce:
> {code:java}
> CREATE TABLE `sort_table`(
>   `id` int,
>   `name` string
>   )
> PARTITIONED BY (
>   `dt` string)
> stored as textfile
> LOCATION 'sort_table';
> 
> CREATE TABLE `test_table`(
>   `id` int,
>   `name` string)
> PARTITIONED BY (
>   `dt` string)
> stored as textfile
> LOCATION 'test_table';
> 
> -- generate test data
> insert into test_table partition(dt=20221011)
> select 10,"15" union all select 1,"10" union all select 5,"50"
> union all select 20,"2" union all select 30,"14";
> 
> set spark.hadoop.hive.exec.dynamic.partition=true;
> set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict;
> 
> -- this SQL sorts by the partition field (`dt`) and the data field (`name`),
> -- but the sort on `name` does not take effect
> insert overwrite table sort_table partition(dt)
> select id,name,dt from test_table order by name,dt;
> {code}
>  
> The Sort operator in the DAG has only one sort field, but the SQL specifies
> two. (See the attached drawing.)
>  
> Related issue: https://issues.apache.org/jira/browse/SPARK-40588






[jira] [Updated] (SPARK-40885) Spark will filter out data field sorting when dynamic partitions and data fields are sorted at the same time

2022-10-22 Thread zzzzming95 (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zzzzming95 updated SPARK-40885:
---
Description: 
When using dynamic partitions to write data and sorting by both partition and
data fields, Spark filters out the sort on the data fields.

 

SQL to reproduce:
{code:java}
CREATE TABLE `sort_table`(
  `id` int,
  `name` string
  )
PARTITIONED BY (
  `dt` string)
stored as textfile
LOCATION 'sort_table';

CREATE TABLE `test_table`(
  `id` int,
  `name` string)
PARTITIONED BY (
  `dt` string)
stored as textfile
LOCATION 'test_table';

-- generate test data
insert into test_table partition(dt=20221011)
select 10,"15" union all select 1,"10" union all select 5,"50"
union all select 20,"2" union all select 30,"14";

set spark.hadoop.hive.exec.dynamic.partition=true
set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict

insert overwrite table sort_table partition(dt)
select id,name,dt from test_table order by name,dt;
 {code}
 

The Sort operator in the DAG has only one sort field, but the SQL specifies
two. (See the attached drawing.)

 

Related issue: https://issues.apache.org/jira/browse/SPARK-40588

  was:
When using dynamic partitions to write data and sorting by both partition and
data fields, Spark filters out the sort on the data fields.

 

SQL to reproduce:
{code:java}
CREATE TABLE `sort_table`(
  `id` int,
  `name` string
  )
PARTITIONED BY (
  `dt` string)
stored as textfile
LOCATION 'sort_table';

CREATE TABLE `test_table`(
  `id` int,
  `name` string)
PARTITIONED BY (
  `dt` string)
stored as textfile
LOCATION 'test_table';

-- generate test data
insert into test_table partition(dt=20221011)
select 10,"15" union all select 1,"10" union all select 5,"50"
union all select 20,"2" union all select 30,"14";

set spark.hadoop.hive.exec.dynamic.partition=true
set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict

insert overwrite table sort_table partition(dt)
select id,name,dt from test_table order by name,dt;
 {code}
!image-2022-10-23-11-09-47-759.png!

The Sort operator in the DAG has only one sort field, but the SQL specifies
two.

 

Related issue: https://issues.apache.org/jira/browse/SPARK-40588


> Spark will filter out data field sorting when dynamic partitions and data 
> fields are sorted at the same time
> 
>
> Key: SPARK-40885
> URL: https://issues.apache.org/jira/browse/SPARK-40885
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2, 3.3.0, 3.2.2
>Reporter: zzzzming95
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: 1666494504884.jpg
>
>
> When using dynamic partitions to write data and sorting by both partition
> and data fields, Spark filters out the sort on the data fields.
>  
> SQL to reproduce:
> {code:java}
> CREATE TABLE `sort_table`(
>   `id` int,
>   `name` string
>   )
> PARTITIONED BY (
>   `dt` string)
> stored as textfile
> LOCATION 'sort_table';
> 
> CREATE TABLE `test_table`(
>   `id` int,
>   `name` string)
> PARTITIONED BY (
>   `dt` string)
> stored as textfile
> LOCATION 'test_table';
> 
> -- generate test data
> insert into test_table partition(dt=20221011)
> select 10,"15" union all select 1,"10" union all select 5,"50"
> union all select 20,"2" union all select 30,"14";
> 
> set spark.hadoop.hive.exec.dynamic.partition=true
> set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict
> 
> insert overwrite table sort_table partition(dt)
> select id,name,dt from test_table order by name,dt;
> {code}
>  
> The Sort operator in the DAG has only one sort field, but the SQL specifies
> two. (See the attached drawing.)
>  
> Related issue: https://issues.apache.org/jira/browse/SPARK-40588






[jira] [Updated] (SPARK-40885) Spark will filter out data field sorting when dynamic partitions and data fields are sorted at the same time

2022-10-22 Thread zzzzming95 (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zzzzming95 updated SPARK-40885:
---
Attachment: 1666494504884.jpg

> Spark will filter out data field sorting when dynamic partitions and data 
> fields are sorted at the same time
> 
>
> Key: SPARK-40885
> URL: https://issues.apache.org/jira/browse/SPARK-40885
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2, 3.3.0, 3.2.2
>Reporter: zzzzming95
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: 1666494504884.jpg
>
>
> When using dynamic partitions to write data and sorting by both partition
> and data fields, Spark filters out the sort on the data fields.
>  
> SQL to reproduce:
> {code:java}
> CREATE TABLE `sort_table`(
>   `id` int,
>   `name` string
>   )
> PARTITIONED BY (
>   `dt` string)
> stored as textfile
> LOCATION 'sort_table';
> 
> CREATE TABLE `test_table`(
>   `id` int,
>   `name` string)
> PARTITIONED BY (
>   `dt` string)
> stored as textfile
> LOCATION 'test_table';
> 
> -- generate test data
> insert into test_table partition(dt=20221011)
> select 10,"15" union all select 1,"10" union all select 5,"50"
> union all select 20,"2" union all select 30,"14";
> 
> set spark.hadoop.hive.exec.dynamic.partition=true
> set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict
> 
> insert overwrite table sort_table partition(dt)
> select id,name,dt from test_table order by name,dt;
> {code}
> !image-2022-10-23-11-09-47-759.png!
> The Sort operator in the DAG has only one sort field, but the SQL specifies
> two.
>  
> Related issue: https://issues.apache.org/jira/browse/SPARK-40588






[jira] [Updated] (SPARK-40885) Spark will filter out data field sorting when dynamic partitions and data fields are sorted at the same time

2022-10-22 Thread zzzzming95 (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zzzzming95 updated SPARK-40885:
---
External issue URL: https://issues.apache.org/jira/browse/SPARK-40588
   Description: 
When using dynamic partitions to write data and sorting by both partition and
data fields, Spark filters out the sort on the data fields.

 

SQL to reproduce:
{code:java}
CREATE TABLE `sort_table`(
  `id` int,
  `name` string
  )
PARTITIONED BY (
  `dt` string)
stored as textfile
LOCATION 'sort_table';

CREATE TABLE `test_table`(
  `id` int,
  `name` string)
PARTITIONED BY (
  `dt` string)
stored as textfile
LOCATION 'test_table';

-- generate test data
insert into test_table partition(dt=20221011)
select 10,"15" union all select 1,"10" union all select 5,"50"
union all select 20,"2" union all select 30,"14";

set spark.hadoop.hive.exec.dynamic.partition=true
set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict

insert overwrite table sort_table partition(dt)
select id,name,dt from test_table order by name,dt;
 {code}
!image-2022-10-23-11-09-47-759.png!

The Sort operator in the DAG has only one sort field, but the SQL specifies
two.

 

Related issue: https://issues.apache.org/jira/browse/SPARK-40588

  was:
When using dynamic partitions to write data and sorting by both partition and
data fields, Spark filters out the sort on the data fields.

 

SQL to reproduce:
{code:java}
CREATE TABLE `sort_table`(
  `id` int,
  `name` string
  )
PARTITIONED BY (
  `dt` string)
stored as textfile
LOCATION 'sort_table';

CREATE TABLE `test_table`(
  `id` int,
  `name` string)
PARTITIONED BY (
  `dt` string)
stored as textfile
LOCATION 'test_table';

-- generate test data
insert into test_table partition(dt=20221011)
select 10,"15" union all select 1,"10" union all select 5,"50"
union all select 20,"2" union all select 30,"14";

set spark.hadoop.hive.exec.dynamic.partition=true
set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict

insert overwrite table sort_table partition(dt)
select id,name,dt from test_table order by name,dt;
 {code}
 

 

 


> Spark will filter out data field sorting when dynamic partitions and data 
> fields are sorted at the same time
> 
>
> Key: SPARK-40885
> URL: https://issues.apache.org/jira/browse/SPARK-40885
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2, 3.3.0, 3.2.2
>Reporter: zzzzming95
>Priority: Major
> Fix For: 3.4.0
>
>
> When using dynamic partitions to write data and sorting by both partition
> and data fields, Spark filters out the sort on the data fields.
>  
> SQL to reproduce:
> {code:java}
> CREATE TABLE `sort_table`(
>   `id` int,
>   `name` string
>   )
> PARTITIONED BY (
>   `dt` string)
> stored as textfile
> LOCATION 'sort_table';
> 
> CREATE TABLE `test_table`(
>   `id` int,
>   `name` string)
> PARTITIONED BY (
>   `dt` string)
> stored as textfile
> LOCATION 'test_table';
> 
> -- generate test data
> insert into test_table partition(dt=20221011)
> select 10,"15" union all select 1,"10" union all select 5,"50"
> union all select 20,"2" union all select 30,"14";
> 
> set spark.hadoop.hive.exec.dynamic.partition=true
> set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict
> 
> insert overwrite table sort_table partition(dt)
> select id,name,dt from test_table order by name,dt;
> {code}
> !image-2022-10-23-11-09-47-759.png!
> The Sort operator in the DAG has only one sort field, but the SQL specifies
> two.
>  
> Related issue: https://issues.apache.org/jira/browse/SPARK-40588






[jira] [Created] (SPARK-40885) Spark will filter out data field sorting when dynamic partitions and data fields are sorted at the same time

2022-10-22 Thread zzzzming95 (Jira)
zzzzming95 created SPARK-40885:
--

 Summary: Spark will filter out data field sorting when dynamic 
partitions and data fields are sorted at the same time
 Key: SPARK-40885
 URL: https://issues.apache.org/jira/browse/SPARK-40885
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.2, 3.3.0, 3.1.2
Reporter: zzzzming95
 Fix For: 3.4.0


When using dynamic partitions to write data and sorting by both partition and
data fields, Spark filters out the sort on the data fields.

 

SQL to reproduce:
{code:java}
CREATE TABLE `sort_table`(
  `id` int,
  `name` string
  )
PARTITIONED BY (
  `dt` string)
stored as textfile
LOCATION 'sort_table';

CREATE TABLE `test_table`(
  `id` int,
  `name` string)
PARTITIONED BY (
  `dt` string)
stored as textfile
LOCATION 'test_table';

-- generate test data
insert into test_table partition(dt=20221011)
select 10,"15" union all select 1,"10" union all select 5,"50"
union all select 20,"2" union all select 30,"14";

set spark.hadoop.hive.exec.dynamic.partition=true
set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict

insert overwrite table sort_table partition(dt)
select id,name,dt from test_table order by name,dt;
 {code}
 

 

 






[jira] [Commented] (SPARK-40748) Migrate type check failures of conditions onto error classes

2022-10-22 Thread BingKun Pan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622730#comment-17622730
 ] 

BingKun Pan commented on SPARK-40748:
-

I'll work on it.

> Migrate type check failures of conditions onto error classes
> 
>
> Key: SPARK-40748
> URL: https://issues.apache.org/jira/browse/SPARK-40748
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Replace TypeCheckFailure with DataTypeMismatch in the type checks of the 
> conditional expressions (see the sketch after this list):
> 1. If (2): 
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala#L61-L67
> 2. CaseWhen (2): 
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala#L175-L183
> 3. InSubquery (2): 
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L378-L396
> 4. In (1): 
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L453
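
A minimal sketch of the before/after shape, with simplified stand-in types
(Catalyst's actual TypeCheckResult API may differ in detail):

{code:scala}
// Simplified stand-ins for Catalyst's type-check result types.
sealed trait TypeCheckResult
case object TypeCheckSuccess extends TypeCheckResult
case class TypeCheckFailure(message: String) extends TypeCheckResult
case class DataTypeMismatch(
    errorSubClass: String,
    messageParameters: Map[String, String]) extends TypeCheckResult

object TypeCheckSketch {
  // Before: If.checkInputDataTypes returned a free-form message, roughly
  //   TypeCheckFailure(s"type of predicate expression in If should be
  //   boolean, not $inputType")
  // After: a structured error sub-class with named parameters (the
  // sub-class name and parameter keys here are illustrative).
  def checkIfPredicate(inputType: String): TypeCheckResult =
    if (inputType == "boolean") TypeCheckSuccess
    else DataTypeMismatch(
      errorSubClass = "UNEXPECTED_INPUT_TYPE",
      messageParameters = Map(
        "paramIndex"   -> "1",
        "requiredType" -> "\"BOOLEAN\"",
        "inputType"    -> ("\"" + inputType + "\"")))
}
{code}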






[jira] [Commented] (SPARK-40588) Sorting issue with AQE turned on

2022-10-22 Thread zzzzming95 (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622728#comment-17622728
 ] 

zzzzming95 commented on SPARK-40588:


Yes, I found the same problem. This should be a bug in Spark: when the sort 
fields include the dynamic partitioning field, the sort on the 
non-partitioning fields is filtered out.

> Sorting issue with AQE turned on  
> --
>
> Key: SPARK-40588
> URL: https://issues.apache.org/jira/browse/SPARK-40588
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.3
> Environment: Spark v3.1.3
> Scala v2.12.13
>Reporter: Swetha Baskaran
>Priority: Major
> Attachments: image-2022-10-16-22-05-47-159.png
>
>
> We are attempting to partition data by a few columns, sort by a particular 
> _sortCol_ and write out one file per partition. 
> {code:java}
> df
>     .repartition(col("day"), col("month"), col("year"))
>     .withColumn("partitionId",spark_partition_id)
>     .withColumn("monotonicallyIncreasingIdUnsorted",monotonicallyIncreasingId)
>     .sortWithinPartitions("year", "month", "day", "sortCol")
>     .withColumn("monotonicallyIncreasingIdSorted",monotonicallyIncreasingId)
>     .write
>     .partitionBy("year", "month", "day")
>     .parquet(path){code}
> When inspecting the results, we observe one file per partition; however, we 
> see an _alternating_ pattern of unsorted rows in some files.
> {code:java}
> {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287832121344,"monotonicallyIncreasingIdSorted":6287832121344}
> {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287877022389,"monotonicallyIncreasingIdSorted":6287876860586}
> {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287877567881,"monotonicallyIncreasingIdSorted":6287832121345}
> {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287835105553,"monotonicallyIncreasingIdSorted":6287876860587}
> {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287832570127,"monotonicallyIncreasingIdSorted":6287832121346}
> {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287879965760,"monotonicallyIncreasingIdSorted":6287876860588}
> {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287878762347,"monotonicallyIncreasingIdSorted":6287832121347}
> {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287837165012,"monotonicallyIncreasingIdSorted":6287876860589}
> {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287832910545,"monotonicallyIncreasingIdSorted":6287832121348}
> {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287881244758,"monotonicallyIncreasingIdSorted":6287876860590}
> {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287880041345,"monotonicallyIncreasingIdSorted":6287832121349}{code}
> Here is a 
> [gist|https://gist.github.com/Swebask/543030748a768be92d3c0ae343d2ae89] to 
> reproduce the issue. 
> Turning off AQE with spark.conf.set("spark.sql.adaptive.enabled", false) 
> fixes the issue.
> I'm working on identifying why AQE affects the sort order. Any leads or 
> thoughts would be appreciated!
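
A quick way to confirm the AQE dependence is to run the same write twice with
the flag toggled (a spark-shell style sketch; writeJob is a placeholder for
the repartition/sort/write snippet above, not a real API):

{code:scala}
// Placeholder for the repartition/sortWithinPartitions/write job above.
def writeJob(path: String): Unit = ???

spark.conf.set("spark.sql.adaptive.enabled", "true")
writeJob("/tmp/aqe_on")   // reportedly shows interleaved sortCol values

spark.conf.set("spark.sql.adaptive.enabled", "false")
writeJob("/tmp/aqe_off")  // reportedly sorted as expected
{code}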






[jira] [Commented] (SPARK-40882) Upgrade actions/setup-java to v3 with distribution specified

2022-10-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622720#comment-17622720
 ] 

Apache Spark commented on SPARK-40882:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38354

> Upgrade actions/setup-java to v3 with distribution specified
> 
>
> Key: SPARK-40882
> URL: https://issues.apache.org/jira/browse/SPARK-40882
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>







[jira] [Assigned] (SPARK-40882) Upgrade actions/setup-java to v3 with distribution specified

2022-10-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40882:


Assignee: (was: Apache Spark)

> Upgrade actions/setup-java to v3 with distribution specified
> 
>
> Key: SPARK-40882
> URL: https://issues.apache.org/jira/browse/SPARK-40882
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>







[jira] [Commented] (SPARK-40882) Upgrade actions/setup-java to v3 with distribution specified

2022-10-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622719#comment-17622719
 ] 

Apache Spark commented on SPARK-40882:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38354

> Upgrade actions/setup-java to v3 with distribution specified
> 
>
> Key: SPARK-40882
> URL: https://issues.apache.org/jira/browse/SPARK-40882
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>







[jira] [Assigned] (SPARK-40882) Upgrade actions/setup-java to v3 with distribution specified

2022-10-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40882:


Assignee: Apache Spark

> Upgrade actions/setup-java to v3 with distribution specified
> 
>
> Key: SPARK-40882
> URL: https://issues.apache.org/jira/browse/SPARK-40882
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-40881) Upgrade actions/cache to v3 and actions/upload-artifact to v3

2022-10-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622718#comment-17622718
 ] 

Apache Spark commented on SPARK-40881:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38353

> Upgrade actions/cache to v3 and actions/upload-artifact to v3
> -
>
> Key: SPARK-40881
> URL: https://issues.apache.org/jira/browse/SPARK-40881
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>







[jira] [Commented] (SPARK-40881) Upgrade actions/cache to v3 and actions/upload-artifact to v3

2022-10-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622717#comment-17622717
 ] 

Apache Spark commented on SPARK-40881:
--

User 'Yikun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38353

> Upgrade actions/cache to v3 and actions/upload-artifact to v3
> -
>
> Key: SPARK-40881
> URL: https://issues.apache.org/jira/browse/SPARK-40881
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>







[jira] [Assigned] (SPARK-40881) Upgrade actions/cache to v3 and actions/upload-artifact to v3

2022-10-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40881:


Assignee: Apache Spark

> Upgrade actions/cache to v3 and actions/upload-artifact to v3
> -
>
> Key: SPARK-40881
> URL: https://issues.apache.org/jira/browse/SPARK-40881
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-40881) Upgrade actions/cache to v3 and actions/upload-artifact to v3

2022-10-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40881:


Assignee: (was: Apache Spark)

> Upgrade actions/cache to v3 and actions/upload-artifact to v3
> -
>
> Key: SPARK-40881
> URL: https://issues.apache.org/jira/browse/SPARK-40881
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>







[jira] (SPARK-40867) Flaky test ProtobufCatalystDataConversionSuite

2022-10-22 Thread Sandish Kumar HN (Jira)


[ https://issues.apache.org/jira/browse/SPARK-40867 ]


Sandish Kumar HN deleted comment on SPARK-40867:
--

was (Author: sanysand...@gmail.com):
this issue got resolved through https://github.com/apache/spark/pull/38286

> Flaky test ProtobufCatalystDataConversionSuite
> --
>
> Key: SPARK-40867
> URL: https://issues.apache.org/jira/browse/SPARK-40867
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
> Fix For: 3.4.0
>
>
> * 
> [https://github.com/LuciferYang/spark/actions/runs/3295309311/jobs/5433733419]
>  * 
> [https://github.com/LuciferYang/spark/actions/runs/3291252601/jobs/5425183034]
> {code:java}
> [info] ProtobufCatalystDataConversionSuite:
> [info] - single StructType(StructField(int32_type,IntegerType,true)) with 
> seed 167 *** FAILED *** (39 milliseconds)
> [info]   Incorrect evaluation (codegen off): from_protobuf(to_protobuf([0], 
> /home/runner/work/spark/spark/connector/protobuf/target/scala-2.12/test-classes/protobuf/catalyst_types.desc,
>  IntegerMsg), 
> /home/runner/work/spark/spark/connector/protobuf/target/scala-2.12/test-classes/protobuf/catalyst_types.desc,
>  IntegerMsg), actual: [null], expected: [0] (ExpressionEvalHelper.scala:209)
> [info]   org.scalatest.exceptions.TestFailedException:
> [info]   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
> [info]   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564)
> [info]   at org.scalatest.Assertions.fail(Assertions.scala:933)
> [info]   at org.scalatest.Assertions.fail$(Assertions.scala:929)
> [info]   at org.scalatest.funsuite.AnyFunSuite.fail(AnyFunSuite.scala:1564)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluationWithoutCodegen(ExpressionEvalHelper.scala:209)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluationWithoutCodegen$(ExpressionEvalHelper.scala:199)
> [info]   at 
> org.apache.spark.sql.protobuf.ProtobufCatalystDataConversionSuite.checkEvaluationWithoutCodegen(ProtobufCatalystDataConversionSuite.scala:33)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluation(ExpressionEvalHelper.scala:87)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluation$(ExpressionEvalHelper.scala:82)
> [info]   at 
> org.apache.spark.sql.protobuf.ProtobufCatalystDataConversionSuite.checkEvaluation(ProtobufCatalystDataConversionSuite.scala:33)
> [info]   at 
> org.apache.spark.sql.protobuf.ProtobufCatalystDataConversionSuite.checkResult(ProtobufCatalystDataConversionSuite.scala:43)
> [info]   at 
> org.apache.spark.sql.protobuf.ProtobufCatalystDataConversionSuite.$anonfun$new$2(ProtobufCatalystDataConversionSuite.scala:122)
> [info]   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> [info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> [info]   at 
> org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:207)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> [info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> [info]   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:66)
> [info]   at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
> [info]   at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
> [info]   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:66)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
> [info]   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
> [info]   at scala.collection.immutable.List.foreach(List.scala:431)
> [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
> [info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
> [info]

[jira] [Commented] (SPARK-40867) Flaky test ProtobufCatalystDataConversionSuite

2022-10-22 Thread Sandish Kumar HN (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622693#comment-17622693
 ] 

Sandish Kumar HN commented on SPARK-40867:
--

[~LuciferYang] This issue was resolved by 
https://github.com/apache/spark/pull/38286

> Flaky test ProtobufCatalystDataConversionSuite
> --
>
> Key: SPARK-40867
> URL: https://issues.apache.org/jira/browse/SPARK-40867
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> * 
> [https://github.com/LuciferYang/spark/actions/runs/3295309311/jobs/5433733419]
>  * 
> [https://github.com/LuciferYang/spark/actions/runs/3291252601/jobs/5425183034]
> {code:java}
> [info] ProtobufCatalystDataConversionSuite:
> [info] - single StructType(StructField(int32_type,IntegerType,true)) with 
> seed 167 *** FAILED *** (39 milliseconds)
> [info]   Incorrect evaluation (codegen off): from_protobuf(to_protobuf([0], 
> /home/runner/work/spark/spark/connector/protobuf/target/scala-2.12/test-classes/protobuf/catalyst_types.desc,
>  IntegerMsg), 
> /home/runner/work/spark/spark/connector/protobuf/target/scala-2.12/test-classes/protobuf/catalyst_types.desc,
>  IntegerMsg), actual: [null], expected: [0] (ExpressionEvalHelper.scala:209)
> [info]   org.scalatest.exceptions.TestFailedException:
> [info]   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
> [info]   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564)
> [info]   at org.scalatest.Assertions.fail(Assertions.scala:933)
> [info]   at org.scalatest.Assertions.fail$(Assertions.scala:929)
> [info]   at org.scalatest.funsuite.AnyFunSuite.fail(AnyFunSuite.scala:1564)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluationWithoutCodegen(ExpressionEvalHelper.scala:209)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluationWithoutCodegen$(ExpressionEvalHelper.scala:199)
> [info]   at 
> org.apache.spark.sql.protobuf.ProtobufCatalystDataConversionSuite.checkEvaluationWithoutCodegen(ProtobufCatalystDataConversionSuite.scala:33)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluation(ExpressionEvalHelper.scala:87)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluation$(ExpressionEvalHelper.scala:82)
> [info]   at 
> org.apache.spark.sql.protobuf.ProtobufCatalystDataConversionSuite.checkEvaluation(ProtobufCatalystDataConversionSuite.scala:33)
> [info]   at 
> org.apache.spark.sql.protobuf.ProtobufCatalystDataConversionSuite.checkResult(ProtobufCatalystDataConversionSuite.scala:43)
> [info]   at 
> org.apache.spark.sql.protobuf.ProtobufCatalystDataConversionSuite.$anonfun$new$2(ProtobufCatalystDataConversionSuite.scala:122)
> [info]   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> [info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> [info]   at 
> org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:207)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> [info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> [info]   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:66)
> [info]   at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
> [info]   at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
> [info]   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:66)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
> [info]   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
> [info]   at scala.collection.immutable.List.foreach(List.scala:431)
> [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
> [info]   at org.scalatest.SuperE

[jira] [Resolved] (SPARK-40867) Flaky test ProtobufCatalystDataConversionSuite

2022-10-22 Thread Sandish Kumar HN (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandish Kumar HN resolved SPARK-40867.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

This issue was resolved by https://github.com/apache/spark/pull/38286

> Flaky test ProtobufCatalystDataConversionSuite
> --
>
> Key: SPARK-40867
> URL: https://issues.apache.org/jira/browse/SPARK-40867
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
> Fix For: 3.4.0
>
>
> * 
> [https://github.com/LuciferYang/spark/actions/runs/3295309311/jobs/5433733419]
>  * 
> [https://github.com/LuciferYang/spark/actions/runs/3291252601/jobs/5425183034]
> {code:java}
> [info] ProtobufCatalystDataConversionSuite:
> [info] - single StructType(StructField(int32_type,IntegerType,true)) with 
> seed 167 *** FAILED *** (39 milliseconds)
> [info]   Incorrect evaluation (codegen off): from_protobuf(to_protobuf([0], 
> /home/runner/work/spark/spark/connector/protobuf/target/scala-2.12/test-classes/protobuf/catalyst_types.desc,
>  IntegerMsg), 
> /home/runner/work/spark/spark/connector/protobuf/target/scala-2.12/test-classes/protobuf/catalyst_types.desc,
>  IntegerMsg), actual: [null], expected: [0] (ExpressionEvalHelper.scala:209)
> [info]   org.scalatest.exceptions.TestFailedException:
> [info]   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
> [info]   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564)
> [info]   at org.scalatest.Assertions.fail(Assertions.scala:933)
> [info]   at org.scalatest.Assertions.fail$(Assertions.scala:929)
> [info]   at org.scalatest.funsuite.AnyFunSuite.fail(AnyFunSuite.scala:1564)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluationWithoutCodegen(ExpressionEvalHelper.scala:209)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluationWithoutCodegen$(ExpressionEvalHelper.scala:199)
> [info]   at 
> org.apache.spark.sql.protobuf.ProtobufCatalystDataConversionSuite.checkEvaluationWithoutCodegen(ProtobufCatalystDataConversionSuite.scala:33)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluation(ExpressionEvalHelper.scala:87)
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluation$(ExpressionEvalHelper.scala:82)
> [info]   at 
> org.apache.spark.sql.protobuf.ProtobufCatalystDataConversionSuite.checkEvaluation(ProtobufCatalystDataConversionSuite.scala:33)
> [info]   at 
> org.apache.spark.sql.protobuf.ProtobufCatalystDataConversionSuite.checkResult(ProtobufCatalystDataConversionSuite.scala:43)
> [info]   at 
> org.apache.spark.sql.protobuf.ProtobufCatalystDataConversionSuite.$anonfun$new$2(ProtobufCatalystDataConversionSuite.scala:122)
> [info]   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> [info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> [info]   at 
> org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:207)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> [info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> [info]   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:66)
> [info]   at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
> [info]   at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
> [info]   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:66)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
> [info]   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
> [info]   at scala.collection.immutable.List.foreach(List.scala:431)
> [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
> [info]   at org.scalat

[jira] [Comment Edited] (SPARK-34827) Support fetching shuffle blocks in batch with i/o encryption

2022-10-22 Thread Raj Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622683#comment-17622683
 ] 

Raj Sharma edited comment on SPARK-34827 at 10/22/22 7:24 PM:
--

I like your content. If anyone wants to learn a new course, the Vlocity 
platform developer certification focuses on producing experts who aren't just 
ready to handle the platform but build solutions to keep their respective 
companies and their careers ahead of the competition. Go through this link: 
[https://www.igmguru.com/salesforce/salesforce-vlocity-training-certification/]


was (Author: JIRAUSER297361):
I like your content. If anyone wants to learn a new course, the [Vlocity 
platform developer 
certification|https://www.igmguru.com/salesforce/salesforce-vlocity-training-certification/]
 focuses on producing experts who aren't just ready to handle the platform but 
build solutions to keep their respective companies and their careers ahead of 
the competition.

> Support fetching shuffle blocks in batch with i/o encryption
> 
>
> Key: SPARK-34827
> URL: https://issues.apache.org/jira/browse/SPARK-34827
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>







[jira] [Commented] (SPARK-34827) Support fetching shuffle blocks in batch with i/o encryption

2022-10-22 Thread Raj Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622683#comment-17622683
 ] 

Raj Sharma commented on SPARK-34827:


I like your content. If anyone wants to learn a new course, the [Vlocity 
platform developer 
certification|https://www.igmguru.com/salesforce/salesforce-vlocity-training-certification/]
 focuses on producing experts who aren't just ready to handle the platform but 
build solutions to keep their respective companies and their careers ahead of 
the competition.

> Support fetching shuffle blocks in batch with i/o encryption
> 
>
> Key: SPARK-34827
> URL: https://issues.apache.org/jira/browse/SPARK-34827
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>







[jira] (SPARK-33807) Data Source V2: Remove read specific distributions

2022-10-22 Thread Chao Sun (Jira)


[ https://issues.apache.org/jira/browse/SPARK-33807 ]


Chao Sun deleted comment on SPARK-33807:
--

was (Author: JIRAUSER295111):
Very informative and effective post. [Vlocity Platform Developer 
Certification|https://www.igmguru.com/salesforce/salesforce-vlocity-training-certification/]
 focuses on producing experts who aren't just ready to handle the platform but 
build solutions to keep their respective companies and their careers ahead of 
the competition.

> Data Source V2: Remove read specific distributions
> --
>
> Key: SPARK-33807
> URL: https://issues.apache.org/jira/browse/SPARK-33807
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Anton Okolnychyi
>Priority: Blocker
>
> We should remove the read-specific distributions for DS V2 as discussed 
> [here|https://github.com/apache/spark/pull/30706#discussion_r543059827].






[jira] [Comment Edited] (SPARK-33807) Data Source V2: Remove read specific distributions

2022-10-22 Thread Pankaj Nagla (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622681#comment-17622681
 ] 

Pankaj Nagla edited comment on SPARK-33807 at 10/22/22 6:36 PM:


Very informative and effective post. [Vlocity Platform Developer 
Certification|https://www.igmguru.com/salesforce/salesforce-vlocity-training-certification/]
 focuses on producing experts who aren't just ready to handle the platform but 
build solutions to keep their respective companies and their careers ahead of 
the competition.


was (Author: JIRAUSER295111):
Very informative and effective post. [Vlocity Platform Developer 
Certification|https://www.igmguru.com/salesforce/salesforce-vlocity-training-certification/]
 focuses on producing experts who aren't just ready to handle the platform but 
build solutions to keep their respective companies and their careers ahead of 
the competition.

> Data Source V2: Remove read specific distributions
> --
>
> Key: SPARK-33807
> URL: https://issues.apache.org/jira/browse/SPARK-33807
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Anton Okolnychyi
>Priority: Blocker
>
> We should remove the read-specific distributions for DS V2 as discussed 
> [here|https://github.com/apache/spark/pull/30706#discussion_r543059827].






[jira] [Comment Edited] (SPARK-33807) Data Source V2: Remove read specific distributions

2022-10-22 Thread Pankaj Nagla (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622681#comment-17622681
 ] 

Pankaj Nagla edited comment on SPARK-33807 at 10/22/22 6:35 PM:


Very informative and effective post. [Vlocity Platform Developer 
Certification|https://www.igmguru.com/salesforce/salesforce-vlocity-training-certification/]
 focuses on producing experts who aren't just ready to handle the platform but 
build solutions to keep their respective companies and their careers ahead of 
the competition.


was (Author: JIRAUSER295111):
Very informative and effective post. [Vlocity Platform Developer 
Certification|https://www.igmguru.com/salesforce/salesforce-vlocity-training-certification/]
 focuses on producing experts who aren't just ready to handle the platform but 
build solutions to keep their respective companies and their careers ahead of 
the competition.

> Data Source V2: Remove read specific distributions
> --
>
> Key: SPARK-33807
> URL: https://issues.apache.org/jira/browse/SPARK-33807
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Anton Okolnychyi
>Priority: Blocker
>
> We should remove the read-specific distributions for DS V2 as discussed 
> [here|https://github.com/apache/spark/pull/30706#discussion_r543059827].






[jira] [Commented] (SPARK-33807) Data Source V2: Remove read specific distributions

2022-10-22 Thread Pankaj Nagla (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622681#comment-17622681
 ] 

Pankaj Nagla commented on SPARK-33807:
--

Very informative and effective post. [Vlocity Platform Developer 
Certification|https://www.igmguru.com/salesforce/salesforce-vlocity-training-certification/]
 focuses on producing experts who aren't just ready to handle the platform but 
build solutions to keep their respective companies and their careers ahead of 
the competition.

> Data Source V2: Remove read specific distributions
> --
>
> Key: SPARK-33807
> URL: https://issues.apache.org/jira/browse/SPARK-33807
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Anton Okolnychyi
>Priority: Blocker
>
> We should remove the read-specific distributions for DS V2 as discussed 
> [here|https://github.com/apache/spark/pull/30706#discussion_r543059827].






[jira] [Commented] (SPARK-40801) Upgrade Apache Commons Text to 1.10

2022-10-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622680#comment-17622680
 ] 

Apache Spark commented on SPARK-40801:
--

User 'bjornjorgensen' has created a pull request for this issue:
https://github.com/apache/spark/pull/38352

> Upgrade Apache Commons Text to 1.10
> ---
>
> Key: SPARK-40801
> URL: https://issues.apache.org/jira/browse/SPARK-40801
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Assignee: Bjørn Jørgensen
>Priority: Major
> Fix For: 3.4.0, 3.3.2
>
>
> [CVE-2022-42889|https://nvd.nist.gov/vuln/detail/CVE-2022-42889]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34827) Support fetching shuffle blocks in batch with i/o encryption

2022-10-22 Thread Vivek Garg (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622679#comment-17622679
 ] 

Vivek Garg commented on SPARK-34827:


I appreciate you sharing this useful information. Very useful and interesting 
post.
[Uipath 
training|https://www.igmguru.com/machine-learning-ai/rpa-uipath-certification-training/].

> Support fetching shuffle blocks in batch with i/o encryption
> 
>
> Key: SPARK-34827
> URL: https://issues.apache.org/jira/browse/SPARK-34827
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40391) Test the error class UNSUPPORTED_FEATURE.JDBC_TRANSACTION

2022-10-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40391:


Assignee: Apache Spark

> Test the error class UNSUPPORTED_FEATURE.JDBC_TRANSACTION
> -
>
> Key: SPARK-40391
> URL: https://issues.apache.org/jira/browse/SPARK-40391
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Minor
>
> Add a test for the error class UNSUPPORTED_FEATURE.JDBC_TRANSACTION and place 
> it in QueryExecutionErrorsSuite.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40391) Test the error class UNSUPPORTED_FEATURE.JDBC_TRANSACTION

2022-10-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622655#comment-17622655
 ] 

Apache Spark commented on SPARK-40391:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38351

> Test the error class UNSUPPORTED_FEATURE.JDBC_TRANSACTION
> -
>
> Key: SPARK-40391
> URL: https://issues.apache.org/jira/browse/SPARK-40391
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Minor
>
> Add a test for the error class UNSUPPORTED_FEATURE.JDBC_TRANSACTION and place 
> it in QueryExecutionErrorsSuite.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40391) Test the error class UNSUPPORTED_FEATURE.JDBC_TRANSACTION

2022-10-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40391:


Assignee: (was: Apache Spark)

> Test the error class UNSUPPORTED_FEATURE.JDBC_TRANSACTION
> -
>
> Key: SPARK-40391
> URL: https://issues.apache.org/jira/browse/SPARK-40391
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Minor
>
> Add a test for the error class UNSUPPORTED_FEATURE.JDBC_TRANSACTION and place 
> it in QueryExecutionErrorsSuite.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40856) Update the error template of WRONG_NUM_PARAMS

2022-10-22 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-40856.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38319
[https://github.com/apache/spark/pull/38319]

> Update the error template of WRONG_NUM_PARAMS
> -
>
> Key: SPARK-40856
> URL: https://issues.apache.org/jira/browse/SPARK-40856
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40856) Update the error template of WRONG_NUM_PARAMS

2022-10-22 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-40856:


Assignee: BingKun Pan

> Update the error template of WRONG_NUM_PARAMS
> -
>
> Key: SPARK-40856
> URL: https://issues.apache.org/jira/browse/SPARK-40856
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40588) Sorting issue with AQE turned on

2022-10-22 Thread Enrico Minack (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622620#comment-17622620
 ] 

Enrico Minack commented on SPARK-40588:
---

Even with AQE enabled (pre Spark 3.4.0), the written files are sorted {*}unless 
spilling occurs{*}.

The reason is that {{FileFormatWriter}} defines a {{requiredOrdering}} as:

[https://github.com/apache/spark/blob/f74867bddfbcdd4d08076db36851e88b15e66556/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L188-L189]
{code:java}
val requiredOrdering = partitionColumns ++ 
writerBucketSpec.map(_.bucketIdExpression) ++ sortColumns
{code}
Where {{partitionColumns}} refers to {{.write.partitionBy}} and {{sortColumns}} 
refers to {{.write.sortBy}}, so {{["year", "month", "day"]}} in your case.

It enforces that ordering if the DataFrame is not already sorted accordingly. 
With AQE enabled (pre Spark 3.4), {{FileFormatWriter}} does not know about the 
existing ordering and introduces an extra sort: it reads the already sorted 
DataFrame (sorted by {{["year", "month", "day", "sortCol"]}}) and sorts it by 
{{["year", "month", "day"]}} only. If a partition has to be spilled to memory 
or disk, the merge round-robins over the spill files, because they are all 
"equal" w.r.t. {{["year", "month", "day"]}}: at any point the next rows in all 
spill files of a partition carry the same values for these columns (only 
{{sortCol}} differs, and that column is not considered by this sort). Hence the 
order is broken by spilling.
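
Until a fixed Spark version is available, a minimal workaround sketch (assuming 
the {{df}}, {{col}} imports, and {{path}} from the report above; it relies on 
the reporter's observation that disabling AQE avoids the extra sort):

{code:scala}
// Workaround sketch, not an official fix: with AQE disabled for this write,
// FileFormatWriter knows about the existing ordering and does not insert the
// extra partition-columns-only sort whose spilling breaks the sortCol order.
spark.conf.set("spark.sql.adaptive.enabled", false)
df.repartition(col("year"), col("month"), col("day"))
  .sortWithinPartitions("year", "month", "day", "sortCol")
  .write
  .partitionBy("year", "month", "day")
  .parquet(path)
spark.conf.set("spark.sql.adaptive.enabled", true)  // restore afterwards
{code}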

> Sorting issue with AQE turned on  
> --
>
> Key: SPARK-40588
> URL: https://issues.apache.org/jira/browse/SPARK-40588
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.3
> Environment: Spark v3.1.3
> Scala v2.12.13
>Reporter: Swetha Baskaran
>Priority: Major
> Attachments: image-2022-10-16-22-05-47-159.png
>
>
> We are attempting to partition data by a few columns, sort by a particular 
> _sortCol_ and write out one file per partition. 
> {code:java}
> df
>     .repartition(col("day"), col("month"), col("year"))
>     .withColumn("partitionId",spark_partition_id)
>     .withColumn("monotonicallyIncreasingIdUnsorted",monotonicallyIncreasingId)
>     .sortWithinPartitions("year", "month", "day", "sortCol")
>     .withColumn("monotonicallyIncreasingIdSorted",monotonicallyIncreasingId)
>     .write
>     .partitionBy("year", "month", "day")
>     .parquet(path){code}
> When inspecting the results, we observe one file per partition, however we 
> see an _alternating_ pattern of unsorted rows in some files.
> {code:java}
> {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287832121344,"monotonicallyIncreasingIdSorted":6287832121344}
> {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287877022389,"monotonicallyIncreasingIdSorted":6287876860586}
> {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287877567881,"monotonicallyIncreasingIdSorted":6287832121345}
> {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287835105553,"monotonicallyIncreasingIdSorted":6287876860587}
> {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287832570127,"monotonicallyIncreasingIdSorted":6287832121346}
> {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287879965760,"monotonicallyIncreasingIdSorted":6287876860588}
> {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287878762347,"monotonicallyIncreasingIdSorted":6287832121347}
> {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287837165012,"monotonicallyIncreasingIdSorted":6287876860589}
> {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287832910545,"monotonicallyIncreasingIdSorted":6287832121348}
> {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287881244758,"monotonicallyIncreasingIdSorted":6287876860590}
> {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287880041345,"monotonicallyIncreasingIdSorted":6287832121349}{code}
> Here is a 
> [gist|https://gist.github.com/Swebask/543030748a768be92d3c0ae343d2ae89] to 
> reproduce the issue. 
> Turning off AQE with spark.conf.set("spark.sql.adaptive.enabled", false) 
> fixes the issue.
> I'm working on identifying why AQE affects the sort order. Any leads or 
> thoughts would be appreciated!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-40588) Sorting issue with AQE turned on

2022-10-22 Thread Enrico Minack (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17621032#comment-17621032
 ] 

Enrico Minack edited comment on SPARK-40588 at 10/22/22 1:02 PM:
-

Here is a more concise and complete example to reproduce this issue:

{code:scala}
import org.apache.spark.sql.SaveMode

spark.conf.set("spark.sql.adaptive.enabled", true)

val ids = 100
val days = 2
val parts = 2

val ds = spark.range(0, days, 1, parts).withColumnRenamed("id", "day")
  .join(spark.range(0, ids, 1, parts))

ds.repartition($"day")
  .sortWithinPartitions($"day", $"id")
  .write
  .partitionBy("day")
  .mode(SaveMode.Overwrite)
  .csv("interleaved.csv")
{code}

Check that the written files are sorted (prints {{OK}} for each file that is sorted):
{code:bash}
for file in interleaved.csv/day\=*/part-*
do
  echo "$(sort -n "$file" | md5sum | cut -d " " -f 1)  $file"
done | md5sum -c
{code}

Files are not sorted for Spark 3.0.x, 3.1.x, 3.2.x and 3.3.x. Current master 
(3.4.0) seems to be fixed.


was (Author: enricomi):
Here is a more concise and complete example to reproduce this issue:

{code:scala}
import org.apache.spark.sql.SaveMode

spark.conf.set("spark.sql.adaptive.enabled", true)

val ids = 1000
val days = 10

val ds = spark.range(days).withColumnRenamed("id", "day").join(spark.range(ids))

// days * 10 is required, as well as a sufficiently large value for ids (10m)
// and day (10)
ds.repartition(days * 10, $"day")
  .sortWithinPartitions($"day", $"id")
  .write
  .partitionBy("day")
  .mode(SaveMode.Overwrite)
  .csv("interleaved.csv")
{code}

Check that the written files are sorted (prints {{OK}} for each file that is sorted):
{code:bash}
for file in interleaved.csv/day\=*/part-*
do
  echo "$(sort -n "$file" | md5sum | cut -d " " -f 1)  $file"
done | md5sum -c
{code}

Files are not sorted for Spark 3.0.x, 3.1.x, 3.2.x and 3.3.x. Current master 
(3.4.0) seems to be fixed.

> Sorting issue with AQE turned on  
> --
>
> Key: SPARK-40588
> URL: https://issues.apache.org/jira/browse/SPARK-40588
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.3
> Environment: Spark v3.1.3
> Scala v2.12.13
>Reporter: Swetha Baskaran
>Priority: Major
> Attachments: image-2022-10-16-22-05-47-159.png
>
>
> We are attempting to partition data by a few columns, sort by a particular 
> _sortCol_ and write out one file per partition. 
> {code:java}
> df
>     .repartition(col("day"), col("month"), col("year"))
>     .withColumn("partitionId",spark_partition_id)
>     .withColumn("monotonicallyIncreasingIdUnsorted",monotonicallyIncreasingId)
>     .sortWithinPartitions("year", "month", "day", "sortCol")
>     .withColumn("monotonicallyIncreasingIdSorted",monotonicallyIncreasingId)
>     .write
>     .partitionBy("year", "month", "day")
>     .parquet(path){code}
> When inspecting the results, we observe one file per partition, however we 
> see an _alternating_ pattern of unsorted rows in some files.
> {code:java}
> {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287832121344,"monotonicallyIncreasingIdSorted":6287832121344}
> {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287877022389,"monotonicallyIncreasingIdSorted":6287876860586}
> {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287877567881,"monotonicallyIncreasingIdSorted":6287832121345}
> {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287835105553,"monotonicallyIncreasingIdSorted":6287876860587}
> {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287832570127,"monotonicallyIncreasingIdSorted":6287832121346}
> {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287879965760,"monotonicallyIncreasingIdSorted":6287876860588}
> {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287878762347,"monotonicallyIncreasingIdSorted":6287832121347}
> {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287837165012,"monotonicallyIncreasingIdSorted":6287876860589}
> {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287832910545,"monotonicallyIncreasingIdSorted":6287832121348}
> {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287881244758,"monotonicallyIncreasingIdSorted":6287876860590}
> {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287880041345,"monotonicallyIncreasingIdSorted":6287832121349}{code}
> Here is a 
> [gist|https://gist.github.com/Swebask/543030748a768be92d3c0ae343d2ae89] to 
> reproduce the issue. 
> Turning off AQE with spark.conf.set("spark.sql.adaptive.enabled", false) 
> fixes the issue.
> I'm working on identifying why AQE affects the sort order. Any leads or 
> thoughts would be appreciated!

[jira] [Assigned] (SPARK-40752) Migrate type check failures of misc expressions onto error classes

2022-10-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40752:


Assignee: (was: Apache Spark)

> Migrate type check failures of misc expressions onto error classes
> --
>
> Key: SPARK-40752
> URL: https://issues.apache.org/jira/browse/SPARK-40752
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Replace TypeCheckFailure by DataTypeMismatch in type checks in the misc 
> expressions:
> 1. Coalesce(1)
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala#L60
> 2. SortOrder(1)
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala#L75
> 3. UnwrapUDT(1):
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/UnwrapUDT.scala#L36
> 4. ParseUrl(1):
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/urlExpressions.scala#L185
> 5. XPathExtract(1)
> https://github.com/apache/spark/blob/a241256ed0778005245253fb147db8a16105f75c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xml/xpath.scala#L45
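
For context, a hedged sketch of the migration pattern (illustrative only: the 
helper name and message parameters below are assumptions, not the actual 
change):

{code:scala}
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.DataTypeMismatch

// Illustrative sketch: instead of returning a free-form TypeCheckFailure
// message, return a structured DataTypeMismatch carrying an error subclass
// and message parameters that map onto an error class template.
def wrongNumArgsCheck(functionName: String, actualNum: Int): TypeCheckResult = {
  if (actualNum < 1) {
    DataTypeMismatch(
      errorSubClass = "WRONG_NUM_ARGS",
      messageParameters = Map(
        "functionName" -> functionName,
        "expectedNum" -> "> 0",
        "actualNum" -> actualNum.toString))
  } else {
    TypeCheckResult.TypeCheckSuccess
  }
}
{code}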



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40752) Migrate type check failures of misc expressions onto error classes

2022-10-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40752:


Assignee: Apache Spark

> Migrate type check failures of misc expressions onto error classes
> --
>
> Key: SPARK-40752
> URL: https://issues.apache.org/jira/browse/SPARK-40752
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Replace TypeCheckFailure by DataTypeMismatch in type checks in the misc 
> expressions:
> 1. Coalesce(1)
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala#L60
> 2. SortOrder(1)
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala#L75
> 3. UnwrapUDT(1):
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/UnwrapUDT.scala#L36
> 4. ParseUrl(1):
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/urlExpressions.scala#L185
> 5. XPathExtract(1)
> https://github.com/apache/spark/blob/a241256ed0778005245253fb147db8a16105f75c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xml/xpath.scala#L45



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40752) Migrate type check failures of misc expressions onto error classes

2022-10-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622611#comment-17622611
 ] 

Apache Spark commented on SPARK-40752:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38350

> Migrate type check failures of misc expressions onto error classes
> --
>
> Key: SPARK-40752
> URL: https://issues.apache.org/jira/browse/SPARK-40752
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Replace TypeCheckFailure by DataTypeMismatch in type checks in the misc 
> expressions:
> 1. Coalesce(1)
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala#L60
> 2. SortOrder(1)
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala#L75
> 3. UnwrapUDT(1):
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/UnwrapUDT.scala#L36
> 4. ParseUrl(1):
> https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/urlExpressions.scala#L185
> 5. XPathExtract(1)
> https://github.com/apache/spark/blob/a241256ed0778005245253fb147db8a16105f75c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xml/xpath.scala#L45



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39404) Unable to query _metadata in streaming if getBatch returns multiple logical nodes in the DataFrame

2022-10-22 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated SPARK-39404:
-
Fix Version/s: 3.3.2

> Unable to query _metadata in streaming if getBatch returns multiple logical 
> nodes in the DataFrame
> --
>
> Key: SPARK-39404
> URL: https://issues.apache.org/jira/browse/SPARK-39404
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.1
>Reporter: Yaohua Zhao
>Assignee: Yaohua Zhao
>Priority: Major
> Fix For: 3.4.0, 3.3.2
>
>
> Here: 
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L585]
>  
> We should probably `transform` instead of `match`
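
For context, a generic illustration (not the actual fix) of why {{transform}} 
reaches nodes that a top-level {{match}} would miss:

{code:scala}
import org.apache.spark.sql.catalyst.expressions.{Add, Expression, Literal}

// Generic illustration only: a top-level `match` inspects just the root node,
// while `transform` walks the entire tree, so a rewrite still applies when
// getBatch returns a DataFrame made of multiple logical nodes.
val expr: Expression = Add(Literal(1), Add(Literal(2), Literal(3)))
val rewritten = expr transform {
  case Literal(v: Int, t) => Literal(v * 10, t)
}
// rewritten == Add(Literal(10), Add(Literal(20), Literal(30)))
{code}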



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-40808) Infer schema for CSV files - wrong behavior using header + merge schema

2022-10-22 Thread zzzzming95 (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622594#comment-17622594
 ] 

ming95 edited comment on SPARK-40808 at 10/22/22 9:39 AM:
--

[~ohadm] 

In Spark, CSV schema inference skips the first line of a file when the header 
option is set to true. However, only one file's header is treated as the real 
first line, so if two files have different headers, the other file's header 
row is used as data during schema inference.

 

In this case, you can keep the same header in all files to make unit test 4 pass. 
{code:java}
//file2.csv
"int_col","string_col","double_col","int2_col"
12,"hello2",1.432
22,"world2",5.5342
32,"my name2",86.4552
42,"is ohad2",6.2342 {code}
 

When reading a CSV directory, it is reasonable to assume by default that all 
files in the directory share the same schema. If there are no further concerns, 
I will mark this issue as resolved.
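
To make the behaviour concrete, a minimal sketch (the directory 
{{/tmp/csv-dir}} is hypothetical and stands for a folder holding the two files 
from the issue; the options mirror the reporter's):

{code:scala}
// Minimal sketch of the behaviour described above. With header=true, schema
// inference drops the header line of only one file; the other file's header
// row is read as data, so columns previously inferred as int or double
// degrade to string.
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .option("mergeSchema", "true")  // mirrors the reporter; not a documented CSV option
  .csv("/tmp/csv-dir")            // hypothetical directory with file1.csv and file2.csv
df.printSchema()
{code}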


was (Author: zing):
[~ohadm] 

In Spark, CSV schema inference skips the first line of a file when the header 
option is set to true. However, only one file's header is treated as the real 
first line, so if two files have different headers, the other file's header 
row is used as data during schema inference.

 

In this case, you can keep the same header in all files to make unit test 4 pass. 

 
{code:java}
//file2.csv
"int_col","string_col","double_col","int2_col"
12,"hello2",1.432
22,"world2",5.5342
32,"my name2",86.4552
42,"is ohad2",6.2342 {code}

> Infer schema for CSV files - wrong behavior using header + merge schema
> ---
>
> Key: SPARK-40808
> URL: https://issues.apache.org/jira/browse/SPARK-40808
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.2
>Reporter: ohad
>Priority: Major
>  Labels: CSVReader, csv, csvparser
> Attachments: test_csv.py
>
>
> Hello. 
> I am writing unit tests for some functionality in my application that reads 
> data from CSV files using Spark.
> I am reading the data using:
> {code:java}
> header=True
> mergeSchema=True
> inferSchema=True{code}
> When I am reading this single file:
> {code:java}
> File1:
> "int_col","string_col","decimal_col","date_col"
> 1,"hello",1.43,2022-02-23
> 2,"world",5.534,2021-05-05
> 3,"my name",86.455,2011-08-15
> 4,"is ohad",6.234,2002-03-22{code}
> I am getting this schema:
> {code:java}
> int_col=int
> string_col=string
> decimal_col=double
> date_col=string{code}
> When I am duplicating this file, I am getting the same schema.
> The strange part is that when I add a new int column, Spark seems to get 
> confused and to think that the columns already identified as int are 
> now string:
> {code:java}
> File1:
> "int_col","string_col","decimal_col","date_col"
> 1,"hello",1.43,2022-02-23
> 2,"world",5.534,2021-05-05
> 3,"my name",86.455,2011-08-15
> 4,"is ohad",6.234,2002-03-22
> File2:
> "int_col","string_col","decimal_col","date_col","int2_col"
> 1,"hello",1.43,2022-02-23,234
> 2,"world",5.534,2021-05-05,5
> 3,"my name",86.455,2011-08-15,32
> 4,"is ohad",6.234,2002-03-22,2
> {code}
> result:
> {code:java}
> int_col=string
> string_col=string
> decimal_col=string
> date_col=string
> int2_col=int{code}
> When I am reading only the second file, it looks fine:
> {code:java}
> File2:
> "int_col","string_col","decimal_col","date_col","int2_col"
> 1,"hello",1.43,2022-02-23,234
> 2,"world",5.534,2021-05-05,5
> 3,"my name",86.455,2011-08-15,32
> 4,"is ohad",6.234,2002-03-22,2{code}
> result:
> {code:java}
> int_col=int
> string_col=string
> decimal_col=double
> date_col=string
> int2_col=int{code}
> In conclusion, it looks like there is a bug when mixing the two features: 
> header recognition and merge schema.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40808) Infer schema for CSV files - wrong behavior using header + merge schema

2022-10-22 Thread zzzzming95 (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622594#comment-17622594
 ] 

ming95 commented on SPARK-40808:


[~ohadm] 

In Spark, CSV schema inference skips the first line of a file when the header 
option is set to true. However, only one file's header is treated as the real 
first line, so if two files have different headers, the other file's header 
row is used as data during schema inference.

 

In this case, you can keep the same header in all files to make unit test 4 pass. 

 
{code:java}
//file2.csv
"int_col","string_col","double_col","int2_col"
12,"hello2",1.432
22,"world2",5.5342
32,"my name2",86.4552
42,"is ohad2",6.2342 {code}

> Infer schema for CSV files - wrong behavior using header + merge schema
> ---
>
> Key: SPARK-40808
> URL: https://issues.apache.org/jira/browse/SPARK-40808
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.2
>Reporter: ohad
>Priority: Major
>  Labels: CSVReader, csv, csvparser
> Attachments: test_csv.py
>
>
> Hello. 
> I am writing unit tests for some functionality in my application that reads 
> data from CSV files using Spark.
> I am reading the data using:
> {code:java}
> header=True
> mergeSchema=True
> inferSchema=True{code}
> When I am reading this single file:
> {code:java}
> File1:
> "int_col","string_col","decimal_col","date_col"
> 1,"hello",1.43,2022-02-23
> 2,"world",5.534,2021-05-05
> 3,"my name",86.455,2011-08-15
> 4,"is ohad",6.234,2002-03-22{code}
> I am getting this schema:
> {code:java}
> int_col=int
> string_col=string
> decimal_col=double
> date_col=string{code}
> When I am duplicating this file, I am getting the same schema.
> The strange part is that when I add a new int column, Spark seems to get 
> confused and to think that the columns already identified as int are 
> now string:
> {code:java}
> File1:
> "int_col","string_col","decimal_col","date_col"
> 1,"hello",1.43,2022-02-23
> 2,"world",5.534,2021-05-05
> 3,"my name",86.455,2011-08-15
> 4,"is ohad",6.234,2002-03-22
> File2:
> "int_col","string_col","decimal_col","date_col","int2_col"
> 1,"hello",1.43,2022-02-23,234
> 2,"world",5.534,2021-05-05,5
> 3,"my name",86.455,2011-08-15,32
> 4,"is ohad",6.234,2002-03-22,2
> {code}
> result:
> {code:java}
> int_col=string
> string_col=string
> decimal_col=string
> date_col=string
> int2_col=int{code}
> When I am reading only the second file, it looks fine:
> {code:java}
> File2:
> "int_col","string_col","decimal_col","date_col","int2_col"
> 1,"hello",1.43,2022-02-23,234
> 2,"world",5.534,2021-05-05,5
> 3,"my name",86.455,2011-08-15,32
> 4,"is ohad",6.234,2002-03-22,2{code}
> result:
> {code:java}
> int_col=int
> string_col=string
> decimal_col=double
> date_col=string
> int2_col=int{code}
> In conclusion, it looks like there is a bug when mixing the two features: 
> header recognition and merge schema.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40884) Upgrade fabric8io - kubernetes-client to 6.2.0

2022-10-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622582#comment-17622582
 ] 

Apache Spark commented on SPARK-40884:
--

User 'bjornjorgensen' has created a pull request for this issue:
https://github.com/apache/spark/pull/38348

> Upgrade fabric8io - kubernetes-client to 6.2.0
> --
>
> Key: SPARK-40884
> URL: https://issues.apache.org/jira/browse/SPARK-40884
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> [Release 
> notes|https://github.com/fabric8io/kubernetes-client/releases/tag/v6.2.0]
> [Snakeyaml version should be updated to mitigate 
> CVE-2022-28857|https://github.com/fabric8io/kubernetes-client/issues/4383]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40884) Upgrade fabric8io - kubernetes-client to 6.2.0

2022-10-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40884:


Assignee: Apache Spark

> Upgrade fabric8io - kubernetes-client to 6.2.0
> --
>
> Key: SPARK-40884
> URL: https://issues.apache.org/jira/browse/SPARK-40884
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Assignee: Apache Spark
>Priority: Major
>
> [Release 
> notes|https://github.com/fabric8io/kubernetes-client/releases/tag/v6.2.0]
> [Snakeyaml version should be updated to mitigate 
> CVE-2022-28857|https://github.com/fabric8io/kubernetes-client/issues/4383]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40884) Upgrade fabric8io - kubernetes-client to 6.2.0

2022-10-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40884:


Assignee: (was: Apache Spark)

> Upgrade fabric8io - kubernetes-client to 6.2.0
> --
>
> Key: SPARK-40884
> URL: https://issues.apache.org/jira/browse/SPARK-40884
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> [Release 
> notes|https://github.com/fabric8io/kubernetes-client/releases/tag/v6.2.0]
> [Snakeyaml version should be updated to mitigate 
> CVE-2022-28857|https://github.com/fabric8io/kubernetes-client/issues/4383]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40884) Upgrade fabric8io - kubernetes-client to 6.2.0

2022-10-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622581#comment-17622581
 ] 

Apache Spark commented on SPARK-40884:
--

User 'bjornjorgensen' has created a pull request for this issue:
https://github.com/apache/spark/pull/38348

> Upgrade fabric8io - kubernetes-client to 6.2.0
> --
>
> Key: SPARK-40884
> URL: https://issues.apache.org/jira/browse/SPARK-40884
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> [Release 
> notes|https://github.com/fabric8io/kubernetes-client/releases/tag/v6.2.0]
> [Snakeyaml version should be updated to mitigate 
> CVE-2022-28857|https://github.com/fabric8io/kubernetes-client/issues/4383]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40884) Upgrade fabric8io - kubernetes-client to 6.2.0

2022-10-22 Thread Jira
Bjørn Jørgensen created SPARK-40884:
---

 Summary: Upgrade fabric8io - kubernetes-client to 6.2.0
 Key: SPARK-40884
 URL: https://issues.apache.org/jira/browse/SPARK-40884
 Project: Spark
  Issue Type: Dependency upgrade
  Components: Build
Affects Versions: 3.4.0
Reporter: Bjørn Jørgensen


[Release 
notes|https://github.com/fabric8io/kubernetes-client/releases/tag/v6.2.0]

[Snakeyaml version should be updated to mitigate 
CVE-2022-28857|https://github.com/fabric8io/kubernetes-client/issues/4383]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40883) Support Range in Connect proto

2022-10-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622576#comment-17622576
 ] 

Apache Spark commented on SPARK-40883:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38347

> Support Range in Connect proto
> --
>
> Key: SPARK-40883
> URL: https://issues.apache.org/jira/browse/SPARK-40883
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40883) Support Range in Connect proto

2022-10-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622575#comment-17622575
 ] 

Apache Spark commented on SPARK-40883:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38347

> Support Range in Connect proto
> --
>
> Key: SPARK-40883
> URL: https://issues.apache.org/jira/browse/SPARK-40883
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40883) Support Range in Connect proto

2022-10-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40883:


Assignee: Apache Spark

> Support Range in Connect proto
> --
>
> Key: SPARK-40883
> URL: https://issues.apache.org/jira/browse/SPARK-40883
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40883) Support Range in Connect proto

2022-10-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40883:


Assignee: (was: Apache Spark)

> Support Range in Connect proto
> --
>
> Key: SPARK-40883
> URL: https://issues.apache.org/jira/browse/SPARK-40883
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40883) Support Range in Connect proto

2022-10-22 Thread Rui Wang (Jira)
Rui Wang created SPARK-40883:


 Summary: Support Range in Connect proto
 Key: SPARK-40883
 URL: https://issues.apache.org/jira/browse/SPARK-40883
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Rui Wang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40881) Upgrade actions/cache to v3 and actions/upload-artifact to v3

2022-10-22 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang updated SPARK-40881:

Summary: Upgrade actions/cache to v3 and actions/upload-artifact to v3  
(was: Upgrade actions/cache and actions/upload-artifact)

> Upgrade actions/cache to v3 and actions/upload-artifact to v3
> -
>
> Key: SPARK-40881
> URL: https://issues.apache.org/jira/browse/SPARK-40881
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40882) Upgrade actions/setup-java to v3 with distribution specified

2022-10-22 Thread Yikun Jiang (Jira)
Yikun Jiang created SPARK-40882:
---

 Summary: Upgrade actions/setup-java to v3 with distribution 
specified
 Key: SPARK-40882
 URL: https://issues.apache.org/jira/browse/SPARK-40882
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Affects Versions: 3.4.0
Reporter: Yikun Jiang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40881) Upgrade actions/cache and actions/upload-artifact

2022-10-22 Thread Yikun Jiang (Jira)
Yikun Jiang created SPARK-40881:
---

 Summary: Upgrade actions/cache and actions/upload-artifact
 Key: SPARK-40881
 URL: https://issues.apache.org/jira/browse/SPARK-40881
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Affects Versions: 3.4.0
Reporter: Yikun Jiang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40871) Upgrade actions/script to v6 and fix notify workflow

2022-10-22 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang updated SPARK-40871:

Summary: Upgrade actions/script to v6 and fix notify workflow  (was: 
Upgrade actions/script to v6)

> Upgrade actions/script to v6 and fix notify workflow
> 
>
> Key: SPARK-40871
> URL: https://issues.apache.org/jira/browse/SPARK-40871
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40870) Upgrade docker actions to cleanup warning

2022-10-22 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang reassigned SPARK-40870:
---

Assignee: Yikun Jiang

> Upgrade docker actions to cleanup warning
> -
>
> Key: SPARK-40870
> URL: https://issues.apache.org/jira/browse/SPARK-40870
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
>
> docker/setup-qemu-action@v2
> docker/setup-buildx-action@v2
> docker/build-push-action@v3
> docker/login-action@v2



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40870) Upgrade docker actions to cleanup warning

2022-10-22 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang resolved SPARK-40870.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38342
[https://github.com/apache/spark/pull/38342]

> Upgrade docker actions to cleanup warning
> -
>
> Key: SPARK-40870
> URL: https://issues.apache.org/jira/browse/SPARK-40870
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.4.0
>
>
> docker/setup-qemu-action@v2
> docker/setup-buildx-action@v2
> docker/build-push-action@v3
> docker/login-action@v2



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org