[jira] [Assigned] (SPARK-37945) Use error classes in the execution errors of arithmetic ops
[ https://issues.apache.org/jira/browse/SPARK-37945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-37945: Assignee: Khalid Mammadov > Use error classes in the execution errors of arithmetic ops > --- > > Key: SPARK-37945 > URL: https://issues.apache.org/jira/browse/SPARK-37945 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Khalid Mammadov >Priority: Major > > Migrate the following errors in QueryExecutionErrors: > * overflowInSumOfDecimalError > * overflowInIntegralDivideError > * arithmeticOverflowError > * unaryMinusCauseOverflowError > * binaryArithmeticCauseOverflowError > * unscaledValueTooLargeForPrecisionError > * decimalPrecisionExceedsMaxPrecisionError > * outOfDecimalTypeRangeError > * integerOverflowError > to use error classes. Throw an implementation of SparkThrowable. Also write > a test for every error in QueryExecutionErrorsSuite. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37945) Use error classes in the execution errors of arithmetic ops
[ https://issues.apache.org/jira/browse/SPARK-37945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-37945. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38273 [https://github.com/apache/spark/pull/38273] > Use error classes in the execution errors of arithmetic ops > --- > > Key: SPARK-37945 > URL: https://issues.apache.org/jira/browse/SPARK-37945 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Khalid Mammadov >Priority: Major > Fix For: 3.4.0 > > > Migrate the following errors in QueryExecutionErrors: > * overflowInSumOfDecimalError > * overflowInIntegralDivideError > * arithmeticOverflowError > * unaryMinusCauseOverflowError > * binaryArithmeticCauseOverflowError > * unscaledValueTooLargeForPrecisionError > * decimalPrecisionExceedsMaxPrecisionError > * outOfDecimalTypeRangeError > * integerOverflowError > to use error classes. Throw an implementation of SparkThrowable. Also write > a test for every error in QueryExecutionErrorsSuite. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
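For context, the migration pattern requested in SPARK-37945 looks roughly like the hedged sketch below. The types and names are simplified stand-ins (SparkThrowableLike substitutes for org.apache.spark.SparkThrowable, and the error-class string and messages are assumptions, not Spark's actual internals): the execution error stops being a plain ArithmeticException and becomes an exception that also carries a machine-readable error class, which a per-error test in QueryExecutionErrorsSuite can assert on.

{code:scala}
// Hedged sketch of migrating an arithmetic execution error onto an error class.
// SparkThrowableLike stands in for org.apache.spark.SparkThrowable; the error
// class name and message text are illustrative assumptions.
trait SparkThrowableLike {
  def getErrorClass: String
}

// Exception carrying both a human-readable message and a stable error class.
class SparkArithmeticOverflowException(message: String)
    extends ArithmeticException(message) with SparkThrowableLike {
  override def getErrorClass: String = "ARITHMETIC_OVERFLOW"
}

object QueryExecutionErrorsSketch {
  // Before: throw new ArithmeticException(s"- $value caused overflow.")
  // After: throw an implementation of SparkThrowable with a stable error class.
  def unaryMinusCauseOverflowError(value: Int): SparkArithmeticOverflowException =
    new SparkArithmeticOverflowException(s"[ARITHMETIC_OVERFLOW] - $value caused overflow.")

  def main(args: Array[String]): Unit = {
    val e = unaryMinusCauseOverflowError(Int.MinValue)
    // A test per error would assert on the error class, not on message text.
    assert(e.getErrorClass == "ARITHMETIC_OVERFLOW")
    println(e.getErrorClass)
  }
}
{code}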
[jira] [Updated] (SPARK-40885) Spark will filter out data field sorting when dynamic partitions and data fields are sorted at the same time
[ https://issues.apache.org/jira/browse/SPARK-40885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-40885: Fix Version/s: (was: 3.4.0) > Spark will filter out data field sorting when dynamic partitions and data > fields are sorted at the same time > > > Key: SPARK-40885 > URL: https://issues.apache.org/jira/browse/SPARK-40885 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.3.0, 3.2.2 >Reporter: ming95 >Priority: Major > Attachments: 1666494504884.jpg > > > When using dynamic partitions to write data and sort partitions and data > fields, Spark will filter the sorting of data fields. > > reproduce sql: > {code:java} > CREATE TABLE `sort_table`( > `id` int, > `name` string > ) > PARTITIONED BY ( > `dt` string) > stored as textfile > LOCATION 'sort_table';CREATE TABLE `test_table`( > `id` int, > `name` string) > PARTITIONED BY ( > `dt` string) > stored as textfile > LOCATION > 'test_table';//gen test data > insert into test_table partition(dt=20221011) select 10,"15" union all select > 1,"10" union all select 5,"50" union all select 20,"2" union all select > 30,"14" ; > set spark.hadoop.hive.exec.dynamici.partition=true; > set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict; > // this sql sort with partition filed (`dt`) and data filed (`name`), but > sort with `name` can not work > insert overwrite table sort_table partition(dt) select id,name,dt from > test_table order by name,dt; > {code} > > The Sort operator of DAG has only one sort field, but there are actually two > in SQL.(See the attached drawing) > > It relate this issue : https://issues.apache.org/jira/browse/SPARK-40588 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40885) Spark will filter out data field sorting when dynamic partitions and data fields are sorted at the same time
[ https://issues.apache.org/jira/browse/SPARK-40885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ming95 updated SPARK-40885: --- Description: When using dynamic partitions to write data and sort partitions and data fields, Spark will filter the sorting of data fields. reproduce sql: {code:java} CREATE TABLE `sort_table`( `id` int, `name` string ) PARTITIONED BY ( `dt` string) stored as textfile LOCATION 'sort_table';CREATE TABLE `test_table`( `id` int, `name` string) PARTITIONED BY ( `dt` string) stored as textfile LOCATION 'test_table';//gen test data insert into test_table partition(dt=20221011) select 10,"15" union all select 1,"10" union all select 5,"50" union all select 20,"2" union all select 30,"14" ; set spark.hadoop.hive.exec.dynamici.partition=true; set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict; // this sql sort with partition filed (`dt`) and data filed (`name`), but sort with `name` can not work insert overwrite table sort_table partition(dt) select id,name,dt from test_table order by name,dt; {code} The Sort operator of DAG has only one sort field, but there are actually two in SQL.(See the attached drawing) It relate this issue : https://issues.apache.org/jira/browse/SPARK-40588 was: When using dynamic partitions to write data and sort partitions and data fields, Spark will filter the sorting of data fields. reproduce sql: {code:java} CREATE TABLE `sort_table`( `id` int, `name` string ) PARTITIONED BY ( `dt` string) stored as textfile LOCATION 'sort_table';CREATE TABLE `test_table`( `id` int, `name` string) PARTITIONED BY ( `dt` string) stored as textfile LOCATION 'test_table';//gen test data insert into test_table partition(dt=20221011) select 10,"15" union all select 1,"10" union all select 5,"50" union all select 20,"2" union all select 30,"14" ; set spark.hadoop.hive.exec.dynamici.partition=true set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict insert overwrite table sort_table partition(dt) select id,name,dt from test_table order by name,dt; {code} The Sort operator of DAG has only one sort field, but there are actually two in SQL.(See the attached drawing) It relate this issue : https://issues.apache.org/jira/browse/SPARK-40588 > Spark will filter out data field sorting when dynamic partitions and data > fields are sorted at the same time > > > Key: SPARK-40885 > URL: https://issues.apache.org/jira/browse/SPARK-40885 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.3.0, 3.2.2 >Reporter: ming95 >Priority: Major > Fix For: 3.4.0 > > Attachments: 1666494504884.jpg > > > When using dynamic partitions to write data and sort partitions and data > fields, Spark will filter the sorting of data fields. 
> > reproduce sql: > {code:java} > CREATE TABLE `sort_table`( > `id` int, > `name` string > ) > PARTITIONED BY ( > `dt` string) > stored as textfile > LOCATION 'sort_table';CREATE TABLE `test_table`( > `id` int, > `name` string) > PARTITIONED BY ( > `dt` string) > stored as textfile > LOCATION > 'test_table';//gen test data > insert into test_table partition(dt=20221011) select 10,"15" union all select > 1,"10" union all select 5,"50" union all select 20,"2" union all select > 30,"14" ; > set spark.hadoop.hive.exec.dynamici.partition=true; > set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict; > // this sql sorts by both the partition field (`dt`) and the data field (`name`), but > the sort on `name` does not take effect > insert overwrite table sort_table partition(dt) select id,name,dt from > test_table order by name,dt; > {code} > > The Sort operator of the DAG has only one sort field, but there are actually two > in the SQL. (See the attached drawing.) > > It relates to this issue: https://issues.apache.org/jira/browse/SPARK-40588 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40885) Spark will filter out data field sorting when dynamic partitions and data fields are sorted at the same time
[ https://issues.apache.org/jira/browse/SPARK-40885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ming95 updated SPARK-40885: --- Description: When using dynamic partitions to write data and sort partitions and data fields, Spark will filter the sorting of data fields. reproduce sql: {code:java} CREATE TABLE `sort_table`( `id` int, `name` string ) PARTITIONED BY ( `dt` string) stored as textfile LOCATION 'sort_table';CREATE TABLE `test_table`( `id` int, `name` string) PARTITIONED BY ( `dt` string) stored as textfile LOCATION 'test_table';//gen test data insert into test_table partition(dt=20221011) select 10,"15" union all select 1,"10" union all select 5,"50" union all select 20,"2" union all select 30,"14" ; set spark.hadoop.hive.exec.dynamici.partition=true set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict insert overwrite table sort_table partition(dt) select id,name,dt from test_table order by name,dt; {code} The Sort operator of DAG has only one sort field, but there are actually two in SQL.(See the attached drawing) It relate this issue : https://issues.apache.org/jira/browse/SPARK-40588 was: When using dynamic partitions to write data and sort partitions and data fields, Spark will filter the sorting of data fields. reproduce sql: {code:java} CREATE TABLE `sort_table`( `id` int, `name` string ) PARTITIONED BY ( `dt` string) stored as textfile LOCATION 'sort_table';CREATE TABLE `test_table`( `id` int, `name` string) PARTITIONED BY ( `dt` string) stored as textfile LOCATION 'test_table';//gen test data insert into test_table partition(dt=20221011) select 10,"15" union all select 1,"10" union all select 5,"50" union all select 20,"2" union all select 30,"14" ; set spark.hadoop.hive.exec.dynamici.partition=true set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict insert overwrite table sort_table partition(dt) select id,name,dt from test_table order by name,dt; {code} !image-2022-10-23-11-09-47-759.png! The Sort operator of DAG has only one sort field, but there are actually two in SQL. It relate this issue : https://issues.apache.org/jira/browse/SPARK-40588 > Spark will filter out data field sorting when dynamic partitions and data > fields are sorted at the same time > > > Key: SPARK-40885 > URL: https://issues.apache.org/jira/browse/SPARK-40885 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.3.0, 3.2.2 >Reporter: ming95 >Priority: Major > Fix For: 3.4.0 > > Attachments: 1666494504884.jpg > > > When using dynamic partitions to write data and sort partitions and data > fields, Spark will filter the sorting of data fields. 
> > reproduce sql: > {code:java} > CREATE TABLE `sort_table`( > `id` int, > `name` string > ) > PARTITIONED BY ( > `dt` string) > stored as textfile > LOCATION 'sort_table';CREATE TABLE `test_table`( > `id` int, > `name` string) > PARTITIONED BY ( > `dt` string) > stored as textfile > LOCATION > 'test_table';//gen test data > insert into test_table partition(dt=20221011) select 10,"15" union all select > 1,"10" union all select 5,"50" union all select 20,"2" union all select > 30,"14" ; > set spark.hadoop.hive.exec.dynamici.partition=true > set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict > insert overwrite table sort_table partition(dt) select id,name,dt from > test_table order by name,dt; > {code} > > The Sort operator of DAG has only one sort field, but there are actually two > in SQL.(See the attached drawing) > > It relate this issue : https://issues.apache.org/jira/browse/SPARK-40588 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40885) Spark will filter out data field sorting when dynamic partitions and data fields are sorted at the same time
[ https://issues.apache.org/jira/browse/SPARK-40885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ming95 updated SPARK-40885: --- Attachment: 1666494504884.jpg > Spark will filter out data field sorting when dynamic partitions and data > fields are sorted at the same time > > > Key: SPARK-40885 > URL: https://issues.apache.org/jira/browse/SPARK-40885 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.3.0, 3.2.2 >Reporter: ming95 >Priority: Major > Fix For: 3.4.0 > > Attachments: 1666494504884.jpg > > > When using dynamic partitions to write data and sort partitions and data > fields, Spark will filter the sorting of data fields. > > reproduce sql: > {code:java} > CREATE TABLE `sort_table`( > `id` int, > `name` string > ) > PARTITIONED BY ( > `dt` string) > stored as textfile > LOCATION 'sort_table';CREATE TABLE `test_table`( > `id` int, > `name` string) > PARTITIONED BY ( > `dt` string) > stored as textfile > LOCATION > 'test_table';//gen test data > insert into test_table partition(dt=20221011) select 10,"15" union all select > 1,"10" union all select 5,"50" union all select 20,"2" union all select > 30,"14" ; > set spark.hadoop.hive.exec.dynamici.partition=true > set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict > insert overwrite table sort_table partition(dt) select id,name,dt from > test_table order by name,dt; > {code} > !image-2022-10-23-11-09-47-759.png! > The Sort operator of DAG has only one sort field, but there are actually two > in SQL. > > It relate this issue : https://issues.apache.org/jira/browse/SPARK-40588 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40885) Spark will filter out data field sorting when dynamic partitions and data fields are sorted at the same time
[ https://issues.apache.org/jira/browse/SPARK-40885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ming95 updated SPARK-40885: --- External issue URL: https://issues.apache.org/jira/browse/SPARK-40588 Description: When using dynamic partitions to write data and sort partitions and data fields, Spark will filter the sorting of data fields. reproduce sql: {code:java} CREATE TABLE `sort_table`( `id` int, `name` string ) PARTITIONED BY ( `dt` string) stored as textfile LOCATION 'sort_table';CREATE TABLE `test_table`( `id` int, `name` string) PARTITIONED BY ( `dt` string) stored as textfile LOCATION 'test_table';//gen test data insert into test_table partition(dt=20221011) select 10,"15" union all select 1,"10" union all select 5,"50" union all select 20,"2" union all select 30,"14" ; set spark.hadoop.hive.exec.dynamici.partition=true set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict insert overwrite table sort_table partition(dt) select id,name,dt from test_table order by name,dt; {code} !image-2022-10-23-11-09-47-759.png! The Sort operator of DAG has only one sort field, but there are actually two in SQL. It relate this issue : https://issues.apache.org/jira/browse/SPARK-40588 was: When using dynamic partitions to write data and sort partitions and data fields, Spark will filter the sorting of data fields. reproduce sql: {code:java} CREATE TABLE `sort_table`( `id` int, `name` string ) PARTITIONED BY ( `dt` string) stored as textfile LOCATION 'sort_table';CREATE TABLE `test_table`( `id` int, `name` string) PARTITIONED BY ( `dt` string) stored as textfile LOCATION 'test_table';//gen test data insert into test_table partition(dt=20221011) select 10,"15" union all select 1,"10" union all select 5,"50" union all select 20,"2" union all select 30,"14" ; set spark.hadoop.hive.exec.dynamici.partition=true set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict insert overwrite table sort_table partition(dt) select id,name,dt from test_table order by name,dt; {code} > Spark will filter out data field sorting when dynamic partitions and data > fields are sorted at the same time > > > Key: SPARK-40885 > URL: https://issues.apache.org/jira/browse/SPARK-40885 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.3.0, 3.2.2 >Reporter: ming95 >Priority: Major > Fix For: 3.4.0 > > > When using dynamic partitions to write data and sort partitions and data > fields, Spark will filter the sorting of data fields. > > reproduce sql: > {code:java} > CREATE TABLE `sort_table`( > `id` int, > `name` string > ) > PARTITIONED BY ( > `dt` string) > stored as textfile > LOCATION 'sort_table';CREATE TABLE `test_table`( > `id` int, > `name` string) > PARTITIONED BY ( > `dt` string) > stored as textfile > LOCATION > 'test_table';//gen test data > insert into test_table partition(dt=20221011) select 10,"15" union all select > 1,"10" union all select 5,"50" union all select 20,"2" union all select > 30,"14" ; > set spark.hadoop.hive.exec.dynamici.partition=true > set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict > insert overwrite table sort_table partition(dt) select id,name,dt from > test_table order by name,dt; > {code} > !image-2022-10-23-11-09-47-759.png! > The Sort operator of DAG has only one sort field, but there are actually two > in SQL. 
> > It relates to this issue: https://issues.apache.org/jira/browse/SPARK-40588 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40885) Spark will filter out data field sorting when dynamic partitions and data fields are sorted at the same time
ming95 created SPARK-40885: -- Summary: Spark will filter out data field sorting when dynamic partitions and data fields are sorted at the same time Key: SPARK-40885 URL: https://issues.apache.org/jira/browse/SPARK-40885 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.2, 3.3.0, 3.1.2 Reporter: ming95 Fix For: 3.4.0 When using dynamic partitions to write data and sort partitions and data fields, Spark will filter the sorting of data fields. reproduce sql: {code:java} CREATE TABLE `sort_table`( `id` int, `name` string ) PARTITIONED BY ( `dt` string) stored as textfile LOCATION 'sort_table';CREATE TABLE `test_table`( `id` int, `name` string) PARTITIONED BY ( `dt` string) stored as textfile LOCATION 'test_table';//gen test data insert into test_table partition(dt=20221011) select 10,"15" union all select 1,"10" union all select 5,"50" union all select 20,"2" union all select 30,"14" ; set spark.hadoop.hive.exec.dynamici.partition=true set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict insert overwrite table sort_table partition(dt) select id,name,dt from test_table order by name,dt; {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
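One way to observe the behaviour reported in SPARK-40885 is to EXPLAIN the insert and inspect the Sort node. The hedged sketch below assumes the two tables from the reproduce SQL already exist in a Hive-enabled SparkSession; per this report, the printed Sort lists only the dynamic-partition column `dt` and drops `name`.

{code:scala}
// Hedged sketch: EXPLAIN the reported insert and look for the Sort node in the
// printed plan. Assumes sort_table/test_table from the reproduce SQL above
// already exist and that a Hive-enabled SparkSession is available.
import org.apache.spark.sql.SparkSession

object InspectSortPlan {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().enableHiveSupport().getOrCreate()
    spark.sql("set hive.exec.dynamic.partition=true")
    spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")

    // Per the report, the Sort node in the plan lists only `dt`,
    // even though the SQL orders by both `name` and `dt`.
    spark.sql(
      """explain formatted
        |insert overwrite table sort_table partition(dt)
        |select id, name, dt from test_table order by name, dt""".stripMargin)
      .show(truncate = false)
  }
}
{code}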
[jira] [Commented] (SPARK-40748) Migrate type check failures of conditions onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622730#comment-17622730 ] BingKun Pan commented on SPARK-40748: - I work on it. > Migrate type check failures of conditions onto error classes > > > Key: SPARK-40748 > URL: https://issues.apache.org/jira/browse/SPARK-40748 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > > Replace TypeCheckFailure by DataTypeMismatch in type checks in the > conditional expressions: > 1. If (2): > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala#L61-L67 > 2. CaseWhen (2): > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala#L175-L183 > 3. InSubquery (2); > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L378-L396 > 4. In (1): > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L453 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
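The change requested in SPARK-40748 follows the same shape for all four expressions; a hedged sketch with simplified stand-ins (these are not Spark's exact TypeCheckResult classes, and the error sub-class name and parameters are assumptions) is:

{code:scala}
// Hedged sketch of replacing a free-form TypeCheckFailure with a structured
// DataTypeMismatch. The types below are simplified stand-ins, and the
// sub-class name and message parameters are illustrative assumptions.
sealed trait TypeCheckResult
case object TypeCheckSuccess extends TypeCheckResult
case class DataTypeMismatch(
    errorSubClass: String,
    messageParameters: Map[String, String]) extends TypeCheckResult

object IfTypeCheckSketch {
  // Before (roughly): TypeCheckFailure(
  //   s"type of predicate expression in If should be boolean, not $predicateType")
  // After: a keyed error that tests and tooling can match on.
  def checkPredicateType(predicateType: String): TypeCheckResult =
    if (predicateType == "boolean") TypeCheckSuccess
    else DataTypeMismatch(
      errorSubClass = "UNEXPECTED_INPUT_TYPE",
      messageParameters = Map(
        "paramIndex" -> "1",
        "requiredType" -> "\"BOOLEAN\"",
        "inputType" -> s""""${predicateType.toUpperCase}""""))

  def main(args: Array[String]): Unit =
    println(checkPredicateType("int")) // DataTypeMismatch(UNEXPECTED_INPUT_TYPE, ...)
}
{code}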
[jira] [Commented] (SPARK-40588) Sorting issue with AQE turned on
[ https://issues.apache.org/jira/browse/SPARK-40588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622728#comment-17622728 ] ming95 commented on SPARK-40588: Yes, I found the same problem. This should be a bug in Spark. When the sorting field is the same as the dynamic partitioning field, the sorting of non partitioning fields will be filtered out. > Sorting issue with AQE turned on > -- > > Key: SPARK-40588 > URL: https://issues.apache.org/jira/browse/SPARK-40588 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.3 > Environment: Spark v3.1.3 > Scala v2.12.13 >Reporter: Swetha Baskaran >Priority: Major > Attachments: image-2022-10-16-22-05-47-159.png > > > We are attempting to partition data by a few columns, sort by a particular > _sortCol_ and write out one file per partition. > {code:java} > df > .repartition(col("day"), col("month"), col("year")) > .withColumn("partitionId",spark_partition_id) > .withColumn("monotonicallyIncreasingIdUnsorted",monotonicallyIncreasingId) > .sortWithinPartitions("year", "month", "day", "sortCol") > .withColumn("monotonicallyIncreasingIdSorted",monotonicallyIncreasingId) > .write > .partitionBy("year", "month", "day") > .parquet(path){code} > When inspecting the results, we observe one file per partition, however we > see an _alternating_ pattern of unsorted rows in some files. > {code:java} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287832121344,"monotonicallyIncreasingIdSorted":6287832121344} > {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287877022389,"monotonicallyIncreasingIdSorted":6287876860586} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287877567881,"monotonicallyIncreasingIdSorted":6287832121345} > {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287835105553,"monotonicallyIncreasingIdSorted":6287876860587} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287832570127,"monotonicallyIncreasingIdSorted":6287832121346} > {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287879965760,"monotonicallyIncreasingIdSorted":6287876860588} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287878762347,"monotonicallyIncreasingIdSorted":6287832121347} > {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287837165012,"monotonicallyIncreasingIdSorted":6287876860589} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287832910545,"monotonicallyIncreasingIdSorted":6287832121348} > {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287881244758,"monotonicallyIncreasingIdSorted":6287876860590} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287880041345,"monotonicallyIncreasingIdSorted":6287832121349}{code} > Here is a > [gist|https://gist.github.com/Swebask/543030748a768be92d3c0ae343d2ae89] to > reproduce the issue. > Turning off AQE with spark.conf.set("spark.sql.adaptive.enabled", false) > fixes the issue. > I'm working on identifying why AQE affects the sort order. Any leads or > thoughts would be appreciated! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
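One way to check for the symptom described in SPARK-40588 is to read each written part file on its own and verify that sortCol is non-decreasing inside it. The sketch below is hedged: the output path and the Long type of sortCol are assumptions taken from the reporter's snippet.

{code:scala}
// Hedged sketch: verify, file by file, whether sortCol is non-decreasing
// inside each written parquet part file. `path` and sortCol's Long type are
// assumptions based on the writer snippet above.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.input_file_name

object CheckPerFileOrder {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().getOrCreate()
    import spark.implicits._

    val path = "/tmp/sorted-output" // wherever the partitioned data was written
    val files = spark.read.parquet(path)
      .select(input_file_name()).distinct().as[String].collect()

    files.foreach { file =>
      val values = spark.read.parquet(file).select("sortCol").as[Long].collect()
      val isSorted = values.zip(values.drop(1)).forall { case (a, b) => a <= b }
      println(s"$file sorted=$isSorted")
    }
  }
}
{code}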
[jira] [Commented] (SPARK-40882) Upgrade actions/setup-java to v3 with distribution specified
[ https://issues.apache.org/jira/browse/SPARK-40882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622720#comment-17622720 ] Apache Spark commented on SPARK-40882: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/38354 > Upgrade actions/setup-java to v3 with distribution specified > > > Key: SPARK-40882 > URL: https://issues.apache.org/jira/browse/SPARK-40882 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40882) Upgrade actions/setup-java to v3 with distribution specified
[ https://issues.apache.org/jira/browse/SPARK-40882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40882: Assignee: (was: Apache Spark) > Upgrade actions/setup-java to v3 with distribution specified > > > Key: SPARK-40882 > URL: https://issues.apache.org/jira/browse/SPARK-40882 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40882) Upgrade actions/setup-java to v3 with distribution specified
[ https://issues.apache.org/jira/browse/SPARK-40882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622719#comment-17622719 ] Apache Spark commented on SPARK-40882: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/38354 > Upgrade actions/setup-java to v3 with distribution specified > > > Key: SPARK-40882 > URL: https://issues.apache.org/jira/browse/SPARK-40882 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40882) Upgrade actions/setup-java to v3 with distribution specified
[ https://issues.apache.org/jira/browse/SPARK-40882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40882: Assignee: Apache Spark > Upgrade actions/setup-java to v3 with distribution specified > > > Key: SPARK-40882 > URL: https://issues.apache.org/jira/browse/SPARK-40882 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40881) Upgrade actions/cache to v3 and actions/upload-artifact to v3
[ https://issues.apache.org/jira/browse/SPARK-40881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622718#comment-17622718 ] Apache Spark commented on SPARK-40881: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/38353 > Upgrade actions/cache to v3 and actions/upload-artifact to v3 > - > > Key: SPARK-40881 > URL: https://issues.apache.org/jira/browse/SPARK-40881 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40881) Upgrade actions/cache to v3 and actions/upload-artifact to v3
[ https://issues.apache.org/jira/browse/SPARK-40881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622717#comment-17622717 ] Apache Spark commented on SPARK-40881: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/38353 > Upgrade actions/cache to v3 and actions/upload-artifact to v3 > - > > Key: SPARK-40881 > URL: https://issues.apache.org/jira/browse/SPARK-40881 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40881) Upgrade actions/cache to v3 and actions/upload-artifact to v3
[ https://issues.apache.org/jira/browse/SPARK-40881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40881: Assignee: Apache Spark > Upgrade actions/cache to v3 and actions/upload-artifact to v3 > - > > Key: SPARK-40881 > URL: https://issues.apache.org/jira/browse/SPARK-40881 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40881) Upgrade actions/cache to v3 and actions/upload-artifact to v3
[ https://issues.apache.org/jira/browse/SPARK-40881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40881: Assignee: (was: Apache Spark) > Upgrade actions/cache to v3 and actions/upload-artifact to v3 > - > > Key: SPARK-40881 > URL: https://issues.apache.org/jira/browse/SPARK-40881 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-40867) Flaky test ProtobufCatalystDataConversionSuite
[ https://issues.apache.org/jira/browse/SPARK-40867 ] Sandish Kumar HN deleted comment on SPARK-40867: -- was (Author: sanysand...@gmail.com): this issue got resolved through https://github.com/apache/spark/pull/38286 > Flaky test ProtobufCatalystDataConversionSuite > -- > > Key: SPARK-40867 > URL: https://issues.apache.org/jira/browse/SPARK-40867 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > Fix For: 3.4.0 > > > * > [https://github.com/LuciferYang/spark/actions/runs/3295309311/jobs/5433733419] > * > [https://github.com/LuciferYang/spark/actions/runs/3291252601/jobs/5425183034] > {code:java} > [info] ProtobufCatalystDataConversionSuite: > [info] - single StructType(StructField(int32_type,IntegerType,true)) with > seed 167 *** FAILED *** (39 milliseconds) > [info] Incorrect evaluation (codegen off): from_protobuf(to_protobuf([0], > /home/runner/work/spark/spark/connector/protobuf/target/scala-2.12/test-classes/protobuf/catalyst_types.desc, > IntegerMsg), > /home/runner/work/spark/spark/connector/protobuf/target/scala-2.12/test-classes/protobuf/catalyst_types.desc, > IntegerMsg), actual: [null], expected: [0] (ExpressionEvalHelper.scala:209) > [info] org.scalatest.exceptions.TestFailedException: > [info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > [info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > [info] at > org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564) > [info] at org.scalatest.Assertions.fail(Assertions.scala:933) > [info] at org.scalatest.Assertions.fail$(Assertions.scala:929) > [info] at org.scalatest.funsuite.AnyFunSuite.fail(AnyFunSuite.scala:1564) > [info] at > org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluationWithoutCodegen(ExpressionEvalHelper.scala:209) > [info] at > org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluationWithoutCodegen$(ExpressionEvalHelper.scala:199) > [info] at > org.apache.spark.sql.protobuf.ProtobufCatalystDataConversionSuite.checkEvaluationWithoutCodegen(ProtobufCatalystDataConversionSuite.scala:33) > [info] at > org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluation(ExpressionEvalHelper.scala:87) > [info] at > org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluation$(ExpressionEvalHelper.scala:82) > [info] at > org.apache.spark.sql.protobuf.ProtobufCatalystDataConversionSuite.checkEvaluation(ProtobufCatalystDataConversionSuite.scala:33) > [info] at > org.apache.spark.sql.protobuf.ProtobufCatalystDataConversionSuite.checkResult(ProtobufCatalystDataConversionSuite.scala:43) > [info] at > org.apache.spark.sql.protobuf.ProtobufCatalystDataConversionSuite.$anonfun$new$2(ProtobufCatalystDataConversionSuite.scala:122) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at > org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:207) > [info] at > 
org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > [info] at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:66) > [info] at > org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234) > [info] at > org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227) > [info] at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:66) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > [info] at scala.collection.immutable.List.foreach(List.scala:431) > [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) > [info]
[jira] [Commented] (SPARK-40867) Flaky test ProtobufCatalystDataConversionSuite
[ https://issues.apache.org/jira/browse/SPARK-40867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622693#comment-17622693 ] Sandish Kumar HN commented on SPARK-40867: -- [~LuciferYang] this issue got resolved through https://github.com/apache/spark/pull/38286 > Flaky test ProtobufCatalystDataConversionSuite > -- > > Key: SPARK-40867 > URL: https://issues.apache.org/jira/browse/SPARK-40867 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > * > [https://github.com/LuciferYang/spark/actions/runs/3295309311/jobs/5433733419] > * > [https://github.com/LuciferYang/spark/actions/runs/3291252601/jobs/5425183034] > {code:java} > [info] ProtobufCatalystDataConversionSuite: > [info] - single StructType(StructField(int32_type,IntegerType,true)) with > seed 167 *** FAILED *** (39 milliseconds) > [info] Incorrect evaluation (codegen off): from_protobuf(to_protobuf([0], > /home/runner/work/spark/spark/connector/protobuf/target/scala-2.12/test-classes/protobuf/catalyst_types.desc, > IntegerMsg), > /home/runner/work/spark/spark/connector/protobuf/target/scala-2.12/test-classes/protobuf/catalyst_types.desc, > IntegerMsg), actual: [null], expected: [0] (ExpressionEvalHelper.scala:209) > [info] org.scalatest.exceptions.TestFailedException: > [info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > [info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > [info] at > org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564) > [info] at org.scalatest.Assertions.fail(Assertions.scala:933) > [info] at org.scalatest.Assertions.fail$(Assertions.scala:929) > [info] at org.scalatest.funsuite.AnyFunSuite.fail(AnyFunSuite.scala:1564) > [info] at > org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluationWithoutCodegen(ExpressionEvalHelper.scala:209) > [info] at > org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluationWithoutCodegen$(ExpressionEvalHelper.scala:199) > [info] at > org.apache.spark.sql.protobuf.ProtobufCatalystDataConversionSuite.checkEvaluationWithoutCodegen(ProtobufCatalystDataConversionSuite.scala:33) > [info] at > org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluation(ExpressionEvalHelper.scala:87) > [info] at > org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluation$(ExpressionEvalHelper.scala:82) > [info] at > org.apache.spark.sql.protobuf.ProtobufCatalystDataConversionSuite.checkEvaluation(ProtobufCatalystDataConversionSuite.scala:33) > [info] at > org.apache.spark.sql.protobuf.ProtobufCatalystDataConversionSuite.checkResult(ProtobufCatalystDataConversionSuite.scala:43) > [info] at > org.apache.spark.sql.protobuf.ProtobufCatalystDataConversionSuite.$anonfun$new$2(ProtobufCatalystDataConversionSuite.scala:122) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at > org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:207) > [info] at > 
org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > [info] at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:66) > [info] at > org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234) > [info] at > org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227) > [info] at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:66) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > [info] at scala.collection.immutable.List.foreach(List.scala:431) > [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > [info] at org.scalatest.SuperE
[jira] [Resolved] (SPARK-40867) Flaky test ProtobufCatalystDataConversionSuite
[ https://issues.apache.org/jira/browse/SPARK-40867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandish Kumar HN resolved SPARK-40867. -- Fix Version/s: 3.4.0 Resolution: Fixed this issue got resolved through https://github.com/apache/spark/pull/38286 > Flaky test ProtobufCatalystDataConversionSuite > -- > > Key: SPARK-40867 > URL: https://issues.apache.org/jira/browse/SPARK-40867 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > Fix For: 3.4.0 > > > * > [https://github.com/LuciferYang/spark/actions/runs/3295309311/jobs/5433733419] > * > [https://github.com/LuciferYang/spark/actions/runs/3291252601/jobs/5425183034] > {code:java} > [info] ProtobufCatalystDataConversionSuite: > [info] - single StructType(StructField(int32_type,IntegerType,true)) with > seed 167 *** FAILED *** (39 milliseconds) > [info] Incorrect evaluation (codegen off): from_protobuf(to_protobuf([0], > /home/runner/work/spark/spark/connector/protobuf/target/scala-2.12/test-classes/protobuf/catalyst_types.desc, > IntegerMsg), > /home/runner/work/spark/spark/connector/protobuf/target/scala-2.12/test-classes/protobuf/catalyst_types.desc, > IntegerMsg), actual: [null], expected: [0] (ExpressionEvalHelper.scala:209) > [info] org.scalatest.exceptions.TestFailedException: > [info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > [info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > [info] at > org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564) > [info] at org.scalatest.Assertions.fail(Assertions.scala:933) > [info] at org.scalatest.Assertions.fail$(Assertions.scala:929) > [info] at org.scalatest.funsuite.AnyFunSuite.fail(AnyFunSuite.scala:1564) > [info] at > org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluationWithoutCodegen(ExpressionEvalHelper.scala:209) > [info] at > org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluationWithoutCodegen$(ExpressionEvalHelper.scala:199) > [info] at > org.apache.spark.sql.protobuf.ProtobufCatalystDataConversionSuite.checkEvaluationWithoutCodegen(ProtobufCatalystDataConversionSuite.scala:33) > [info] at > org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluation(ExpressionEvalHelper.scala:87) > [info] at > org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluation$(ExpressionEvalHelper.scala:82) > [info] at > org.apache.spark.sql.protobuf.ProtobufCatalystDataConversionSuite.checkEvaluation(ProtobufCatalystDataConversionSuite.scala:33) > [info] at > org.apache.spark.sql.protobuf.ProtobufCatalystDataConversionSuite.checkResult(ProtobufCatalystDataConversionSuite.scala:43) > [info] at > org.apache.spark.sql.protobuf.ProtobufCatalystDataConversionSuite.$anonfun$new$2(ProtobufCatalystDataConversionSuite.scala:122) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at > org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:207) > [info] at > 
org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > [info] at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:66) > [info] at > org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234) > [info] at > org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227) > [info] at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:66) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > [info] at scala.collection.immutable.List.foreach(List.scala:431) > [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > [info] at org.scalat
[jira] [Comment Edited] (SPARK-34827) Support fetching shuffle blocks in batch with i/o encryption
[ https://issues.apache.org/jira/browse/SPARK-34827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622683#comment-17622683 ] Raj Sharma edited comment on SPARK-34827 at 10/22/22 7:24 PM: -- I like your content. If anyone wants to learn a new course like Vlocity platform developer certification focuses on producing experts who aren't just ready to handle the platform but build solutions to keep their respective companies and their careers ahead of the competition. Go through this link:[https://www.igmguru.com/salesforce/salesforce-vlocity-training-certification|https://www.igmguru.com/salesforce/salesforce-vlocity-training-certification/] was (Author: JIRAUSER297361): I like your content. If anyone wants to learn a new course like [Vlocity platform developer certification|[https://www.igmguru.com/salesforce/salesforce-vlocity-training-certification/]] focuses on producing experts who aren't just ready to handle the platform but build solutions to keep their respective companies and their careers ahead of the competition. > Support fetching shuffle blocks in batch with i/o encryption > > > Key: SPARK-34827 > URL: https://issues.apache.org/jira/browse/SPARK-34827 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Dongjoon Hyun >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34827) Support fetching shuffle blocks in batch with i/o encryption
[ https://issues.apache.org/jira/browse/SPARK-34827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622683#comment-17622683 ] Raj Sharma commented on SPARK-34827: I like your content. If anyone wants to learn a new course like [Vlocity platform developer certification|[https://www.igmguru.com/salesforce/salesforce-vlocity-training-certification/]] focuses on producing experts who aren't just ready to handle the platform but build solutions to keep their respective companies and their careers ahead of the competition. > Support fetching shuffle blocks in batch with i/o encryption > > > Key: SPARK-34827 > URL: https://issues.apache.org/jira/browse/SPARK-34827 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Dongjoon Hyun >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-33807) Data Source V2: Remove read specific distributions
[ https://issues.apache.org/jira/browse/SPARK-33807 ] Chao Sun deleted comment on SPARK-33807: -- was (Author: JIRAUSER295111): Very informative and effective post. [Vlocity Platform Developer Certification|[https://www.igmguru.com/salesforce/salesforce-vlocity-training-certification/]] focuses on producing experts who aren't just ready to handle the platform but build solutions to keep their respective companies and their careers ahead of the competition. > Data Source V2: Remove read specific distributions > -- > > Key: SPARK-33807 > URL: https://issues.apache.org/jira/browse/SPARK-33807 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Anton Okolnychyi >Priority: Blocker > > We should remove the read-specific distributions for DS V2 as discussed > [here|https://github.com/apache/spark/pull/30706#discussion_r543059827]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-33807) Data Source V2: Remove read specific distributions
[ https://issues.apache.org/jira/browse/SPARK-33807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622681#comment-17622681 ] Pankaj Nagla edited comment on SPARK-33807 at 10/22/22 6:36 PM: Very informative and effective post. [Vlocity Platform Developer Certification|[https://www.igmguru.com/salesforce/salesforce-vlocity-training-certification/]] focuses on producing experts who aren't just ready to handle the platform but build solutions to keep their respective companies and their careers ahead of the competition. was (Author: JIRAUSER295111): Very informative and effective post. [Vlocity Platform Developer Certification|[https://www.igmguru.com/salesforce/salesforce-vlocity-training-certification/]] focuses on producing experts who aren't just ready to handle the platform but build solutions to keep their respective companies and their careers ahead of the competition. > Data Source V2: Remove read specific distributions > -- > > Key: SPARK-33807 > URL: https://issues.apache.org/jira/browse/SPARK-33807 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Anton Okolnychyi >Priority: Blocker > > We should remove the read-specific distributions for DS V2 as discussed > [here|https://github.com/apache/spark/pull/30706#discussion_r543059827]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-33807) Data Source V2: Remove read specific distributions
[ https://issues.apache.org/jira/browse/SPARK-33807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622681#comment-17622681 ] Pankaj Nagla edited comment on SPARK-33807 at 10/22/22 6:35 PM: Very informative and effective post. [Vlocity Platform Developer Certification|[https://www.igmguru.com/salesforce/salesforce-vlocity-training-certification/]] focuses on producing experts who aren't just ready to handle the platform but build solutions to keep their respective companies and their careers ahead of the competition. was (Author: JIRAUSER295111): Very informative and effective post. [Vlocity Platform Developer Certification|[https://www.igmguru.com/salesforce/salesforce-vlocity-training-certification/]] focuses on producing experts who aren't just ready to handle the platform but build solutions to keep their respective companies and their careers ahead of the competition. > Data Source V2: Remove read specific distributions > -- > > Key: SPARK-33807 > URL: https://issues.apache.org/jira/browse/SPARK-33807 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Anton Okolnychyi >Priority: Blocker > > We should remove the read-specific distributions for DS V2 as discussed > [here|https://github.com/apache/spark/pull/30706#discussion_r543059827]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33807) Data Source V2: Remove read specific distributions
[ https://issues.apache.org/jira/browse/SPARK-33807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622681#comment-17622681 ] Pankaj Nagla commented on SPARK-33807: -- Very informative and effective post. [Vlocity Platform Developer Certification|[https://www.igmguru.com/salesforce/salesforce-vlocity-training-certification/]] focuses on producing experts who aren't just ready to handle the platform but build solutions to keep their respective companies and their careers ahead of the competition. > Data Source V2: Remove read specific distributions > -- > > Key: SPARK-33807 > URL: https://issues.apache.org/jira/browse/SPARK-33807 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Anton Okolnychyi >Priority: Blocker > > We should remove the read-specific distributions for DS V2 as discussed > [here|https://github.com/apache/spark/pull/30706#discussion_r543059827]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40801) Upgrade Apache Commons Text to 1.10
[ https://issues.apache.org/jira/browse/SPARK-40801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622680#comment-17622680 ] Apache Spark commented on SPARK-40801: -- User 'bjornjorgensen' has created a pull request for this issue: https://github.com/apache/spark/pull/38352 > Upgrade Apache Commons Text to 1.10 > --- > > Key: SPARK-40801 > URL: https://issues.apache.org/jira/browse/SPARK-40801 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > Fix For: 3.4.0, 3.3.2 > > > [CVE-2022-42889|https://nvd.nist.gov/vuln/detail/CVE-2022-42889] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
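For downstream builds that cannot move to a patched Spark release right away, pinning the transitive dependency is one option. This is a hedged sbt sketch for a typical downstream project (an assumption, not something the Spark upgrade itself requires of users):

{code:scala}
// Hedged sbt sketch (build.sbt): force the patched Apache Commons Text release
// ahead of whatever version is pulled in transitively. Adjust to your build
// tool; assumes the project depends on commons-text only transitively.
dependencyOverrides += "org.apache.commons" % "commons-text" % "1.10.0"
{code}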
[jira] [Commented] (SPARK-34827) Support fetching shuffle blocks in batch with i/o encryption
[ https://issues.apache.org/jira/browse/SPARK-34827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622679#comment-17622679 ] Vivek Garg commented on SPARK-34827: I appreciate you sharing this useful information. Very useful and interesting post. [Uipath training|https://www.igmguru.com/machine-learning-ai/rpa-uipath-certification-training/]. > Support fetching shuffle blocks in batch with i/o encryption > > > Key: SPARK-34827 > URL: https://issues.apache.org/jira/browse/SPARK-34827 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Dongjoon Hyun >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40391) Test the error class UNSUPPORTED_FEATURE.JDBC_TRANSACTION
[ https://issues.apache.org/jira/browse/SPARK-40391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40391: Assignee: Apache Spark > Test the error class UNSUPPORTED_FEATURE.JDBC_TRANSACTION > - > > Key: SPARK-40391 > URL: https://issues.apache.org/jira/browse/SPARK-40391 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Minor > > Add a test for the error class UNSUPPORTED_FEATURE.JDBC_TRANSACTION and place > it to QueryExecutionErrorsSuite. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40391) Test the error class UNSUPPORTED_FEATURE.JDBC_TRANSACTION
[ https://issues.apache.org/jira/browse/SPARK-40391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622655#comment-17622655 ] Apache Spark commented on SPARK-40391: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/38351 > Test the error class UNSUPPORTED_FEATURE.JDBC_TRANSACTION > - > > Key: SPARK-40391 > URL: https://issues.apache.org/jira/browse/SPARK-40391 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Minor > > Add a test for the error class UNSUPPORTED_FEATURE.JDBC_TRANSACTION and place > it to QueryExecutionErrorsSuite. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40391) Test the error class UNSUPPORTED_FEATURE.JDBC_TRANSACTION
[ https://issues.apache.org/jira/browse/SPARK-40391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40391: Assignee: (was: Apache Spark) > Test the error class UNSUPPORTED_FEATURE.JDBC_TRANSACTION > - > > Key: SPARK-40391 > URL: https://issues.apache.org/jira/browse/SPARK-40391 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Minor > > Add a test for the error class UNSUPPORTED_FEATURE.JDBC_TRANSACTION and place > it to QueryExecutionErrorsSuite. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40856) Update the error template of WRONG_NUM_PARAMS
[ https://issues.apache.org/jira/browse/SPARK-40856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-40856. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38319 [https://github.com/apache/spark/pull/38319] > Update the error template of WRONG_NUM_PARAMS > - > > Key: SPARK-40856 > URL: https://issues.apache.org/jira/browse/SPARK-40856 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40856) Update the error template of WRONG_NUM_PARAMS
[ https://issues.apache.org/jira/browse/SPARK-40856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-40856: Assignee: BingKun Pan > Update the error template of WRONG_NUM_PARAMS > - > > Key: SPARK-40856 > URL: https://issues.apache.org/jira/browse/SPARK-40856 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40588) Sorting issue with AQE turned on
[ https://issues.apache.org/jira/browse/SPARK-40588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622620#comment-17622620 ] Enrico Minack commented on SPARK-40588: --- Even with AQE enabled (pre Spark 3.4.0), the written files are sorted {*}unless spilling occurs{*}. The reason is that {{FileFormatWriter}} defines a {{requiredOrdering}} as: [https://github.com/apache/spark/blob/f74867bddfbcdd4d08076db36851e88b15e66556/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L188-L189] {code:java} val requiredOrdering = partitionColumns ++ writerBucketSpec.map(_.bucketIdExpression) ++ sortColumns {code} Where {{partitionColumns}} refers to {{.write.partitionBy}} and {{sortColumns}} refers to {{.write.sortBy}}, so {{["year", "month", "day"]}} in your case. It enforces that ordering if the DataFrame is not sorted accordingly. With AQE enabled (pre Spark 3.4), {{FileFormatWriter}} does not know about the existing ordering and introduces the sorting. This reads the sorted DataFrame (sorted by {{["year", "month", "day", "sortCol"]}}) and sorts it by {{["year", "month", "day"]}}. If the partition has to be spilled to RAM or disk, it round-robins over the spills, because they are all "equal" w.r.t. {{["year", "month", "day"]}}, as all next data in the spill files of a partition have the same values for these columns (only {{sortCol}} differs, but that is not considered by this sort). Hence the order is broken by spilling. > Sorting issue with AQE turned on > -- > > Key: SPARK-40588 > URL: https://issues.apache.org/jira/browse/SPARK-40588 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.3 > Environment: Spark v3.1.3 > Scala v2.12.13 >Reporter: Swetha Baskaran >Priority: Major > Attachments: image-2022-10-16-22-05-47-159.png > > > We are attempting to partition data by a few columns, sort by a particular > _sortCol_ and write out one file per partition. > {code:java} > df > .repartition(col("day"), col("month"), col("year")) > .withColumn("partitionId",spark_partition_id) > .withColumn("monotonicallyIncreasingIdUnsorted",monotonicallyIncreasingId) > .sortWithinPartitions("year", "month", "day", "sortCol") > .withColumn("monotonicallyIncreasingIdSorted",monotonicallyIncreasingId) > .write > .partitionBy("year", "month", "day") > .parquet(path){code} > When inspecting the results, we observe one file per partition, however we > see an _alternating_ pattern of unsorted rows in some files. 
> {code:java} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287832121344,"monotonicallyIncreasingIdSorted":6287832121344} > {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287877022389,"monotonicallyIncreasingIdSorted":6287876860586} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287877567881,"monotonicallyIncreasingIdSorted":6287832121345} > {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287835105553,"monotonicallyIncreasingIdSorted":6287876860587} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287832570127,"monotonicallyIncreasingIdSorted":6287832121346} > {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287879965760,"monotonicallyIncreasingIdSorted":6287876860588} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287878762347,"monotonicallyIncreasingIdSorted":6287832121347} > {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287837165012,"monotonicallyIncreasingIdSorted":6287876860589} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287832910545,"monotonicallyIncreasingIdSorted":6287832121348} > {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287881244758,"monotonicallyIncreasingIdSorted":6287876860590} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287880041345,"monotonicallyIncreasingIdSorted":6287832121349}{code} > Here is a > [gist|https://gist.github.com/Swebask/543030748a768be92d3c0ae343d2ae89] to > reproduce the issue. > Turning off AQE with spark.conf.set("spark.sql.adaptive.enabled", false) > fixes the issue. > I'm working on identifying why AQE affects the sort order. Any leads or > thoughts would be appreciated! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-40588) Sorting issue with AQE turned on
[ https://issues.apache.org/jira/browse/SPARK-40588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17621032#comment-17621032 ] Enrico Minack edited comment on SPARK-40588 at 10/22/22 1:02 PM: - Here is a more concise and complete example to reproduce this issue: {code:scala} import org.apache.spark.sql.SaveMode spark.conf.set("spark.sql.adaptive.enabled", true) val ids = 100 val days = 2 val parts = 2 val ds = spark.range(0, days, 1, parts).withColumnRenamed("id", "day") .join(spark.range(0, ids, 1, parts)) ds.repartition($"day") .sortWithinPartitions($"day", $"id") .write .partitionBy("day") .mode(SaveMode.Overwrite) .csv("interleaved.csv") {code} Check the written files are sorted (states {{OK}} when file is sorted): {code:bash} for file in interleaved.csv/day\=*/part-* do echo "$(sort -n "$file" | md5sum | cut -d " " -f 1) $file" done | md5sum -c {code} Files are not sorted for Spark 3.0.x, 3.1.x, 3.2.x and 3.3.x. Current master (3.4.0) seems to be fixed. was (Author: enricomi): Here is a more concise and complete example to reproduce this issue: {code:scala} import org.apache.spark.sql.SaveMode spark.conf.set("spark.sql.adaptive.enabled", true) val ids = 1000 val days = 10 val ds = spark.range(days).withColumnRenamed("id", "day").join(spark.range(ids)) // days * 10 is required, as well as a sufficiently large value for ids (10m) and day (10) ds.repartition(days * 10, $"day") .sortWithinPartitions($"day", $"id") .write .partitionBy("day") .mode(SaveMode.Overwrite) .csv("interleaved.csv") {code} Check the written files are sorted (states {{OK}} when file is sorted): {code:bash} for file in interleaved.csv/day\=*/part-* do echo "$(sort -n "$file" | md5sum | cut -d " " -f 1) $file" done | md5sum -c {code} Files are not sorted for Spark 3.0.x, 3.1.x, 3.2.x and 3.3.x. Current master (3.4.0) seems to be fixed. > Sorting issue with AQE turned on > -- > > Key: SPARK-40588 > URL: https://issues.apache.org/jira/browse/SPARK-40588 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.3 > Environment: Spark v3.1.3 > Scala v2.12.13 >Reporter: Swetha Baskaran >Priority: Major > Attachments: image-2022-10-16-22-05-47-159.png > > > We are attempting to partition data by a few columns, sort by a particular > _sortCol_ and write out one file per partition. > {code:java} > df > .repartition(col("day"), col("month"), col("year")) > .withColumn("partitionId",spark_partition_id) > .withColumn("monotonicallyIncreasingIdUnsorted",monotonicallyIncreasingId) > .sortWithinPartitions("year", "month", "day", "sortCol") > .withColumn("monotonicallyIncreasingIdSorted",monotonicallyIncreasingId) > .write > .partitionBy("year", "month", "day") > .parquet(path){code} > When inspecting the results, we observe one file per partition, however we > see an _alternating_ pattern of unsorted rows in some files. 
> {code:java} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287832121344,"monotonicallyIncreasingIdSorted":6287832121344} > {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287877022389,"monotonicallyIncreasingIdSorted":6287876860586} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287877567881,"monotonicallyIncreasingIdSorted":6287832121345} > {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287835105553,"monotonicallyIncreasingIdSorted":6287876860587} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287832570127,"monotonicallyIncreasingIdSorted":6287832121346} > {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287879965760,"monotonicallyIncreasingIdSorted":6287876860588} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287878762347,"monotonicallyIncreasingIdSorted":6287832121347} > {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287837165012,"monotonicallyIncreasingIdSorted":6287876860589} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287832910545,"monotonicallyIncreasingIdSorted":6287832121348} > {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287881244758,"monotonicallyIncreasingIdSorted":6287876860590} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287880041345,"monotonicallyIncreasingIdSorted":6287832121349}{code} > Here is a > [gist|https://gist.github.com/Swebask/543030748a768be92d3c0ae343d2ae89] to > reproduce the issue. > Turning off AQE with spark.conf.set("spark.sql.adaptive.enabled", false) > fixes the issue. > I'm working on identifying why AQE affects the sort ord
[jira] [Assigned] (SPARK-40752) Migrate type check failures of misc expressions onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40752: Assignee: (was: Apache Spark) > Migrate type check failures of misc expressions onto error classes > -- > > Key: SPARK-40752 > URL: https://issues.apache.org/jira/browse/SPARK-40752 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > > Replace TypeCheckFailure by DataTypeMismatch in type checks in the misc > expressions: > 1. Coalesce(1) > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala#L60 > 2. SortOrder(1) > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala#L75 > 3. UnwrapUDT(1): > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/UnwrapUDT.scala#L36 > 4. ParseUrl(1): > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/urlExpressions.scala#L185 > 5. XPathExtract(1) > https://github.com/apache/spark/blob/a241256ed0778005245253fb147db8a16105f75c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xml/xpath.scala#L45 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40752) Migrate type check failures of misc expressions onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40752: Assignee: Apache Spark > Migrate type check failures of misc expressions onto error classes > -- > > Key: SPARK-40752 > URL: https://issues.apache.org/jira/browse/SPARK-40752 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Replace TypeCheckFailure by DataTypeMismatch in type checks in the misc > expressions: > 1. Coalesce(1) > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala#L60 > 2. SortOrder(1) > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala#L75 > 3. UnwrapUDT(1): > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/UnwrapUDT.scala#L36 > 4. ParseUrl(1): > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/urlExpressions.scala#L185 > 5. XPathExtract(1) > https://github.com/apache/spark/blob/a241256ed0778005245253fb147db8a16105f75c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xml/xpath.scala#L45 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40752) Migrate type check failures of misc expressions onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622611#comment-17622611 ] Apache Spark commented on SPARK-40752: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/38350 > Migrate type check failures of misc expressions onto error classes > -- > > Key: SPARK-40752 > URL: https://issues.apache.org/jira/browse/SPARK-40752 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > > Replace TypeCheckFailure by DataTypeMismatch in type checks in the misc > expressions: > 1. Coalesce(1) > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala#L60 > 2. SortOrder(1) > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala#L75 > 3. UnwrapUDT(1): > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/UnwrapUDT.scala#L36 > 4. ParseUrl(1): > https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/urlExpressions.scala#L185 > 5. XPathExtract(1) > https://github.com/apache/spark/blob/a241256ed0778005245253fb147db8a16105f75c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xml/xpath.scala#L45 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
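To make the requested migration concrete, below is a hedged sketch of the kind of change this ticket asks for. The DataTypeMismatch error sub-class and message parameter names are assumptions chosen for illustration, not the contents of the eventual patch.

{code:scala}
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.DataTypeMismatch

// Illustrative check in the spirit of Coalesce.checkInputDataTypes.
// Before: TypeCheckResult.TypeCheckFailure("input to function coalesce should all be the same type")
// After (assumed error sub-class and parameter names):
def checkAllSameType(typeNames: Seq[String]): TypeCheckResult = {
  if (typeNames.distinct.size > 1) {
    DataTypeMismatch(
      errorSubClass = "DATA_DIFF_TYPES",
      messageParameters = Map(
        "functionName" -> "coalesce",
        "dataType" -> typeNames.mkString("[", ", ", "]")))
  } else {
    TypeCheckResult.TypeCheckSuccess
  }
}
{code}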
[jira] [Updated] (SPARK-39404) Unable to query _metadata in streaming if getBatch returns multiple logical nodes in the DataFrame
[ https://issues.apache.org/jira/browse/SPARK-39404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-39404: - Fix Version/s: 3.3.2 > Unable to query _metadata in streaming if getBatch returns multiple logical > nodes in the DataFrame > -- > > Key: SPARK-39404 > URL: https://issues.apache.org/jira/browse/SPARK-39404 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.2.1 >Reporter: Yaohua Zhao >Assignee: Yaohua Zhao >Priority: Major > Fix For: 3.4.0, 3.3.2 > > > Here: > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L585] > > We should probably `transform` instead of `match` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
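For context on the `transform` vs `match` suggestion above, a hedged illustration of the difference (the rewrite rule is a placeholder, not the actual MicroBatchExecution logic): `match` only inspects the root node of a logical plan, while `transform` applies the partial function to every node, which is what matters when getBatch returns a DataFrame whose plan contains multiple logical nodes.

{code:scala}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Hypothetical helper; `rule` stands in for the real metadata-column rewrite.
def rewriteEverywhere(plan: LogicalPlan)(rule: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan = {
  // A root-only rewrite would look like: if (rule.isDefinedAt(plan)) rule(plan) else plan
  plan.transform(rule)  // visits every node in the tree, not just the root
}
{code}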
[jira] [Comment Edited] (SPARK-40808) Infer schema for CSV files - wrong behavior using header + merge schema
[ https://issues.apache.org/jira/browse/SPARK-40808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622594#comment-17622594 ] ming95 edited comment on SPARK-40808 at 10/22/22 9:39 AM: -- [~ohadm] In spark , infer csv schema will skip first line when set header option is true. But only the header of one file will be regarded as the real first line, which means that if there are two files with different headers, the header of one file will be used as data to infer schema. In this case , You can keep all files with the same header to pass unit test4. {code:java} //file2.csv "int_col","string_col","double_col","int2_col" 12,"hello2",1.432 22,"world2",5.5342 32,"my name2",86.4552 42,"is ohad2",6.2342 {code} Read the csv directory, it is reasonable to assume that all files in the directory have the same schema by default. If there are no other doubts, i will mark this issue as resolved. was (Author: zing): [~ohadm] In spark , infer csv schema will skip first line when set header option is true. But only the header of one file will be regarded as the real first line, which means that if there are two files with different headers, the header of one file will be used as data to infer schema. In this case , You can keep all files with the same header to pass unit test4. {code:java} //file2.csv "int_col","string_col","double_col","int2_col" 12,"hello2",1.432 22,"world2",5.5342 32,"my name2",86.4552 42,"is ohad2",6.2342 {code} > Infer schema for CSV files - wrong behavior using header + merge schema > --- > > Key: SPARK-40808 > URL: https://issues.apache.org/jira/browse/SPARK-40808 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.2 >Reporter: ohad >Priority: Major > Labels: CSVReader, csv, csvparser > Attachments: test_csv.py > > > Hello. > I am writing unit-tests to some functionality in my application that reading > data from CSV files using Spark. > I am reading the data using: > {code:java} > header=True > mergeSchema=True > inferSchema=True{code} > When I am reading this single file: > {code:java} > File1: > "int_col","string_col","decimal_col","date_col" > 1,"hello",1.43,2022-02-23 > 2,"world",5.534,2021-05-05 > 3,"my name",86.455,2011-08-15 > 4,"is ohad",6.234,2002-03-22{code} > I am getting this schema: > {code:java} > int_col=int > string_col=string > decimal_col=double > date_col=string{code} > When I am duplicating this file, I am getting the same schema. 
> The strange part is when I am adding new int column, it looks like spark is > getting confused and think that the column that already identified as int are > now string: > {code:java} > File1: > "int_col","string_col","decimal_col","date_col" > 1,"hello",1.43,2022-02-23 > 2,"world",5.534,2021-05-05 > 3,"my name",86.455,2011-08-15 > 4,"is ohad",6.234,2002-03-22 > File2: > "int_col","string_col","decimal_col","date_col","int2_col" > 1,"hello",1.43,2022-02-23,234 > 2,"world",5.534,2021-05-05,5 > 3,"my name",86.455,2011-08-15,32 > 4,"is ohad",6.234,2002-03-22,2 > {code} > result: > {code:java} > int_col=string > string_col=string > decimal_col=string > date_col=string > int2_col=int{code} > When I am reading only the second file, it looks fine: > {code:java} > File2: > "int_col","string_col","decimal_col","date_col","int2_col" > 1,"hello",1.43,2022-02-23,234 > 2,"world",5.534,2021-05-05,5 > 3,"my name",86.455,2011-08-15,32 > 4,"is ohad",6.234,2002-03-22,2{code} > result: > {code:java} > int_col=int > string_col=string > decimal_col=double > date_col=string > int2_col=int{code} > For conclusion, it looks like there is a bug mixing the two features: header > recognition and merge schema. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40808) Infer schema for CSV files - wrong behavior using header + merge schema
[ https://issues.apache.org/jira/browse/SPARK-40808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622594#comment-17622594 ] ming95 commented on SPARK-40808: [~ohadm] In spark , infer csv schema will skip first line when set header option is true. But only the header of one file will be regarded as the real first line, which means that if there are two files with different headers, the header of one file will be used as data to infer schema. In this case , You can keep all files with the same header to pass unit test4. {code:java} //file2.csv "int_col","string_col","double_col","int2_col" 12,"hello2",1.432 22,"world2",5.5342 32,"my name2",86.4552 42,"is ohad2",6.2342 {code} > Infer schema for CSV files - wrong behavior using header + merge schema > --- > > Key: SPARK-40808 > URL: https://issues.apache.org/jira/browse/SPARK-40808 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.2 >Reporter: ohad >Priority: Major > Labels: CSVReader, csv, csvparser > Attachments: test_csv.py > > > Hello. > I am writing unit-tests to some functionality in my application that reading > data from CSV files using Spark. > I am reading the data using: > {code:java} > header=True > mergeSchema=True > inferSchema=True{code} > When I am reading this single file: > {code:java} > File1: > "int_col","string_col","decimal_col","date_col" > 1,"hello",1.43,2022-02-23 > 2,"world",5.534,2021-05-05 > 3,"my name",86.455,2011-08-15 > 4,"is ohad",6.234,2002-03-22{code} > I am getting this schema: > {code:java} > int_col=int > string_col=string > decimal_col=double > date_col=string{code} > When I am duplicating this file, I am getting the same schema. > The strange part is when I am adding new int column, it looks like spark is > getting confused and think that the column that already identified as int are > now string: > {code:java} > File1: > "int_col","string_col","decimal_col","date_col" > 1,"hello",1.43,2022-02-23 > 2,"world",5.534,2021-05-05 > 3,"my name",86.455,2011-08-15 > 4,"is ohad",6.234,2002-03-22 > File2: > "int_col","string_col","decimal_col","date_col","int2_col" > 1,"hello",1.43,2022-02-23,234 > 2,"world",5.534,2021-05-05,5 > 3,"my name",86.455,2011-08-15,32 > 4,"is ohad",6.234,2002-03-22,2 > {code} > result: > {code:java} > int_col=string > string_col=string > decimal_col=string > date_col=string > int2_col=int{code} > When I am reading only the second file, it looks fine: > {code:java} > File2: > "int_col","string_col","decimal_col","date_col","int2_col" > 1,"hello",1.43,2022-02-23,234 > 2,"world",5.534,2021-05-05,5 > 3,"my name",86.455,2011-08-15,32 > 4,"is ohad",6.234,2002-03-22,2{code} > result: > {code:java} > int_col=int > string_col=string > decimal_col=double > date_col=string > int2_col=int{code} > For conclusion, it looks like there is a bug mixing the two features: header > recognition and merge schema. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
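A minimal Scala sketch of the read pattern under discussion (the directory path is a placeholder and the options mirror the report): during schema inference with header enabled, only one file's first line is treated as a header, so a second file with a different header contributes that header row as data, and previously numeric columns are inferred as string.

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("csv-infer-schema-sketch").getOrCreate()

// Hypothetical directory containing file1.csv and file2.csv from the report.
val df = spark.read
  .option("header", "true")       // only one file's header row is skipped during inference
  .option("inferSchema", "true")
  .option("mergeSchema", "true")  // option as used in the report; CSV may not honour it
  .csv("/path/to/csv_dir")

df.printSchema()  // int_col/decimal_col can degrade to string when file headers differ
{code}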
[jira] [Commented] (SPARK-40884) Upgrade fabric8io - kubernetes-client to 6.2.0
[ https://issues.apache.org/jira/browse/SPARK-40884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622582#comment-17622582 ] Apache Spark commented on SPARK-40884: -- User 'bjornjorgensen' has created a pull request for this issue: https://github.com/apache/spark/pull/38348 > Upgrade fabric8io - kubernetes-client to 6.2.0 > -- > > Key: SPARK-40884 > URL: https://issues.apache.org/jira/browse/SPARK-40884 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Priority: Major > > [Release > notes|https://github.com/fabric8io/kubernetes-client/releases/tag/v6.2.0] > [Snakeyaml version should be updated to mitigate > CVE-2022-28857|https://github.com/fabric8io/kubernetes-client/issues/4383] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40884) Upgrade fabric8io - kubernetes-client to 6.2.0
[ https://issues.apache.org/jira/browse/SPARK-40884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40884: Assignee: Apache Spark > Upgrade fabric8io - kubernetes-client to 6.2.0 > -- > > Key: SPARK-40884 > URL: https://issues.apache.org/jira/browse/SPARK-40884 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Apache Spark >Priority: Major > > [Release > notes|https://github.com/fabric8io/kubernetes-client/releases/tag/v6.2.0] > [Snakeyaml version should be updated to mitigate > CVE-2022-28857|https://github.com/fabric8io/kubernetes-client/issues/4383] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40884) Upgrade fabric8io - kubernetes-client to 6.2.0
[ https://issues.apache.org/jira/browse/SPARK-40884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40884: Assignee: (was: Apache Spark) > Upgrade fabric8io - kubernetes-client to 6.2.0 > -- > > Key: SPARK-40884 > URL: https://issues.apache.org/jira/browse/SPARK-40884 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Priority: Major > > [Release > notes|https://github.com/fabric8io/kubernetes-client/releases/tag/v6.2.0] > [Snakeyaml version should be updated to mitigate > CVE-2022-28857|https://github.com/fabric8io/kubernetes-client/issues/4383] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40884) Upgrade fabric8io - kubernetes-client to 6.2.0
[ https://issues.apache.org/jira/browse/SPARK-40884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622581#comment-17622581 ] Apache Spark commented on SPARK-40884: -- User 'bjornjorgensen' has created a pull request for this issue: https://github.com/apache/spark/pull/38348 > Upgrade fabric8io - kubernetes-client to 6.2.0 > -- > > Key: SPARK-40884 > URL: https://issues.apache.org/jira/browse/SPARK-40884 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Priority: Major > > [Release > notes|https://github.com/fabric8io/kubernetes-client/releases/tag/v6.2.0] > [Snakeyaml version should be updated to mitigate > CVE-2022-28857|https://github.com/fabric8io/kubernetes-client/issues/4383] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40884) Upgrade fabric8io - kubernetes-client to 6.2.0
Bjørn Jørgensen created SPARK-40884: --- Summary: Upgrade fabric8io - kubernetes-client to 6.2.0 Key: SPARK-40884 URL: https://issues.apache.org/jira/browse/SPARK-40884 Project: Spark Issue Type: Dependency upgrade Components: Build Affects Versions: 3.4.0 Reporter: Bjørn Jørgensen [Release notes|https://github.com/fabric8io/kubernetes-client/releases/tag/v6.2.0] [Snakeyaml version should be updated to mitigate CVE-2022-28857|https://github.com/fabric8io/kubernetes-client/issues/4383] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40883) Support Range in Connect proto
[ https://issues.apache.org/jira/browse/SPARK-40883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622576#comment-17622576 ] Apache Spark commented on SPARK-40883: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38347 > Support Range in Connect proto > -- > > Key: SPARK-40883 > URL: https://issues.apache.org/jira/browse/SPARK-40883 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40883) Support Range in Connect proto
[ https://issues.apache.org/jira/browse/SPARK-40883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622575#comment-17622575 ] Apache Spark commented on SPARK-40883: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38347 > Support Range in Connect proto > -- > > Key: SPARK-40883 > URL: https://issues.apache.org/jira/browse/SPARK-40883 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40883) Support Range in Connect proto
[ https://issues.apache.org/jira/browse/SPARK-40883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40883: Assignee: Apache Spark > Support Range in Connect proto > -- > > Key: SPARK-40883 > URL: https://issues.apache.org/jira/browse/SPARK-40883 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40883) Support Range in Connect proto
[ https://issues.apache.org/jira/browse/SPARK-40883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40883: Assignee: (was: Apache Spark) > Support Range in Connect proto > -- > > Key: SPARK-40883 > URL: https://issues.apache.org/jira/browse/SPARK-40883 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40883) Support Range in Connect proto
Rui Wang created SPARK-40883: Summary: Support Range in Connect proto Key: SPARK-40883 URL: https://issues.apache.org/jira/browse/SPARK-40883 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40881) Upgrade actions/cache to v3 and actions/upload-artifact to v3
[ https://issues.apache.org/jira/browse/SPARK-40881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yikun Jiang updated SPARK-40881: Summary: Upgrade actions/cache to v3 and actions/upload-artifact to v3 (was: Upgrade actions/cache and actions/upload-artifact) > Upgrade actions/cache to v3 and actions/upload-artifact to v3 > - > > Key: SPARK-40881 > URL: https://issues.apache.org/jira/browse/SPARK-40881 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40882) Upgrade actions/setup-java to v3 with distribution specified
Yikun Jiang created SPARK-40882: --- Summary: Upgrade actions/setup-java to v3 with distribution specified Key: SPARK-40882 URL: https://issues.apache.org/jira/browse/SPARK-40882 Project: Spark Issue Type: Sub-task Components: Project Infra Affects Versions: 3.4.0 Reporter: Yikun Jiang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40881) Upgrade actions/cache and actions/upload-artifact
Yikun Jiang created SPARK-40881: --- Summary: Upgrade actions/cache and actions/upload-artifact Key: SPARK-40881 URL: https://issues.apache.org/jira/browse/SPARK-40881 Project: Spark Issue Type: Sub-task Components: Project Infra Affects Versions: 3.4.0 Reporter: Yikun Jiang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40871) Upgrade actions/script to v6 and fix notify workflow
[ https://issues.apache.org/jira/browse/SPARK-40871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yikun Jiang updated SPARK-40871: Summary: Upgrade actions/script to v6 and fix notify workflow (was: Upgrade actions/script to v6) > Upgrade actions/script to v6 and fix notify workflow > > > Key: SPARK-40871 > URL: https://issues.apache.org/jira/browse/SPARK-40871 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40870) Upgrade docker actions to cleanup warning
[ https://issues.apache.org/jira/browse/SPARK-40870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yikun Jiang reassigned SPARK-40870: --- Assignee: Yikun Jiang > Upgrade docker actions to cleanup warning > - > > Key: SPARK-40870 > URL: https://issues.apache.org/jira/browse/SPARK-40870 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > > docker/setup-qemu-action@v2 > docker/setup-buildx-action@v2 > docker/build-push-action@v3 > docker/login-action@v2 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40870) Upgrade docker actions to cleanup warning
[ https://issues.apache.org/jira/browse/SPARK-40870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yikun Jiang resolved SPARK-40870. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38342 [https://github.com/apache/spark/pull/38342] > Upgrade docker actions to cleanup warning > - > > Key: SPARK-40870 > URL: https://issues.apache.org/jira/browse/SPARK-40870 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 3.4.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > Fix For: 3.4.0 > > > docker/setup-qemu-action@v2 > docker/setup-buildx-action@v2 > docker/build-push-action@v3 > docker/login-action@v2 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org