[ https://issues.apache.org/jira/browse/SPARK-44512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17746715#comment-17746715 ]
Yiu-Chung Lee commented on SPARK-44512:
---------------------------------------

No. After testing with another set of production data, setting spark.sql.optimizer.plannedWrite.enabled=false does not solve the problem either.

> dataset.sort.select.write.partitionBy does not return a sorted output
> ---------------------------------------------------------------------
>
>                 Key: SPARK-44512
>                 URL: https://issues.apache.org/jira/browse/SPARK-44512
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.1
>            Reporter: Yiu-Chung Lee
>            Priority: Major
>              Labels: correctness
>
> (In this example the dataset is of type Tuple3, and the columns are named _1,
> _2 and _3)
>
> I found that when AQE is enabled, the following code does not produce sorted
> output (.drop() also has the same problem)
> {{dataset.sort("_1")}}
> {{.select("_2", "_3")}}
> {{.write()}}
> {{.partitionBy("_2")}}
> {{.text("output");}}
>
> However, if I insert an identity mapper between select and write, the output
> is sorted as expected.
> {{dataset = dataset.sort("_1")}}
> {{.select("_2", "_3");}}
> {{dataset.map((MapFunction<Row, Row>) row -> row, dataset.encoder())}}
> {{.write()}}
> {{.partitionBy("_2")}}
> {{.text("output");}}
> Below is the complete code that reproduces the problem.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org