[ https://issues.apache.org/jira/browse/SPARK-30408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
APeng Zhang updated SPARK-30408:
--------------------------------
    Description: 
An orderBy inside a sortBy clause is removed by EliminateSorts.

Code to reproduce:
{code:java}
// Needed outside spark-shell (which imports it automatically); assumes a SparkSession named spark
import spark.implicits._

val dataset = Seq(("a", 1, 4), ("b", 2, 5), ("c", 3, 6)).toDF("a", "b", "c")
// Global sort on b: range-partitions and orders the rows
val groupData = dataset.orderBy("b")
// Local sort on c: should only reorder rows within each existing partition
val sortData = groupData.sortWithinPartitions("c")
{code}
The content of groupData is:
{code:java}
partition 0:
[a,1,4]
partition 1:
[b,2,5]
partition 2:
[c,3,6]{code}
The content of sortData is:
{code:java}
partition 0:
[a,1,4]
partition 1:
[b,2,5],
[c,3,6]{code}
Because sortWithinPartitions only sorts rows inside each partition, sortData is expected to have the same partitioning as groupData. Instead, [b,2,5] and [c,3,6] end up in the same partition, because the orderBy (and its range partitioning on b) has been optimized away.

Unit test to cover this defect, in EliminateSortsSuite.scala:
{code:java}
test("should not remove orderBy in sortBy clause") {
  // In the Catalyst DSL, orderBy builds a global Sort and sortBy a local (per-partition) Sort
  val plan = testRelation.orderBy('a.asc).sortBy('b.desc)
  val optimized = Optimize.execute(plan.analyze)
  val correctAnswer = testRelation.orderBy('a.asc).sortBy('b.desc).analyze
  comparePlans(optimized, correctAnswer)
}{code}
This test currently fails because EliminateSorts removes the orderBy beneath the sortBy.
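A toy model helps show why the inner sort cannot always be dropped. This is a sketch, not Spark code. Plan, Relation, Sort, ToyEliminateSorts and canDropInner are hypothetical stand-ins for Catalyst's logical plan classes and the EliminateSorts rule. The condition in canDropInner is an assumption about what a correct rule must preserve, not the actual fix.

The model treats a global Sort (orderBy) as producing both range partitioning and order. It treats a local Sort (sortBy / sortWithinPartitions) as only reordering rows within existing partitions. Under those assumptions, an inner Sort is redundant in two cases: when the outer Sort is global, or when the inner Sort is itself local. A global Sort beneath a local Sort, however, carries partitioning that nothing else in the plan provides.

```scala
// Toy model only: hypothetical types, NOT Spark's Catalyst classes.
sealed trait Plan
final case class Relation(name: String) extends Plan
// global = true  models orderBy (range partitioning + total order)
// global = false models sortBy / sortWithinPartitions (order within each partition only)
final case class Sort(order: String, global: Boolean, child: Plan) extends Plan

object ToyEliminateSorts {
  // Assumed rule: an inner Sort directly below an outer Sort is redundant if the
  // outer Sort is global (it re-partitions and re-orders everything) or if the
  // inner Sort is local (it never changed the partitioning).
  // buggy = true drops the inner Sort unconditionally, mimicking the reported behaviour.
  private def canDropInner(outerGlobal: Boolean, innerGlobal: Boolean, buggy: Boolean): Boolean =
    buggy || outerGlobal || !innerGlobal

  def optimize(plan: Plan, buggy: Boolean): Plan = plan match {
    case Sort(order, outerGlobal, Sort(_, innerGlobal, grandChild))
        if canDropInner(outerGlobal, innerGlobal, buggy) =>
      // Drop the inner Sort, then re-check the new child (there may be a chain of sorts)
      optimize(Sort(order, outerGlobal, grandChild), buggy)
    case Sort(order, global, child) =>
      // Keep this Sort and continue optimizing below it
      Sort(order, global, optimize(child, buggy))
    case other => other
  }
}

object ToySortDemo {
  def main(args: Array[String]): Unit = {
    val t = Relation("testRelation")
    // Mirrors testRelation.orderBy('a.asc).sortBy('b.desc) from the unit test
    val plan = Sort("b DESC", global = false, Sort("a ASC", global = true, t))
    println(ToyEliminateSorts.optimize(plan, buggy = true))
    // prints Sort(b DESC,false,Relation(testRelation))
    println(ToyEliminateSorts.optimize(plan, buggy = false))
    // prints Sort(b DESC,false,Sort(a ASC,true,Relation(testRelation)))
  }
}
```

With buggy = true, the plan from the unit test collapses to a single local sort, which matches the reported symptom. With the guard, the plan is left unchanged, which is what correctAnswer in the test expects.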
> orderBy in sortBy clause is removed by EliminateSorts
> -----------------------------------------------------
>
>                 Key: SPARK-30408
>                 URL: https://issues.apache.org/jira/browse/SPARK-30408
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4
>            Reporter: APeng Zhang
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)