[ https://issues.apache.org/jira/browse/SPARK-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Armbrust reassigned SPARK-2967:
---------------------------------------

    Assignee: Michael Armbrust

> Several SQL unit test failed when sort-based shuffle is enabled
> ---------------------------------------------------------------
>
>                 Key: SPARK-2967
>                 URL: https://issues.apache.org/jira/browse/SPARK-2967
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>            Reporter: Saisai Shao
>            Assignee: Michael Armbrust
>            Priority: Critical
>
> Several SQLQuerySuite unit tests fail when sort-based shuffle is enabled.
> The SQL tests seem to use GenericMutableRow, so every entry in
> ExternalSorter's internal buffer ends up referring to the same object,
> because that object is mutable. It seems rows should be copied before they
> are fed into ExternalSorter.
> The errors are shown below. There are many failures, so only some of them
> are pasted:
> {noformat}
> SQLQuerySuite:
> - SPARK-2041 column name equals tablename
> - SPARK-2407 Added Parser of SQL SUBSTR()
> - index into array
> - left semi greater than predicate
> - index into array of arrays
> - agg *** FAILED ***
>   Results do not match for query:
>   Aggregate ['a], ['a,SUM('b) AS c1#38]
>    UnresolvedRelation None, testData2, None
>
>   == Analyzed Plan ==
>   Aggregate [a#4], [a#4,SUM(CAST(b#5, LongType)) AS c1#38L]
>    SparkLogicalPlan (ExistingRdd [a#4,b#5], MapPartitionsRDD[7] at mapPartitions at basicOperators.scala:215)
>
>   == Physical Plan ==
>   Aggregate false, [a#4], [a#4,SUM(PartialSum#40L) AS c1#38L]
>    Exchange (HashPartitioning [a#4], 200)
>     Aggregate true, [a#4], [a#4,SUM(CAST(b#5, LongType)) AS PartialSum#40L]
>      ExistingRdd [a#4,b#5], MapPartitionsRDD[7] at mapPartitions at basicOperators.scala:215
>
>   == Results ==
>   !== Correct Answer - 3 ==   == Spark Answer - 3 ==
>   !Vector(1, 3)               [1,3]
>   !Vector(2, 3)               [1,3]
>   !Vector(3, 3)               [1,3] (QueryTest.scala:53)
> - aggregates with nulls
> - select *
> - simple select
> - sorting *** FAILED ***
>   Results do not match for query:
>   Sort ['a ASC,'b ASC]
>    Project [*]
>     UnresolvedRelation None, testData2, None
>
>   == Analyzed Plan ==
>   Sort [a#4 ASC,b#5 ASC]
>    Project [a#4,b#5]
>     SparkLogicalPlan (ExistingRdd [a#4,b#5], MapPartitionsRDD[7] at mapPartitions at basicOperators.scala:215)
>
>   == Physical Plan ==
>   Sort [a#4 ASC,b#5 ASC], true
>    Exchange (RangePartitioning [a#4 ASC,b#5 ASC], 200)
>     ExistingRdd [a#4,b#5], MapPartitionsRDD[7] at mapPartitions at basicOperators.scala:215
>
>   == Results ==
>   !== Correct Answer - 6 ==   == Spark Answer - 6 ==
>   !Vector(1, 1)               [3,2]
>   !Vector(1, 2)               [3,2]
>   !Vector(2, 1)               [3,2]
>   !Vector(2, 2)               [3,2]
>   !Vector(3, 1)               [3,2]
>   !Vector(3, 2)               [3,2] (QueryTest.scala:53)
> - limit
> - average
> - average overflow *** FAILED ***
>   Results do not match for query:
>   Aggregate ['b], [AVG('a) AS c0#90,'b]
>    UnresolvedRelation None, largeAndSmallInts, None
>
>   == Analyzed Plan ==
>   Aggregate [b#3], [AVG(CAST(a#2, LongType)) AS c0#90,b#3]
>    SparkLogicalPlan (ExistingRdd [a#2,b#3], MapPartitionsRDD[4] at mapPartitions at basicOperators.scala:215)
>
>   == Physical Plan ==
>   Aggregate false, [b#3], [(CAST(SUM(PartialSum#93L), DoubleType) / CAST(SUM(PartialCount#94L), DoubleType)) AS c0#90,b#3]
>    Exchange (HashPartitioning [b#3], 200)
>     Aggregate true, [b#3], [b#3,COUNT(CAST(a#2, LongType)) AS PartialCount#94L,SUM(CAST(a#2, LongType)) AS PartialSum#93L]
>      ExistingRdd [a#2,b#3], MapPartitionsRDD[4] at mapPartitions at basicOperators.scala:215
>
>   == Results ==
>   !== Correct Answer - 2 ==   == Spark Answer - 2 ==
>   !Vector(2.0, 2)             [2.147483645E9,1]
>   !Vector(2.147483645E9, 1)   [2.147483645E9,1] (QueryTest.scala:53)
> {noformat}
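
The aliasing described in the report can be reproduced outside Spark. The sketch below is plain Python, not Spark code: `MutableRow`, `rows_from`, `buffer_without_copy` and `buffer_with_copy` are hypothetical names chosen for illustration, and the in-memory list only stands in for a buffer such as the one the report attributes to ExternalSorter. It assumes, as the report suggests, that the upstream iterator hands back one reused mutable row per record.

```python
# Plain-Python stand-in for the aliasing problem. All names here are
# hypothetical illustrations, not Spark APIs.

class MutableRow:
    """A row whose fields are overwritten in place, like a reused mutable row."""

    def __init__(self, size):
        self.values = [None] * size

    def update(self, values):
        # Overwrite the existing field list instead of allocating a new one.
        self.values[:] = values
        return self

    def copy(self):
        fresh = MutableRow(len(self.values))
        fresh.values[:] = self.values
        return fresh


def rows_from(records):
    """Yield the same MutableRow instance for every record (object reuse)."""
    row = MutableRow(2)
    for record in records:
        yield row.update(record)


def buffer_without_copy(records):
    # Stores references to the single shared row object.
    return [row for row in rows_from(records)]


def buffer_with_copy(records):
    # Copies each row before storing it, so later updates cannot affect it.
    return [row.copy() for row in rows_from(records)]


records = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)]
aliased = [r.values for r in buffer_without_copy(records)]
copied = [r.values for r in buffer_with_copy(records)]

print(aliased)  # six entries, all [3, 2]
print(copied)   # [[1, 1], [1, 2], [2, 1], [2, 2], [3, 1], [3, 2]]
```

Without the copy, every buffered entry shows the last record written, which is the same pattern as the `[3,2]` rows in the "sorting" failure above. Copying before buffering costs one allocation per row but keeps each buffered entry independent.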