[ https://issues.apache.org/jira/browse/SPARK-25156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yonghwan Lee updated SPARK-25156:
---------------------------------
Description: I performed two inner joins and two left outer joins on five tables. The same query returns different results when it is run multiple times.

Table A
||Column a||Column b||Column c||Column d||
|Long(nullable: false)|Integer(nullable: false)|String(nullable: true)|String(nullable: false)|

Table B
||Column a||Column b||
|Long(nullable: false)|String(nullable: false)|

Table C
||Column a||Column b||
|Integer(nullable: false)|String(nullable: false)|

Table D
||Column a||Column b||Column c||
|Long(nullable: true)|Long(nullable: false)|Integer(nullable: false)|

Table E
||Column a||Column b||Column c||
|Long(nullable: false)|Integer(nullable: false)|String|

Query (Spark SQL)
{code:java}
select A.c, B.b, C.b, D.c, E.c
from A
inner join B on A.a = B.a
inner join C on A.b = C.a
left outer join D on A.d <=> cast(D.a as string)
left outer join E on D.b = E.a and D.c = E.b{code}
I ran the above query 10 times: it returned the correct result (count: 830001460) 7 times and an incorrect result (count: 830001299) 3 times.

In addition, I executed
{code:java}
sql("set spark.sql.shuffle.partitions=801"){code}
Is this a Spark SQL bug?

> Same query returns different result
> -----------------------------------
>
> Key: SPARK-25156
> URL: https://issues.apache.org/jira/browse/SPARK-25156
> Project: Spark
> Issue Type: Question
> Components: Spark Core
> Affects Versions: 2.1.1
> Environment: * Spark Version: 2.1.1
> * Java Version: Java 7
> * Scala Version: 2.11.8
> Reporter: Yonghwan Lee
> Priority: Major
> Labels: Question
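For reference, below is a minimal sketch of how the instability could be checked from the Scala API. It assumes a {{SparkSession}} named {{spark}} and that the five tables are already registered as temporary views named {{A}} through {{E}}; those names and the harness itself are illustrative assumptions, not part of the original report.

{code:scala}
import org.apache.spark.sql.SparkSession

// Hypothetical repro harness: the session and the view names A..E are assumptions.
val spark = SparkSession.builder().appName("SPARK-25156-check").getOrCreate()

// Same setting the reporter applied before running the query.
spark.sql("set spark.sql.shuffle.partitions=801")

val query =
  """select A.c, B.b, C.b, D.c, E.c
    |from A
    |inner join B on A.a = B.a
    |inner join C on A.b = C.a
    |left outer join D on A.d <=> cast(D.a as string)
    |left outer join E on D.b = E.a and D.c = E.b""".stripMargin

// Run the query repeatedly; a deterministic plan should yield a single distinct count.
val counts = (1 to 10).map(_ => spark.sql(query).count())
println(counts.distinct) // more than one distinct value reproduces the reported behaviour
{code}
All ten counts should be identical; seeing more than one distinct value (e.g. 830001460 and 830001299) matches the behaviour described above.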