Takeshi Yamamuro created SPARK-20390:
----------------------------------------
Summary: Non-deterministic expressions could exist in grouping keys
Key: SPARK-20390
URL: https://issues.apache.org/jira/browse/SPARK-20390
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.1.0
Reporter: Takeshi Yamamuro

Grouping keys are supposed to contain only deterministic expressions, but a non-deterministic one can end up there in some cases. This is because `AttributeReference` does not respect the `deterministic` property in query plans. In the example below, `rand(0)` is pulled out into a `Project` as `_nondeterministic#92`, and the `Aggregate` then groups by that attribute.

An example is as follows:

{code}
scala> val df = sql("""select rand(0), count(1) group by 1""")
df: org.apache.spark.sql.DataFrame = [rand(0): double, count(1): bigint]

scala> df.explain(true)
== Parsed Logical Plan ==
'Aggregate [1], [unresolvedalias('rand(0), None), unresolvedalias('count(1), None)]
+- OneRowRelation$

== Analyzed Logical Plan ==
rand(0): double, count(1): bigint
Aggregate [_nondeterministic#92], [_nondeterministic#92 AS rand(0)#90, count(1) AS count(1)#91L]
+- Project [rand(0) AS _nondeterministic#92]
   +- OneRowRelation$

== Optimized Logical Plan ==
Aggregate [_nondeterministic#92], [_nondeterministic#92 AS rand(0)#90, count(1) AS count(1)#91L]
+- Project [rand(0) AS _nondeterministic#92]
   +- OneRowRelation$

== Physical Plan ==
*HashAggregate(keys=[_nondeterministic#92], functions=[count(1)], output=[rand(0)#90, count(1)#91L])
+- Exchange hashpartitioning(_nondeterministic#92, 200)
   +- *HashAggregate(keys=[_nondeterministic#92], functions=[partial_count(1)], output=[_nondeterministic#92, count#94L])
      +- *Project [rand(0) AS _nondeterministic#92]
         +- Scan OneRowRelation[]

scala> df.show
+------------------+--------+
|           rand(0)|count(1)|
+------------------+--------+
|0.8446490682263027|       1|
+------------------+--------+
{code}
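To make the mechanism in this report easier to follow, here is a minimal toy model in plain Scala. It is not Spark's Catalyst code: `Expr`, `Rand`, `AttrRef`, `Alias` and `allDeterministic` are hypothetical stand-ins invented for illustration. The one assumption taken from the report is that a reference to a projected column always reports itself as deterministic, whatever expression produced the column.

```scala
// Toy model only; these are NOT Spark's actual Catalyst classes.
sealed trait Expr { def deterministic: Boolean }

// Stand-in for a call such as rand(0): flagged as non-deterministic.
case class Rand(seed: Long) extends Expr { val deterministic = false }

// Stand-in for a reference to a column produced by a child operator.
// Assumption from the report: it always claims to be deterministic and
// knows nothing about the expression that produced the column.
case class AttrRef(name: String) extends Expr { val deterministic = true }

// A named expression in a projection, e.g. rand(0) AS _nondeterministic.
case class Alias(child: Expr, name: String) extends Expr {
  def deterministic: Boolean = child.deterministic
  def toAttribute: AttrRef = AttrRef(name)
}

object GroupingKeyDemo {
  // Hypothetical check: are all grouping keys deterministic?
  def allDeterministic(keys: Seq[Expr]): Boolean = keys.forall(_.deterministic)

  def main(args: Array[String]): Unit = {
    // Grouping directly by rand(0): the check sees the non-deterministic call.
    val directKeys: Seq[Expr] = Seq(Rand(0))

    // Grouping by a reference to a projected rand(0), as in the analyzed
    // plan above: the non-determinism is hidden behind the reference.
    val projected = Alias(Rand(0), "_nondeterministic")
    val rewrittenKeys: Seq[Expr] = Seq(projected.toAttribute)

    println(allDeterministic(directKeys))    // false
    println(allDeterministic(rewrittenKeys)) // true
  }
}
```

In this sketch the second check passes only because the reference has lost track of where its column came from, which is the behavior the report describes for `AttributeReference`.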