[ https://issues.apache.org/jira/browse/SPARK-8712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yin Huai updated SPARK-8712: ---------------------------- Description: Seems we ignored distinct keyword. Also, for our master, I got {code} scala> Seq((1, 2, 3)).toDF("i", "j", "k").registerTempTable("t") scala> sql("select count(distinct j) over (partition by i) from t").explain(true) == Parsed Logical Plan == 'Project [UnresolvedAlias CountDistinct UnresolvedAttribute [j] ] 'UnresolvedRelation [t], None == Analyzed Logical Plan == _c0: bigint Aggregate [COUNT(DISTINCT j#23) AS _c0#27L] Subquery t Project [_1#19 AS i#22,_2#20 AS j#23,_3#21 AS k#24] LocalRelation [_1#19,_2#20,_3#21], [[1,2,3]] == Optimized Logical Plan == Aggregate [COUNT(DISTINCT j#23) AS _c0#27L] LocalRelation [j#23], [[2]] == Physical Plan == GeneratedAggregate false, [CombineAndCount(partialSets#28) AS _c0#27L], false Exchange SinglePartition GeneratedAggregate true, [AddToHashSet(j#23) AS partialSets#28], false LocalTableScan [j#23], [[2]] Code Generation: true == RDD == scala> sql("select count(j) over (partition by i) from t").explain(true) == Parsed Logical Plan == 'Project [UnresolvedAlias WindowExpression UnresolvedWindowFunction count UnresolvedAttribute [j] WindowSpecDefinition UnspecifiedFrame UnresolvedAttribute [i] ] 'UnresolvedRelation [t], None == Analyzed Logical Plan == _c0: bigint Project [_c0#31L] Project [j#23,i#22,_c0#31L,_c0#31L] Window [j#23,i#22], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount(j#23) WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS _c0#31L], WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING Project [j#23,i#22] Subquery t Project [_1#19 AS i#22,_2#20 AS j#23,_3#21 AS k#24] LocalRelation [_1#19,_2#20,_3#21], [[1,2,3]] == Optimized Logical Plan == Project [_c0#31L] Window [j#23,i#22], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount(j#23) WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS _c0#31L], WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING LocalRelation [j#23,i#22], [[2,1]] == Physical Plan == Project [_c0#31L] Window [j#23,i#22], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount(j#23) WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS _c0#31L], WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ExternalSort [i#22 ASC], false Exchange (HashPartitioning 200) LocalTableScan [j#23,i#22], [[2,1]] Code Generation: true == RDD == {code} was:Seems we ignored distinct keyword. > Window function does not work with distinct aggregations > -------------------------------------------------------- > > Key: SPARK-8712 > URL: https://issues.apache.org/jira/browse/SPARK-8712 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 1.4.0 > Reporter: Yin Huai > Assignee: Yin Huai > > Seems we ignored distinct keyword. Also, for our master, I got > {code} > scala> Seq((1, 2, 3)).toDF("i", "j", "k").registerTempTable("t") > scala> sql("select count(distinct j) over (partition by i) from > t").explain(true) > == Parsed Logical Plan == > 'Project [UnresolvedAlias > CountDistinct > UnresolvedAttribute [j] > ] > 'UnresolvedRelation [t], None > == Analyzed Logical Plan == > _c0: bigint > Aggregate [COUNT(DISTINCT j#23) AS _c0#27L] > Subquery t > Project [_1#19 AS i#22,_2#20 AS j#23,_3#21 AS k#24] > LocalRelation [_1#19,_2#20,_3#21], [[1,2,3]] > == Optimized Logical Plan == > Aggregate [COUNT(DISTINCT j#23) AS _c0#27L] > LocalRelation [j#23], [[2]] > == Physical Plan == > GeneratedAggregate false, [CombineAndCount(partialSets#28) AS _c0#27L], false > Exchange SinglePartition > GeneratedAggregate true, [AddToHashSet(j#23) AS partialSets#28], false > LocalTableScan [j#23], [[2]] > Code Generation: true > == RDD == > scala> sql("select count(j) over (partition by i) from t").explain(true) > == Parsed Logical Plan == > 'Project [UnresolvedAlias > WindowExpression > UnresolvedWindowFunction count > UnresolvedAttribute [j] > WindowSpecDefinition UnspecifiedFrame > UnresolvedAttribute [i] > ] > 'UnresolvedRelation [t], None > == Analyzed Logical Plan == > _c0: bigint > Project [_c0#31L] > Project [j#23,i#22,_c0#31L,_c0#31L] > Window [j#23,i#22], > [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount(j#23) > WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED > FOLLOWING AS _c0#31L], WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING > AND UNBOUNDED FOLLOWING > Project [j#23,i#22] > Subquery t > Project [_1#19 AS i#22,_2#20 AS j#23,_3#21 AS k#24] > LocalRelation [_1#19,_2#20,_3#21], [[1,2,3]] > == Optimized Logical Plan == > Project [_c0#31L] > Window [j#23,i#22], > [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount(j#23) > WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED > FOLLOWING AS _c0#31L], WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING > AND UNBOUNDED FOLLOWING > LocalRelation [j#23,i#22], [[2,1]] > == Physical Plan == > Project [_c0#31L] > Window [j#23,i#22], > [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount(j#23) > WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED > FOLLOWING AS _c0#31L], WindowSpecDefinition ROWS BETWEEN UNBOUNDED PRECEDING > AND UNBOUNDED FOLLOWING > ExternalSort [i#22 ASC], false > Exchange (HashPartitioning 200) > LocalTableScan [j#23,i#22], [[2,1]] > Code Generation: true > == RDD == > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org