[ https://issues.apache.org/jira/browse/SPARK-25942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-25942.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

Issue resolved by pull request 22944
[https://github.com/apache/spark/pull/22944]

> Aggregate expressions shouldn't be resolved on AppendColumns
> ------------------------------------------------------------
>
>                 Key: SPARK-25942
>                 URL: https://issues.apache.org/jira/browse/SPARK-25942
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Liang-Chi Hsieh
>            Assignee: Liang-Chi Hsieh
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Dataset.groupByKey brings in new attributes from the key serializer. If the
> key type is the same as the original Dataset's object type, the two
> serializers produce the same output, so the attribute names conflict.
> This is not a problem in most cases, as long as we don't refer to the
> conflicting attributes:
> {code:java}
> val ds: Dataset[(ClassData, Long)] = Seq(ClassData("one", 1), ClassData("two", 2)).toDS()
>   .map(c => ClassData(c.a, c.b + 1))
>   .groupByKey(p => p).count()
> {code}
> But if we use the conflicting attributes, `Analyzer` complains about
> ambiguous references:
> {code}
> val ds = Seq(1, 2, 3).toDS()
> val agg = ds.groupByKey(_ >= 2).agg(sum("value").as[Long], sum($"value" + 1).as[Long])
> {code}
>
> {code:java}
> org.apache.spark.sql.AnalysisException: Reference 'value' is ambiguous, could be: value, value.;
> [info]   at org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.resolve(package.scala:247)
> [info]   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:101)
> [info]   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$38.apply(Analyzer.scala:889)
> [info]   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$38.apply(Analyzer.scala:891)
> ...
> {code}
> Based on the API documentation and implementation details of
> KeyValueGroupedDataset, we should not allow aggregate expressions on
> KeyValueGroupedDataset to access key attributes.
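
The ambiguity described above can be illustrated with a toy model in plain Scala. This is only a sketch and not Spark's Analyzer code: `Attr`, `resolve`, `dataAttrs` and `keyAttrs` are hypothetical names introduced here, and the only facts taken from the report are that both the data column and the appended key column are called `value`, and that aggregate expressions should not see the key attributes.

```scala
object AmbiguityDemo {
  // Hypothetical stand-in for a Catalyst attribute: a column name plus a unique id.
  final case class Attr(name: String, id: Int)

  // Name-based lookup that succeeds only when exactly one candidate matches.
  def resolve(name: String, candidates: Seq[Attr]): Either[String, Attr] =
    candidates.filter(_.name == name) match {
      case Seq(only) => Right(only)
      case Seq()     => Left("cannot resolve '" + name + "'")
      case many =>
        val names = many.map(_.name).mkString(", ")
        Left("Reference '" + name + "' is ambiguous, could be: " + names)
    }

  // Seq(1, 2, 3).toDS() exposes a single column named "value".
  val dataAttrs: Seq[Attr] = Seq(Attr("value", 1))

  // Per the error message in the report, the key column appended by
  // groupByKey(_ >= 2) is also named "value".
  val keyAttrs: Seq[Attr] = Seq(Attr("value", 2))

  def main(args: Array[String]): Unit = {
    // Resolving against data + key attributes finds two matches: ambiguous.
    println(resolve("value", dataAttrs ++ keyAttrs))
    // Resolving against the data attributes only finds exactly one match.
    println(resolve("value", dataAttrs))
  }
}
```

In this model, restricting the candidate list to `dataAttrs` corresponds to the behaviour the description argues for: the aggregate expression never sees the key attributes, so the name lookup has a single match.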