[ https://issues.apache.org/jira/browse/SPARK-12491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073092#comment-15073092 ]
Herman van Hovell commented on SPARK-12491:
-------------------------------------------

I did some testing on a Spark cluster, and I can reproduce your problem. I get wrong results (though not 0's) the first time I run the aggregate; the results are correct when I run the same query again. Could you confirm this by swapping the two queries and running them a few times? Only the first run should produce wrong results.

@[~yhuai] any thoughts?

> UDAF result differs in SQL if alias is used
> -------------------------------------------
>
>                 Key: SPARK-12491
>                 URL: https://issues.apache.org/jira/browse/SPARK-12491
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.2
>            Reporter: Tristan
>         Attachments: UDAF_GM.zip
>
>
> Using the GeometricMean UDAF example
> (https://databricks.com/blog/2015/09/16/spark-1-5-dataframe-api-highlights-datetimestring-handling-time-intervals-and-udafs.html),
> I found the following discrepancy in results:
> {code}
> scala> sqlContext.sql("select group_id, gm(id) from simple group by group_id").show()
> +--------+---+
> |group_id|_c1|
> +--------+---+
> |       0|0.0|
> |       1|0.0|
> |       2|0.0|
> +--------+---+
> scala> sqlContext.sql("select group_id, gm(id) as GeometricMean from simple group by group_id").show()
> +--------+-----------------+
> |group_id|    GeometricMean|
> +--------+-----------------+
> |       0|8.981385496571725|
> |       1|7.301716979342118|
> |       2|7.706253151292568|
> +--------+-----------------+
> {code}