[ https://issues.apache.org/jira/browse/SPARK-13680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15970399#comment-15970399 ]
Hyukjin Kwon commented on SPARK-13680:
--------------------------------------

Thank you so much for your confirmation. I am resolving this as Cannot Reproduce, per the guidelines. Ideally, someone would identify the JIRA that resolved this and backport the fix if applicable.

> Java UDAF with more than one intermediate argument returns wrong results
> ------------------------------------------------------------------------
>
>                 Key: SPARK-13680
>                 URL: https://issues.apache.org/jira/browse/SPARK-13680
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>         Environment: CDH 5.5.2
>            Reporter: Yael Aharon
>         Attachments: data.csv, setup.hql
>
>
> I am trying to incorporate the Java UDAF from
> https://github.com/apache/spark/blob/master/sql/hive/src/test/java/org/apache/spark/sql/hive/aggregate/MyDoubleAvg.java
> into an SQL query.
> I registered the UDAF like this:
> sqlContext.udf().register("myavg", new MyDoubleAvg());
> My SQL query is:
> SELECT AVG(seqi) AS `avg_seqi`, AVG(seqd) AS `avg_seqd`, AVG(ci) AS `avg_ci`,
> AVG(cd) AS `avg_cd`, AVG(stdevd) AS `avg_stdevd`, AVG(stdevi) AS `avg_stdevi`,
> MAX(seqi) AS `max_seqi`, MAX(seqd) AS `max_seqd`, MAX(ci) AS `max_ci`,
> MAX(cd) AS `max_cd`, MAX(stdevd) AS `max_stdevd`, MAX(stdevi) AS `max_stdevi`,
> MIN(seqi) AS `min_seqi`, MIN(seqd) AS `min_seqd`, MIN(ci) AS `min_ci`,
> MIN(cd) AS `min_cd`, MIN(stdevd) AS `min_stdevd`, MIN(stdevi) AS `min_stdevi`,
> SUM(seqi) AS `sum_seqi`, SUM(seqd) AS `sum_seqd`, SUM(ci) AS `sum_ci`,
> SUM(cd) AS `sum_cd`, SUM(stdevd) AS `sum_stdevd`, SUM(stdevi) AS `sum_stdevi`,
> myavg(seqd) AS `myavg_seqd`, AVG(zero) AS `avg_zero`,
> AVG(nulli) AS `avg_nulli`, AVG(nulld) AS `avg_nulld`, SUM(zero) AS `sum_zero`,
> SUM(nulli) AS `sum_nulli`, SUM(nulld) AS `sum_nulld`, MAX(zero) AS `max_zero`,
> MAX(nulli) AS `max_nulli`, MAX(nulld) AS `max_nulld`, count( * ) AS `count_all`,
> count(nulli) AS `count_nulli` FROM mytable
> As soon as I add the UDAF myavg to the SQL, all the results become incorrect.
> When I remove the call to the UDAF, the results are correct.
> I was able to work around the issue by modifying the bufferSchema of the UDAF
> to use an array, and changing the corresponding update and merge methods accordingly.
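The workaround described above, keeping all intermediate state in a single array-valued buffer field instead of separate sum and count fields, can be illustrated with a minimal plain-Java model. This is a sketch under stated assumptions: it does not use or implement Spark's UserDefinedAggregateFunction API, and the class name ArrayBufferAvg is hypothetical. It only mirrors that API's initialize/update/merge/evaluate lifecycle, with buf[0] holding the running sum and buf[1] the running count.

```java
import java.util.Arrays;

/**
 * Hypothetical plain-Java model (no Spark dependency) of an averaging
 * aggregate whose intermediate state lives in ONE array-valued buffer:
 * buf[0] = running sum, buf[1] = running count (stored as double).
 * Method names mirror the UDAF lifecycle; this is not Spark's API.
 */
final class ArrayBufferAvg {

    // Fresh buffer: sum = 0.0, count = 0.
    static double[] initialize() {
        return new double[] {0.0, 0.0};
    }

    // Fold one input value into the buffer; null inputs are skipped.
    static void update(double[] buf, Double input) {
        if (input == null) {
            return;
        }
        buf[0] += input;
        buf[1] += 1;
    }

    // Combine a partial buffer (e.g. from another partition) into buf1.
    static void merge(double[] buf1, double[] buf2) {
        buf1[0] += buf2[0];
        buf1[1] += buf2[1];
    }

    // Final result; null when no non-null input was seen.
    static Double evaluate(double[] buf) {
        return buf[1] == 0 ? null : buf[0] / buf[1];
    }

    public static void main(String[] args) {
        // Two "partitions" aggregated independently, then merged.
        double[] p1 = initialize();
        for (Double v : Arrays.asList(1.0, 2.0, null)) {
            update(p1, v);
        }
        double[] p2 = initialize();
        for (Double v : Arrays.asList(3.0, 4.0)) {
            update(p2, v);
        }
        merge(p1, p2);
        System.out.println(evaluate(p1)); // prints 2.5
    }
}
```

Because sum and count travel together in one field, update and merge always read and write them as a unit; in a real Spark UDAF the equivalent would be an array-typed field in bufferSchema, as the reporter describes.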