[ https://issues.apache.org/jira/browse/SPARK-12491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15071454#comment-15071454 ]
Herman van Hovell commented on SPARK-12491:
-------------------------------------------

I cannot reproduce this issue on 1.5.2 and the latest master in local mode. I used the following code:
{noformat}
import org.apache.spark.sql.expressions.MutableAggregationBuffer
import org.apache.spark.sql.expressions.UserDefinedAggregateFunction
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

class GeometricMean extends UserDefinedAggregateFunction {
  def inputSchema: org.apache.spark.sql.types.StructType =
    StructType(StructField("value", DoubleType) :: Nil)

  def bufferSchema: StructType = StructType(
    StructField("count", LongType) ::
    StructField("product", DoubleType) :: Nil
  )

  def dataType: DataType = DoubleType

  def deterministic: Boolean = true

  def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = 0L
    buffer(1) = 1.0
  }

  def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    buffer(0) = buffer.getAs[Long](0) + 1
    buffer(1) = buffer.getAs[Double](1) * input.getAs[Double](0)
  }

  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getAs[Long](0) + buffer2.getAs[Long](0)
    buffer1(1) = buffer1.getAs[Double](1) * buffer2.getAs[Double](1)
  }

  def evaluate(buffer: Row): Any = {
    math.pow(buffer.getDouble(1), 1.toDouble / buffer.getLong(0))
  }
}

// Create an instance of UDAF GeometricMean.
val gm = new GeometricMean

// Register the udaf
sqlContext.udf.register("gm", gm)

// Create a simple DataFrame with a single column called "id"
// containing number 1 to 10.
val df = sqlContext.range(1, 11).select(($"id" / 3).cast("int").as("group_id"), $"id")
df.registerTempTable("simple")

// Without alias
sqlContext.sql("select group_id, gm(id) from simple group by group_id").show()

// With alias
sqlContext.sql("select group_id, gm(id) as GeometricMean from simple group by group_id").show()
{noformat}

Could you share the logical plans for both queries?
You can get a logical plan by doing this:
{noformat}
// Query without alias
val q = sqlContext.sql("select group_id, gm(id) from simple group by group_id")
q.explain(true)
{noformat}

> UDAF result differs in SQL if alias is used
> -------------------------------------------
>
>                 Key: SPARK-12491
>                 URL: https://issues.apache.org/jira/browse/SPARK-12491
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.2
>            Reporter: Tristan
>
> Using the GeometricMean UDAF example
> (https://databricks.com/blog/2015/09/16/spark-1-5-dataframe-api-highlights-datetimestring-handling-time-intervals-and-udafs.html),
> I found the following discrepancy in results:
> scala> sqlContext.sql("select group_id, gm(id) from simple group by group_id").show()
> +--------+---+
> |group_id|_c1|
> +--------+---+
> |       0|0.0|
> |       1|0.0|
> |       2|0.0|
> +--------+---+
> scala> sqlContext.sql("select group_id, gm(id) as GeometricMean from simple group by group_id").show()
> +--------+-----------------+
> |group_id|    GeometricMean|
> +--------+-----------------+
> |       0|8.981385496571725|
> |       1|7.301716979342118|
> |       2|7.706253151292568|
> +--------+-----------------+
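
As a sanity check that does not depend on Spark, the arithmetic of the GeometricMean UDAF in the reproduction code can be mirrored in plain Scala. This is only a minimal sketch and not part of the original comment: the object `GeometricMeanCheck` and its method names are hypothetical, and it assumes that `$"id" / 3` is evaluated as a double division and that the cast to int truncates toward zero.

```scala
// Plain-Scala cross-check of the UDAF arithmetic; no Spark required.
// All names here are hypothetical helpers introduced for illustration.
object GeometricMeanCheck {

  // Same arithmetic as the UDAF: keep a running count and a running
  // product, then return product^(1/count).
  def geometricMean(values: Seq[Double]): Double = {
    val (count, product) = values.foldLeft((0L, 1.0)) {
      case ((c, p), v) => (c + 1, p * v)
    }
    math.pow(product, 1.0 / count)
  }

  // Mirrors ($"id" / 3).cast("int") for ids 1 to 10, ASSUMING double
  // division followed by truncation toward zero.
  def expectedByGroup: Map[Int, Double] =
    (1L to 10L)
      .groupBy(id => (id / 3.0).toInt)
      .map { case (group, ids) => group -> geometricMean(ids.map(_.toDouble)) }

  def main(args: Array[String]): Unit =
    expectedByGroup.toSeq.sortBy(_._1).foreach { case (group, gm) =>
      println(f"$group%d -> $gm%.6f")
    }
}
```

Comparing these values with the `show()` output of each query should indicate which of the two results, if either, matches the expected arithmetic for the data being queried.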