I’m trying to run a Spark SQL (1.5) query that contains a UDF in both the select and group by clauses. From what I’ve been able to find, this should be supported. A few examples include https://github.com/spirom/LearningSpark/blob/master/src/main/scala/sql/UDF.scala, https://issues.apache.org/jira/browse/SPARK-9338, and https://issues.apache.org/jira/browse/SPARK-9435. I just can’t seem to get it to work. I can use a nested query as a workaround, but this is just one of many such queries generated from UI parameters, some of which use UDFs and some of which don’t. If I could avoid the nested query, the code would be much easier to understand and maintain. Is this possible with the 1.5 release?
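Until direct UDF grouping works, one way to contain the cost of the nested-query workaround is to have the generator always emit the nested form. It then never needs to branch on whether a UDF is present. The sketch below assumes a plain string-building generator. `build_grouped_query`, the subquery alias `t`, and the column alias `sex` are hypothetical names, not part of any Spark API. The shortened table name `sequences` stands in for the real one.

```python
def build_grouped_query(table, group_cols, udf_exprs, aggregates):
    """Hypothetical generator: always wrap UDF calls in a subquery.

    table      -- source table name
    group_cols -- plain column names to group by
    udf_exprs  -- dict mapping alias -> UDF call as SQL text (may be empty)
    aggregates -- list of (function, column) pairs, e.g. ("sum", "x")
    """
    # Inner query: pass through every plain column the outer query needs
    # (deduplicated, order preserved) and evaluate each UDF under an alias.
    agg_cols = [col for _, col in aggregates]
    inner_cols = list(dict.fromkeys(group_cols + agg_cols))
    inner_cols += [f"{expr} AS {alias}" for alias, expr in udf_exprs.items()]
    inner = f"SELECT {', '.join(inner_cols)} FROM {table}"

    # Outer query: group by plain columns plus the UDF aliases, so GROUP BY
    # only ever contains simple column references, never a function call.
    keys = group_cols + list(udf_exprs)
    outer_cols = keys + [f"{fn}({col})" for fn, col in aggregates]
    return (f"SELECT {', '.join(outer_cols)} FROM ({inner}) t "
            f"GROUP BY {', '.join(keys)}")


print(build_grouped_query(
    "sequences",
    ["cdr3_length", "frame_type"],
    {"sex": "TagFilter(sample_tags, 'Biological Sex', false)"},
    [("sum", "fraction_templates")],
))
```

Passing an empty `udf_exprs` still produces a valid nested query. With this design, queries with and without UDFs take the same code path. This sketch does not settle whether Spark 1.5 accepts the UDF directly in GROUP BY.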
select cdr3_length, frame_type, sum(fraction_templates), TagFilter(sample_tags, 'Biological Sex', false)
from sequences_26aa4082_f714_4f53_9bf0_4cdf9d523f6a
group by cdr3_length, frame_type, TagFilter(sample_tags, 'Biological Sex', false)

[error] c.a.i.c.Analyzer - expression 'sample_tags' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care which value you get.;
org.apache.spark.sql.AnalysisException: expression 'sample_tags' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care which value you get.;
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:37) ~[spark-catalyst_2.11-1.5.0.jar:1.5.0]
    at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44) ~[spark-catalyst_2.11-1.5.0.jar:1.5.0]
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1(CheckAnalysis.scala:110) ~[spark-catalyst_2.11-1.5.0.jar:1.5.0]
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1$3.apply(CheckAnalysis.scala:116) ~[spark-catalyst_2.11-1.5.0.jar:1.5.0]
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1$3.apply(CheckAnalysis.scala:116) ~[spark-catalyst_2.11-1.5.0.jar:1.5.0]

Thanks,
Jeff