Not able to group by Scala UDF

Jeff Jones Fri, 18 Sep 2015 14:41:11 -0700

I’m trying to run a Spark SQL (1.5) query that contains a UDF in both the SELECT
and GROUP BY clauses. From what I’ve been able to find, this should be
supported. A few examples include
https://github.com/spirom/LearningSpark/blob/master/src/main/scala/sql/UDF.scala,
https://issues.apache.org/jira/browse/SPARK-9338, and
https://issues.apache.org/jira/browse/SPARK-9435. I just can’t seem to get it
to work. I can use a nested query as a workaround, but this is just one of many
such queries generated from UI parameters, some of which use UDFs and some of
which don’t. Not having to use the nested query would make the code much easier
to understand and maintain. Is this possible with the 1.5 release?
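For queries generated from UI parameters, one possible way to avoid maintaining two shapes of query is the DataFrame API: add every UDF result as a named column with `withColumn`, then group by column names, so a query without UDFs is just the case with no derived columns. This is a minimal sketch, assuming the Spark 1.5 Scala API. `tagFilter` is a hypothetical stand-in for the real `TagFilter` (its body isn't in this message), and `UdfColumnGrouping`, `aggregate`, and the output column `total` are illustrative names. Column names are taken from the query below.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, sum, udf}

object UdfColumnGrouping {

  // Hypothetical stand-in for TagFilter: assumes sample_tags is a
  // comma-separated list of "key=value" pairs and returns the value for `key`,
  // or null when the key is missing. The Boolean flag is accepted but ignored.
  def tagFilter(tags: String, key: String, flag: Boolean): String =
    if (tags == null) null
    else tags.split(",").map(_.split("=", 2)).collectFirst {
      case Array(k, v) if k.trim == key => v.trim
    }.orNull

  // Wraps the function as a Column-level UDF for use with the DataFrame API.
  val tagFilterUdf = udf((tags: String, key: String, flag: Boolean) => tagFilter(tags, key, flag))

  // Adds each derived (name -> expression) column, then groups by the plain
  // columns plus the derived column names. With no derived columns this is an
  // ordinary group-by, so UDF and non-UDF queries share one code path.
  def aggregate(df: DataFrame,
                groupCols: Seq[String],
                derived: Seq[(String, Column)],
                measure: String): DataFrame = {
    val withDerived = derived.foldLeft(df) { case (acc, (name, expr)) => acc.withColumn(name, expr) }
    val keys = (groupCols ++ derived.map(_._1)).map(col)
    withDerived.groupBy(keys: _*).agg(sum(measure).as("total"))
  }
}
```

For the query below, the derived column would be `"tag" -> UdfColumnGrouping.tagFilterUdf(col("sample_tags"), lit("Biological Sex"), lit(false))` with `groupCols = Seq("cdr3_length", "frame_type")`, assuming `lit` is also imported from `org.apache.spark.sql.functions`.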



select cdr3_length, frame_type, sum(fraction_templates),
       TagFilter(sample_tags, 'Biological Sex', false)
from sequences_26aa4082_f714_4f53_9bf0_4cdf9d523f6a
group by cdr3_length, frame_type,
         TagFilter(sample_tags, 'Biological Sex', false)
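The nested-query workaround mentioned above can be written roughly as follows in Spark 1.5 Scala: the inner SELECT evaluates the UDF once and names the result, and the outer query groups by that plain column, so the grouping check only has to match a column reference. This is a minimal sketch. The `tagFilter` body (a lookup in a comma-separated `key=value` string) is a hypothetical stand-in for the real `TagFilter`, and `NestedQueryWorkaround`, `tag`, and `total` are illustrative names.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

object NestedQueryWorkaround {

  // Hypothetical stand-in for TagFilter: assumes sample_tags is a
  // comma-separated list of "key=value" pairs and returns the value for `key`,
  // or null when the key is missing. The Boolean flag is accepted but ignored.
  def tagFilter(tags: String, key: String, flag: Boolean): String =
    if (tags == null) null
    else tags.split(",").map(_.split("=", 2)).collectFirst {
      case Array(k, v) if k.trim == key => v.trim
    }.orNull

  // Registers the function under the name used in the SQL text.
  def register(sqlContext: SQLContext): Unit =
    sqlContext.udf.register("TagFilter",
      (tags: String, key: String, flag: Boolean) => tagFilter(tags, key, flag))

  // Inner query: compute TagFilter once and alias it as `tag`.
  // Outer query: group by `tag` like any other column.
  def query(sqlContext: SQLContext, table: String): DataFrame =
    sqlContext.sql(
      s"""SELECT cdr3_length, frame_type, sum(fraction_templates) AS total, tag
         |FROM (
         |  SELECT cdr3_length, frame_type, fraction_templates,
         |         TagFilter(sample_tags, 'Biological Sex', false) AS tag
         |  FROM $table
         |) t
         |GROUP BY cdr3_length, frame_type, tag""".stripMargin)
}
```

Here `table` would be `sequences_26aa4082_f714_4f53_9bf0_4cdf9d523f6a`; the alias `t` is there because a subquery in FROM needs a name.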



[error] c.a.i.c.Analyzer - expression 'sample_tags' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care which value you get.;

org.apache.spark.sql.AnalysisException: expression 'sample_tags' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care which value you get.;
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:37) ~[spark-catalyst_2.11-1.5.0.jar:1.5.0]
    at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44) ~[spark-catalyst_2.11-1.5.0.jar:1.5.0]
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1(CheckAnalysis.scala:110) ~[spark-catalyst_2.11-1.5.0.jar:1.5.0]
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1$3.apply(CheckAnalysis.scala:116) ~[spark-catalyst_2.11-1.5.0.jar:1.5.0]
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1$3.apply(CheckAnalysis.scala:116) ~[spark-catalyst_2.11-1.5.0.jar:1.5.0]


Thanks,

Jeff


