[ https://issues.apache.org/jira/browse/SPARK-19519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019434#comment-16019434 ]

Faisal commented on SPARK-19519:
--------------------------------

I found out that this is how I should reference the aggregated column:
dataframe.select(col("max(colmn1)")). The function name "max" has to be
hard-coded in the column name to reference the aggregated column. Thanks

> Groupby for multiple columns not working
> ----------------------------------------
>
>                 Key: SPARK-19519
>                 URL: https://issues.apache.org/jira/browse/SPARK-19519
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 1.5.0
>            Reporter: Faisal
>
> Please look at the join between multiple DataFrames below. When applying
> groupBy on multiple columns with the aggregate max, no results are produced;
> instead the job fails with: User class threw exception:
> org.apache.spark.sql.AnalysisException: expression 'propVal' is neither
> present in the group by, nor is it an aggregate function. Add to group by or
> wrap in first() if you don't care which value you get.
>  DataFrame joinModCtypeAsgns = modCtypeAsgnsDf.as("mod")
>          .join(moduleCodeDf.as("mc"),
>                moduleCodeDf.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charValCode")))
>          .join(dictDfCharCode.as("dc"),
>                dictDfCharCode.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charCode")))
>          .join(dictDfIsAChar,
>                dictDfIsAChar.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charCode")));
>
>  joinModCtypeAsgns.select(col("mc.propVal").as("mcaModCode"),
>          col("dc.propVal").as("mcaCtypeCode"),
>          max(col("mod.updatedDate")).as("mcaLastChangedDate"),
>          coalesce(max(when(col("mndtryInd").equalTo("Y"), "Y")),
>                   max(when(col("mndtryInd").equalTo("N"), "N")),
>                   max(col("mndtryInd"))).as("mcaMandatoryFlg"),
>          lit("N").as("mcaLockedFlg"),
>          coalesce(max(when(col("fldColInd").equalTo("Y"), "F")),
>                   max(when(col("fldColInd").equalTo("N"), "I")),
>                   max(col("fldColInd"))).as("mcaFieldCollectionFlg"))
>      .groupBy(col("mc.propVal"), col("dc.propVal"))
>      .agg(col("mc.propVal"), col("dc.propVal"), max(col("mod.updatedDate")));
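
A plain-Java sketch, not Spark code, of what the quoted query is trying to compute: group the joined rows by the two propVal columns and keep max(updatedDate) per group. The Row class, its field names and the sample dates are hypothetical stand-ins for mc.propVal, dc.propVal and mod.updatedDate. In Spark's DataFrame API the same shape is most likely expressed by calling groupBy(col("mc.propVal"), col("dc.propVal")) first and putting the max(...) expressions inside agg(...), each named with .as(...); an alias such as mcaLastChangedDate can then be selected by name instead of the generated "max(...)" column name. The AnalysisException above appears to come from select(...) mixing aggregates with the non-aggregated mc.propVal before any groupBy has been applied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupByMaxSketch {

    // Hypothetical flattened row of the joined data: two grouping keys
    // (standing in for mc.propVal and dc.propVal) plus the value to aggregate
    // (standing in for mod.updatedDate, here an ISO-8601 date string).
    static final class Row {
        final String modCode;
        final String ctypeCode;
        final String updatedDate;

        Row(String modCode, String ctypeCode, String updatedDate) {
            this.modCode = modCode;
            this.ctypeCode = ctypeCode;
            this.updatedDate = updatedDate;
        }
    }

    // Groups rows by the pair (modCode, ctypeCode) and keeps the maximum
    // updatedDate per group. Every output value is either a grouping key
    // or an aggregate, which is the rule the AnalysisException enforces.
    static Map<List<String>, String> maxUpdatedDateByKeys(List<Row> rows) {
        Map<List<String>, String> result = new LinkedHashMap<>();
        for (Row r : rows) {
            List<String> key = Arrays.asList(r.modCode, r.ctypeCode);
            // ISO-8601 date strings sort chronologically as plain strings.
            result.merge(key, r.updatedDate,
                    (a, b) -> a.compareTo(b) >= 0 ? a : b);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("M1", "C1", "2017-01-05"));
        rows.add(new Row("M1", "C1", "2017-03-02"));
        rows.add(new Row("M1", "C2", "2016-12-31"));
        rows.add(new Row("M2", "C1", "2017-02-14"));

        for (Map.Entry<List<String>, String> e : maxUpdatedDateByKeys(rows).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Running main prints one line per (modCode, ctypeCode) pair, for example [M1, C1] -> 2017-03-02, since the two M1/C1 rows collapse into a single group holding the later date.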



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
