[ https://issues.apache.org/jira/browse/SPARK-19519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019434#comment-16019434 ]
Faisal commented on SPARK-19519:
--------------------------------

I found out that this is how I should reference the aggregated column:

    dataframe.select(col("max(colmn1)"))

The aggregate function name "max" has to be written literally as part of the column name in order to reference the aggregated column.

Thanks

> Groupby for multiple columns not working
> ----------------------------------------
>
>                 Key: SPARK-19519
>                 URL: https://issues.apache.org/jira/browse/SPARK-19519
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 1.5.0
>            Reporter: Faisal
>
> Please look at the below join between multiple dataframes, then while
> applying groupby function for the multiple columns for the aggregate max
> does not yield results instead exception User class threw exception:
> org.apache.spark.sql.AnalysisException: expression 'propVal' is neither
> present in the group by, nor is it an aggregate function. Add to group by or
> wrap in first() if you don't care which value you get.
> DataFrame joinModCtypeAsgns = modCtypeAsgnsDf.as("mod")
> .join(moduleCodeDf.as("mc"),
> moduleCodeDf.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charValCode")))
> .join(dictDfCharCode.as("dc"),
> dictDfCharCode.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charCode")))
> .join(dictDfIsAChar,
> dictDfIsAChar.col("EntityCode").equalTo(modCtypeAsgnsDf.col("charCode")));
>
> joinModCtypeAsgns.select(col("mc.propVal").as("mcaModCode"),
> col("dc.propVal").as("mcaCtypeCode"),
> max(col("mod.updatedDate")).as("mcaLastChangedDate"),
> coalesce(max(when(col("mndtryInd").equalTo("Y"), "Y")),
> max(when(col("mndtryInd").equalTo("N"), "N")),
> max(col("mndtryInd"))).as("mcaMandatoryFlg"),
> lit("N").as("mcaLockedFlg"),
> coalesce(max(when(col("fldColInd").equalTo("Y"), "F")),
> max(when(col("fldColInd").equalTo("N"),
> "I")),max(col("fldColInd"))).as("mcaFieldCollectionFlg"))
> .groupBy(col("mc.propVal"),col("dc.propVal")).agg(col("mc.propVal"),col("dc.propVal"),max(col("mod.updatedDate")));

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)