Liang-Chi Hsieh created SPARK-17867:
---------------------------------------

Summary: Dataset.dropDuplicates (i.e. distinct) should consider the columns with same column name
Key: SPARK-17867
URL: https://issues.apache.org/jira/browse/SPARK-17867
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: Liang-Chi Hsieh

In Dataset.dropDuplicates, we look up each given column name in the output and take only the first resolved attribute. When more than one column has that name, the other same-named columns are put into the aggregation columns instead of the grouping columns. We should fix this so that every column matching a given name is treated as a grouping column.
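The difference between the current lookup and the proposed one can be sketched without Spark. This is a hypothetical model, not Spark's actual code: `Attribute`, `expr_id`, `_matches`, and the two `split_columns_*` helpers are illustrative names, and column names are matched exactly (case-sensitively) for simplicity. It splits a Dataset's output attributes into grouping and aggregation columns for the column names passed to `dropDuplicates`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """Hypothetical stand-in for a resolved output column: a name plus a unique id."""
    name: str
    expr_id: int


def _matches(output, name):
    # Assumption: exact, case-sensitive name matching, for simplicity.
    return [attr for attr in output if attr.name == name]


def split_columns_first_match(output, col_names):
    """Behavior described in the issue: keep only the FIRST attribute per name."""
    group_cols = []
    for name in dict.fromkeys(col_names):  # de-duplicate names, preserving order
        matches = _matches(output, name)
        if not matches:
            raise ValueError(f"Cannot resolve column name {name!r}")
        group_cols.append(matches[0])
    # Every attribute not chosen for grouping ends up as an aggregation column.
    agg_cols = [attr for attr in output if attr not in group_cols]
    return group_cols, agg_cols


def split_columns_all_matches(output, col_names):
    """Proposed behavior: every attribute matching a name becomes a grouping column."""
    group_cols = []
    for name in dict.fromkeys(col_names):  # de-duplicate names, preserving order
        matches = _matches(output, name)
        if not matches:
            raise ValueError(f"Cannot resolve column name {name!r}")
        group_cols.extend(matches)
    agg_cols = [attr for attr in output if attr not in group_cols]
    return group_cols, agg_cols


if __name__ == "__main__":
    # Output of, e.g., a join of two Datasets that both have an "id" column.
    output = [Attribute("id", 1), Attribute("value", 2), Attribute("id", 3)]
    for split in (split_columns_first_match, split_columns_all_matches):
        group_cols, agg_cols = split(output, ["id"])
        print(split.__name__,
              [f"{a.name}#{a.expr_id}" for a in group_cols],
              [f"{a.name}#{a.expr_id}" for a in agg_cols])
```

Run as a script, it prints `['id#1']` and `['value#2', 'id#3']` for the first-match version, and `['id#1', 'id#3']` and `['value#2']` for the all-matches version. With the first-match lookup, `id#3` is only aggregated, so in this model two rows that share `id#1` but differ in `id#3` would be collapsed into one.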