[ https://issues.apache.org/jira/browse/SPARK-7324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Armbrust resolved SPARK-7324.
-------------------------------------
    Resolution: Fixed
    Fix Version/s: 1.4.0

Issue resolved by pull request 6066
[https://github.com/apache/spark/pull/6066]

> Add DataFrame.dropDuplicates
> ----------------------------
>
>                 Key: SPARK-7324
>                 URL: https://issues.apache.org/jira/browse/SPARK-7324
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>             Fix For: 1.4.0
>
>
> Similar to http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html
> def dropDuplicates(): DataFrame
> def dropDuplicates(subset: Seq[String]): DataFrame
> We can turn this into groupBy(cols).agg(first(...))
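
The groupBy(cols).agg(first(...)) idea in the issue description can be illustrated without Spark. The following is only a minimal sketch in plain Python, assuming rows are dictionaries with hashable values. The function name drop_duplicates and the sample data are hypothetical and are not taken from pull request 6066. It groups rows by the chosen columns and keeps the first row seen in each group.

```python
from typing import Any, Dict, List, Optional, Sequence

Row = Dict[str, Any]


def drop_duplicates(rows: List[Row], subset: Optional[Sequence[str]] = None) -> List[Row]:
    """Keep one row per distinct key, modelling groupBy(cols).agg(first(...)).

    Hypothetical helper for illustration only; it is not Spark's implementation.
    """
    kept: Dict[tuple, Row] = {}
    for row in rows:
        # With no subset, every column takes part in the grouping key.
        cols = list(subset) if subset is not None else sorted(row)
        key = tuple(row[c] for c in cols)
        # setdefault stores the row only when the key is new,
        # which plays the role of the first(...) aggregate.
        kept.setdefault(key, row)
    # Dicts preserve insertion order, so groups come back in first-seen order.
    return list(kept.values())


# Hypothetical sample data.
rows = [
    {"name": "a", "age": 1},
    {"name": "a", "age": 1},
    {"name": "a", "age": 2},
    {"name": "b", "age": 3},
]

print(drop_duplicates(rows))            # duplicates judged on all columns
print(drop_duplicates(rows, ["name"]))  # duplicates judged on "name" only
```

The two calls mirror the two proposed signatures: the no-argument form compares whole rows, while the subset form compares only the listed columns. In this sketch "first" means first in input order. In a distributed engine, which row counts as first may depend on how the data is partitioned.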