Khoa Tran created SPARK-23705:
---------------------------------

             Summary: dataframe.groupBy() may inadvertently receive sequence of non-distinct strings
                 Key: SPARK-23705
                 URL: https://issues.apache.org/jira/browse/SPARK-23705
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Khoa Tran


{code:java}
package org.apache.spark.sql
.
.
.
class Dataset[T] private[sql](
.
.
.
def groupBy(col1: String, cols: String*): RelationalGroupedDataset = {
  val colNames: Seq[String] = col1 +: cols
  RelationalGroupedDataset(
    toDF(), colNames.map(colName => resolve(colName)),
    RelationalGroupedDataset.GroupByType)
}
{code}
`groupBy` should append `.distinct` to `colNames` before the names are resolved, so that a repeated column name is grouped on only once.

 

I am not sure whether the community agrees with this, or whether it should be
left to users to perform the distinct operation themselves.
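
To make the two options concrete, here is a minimal sketch of the deduplication step in plain Scala. `GroupByNames` and `distinctColNames` are hypothetical names used only for illustration and are not part of Spark; the sketch assumes nothing beyond the standard `Seq.distinct`, which keeps the first occurrence of each element in its original order.

```scala
// Hypothetical helper, not part of Spark: builds the column-name sequence
// the same way groupBy does (col1 +: cols) and then removes repeated names.
object GroupByNames {
  def distinctColNames(col1: String, cols: String*): Seq[String] =
    (col1 +: cols).distinct

  def main(args: Array[String]): Unit = {
    // "dept" is passed twice, as could happen when names are assembled
    // programmatically from several sources.
    val names = distinctColNames("dept", "region", "dept")
    println(names.mkString(", ")) // prints "dept, region"

    // Caller-side workaround, assuming a DataFrame `df` is in scope:
    //   df.groupBy(names.head, names.tail: _*)
  }
}
```

If the change were made inside `groupBy`, the same `.distinct` call would sit on `colNames` before the `map`. If it is left to users, the commented-out call shows how the deduplicated names could be passed to the existing `groupBy(col1: String, cols: String*)` overload.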



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
