bogao007 commented on code in PR #40923:
URL: https://github.com/apache/spark/pull/40923#discussion_r1175032518

##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala:
##########
@@ -1275,6 +1276,24 @@ class Dataset[T] private[sql] (
       proto.Aggregate.GroupType.GROUP_TYPE_GROUPBY)
   }
+  /**
+   * (Scala-specific)
+   * Returns a [[KeyValueGroupedDataset]] where the data is grouped by the given key `func`.
+   *
+   * @group typedrel
+   * @since 2.0.0
+   */
+  def groupByKey[K: Encoder](func: T => K): KeyValueGroupedDataset[K, T] = {
+
+    new KeyValueGroupedDataset(
+      encoderFor[K],
+      encoderFor[T],
+      sparkSession,
+      plan,
+      Seq.empty,

Review Comment:
   I set both `dataAttributes` and `groupingAttributes` to empty because I didn't find a good way to retrieve them. In the existing implementation [here](https://github.com/apache/spark/blob/16a28b1a961052a250dcf05b7c249c92156e1077/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L1939-L1947), they are retrieved from `logicalPlan`. Since Spark Connect's `Dataset` cannot access the `logicalPlan`, should we remove `dataAttributes` and `groupingAttributes` from the constructor and move the logic to `transformFlatMapGroupsWithState` in `SparkConnectPlanner`?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: reviews-h...@spark.apache.org