[GitHub] [spark] bogao007 commented on a diff in pull request #40923: [Draft] State API (FlatMapGroupsWithState) in Scala for Spark Connect

via GitHub Mon, 24 Apr 2023 02:49:18 -0700


bogao007 commented on code in PR #40923:
URL: https://github.com/apache/spark/pull/40923#discussion_r1175032518



##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala:
##########
@@ -1275,6 +1276,24 @@ class Dataset[T] private[sql] (
       proto.Aggregate.GroupType.GROUP_TYPE_GROUPBY)
   }
 
+  /**
+   * (Scala-specific)
+   * Returns a [[KeyValueGroupedDataset]] where the data is grouped by the given key `func`.
+   *
+   * @group typedrel
+   * @since 2.0.0
+   */
+  def groupByKey[K: Encoder](func: T => K): KeyValueGroupedDataset[K, T] = {
+
+    new KeyValueGroupedDataset(
+      encoderFor[K],
+      encoderFor[T],
+      sparkSession,
+      plan,
+      Seq.empty,

Review Comment:
   I set both `dataAttributes` and `groupingAttributes` to `Seq.empty` because 
I couldn't find a good way to retrieve them on the client. In the existing 
implementation 
[here](https://github.com/apache/spark/blob/16a28b1a961052a250dcf05b7c249c92156e1077/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L1939-L1947),
 they are derived from `logicalPlan`. Since Spark Connect's `Dataset` has no 
access to the `logicalPlan`, should we remove `dataAttributes` and 
`groupingAttributes` from the constructor and move that logic to 
`transformFlatMapGroupsWithState` in `SparkConnectPlanner`?
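
   A rough sketch of what that split might look like. Every type below 
(`Attribute`, `LogicalPlan`, `ClientPlan`, `KeyValueGroupedDatasetSketch`, 
`PlannerSketch`) is a hypothetical, simplified stand-in, not the real Spark 
class, and `resolveAttributes` is an assumed helper name. The idea is only 
that the client-side constructor would stop taking the two attribute lists 
and the server would derive them from plans it has already resolved:

```scala
// Hypothetical, simplified stand-ins; none of these are the real Spark classes.
final case class Attribute(name: String)

// Server side: a resolved plan knows its output attributes.
final case class LogicalPlan(output: Seq[Attribute])

// Client side: only an unresolved plan description is available.
final case class ClientPlan(description: String)

// Client-side grouped dataset without dataAttributes / groupingAttributes.
final class KeyValueGroupedDatasetSketch[K, V](
    val plan: ClientPlan,
    val groupingFunc: V => K)

object PlannerSketch {
  // Stand-in for logic that could live in transformFlatMapGroupsWithState:
  // derive both attribute lists from the resolved child plan and from the
  // plan obtained by appending the grouping-key columns to it.
  def resolveAttributes(
      child: LogicalPlan,
      withGroupingKey: LogicalPlan): (Seq[Attribute], Seq[Attribute]) = {
    val dataAttributes = child.output
    // Assumes the appended key columns come after the child's own output.
    val groupingAttributes = withGroupingKey.output.drop(child.output.length)
    (dataAttributes, groupingAttributes)
  }
}
```

   With this shape the client would only ship the grouping function and the 
unresolved plan, and nothing on the client would need `logicalPlan`.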



