[
https://issues.apache.org/jira/browse/FLINK-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193188#comment-15193188
]
ASF GitHub Bot commented on FLINK-1159:
---------------------------------------
Github user tillrohrmann commented on a diff in the pull request:
https://github.com/apache/flink/pull/1704#discussion_r55993580
--- Diff:
flink-scala/src/main/scala/org/apache/flink/api/scala/extensions/acceptPartialFunctions/OnDataSet.scala
---
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.scala.extensions.acceptPartialFunctions
+
+import org.apache.flink.api.common.typeinfo.TypeInformation
+import org.apache.flink.api.scala.{GroupedDataSet, DataSet}
+
+import scala.reflect.ClassTag
+
+class OnDataSet[T: TypeInformation](ds: DataSet[T]) {
+
+ /**
+ * Applies a function `fun` to each item of the data set
+ *
+ * @param fun The function to be applied to each item
+ * @tparam R The type of the items in the returned data set
+ * @return A dataset of R
+ */
+ def mapWith[R: TypeInformation: ClassTag](fun: T => R): DataSet[R] =
+ ds.map(fun)
+
+ /**
+ * Applies a function `fun` to a partition as a whole
+ *
+ * @param fun The function to be applied on the whole partition
+ * @tparam R The type of the items in the returned data set
+ * @return A dataset of R
+ */
+ def mapPartitionWith[R: TypeInformation: ClassTag](fun: Seq[T] => R): DataSet[R] =
+ ds.mapPartition {
+ (it, out) =>
+ out.collect(fun(it.to[Seq]))
--- End diff --
Does `it.to[Seq]` materialize the iterator? If so, this is a problem: the
whole partition would be copied into memory before `fun` is called, so a
large partition could cause an out-of-memory error.
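
To make the concern concrete, here is a minimal plain-Scala sketch without Flink. With the default `Seq` builder in Scala 2.10/2.11, `it.to[Seq]` should produce a strict collection, so the sketch uses `toList` as a stand-in for it. `CountingIterator`, `strictPartition`, and `lazyPartition` are hypothetical names introduced only for this illustration; they are not part of the pull request or of the Flink API.

```scala
object MaterializationSketch {

  // Hypothetical helper: wraps an iterator and counts how many
  // elements have been pulled from it.
  final class CountingIterator[A](underlying: Iterator[A]) extends Iterator[A] {
    var pulled: Int = 0
    override def hasNext: Boolean = underlying.hasNext
    override def next(): A = { pulled += 1; underlying.next() }
  }

  // Strict variant: copies the whole partition into a List before `fun` runs.
  // `toList` stands in for `it.to[Seq]` (assumption: both build eagerly).
  def strictPartition[T, R](it: Iterator[T])(fun: Seq[T] => R): R =
    fun(it.toList)

  // Lazy variant: hands the iterator to `fun`, which pulls only what it needs.
  def lazyPartition[T, R](it: Iterator[T])(fun: Iterator[T] => R): R =
    fun(it)

  def main(args: Array[String]): Unit = {
    val strict = new CountingIterator(Iterator.range(0, 1000000))
    val firstStrict = strictPartition(strict)(_.head)
    println(s"strict: head=$firstStrict pulled=${strict.pulled}")

    val lzy = new CountingIterator(Iterator.range(0, 1000000))
    val firstLazy = lazyPartition(lzy)(_.next())
    println(s"lazy: head=$firstLazy pulled=${lzy.pulled}")
  }
}
```

In the strict variant every element is pulled even though `fun` only looks at the first one; in the lazy variant only one element is pulled. A signature that passes an `Iterator[T]` (or another lazy view) to `fun` would avoid holding the whole partition in memory, at the cost of allowing only a single traversal.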
> Case style anonymous functions not supported by Scala API
> ---------------------------------------------------------
>
> Key: FLINK-1159
> URL: https://issues.apache.org/jira/browse/FLINK-1159
> Project: Flink
> Issue Type: Bug
> Components: Scala API
> Reporter: Till Rohrmann
> Assignee: Stefano Baghino
>
> In Scala it is very common to define anonymous functions of the following form:
> {code}
> {
> case foo: Bar => foobar(foo)
> case _ => throw new RuntimeException()
> }
> {code}
> These case-style anonymous functions are not yet supported by the Scala API.
> Thus, one has to write redundant code to name the function parameter.
> The following pattern works, but it is not intuitive for someone
> coming from Scala:
> {code}
> dataset.map{
> _ match{
> case foo:Bar => ...
> }
> }
> {code}