[GitHub] incubator-flink pull request: New operator map partition function

StephanEwen Thu, 03 Jul 2014 04:47:36 -0700

Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/42#discussion_r14509280
  
    --- Diff: stratosphere-java/src/main/java/eu/stratosphere/api/java/DataSet.java ---
    @@ -135,6 +139,27 @@ public ExecutionEnvironment getExecutionEnvironment() {
                }
                return new MapOperator<T, R>(this, mapper);
        }
    +
    +
    +
    +    /**
    +     * Applies a Map transformation on a {@link DataSet} by using an iterator.<br/>
    --- End diff --
    
    I think this comment is not quite correct. Something more appropriate
    would be:
    
    ```
    Applies a Map operation to an entire partition of the data. The function
    is called once per parallel partition of the data, and the entire
    partition is available through the given Iterator. The number of elements
    that each instance of the MapPartition function sees is non-deterministic
    and depends on the degree of parallelism of the operation.
    
    This function is intended for operations that cannot transform elements
    individually and that require no grouping of elements. To transform
    individual elements, the use of {@code map()} and {@code flatMap()} is
    preferable.
    ```
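
To illustrate the semantics described in the suggested comment, here is a minimal plain-Java sketch. It is not the Flink/Stratosphere API: the class `MapPartitionSketch`, the helpers `partition`, `mapPartition`, and `countPerPartition`, and the round-robin split are all hypothetical and chosen only for illustration. The user function is called once per partition, sees that partition through an `Iterator`, and the number of elements it sees changes with the degree of parallelism.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

public class MapPartitionSketch {

    // Hypothetical helper: distributes the input round-robin over `parallelism` partitions.
    // A real runtime may distribute elements differently; this is an assumption of the sketch.
    static <T> List<List<T>> partition(List<T> input, int parallelism) {
        List<List<T>> partitions = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int i = 0; i < input.size(); i++) {
            partitions.get(i % parallelism).add(input.get(i));
        }
        return partitions;
    }

    // Calls `function` exactly once per partition. The function receives an Iterator
    // over the whole partition and a collector for emitting any number of results.
    static <T, R> List<R> mapPartition(List<T> input, int parallelism,
                                       BiConsumer<Iterator<T>, Consumer<R>> function) {
        List<R> out = new ArrayList<>();
        for (List<T> p : partition(input, parallelism)) {
            function.accept(p.iterator(), out::add);
        }
        return out;
    }

    // Example of an operation that cannot be written as an element-wise map:
    // emit one count per partition.
    static List<Integer> countPerPartition(List<String> input, int parallelism) {
        return mapPartition(input, parallelism,
            (Iterator<String> values, Consumer<Integer> collector) -> {
                int count = 0;
                while (values.hasNext()) {
                    values.next();
                    count++;
                }
                collector.accept(count);
            });
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(countPerPartition(words, 2)); // prints [3, 2]
        System.out.println(countPerPartition(words, 3)); // prints [2, 2, 1]
    }
}
```

The same input yields different per-partition counts for parallelism 2 and 3, which is the dependence on the degree of parallelism that the comment should warn about.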



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
