GitHub user rmetzger commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/87#discussion_r15664277
  
    --- Diff: docs/java_api_guide.md ---
    @@ -427,6 +403,88 @@ DataSet<Tuple2<String, Integer>> out = in.project(2,0).types(String.class, Integ
     Defining Keys
     -------------
     
    +Some transformations (join, coGroup) require that a key is defined on
    +their argument DataSets, and other transformations (Reduce, GroupReduce,
    +Aggregate) allow the DataSet to be grouped on a key before they are
    +applied.
    +
    +A DataSet is grouped as follows:
    +{% highlight java %}
    +DataSet<...> input = // [...]
    +DataSet<...> reduced = input
    +   .groupBy(/*define key here*/)
    +   .reduceGroup(/*do something*/);
    +{% endhighlight %}
    +
    +The data model of Flink is not based on key-value pairs. Therefore,
    +you do not need to physically pack the data set types into keys and
    +values. Keys are "virtual": they are defined as functions over the
    +actual data to guide the grouping operator.
    +
    +The simplest case is grouping a data set of Tuples on one or more
    +fields of the Tuple:
    +{% highlight java %}
    +DataSet<Tuple3<Integer,String,Long>> input = // [...]
    +DataSet<Tuple3<Integer,String,Long>> grouped = input
    +   .groupBy(1)
    --- End diff --
    
    Aren't the fields 0-indexed? If so, `groupBy(1)` would group on the second field (the String)?
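
To make the indexing question concrete, here is a minimal plain-Java sketch with no Flink dependency. `Triple` and `groupByField` are hypothetical stand-ins, not the Flink API; the only assumption carried over is that tuple positions are 0-indexed, matching the `f0`, `f1`, `f2` field names of Flink tuples.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TupleKeyIndexSketch {

    // Hypothetical stand-in for Tuple3<Integer, String, Long>; not the Flink class.
    static final class Triple {
        final Integer f0;
        final String f1;
        final Long f2;

        Triple(Integer f0, String f1, Long f2) {
            this.f0 = f0;
            this.f1 = f1;
            this.f2 = f2;
        }

        // Positions are 0-indexed: 0 -> Integer, 1 -> String, 2 -> Long.
        Object getField(int pos) {
            switch (pos) {
                case 0: return f0;
                case 1: return f1;
                case 2: return f2;
                default: throw new IndexOutOfBoundsException("No field at position " + pos);
            }
        }
    }

    // Groups the input on the field at the given 0-indexed position.
    // The key is "virtual": it is read from the record, not stored separately.
    static Map<Object, List<Triple>> groupByField(List<Triple> input, int pos) {
        Map<Object, List<Triple>> groups = new LinkedHashMap<>();
        for (Triple t : input) {
            groups.computeIfAbsent(t.getField(pos), k -> new ArrayList<>()).add(t);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Triple> input = new ArrayList<>();
        input.add(new Triple(1, "a", 10L));
        input.add(new Triple(2, "a", 20L));
        input.add(new Triple(1, "b", 30L));

        // Position 1 is the second field (String).
        System.out.println(groupByField(input, 1).keySet()); // prints [a, b]
        // Position 0 is the first field (Integer).
        System.out.println(groupByField(input, 0).keySet()); // prints [1, 2]
    }
}
```

In this sketch, grouping on position 1 yields the String keys, so any text in the guide that describes `groupBy(1)` as grouping on the first field would need to say `groupBy(0)` instead.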

