[GitHub] flink pull request #5257: [FLINK-8381] [table] Document more flexible schema...

2018-01-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5257


---


[GitHub] flink pull request #5257: [FLINK-8381] [table] Document more flexible schema...

2018-01-08 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/5257#discussion_r160275286
  
--- Diff: docs/dev/table/common.md ---
@@ -802,7 +802,87 @@ val dsTuple: DataSet[(String, Int)] = 
tableEnv.toDataSet[(String, Int)](table)
 
 ### Mapping of Data Types to Table Schema
 
-Flink's DataStream and DataSet APIs support very diverse types, such as 
Tuples (built-in Scala and Flink Java tuples), POJOs, case classes, and atomic 
types. In the following we describe how the Table API converts these types into 
an internal row representation and show examples of converting a `DataStream` 
into a `Table`.
+Flink's DataStream and DataSet APIs support very diverse types. Composite 
types such as Tuples (built-in Scala and Flink Java tuples), POJOs, Scala case 
classes, and Flink's Row type allow for nested data structures with multiple 
fields that can be accessed in table expressions. Other types are treated as 
atomic types. In the following, we describe how the Table API converts these 
types into an internal row representation and show examples of converting a 
`DataStream` into a `Table`.
+
+The mapping of a data type to a table schema can happen in two ways: 
**based on the field positions** or **based on the field names**.
+
+**Position-based Mapping**
+
+Position-based mapping can be used to give fields a more meaningful name 
while keeping the field order. This mapping is available for composite data 
types *with a defined field order* as well as atomic types. Composite data 
types such as tuples, rows, and case classes have such a field order. However, 
fields of a POJO must be mapped based on the field names (see next section).
+
+When defining a position-based mapping, the specified names must not exist in the input data type; otherwise, the API assumes that the mapping should happen based on the field names. If no field names are specified, the default field names and field order of the composite type are used, or `f0` for atomic types.
+
+
+
+{% highlight java %}
+// get a StreamTableEnvironment, works for BatchTableEnvironment equivalently
+StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
+
+DataStream<Tuple2<Long, Integer>> stream = ...
+// convert DataStream into Table with default field names "f0" and "f1"
+Table table = tableEnv.fromDataStream(stream);
+// convert DataStream into Table with field names "myLong" and "myInt"
+Table table = tableEnv.fromDataStream(stream, "myLong, myInt");
+{% endhighlight %}
+
+
+
+{% highlight scala %}
+// get a TableEnvironment
+val tableEnv = TableEnvironment.getTableEnvironment(env)
+
+val stream: DataStream[(Long, Int)] = ...
+// convert DataStream into Table with default field names "_1" and "_2"
+val table: Table = tableEnv.fromDataStream(stream)
+// convert DataStream into Table with field names "myLong" and "myInt"
+val table: Table = tableEnv.fromDataStream(stream, 'myLong, 'myInt)
+{% endhighlight %}
+
+
+
+**Name-based Mapping**
+
+Name-based mapping can be used for any data type, including POJOs. It is the most flexible way of defining a table schema mapping. All fields in the mapping are referenced by name and can be renamed using an alias (`as`). Fields can also be reordered and projected out.
+
+If no field names are specified, the default field names and field order 
of the composite type are used or `f0` for atomic types.
+
+
--- End diff --

Move and split code examples to the discussion of the individual types.
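
For illustration, a minimal sketch of what a name-based mapping could look like for a POJO, in the style of the existing examples (the `Person` POJO and the chosen field names are hypothetical, not taken from the PR diff):

{% highlight java %}
// get a StreamTableEnvironment, works for BatchTableEnvironment equivalently
StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

// Person is a hypothetical POJO with fields "name" (String) and "age" (Integer)
DataStream<Person> stream = ...

// convert DataStream into Table with renamed fields "myAge" and "myName" (name-based)
Table table = tableEnv.fromDataStream(stream, "age as myAge, name as myName");

// convert DataStream into Table with only the projected field "name" (name-based)
Table table = tableEnv.fromDataStream(stream, "name");
{% endhighlight %}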


---


[GitHub] flink pull request #5257: [FLINK-8381] [table] Document more flexible schema...

2018-01-08 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/5257#discussion_r160274966
  
--- Diff: docs/dev/table/common.md ---
@@ -802,7 +802,87 @@ val dsTuple: DataSet[(String, Int)] = 
tableEnv.toDataSet[(String, Int)](table)
 
 ### Mapping of Data Types to Table Schema
 
-Flink's DataStream and DataSet APIs support very diverse types, such as 
Tuples (built-in Scala and Flink Java tuples), POJOs, case classes, and atomic 
types. In the following we describe how the Table API converts these types into 
an internal row representation and show examples of converting a `DataStream` 
into a `Table`.
+Flink's DataStream and DataSet APIs support very diverse types. Composite 
types such as Tuples (built-in Scala and Flink Java tuples), POJOs, Scala case 
classes, and Flink's Row type allow for nested data structures with multiple 
fields that can be accessed in table expressions. Other types are treated as 
atomic types. In the following, we describe how the Table API converts these 
types into an internal row representation and show examples of converting a 
`DataStream` into a `Table`.
--- End diff --

The description of the two modes is good, but the following sections for 
the different types were not updated. 
I think we could describe *Position-based Mapping* and *Name-based Mapping* 
first and move the concrete code examples to the individual type sections. For 
example, for `Tuples` we would show position-based and name-based mappings in 
the same code example. This would also highlight the difference.
We should also double-check the text descriptions for the different types.
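
As a rough sketch of what such a combined example could look like for a `Tuple2` input (relying on the string-based field expressions of the Java Table API; the field names are illustrative and not part of this PR's diff):

{% highlight java %}
// get a StreamTableEnvironment, works for BatchTableEnvironment equivalently
StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

DataStream<Tuple2<Long, Integer>> stream = ...

// position-based: rename the fields to "myLong" and "myInt", keeping their order
Table table = tableEnv.fromDataStream(stream, "myLong, myInt");

// name-based: reference the default tuple field names "f0"/"f1", reorder and rename them
Table table = tableEnv.fromDataStream(stream, "f1 as myInt, f0 as myLong");
{% endhighlight %}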
  


---


[GitHub] flink pull request #5257: [FLINK-8381] [table] Document more flexible schema...

2018-01-08 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/5257#discussion_r160275240
  
--- Diff: docs/dev/table/common.md ---
@@ -802,7 +802,87 @@ val dsTuple: DataSet[(String, Int)] = 
tableEnv.toDataSet[(String, Int)](table)
 
 ### Mapping of Data Types to Table Schema
 
-Flink's DataStream and DataSet APIs support very diverse types, such as 
Tuples (built-in Scala and Flink Java tuples), POJOs, case classes, and atomic 
types. In the following we describe how the Table API converts these types into 
an internal row representation and show examples of converting a `DataStream` 
into a `Table`.
+Flink's DataStream and DataSet APIs support very diverse types. Composite 
types such as Tuples (built-in Scala and Flink Java tuples), POJOs, Scala case 
classes, and Flink's Row type allow for nested data structures with multiple 
fields that can be accessed in table expressions. Other types are treated as 
atomic types. In the following, we describe how the Table API converts these 
types into an internal row representation and show examples of converting a 
`DataStream` into a `Table`.
+
+The mapping of a data type to a table schema can happen in two ways: 
**based on the field positions** or **based on the field names**.
+
+**Position-based Mapping**
+
+Position-based mapping can be used to give fields a more meaningful name 
while keeping the field order. This mapping is available for composite data 
types *with a defined field order* as well as atomic types. Composite data 
types such as tuples, rows, and case classes have such a field order. However, 
fields of a POJO must be mapped based on the field names (see next section).
+
+When defining a position-based mapping, the specified names must not exist in the input data type; otherwise, the API assumes that the mapping should happen based on the field names. If no field names are specified, the default field names and field order of the composite type are used, or `f0` for atomic types.
+
+
--- End diff --

Move and split code examples to the discussion of the individual types.
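
For the atomic-type case mentioned in the quoted diff, a minimal sketch in the style of the existing examples (the field name "myLong" is illustrative, not from the PR):

{% highlight java %}
// get a StreamTableEnvironment, works for BatchTableEnvironment equivalently
StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

DataStream<Long> stream = ...

// convert DataStream of an atomic type into Table with default field name "f0"
Table table = tableEnv.fromDataStream(stream);

// convert DataStream of an atomic type into Table with field name "myLong" (position-based)
Table table = tableEnv.fromDataStream(stream, "myLong");
{% endhighlight %}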
  


---


[GitHub] flink pull request #5257: [FLINK-8381] [table] Document more flexible schema...

2018-01-08 Thread twalthr
GitHub user twalthr opened a pull request:

https://github.com/apache/flink/pull/5257

[FLINK-8381] [table] Document more flexible schema definition

## What is the purpose of the change

Documentation for schema definition modes.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/twalthr/flink FLINK-8381

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5257.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5257


commit 6ef689a0509fb1040600212b72d6a0a1ef66a3b9
Author: twalthr 
Date:   2018-01-08T10:46:45Z

[FLINK-8381] [table] Document more flexible schema definition




---