[jira] [Commented] (FLINK-8203) Make schema definition of DataStream/DataSet to Table conversion more flexible

ASF GitHub Bot (JIRA) Fri, 05 Jan 2018 06:37:17 -0800

    [ 
https://issues.apache.org/jira/browse/FLINK-8203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313208#comment-16313208
 ]


ASF GitHub Bot commented on FLINK-8203:
---------------------------------------

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5132#discussion_r159877092
  
    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/api/TableEnvironment.scala ---
    @@ -855,42 +852,26 @@ abstract class TableEnvironment(val config: TableConfig) {
               "An input of GenericTypeInfo<Row> cannot be converted to Table. " +
                 "Please specify the type of the input with a RowTypeInfo.")
     
    -      case t: TupleTypeInfo[A] =>
    -        exprs.zipWithIndex flatMap {
    -          case (UnresolvedFieldReference(name: String), idx) =>
    -            if (isReferenceByPosition) {
    -              Some((idx, name))
    -            } else {
    -              referenceByName(name, t)
    -            }
    -          case (Alias(UnresolvedFieldReference(origName), name: String, _), _) =>
    -            val idx = t.getFieldIndex(origName)
    -            if (idx < 0) {
    -              throw new TableException(s"$origName is not a field of type $t. " +
    -                s"Expected: ${t.getFieldNames.mkString(", ")}")
    -            }
    -            Some((idx, name))
    -          case (_: TimeAttribute, _) =>
    -            None
    -          case _ => throw new TableException(
    -            "Field reference expression or alias on field expression expected.")
    -        }
    +      case t: TupleTypeInfoBase[A] if t.isInstanceOf[TupleTypeInfo[A]] ||
    +        t.isInstanceOf[CaseClassTypeInfo[A]] || t.isInstanceOf[RowTypeInfo] =>
    +
    +        // determine schema definition mode (by position or by name)
    +        val isRefByPos = isReferenceByPosition(t, exprs)
     
    -      case c: CaseClassTypeInfo[A] =>
             exprs.zipWithIndex flatMap {
               case (UnresolvedFieldReference(name: String), idx) =>
    -            if (isReferenceByPosition) {
    +            if (isRefByPos) {
    --- End diff --
    
    Moving the `isRefByPos` check outside of the `match` might be easier to read?
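
One possible reading of this suggestion, as a dependency-free sketch: the mode flag is computed once before matching on the input type, so the branches only consume it. `TypeInfo`, `OrderedType`, `PojoType`, `indexByName` and `fieldIndexes` are hypothetical stand-ins and not Flink classes, the rule inside `isReferenceByPosition` is an assumption taken from the issue description, and time attributes and aliases are left out.

```scala
object HoistSketch {
  // Hypothetical stand-ins for Flink's type information classes.
  sealed trait TypeInfo { def fieldNames: Seq[String] }
  // Types with a defined field order (tuple, case class, Row).
  final case class OrderedType(fieldNames: Seq[String]) extends TypeInfo
  // Types without a defined field order.
  final case class PojoType(fieldNames: Seq[String]) extends TypeInfo

  // Assumed rule: position mode applies only to ordered types, and only
  // when at least one reference does not name an existing field.
  def isReferenceByPosition(t: TypeInfo, refs: Seq[String]): Boolean = t match {
    case o: OrderedType => !refs.forall(r => o.fieldNames.contains(r))
    case _: PojoType    => false
  }

  // Looks up a field by name and fails if it does not exist.
  private def indexByName(t: TypeInfo, name: String): Int = {
    val idx = t.fieldNames.indexOf(name)
    require(idx >= 0, s"$name is not a field of $t")
    idx
  }

  def fieldIndexes(t: TypeInfo, refs: Seq[String]): Seq[(Int, String)] = {
    // Computed once, before matching on the input type.
    val isRefByPos = isReferenceByPosition(t, refs)
    t match {
      case _: OrderedType if isRefByPos =>
        // By position: the i-th reference renames the i-th input field.
        refs.zipWithIndex.map { case (name, idx) => (idx, name) }
      case _ =>
        // By name: every reference must name an existing field.
        refs.map(name => (indexByName(t, name), name))
    }
  }
}
```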


> Make schema definition of DataStream/DataSet to Table conversion more flexible
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-8203
>                 URL: https://issues.apache.org/jira/browse/FLINK-8203
>             Project: Flink
>          Issue Type: Bug
>          Components: Table API & SQL
>    Affects Versions: 1.4.0, 1.5.0
>            Reporter: Fabian Hueske
>            Assignee: Timo Walther
>
> When converting or registering a {{DataStream}} or {{DataSet}} as a {{Table}},
> the schema of the table can be defined (by default it is extracted from the
> {{TypeInformation}}).
> The schema needs to be manually specified to select (project) fields, rename
> fields, or define time attributes. Right now, there are several limitations
> on how the fields can be defined, and these also depend on the type of the
> {{DataStream}} / {{DataSet}}. Types with explicit field ordering (e.g.,
> tuples, case classes, Row) require schema definition based on the position of
> fields. POJO types, which have no fixed field order, require fields to be
> referenced by name. Moreover, there are several restrictions on how time
> attributes can be defined, e.g., an event-time attribute must replace an
> existing field or be appended, and proctime attributes must be appended.
> I think we can make the schema definition more flexible and provide two modes:
> 1. Reference input fields by name: All fields in the schema definition are
> referenced by name (and possibly renamed using an alias ({{as}})). In this
> mode, fields can be reordered and projected out. Moreover, we can define
> proctime and event-time attributes at arbitrary positions using arbitrary
> names (except those that already exist in the result schema). This mode can
> be used for any input type, including POJOs. This mode is used if all field
> references exist in the input type.
> 2. Reference input fields by position: Field references might not refer to
> existing fields in the input type. In this mode, fields are simply renamed.
> Event-time attributes can replace the field at their position in the input
> data (if it is of the correct type) or be appended at the end. Proctime
> attributes must be appended at the end. This mode can only be used if the
> input type has a defined field order (tuple, case class, Row).
> We need to add more tests that check all combinations of input types and
> schema definition modes.
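
The two modes described above can be modeled roughly as follows. This is a self-contained sketch and not the Flink implementation: `InputType`, `FieldRef`, `chooseMode` and `resolve` are hypothetical names. Time attributes are omitted, and rejecting aliases in position mode is an assumption of the sketch that the description does not state.

```scala
object SchemaModeSketch {
  sealed trait Mode
  case object ByName extends Mode
  case object ByPosition extends Mode

  // Hypothetical input description: field names plus whether the type has
  // a defined field order (tuple, case class, Row) or not (POJO).
  final case class InputType(fieldNames: Seq[String], hasFieldOrder: Boolean)

  // A schema entry: a referenced name and an optional alias (`as`).
  final case class FieldRef(name: String, alias: Option[String] = None)

  // By name if all references exist in the input type; otherwise by
  // position, which requires a defined field order.
  def chooseMode(input: InputType, refs: Seq[FieldRef]): Either[String, Mode] =
    if (refs.forall(r => input.fieldNames.contains(r.name))) Right(ByName)
    else if (input.hasFieldOrder) Right(ByPosition)
    else Left("unknown field reference on an input type without field order")

  // Maps each schema entry to (input field index, result field name).
  def resolve(input: InputType, refs: Seq[FieldRef]): Either[String, Seq[(Int, String)]] =
    chooseMode(input, refs) match {
      case Right(ByName) =>
        // Fields can be reordered, projected out, and renamed via alias.
        Right(refs.map(r => (input.fieldNames.indexOf(r.name), r.alias.getOrElse(r.name))))
      case Right(ByPosition) if refs.exists(_.alias.isDefined) =>
        // Assumption of this sketch only.
        Left("aliases are not handled in position mode in this sketch")
      case Right(ByPosition) if refs.length > input.fieldNames.length =>
        Left("more field references than input fields")
      case Right(ByPosition) =>
        // Fields are simply renamed in input order.
        Right(refs.zipWithIndex.map { case (r, idx) => (idx, r.name) })
      case Left(msg) =>
        Left(msg)
    }
}
```

With a two-field tuple input, `resolve` picks name mode for `_2 as b, _1 as a` and position mode for `x, y`. The same unknown references on an input without field order produce an error.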



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
