[GitHub] [flink] twalthr commented on a change in pull request #14111: [FLINK-20191][document] Add documents for FLIP-95 ability interfaces

2020-11-23 Thread GitBox


twalthr commented on a change in pull request #14111:
URL: https://github.com/apache/flink/pull/14111#discussion_r528729420



##
File path: docs/dev/table/connectors/kafka.md
##
@@ -249,6 +249,15 @@ Besides enabling Flink's checkpointing, you can also choose three different mode
 
 Please refer to [Kafka documentation]({% link dev/connectors/kafka.md %}#kafka-producers-and-fault-tolerance) for more caveats about delivery guarantees.
 
+### Per-partition-watermark Source
+
+Flink supports to emit per-partition-watermark for Kafka. Using this feature, watermarks are generated inside the Kafka consumer. The per-partition-watermark are merged in
+the same way as watermarks are merged on stream shuffles. The output watermark of the source is determined by the minimum watermark among the partitions it reads. Considering a watermark assigner
+advance the watermark according to the event-time on the records. If some partitions in the topics are idle, the watermark assigner will not advance. You can alleviate this problem by
+setting appropriate [idelness timeouts]({{ site.baseurl }}/dev/event_timestamps_watermarks.html#dealing-with-idle-sources).

Review comment:
   Maybe there is some misunderstanding. The link that you put in the docs 
helps DataStream API users:
   ```
   WatermarkStrategy
       .<Tuple2<Long, String>>forBoundedOutOfOrderness(Duration.ofSeconds(20))
       .withIdleness(Duration.ofMinutes(1));
   ```
   
   But it should be updated to help Table/SQL API users.
   
   I found `org.apache.flink.table.api.config.ExecutionConfigOptions#TABLE_EXEC_SOURCE_IDLE_TIMEOUT`; we should link to this instead if it works for the Kafka connector.
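   
   As a minimal sketch (not part of the PR), a Table API user could set that option through the `TableConfig`, assuming `TABLE_EXEC_SOURCE_IDLE_TIMEOUT` corresponds to the key `table.exec.source.idle-timeout` and that the Kafka source honors it:
   ```java
   import org.apache.flink.table.api.EnvironmentSettings;
   import org.apache.flink.table.api.TableEnvironment;

   public class IdleTimeoutExample {
       public static void main(String[] args) {
           TableEnvironment tEnv = TableEnvironment.create(
                   EnvironmentSettings.newInstance().inStreamingMode().build());

           // key behind ExecutionConfigOptions#TABLE_EXEC_SOURCE_IDLE_TIMEOUT; whether
           // the Kafka connector honors it is exactly the open question in this thread
           tEnv.getConfig().getConfiguration()
                   .setString("table.exec.source.idle-timeout", "1 min");
       }
   }
   ```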





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] twalthr commented on a change in pull request #14111: [FLINK-20191][document] Add documents for FLIP-95 ability interfaces

2020-11-23 Thread GitBox


twalthr commented on a change in pull request #14111:
URL: https://github.com/apache/flink/pull/14111#discussion_r528611200



##
File path: docs/dev/table/connectors/kafka.md
##
@@ -249,6 +249,15 @@ Besides enabling Flink's checkpointing, you can also choose three different mode
 
 Please refer to [Kafka documentation]({% link dev/connectors/kafka.md %}#kafka-producers-and-fault-tolerance) for more caveats about delivery guarantees.
 
+### Per-partition-watermark Source
+
+Flink supports to emit per-partition-watermark for Kafka. Using this feature, watermarks are generated inside the Kafka consumer. The per-partition-watermark are merged in
+the same way as watermarks are merged on stream shuffles. The output watermark of the source is determined by the minimum watermark among the partitions it reads. Considering a watermark assigner
+advance the watermark according to the event-time on the records. If some partitions in the topics are idle, the watermark assigner will not advance. You can alleviate this problem by
+setting appropriate [idelness timeouts]({{ site.baseurl }}/dev/event_timestamps_watermarks.html#dealing-with-idle-sources).

Review comment:
   This documentation makes no sense for Table/SQL users. We don't have a way to set this via options, right?

##
File path: docs/dev/table/connectors/kafka.md
##
@@ -249,6 +249,15 @@ Besides enabling Flink's checkpointing, you can also choose three different mode
 
 Please refer to [Kafka documentation]({% link dev/connectors/kafka.md %}#kafka-producers-and-fault-tolerance) for more caveats about delivery guarantees.
 
+### Per-partition-watermark Source
+
+Flink supports to emit per-partition-watermark for Kafka. Using this feature, watermarks are generated inside the Kafka consumer. The per-partition-watermark are merged in
+the same way as watermarks are merged on stream shuffles. The output watermark of the source is determined by the minimum watermark among the partitions it reads. Considering a watermark assigner
+advance the watermark according to the event-time on the records. If some partitions in the topics are idle, the watermark assigner will not advance. You can alleviate this problem by
+setting appropriate [idelness timeouts]({{ site.baseurl }}/dev/event_timestamps_watermarks.html#dealing-with-idle-sources).

Review comment:
   We should actually introduce an option for it.
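   
   For illustration only, a sketch of how such a per-table option could look once introduced; the option key `scan.idle-timeout` is invented here for the example and is not an existing Kafka connector option:
   ```java
   import org.apache.flink.table.api.EnvironmentSettings;
   import org.apache.flink.table.api.TableEnvironment;

   public class HypotheticalIdleTimeoutOption {
       public static void main(String[] args) {
           TableEnvironment tableEnv = TableEnvironment.create(
                   EnvironmentSettings.newInstance().inStreamingMode().build());

           // 'scan.idle-timeout' is a made-up option name illustrating the suggestion
           // above; it would only work if the Kafka connector actually declared it.
           tableEnv.executeSql(
               "CREATE TABLE kafka_source (\n"
                   + "  user_id BIGINT,\n"
                   + "  ts TIMESTAMP(3),\n"
                   + "  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND\n"
                   + ") WITH (\n"
                   + "  'connector' = 'kafka',\n"
                   + "  'topic' = 'user_events',\n"
                   + "  'properties.bootstrap.servers' = 'localhost:9092',\n"
                   + "  'properties.group.id' = 'example-group',\n"
                   + "  'format' = 'json',\n"
                   + "  'scan.idle-timeout' = '1 min'\n"
                   + ")");
       }
   }
   ```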

##
File path: docs/dev/table/sourceSinks.zh.md
##
@@ -31,16 +31,15 @@ of a dynamic table is stored in external systems (such as databases, key-value s
 _Dynamic sources_ and _dynamic sinks_ can be used to read and write data from and to an external system. In
 the documentation, sources and sinks are often summarized under the term _connector_.
 
-Flink provides pre-defined connectors for Kafka, Hive, and different file systems. See the [connector section]({% link dev/table/connectors/index.zh.md %})
+Flink provides pre-defined connectors for Kafka, Hive, and different file systems. See the [connector section]({% link dev/table/connectors/index.md %})

Review comment:
   undo this change?

##
File path: docs/dev/table/sourceSinks.zh.md
##
@@ -31,16 +31,15 @@ of a dynamic table is stored in external systems (such as databases, key-value s
 _Dynamic sources_ and _dynamic sinks_ can be used to read and write data from and to an external system. In
 the documentation, sources and sinks are often summarized under the term _connector_.
 
-Flink provides pre-defined connectors for Kafka, Hive, and different file systems. See the [connector section]({% link dev/table/connectors/index.zh.md %})
+Flink provides pre-defined connectors for Kafka, Hive, and different file systems. See the [connector section]({% link dev/table/connectors/index.md %})
 for more information about built-in table sources and sinks.
 
 This page focuses on how to develop a custom, user-defined connector.
 
 Attention New table source and table sink interfaces have been
 introduced in Flink 1.11 as part of [FLIP-95](https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces).
-Also the factory interfaces have been reworked. FLIP-95 is not fully implemented yet. Many ability interfaces
-are not supported yet (e.g. for filter or partition push down). If necessary, please also have a look
-at the [old table sources and sinks page]({% link dev/table/legacySourceSinks.zh.md %}). Those interfaces
+Also the factory interfaces have been reworked. Many ability interfaces has been supported (e.g. for filter or partition push down).

Review comment:
   Remove `Many ability interfaces has been supported (e.g. for filter or 
partition push down).` entirely?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

[GitHub] [flink] twalthr commented on a change in pull request #14111: [FLINK-20191][document] Add documents for FLIP-95 ability interfaces

2020-11-17 Thread GitBox


twalthr commented on a change in pull request #14111:
URL: https://github.com/apache/flink/pull/14111#discussion_r525878851



##
File path: docs/dev/table/sourceSinks.zh.md
##
@@ -193,6 +192,149 @@ for more information.
 The runtime implementation of a `LookupTableSource` is a `TableFunction` or `AsyncTableFunction`. The function
 will be called with values for the given lookup keys during runtime.
 
+ Defining a Dynamic Table Source with Projection Push-Down

Review comment:
   How about we just describe the interfaces in one or two sentences, maybe in a table, and link to the corresponding class on GitHub? It is very difficult to keep docs and interfaces in sync. We should avoid code duplication.
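   
   For reference, a minimal sketch (not taken from the PR) of what one of these ability interfaces looks like in use; it assumes the Flink 1.11+ `SupportsProjectionPushDown` contract with `supportsNestedProjection()` and `applyProjection(int[][])`, and the class name `ProjectableSource` is made up for illustration:
   ```java
   import org.apache.flink.table.connector.ChangelogMode;
   import org.apache.flink.table.connector.source.DynamicTableSource;
   import org.apache.flink.table.connector.source.ScanTableSource;
   import org.apache.flink.table.connector.source.abilities.SupportsProjectionPushDown;

   /** Illustrative source that accepts projection push-down from the planner. */
   public class ProjectableSource implements ScanTableSource, SupportsProjectionPushDown {

       // field indices requested by the planner, e.g. [[0], [2]] for columns 0 and 2
       private int[][] projectedFields;

       @Override
       public boolean supportsNestedProjection() {
           return false; // this sketch only handles top-level columns
       }

       @Override
       public void applyProjection(int[][] projectedFields) {
           // remember the projection; a real connector would read only these columns
           this.projectedFields = projectedFields;
       }

       @Override
       public ChangelogMode getChangelogMode() {
           return ChangelogMode.insertOnly();
       }

       @Override
       public ScanRuntimeProvider getScanRuntimeProvider(ScanContext runtimeProviderContext) {
           // the actual reader (SourceFunctionProvider, InputFormatProvider, ...) is omitted here
           throw new UnsupportedOperationException("runtime provider omitted in this sketch");
       }

       @Override
       public DynamicTableSource copy() {
           ProjectableSource copy = new ProjectableSource();
           copy.projectedFields = projectedFields;
           return copy;
       }

       @Override
       public String asSummaryString() {
           return "ProjectableSource";
       }
   }
   ```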





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org