[jira] [Updated] (FLINK-17535) Treat min/max as part of the hierarchy of config option
[ https://issues.apache.org/jira/browse/FLINK-17535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-17535: - Fix Version/s: (was: 1.14.0) 1.15.0 > Treat min/max as part of the hierarchy of config option > --- > > Key: FLINK-17535 > URL: https://issues.apache.org/jira/browse/FLINK-17535 > Project: Flink > Issue Type: Improvement > Components: Runtime / Configuration >Reporter: Yangze Guo >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > As discussed in > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Should-max-min-be-part-of-the-hierarchy-of-config-option-td40578.html, > we decided to treat min/max as part of the hierarchy of config options. This > ticket is an umbrella for all tasks related to it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-15645) enable COPY TO/FROM in Postgres JDBC source/sink for faster batch processing
[ https://issues.apache.org/jira/browse/FLINK-15645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-15645: - Fix Version/s: (was: 1.14.0) 1.15.0 > enable COPY TO/FROM in Postgres JDBC source/sink for faster batch processing > > > Key: FLINK-15645 > URL: https://issues.apache.org/jira/browse/FLINK-15645 > Project: Flink > Issue Type: New Feature > Components: Connectors / JDBC >Reporter: Bowen Li >Priority: Minor > Labels: auto-deprioritized-major, auto-unassigned > Fix For: 1.15.0 > > > Postgres has its own SQL extension, COPY FROM/TO, available via JDBC for faster bulk > loading/reading: [https://www.postgresql.org/docs/12/sql-copy.html] > Flink should be able to support that for batch use cases. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-15729) Table planner dependency instructions for executing in IDE can be improved
[ https://issues.apache.org/jira/browse/FLINK-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-15729: - Fix Version/s: (was: 1.14.0) 1.15.0 > Table planner dependency instructions for executing in IDE can be improved > -- > > Key: FLINK-15729 > URL: https://issues.apache.org/jira/browse/FLINK-15729 > Project: Flink > Issue Type: Improvement > Components: Documentation, Table SQL / API >Reporter: Tzu-Li (Gordon) Tai >Priority: Minor > Fix For: 1.15.0 > > > In the docs: > https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/#table-program-dependencies > It would be clearer to add that, for this to work in the IDE, you additionally > need to enable the option to `Include dependencies with "provided" scope`. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-16466) Group by on event time should produce insert only result
[ https://issues.apache.org/jira/browse/FLINK-16466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-16466: - Fix Version/s: (was: 1.14.0) 1.15.0 > Group by on event time should produce insert only result > > > Key: FLINK-16466 > URL: https://issues.apache.org/jira/browse/FLINK-16466 > Project: Flink > Issue Type: Improvement > Components: Table SQL / Planner >Affects Versions: 1.10.0 >Reporter: Kurt Young >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > Currently, when doing aggregation queries, we can output insert-only results > only when grouping by windows. But when users have defined an event time and a > watermark, we can also support emitting insert-only results when grouping on > event time. To be more precise, it should only require that event time is one of > the grouping keys. One can think of grouping by event time as a kind of > special window, with both window start and window end equal to the event time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
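The idea in FLINK-16466 can be sketched outside of Flink: treat each distinct event time as a zero-width window, and emit a group's aggregate as a plain insert once the watermark has passed its event time. A minimal illustration (all names are hypothetical, not Flink API; SUM stands in for the aggregate):

```python
def emit_complete_groups(pending, watermark):
    """Grouping on event time acts like a window whose start and end both
    equal the event time: once the watermark passes that time, the group
    can no longer receive rows, so its aggregate is a final, insert-only
    result rather than an update/retract."""
    # Groups whose event time is covered by the watermark are complete.
    complete = sorted(t for t in pending if t <= watermark)
    results = {}
    for t in complete:
        results[t] = sum(pending.pop(t))  # example aggregate: SUM
    return results
```

For example, with pending rows `{1: [10, 5], 3: [7]}` and a watermark of 2, only the group at event time 1 is emitted; the group at time 3 stays pending until the watermark reaches it.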
[jira] [Updated] (FLINK-16143) Turn on more date time functions of blink planner
[ https://issues.apache.org/jira/browse/FLINK-16143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-16143: - Fix Version/s: (was: 1.14.0) 1.15.0 > Turn on more date time functions of blink planner > - > > Key: FLINK-16143 > URL: https://issues.apache.org/jira/browse/FLINK-16143 > Project: Flink > Issue Type: New Feature > Components: Table SQL / API >Reporter: Zili Chen >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > Currently the blink planner has a series of built-in functions such as > DATEDIFF > DATE_ADD > DATE_SUB > which haven't been put into use so far. We could add the necessary registration, > code generation and conversion logic to make them available in production scope. > > What do you think [~jark]? > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-16776) Support schema evolution for Hive tables
[ https://issues.apache.org/jira/browse/FLINK-16776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-16776: - Fix Version/s: (was: 1.14.0) 1.15.0 > Support schema evolution for Hive tables > > > Key: FLINK-16776 > URL: https://issues.apache.org/jira/browse/FLINK-16776 > Project: Flink > Issue Type: New Feature > Components: Connectors / Hive >Affects Versions: 1.10.0 >Reporter: Rui Li >Priority: Minor > Labels: auto-deprioritized-major, auto-unassigned > Fix For: 1.15.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-18199) Translate "Filesystem SQL Connector" page into Chinese
[ https://issues.apache.org/jira/browse/FLINK-18199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18199: - Fix Version/s: (was: 1.14.0) 1.15.0 > Translate "Filesystem SQL Connector" page into Chinese > -- > > Key: FLINK-18199 > URL: https://issues.apache.org/jira/browse/FLINK-18199 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Connectors / FileSystem, > Documentation, Table SQL / Ecosystem >Reporter: Jark Wu >Priority: Major > Labels: auto-unassigned, pull-request-available > Fix For: 1.15.0 > > > The page url is > https://ci.apache.org/projects/flink/flink-docs-master/zh/dev/table/connectors/filesystem.html > The markdown file is located in > flink/docs/dev/table/connectors/filesystem.zh.md -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-17405) add test cases for cancel job in SQL client
[ https://issues.apache.org/jira/browse/FLINK-17405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-17405: - Fix Version/s: (was: 1.14.0) 1.15.0 > add test cases for cancel job in SQL client > --- > > Key: FLINK-17405 > URL: https://issues.apache.org/jira/browse/FLINK-17405 > Project: Flink > Issue Type: Improvement > Components: Table SQL / Client >Reporter: godfrey he >Priority: Minor > Labels: auto-deprioritized-major, auto-unassigned, > pull-request-available > Fix For: 1.15.0 > > > as discussed in [FLINK-15669| > https://issues.apache.org/jira/browse/FLINK-15669], we can re-add some tests > to verify cancel job logic in SQL client. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-17808) Rename checkpoint meta file to "_metadata" until it has completed writing
[ https://issues.apache.org/jira/browse/FLINK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-17808: - Fix Version/s: (was: 1.14.0) 1.15.0 > Rename checkpoint meta file to "_metadata" until it has completed writing > - > > Key: FLINK-17808 > URL: https://issues.apache.org/jira/browse/FLINK-17808 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing >Affects Versions: 1.10.0 >Reporter: Yun Tang >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > In practice, some developers or customers use some strategy to find the most > recent _metadata file as the checkpoint to recover from (e.g. as many proposals in > FLINK-9043 suggest). However, the existence of a "_metadata" file does not mean > the checkpoint has been completed, since the write that creates the "_metadata" > file could break on a forced quit (e.g. yarn application -kill). > We could make the checkpoint meta stream write its data to a file named > "_metadata.inprogress" and rename it to "_metadata" once writing has completed. > By doing so, we ensure the "_metadata" file is never broken. -- This message was sent by Atlassian Jira (v8.3.4#803005)
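The write-then-rename protocol FLINK-17808 proposes can be sketched with plain file operations (an illustrative sketch, not Flink's actual checkpoint storage code):

```python
import os

def write_metadata_atomically(checkpoint_dir, payload):
    """Write checkpoint metadata to "_metadata.inprogress" first and only
    rename it to "_metadata" after the write has fully completed, so any
    reader that finds "_metadata" can trust the file is not truncated."""
    inprogress = os.path.join(checkpoint_dir, "_metadata.inprogress")
    final = os.path.join(checkpoint_dir, "_metadata")
    with open(inprogress, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # make sure bytes hit disk before the rename
    os.replace(inprogress, final)  # atomic rename on POSIX filesystems
    return final
```

Note this relies on the filesystem providing atomic rename; object stores such as S3 do not, so a different completion marker would be needed there.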
[jira] [Updated] (FLINK-17785) Refactor orc shim to create a OrcShimFactory in hive
[ https://issues.apache.org/jira/browse/FLINK-17785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-17785: - Fix Version/s: (was: 1.14.0) 1.15.0 > Refactor orc shim to create a OrcShimFactory in hive > > > Key: FLINK-17785 > URL: https://issues.apache.org/jira/browse/FLINK-17785 > Project: Flink > Issue Type: Improvement > Components: Connectors / Hive >Reporter: Jingsong Lee >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19038) It doesn't support to call Table.limit() continuously
[ https://issues.apache.org/jira/browse/FLINK-19038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-19038: - Fix Version/s: (was: 1.14.0) 1.15.0 > It doesn't support to call Table.limit() continuously > - > > Key: FLINK-19038 > URL: https://issues.apache.org/jira/browse/FLINK-19038 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Affects Versions: 1.12.0 >Reporter: Dian Fu >Priority: Major > Labels: auto-unassigned, pull-request-available > Fix For: 1.15.0 > > > For example, table.limit(3).limit(2) will fail with "FETCH is already defined.":
{code}
org.apache.flink.table.api.ValidationException: FETCH is already defined.
	at org.apache.flink.table.operations.utils.SortOperationFactory.validateAndGetChildSort(SortOperationFactory.java:125)
	at org.apache.flink.table.operations.utils.SortOperationFactory.createLimitWithFetch(SortOperationFactory.java:105)
	at org.apache.flink.table.operations.utils.OperationTreeBuilder.limitWithFetch(OperationTreeBuilder.java:418)
{code}
However, since we support calling table.limit() without specifying an order, this should arguably be a valid usage and be allowed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
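One way consecutive limits could compose, rather than fail, is to fold each new fetch into the already-defined one by taking the minimum: `limit(3).limit(2)` can never return more than 2 rows in either order of application. A hypothetical sketch of that merge rule (not the actual SortOperationFactory logic):

```python
from typing import Optional

def combine_fetch(existing: Optional[int], new_fetch: int) -> int:
    """Merge a new LIMIT/FETCH into an existing one. Taking the minimum
    matches the semantics of applying the limits one after the other:
    limit(a).limit(b) yields at most min(a, b) rows."""
    if new_fetch < 0:
        raise ValueError("FETCH count must be non-negative")
    return new_fetch if existing is None else min(existing, new_fetch)
```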
[jira] [Updated] (FLINK-18742) Some configuration args do not take effect at client
[ https://issues.apache.org/jira/browse/FLINK-18742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18742: - Fix Version/s: (was: 1.14.0) 1.15.0 > Some configuration args do not take effect at client > > > Key: FLINK-18742 > URL: https://issues.apache.org/jira/browse/FLINK-18742 > Project: Flink > Issue Type: Improvement > Components: Command Line Client >Affects Versions: 1.11.1 >Reporter: Matt Wang >Priority: Minor > Labels: auto-deprioritized-major, auto-unassigned, > pull-request-available > Fix For: 1.15.0 > > > Some configuration args from the command line do not take effect on the client. For > example, if the job sets _classloader.resolve-order_ to _parent-first_, it takes > effect on the TaskManager, but not on the client. > The *FlinkUserCodeClassLoaders* is created before _getEffectiveConfiguration()_ > is called in _org.apache.flink.client.cli.CliFrontend_, so the _Configuration_ > used by _PackagedProgram_ does not include the command-line configuration args. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-18567) Add Support for Azure Cognitive Search Table & SQL Connector
[ https://issues.apache.org/jira/browse/FLINK-18567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18567: - Fix Version/s: (was: 1.14.0) 1.15.0 > Add Support for Azure Cognitive Search Table & SQL Connector > > > Key: FLINK-18567 > URL: https://issues.apache.org/jira/browse/FLINK-18567 > Project: Flink > Issue Type: Improvement > Components: API / DataStream, Connectors / Common >Affects Versions: 1.12.0 >Reporter: Israel Ekpo >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > The objective of this improvement is to add Azure Cognitive Search [2] as an > output sink for the Table & SQL connectors [1] > [1] > [https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connectors/] > [2] > [https://docs.microsoft.com/en-us/azure/search/search-what-is-azure-search] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-18201) Support create_function in Python Table API
[ https://issues.apache.org/jira/browse/FLINK-18201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18201: - Fix Version/s: (was: 1.14.0) 1.15.0 > Support create_function in Python Table API > --- > > Key: FLINK-18201 > URL: https://issues.apache.org/jira/browse/FLINK-18201 > Project: Flink > Issue Type: Improvement > Components: API / Python >Reporter: Dian Fu >Priority: Minor > Labels: auto-deprioritized-major, auto-unassigned > Fix For: 1.15.0 > > > There is an interface *createFunction* in the Java *TableEnvironment*. It's > used to register a Java UserDefinedFunction class as a catalog function. We > should align the Python Table API with Java and add such an interface in the > Python *TableEnvironment*. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-18624) Document CREATE TEMPORARY TABLE
[ https://issues.apache.org/jira/browse/FLINK-18624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18624: - Fix Version/s: (was: 1.14.0) 1.15.0 > Document CREATE TEMPORARY TABLE > --- > > Key: FLINK-18624 > URL: https://issues.apache.org/jira/browse/FLINK-18624 > Project: Flink > Issue Type: New Feature > Components: Documentation >Reporter: Jingsong Lee >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.11.5, 1.15.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-18442) Move `testSessionWindowsWithContinuousEventTimeTrigger` to `ContinuousEventTimeTriggerTest`
[ https://issues.apache.org/jira/browse/FLINK-18442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18442: - Fix Version/s: (was: 1.14.0) 1.15.0 > Move `testSessionWindowsWithContinuousEventTimeTrigger` to > `ContinuousEventTimeTriggerTest` > --- > > Key: FLINK-18442 > URL: https://issues.apache.org/jira/browse/FLINK-18442 > Project: Flink > Issue Type: Improvement > Components: Tests >Reporter: Lijie Wang >Priority: Minor > Labels: auto-deprioritized-major, auto-unassigned > Fix For: 1.15.0 > > > `testSessionWindowsWithContinuousEventTimeTrigger` in `WindowOperatorTest` was > introduced when fixing > [FLINK-4862|https://issues.apache.org/jira/browse/FLINK-4862]. > But it would be better to move `testSessionWindowsWithContinuousEventTimeTrigger` > into `ContinuousEventTimeTriggerTest`. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-18563) Add Support for Azure Cosmos DB DataStream Connector
[ https://issues.apache.org/jira/browse/FLINK-18563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18563: - Fix Version/s: (was: 1.14.0) 1.15.0 > Add Support for Azure Cosmos DB DataStream Connector > > > Key: FLINK-18563 > URL: https://issues.apache.org/jira/browse/FLINK-18563 > Project: Flink > Issue Type: Improvement > Components: API / DataStream, Connectors / Common >Affects Versions: 1.12.0 >Reporter: Israel Ekpo >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > Add Support for Azure Cosmos DB DataStream Connector > The objective of this improvement is to add Azure Cosmos DB [2] as an input > source and output sink for the DataStream connectors [1] > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/connectors/#datastream-connectors > [2] https://docs.microsoft.com/en-us/azure/cosmos-db/change-feed -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-18564) Add Support for Azure Event Hub DataStream Connector
[ https://issues.apache.org/jira/browse/FLINK-18564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18564: - Fix Version/s: (was: 1.14.0) 1.15.0 > Add Support for Azure Event Hub DataStream Connector > > > Key: FLINK-18564 > URL: https://issues.apache.org/jira/browse/FLINK-18564 > Project: Flink > Issue Type: Improvement > Components: API / DataStream, Connectors / Common >Affects Versions: 1.12.0 >Reporter: Israel Ekpo >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > The objective of this improvement is to add Azure Event Hubs [2] as an input > source and output sink for the DataStream connectors [1] > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/connectors/#datastream-connectors > [2] https://docs.microsoft.com/en-us/azure/event-hubs/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-18022) Add e2e test for new streaming file sink
[ https://issues.apache.org/jira/browse/FLINK-18022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18022: - Fix Version/s: (was: 1.14.0) 1.15.0 > Add e2e test for new streaming file sink > > > Key: FLINK-18022 > URL: https://issues.apache.org/jira/browse/FLINK-18022 > Project: Flink > Issue Type: Improvement > Components: Connectors / FileSystem, Tests >Affects Versions: 1.11.0 >Reporter: Danny Chen >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-18565) Add Support for Azure Event Grid DataStream Connector
[ https://issues.apache.org/jira/browse/FLINK-18565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18565: - Fix Version/s: (was: 1.14.0) 1.15.0 > Add Support for Azure Event Grid DataStream Connector > - > > Key: FLINK-18565 > URL: https://issues.apache.org/jira/browse/FLINK-18565 > Project: Flink > Issue Type: Improvement > Components: API / DataStream, Connectors / Common >Affects Versions: 1.12.0 >Reporter: Israel Ekpo >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > The objective of this improvement is to add Azure Event Grid [2] as an output > sink for the DataStream connectors [1] > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/connectors/#datastream-connectors > [2] https://docs.microsoft.com/en-us/azure/event-grid/overview -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-18229) Pending worker requests should be properly cleared
[ https://issues.apache.org/jira/browse/FLINK-18229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18229: - Fix Version/s: (was: 1.14.0) 1.15.0 > Pending worker requests should be properly cleared > -- > > Key: FLINK-18229 > URL: https://issues.apache.org/jira/browse/FLINK-18229 > Project: Flink > Issue Type: Sub-task > Components: Deployment / Kubernetes, Deployment / YARN, Runtime / > Coordination >Affects Versions: 1.9.3, 1.10.1, 1.11.0 >Reporter: Xintong Song >Priority: Major > Fix For: 1.15.0 > > > Currently, if Kubernetes/Yarn does not have enough resources to fulfill > Flink's resource requirement, there will be pending pod/container requests on > Kubernetes/Yarn. These pending resource requests are never cleared until > either fulfilled or the Flink cluster is shut down. > However, sometimes Flink no longer needs the pending resources. E.g., the > slot request is fulfilled by another slot that becomes available, or the > job fails due to a slot request timeout (in a session cluster). In such cases, > Flink does not remove the resource request until the resource is allocated, > then it discovers that it no longer needs the allocated resource and releases > it. This affects the underlying Kubernetes/Yarn cluster, especially > when the cluster is under heavy workload. > It would be good for Flink to cancel pod/container requests as early as > possible if it can discover that some of the pending workers are no longer > needed. > There are several approaches that could potentially achieve this. > # We can always check whether there's a pending worker that can be canceled > when a {{PendingTaskManagerSlot}} is unassigned. > # We can have a separate timeout for requesting a new worker. If the resource > cannot be allocated within the given time after being requested, we should cancel > that resource request and claim a resource allocation failure.
> # We can share the same timeout for starting new worker (proposed in > FLINK-13554). This is similar to 2), but it requires the worker to be > registered, rather than allocated, before timeout. -- This message was sent by Atlassian Jira (v8.3.4#803005)
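Approach 2 above (a separate allocation timeout) can be sketched as a small bookkeeping structure; the class and method names below are hypothetical, and the actual cancellation call to Kubernetes/Yarn is left to the caller:

```python
import time

class PendingWorkerRequests:
    """Track pending worker (pod/container) requests with a per-request
    allocation deadline; requests not fulfilled in time are dropped so the
    caller can cancel them with the cluster manager instead of letting
    them linger on the Kubernetes/Yarn side."""

    def __init__(self, allocation_timeout_s):
        self.allocation_timeout_s = allocation_timeout_s
        self.deadlines = {}  # request_id -> deadline (monotonic seconds)

    def request(self, request_id, now=None):
        now = time.monotonic() if now is None else now
        self.deadlines[request_id] = now + self.allocation_timeout_s

    def fulfilled(self, request_id):
        # Resource allocated in time: stop tracking it.
        self.deadlines.pop(request_id, None)

    def cancel_expired(self, now=None):
        """Return (and drop) requests whose deadline has passed; the caller
        cancels these and claims a resource allocation failure."""
        now = time.monotonic() if now is None else now
        expired = [r for r, d in self.deadlines.items() if d <= now]
        for r in expired:
            del self.deadlines[r]
        return expired
```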
[jira] [Updated] (FLINK-18202) Introduce Protobuf format
[ https://issues.apache.org/jira/browse/FLINK-18202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18202: - Fix Version/s: (was: 1.14.0) 1.15.0 > Introduce Protobuf format > - > > Key: FLINK-18202 > URL: https://issues.apache.org/jira/browse/FLINK-18202 > Project: Flink > Issue Type: New Feature > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table > SQL / Ecosystem >Reporter: Benchao Li >Priority: Major > Labels: auto-unassigned, pull-request-available, sprint > Fix For: 1.15.0 > > Attachments: image-2020-06-15-17-18-03-182.png > > > PB[1] is a very famous and widely used (de)serialization framework. The ML[2] > also has some discussions about this. It's a useful feature. > This issue may need some design work, or a FLIP. > [1] [https://developers.google.com/protocol-buffers] > [2] [http://apache-flink.147419.n8.nabble.com/Flink-SQL-UDF-td3725.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-17916) Provide API to separate KafkaShuffle's Producer and Consumer to different jobs
[ https://issues.apache.org/jira/browse/FLINK-17916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-17916: - Fix Version/s: (was: 1.14.0) 1.15.0 > Provide API to separate KafkaShuffle's Producer and Consumer to different jobs > -- > > Key: FLINK-17916 > URL: https://issues.apache.org/jira/browse/FLINK-17916 > Project: Flink > Issue Type: Improvement > Components: API / DataStream, Connectors / Kafka >Affects Versions: 1.11.0 >Reporter: Yuan Mei >Priority: Minor > Labels: auto-deprioritized-major, auto-unassigned, > pull-request-available > Fix For: 1.15.0 > > > Follow up of FLINK-15670 > *Separate sink (producer) and source (consumer) to different jobs* > * In the same job, a sink and a source are recovered independently according > to regional failover. However, they share the same checkpoint coordinator and, > correspondingly, the same global checkpoint snapshot. > * That is to say, if the consumer fails, the producer cannot commit written data > because of the two-phase commit set-up (the producer needs a checkpoint-complete > signal to complete the second stage). > * The same applies to the producer. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-18627) Get unmatched filter method records to side output
[ https://issues.apache.org/jira/browse/FLINK-18627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18627: - Fix Version/s: (was: 1.14.0) 1.15.0 > Get unmatched filter method records to side output > > > Key: FLINK-18627 > URL: https://issues.apache.org/jira/browse/FLINK-18627 > Project: Flink > Issue Type: Improvement > Components: API / DataStream >Reporter: Roey Shem Tov >Priority: Minor > Labels: auto-deprioritized-major, pull-request-available > Fix For: 1.15.0 > > > Records that do not match a filter function should somehow be sent to a side output. > Example:
{code:java}
datastream
    .filter(i -> i % 2 == 0)
    .sideOutput(oddNumbersSideOutput);
{code}
That way we can filter multiple times and send the filtered-out records to our side output instead of dropping them immediately; it can be useful in many ways. > What do you think? -- This message was sent by Atlassian Jira (v8.3.4#803005)
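The behaviour proposed above amounts to splitting a stream on a predicate and keeping both halves. A minimal, Flink-free sketch of that semantics:

```python
def filter_with_side_output(records, predicate):
    """Return (main, side): records matching the predicate stay on the
    main output, while non-matching records go to a side output instead
    of being dropped."""
    main, side = [], []
    for record in records:
        (main if predicate(record) else side).append(record)
    return main, side
```

In Flink today a similar effect is usually achieved with a ProcessFunction that emits non-matching elements to an OutputTag; the proposal would make this a one-liner on filter itself.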
[jira] [Updated] (FLINK-18405) Add watermark support for unaligned checkpoints
[ https://issues.apache.org/jira/browse/FLINK-18405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18405: - Fix Version/s: (was: 1.14.0) 1.15.0 > Add watermark support for unaligned checkpoints > --- > > Key: FLINK-18405 > URL: https://issues.apache.org/jira/browse/FLINK-18405 > Project: Flink > Issue Type: Improvement > Components: Runtime / Network >Affects Versions: 1.12.0 >Reporter: Arvid Heise >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > Currently, Flink generates the watermark as a first step of recovery instead of > storing the latest watermark in the operators, to ease rescaling. In unaligned > checkpoints, that means on recovery, Flink generates watermarks after it > restores in-flight data. If your pipeline uses an operator that applies the > latest watermark to each record, it will produce incorrect results during > recovery if the watermark is not directly or indirectly part of the operator > state. Thus, the SQL OVER operator should not be used with unaligned > checkpoints, while window operators are safe to use. > A possible solution is to store the watermark in the operator state. If > rescaling may occur, watermarks should be stored per key-group in a > union-state. -- This message was sent by Atlassian Jira (v8.3.4#803005)
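The union-state idea in the last paragraph can be sketched as follows: after rescaling, every subtask sees the union of all per-key-group watermark entries and restores the minimum over the key-groups it now owns (hypothetical names, not a Flink API):

```python
def restored_watermark(per_key_group_watermarks, owned_key_groups):
    """Given the union of all per-key-group watermark entries, a subtask
    restores its operator watermark as the minimum over the key-groups
    assigned to it. The minimum is the safe choice: a watermark must
    never overtake data that has not been processed yet."""
    owned = [wm for kg, wm in per_key_group_watermarks.items()
             if kg in owned_key_groups]
    return min(owned) if owned else None
```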
[jira] [Updated] (FLINK-19263) Enforce alphabetical order in configuration option docs
[ https://issues.apache.org/jira/browse/FLINK-19263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-19263: - Fix Version/s: (was: 1.14.0) 1.15.0 > Enforce alphabetical order in configuration option docs > --- > > Key: FLINK-19263 > URL: https://issues.apache.org/jira/browse/FLINK-19263 > Project: Flink > Issue Type: Improvement > Components: Documentation, Runtime / Configuration >Reporter: Chesnay Schepler >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > The ConfigDocsGenerator sorts options alphabetically, however there are no > checks to ensure that the generated files adhere to that order. > This is a problem because time and time again these files are manually > modified, breaking the order, causing other PRs that then use the generator > to include unrelated changes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-18981) Support column comment for Hive tables
[ https://issues.apache.org/jira/browse/FLINK-18981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18981: - Fix Version/s: (was: 1.14.0) 1.15.0 > Support column comment for Hive tables > -- > > Key: FLINK-18981 > URL: https://issues.apache.org/jira/browse/FLINK-18981 > Project: Flink > Issue Type: New Feature > Components: Connectors / Hive >Reporter: Rui Li >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > Start working on this once FLINK-18958 is done -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19499) Expose Metric Groups to Split Assigners
[ https://issues.apache.org/jira/browse/FLINK-19499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-19499: - Fix Version/s: (was: 1.14.0) 1.15.0 > Expose Metric Groups to Split Assigners > --- > > Key: FLINK-19499 > URL: https://issues.apache.org/jira/browse/FLINK-19499 > Project: Flink > Issue Type: Improvement > Components: Connectors / FileSystem >Reporter: Stephan Ewen >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > Split Assigners should have access to metric groups, so they can report > metrics on assignment, like pending splits, local-, and remote assignments. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-18235) Improve the checkpoint strategy for Python UDF execution
[ https://issues.apache.org/jira/browse/FLINK-18235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18235: - Fix Version/s: (was: 1.14.0) 1.15.0 > Improve the checkpoint strategy for Python UDF execution > > > Key: FLINK-18235 > URL: https://issues.apache.org/jira/browse/FLINK-18235 > Project: Flink > Issue Type: Improvement > Components: API / Python >Reporter: Dian Fu >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > Currently, when a checkpoint is triggered for the Python operator, all the > buffered data is flushed to the Python worker to be processed. This increases > the overall checkpoint time when there are a lot of buffered elements and the > Python UDF is slow. We should improve the checkpoint strategy to address this, > e.g. by buffering the data into state instead of flushing it out. We could also > let users configure the checkpoint strategy if needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
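The suggested strategy, snapshotting the buffer into state instead of pushing it through the (possibly slow) Python worker at checkpoint time, can be sketched like this (a hypothetical operator shape, not PyFlink's internals):

```python
class BufferingCheckpointOperator:
    """Instead of flushing every buffered element to the Python worker
    when a checkpoint barrier arrives, persist the buffer into operator
    state; the elements are replayed into the worker after restore."""

    def __init__(self):
        self.buffer = []  # elements awaiting the Python UDF

    def process(self, element):
        self.buffer.append(element)

    def snapshot_state(self):
        # Checkpoint barrier: no flush, just copy the buffer into state.
        return list(self.buffer)

    def restore(self, state):
        # After recovery, replay the buffered elements into the worker.
        self.buffer = list(state)
```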
[jira] [Updated] (FLINK-18578) Add rejecting checkpoint logic in source
[ https://issues.apache.org/jira/browse/FLINK-18578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18578: - Fix Version/s: (was: 1.14.0) 1.15.0 > Add rejecting checkpoint logic in source > > > Key: FLINK-18578 > URL: https://issues.apache.org/jira/browse/FLINK-18578 > Project: Flink > Issue Type: New Feature > Components: Connectors / Common, Runtime / Checkpointing >Reporter: Qingsheng Ren >Priority: Minor > Labels: auto-deprioritized-major, auto-unassigned > Fix For: 1.15.0 > > > In some databases' change data capture (CDC) cases, the process is usually > divided into two phases: a snapshotting phase (lock captured tables and scan > all records in them) and a log streaming phase (read all changes starting from > the moment of locking the tables) in order to build a complete view of the captured > tables. The snapshotting phase has to be atomic, so we have to give up > all records created in the snapshotting phase if any failure happens, because > the contents of the captured tables might have changed during recovery. > Checkpointing within the snapshotting phase is meaningless too. > As a result, we need to add a new feature in the source to reject a checkpoint > if the source is currently within an atomic operation or some other process > that cannot checkpoint at the moment. This rejection should not be treated > as a failure that could fail the entire job. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
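The rejection contract described above, decline but don't fail, can be sketched as follows (names are hypothetical and this is not the FLIP-27 source API):

```python
class CdcSource:
    """A source that declines checkpoints while inside the atomic
    snapshotting phase; a declined checkpoint is simply skipped by the
    coordinator rather than treated as a job failure."""

    def __init__(self):
        self.in_snapshot_phase = False

    def start_snapshot(self):
        self.in_snapshot_phase = True   # tables locked, full scan running

    def finish_snapshot(self):
        self.in_snapshot_phase = False  # switch to log streaming phase

    def on_checkpoint(self, checkpoint_id):
        if self.in_snapshot_phase:
            return ("declined", checkpoint_id)  # try again later, not an error
        return ("acknowledged", checkpoint_id)
```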
[jira] [Updated] (FLINK-18753) Support local recovery for Unaligned checkpoints
[ https://issues.apache.org/jira/browse/FLINK-18753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18753: - Fix Version/s: (was: 1.14.0) 1.15.0 > Support local recovery for Unaligned checkpoints > > > Key: FLINK-18753 > URL: https://issues.apache.org/jira/browse/FLINK-18753 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing, Runtime / Task >Affects Versions: 1.11.0 >Reporter: Roman Khachatryan >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > There is a standard mechanism of writing two duplicate streams of state > data: to DFS and to the local FS. On recovery, if local state is present and it > matches the state on the JM, then it is used (instead of downloading the state from > DFS). > Currently, local recovery is not supported when running with unaligned > checkpoints enabled. > To enable it, we need to > # write this second stream (see > CheckpointStreamWithResultProvider.getCheckpointOutputStream) > # make sure localState is included in the snapshot reported to the JM > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19103) The PushPartitionIntoTableSourceScanRule will lead to a performance problem when there are still many partitions after pruning
[ https://issues.apache.org/jira/browse/FLINK-19103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-19103: - Fix Version/s: (was: 1.14.0) 1.15.0 > The PushPartitionIntoTableSourceScanRule will lead to a performance problem when > there are still many partitions after pruning > --- > > Key: FLINK-19103 > URL: https://issues.apache.org/jira/browse/FLINK-19103 > Project: Flink > Issue Type: Improvement > Components: Table SQL / Planner >Affects Versions: 1.10.2, 1.11.1 >Reporter: fa zheng >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > The PushPartitionIntoTableSourceScanRule obtains new statistics after > pruning; however, it uses a for loop to get the statistics of each partition and > then merges them together. During this process, Flink calls the > metastore's interface four times in each loop iteration. When many partitions > remain, it spends a lot of time obtaining the new statistics. > > {code:scala} > val newStatistic = { > val tableStats = catalogOption match { > case Some(catalog) => > def mergePartitionStats(): TableStats = { > var stats: TableStats = null > for (p <- remainingPartitions) { > getPartitionStats(catalog, tableIdentifier, p) match { > case Some(currStats) => > if (stats == null) { > stats = currStats > } else { > stats = stats.merge(currStats) > } > case None => return null > } > } > stats > } > mergePartitionStats() > case None => null > } > > FlinkStatistic.builder().statistic(statistic).tableStats(tableStats).build() > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
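One direction for the FLINK-19103 improvement is to fetch statistics for all remaining partitions in a single bulk call and merge them locally, instead of issuing several metastore RPCs per partition inside the loop. The sketch below assumes such a bulk-fetch result already exists as a map — the bulk catalog call itself is a hypothetical API, not something the Hive metastore necessarily offers in this form:

```java
import java.util.List;
import java.util.Map;

// Sketch: merge pre-fetched per-partition row counts with no RPC in the loop.
// The bulk-fetched map is an assumed input; names are illustrative.
public class PartitionStatsMerger {
    /** Returns the total row count, or -1 if any partition lacks statistics
     *  (mirroring the rule's "give up and return null" behaviour). */
    public static long mergeRowCounts(List<String> remainingPartitions,
                                      Map<String, Long> bulkFetchedStats) {
        long total = 0;
        for (String partition : remainingPartitions) {
            Long rows = bulkFetchedStats.get(partition); // local lookup, no metastore call
            if (rows == null) {
                return -1; // unknown statistics for one partition: abandon the merge
            }
            total += rows;
        }
        return total;
    }
}
```

The merge itself is cheap; the expensive part in the current code is the per-partition metastore round-trips, which a bulk fetch would amortize into one call.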
[jira] [Updated] (FLINK-18231) flink kafka connectors contain the same fully qualified class names, which will cause conflicts when a user has all the versions of flink-kafka-connector
[ https://issues.apache.org/jira/browse/FLINK-18231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18231: - Fix Version/s: (was: 1.14.0) 1.15.0 > flink kafka connectors contain the same fully qualified class names, which will cause > conflicts when a user has all the versions of flink-kafka-connector > -- > > Key: FLINK-18231 > URL: https://issues.apache.org/jira/browse/FLINK-18231 > Project: Flink > Issue Type: Improvement > Components: Connectors / Kafka >Affects Versions: 1.11.0 >Reporter: jackylau >Priority: Minor > Labels: auto-deprioritized-major, pull-request-available > Fix For: 1.15.0 > > > > There are four versions of the kafka connector (0.9/0.10/0.11/2.x) in 1.11, > and three versions (0.10/0.11/2.x) in 1.12-snapshot. > But the flink kafka connectors contain the same fully qualified class names, such as > KafkaConsumerThread and Handover, which will cause conflicts when a user has all > the versions of flink-kafka-connector on the classpath. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-18568) Add Support for Azure Data Lake Store Gen 2 in Streaming File Sink
[ https://issues.apache.org/jira/browse/FLINK-18568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18568: - Fix Version/s: (was: 1.14.0) 1.15.0 > Add Support for Azure Data Lake Store Gen 2 in Streaming File Sink > -- > > Key: FLINK-18568 > URL: https://issues.apache.org/jira/browse/FLINK-18568 > Project: Flink > Issue Type: Improvement > Components: API / DataStream, Connectors / Common >Affects Versions: 1.12.0 >Reporter: Israel Ekpo >Assignee: Srinivasulu Punuru >Priority: Minor > Labels: auto-deprioritized-major, stale-assigned > Fix For: 1.15.0 > > > The objective of this improvement is to add support for Azure Data Lake Store > Gen 2 (ADLS Gen2) [2] as one of the supported filesystems for the Streaming > File Sink [1] > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/connectors/streamfile_sink.html > [2] https://hadoop.apache.org/docs/current/hadoop-azure/abfs.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19774) Introduce Sub Partition View Version for Approximate Local Recovery
[ https://issues.apache.org/jira/browse/FLINK-19774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-19774: - Fix Version/s: (was: 1.14.0) 1.15.0 > Introduce Sub Partition View Version for Approximate Local Recovery > --- > > Key: FLINK-19774 > URL: https://issues.apache.org/jira/browse/FLINK-19774 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Task >Reporter: Yuan Mei >Priority: Major > Labels: auto-unassigned > Fix For: 1.15.0 > > > > This ticket is to solve a corner case where a downstream task continuously > fails multiple times, or an orphan task execution may exist for a short > period of time after new execution is running (as described in the FLIP) > > Here is an idea of how to cleanly and thoroughly solve this kind of problem: > # We go with the simplified release view version: only release view before a > new creation (in thread2). That says we won't clean up view when downstream > task disconnects ({{releaseView}} would not be called from the reference copy > of view) (in thread1 or 2). > * > ** This would greatly simplify the threading model > ** This won't cause any resource leak, since view release is only to notify > the upstream result partition to releaseOnConsumption when all subpartitions > are consumed in PipelinedSubPartitionView. In our case, we do not release the > result partition on consumption any way (the result partition is put in track > in JobMaster, similar to the ResultParition.blocking Type). > 2. Each view is associated with a downstream task execution version > * > ** This is making sense because we actually have different versions of view > now, corresponding to the vertex.version of the downstream task. > ** createView is performed only if the new version to create is greater than > the existing one > ** If we decide to create a new view, the old view should be released. > I think this way, we can completely disconnect the old view with the > subpartition. 
Besides that, the working handler in use would always hold the > freshest view reference. > > Point 1 has already been addressed in FLINK-19632. This ticket is to address > Point 2. > Details discussion in [https://github.com/apache/flink/pull/13648] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-18734) Add documentation for DynamoStreams Consumer CDC
[ https://issues.apache.org/jira/browse/FLINK-18734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18734: - Fix Version/s: (was: 1.14.0) 1.15.0 > Add documentation for DynamoStreams Consumer CDC > > > Key: FLINK-18734 > URL: https://issues.apache.org/jira/browse/FLINK-18734 > Project: Flink > Issue Type: Improvement > Components: Connectors / Kinesis, Documentation >Affects Versions: 1.11.1 >Reporter: Vinay >Priority: Minor > Labels: CDC, documentation > Fix For: 1.11.5, 1.15.0 > > > Flink already supports CDC for DynamoDb - > https://issues.apache.org/jira/browse/FLINK-4582 by reading the data from > DynamoStreams but there is no documentation for the same. Given that Flink > now supports CDC for Debezium as well , we should add the documentation for > Dynamo CDC so that more users can use this feature. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19432) Whether to capture the updates which don't change any monitored columns
[ https://issues.apache.org/jira/browse/FLINK-19432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-19432: - Fix Version/s: (was: 1.14.0) 1.15.0 > Whether to capture the updates which don't change any monitored columns > --- > > Key: FLINK-19432 > URL: https://issues.apache.org/jira/browse/FLINK-19432 > Project: Flink > Issue Type: Improvement > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.11.1 >Reporter: shizhengchao >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > With `debezium-json` and `canal-json`: > whether to capture the updates which don't change any monitored columns. This > may happen if the monitored columns (the columns defined in the Flink SQL DDL) are a > subset of the columns in the database table. We can provide an optional option, > defaulting to 'true', which means all updates will be captured. Users can set it to > 'false' to capture only the updates that change monitored columns. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19846) Grammar mistakes in annotations and log
[ https://issues.apache.org/jira/browse/FLINK-19846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-19846: - Fix Version/s: (was: 1.14.0) 1.15.0 > Grammar mistakes in annotations and log > --- > > Key: FLINK-19846 > URL: https://issues.apache.org/jira/browse/FLINK-19846 > Project: Flink > Issue Type: Improvement >Affects Versions: 1.11.2 >Reporter: zoucao >Priority: Minor > Fix For: 1.15.0 > > > There exist some grammar mistakes in annotations and documents. The mistakes > include but are not limited to the following examples: > * "a entry" in WebLogAnalysis.java [246:34] and adm-zip.js [291:33] (which > should be "an entry") > * "a input" in JobGraphGenerator.java [1125:69] etc. (which should be "an > input") > * "a intersection" > * "an user-*" in Table.java etc. (which should be "a user") > Using global search in IntelliJ IDEA, more mistakes like these can be found. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-18873) Make the WatermarkStrategy API more scala friendly
[ https://issues.apache.org/jira/browse/FLINK-18873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18873: - Fix Version/s: (was: 1.14.0) 1.15.0 > Make the WatermarkStrategy API more scala friendly > -- > > Key: FLINK-18873 > URL: https://issues.apache.org/jira/browse/FLINK-18873 > Project: Flink > Issue Type: Improvement > Components: API / Core, API / DataStream >Affects Versions: 1.11.0 >Reporter: Dawid Wysakowicz >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > Right now there is no reliable way of passing WatermarkGeneratorSupplier > and/or TimestampAssigner as lambdas in scala. > The only way to use this API is: > {code} > .assignTimestampsAndWatermarks( > WatermarkStrategy.forGenerator[(String, Long)]( > new WatermarkGeneratorSupplier[(String, Long)] { > override def createWatermarkGenerator(context: > WatermarkGeneratorSupplier.Context): WatermarkGenerator[(String, Long)] = > new MyPeriodicGenerator > } > ) > .withTimestampAssigner(new SerializableTimestampAssigner[(String, > Long)] { > override def extractTimestamp(t: (String, Long), l: Long): Long = > t._2 > }) > ) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19425) Correct the usage of BulkWriter#flush and BulkWriter#finish
[ https://issues.apache.org/jira/browse/FLINK-19425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-19425: - Fix Version/s: (was: 1.14.0) 1.15.0 > Correct the usage of BulkWriter#flush and BulkWriter#finish > --- > > Key: FLINK-19425 > URL: https://issues.apache.org/jira/browse/FLINK-19425 > Project: Flink > Issue Type: Improvement > Components: Connectors / Common >Affects Versions: 1.11.0 >Reporter: hailong wang >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.11.0, 1.15.0 > > > According to its comments, the BulkWriter#finish method should flush all buffered data before > close. > But some subclasses of it do not flush data. These classes are as follows: > 1. AvroBulkWriter#finish > 2. HadoopCompressionBulkWriter#finish > 3. NoCompressionBulkWriter#finish > 4. SequenceFileWriter#finish > We should invoke BulkWriter#flush in these finish methods. > On the other hand, we don't have to invoke BulkWriter#flush in the following close methods, > since BulkWriter#finish will already have flushed all data: > 1. HadoopPathBasedPartFileWriter#closeForCommit > 2. BulkPartWriter#closeForCommit > 3. FileSystemTableSink#OutputFormat#close > -- This message was sent by Atlassian Jira (v8.3.4#803005)
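The contract FLINK-19425 asks for — `finish()` is responsible for flushing, so callers need not call `flush()` before closing — can be shown with a minimal buffering writer. This is a sketch of the intended contract with hypothetical names, not the actual `BulkWriter` implementations:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the intended BulkWriter contract: finish() flushes
// whatever is still buffered, so close paths need no extra flush() call.
public class BufferingWriter {
    private final List<String> buffer = new ArrayList<>();
    private final List<String> sink = new ArrayList<>();

    public void addElement(String element) {
        buffer.add(element);
    }

    /** Moves all buffered elements to the sink. */
    public void flush() {
        sink.addAll(buffer);
        buffer.clear();
    }

    /** Per the contract, finish() must itself flush remaining buffered data. */
    public void finish() {
        flush();
    }

    public List<String> written() {
        return sink;
    }
}
```

With this contract, a `closeForCommit()`-style caller can call `finish()` alone and still be guaranteed that no buffered element is lost.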
[jira] [Updated] (FLINK-19085) Remove deprecated methods for writing CSV and Text files from DataStream
[ https://issues.apache.org/jira/browse/FLINK-19085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-19085: - Fix Version/s: (was: 1.14.0) 1.15.0 > Remove deprecated methods for writing CSV and Text files from DataStream > > > Key: FLINK-19085 > URL: https://issues.apache.org/jira/browse/FLINK-19085 > Project: Flink > Issue Type: Sub-task > Components: API / DataStream >Reporter: Dawid Wysakowicz >Priority: Major > Labels: auto-unassigned > Fix For: 1.15.0 > > > We can remove long deprecated {{PublicEvolving}} methods: > - DataStream#writeAsText > - DataStream#writeAsCsv -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-18556) Drop the unused options in TableConfig
[ https://issues.apache.org/jira/browse/FLINK-18556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18556: - Fix Version/s: (was: 1.14.0) 1.15.0 > Drop the unused options in TableConfig > -- > > Key: FLINK-18556 > URL: https://issues.apache.org/jira/browse/FLINK-18556 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Reporter: Jark Wu >Priority: Major > Fix For: 1.15.0 > > > As discussed in FLINK-16835, the following {{TableConfig}} options are not > preserved: > * {{nullCheck}}: Flink will automatically enable null checks based on the > table schema ({{NOT NULL}} property) > * {{decimalContext}}: this configuration is only used by the legacy planner, > which will be removed in one of the next releases > * {{maxIdleStateRetention}}: is automatically derived as 1.5 * > {{idleStateRetention}} until StateTtlConfig is fully supported (at which > point only a single parameter is required). > The blink planner should remove the dependencies on {{nullCheck}} and > {{maxIdleStateRetention}} first. Besides, this may be blocked by the removal of the > old planner, because the old planner is still using them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
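The 1.5x derivation mentioned in FLINK-18556 above is simple enough to state as code. The class and method names here are illustrative, not a real Flink API — this only mirrors the described behaviour that `maxIdleStateRetention` is no longer a separate knob:

```java
// Illustration of the described derivation: the maximum idle state retention
// is derived as 1.5 * idleStateRetention rather than configured separately.
// Names are hypothetical, for illustration only.
public class RetentionDerivation {
    public static long deriveMaxIdleStateRetention(long idleStateRetentionMs) {
        // 1.5x, computed in integer arithmetic to stay in milliseconds
        return idleStateRetentionMs + idleStateRetentionMs / 2;
    }
}
```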
[jira] [Updated] (FLINK-18213) refactor the kafka sql connector to use just one shaded jar compatible with 0.10.2+
[ https://issues.apache.org/jira/browse/FLINK-18213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18213: - Fix Version/s: (was: 1.14.0) 1.15.0 > refactor the kafka sql connector to use just one shaded jar compatible with 0.10.2+ > --- > > Key: FLINK-18213 > URL: https://issues.apache.org/jira/browse/FLINK-18213 > Project: Flink > Issue Type: Improvement > Components: Connectors / Kafka >Affects Versions: 1.11.0 >Reporter: jackylau >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > Flink master supports 0.10/0.11/2.x, with three flink-sql-connector shaded jars > currently (1.12-snapshot). > As we all know, the kafka client is compatible across broker versions since 0.10.2.x, so we can use > the 2.x kafka client to access broker servers running 0.10/0.11/2.x. > So we can use just one kafka sql shaded jar. > For this, we should do two things: > 1) refactor to one shaded jar > 2) rename the flink-kafka-connector modules that share fully qualified class names, to avoid > conflicts such as NoSuchMethod or ClassNotFound errors -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19362) Remove confusing comment for `DOT` operator codegen
[ https://issues.apache.org/jira/browse/FLINK-19362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-19362: - Fix Version/s: (was: 1.14.0) 1.15.0 > Remove confusing comment for `DOT` operator codegen > --- > > Key: FLINK-19362 > URL: https://issues.apache.org/jira/browse/FLINK-19362 > Project: Flink > Issue Type: Improvement > Components: Table SQL / Runtime >Affects Versions: 1.11.0 >Reporter: hailong wang >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > The `DOT` operator codegen (ExprCodeGenerator#generateCallExpression) has the following comment: > {code:java} > // due to https://issues.apache.org/jira/browse/CALCITE-2162, expression such > as > // "array[1].a.b" won't work now. > if (operands.size > 2) { > throw new CodeGenException( > "A DOT operator with more than 2 operands is not supported yet.") > } > {code} > But `array[1].a.b` actually works in a Flink job. `DOT` is transformed > to `RexFieldAccess` because of CALCITE-2542, and `generateDot` will never be invoked > except to support ITEM for ROW types. > So I think we can simply delete the confusing comment. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-18566) Add Support for Azure Cognitive Search DataStream Connector
[ https://issues.apache.org/jira/browse/FLINK-18566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18566: - Fix Version/s: (was: 1.14.0) 1.15.0 > Add Support for Azure Cognitive Search DataStream Connector > --- > > Key: FLINK-18566 > URL: https://issues.apache.org/jira/browse/FLINK-18566 > Project: Flink > Issue Type: Improvement > Components: API / DataStream, Connectors / Common >Affects Versions: 1.12.0 >Reporter: Israel Ekpo >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > The objective of this improvement is to add Azure Cognitive Search [2] as an > output sink for the DataStream connectors [1] > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/connectors/#datastream-connectors > [2] https://docs.microsoft.com/en-us/azure/search/search-what-is-azure-search -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-18954) Add documentation for connectors in Python DataStream API.
[ https://issues.apache.org/jira/browse/FLINK-18954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18954: - Fix Version/s: (was: 1.14.0) 1.15.0 > Add documentation for connectors in Python DataStream API. > -- > > Key: FLINK-18954 > URL: https://issues.apache.org/jira/browse/FLINK-18954 > Project: Flink > Issue Type: Sub-task > Components: API / Python >Reporter: Hequn Cheng >Priority: Major > Labels: auto-unassigned > Fix For: 1.15.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19743) Add Source metrics definitions
[ https://issues.apache.org/jira/browse/FLINK-19743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-19743: - Fix Version/s: (was: 1.14.0) 1.15.0 > Add Source metrics definitions > -- > > Key: FLINK-19743 > URL: https://issues.apache.org/jira/browse/FLINK-19743 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Common >Affects Versions: 1.11.2 >Reporter: Jiangjie Qin >Priority: Major > Labels: auto-unassigned, pull-request-available > Fix For: 1.12.6, 1.15.0 > > > Add the metrics defined in > [FLIP-33|https://cwiki.apache.org/confluence/display/FLINK/FLIP-33%3A+Standardize+Connector+Metrics] > to {{OperatorMetricsGroup}} and {{SourceReaderContext}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19935) Supports configure heap memory of sql-client to avoid OOM
[ https://issues.apache.org/jira/browse/FLINK-19935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-19935: - Fix Version/s: (was: 1.14.0) 1.15.0 > Supports configure heap memory of sql-client to avoid OOM > - > > Key: FLINK-19935 > URL: https://issues.apache.org/jira/browse/FLINK-19935 > Project: Flink > Issue Type: New Feature > Components: Table SQL / Client >Affects Versions: 1.11.2 >Reporter: harold.miao >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > Attachments: image-2020-11-03-10-31-08-294.png > > > Hi, > when using sql-client to submit a job, the command below does not set the JVM heap > parameters, which caused an OOM error in my production environment: > exec $JAVA_RUN $JVM_ARGS "${log_setting[@]}" -classpath "`manglePathList > "$CC_CLASSPATH:$INTERNAL_HADOOP_CLASSPATHS"`" > org.apache.flink.table.client.SqlClient "$@" > > !image-2020-11-03-10-31-08-294.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-18887) Add ElasticSearch connector for Python DataStream API
[ https://issues.apache.org/jira/browse/FLINK-18887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-18887: - Fix Version/s: (was: 1.14.0) 1.15.0 > Add ElasticSearch connector for Python DataStream API > - > > Key: FLINK-18887 > URL: https://issues.apache.org/jira/browse/FLINK-18887 > Project: Flink > Issue Type: Sub-task > Components: API / Python >Reporter: Shuiqiang Chen >Priority: Major > Labels: auto-unassigned > Fix For: 1.15.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19647) Support the limit push down for the hbase connector
[ https://issues.apache.org/jira/browse/FLINK-19647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-19647: - Fix Version/s: (was: 1.14.0) 1.15.0 > Support the limit push down for the hbase connector > --- > > Key: FLINK-19647 > URL: https://issues.apache.org/jira/browse/FLINK-19647 > Project: Flink > Issue Type: Sub-task > Components: Connectors / HBase, Table SQL / Ecosystem >Reporter: Shengkai Fang >Priority: Major > Labels: auto-unassigned, pull-request-available > Fix For: 1.15.0 > > > Support limit push down for hbase. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19034) Remove deprecated StreamExecutionEnvironment#set/getNumberOfExecutionRetries
[ https://issues.apache.org/jira/browse/FLINK-19034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-19034: - Fix Version/s: (was: 1.14.0) 1.15.0 > Remove deprecated StreamExecutionEnvironment#set/getNumberOfExecutionRetries > > > Key: FLINK-19034 > URL: https://issues.apache.org/jira/browse/FLINK-19034 > Project: Flink > Issue Type: Sub-task > Components: API / DataStream >Reporter: Dawid Wysakowicz >Priority: Major > Labels: auto-unassigned > Fix For: 1.15.0 > > > Remove deprecated > {code} > StreamExecutionEnvironment#setNumberOfExecutionRetries/getNumberOfExecutionRetries > {code} > The corresponding settings in {{ExecutionConfig}} will be removed in a > separate issue, as they are {{Public}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19659) Array type supports equals and not_equals operator when element types are different but castable
[ https://issues.apache.org/jira/browse/FLINK-19659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-19659: - Fix Version/s: (was: 1.14.0) 1.15.0 > Array type supports equals and not_equals operator when element types are > different but castable > > > Key: FLINK-19659 > URL: https://issues.apache.org/jira/browse/FLINK-19659 > Project: Flink > Issue Type: New Feature > Components: Table SQL / API >Affects Versions: 1.11.0 >Reporter: hailong wang >Priority: Minor > Labels: auto-deprioritized-major, auto-unassigned, > pull-request-available > Fix For: 1.15.0 > > > Currently, the Array type supports `equals` and `not_equals` when the element types > are the same or cannot be cast. For example, > {code:java} > Array[1] <> Array[1] -> false{code} > {code:java} > Array[1] <> Array[cast(1 as bigint)] -> false > {code} > But for element types which are castable, it throws an error: > {code:java} > org.apache.flink.table.planner.codegen.CodeGenException: Unsupported cast > from 'ARRAY NOT NULL' to 'ARRAY NOT > NULL'.org.apache.flink.table.planner.codegen.CodeGenException: Unsupported > cast from 'ARRAY NOT NULL' to 'ARRAY NOT > NULL'. at > org.apache.flink.table.planner.codegen.calls.ScalarOperatorGens$.generateCast(ScalarOperatorGens.scala:1295) > at > org.apache.flink.table.planner.codegen.ExprCodeGenerator.generateCallExpression(ExprCodeGenerator.scala:703) > at > org.apache.flink.table.planner.codegen.ExprCodeGenerator.visitCall(ExprCodeGenerator.scala:498) > at > org.apache.flink.table.planner.codegen.ExprCodeGenerator.visitCall(ExprCodeGenerator.scala:55) > at org.apache.calcite.rex.RexCall.accept(RexCall.java:288){code} > But the result should be true or false, for example, > {code:java} > Array[1] <> Array[cast(1 as bigint)] -> false > {code} > > BTW, the Map and MultiSet types behave the same way; if this is accepted, I would be pleased to open > other issues to track those. -- This message was sent by Atlassian Jira (v8.3.4#803005)
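The semantics FLINK-19659 asks for — widen both element types to a common type, then compare element-wise — can be sketched for the INT vs BIGINT case. This is an illustration of the desired behaviour, not the planner's actual codegen; the helper name is hypothetical:

```java
import java.util.Arrays;

// Sketch of the desired comparison semantics for ARRAY<INT> vs ARRAY<BIGINT>:
// widen the narrower element type first, then compare element-wise, so the
// result is true/false instead of a CodeGenException. Names are illustrative.
public class ArrayCompare {
    public static boolean arraysEqual(int[] intArray, long[] longArray) {
        // Cast each INT element up to BIGINT (the common wider type)
        long[] widened = Arrays.stream(intArray).asLongStream().toArray();
        return Arrays.equals(widened, longArray);
    }
}
```

Under these semantics, `Array[1] <> Array[cast(1 as bigint)]` evaluates to `false` (the arrays are equal after widening), as the ticket expects.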
[jira] [Updated] (FLINK-19613) Create flink-connector-files-test-utils for formats testing
[ https://issues.apache.org/jira/browse/FLINK-19613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-19613: - Fix Version/s: (was: 1.14.0) 1.15.0 > Create flink-connector-files-test-utils for formats testing > --- > > Key: FLINK-19613 > URL: https://issues.apache.org/jira/browse/FLINK-19613 > Project: Flink > Issue Type: Improvement > Components: Connectors / FileSystem, Tests >Reporter: Jingsong Lee >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > Since flink-connector-files has some tests with scala dependencies, we cannot > create test-jar for it. > We should create a new module {{flink-connector-files-test-utils}} , it > should be a scala free module, formats can rely on it for complete testing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19722) Pushdown Watermark to SourceProvider (new Source)
[ https://issues.apache.org/jira/browse/FLINK-19722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-19722: - Fix Version/s: (was: 1.14.0) 1.15.0 > Pushdown Watermark to SourceProvider (new Source) > - > > Key: FLINK-19722 > URL: https://issues.apache.org/jira/browse/FLINK-19722 > Project: Flink > Issue Type: Improvement > Components: Table SQL / Planner >Reporter: Jingsong Lee >Priority: Major > Labels: auto-deprioritized-critical > Fix For: 1.15.0 > > > See {{StreamExecutionEnvironment.fromSource(Source, WatermarkStrategy)}} > The new source can get watermark strategy to handle split watermark. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20091) Introduce avro.ignore-parse-errors for AvroRowDataDeserializationSchema
[ https://issues.apache.org/jira/browse/FLINK-20091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20091: - Fix Version/s: (was: 1.14.0) 1.15.0 > Introduce avro.ignore-parse-errors for AvroRowDataDeserializationSchema > > > Key: FLINK-20091 > URL: https://issues.apache.org/jira/browse/FLINK-20091 > Project: Flink > Issue Type: Improvement > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table > SQL / Ecosystem >Affects Versions: 1.12.0 >Reporter: hailong wang >Priority: Minor > Labels: auto-deprioritized-major, auto-unassigned, > pull-request-available > Fix For: 1.15.0 > > > Introduce avro.ignore-parse-errors to allow users to skip rows with parse > errors instead of failing when deserializing avro format data. > This is useful when there is dirty data; without this option, users cannot > skip the dirty rows. -- This message was sent by Atlassian Jira (v8.3.4#803005)
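The behaviour FLINK-20091 proposes — skip a record that fails to parse instead of failing the job — follows the pattern below. This is a generic sketch with hypothetical names, not the real `AvroRowDataDeserializationSchema`:

```java
// Sketch of an ignore-parse-errors option: a record that fails to parse
// yields null (i.e. is skipped) when the option is set, instead of
// propagating the exception and failing the job. Names are illustrative.
public class LenientDeserializer {
    private final boolean ignoreParseErrors;

    public LenientDeserializer(boolean ignoreParseErrors) {
        this.ignoreParseErrors = ignoreParseErrors;
    }

    /** Returns the parsed value, or null for a skipped dirty record. */
    public Integer deserialize(String raw) {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            if (ignoreParseErrors) {
                return null; // skip the dirty row
            }
            throw e; // default behaviour: fail fast
        }
    }
}
```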
[jira] [Updated] (FLINK-19007) Automatically set presto staging directory to io.tmp.dirs
[ https://issues.apache.org/jira/browse/FLINK-19007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-19007: - Fix Version/s: (was: 1.14.0) 1.15.0 > Automatically set presto staging directory to io.tmp.dirs > - > > Key: FLINK-19007 > URL: https://issues.apache.org/jira/browse/FLINK-19007 > Project: Flink > Issue Type: Improvement > Components: Connectors / FileSystem, Runtime / Configuration >Reporter: Chesnay Schepler >Priority: Minor > Labels: auto-deprioritized-major, usability > Fix For: 1.15.0 > > > The presto S3 filesystem uses a staging directory on the local disk (for > buffering data or something). This directory by default points to {{/tmp}}. > We often see users configuring a special directory for temporary files via > {{io.tmp.dirs}}, and it would make sense to automatically set the > {{staging-directory}} to the same path, iff it was not explicitly configured. > The corresponding property is {{presto.s3.staging-directory}}, found at > {{com.facebook.presto.hive.s3.PrestoS3FileSystem.S3_STAGING_DIRECTORY}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19819) SourceReaderBase supports limit push down
[ https://issues.apache.org/jira/browse/FLINK-19819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-19819: - Fix Version/s: (was: 1.14.0) 1.15.0 > SourceReaderBase supports limit push down > - > > Key: FLINK-19819 > URL: https://issues.apache.org/jira/browse/FLINK-19819 > Project: Flink > Issue Type: Improvement > Components: Connectors / Common >Reporter: Jingsong Lee >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > User requirement: > users need to look at a few random pieces of data in a table to see what the > data looks like, so they often use the SQL: > "select * from table limit 10" > For a large table, the query is expected to end soon because only a few pieces of data are > queried. > DataStream and BoundedStream are push-based execution models, so the > downstream cannot control when the source operator ends. > We need to push the limit down to the source operator, so that the source operator can end > early. -- This message was sent by Atlassian Jira (v8.3.4#803005)
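Limit push-down as described in FLINK-19819 means the source itself stops after emitting `limit` records. The sketch below uses a hypothetical reader, not the real `SourceReaderBase` API, to show why this lets `select * from table limit 10` finish without scanning the whole table:

```java
import java.util.Iterator;

// Sketch of limit push-down into a source reader (hypothetical API): once
// `limit` records have been emitted, the reader signals end-of-input, so a
// push-based pipeline can end early without downstream intervention.
public class LimitedReader<T> {
    private final Iterator<T> records;
    private final long limit;
    private long emitted;

    public LimitedReader(Iterator<T> records, long limit) {
        this.records = records;
        this.limit = limit;
    }

    /** Returns the next record, or null to signal end-of-input. */
    public T poll() {
        if (emitted >= limit || !records.hasNext()) {
            return null; // source ends here; the rest of the table is never read
        }
        emitted++;
        return records.next();
    }
}
```

The important part is that the end-of-input signal originates at the source: in a push-based model the downstream operator has no way to stop the producer, so the limit must live in the reader.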
[jira] [Updated] (FLINK-19230) Support Python UDAF on blink batch planner
[ https://issues.apache.org/jira/browse/FLINK-19230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-19230: - Fix Version/s: (was: 1.14.0) 1.15.0 > Support Python UDAF on blink batch planner > -- > > Key: FLINK-19230 > URL: https://issues.apache.org/jira/browse/FLINK-19230 > Project: Flink > Issue Type: New Feature > Components: API / Python >Reporter: Wei Zhong >Priority: Minor > Labels: auto-deprioritized-major, auto-unassigned > Fix For: 1.15.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19895) Unify Life Cycle Management of ResultPartitionType Pipelined Family
[ https://issues.apache.org/jira/browse/FLINK-19895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-19895: - Fix Version/s: (was: 1.14.0) 1.15.0 > Unify Life Cycle Management of ResultPartitionType Pipelined Family > --- > > Key: FLINK-19895 > URL: https://issues.apache.org/jira/browse/FLINK-19895 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Task >Reporter: Yuan Mei >Priority: Major > Fix For: 1.15.0 > > > This ticket is to unify the lifecycle management of > `ResultPartitionType.PIPELINED(_BOUNDED)` and > `ResultPartitionType.PIPELINED_APPROXIMATE`, so that we can get rid of the > hacky attribute `reconnectable` introduced in FLINK-19693. > > In short: > *The current behavior of PIPELINED(_BOUNDED) is* -> > Release the partition as soon as the consumer exits > Release the partition as soon as the producer fails/is canceled > *The current behavior of PIPELINED_APPROXIMATE is* -> > Release the partition as soon as the producer fails/is canceled > Release the partition when the job exits > *Unify the Pipelined family to* -> > Release the partition when the producer exits. > > One more question: > *whether we can unify the Blocking + Pipelined families to* -> > Release the partition when the producer fails/is canceled > Release the partition when the job exits -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20246) Add documentation on Python worker memory tuning in the memory tuning page
[ https://issues.apache.org/jira/browse/FLINK-20246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20246: - Fix Version/s: (was: 1.14.0) 1.15.0 > Add documentation on Python worker memory tuning in the memory tuning page > -- > > Key: FLINK-20246 > URL: https://issues.apache.org/jira/browse/FLINK-20246 > Project: Flink > Issue Type: Improvement > Components: API / Python, Documentation >Reporter: Dian Fu >Priority: Minor > Labels: auto-deprioritized-major, auto-unassigned > Fix For: 1.15.0 > > > Per the discussion in FLINK-19177, we need to add some documentation on > Python worker memory tuning in the memory tuning page. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19845) Migrate all FileSystemFormatFactory implementations
[ https://issues.apache.org/jira/browse/FLINK-19845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-19845: - Fix Version/s: (was: 1.14.0) 1.15.0 > Migrate all FileSystemFormatFactory implementations > --- > > Key: FLINK-19845 > URL: https://issues.apache.org/jira/browse/FLINK-19845 > Project: Flink > Issue Type: Improvement > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Reporter: Jingsong Lee >Priority: Major > Fix For: 1.15.0 > > > We should use interfaces introduced by FLINK-19599 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20036) Join Has NoUniqueKey when using mini-batch
[ https://issues.apache.org/jira/browse/FLINK-20036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20036: - Fix Version/s: (was: 1.14.0) 1.15.0 > Join Has NoUniqueKey when using mini-batch > -- > > Key: FLINK-20036 > URL: https://issues.apache.org/jira/browse/FLINK-20036 > Project: Flink > Issue Type: Improvement > Components: Table SQL / Planner >Affects Versions: 1.11.2 >Reporter: Rex Remind >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > Hello, > > We tried out mini-batch mode and our Join suddenly had NoUniqueKey. > Join: > {code:java} > Table membershipsTable = tableEnv.from(SOURCE_MEMBERSHIPS) > .renameColumns($("id").as("membership_id")) > .select($("*")).join(usersTable, $("user_id").isEqual($("id"))); > {code} > Mini-batch config: > {code:java} > configuration.setString("table.exec.mini-batch.enabled", "true"); // enable > mini-batch optimization > configuration.setString("table.exec.mini-batch.allow-latency", "5 s"); // use > 5 seconds to buffer input records > configuration.setString("table.exec.mini-batch.size", "5000"); // the maximum > number of records can be buffered by each aggregate operator task > {code} > > Join with mini-batch: > {code:java} > Join(joinType=[InnerJoin], where=[(user_id = id0)], select=[id, > group_id, user_id, uuid, owner, id0, deleted_at], > leftInputSpec=[NoUniqueKey], rightInputSpec=[NoUniqueKey]) > {code} > Join without mini-batch: > {code:java} > Join(joinType=[InnerJoin], where=[(user_id = id0)], select=[id, group_id, > user_id, uuid, owner, id0, deleted_at], leftInputSpec=[HasUniqueKey], > rightInputSpec=[JoinKeyContainsUniqueKey]) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19965) Refactor HiveMapredSplitReader to adapt to the new hive source
[ https://issues.apache.org/jira/browse/FLINK-19965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-19965: - Fix Version/s: (was: 1.14.0) 1.15.0 > Refactor HiveMapredSplitReader to adapt to the new hive source > -- > > Key: FLINK-19965 > URL: https://issues.apache.org/jira/browse/FLINK-19965 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Hive >Reporter: Rui Li >Priority: Major > Fix For: 1.15.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20110) Support 'merge' method for first_value and last_value UDAF
[ https://issues.apache.org/jira/browse/FLINK-20110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20110: - Fix Version/s: (was: 1.14.0) 1.15.0 > Support 'merge' method for first_value and last_value UDAF > -- > > Key: FLINK-20110 > URL: https://issues.apache.org/jira/browse/FLINK-20110 > Project: Flink > Issue Type: Improvement > Components: Table SQL / Runtime >Affects Versions: 1.12.0 >Reporter: hailong wang >Priority: Major > Labels: auto-unassigned, pull-request-available > Fix For: 1.15.0 > > > From the user-zh email, when using the first_value function in a hop window, it > throws an exception because first_value does not implement the merge method. > We can support the 'merge' method for the first_value and last_value UDAFs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
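The merge semantics the ticket asks for can be sketched in plain Java. This is an illustrative, self-contained sketch under assumed names (`Acc`, `mergeFirst`, `mergeLast`, an `order` field standing in for a rowtime or sequence number); the real Flink UDAF would extend `AggregateFunction` and operate on internal accumulator types.

```java
// Sketch of merging partial FIRST_VALUE / LAST_VALUE aggregates, as needed
// when overlapping hop-window panes are combined. Not actual Flink code.
public class FirstLastValueSketch {
    /** Partial aggregate: the tracked value plus the order it arrived with. */
    static final class Acc {
        final String value;
        final long order; // timestamp or sequence number of the tracked value
        Acc(String value, long order) { this.value = value; this.order = order; }
    }

    /** Merging two partial FIRST_VALUE aggregates keeps the earlier one. */
    static Acc mergeFirst(Acc a, Acc b) {
        return a.order <= b.order ? a : b;
    }

    /** Merging two partial LAST_VALUE aggregates keeps the later one. */
    static Acc mergeLast(Acc a, Acc b) {
        return a.order >= b.order ? a : b;
    }

    public static void main(String[] args) {
        Acc pane1 = new Acc("x", 10L); // partial result of one window pane
        Acc pane2 = new Acc("y", 20L); // partial result of an overlapping pane
        System.out.println(mergeFirst(pane1, pane2).value); // x
        System.out.println(mergeLast(pane1, pane2).value);  // y
    }
}
```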
[jira] [Updated] (FLINK-20190) A New Window Trigger that can trigger window operation both by event time interval and event count for DataStream API
[ https://issues.apache.org/jira/browse/FLINK-20190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20190: - Fix Version/s: (was: 1.14.0) 1.15.0 > A New Window Trigger that can trigger window operation both by event time > interval and event count for DataStream API > - > > Key: FLINK-20190 > URL: https://issues.apache.org/jira/browse/FLINK-20190 > Project: Flink > Issue Type: New Feature > Components: API / DataStream >Reporter: GaryGao >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > In production environments, when doing window operations such as window > aggregation with the DataStream API, developers are often asked to trigger the > window operation not only when the watermark passes the max timestamp of the > window, but also at a fixed event time interval and at a fixed count of > events. The reason is that we want frequently updated window results, rather > than waiting a long time until the watermark passes the max timestamp of the > window. This is very useful in reporting and other BI applications. > For now the default triggers provided by Flink cannot meet this > requirement, so I developed a new trigger, > CountAndContinuousEventTimeTrigger, which combines ContinuousEventTimeTrigger > with CountTrigger to do the above. > > To use CountAndContinuousEventTimeTrigger, you should specify two parameters, > as revealed in its constructor: > {code:java} > private CountAndContinuousEventTimeTrigger(Time interval, long > maxCount);{code} > * Time interval: the trigger continuously fires based on the given > time interval, the same as ContinuousEventTimeTrigger. > * long maxCount: the trigger fires once the count of elements > in a pane reaches the given count, the same as CountTrigger. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
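The combined firing decision described above can be sketched as a few lines of plain Java. This is a minimal illustration of the logic, not Flink's actual `Trigger` interface; the class and method names (`CombinedTriggerSketch`, `onElement`) are assumptions for the example.

```java
// Sketch of what CountAndContinuousEventTimeTrigger combines: fire early when
// the pane's element count reaches maxCount, OR when the watermark passes the
// next periodic interval timestamp. Plain Java, not actual Flink Trigger code.
public class CombinedTriggerSketch {
    private final long intervalMs; // continuous event-time firing interval
    private final long maxCount;   // element count that forces an early firing
    private long count = 0;
    private long nextFireTimestamp;

    public CombinedTriggerSketch(long intervalMs, long maxCount, long windowStart) {
        this.intervalMs = intervalMs;
        this.maxCount = maxCount;
        this.nextFireTimestamp = windowStart + intervalMs;
    }

    /** Returns true if the window should fire after one element at the given watermark. */
    public boolean onElement(long currentWatermark) {
        count++;
        if (count >= maxCount) {
            count = 0; // reset after an early firing, like CountTrigger
            return true;
        }
        if (currentWatermark >= nextFireTimestamp) {
            nextFireTimestamp += intervalMs; // schedule the next periodic firing
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        CombinedTriggerSketch trigger = new CombinedTriggerSketch(5000L, 3L, 0L);
        System.out.println(trigger.onElement(1000L)); // false
        System.out.println(trigger.onElement(2000L)); // false
        System.out.println(trigger.onElement(3000L)); // true: count reached 3
        System.out.println(trigger.onElement(6000L)); // true: watermark passed 5000
    }
}
```

The final window firing when the watermark passes the window's max timestamp would still be handled by the regular event-time path, which this sketch omits.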
[jira] [Updated] (FLINK-19527) Update SQL Pages
[ https://issues.apache.org/jira/browse/FLINK-19527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-19527: - Fix Version/s: (was: 1.14.0) 1.15.0 > Update SQL Pages > > > Key: FLINK-19527 > URL: https://issues.apache.org/jira/browse/FLINK-19527 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: Seth Wiesman >Priority: Major > Labels: auto-unassigned, pull-request-available > Fix For: 1.15.0 > > > SQL > Goal: Show users the main features early and link to concepts if necessary. > How to use SQL? Intended for users with SQL knowledge. > Overview > Getting started with link to more detailed execution section. > Full Reference > Available operations in SQL as a table. This location allows us to further > split the page in the future if we think an operation needs more space > without affecting the top-level structure. > Data Definition > Explain special SQL syntax around DDL. > Pattern Matching > Make pattern matching more visible. > ... more features in the future -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20518) WebUI should escape characters in metric names
[ https://issues.apache.org/jira/browse/FLINK-20518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20518: - Fix Version/s: (was: 1.14.0) 1.15.0 > WebUI should escape characters in metric names > -- > > Key: FLINK-20518 > URL: https://issues.apache.org/jira/browse/FLINK-20518 > Project: Flink > Issue Type: Improvement > Components: Runtime / Web Frontend >Affects Versions: 1.10.0 >Reporter: Chesnay Schepler >Assignee: tartarus >Priority: Minor > Labels: auto-deprioritized-major, pull-request-available, > stale-assigned > Fix For: 1.15.0 > > > Metric names can contain characters like {{+}} or {{?}} that should be > escaped when querying metrics. -- This message was sent by Atlassian Jira (v8.3.4#803005)
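The escaping the ticket asks for amounts to percent-encoding the metric name before it is placed in a metrics-query URL. A minimal JDK-only sketch (the WebUI itself would do the equivalent in TypeScript, e.g. with `encodeURIComponent`; the helper name here is an assumption):

```java
// Sketch: percent-encode metric names containing characters such as '+' or '?'
// so they survive being used as URL query values. JDK only, no Flink code.
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class MetricNameEscaping {
    static String escape(String metricName) {
        try {
            // URLEncoder encodes spaces as '+', so rewrite them to '%20'
            // for use in a URL path/query rather than a form body.
            return URLEncoder.encode(metricName, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(escape("numRecordsIn+lag?")); // prints numRecordsIn%2Blag%3F
    }
}
```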
[jira] [Updated] (FLINK-19967) Clean legacy classes for Parquet and Orc
[ https://issues.apache.org/jira/browse/FLINK-19967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-19967: - Fix Version/s: (was: 1.14.0) 1.15.0 > Clean legacy classes for Parquet and Orc > > > Key: FLINK-19967 > URL: https://issues.apache.org/jira/browse/FLINK-19967 > Project: Flink > Issue Type: Sub-task > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Reporter: Jingsong Lee >Priority: Major > Fix For: 1.15.0 > > > We have introduced new BulkFormats, so we can clean up the old inner classes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19881) Optimize temporal join with upsert-Source(upsert-kafka)
[ https://issues.apache.org/jira/browse/FLINK-19881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-19881: - Fix Version/s: (was: 1.14.0) 1.15.0 > Optimize temporal join with upsert-Source(upsert-kafka) > --- > > Key: FLINK-19881 > URL: https://issues.apache.org/jira/browse/FLINK-19881 > Project: Flink > Issue Type: Improvement > Components: Table SQL / Runtime >Reporter: Leonard Xu >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > Currently upsert-kafka performs normalization in a physical node named > `ChangelogNormalize`. The normalization deduplicates using state and > produces `UPDATE_AFTER` and `DELETE` changelog records. We do the same thing in the state of the > temporal join operator, so as an optimization we can merge the two into one if the > query contains a temporal join with an upsert-kafka source. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20280) Support batch mode for Python DataStream API
[ https://issues.apache.org/jira/browse/FLINK-20280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20280: - Fix Version/s: (was: 1.14.0) 1.15.0 > Support batch mode for Python DataStream API > > > Key: FLINK-20280 > URL: https://issues.apache.org/jira/browse/FLINK-20280 > Project: Flink > Issue Type: New Feature > Components: API / Python >Reporter: Dian Fu >Priority: Major > Labels: auto-unassigned, pull-request-available > Fix For: 1.15.0 > > > Currently, it still doesn't support batch mode for the Python DataStream API. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20188) Add Documentation for new File Source
[ https://issues.apache.org/jira/browse/FLINK-20188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20188: - Fix Version/s: (was: 1.14.0) 1.15.0 > Add Documentation for new File Source > - > > Key: FLINK-20188 > URL: https://issues.apache.org/jira/browse/FLINK-20188 > Project: Flink > Issue Type: New Feature > Components: Connectors / FileSystem, Documentation >Reporter: Stephan Ewen >Priority: Major > Fix For: 1.15.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20726) Introduce Pulsar connector
[ https://issues.apache.org/jira/browse/FLINK-20726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20726: - Fix Version/s: (was: 1.14.0) 1.15.0 > Introduce Pulsar connector > -- > > Key: FLINK-20726 > URL: https://issues.apache.org/jira/browse/FLINK-20726 > Project: Flink > Issue Type: New Feature > Components: Connectors / Common >Affects Versions: 1.13.0 >Reporter: Jianyun Zhao >Priority: Major > Labels: auto-unassigned > Fix For: 1.15.0 > > > Pulsar is an important player in messaging middleware, and it is essential > for Flink to support Pulsar. > Our existing code is maintained at > [streamnative/pulsar-flink|https://github.com/streamnative/pulsar-flink] , > and next we will split it into several PRs and merge them back into the community. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20767) add nested field support for SupportsFilterPushDown
[ https://issues.apache.org/jira/browse/FLINK-20767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20767: - Fix Version/s: (was: 1.14.0) 1.15.0 > add nested field support for SupportsFilterPushDown > --- > > Key: FLINK-20767 > URL: https://issues.apache.org/jira/browse/FLINK-20767 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Planner >Affects Versions: 1.12.0 >Reporter: Jun Zhang >Priority: Major > Fix For: 1.15.0 > > > I think we should add the nested field support for SupportsFilterPushDown -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20787) Improve the Table API to make it usable
[ https://issues.apache.org/jira/browse/FLINK-20787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20787: - Fix Version/s: (was: 1.14.0) 1.15.0 > Improve the Table API to make it usable > --- > > Key: FLINK-20787 > URL: https://issues.apache.org/jira/browse/FLINK-20787 > Project: Flink > Issue Type: Improvement > Components: Table SQL / API >Reporter: Dian Fu >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > Currently, there are quite a few bugs in the Table API which make it > difficult to use. Users will encounter all kinds of problems when using the Table > API and have to find various workarounds from time to time. This is an > umbrella JIRA for all the issues specific to the Table API, aiming to make > the Table API smooth to use. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20154) Improve error messages when using CLI with wrong target
[ https://issues.apache.org/jira/browse/FLINK-20154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20154: - Fix Version/s: (was: 1.14.0) 1.15.0 > Improve error messages when using CLI with wrong target > --- > > Key: FLINK-20154 > URL: https://issues.apache.org/jira/browse/FLINK-20154 > Project: Flink > Issue Type: Improvement > Components: Command Line Client, Documentation >Affects Versions: 1.11.0, 1.12.0 >Reporter: Till Rohrmann >Priority: Minor > Labels: auto-deprioritized-major, usability > Fix For: 1.15.0 > > > According to the [CLI > documentation|https://ci.apache.org/projects/flink/flink-docs-stable/ops/cli.html#job-submission-examples] > one can use the CLI with the following {{--target}} values: "remote", > "local", "kubernetes-session", "yarn-per-job", "yarn-session", > "yarn-application" and "kubernetes-application". However, when running the > following commands: > {{bin/flink run -t yarn-session -p 1 examples/streaming/WindowJoin.jar}} I > get the following exception: > {code} > org.apache.flink.client.program.ProgramInvocationException: The main method > caused an error: null > at > org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:330) > at > org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:198) > at > org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) > at > org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:743) > at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:242) > at > org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:971) > at > org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1047) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754) > at > 
org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) > at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1047) > Caused by: java.lang.IllegalStateException > at > org.apache.flink.util.Preconditions.checkState(Preconditions.java:182) > at > org.apache.flink.client.deployment.executors.AbstractSessionClusterExecutor.execute(AbstractSessionClusterExecutor.java:63) > at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1917) > at > org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:128) > at > org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:76) > at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1799) > at > org.apache.flink.streaming.examples.join.WindowJoin.main(WindowJoin.java:86) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:316) > ... 11 more > {code} > Similarly when running the command {{bin/flink run -t yarn-application -p 1 > examples/streaming/WindowJoin.jar}} I get the exception: > {code} > org.apache.flink.client.program.ProgramInvocationException: The main method > caused an error: No ExecutorFactory found to execute the application. 
> at > org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:330) > at > org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:198) > at > org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) > at > org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:743) > at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:242) > at > org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:971) > at > org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1047) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754) >
[jira] [Updated] (FLINK-20656) Update docs for new KafkaSource connector.
[ https://issues.apache.org/jira/browse/FLINK-20656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20656: - Fix Version/s: (was: 1.14.0) 1.15.0 > Update docs for new KafkaSource connector. > -- > > Key: FLINK-20656 > URL: https://issues.apache.org/jira/browse/FLINK-20656 > Project: Flink > Issue Type: New Feature > Components: Connectors / Kafka >Affects Versions: 1.12.0 >Reporter: Jiangjie Qin >Priority: Minor > Labels: auto-deprioritized-major, auto-unassigned > Fix For: 1.12.6, 1.15.0 > > > We need to add docs for the KafkaSource connector. Namely the following page: > https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/connectors/kafka.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20794) Support to select distinct columns in the Table API
[ https://issues.apache.org/jira/browse/FLINK-20794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20794: - Fix Version/s: (was: 1.14.0) 1.15.0 > Support to select distinct columns in the Table API > --- > > Key: FLINK-20794 > URL: https://issues.apache.org/jira/browse/FLINK-20794 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Reporter: Dian Fu >Priority: Major > Fix For: 1.15.0 > > > Currently, there is no corresponding functionality in the Table API for the > following SQL: > {code:java} > SELECT DISTINCT users FROM Orders > {code} > For example, for the following job: > {code:java} > table.select("distinct a") > {code} > It will throw the following exception: > {code:java} > org.apache.flink.table.api.ExpressionParserException: Could not parse > expression at column 10: ',' expected but 'a' > foundorg.apache.flink.table.api.ExpressionParserException: Could not parse > expression at column 10: ',' expected but 'a' founddistinct a ^ > at > org.apache.flink.table.expressions.PlannerExpressionParserImpl$.throwError(PlannerExpressionParserImpl.scala:726) > at > org.apache.flink.table.expressions.PlannerExpressionParserImpl$.parseExpressionList(PlannerExpressionParserImpl.scala:710) > at > org.apache.flink.table.expressions.PlannerExpressionParserImpl.parseExpressionList(PlannerExpressionParserImpl.scala:47) > at > org.apache.flink.table.expressions.ExpressionParser.parseExpressionList(ExpressionParser.java:40) > at > org.apache.flink.table.api.internal.TableImpl.select(TableImpl.java:121){code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20572) HiveCatalog should be a standalone module
[ https://issues.apache.org/jira/browse/FLINK-20572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20572: - Fix Version/s: (was: 1.14.0) 1.15.0 > HiveCatalog should be a standalone module > - > > Key: FLINK-20572 > URL: https://issues.apache.org/jira/browse/FLINK-20572 > Project: Flink > Issue Type: Improvement > Components: Connectors / Hive >Reporter: Rui Li >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > Currently HiveCatalog is the only implementation that supports persistent > metadata. It's possible that users just want to use HiveCatalog to manage > metadata, and doesn't intend to read/write Hive tables. However HiveCatalog > is part of Hive connector which requires lots of Hive dependencies, and > introducing these dependencies increases the chance of lib conflicts. We > should investigate whether we can move HiveCatalog to a light-weight > standalone module. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20849) Improve JavaDoc and logging of new KafkaSource
[ https://issues.apache.org/jira/browse/FLINK-20849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20849: - Fix Version/s: (was: 1.14.0) 1.15.0 > Improve JavaDoc and logging of new KafkaSource > -- > > Key: FLINK-20849 > URL: https://issues.apache.org/jira/browse/FLINK-20849 > Project: Flink > Issue Type: Improvement > Components: Connectors / Kafka >Reporter: Qingsheng Ren >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.12.6, 1.15.0 > > > Some JavaDoc and logging messages of the new KafkaSource should be more > descriptive to provide more information to users. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20578) Cannot create empty array using ARRAY[]
[ https://issues.apache.org/jira/browse/FLINK-20578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20578: - Fix Version/s: (was: 1.14.0) 1.15.0 > Cannot create empty array using ARRAY[] > --- > > Key: FLINK-20578 > URL: https://issues.apache.org/jira/browse/FLINK-20578 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Affects Versions: 1.11.2 >Reporter: Fabian Hueske >Priority: Major > Labels: starter > Fix For: 1.15.0 > > > Calling the ARRAY function without an element (`ARRAY[]`) results in an error > message. > Is that the expected behavior? > How can users create empty arrays? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20834) Add metrics for reporting the memory usage and CPU usage of the Python UDF workers
[ https://issues.apache.org/jira/browse/FLINK-20834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20834: - Fix Version/s: (was: 1.14.0) 1.15.0 > Add metrics for reporting the memory usage and CPU usage of the Python UDF > workers > -- > > Key: FLINK-20834 > URL: https://issues.apache.org/jira/browse/FLINK-20834 > Project: Flink > Issue Type: Improvement > Components: API / Python >Reporter: Wei Zhong >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > Currently there is no official approach to access the memory usage and CPU > usage of the Python UDF workers. We need to add these metrics to monitor the > running status of the Python processes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20687) Missing 'yarn-application' target in CLI help message
[ https://issues.apache.org/jira/browse/FLINK-20687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20687: - Fix Version/s: (was: 1.14.0) 1.15.0 > Missing 'yarn-application' target in CLI help message > - > > Key: FLINK-20687 > URL: https://issues.apache.org/jira/browse/FLINK-20687 > Project: Flink > Issue Type: Improvement > Components: Command Line Client >Affects Versions: 1.12.0 >Reporter: Ruguo Yu >Priority: Minor > Labels: pull-request-available > Fix For: 1.12.6, 1.15.0 > > Attachments: image-2020-12-20-21-48-18-391.png, > image-2020-12-20-22-02-01-312.png > > > The 'yarn-application' target is missing in the CLI help message when I enter the command > 'flink run-application -h', as follows: > !image-2020-12-20-21-48-18-391.png|width=516,height=372! > The target name is obtained through SPI, and I checked that the SPI > META-INF/services is correct. > > Next, when I put flink-shaded-hadoop-*-.jar into the Flink lib directory or set > HADOOP_CLASSPATH, it does show 'yarn-application', as follows: > !image-2020-12-20-22-02-01-312.png|width=808,height=507! > However, I think it is reasonable to show 'yarn-application' without any such > action. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20895) Support LocalAggregatePushDown in Blink planner
[ https://issues.apache.org/jira/browse/FLINK-20895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20895: - Fix Version/s: (was: 1.14.0) 1.15.0 > Support LocalAggregatePushDown in Blink planner > --- > > Key: FLINK-20895 > URL: https://issues.apache.org/jira/browse/FLINK-20895 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Planner >Reporter: Sebastian Liu >Priority: Major > Labels: auto-unassigned, pull-request-available > Fix For: 1.15.0 > > > Will add related rule to support LocalAggregatePushDown in Blink planner -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20788) It doesn't support to use cube/rollup/grouping sets in the Table API
[ https://issues.apache.org/jira/browse/FLINK-20788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20788: - Fix Version/s: (was: 1.14.0) 1.15.0 > It doesn't support to use cube/rollup/grouping sets in the Table API > > > Key: FLINK-20788 > URL: https://issues.apache.org/jira/browse/FLINK-20788 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Reporter: Dian Fu >Priority: Major > Fix For: 1.15.0 > > > Currently, it doesn't support to use cube/rollup/grouping sets in the Table > API. For the following job: > {code} > table.groupBy("cube(a, b)") > {code} > It will throw the following exception: > {code} > org.apache.flink.table.api.ValidationException: Undefined function: cube > at > org.apache.flink.table.expressions.resolver.LookupCallResolver.lambda$visit$0(LookupCallResolver.java:49) > at java.util.Optional.orElseThrow(Optional.java:290) > at > org.apache.flink.table.expressions.resolver.LookupCallResolver.visit(LookupCallResolver.java:49) > at > org.apache.flink.table.expressions.resolver.LookupCallResolver.visit(LookupCallResolver.java:38) > at > org.apache.flink.table.expressions.ApiExpressionVisitor.visit(ApiExpressionVisitor.java:37) > at > org.apache.flink.table.expressions.LookupCallExpression.accept(LookupCallExpression.java:65) > at > org.apache.flink.table.expressions.resolver.rules.LookupCallByNameRule.lambda$apply$0(LookupCallByNameRule.java:38) > at > java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) > at > java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at > java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at > 
java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.flink.table.expressions.resolver.rules.LookupCallByNameRule.apply(LookupCallByNameRule.java:38) > at > org.apache.flink.table.expressions.resolver.ExpressionResolver.lambda$null$1(ExpressionResolver.java:211) > at java.util.function.Function.lambda$andThen$1(Function.java:88) > at java.util.function.Function.lambda$andThen$1(Function.java:88) > at java.util.function.Function.lambda$andThen$1(Function.java:88) > at java.util.function.Function.lambda$andThen$1(Function.java:88) > at java.util.function.Function.lambda$andThen$1(Function.java:88) > at java.util.function.Function.lambda$andThen$1(Function.java:88) > at java.util.function.Function.lambda$andThen$1(Function.java:88) > at > org.apache.flink.table.expressions.resolver.ExpressionResolver.resolve(ExpressionResolver.java:178) > at > org.apache.flink.table.operations.utils.OperationTreeBuilder.aggregate(OperationTreeBuilder.java:236) > at > org.apache.flink.table.api.internal.TableImpl$GroupedTableImpl.select(TableImpl.java:632) > at > org.apache.flink.table.api.internal.TableImpl$GroupedTableImpl.select(TableImpl.java:615) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20484) Improve hive temporal table exception
[ https://issues.apache.org/jira/browse/FLINK-20484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20484: - Fix Version/s: (was: 1.14.0) 1.15.0 > Improve hive temporal table exception > -- > > Key: FLINK-20484 > URL: https://issues.apache.org/jira/browse/FLINK-20484 > Project: Flink > Issue Type: Improvement > Components: Connectors / Hive >Affects Versions: 1.12.0 >Reporter: Leonard Xu >Priority: Minor > Labels: auto-deprioritized-major, auto-unassigned > Fix For: 1.15.0 > > Attachments: image-2020-12-04-16-42-28-399.png > > > Users need to set the following options when using the latest Hive partition as a temporal > table: > {code:java} > 'streaming-source.enable'='true', > 'streaming-source.partition.include' = 'latest', > {code} > If users miss the option `streaming-source.partition.include`, the Hive > table becomes an unbounded table. Currently an unbounded table can only be used > as a versioned table in a temporal join, so the framework requires a primary key and > watermark, which is hard for users to understand. > !image-2020-12-04-16-42-28-399.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20865) Prevent potential resource deadlock in fine-grained resource management
[ https://issues.apache.org/jira/browse/FLINK-20865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20865: - Fix Version/s: (was: 1.14.0) 1.15.0 > Prevent potential resource deadlock in fine-grained resource management > --- > > Key: FLINK-20865 > URL: https://issues.apache.org/jira/browse/FLINK-20865 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Reporter: Yangze Guo >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > Attachments: 屏幕快照 2021-01-06 下午2.32.57.png > > > !屏幕快照 2021-01-06 下午2.32.57.png|width=954,height=288! > The above figure demonstrates a potential case of deadlock due to scheduling > dependency. For the given topology, initially the scheduler will request 4 > slots, for A, B, C and D. Assuming only 2 slots are available, if both slots > are assigned to Pipeline Region 0 (as shown on the left), A and B will first > finish execution, then C and D will be executed, and finally E will be > executed. However, if in the beginning the 2 slots are assigned to A and C > (as shown on the right), then neither of A and C can finish execution due to > missing B and D consuming the data they produced. > Currently, with coarse-grained resource management, the scheduler guarantees > to always finish fulfilling requirements of one pipeline region before > starting to fulfill requirements of another. That means the deadlock case > shown on the right of the above figure can never happen. > However, there’s no such guarantee in fine-grained resource management. Since > resource requirements for SSGs can be different, there’s no control on which > requirements will be fulfilled first, when there’s not enough resources to > fulfill all the requirements. Therefore, it’s not always possible to fulfill > one pipeline region prior to another. 
> To solve this problem, for fine-grained resource management, we can make the > scheduler defer requesting slots for other SSGs until the requirements of the > current SSG are fulfilled, at the price of longer scheduling time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20896) Support SupportsAggregatePushDown for JDBC TableSource
[ https://issues.apache.org/jira/browse/FLINK-20896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20896: - Fix Version/s: (was: 1.14.0) 1.15.0 > Support SupportsAggregatePushDown for JDBC TableSource > -- > > Key: FLINK-20896 > URL: https://issues.apache.org/jira/browse/FLINK-20896 > Project: Flink > Issue Type: Sub-task > Components: Connectors / JDBC, Table SQL / Ecosystem >Reporter: Sebastian Liu >Priority: Major > Labels: auto-unassigned > Fix For: 1.15.0 > > > Will add SupportsAggregatePushDown implementation for JDBC TableSource. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20955) Refactor HBase Source in accordance with FLIP-27
[ https://issues.apache.org/jira/browse/FLINK-20955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20955: - Fix Version/s: (was: 1.14.0) 1.15.0 > Refactor HBase Source in accordance with FLIP-27 > > > Key: FLINK-20955 > URL: https://issues.apache.org/jira/browse/FLINK-20955 > Project: Flink > Issue Type: Improvement > Components: Connectors / HBase, Table SQL / Ecosystem >Reporter: Moritz Manner >Priority: Minor > Labels: auto-deprioritized-major, auto-unassigned, > pull-request-available > Fix For: 1.15.0 > > > The HBase connector source implementation should be updated in accordance > with [FLIP-27: Refactor Source > Interface|https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface]. > One source should map to one table in HBase. Users can specify which > column[families] to watch; each change in one of the columns triggers a > record with change type, table, column family, column, value, and timestamp. > h3. Idea > The new Flink HBase Source makes use of the internal [replication mechanism > of HBase|https://hbase.apache.org/book.html#_cluster_replication]. The Source > registers with the HBase cluster and will receive all WAL edits written in > HBase. From those WAL edits the Source can create the DataStream. > h3. Split > We're still not 100% sure which information a Split should contain. We have > the following possibilities: > # There is only one Split per Source and the Split contains all the > necessary information to connect with HBase. The SourceReader which processes > the Split will receive all WAL edits for all tables and filter the relevant > edits. > # There are multiple Splits per Source, each Split representing one HBase > Region to read from. This assumes that it is possible to receive WAL > edits only from a specific HBase Region, rather than all WAL edits. 
This would > be preferable as it allows parallel processing of multiple regions, but we > still need to figure out how this is possible. > In both cases the Split will contain information about the HBase instance and > table. > h3. Split Enumerator > Depending on which Split we'll decide on, the split enumerator will connect > to HBase and get all relevant regions or just create one Split. -- This message was sent by Atlassian Jira (v8.3.4#803005)
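Whichever variant is chosen, the description above implies a split that always carries connection and table information, plus an optional region for the per-region variant. A hypothetical sketch in plain Java (the class and field names are illustrative, not from any actual Flink or HBase API):

```java
import java.util.Optional;

// Hypothetical FLIP-27 split for the HBase source: always identifies the
// cluster and table; the region is present only in the per-region variant.
final class HBaseSourceSplitSketch {
    private final String clusterKey; // e.g. the ZooKeeper quorum address
    private final String table;
    private final String region;     // null => one-split-per-source variant

    HBaseSourceSplitSketch(String clusterKey, String table, String region) {
        this.clusterKey = clusterKey;
        this.table = table;
        this.region = region;
    }

    Optional<String> region() {
        return Optional.ofNullable(region);
    }

    // FLIP-27 splits need a stable id for checkpointing and assignment.
    String splitId() {
        return region == null
                ? clusterKey + "#" + table
                : clusterKey + "#" + table + "#" + region;
    }
}
```

Under this sketch, the enumerator for the per-region variant would create one split per region, while the single-split variant would create exactly one split with a null region.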
[jira] [Updated] (FLINK-21559) Python DataStreamTests::test_process_function failed on AZP
[ https://issues.apache.org/jira/browse/FLINK-21559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-21559: - Fix Version/s: (was: 1.14.0) 1.15.0 > Python DataStreamTests::test_process_function failed on AZP > --- > > Key: FLINK-21559 > URL: https://issues.apache.org/jira/browse/FLINK-21559 > Project: Flink > Issue Type: Improvement > Components: API / Python >Affects Versions: 1.13.0 >Reporter: Till Rohrmann >Priority: Minor > Labels: auto-deprioritized-critical, auto-deprioritized-major, > test-stability > Fix For: 1.15.0 > > > The Python test case {{DataStreamTests::test_process_function}} failed on AZP. > {code} > === short test summary info > > FAILED > pyflink/datastream/tests/test_data_stream.py::DataStreamTests::test_process_function > = 1 failed, 705 passed, 22 skipped, 303 warnings in 583.39s (0:09:43) > == > ERROR: InvocationError for command /__w/3/s/flink-python/.tox/py38/bin/pytest > --durations=20 (exited with code 1) > {code} > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=13992=logs=9cada3cb-c1d3-5621-16da-0f718fb86602=8d78fe4f-d658-5c70-12f8-4921589024c3 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-21634) ALTER TABLE statement enhancement
[ https://issues.apache.org/jira/browse/FLINK-21634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-21634: - Fix Version/s: (was: 1.14.0) 1.15.0 > ALTER TABLE statement enhancement > - > > Key: FLINK-21634 > URL: https://issues.apache.org/jira/browse/FLINK-21634 > Project: Flink > Issue Type: New Feature > Components: Table SQL / API, Table SQL / Client >Reporter: Jark Wu >Assignee: Jark Wu >Priority: Major > Labels: auto-unassigned, stale-assigned > Fix For: 1.15.0 > > > We already introduced ALTER TABLE statement in FLIP-69 [1], but only support > to rename table name and change table options. One useful feature of ALTER > TABLE statement is modifying schema. This is also heavily required by > integration with data lakes (e.g. iceberg). > Therefore, I propose to support following ALTER TABLE statements (except > {{SET}} and {{RENAME TO}}, others are all new introduced syntax): > {code:sql} > ALTER TABLE table_name { > ADD { | ( [, ...]) } > | MODIFY { | ( [, ...]) } > | DROP {column_name | (column_name, column_name, ) | PRIMARY KEY | > CONSTRAINT constraint_name | WATERMARK} > | RENAME old_column_name TO new_column_name > | RENAME TO new_table_name > | SET (key1=val1, ...) > | RESET (key1, ...) > } > :: > { | | } > :: > column_name [FIRST | AFTER column_name] > :: > [CONSTRAINT constraint_name] PRIMARY KEY (column_name, ...) 
NOT ENFORCED > :: > WATERMARK FOR rowtime_column_name AS watermark_strategy_expression > :: > { | | > } [COMMENT column_comment] > :: > column_type > :: > column_type METADATA [ FROM metadata_key ] [ VIRTUAL ] > :: > AS computed_column_expression > {code} > And some examples: > {code:sql} > -- add a new column > ALTER TABLE mytable ADD new_column STRING COMMENT 'new_column docs'; > -- add columns, constraint, and watermark > ALTER TABLE mytable ADD ( > log_ts STRING COMMENT 'log timestamp string' FIRST, > ts AS TO_TIMESTAMP(log_ts) AFTER log_ts, > PRIMARY KEY (id) NOT ENFORCED, > WATERMARK FOR ts AS ts - INTERVAL '3' SECOND > ); > -- modify a column type > ALTER TABLE prod.db.sample MODIFY measurement double COMMENT 'unit is bytes > per second' AFTER `id`; > -- modify definitions of columns log_ts and ts, primary key, and watermark; they > must exist in the table schema > ALTER TABLE mytable MODIFY ( > log_ts STRING COMMENT 'log timestamp string' AFTER `id`, -- reorder > columns > ts AS TO_TIMESTAMP(log_ts) AFTER log_ts, > PRIMARY KEY (id) NOT ENFORCED, > WATERMARK FOR ts AS ts - INTERVAL '3' SECOND > ); > -- drop an old column > ALTER TABLE prod.db.sample DROP measurement; > -- drop columns > ALTER TABLE prod.db.sample DROP (col1, col2, col3); > -- drop a watermark > ALTER TABLE prod.db.sample DROP WATERMARK; > -- rename column name > ALTER TABLE prod.db.sample RENAME `data` TO payload; > -- rename table name > ALTER TABLE mytable RENAME TO mytable2; > -- set options > ALTER TABLE kafka_table SET ( > 'scan.startup.mode' = 'specific-offsets', > 'scan.startup.specific-offsets' = 'partition:0,offset:42' > ); > -- reset options > ALTER TABLE kafka_table RESET ('scan.startup.mode', > 'scan.startup.specific-offsets'); > {code} > Note: we don't need to introduce new interfaces, because all the ALTER TABLE > operations will be forwarded to the catalog through {{Catalog#alterTable(tablePath, > newTable, ignoreIfNotExists)}}. 
> [1]: > https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/sql/alter/#alter-table > [2]: http://iceberg.apache.org/spark-ddl/#alter-table-alter-column > [3]: https://trino.io/docs/current/sql/alter-table.html > [4]: https://dev.mysql.com/doc/refman/8.0/en/alter-table.html > [5]: https://www.postgresql.org/docs/9.1/sql-altertable.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20920) Document how to connect to kerberized HMS
[ https://issues.apache.org/jira/browse/FLINK-20920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20920: - Fix Version/s: (was: 1.14.0) 1.15.0 > Document how to connect to kerberized HMS > - > > Key: FLINK-20920 > URL: https://issues.apache.org/jira/browse/FLINK-20920 > Project: Flink > Issue Type: Improvement > Components: Connectors / Hive, Documentation >Reporter: Rui Li >Priority: Minor > Labels: auto-deprioritized-major, auto-unassigned > Fix For: 1.15.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20946) Optimize Python ValueState Implementation In PyFlink
[ https://issues.apache.org/jira/browse/FLINK-20946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20946: - Fix Version/s: (was: 1.14.0) 1.15.0 > Optimize Python ValueState Implementation In PyFlink > > > Key: FLINK-20946 > URL: https://issues.apache.org/jira/browse/FLINK-20946 > Project: Flink > Issue Type: Improvement > Components: API / Python >Affects Versions: 1.13.0 >Reporter: Huang Xingbo >Priority: Minor > Labels: auto-deprioritized-major, auto-unassigned, > pull-request-available > Fix For: 1.15.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20952) Changelog json formats should support inherit options from JSON format
[ https://issues.apache.org/jira/browse/FLINK-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20952: - Fix Version/s: (was: 1.14.0) 1.15.0 > Changelog json formats should support inherit options from JSON format > -- > > Key: FLINK-20952 > URL: https://issues.apache.org/jira/browse/FLINK-20952 > Project: Flink > Issue Type: Improvement > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table > SQL / Ecosystem >Reporter: Jark Wu >Priority: Minor > Labels: auto-deprioritized-major, auto-unassigned, > pull-request-available > Fix For: 1.15.0 > > > Recently, we introduced several config options for the json format, e.g. in > FLINK-20861. This reveals a potential problem: adding a small config option > to json may require touching the debezium-json, canal-json, and maxwell-json formats. > This is verbose and error-prone. We need an abstraction mechanism to support > reusable options. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-21408) Clarify which DataStream sources support Batch execution
[ https://issues.apache.org/jira/browse/FLINK-21408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-21408: - Fix Version/s: (was: 1.14.0) 1.15.0 > Clarify which DataStream sources support Batch execution > > > Key: FLINK-21408 > URL: https://issues.apache.org/jira/browse/FLINK-21408 > Project: Flink > Issue Type: Improvement > Components: API / DataStream, Documentation >Reporter: Chesnay Schepler >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > The DataStream "Execution Mode" documentation goes to great lengths to > describe the differences between the modes and their impact on various aspects of > Flink like checkpointing. > However, the topic of connectors, and specifically which of them support Batch > mode, or whether there even are any that don't, is not mentioned at all. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20853) Add reader schema null check for AvroDeserializationSchema when recordClazz is GenericRecord
[ https://issues.apache.org/jira/browse/FLINK-20853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-20853: - Fix Version/s: (was: 1.14.0) 1.15.0 > Add reader schema null check for AvroDeserializationSchema when recordClazz > is GenericRecord > - > > Key: FLINK-20853 > URL: https://issues.apache.org/jira/browse/FLINK-20853 > Project: Flink > Issue Type: Improvement > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.12.0 >Reporter: hailong wang >Priority: Minor > Fix For: 1.15.0 > > > The reader schema must not be null when recordClazz is GenericRecord. > Although the constructor has default (package-private) visibility, a null reader schema > combined with a GenericRecord recordClazz causes an NPE in classes that extend it, such as > RegistryAvroDeserializationSchema. -- This message was sent by Atlassian Jira (v8.3.4#803005)
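The guard proposed in FLINK-20853 above could be as simple as a constructor-time null check. The sketch below uses plain Java with stand-in types (the real `AvroDeserializationSchema` constructor and Avro's `Schema` class are not reproduced here); the point is only to fail fast at construction instead of deferring the NPE to deserialization time:

```java
import java.util.Objects;

// Stand-in for Avro's GenericRecord; only the null-check logic matters here.
interface GenericRecord {}

class AvroDeserializationSchemaSketch<T> {
    private final Class<T> recordClazz;
    private final Object readerSchema; // stands in for org.apache.avro.Schema

    AvroDeserializationSchemaSketch(Class<T> recordClazz, Object readerSchema) {
        // Fail fast: a GenericRecord deserializer is unusable without a
        // reader schema, so reject it here rather than NPE-ing later.
        if (GenericRecord.class.isAssignableFrom(recordClazz)) {
            Objects.requireNonNull(readerSchema,
                "Reader schema must not be null when recordClazz is GenericRecord");
        }
        this.recordClazz = recordClazz;
        this.readerSchema = readerSchema;
    }

    // Helper for demonstrating the guard: does construction reject a null
    // reader schema for this record class?
    static boolean rejectsNullSchema(Class<?> clazz) {
        try {
            new AvroDeserializationSchemaSketch<>(clazz, null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }
}
```

With this guard, subclasses that forward a null reader schema together with a GenericRecord record class fail immediately with a descriptive message.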
[jira] [Updated] (FLINK-21184) Support temporary table/function for hive connector
[ https://issues.apache.org/jira/browse/FLINK-21184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-21184: - Fix Version/s: (was: 1.14.0) 1.15.0 > Support temporary table/function for hive connector > --- > > Key: FLINK-21184 > URL: https://issues.apache.org/jira/browse/FLINK-21184 > Project: Flink > Issue Type: New Feature > Components: Connectors / Hive >Reporter: Rui Li >Priority: Major > Labels: auto-deprioritized-major, auto-unassigned > Fix For: 1.15.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-21179) Make sure that the open/close methods of the Python DataStream Function are not implemented when using in ReducingState and AggregatingState
[ https://issues.apache.org/jira/browse/FLINK-21179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-21179: - Fix Version/s: (was: 1.14.0) 1.15.0 > Make sure that the open/close methods of the Python DataStream Function are > not implemented when using in ReducingState and AggregatingState > > > Key: FLINK-21179 > URL: https://issues.apache.org/jira/browse/FLINK-21179 > Project: Flink > Issue Type: Sub-task > Components: API / Python >Reporter: Wei Zhong >Priority: Major > Fix For: 1.15.0 > > > As the ReducingState and AggregatingState only support non-rich functions, we > need to make sure that the open/close methods of the Python DataStream > Function are not implemented when using in ReducingState and AggregatingState. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-21053) Prevent potential RejectedExecutionExceptions in CheckpointCoordinator failing JM
[ https://issues.apache.org/jira/browse/FLINK-21053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-21053: - Fix Version/s: (was: 1.14.0) 1.15.0 > Prevent potential RejectedExecutionExceptions in CheckpointCoordinator > failing JM > - > > Key: FLINK-21053 > URL: https://issues.apache.org/jira/browse/FLINK-21053 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing >Reporter: Roman Khachatryan >Priority: Minor > Labels: auto-unassigned > Fix For: 1.15.0 > > > In the past, there were multiple bugs caused by throwing/handling > RejectedExecutionException in CheckpointCoordinator (FLINK-18290, > FLINK-20992). > > And I think it's still possible, as there are many places where an executor is > passed to calls to CompletableFuture.xxxAsync while it can already be shut > down. > > In FLINK-20992 we discussed two approaches to fix this. > One approach is to check the executor state inside a synchronized block every > time it is used. > The second approach is to: > # Create executors inside CheckpointCoordinator (both io & timer thread > pools) > # Check isShutdown() in their RejectedExecution handlers (if yes and it's a > RejectedExecutionException then just log; otherwise delegate to > FatalExitExceptionHandler) > # (this will allow removing such RejectedExecutionException checks from the > coordinator code) > -- This message was sent by Atlassian Jira (v8.3.4#803005)
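The second approach in FLINK-21053 above can be sketched with plain `java.util.concurrent` types. The handler below is illustrative only (names hypothetical, not the actual CheckpointCoordinator code): it swallows rejections that happen after an orderly shutdown and treats any other rejection as fatal.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;

// Sketch of approach 2: ignore (just log) rejections caused by an orderly
// shutdown, and escalate any other rejection as a fatal error.
class ShutdownAwareRejectionHandler implements RejectedExecutionHandler {
    int ignoredAfterShutdown = 0; // stands in for a log statement

    @Override
    public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
        if (executor.isShutdown()) {
            // Expected during shutdown: the real code would only log here.
            ignoredAfterShutdown++;
        } else {
            // Unexpected: the real code would delegate to something like
            // FatalExitExceptionHandler; rethrowing suffices for the sketch.
            throw new RejectedExecutionException("unexpected rejection");
        }
    }
}
```

A task submitted after `shutdown()` then reaches this handler instead of throwing RejectedExecutionException into coordinator code, which is exactly what removes the scattered checks mentioned in the last bullet.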
[jira] [Updated] (FLINK-21375) Refactor HybridMemorySegment
[ https://issues.apache.org/jira/browse/FLINK-21375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-21375: - Fix Version/s: (was: 1.14.0) 1.15.0 > Refactor HybridMemorySegment > > > Key: FLINK-21375 > URL: https://issues.apache.org/jira/browse/FLINK-21375 > Project: Flink > Issue Type: Technical Debt > Components: Runtime / Coordination >Reporter: Xintong Song >Priority: Minor > Labels: auto-deprioritized-major > Fix For: 1.15.0 > > > Per the discussion in [this PR|https://github.com/apache/flink/pull/14904], > we plan to refactor {{HybridMemorySegment}} as follows. > * Separate into memory type specific implementations: heap / direct / native > (unsafe) > * Remove {{wrap()}}, replacing with {{processAsByteBuffer()}} > * Remove native memory cleaner logic -- This message was sent by Atlassian Jira (v8.3.4#803005)
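The `wrap()`-to-`processAsByteBuffer()` part of the FLINK-21375 plan above can be illustrated with a toy heap segment (names hypothetical, not the actual MemorySegment API): instead of handing out a wrapping ByteBuffer, the segment applies a caller-supplied function to a bounded view, so the buffer never escapes the segment's control.

```java
import java.nio.ByteBuffer;
import java.util.function.Function;

// Toy heap-backed segment demonstrating the processAsByteBuffer() pattern.
final class HeapSegmentSketch {
    private final byte[] memory;

    HeapSegmentSketch(int size) {
        this.memory = new byte[size];
    }

    void putInt(int index, int value) {
        ByteBuffer.wrap(memory).putInt(index, value);
    }

    // Replacement pattern for wrap(): process the segment as a ByteBuffer
    // within a bounded scope and return only the computed result, so no
    // reference to the underlying memory leaks out.
    <T> T processAsByteBuffer(Function<ByteBuffer, T> processor) {
        return processor.apply(ByteBuffer.wrap(memory).asReadOnlyBuffer());
    }
}
```

The design point is scoping: callers can still read the segment as a ByteBuffer, but only inside the function they pass in, which makes memory-type-specific implementations easier to separate.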
[jira] [Updated] (FLINK-21022) flink-connector-es add onSuccess handler after bulk process for sync success data to other third party system for data consistency checking
[ https://issues.apache.org/jira/browse/FLINK-21022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song updated FLINK-21022: - Fix Version/s: (was: 1.14.0) 1.15.0 > flink-connector-es add onSuccess handler after bulk process for sync success > data to other third party system for data consistency checking > --- > > Key: FLINK-21022 > URL: https://issues.apache.org/jira/browse/FLINK-21022 > Project: Flink > Issue Type: Improvement > Components: Connectors / ElasticSearch >Reporter: Zheng WEI >Priority: Minor > Labels: auto-deprioritized-major, pull-request-available > Fix For: 1.11.5, 1.15.0 > > > Add an onSuccess handler to flink-connector-es, invoked after a successful bulk > process, in order to sync the succeeded data to a third-party system for data > consistency checking. By default the onSuccess implementation is a no-op; users > can set their own onSuccess handler when needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
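A minimal sketch of the hook proposed in FLINK-21022 above, with hypothetical names (the real flink-connector-elasticsearch API differs): the handler defaults to a no-op so existing users are unaffected, and a custom handler can mirror succeeded actions to a third-party system.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback invoked after each successful bulk request.
interface BulkSuccessHandler<T> {
    void onSuccess(List<T> succeededActions);

    // Default behaviour: do nothing, so the hook is strictly opt-in.
    static <T> BulkSuccessHandler<T> noOp() {
        return actions -> { };
    }
}

// Toy bulk processor that buffers actions and notifies the handler on a
// (simulated) successful flush.
class BulkProcessorSketch<T> {
    private final BulkSuccessHandler<T> handler;
    private final List<T> buffer = new ArrayList<>();

    BulkProcessorSketch(BulkSuccessHandler<T> handler) {
        this.handler = handler;
    }

    void add(T action) {
        buffer.add(action);
    }

    // After a successful bulk flush, hand the batch to the handler, e.g. to
    // sync it to a third-party system for consistency checking.
    void flushSuccessfully() {
        handler.onSuccess(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

A consistency checker would then just supply a handler that forwards each succeeded batch, while the default `noOp()` preserves today's behaviour.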