[jira] [Created] (FLINK-28619) flink sql window aggregation using early fire will produce empty data
simenliuxing created FLINK-28619:

Summary: flink sql window aggregation using early fire will produce empty data
Key: FLINK-28619
URL: https://issues.apache.org/jira/browse/FLINK-28619
Project: Flink
Issue Type: Bug
Components: Table SQL / Planner, Table SQL / Runtime
Affects Versions: 1.15.0
Reporter: simenliuxing
Fix For: 1.15.2

The SQL is as follows:

{code:java}
set table.exec.emit.early-fire.enabled=true;
set table.exec.emit.early-fire.delay=1s;
set table.exec.resource.default-parallelism = 1;

CREATE TABLE source_table (
    id int,
    name VARCHAR,
    age int,
    proc_time AS PROCTIME()
) WITH (
    'connector' = 'datagen'
    ,'rows-per-second' = '3'
    ,'number-of-rows' = '1000'
    ,'fields.id.min' = '1'
    ,'fields.id.max' = '1'
    ,'fields.age.min' = '1'
    ,'fields.age.max' = '150'
    ,'fields.name.length' = '3'
);

CREATE TABLE sink_table (
    id int,
    name VARCHAR,
    ageAgg int,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector' = 'upsert-kafka'
    ,'properties.bootstrap.servers' = 'localhost:9092'
    ,'topic' = 'aaa'
    ,'key.format' = 'json'
    ,'value.format' = 'json'
    ,'value.fields-include' = 'ALL'
);

INSERT INTO sink_table
select
    id,
    last_value(name) as name,
    sum(age) as ageAgg
from source_table
group by tumble(proc_time, interval '1' day), id;
{code}

Result received in Kafka:

{code:java}
{"id":1,"name":"efe","ageAgg":455}
null
{"id":1,"name":"96a","ageAgg":701}
null
{"id":1,"name":"d71","ageAgg":1289}
null
{"id":1,"name":"89c","ageAgg":1515}
{code}

Is the extra null normal?

-- This message was sent by Atlassian Jira (v8.20.10#820010)
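One hedged reading of the interleaved nulls (a sketch for context, not a confirmed diagnosis of this issue): in the upsert-kafka format a record whose value is null is a tombstone, i.e. a deletion marker for its key, so an upserting consumer would interpret each null as a delete preceding the next insert for id=1. A minimal sketch of how such a consumer materializes the stream (`materialize` is an illustrative helper, not a Flink API):

```python
def materialize(records):
    """Apply an upsert stream to a key -> value view.
    A None value stands in for a Kafka tombstone and deletes the key;
    any other value overwrites the previous one for that key."""
    view = {}
    for key, value in records:
        if value is None:
            view.pop(key, None)   # tombstone: delete the key
        else:
            view[key] = value     # upsert: overwrite the key
    return view

# The sequence reported in the issue: value, tombstone, value, tombstone, ...
stream = [
    (1, {"id": 1, "name": "efe", "ageAgg": 455}),
    (1, None),
    (1, {"id": 1, "name": "96a", "ageAgg": 701}),
    (1, None),
    (1, {"id": 1, "name": "d71", "ageAgg": 1289}),
    (1, None),
    (1, {"id": 1, "name": "89c", "ageAgg": 1515}),
]

# After consuming the whole stream, only the latest value per key remains.
print(materialize(stream))
```

Under this reading, a consumer that honors tombstones ends up with only the last aggregate per key.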
Re: [DISCUSS] Replace Attempt column with Attempt Number on the subtask list page of the Web UI
Hi user mail list, I'm also forwarding this thread to you. Please let me know if you have any comments or feedback! Best, Gen On Wed, Jul 20, 2022 at 4:25 PM Zhu Zhu wrote: > Thanks for starting this discussion, Gen! > I agree it is confusing or even troublesome to show an attempt id that is > different from the corresponding attempt number in REST, metrics and logs. > It adds burden to users to do the mapping in troubleshooting. Mis-mapping > can be easy to happen and result in a waste of efforts and wrong > conclusion. > > Therefore, +1 for this proposal. > > Thanks, > Zhu > > Gen Luo 于2022年7月20日周三 15:24写道: > > > > Hi everyone, > > > > I'd like to propose a change on the Web UI to replace the Attempt column > > with an Attempt Number column on the subtask list page. > > > > From the very beginning, the attempt number shown is calculated at the > > frontend by subtask.attempt + 1, which means the attempt number shown on > > the web UI is not the same as it is in the runtime, as well as the logs > and > > the metrics. Users may get confused since they can't find logs or metrics > > of the subtask with the same attempt number. > > > > Fortunately, by now the users don't need to care about the attempt > number, > > since there can be only one attempt of each subtask. However, the > confusion > > seems inevitable once the speculative execution[1] or the attempt history > > is introduced, since multiple attempts of the same subtask can be > executed > > or presented at the same time. > > > > I suggest that the attempt number shown on the web UI should be changed > to > > align that on the runtime side, which is used in logging and metrics > > reporting. To avoid confusion, the column should also be renamed as > > "Attempt Number". The changes should only affect the Web UI. No REST API > > needs to change. What do you think? > > > > [1] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-168%3A+Speculative+Execution+for+Batch+Job > > > > Best, > > Gen >
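The off-by-one discussed above — the frontend displaying subtask.attempt + 1 while logs and metrics use the zero-based runtime attempt number — can be sketched as follows (illustrative names only):

```python
def ui_displayed_attempt(runtime_attempt: int) -> int:
    """What the current Web UI shows for a subtask's attempt:
    the zero-based runtime attempt number plus one."""
    return runtime_attempt + 1

runtime_attempt = 0                         # first attempt, as seen in logs/metrics
shown = ui_displayed_attempt(runtime_attempt)

# The UI shows "1" while logs and metrics say "0", so users searching logs
# for the attempt number shown in the UI will not find it.
print(shown, runtime_attempt, shown == runtime_attempt)
```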
[jira] [Created] (FLINK-28618) Cannot use hive.dialect on master
liubo created FLINK-28618:

Summary: Cannot use hive.dialect on master
Key: FLINK-28618
URL: https://issues.apache.org/jira/browse/FLINK-28618
Project: Flink
Issue Type: Bug
Components: Connectors / Hive
Environment: hadoop_class 2.10, openjdk11
Reporter: liubo
Attachments: image-2022-07-21-11-01-12-395.png, image-2022-07-21-11-04-12-552.png

I built the newest master Flink and copied {hive-exec-2.1.1.jar; flink-sql-connector-hive-2.3.9_2.12-1.16-SNAPSHOT.jar; flink-connector-hive_2.12-1.16-SNAPSHOT.jar} to $FLINK_HOME/lib. Then I got a failure in sql-client: !image-2022-07-21-11-01-12-395.png! After also copying opt/flink-table-planner_2.12-1.16-SNAPSHOT.jar, I cannot even open the sql-client: !image-2022-07-21-11-04-12-552.png! So, what's wrong?

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-28617) Support stop job statement in SqlGatewayService
Paul Lin created FLINK-28617: Summary: Support stop job statement in SqlGatewayService Key: FLINK-28617 URL: https://issues.apache.org/jira/browse/FLINK-28617 Project: Flink Issue Type: Sub-task Components: Table SQL / Gateway Reporter: Paul Lin -- This message was sent by Atlassian Jira (v8.20.10#820010)
RE: [VOTE] FLIP-251: Support collecting arbitrary number of streams
+1. (Non-binding) Thanks for driving this. Best Roc Marshal On 2022/07/19 08:26:15 Chesnay Schepler wrote: > I'd like to proceed with the vote for FLIP-251 [1], as no objections or > issues were raised in [2]. > > The vote will last for at least 72 hours unless there is an objection or > insufficient votes. > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-251%3A+Support+collecting+arbitrary+number+of+streams > [2] https://lists.apache.org/thread/ksv71m7rvcwslonw07h2qnw77zpqozvh > >
Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits
Hi Sebastian, Thank you for updating the FLIP and sorry for my delayed response. As Piotr pointed out, we would need to incorporate the fallback flag into the design to reflect the outcome of the previous discussion. Based on the current FLIP and as detailed by Becket, the SourceOperator coordinates the alignment. It is responsible for the pause/resume decision and knows how many splits are assigned. Therefore shouldn't it have all the information needed to efficiently handle the case of UnsupportedOperationException thrown by a reader? Although the fallback requires some extra implementation effort, I think that is more than offset by not surprising users and offering a smoother migration path. Yes, the flag is a temporary feature that will become obsolete in perhaps 2-3 releases (can we please also include that into the FLIP?). But since it would be just a configuration property that can be ignored at that point (for which there is precedence), no code change will be forced on users. As for the property name, perhaps the following would be even more descriptive? coarse.grained.wm.alignment.fallback.enabled Thanks! Thomas On Wed, Jul 13, 2022 at 10:59 AM Becket Qin wrote: > > Thanks for the explanation, Sebastian. I understand your concern now. > > 1. About the major concern. Personally I'd consider the coarse grained > watermark alignment as a special case for backward compatibility. In the > future, if for whatever reason we want to pause a split and that is not > supported, it seems the only thing that makes sense is throwing an exception, > instead of pausing the entire source reader. Regarding this FLIP, if the > logic that determines which split should be paused is in the SourceOperator, > the SourceOperator actually knows the reason why it pauses a split. It also > knows whether there are more than one split assigned to the source reader. 
So > it can just fallback to the coarse grained watermark alignment, without > affecting other reasons of pausing a split, right? And in the future, if > there are more purposes for pausing / resuming a split, the SourceOperator > still needs to understand each of the reasons in order to resume the splits > after all the pausing conditions are no longer met. > > 2. Naming wise, would "coarse.grained.watermark.alignment.enabled" address > your concern? > > The only concern I have for Option A is that people may not be able to > benefit from split level WM alignment until all the sources they need have > that implemented. This seems unnecessarily delaying the adoption of a new > feature, which looks like a more substantive downside compared with the > "coarse.grained.wm.alignment.enabled" option. > > BTW, the SourceOperator doesn't need to invoke the pauseOrResumeSplit() > method and catch the UnsupportedOperation every time. A flag can be set so it > doesn't attempt to pause the split after the first time it sees the exception. > > > Thanks, > > Jiangjie (Becket) Qin > > > > On Wed, Jul 13, 2022 at 5:11 PM Sebastian Mattheis > wrote: >> >> Hi Becket, Hi Thomas, Hi Piotrek, >> >> Thanks for the feedback. I would like to highlight some concerns: >> >> Major: A configuration parameter like "allow coarse grained alignment" >> defines a semantic that mixes two contexts conditionally as follows: "ignore >> incapability to pause splits in SourceReader/SplitReader" IF (conditional) >> we "allow coarse grained watermark alignment". At the same time we said that >> there is no way to check the capability of SourceReader/SplitReader to >> pause/resume other than observing a UnsupportedOperationException during >> runtime such that we cannot disable the trigger for watermark split >> alignment in the SourceOperator. 
Instead, we can only ignore the >> incapability of SourceReader/SplitReader during execution of a pause/resume >> attempt which, consequently, requires to check the "allow coarse grained >> alignment " parameter value (to implement the conditional semantic). >> However, during this execution we actually don't know whether the attempt >> was executed for the purpose of watermark alignment or for some other >> purpose such that the check actually depends on who triggered the >> pause/resume attempt and hides the exception potentially unexpectedly for >> some other use case. Of course, currently there is no other purpose and, >> hence, no other trigger than watermark alignment. However, this breaks, in >> my perspective, the idea of having pauseOrResumeSplits (re)usable for other >> use cases. >> Minor: I'm not aware of any configuration parameter in the format like >> `allow.*` as you suggested with `allow.coarse.grained.watermark.alignment`. >> Would that still be okay to do? >> >> As we have agreed to not have a "supportsPausableSplits" method because of >> potential inconsistencies between return value of this method and the actual >> implementation (and also the difficulty to have a meaningful return value >>
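The catch-once-and-remember fallback that Becket describes earlier in the thread can be sketched as follows (all names are illustrative Python stand-ins, not actual Flink classes; `NotImplementedError` plays the role of Java's `UnsupportedOperationException`, and the flag corresponds to the proposed coarse-grained-alignment configuration property):

```python
class AlignmentCoordinator:
    """Sketch of the proposed fallback logic in the component that
    coordinates watermark alignment (the SourceOperator in the FLIP):
    try split-level pause once; if the reader does not support it,
    remember that and fall back to pausing the entire reader."""

    def __init__(self, reader, coarse_grained_fallback_enabled=True):
        self.reader = reader
        self.fallback_enabled = coarse_grained_fallback_enabled
        self.split_pause_supported = None  # unknown until the first attempt

    def pause_split(self, split_id):
        if self.split_pause_supported is not False:
            try:
                self.reader.pause_split(split_id)
                self.split_pause_supported = True
                return "split-paused"
            except NotImplementedError:
                # Remember the failure; don't retry on every alignment cycle.
                self.split_pause_supported = False
        if not self.fallback_enabled:
            raise RuntimeError(
                "split-level alignment unsupported and fallback disabled")
        self.reader.pause_all()  # coarse-grained: pause the whole reader
        return "reader-paused"


class LegacyReader:
    """A reader built against the old interface: no per-split pausing."""
    def pause_split(self, split_id):
        raise NotImplementedError
    def pause_all(self):
        pass


coord = AlignmentCoordinator(LegacyReader())
print(coord.pause_split("split-0"))  # falls back to pausing the whole reader
```

Note how the coordinator, not the reader, decides what the fallback means, which matches the argument that the SourceOperator knows why a split was paused.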
[GitHub] [flink-connector-redis] knaufk commented on pull request #2: [FLINK-15571][connector][WIP] Redis Stream connector for Flink
knaufk commented on PR #2: URL: https://github.com/apache/flink-connector-redis/pull/2#issuecomment-1190766247 Thanks @sazzad16 for your contribution and patience. I think the community would really benefit from a Redis Connector and I'll try to help you get this in. The main challenge with the current PR is that it uses the old - about to be deprecated - source and sink APIs of Apache Flink (SourceFunction, SinkFunction). Could you try migrating your implementation to the new APIs? For the Source, https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/sources/ contains a description of the interfaces, and there are already a few examples like [Kafka](https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/KafkaSource.java) or [Pulsar](https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-pulsar/src/main/java/org/apache/flink/connector/pulsar/source/PulsarSource.java). For the Sink, there is [ElasticSearch](https://github.com/apache/flink-connector-elasticsearch/blob/main/flink-connector-elasticsearch-base/src/main/java/org/apache/flink/connector/elasticsearch/sink/ElasticsearchSink.java), which uses the new Sink API, as well as the [FileSink](https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/sink/FileSink.java). However, I would recommend you have a look at whether you can leverage the [Async Sink](https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink), like e.g. the DynamoDB Sink is doing, which is also under development right now. As a prerequisite, you would need to be able to write to Redis asynchronously and tell, based on the resulting Future, whether the request was successful or not (see Public Interfaces in the Async Sink FLIP). From what I know about Redis this should be possible, and it would greatly simplify the implementation of a Sink. 
Lastly, I propose we split this contribution up into at least four separate PRs to get this moving more quickly:
* the Source
* the Sink
* the TableSource
* the TableSink
Just start with what seems most relevant to you. I am sorry about these requests to port the implementation to the new APIs, but building on the old APIs will not be sustainable, and with the new APIs we immediately get all the support for both batch and stream execution in the DataStream and Table API, as well as better chances this connector ties in perfectly with checkpointing and watermarking without much work from your side. Thanks again and looking forward to hearing from you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
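The Async Sink prerequisite mentioned above — write asynchronously and decide from the returned Future whether each request succeeded — can be sketched as follows (the Redis client here is a stub for illustration; a real connector would use an asynchronous Redis client and Flink's Async Sink interfaces):

```python
from concurrent.futures import ThreadPoolExecutor

class StubAsyncRedis:
    """Stand-in for an async Redis client; xadd_async appends an entry
    to a named stream and returns a Future for the write."""
    def __init__(self):
        self.streams = {}
        self._pool = ThreadPoolExecutor(max_workers=2)

    def xadd_async(self, stream, entry):
        def write():
            self.streams.setdefault(stream, []).append(entry)
            return True
        return self._pool.submit(write)  # Future[bool]

def write_batch(client, stream, entries):
    """Issue all writes asynchronously, then classify each one as
    succeeded or failed based on its Future; failed entries would be
    handed back to the Async Sink for retry."""
    futures = [client.xadd_async(stream, e) for e in entries]
    failed = []
    for entry, fut in zip(entries, futures):
        try:
            if not fut.result(timeout=5):
                failed.append(entry)
        except Exception:
            failed.append(entry)
    return failed

client = StubAsyncRedis()
retry = write_batch(client, "events", [{"id": 1}, {"id": 2}])
print(len(retry), len(client.streams["events"]))
```

The ability to map one Future per request back to its entry is exactly what lets the Async Sink base implementation handle batching and retries generically.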
Re: [VOTE] FLIP-252: Amazon DynamoDB Sink Connector
+1. Thanks! Am Mi., 20. Juli 2022 um 16:48 Uhr schrieb Tzu-Li (Gordon) Tai < tzuli...@apache.org>: > +1 > > On Wed, Jul 20, 2022 at 6:13 AM Danny Cranmer > wrote: > > > Hi there, > > > > After the discussion in [1], I’d like to open a voting thread for > FLIP-252 > > [2], which proposes the addition of an Amazon DynamoDB sink based on the > > Async Sink [3]. > > > > The vote will be open until July 23rd earliest (72h), unless there are > any > > binding vetos. > > > > Cheers, Danny > > > > [1] https://lists.apache.org/thread/ssmf2c86n3xyd5qqmcdft22sqn4qw8mw > > [2] > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-252%3A+Amazon+DynamoDB+Sink+Connector > > [3] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink > > > -- https://twitter.com/snntrable https://github.com/knaufk
Re: [VOTE] FLIP-251: Support collecting arbitrary number of streams
+1 (binding) Am Mi., 20. Juli 2022 um 15:48 Uhr schrieb David Anderson < dander...@apache.org>: > +1 > > Thank you Chesnay. > > On Tue, Jul 19, 2022 at 3:09 PM Alexander Fedulov > > wrote: > > > +1 > > Looking forward to using the API to simplify tests setups. > > > > Best, > > Alexander Fedulov > > > > On Tue, Jul 19, 2022 at 2:31 PM Martijn Visser > > > wrote: > > > > > Thanks for creating the FLIP and opening the vote Chesnay. > > > > > > +1 (binding) > > > > > > Op di 19 jul. 2022 om 10:26 schreef Chesnay Schepler < > ches...@apache.org > > >: > > > > > > > I'd like to proceed with the vote for FLIP-251 [1], as no objections > or > > > > issues were raised in [2]. > > > > > > > > The vote will last for at least 72 hours unless there is an objection > > or > > > > insufficient votes. > > > > > > > > [1] > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-251%3A+Support+collecting+arbitrary+number+of+streams > > > > [2] > https://lists.apache.org/thread/ksv71m7rvcwslonw07h2qnw77zpqozvh > > > > > > > > > > > > > > -- https://twitter.com/snntrable https://github.com/knaufk
[DISCUSS] FLIP-246: Multi Cluster Kafka Source
Hi all, We would like to start a discussion thread on FLIP-246: Multi Cluster Kafka Source [1], where we propose to provide a source connector for dynamically reading from multiple Kafka clusters without requiring a Flink job restart. This can greatly improve the Kafka migration experience for clusters and topics, and it solves some existing problems with the current KafkaSource. There has been some interest from users [2] from a meetup and the mailing list. Looking forward to comments and feedback, thanks! [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-246%3A+Multi+Cluster+Kafka+Source [2] https://lists.apache.org/thread/zmpnzx6jjsqc0oldvdm5y2n674xzc3jc Best, Mason
[jira] [Created] (FLINK-28616) Quickstart and examples docs should use configured stable version
Gyula Fora created FLINK-28616: -- Summary: Quickstart and examples docs should use configured stable version Key: FLINK-28616 URL: https://issues.apache.org/jira/browse/FLINK-28616 Project: Flink Issue Type: Improvement Components: Kubernetes Operator Reporter: Gyula Fora The Kubernetes Operator documentation contains several parts that refer to the current stable versions. A good example would be a quickstart which uses helm repo link for the last stable release. We should change the logic so that this always points to the configured stable release (part of config doc). This way we would avoid the need for upgrading this as part of every release. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-28615) Add K8s recommended labels to the deployments, services created by operator
Xin Hao created FLINK-28615: --- Summary: Add K8s recommended labels to the deployments, services created by operator Key: FLINK-28615 URL: https://issues.apache.org/jira/browse/FLINK-28615 Project: Flink Issue Type: Improvement Components: Kubernetes Operator Reporter: Xin Hao I'm using the Flink operator with Argo CD; it would be nice if we could add the K8s recommended labels to the deployments and services. Such as: {code:java} labels: app.kubernetes.io/managed-by: apache-flink-operator app.kubernetes.io/part-of: flink-session-cluster-a {code} With this, users can see all the resources created by a FlinkDeployment. See also: https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/#labels -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [VOTE] Apache Flink Kubernetes Operator Release 1.1.0, release candidate #1
Hi there, Thanks a lot for the release! +1 (non-binding) Successfully verified the following: - Checksums and gpg signatures of the tar files. - No binaries in source release - Build from source, build image from source - Helm Repo works, Helm install works - Submit example applications without errors - Check that flink sql/python examples with flink kubernetes operator work as expected - Check licenses in the docs dir in source code Best, Biao Geng * From: Gyula Fóra Date: Wednesday, July 20, 2022 at 5:47 PM To: dev Subject: [VOTE] Apache Flink Kubernetes Operator Release 1.1.0, release candidate #1 Hi everyone, Please review and vote on the release candidate #1 for the version 1.1.0 of Apache Flink Kubernetes Operator, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) **Release Overview** As an overview, the release consists of the following: a) Kubernetes Operator canonical source distribution (including the Dockerfile), to be deployed to the release repository at dist.apache.org b) Kubernetes Operator Helm Chart to be deployed to the release repository at dist.apache.org c) Maven artifacts to be deployed to the Maven Central Repository d) Docker image to be pushed to dockerhub **Staging Areas to Review** The staging areas containing the above mentioned artifacts are as follows, for your review: * All artifacts for a,b) can be found in the corresponding dev repository at dist.apache.org [1] * All artifacts for c) can be found at the Apache Nexus Repository [2] * The docker image for d) is staged on github [3] All artifacts are signed with the key 0B4A34ADDFFA2BB54EB720B221F06303B87DAFF1 [4] Other links for your review: * JIRA release notes [5] * source code tag "release-1.1.0-rc1" [6] * PR to update the website Downloads page to include Kubernetes Operator links [7] **Vote Duration** The voting time will run for at least 72 hours. 
It is adopted by majority approval, with at least 3 PMC affirmative votes. **Note on Verification** You can follow the basic verification guide here[8]. Note that you don't need to verify everything yourself, but please make note of what you have tested together with your +- vote. Thanks, Gyula Fora [1] https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.1.0-rc1/ [2] https://repository.apache.org/content/repositories/orgapacheflink-1518/ [3] ghcr.io/apache/flink-kubernetes-operator:c9dec3f [4] https://dist.apache.org/repos/dist/release/flink/KEYS [5] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12351723 [6] https://github.com/apache/flink-kubernetes-operator/tree/release-1.1.0-rc1 [7] https://github.com/apache/flink-web/pull/560 [8] https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Kubernetes+Operator+Release
Re: [VOTE] FLIP-252: Amazon DynamoDB Sink Connector
+1 On Wed, Jul 20, 2022 at 6:13 AM Danny Cranmer wrote: > Hi there, > > After the discussion in [1], I’d like to open a voting thread for FLIP-252 > [2], which proposes the addition of an Amazon DynamoDB sink based on the > Async Sink [3]. > > The vote will be open until July 23rd earliest (72h), unless there are any > binding vetos. > > Cheers, Danny > > [1] https://lists.apache.org/thread/ssmf2c86n3xyd5qqmcdft22sqn4qw8mw > [2] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-252%3A+Amazon+DynamoDB+Sink+Connector > [3] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink >
Re: [VOTE] FLIP-251: Support collecting arbitrary number of streams
+1 Thank you Chesnay. On Tue, Jul 19, 2022 at 3:09 PM Alexander Fedulov wrote: > +1 > Looking forward to using the API to simplify tests setups. > > Best, > Alexander Fedulov > > On Tue, Jul 19, 2022 at 2:31 PM Martijn Visser > wrote: > > > Thanks for creating the FLIP and opening the vote Chesnay. > > > > +1 (binding) > > > > Op di 19 jul. 2022 om 10:26 schreef Chesnay Schepler >: > > > > > I'd like to proceed with the vote for FLIP-251 [1], as no objections or > > > issues were raised in [2]. > > > > > > The vote will last for at least 72 hours unless there is an objection > or > > > insufficient votes. > > > > > > [1] > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-251%3A+Support+collecting+arbitrary+number+of+streams > > > [2] https://lists.apache.org/thread/ksv71m7rvcwslonw07h2qnw77zpqozvh > > > > > > > > >
Re: [DISCUSS] FLIP-252: Amazon DynamoDB Sink Connector
Thanks for the feedback and support Robert. I have created a vote thread [1] Danny Cranmer [1] https://lists.apache.org/thread/4qq33hxfjf5ms21pbo4mk2q7xn4s2l0h On Wed, Jul 20, 2022 at 7:19 AM Robert Metzger wrote: > Thanks a lot for this nice proposal! > > DynamoDB seems to be a connector that Flink is still lacking, and with the > Async Sink interface, it seems that we can implement this fairly easily. > > +1 to proceed to the formal vote for this FLIP! > > On Fri, Jul 15, 2022 at 7:51 PM Danny Cranmer > wrote: > > > Hello all, > > > > We would like to start a discussion thread on FLIP-252: Amazon DynamoDB > > Sink Connector [1] where we propose to provide a sink connector for > Amazon > > DynamoDB [2] based on the Async Sink [3]. Looking forward to comments and > > feedback. Thank you. > > > > [1] > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-252%3A+Amazon+DynamoDB+Sink+Connector > > [2] https://aws.amazon.com/dynamodb > > [3] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink > > >
[VOTE] FLIP-252: Amazon DynamoDB Sink Connector
Hi there, After the discussion in [1], I’d like to open a voting thread for FLIP-252 [2], which proposes the addition of an Amazon DynamoDB sink based on the Async Sink [3]. The vote will be open until July 23rd earliest (72h), unless there are any binding vetos. Cheers, Danny [1] https://lists.apache.org/thread/ssmf2c86n3xyd5qqmcdft22sqn4qw8mw [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-252%3A+Amazon+DynamoDB+Sink+Connector [3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink
Re: [ANNOUNCE] Table Store release-0.2 branch cut
Awesome! Looking forward to the release. Best, Shuo On Wed, Jul 20, 2022 at 3:12 PM Leonard Xu wrote: > Thanks Jinsong for the great work, look forward to the release. > > Best, > Leonard > > > 2022年7月20日 下午2:50,Jingsong Li 写道: > > > > Hi Flink devs! > > > > The version on the main branch has been updated to 0.3-SNAPSHOT. > > > > The release-0.2 branch has been forked from main: > > https://github.com/apache/flink-table-store/tree/release-0.2 > > > > Documentation: > > https://nightlies.apache.org/flink/flink-table-store-docs-release-0.2/ > > > > The most important thing about Table Store 0.2 is that it enriches the > > Table Store ecosystem. Now not only Flink 1.15, but also Flink 1.14, > > Hive 2.3, Spark 2.4, Spark 3+, and Trino are all supported. > > > > More features are: Catalog (Including Hive Metastore), Rescale bucket, > > Append-only mode, Improved documentation etc.. > > > > You are very welcome to test it! > > > > We will try to prepare the first RC in the next week. > > > > Best, > > Jingsong > >
[VOTE] Apache Flink Kubernetes Operator Release 1.1.0, release candidate #1
Hi everyone, Please review and vote on the release candidate #1 for the version 1.1.0 of Apache Flink Kubernetes Operator, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) **Release Overview** As an overview, the release consists of the following: a) Kubernetes Operator canonical source distribution (including the Dockerfile), to be deployed to the release repository at dist.apache.org b) Kubernetes Operator Helm Chart to be deployed to the release repository at dist.apache.org c) Maven artifacts to be deployed to the Maven Central Repository d) Docker image to be pushed to dockerhub **Staging Areas to Review** The staging areas containing the above mentioned artifacts are as follows, for your review: * All artifacts for a,b) can be found in the corresponding dev repository at dist.apache.org [1] * All artifacts for c) can be found at the Apache Nexus Repository [2] * The docker image for d) is staged on github [3] All artifacts are signed with the key 0B4A34ADDFFA2BB54EB720B221F06303B87DAFF1 [4] Other links for your review: * JIRA release notes [5] * source code tag "release-1.1.0-rc1" [6] * PR to update the website Downloads page to include Kubernetes Operator links [7] **Vote Duration** The voting time will run for at least 72 hours. It is adopted by majority approval, with at least 3 PMC affirmative votes. **Note on Verification** You can follow the basic verification guide here[8]. Note that you don't need to verify everything yourself, but please make note of what you have tested together with your +- vote. 
Thanks, Gyula Fora [1] https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.1.0-rc1/ [2] https://repository.apache.org/content/repositories/orgapacheflink-1518/ [3] ghcr.io/apache/flink-kubernetes-operator:c9dec3f [4] https://dist.apache.org/repos/dist/release/flink/KEYS [5] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12351723 [6] https://github.com/apache/flink-kubernetes-operator/tree/release-1.1.0-rc1 [7] https://github.com/apache/flink-web/pull/560 [8] https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Kubernetes+Operator+Release
Re: [DISCUSS] FLIP-247 Bulk fetch of table and column statistics for given partitions
I agree with Jingsong. There are use cases to get partitions and partition stats in a single call to reduce the IO cost. For example, extending Catalog#listPartitions to Catalog#listPartitionsWithStats, and extending Catalog#listPartitionsByFilter to Catalog#listPartitionsWithStatsByFilter. This allows the planner to choose the cheapest one to request partitions and stats. But of course, this can be introduced in the future. Besides, I found that the FLIP doesn't cover compatibility properly. Introducing two new methods in Catalog will break all third-party catalog implementations. On the other hand, the planner should have a way to identify whether the Catalog supports bulk get. Best, Jark On Wed, 20 Jul 2022 at 16:43, Jingsong Li wrote: > Thanks for your reply. > > - Consider bulkGetPartitionStatistics, partition statistics are > already in HiveMetastoreClient.listPartitions. But on our side, we > need Catalog.getPartitions first, and then > Catalog.bulkGetPartitionStatistics. > > - Consider bulkGetPartitionColumnStatistics, yes, as you said, we need it. > > But to unify them, the current FLIP is acceptable. > > Best, > Jingsong > > On Wed, Jul 20, 2022 at 3:57 PM Jing Ge wrote: > > > > Hi Jingsong, > > > > Thanks for clarifying it. Are you suggesting a new method or changing the > > name of the methods described in the FLIP? > > Please see my answers and further questions below. > > > > Best regards, > > Jing > > > > On Wed, Jul 20, 2022 at 4:28 AM Jingsong Li > wrote: > > > > > Hi Jing, > > > > > > I understand that the statistics for partitions are currently only > > > used by Hive, so we can look at the Hive implementation: > > > > > > See HiveCatalog.getPartitionStatistics. > > > To get the statistics, we actually get them from the > > > org.apache.hadoop.hive.metastore.api.Partition object. > > > > > > > Correct. Both new methods will do the same thing but in bulk mode. 
> > > > > > > > > > According to HiveMetastore's API, partition-related operations > > > actually get the partition as well as the statistics information. > > > > > > > I am not really sure which API it is. Methods in HiveMetaStoreClient that > > are dealing with partitions will either return partitions or > statisticObjs, > > e.g. listPartitions(...) or getPartitionColumnStatistics(...) > > > > > > > > > > So if the current partition statistics are just for Hive, can we > > > consider unifying it with Hive? > > > > > > > Yes, we are on the same page. This FLIP is trying to unify it with > > HiveMetaStoreClient. > > > > > > > > > > For example, in PushPartitionIntoTableSourceScanRule, just use > > > `listPartitionWithStats`, and adjust table statistics from partitions. > > > > > > > Does the bulkGetPartitionStatistics work in this case too? Or, do you > mean > > you need both partitions and related statistics returned by a new method > > called `listPartitionWithStats`? > > > > > > > Best, > > > Jingsong > > > > > > On Tue, Jul 19, 2022 at 8:44 PM Jing Ge wrote: > > > > > > > > Thanks Jingsong for the suggestion. > > > > > > > > Do you mean using a different naming convention? There is a thought > and > > > > description in the FLIP about using "list" or "bulkGet": > > > > > > > >- bulkGetPartitionStatistics(...) has been chosen over > > > >listPartitionStatistics(...), because, comparing to database and > > > partition > > > >that are static and can be listed, statistics are more dynamic and > > > will > > > >need more computation logic to create, therefore using "get" is > > > >semantically more feasible than list. The "bulk" gives users the > hint > > > that > > > >this method will work in the bulk mode and return a collection of > > > instances. > > > > > > > > > > > > As a reference, we can see that no method in MetaStoreClient, that > > > > calculates statistics, uses the "list" naming convention. 
> > > > > > > > Best regards, > > > > Jing > > > > > > > > On Fri, Jul 15, 2022 at 5:38 AM Jingsong Li > > > wrote: > > > > > > > > > Thanks for starting this discussion. > > > > > > > > > > Have we considered introducing a listPartitionWithStats() in > Catalog? > > > > > > > > > > Best, > > > > > Jingsong > > > > > > > > > > On Fri, Jul 15, 2022 at 10:08 AM Jark Wu wrote: > > > > > > > > > > > > Hi Jing, > > > > > > > > > > > > Thanks for starting this discussion. The bulk fetch is a great > > > > > improvement > > > > > > for the optimizer. > > > > > > The FLIP looks good to me. > > > > > > > > > > > > Best, > > > > > > Jark > > > > > > > > > > > > On Fri, 8 Jul 2022 at 17:36, Jing Ge wrote: > > > > > > > > > > > > > Hi devs, > > > > > > > > > > > > > > After having multiple discussions with Jark and Goldfrey, I'd > like > > > to > > > > > start > > > > > > > a discussion on the mailing list w.r.t. FLIP-247[1], which will > > > > > > > significantly improve the performance by providing the bulk > fetch > > > > > > > capability for table and column statistics. > > > > > > > > > > > > > >
Re: [DISCUSS] FLIP-247 Bulk fetch of table and column statistics for given partitions
Thanks for your reply. - Consider bulkGetPartitionStatistics, partition statistics are already in HiveMetastoreClient.listPartitions. But on our side, we need Catalog.getPartitions first, and then Catalog.bulkGetPartitionStatistics. - Consider bulkGetPartitionColumnStatistics, yes, as you said, we need it. But to unify them, the current FLIP is acceptable. Best, Jingsong On Wed, Jul 20, 2022 at 3:57 PM Jing Ge wrote: > > Hi Jingsong, > > Thanks for clarifying it. Are you suggesting a new method or changing the > name of the methods described in the FLIP? > Please see my answers and further questions below. > > Best regards, > Jing > > On Wed, Jul 20, 2022 at 4:28 AM Jingsong Li wrote: > > > Hi Jing, > > > > I understand that the statistics for partitions are currently only > > used by Hive, so we can look at the Hive implementation: > > > > See HiveCatalog.getPartitionStatistics. > > To get the statistics, we actually get them from the > > org.apache.hadoop.hive.metastore.api.Partition object. > > > > Correct. Both new methods will do the same thing but in bulk mode. > > > > > > According to HiveMetastore's API, partition-related operations > > actually get the partition as well as the statistics information. > > > > I am not really sure which API it is. Methods in HiveMetaStoreClient that > are dealing with partitions will either return partitions or statisticObjs, > e.g. listPartitions(...) or getPartitionColumnStatistics(...) > > > > > > So if the current partition statistics are just for Hive, can we > > consider unifying it with Hive? > > > > Yes, we are on the same page. This FLIP is trying to unify it with > HiveMetaStoreClient. > > > > > > For example, in PushPartitionIntoTableSourceScanRule, just use > > `listPartitionWithStats`, and adjust table statistics from partitions. > > > > Does the bulkGetPartitionStatistics work in this case too? 
Or, do you mean > you need both partitions and related statistics returned by a new method > called `listPartitionWithStats`? > > > > Best, > > Jingsong > > > > On Tue, Jul 19, 2022 at 8:44 PM Jing Ge wrote: > > > > > > Thanks Jingsong for the suggestion. > > > > > > Do you mean using a different naming convention? There is a thought and > > > description in the FLIP about using "list" or "bulkGet": > > > > > >- bulkGetPartitionStatistics(...) has been chosen over > > >listPartitionStatistics(...), because, comparing to database and > > partition > > >that are static and can be listed, statistics are more dynamic and > > will > > >need more computation logic to create, therefore using "get" is > > >semantically more feasible than list. The "bulk" gives users the hint > > that > > >this method will work in the bulk mode and return a collection of > > instances. > > > > > > > > > As a reference, we can see that no method in MetaStoreClient, that > > > calculates statistics, uses the "list" naming convention. > > > > > > Best regards, > > > Jing > > > > > > On Fri, Jul 15, 2022 at 5:38 AM Jingsong Li > > wrote: > > > > > > > Thanks for starting this discussion. > > > > > > > > Have we considered introducing a listPartitionWithStats() in Catalog? > > > > > > > > Best, > > > > Jingsong > > > > > > > > On Fri, Jul 15, 2022 at 10:08 AM Jark Wu wrote: > > > > > > > > > > Hi Jing, > > > > > > > > > > Thanks for starting this discussion. The bulk fetch is a great > > > > improvement > > > > > for the optimizer. > > > > > The FLIP looks good to me. > > > > > > > > > > Best, > > > > > Jark > > > > > > > > > > On Fri, 8 Jul 2022 at 17:36, Jing Ge wrote: > > > > > > > > > > > Hi devs, > > > > > > > > > > > > After having multiple discussions with Jark and Goldfrey, I'd like > > to > > > > start > > > > > > a discussion on the mailing list w.r.t. 
FLIP-247[1], which will > > > > > > significantly improve the performance by providing the bulk fetch > > > > > > capability for table and column statistics. > > > > > > > > > > > > Currently the statistics information about tables can only be > > fetched > > > > from > > > > > > the catalog by each given partition iteratively. Since getting > > > > statistics > > > > > > information from catalogs is a very heavy operation, in order to > > > > improve > > > > > > the query performance, we’d better provide functionality to fetch > > the > > > > > > statistics information of a table for all given partitions in one > > shot. > > > > > > > > > > > > Based on the manual performance test, for 2000 partitions, the cost > > > > will be > > > > > > improved from 10s to 2s. The improvement result is 500%. > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-247%3A+Bulk+fetch+of+table+and+column+statistics+for+given+partitions > > > > > > > > > > > > Best regards, > > > > > > Jing > > > > > > > > > > > >
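The cost difference discussed in this thread comes down to metastore round trips: one call per partition versus one call for all partitions. Below is a minimal, hypothetical sketch of that contrast — the method names follow FLIP-247, but the signatures, types, and the mock metastore are illustrative assumptions, not the actual Flink Catalog API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT the real Flink Catalog API. It only illustrates
// why a single bulk round trip beats N per-partition round trips.
public class BulkFetchSketch {

    // Mock metastore: partition spec -> row-count statistic.
    static final Map<String, Long> STORE = new HashMap<>();
    static {
        STORE.put("dt=2022-07-18", 100L);
        STORE.put("dt=2022-07-19", 200L);
        STORE.put("dt=2022-07-20", 300L);
    }

    // Counts simulated round trips to the (mock) metastore.
    static int roundTrips = 0;

    // Current style: one metastore round trip per partition.
    static long getPartitionStatistics(String partition) {
        roundTrips++;
        return STORE.get(partition);
    }

    // Proposed style (name from FLIP-247): one round trip for all partitions.
    static Map<String, Long> bulkGetPartitionStatistics(List<String> partitions) {
        roundTrips++;
        Map<String, Long> result = new HashMap<>();
        for (String p : partitions) {
            result.put(p, STORE.get(p));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> partitions = new ArrayList<>(STORE.keySet());

        roundTrips = 0;
        for (String p : partitions) {
            getPartitionStatistics(p);
        }
        System.out.println("iterative round trips: " + roundTrips); // one per partition

        roundTrips = 0;
        bulkGetPartitionStatistics(partitions);
        System.out.println("bulk round trips: " + roundTrips); // always 1
    }
}
```

With 2000 partitions the iterative variant pays the metastore latency 2000 times, which is where the reported 10s-to-2s improvement comes from.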
Re: [DISCUSS] Replace Attempt column with Attempt Number on the subtask list page of the Web UI
Thanks for starting this discussion, Gen! I agree it is confusing or even troublesome to show an attempt id that is different from the corresponding attempt number in REST, metrics and logs. It adds a burden on users, who must do the mapping themselves during troubleshooting. Mis-mapping can easily happen and result in wasted effort and wrong conclusions. Therefore, +1 for this proposal. Thanks, Zhu On Wed, Jul 20, 2022 at 15:24, Gen Luo wrote: > > Hi everyone, > > I'd like to propose a change on the Web UI to replace the Attempt column > with an Attempt Number column on the subtask list page. > > From the very beginning, the attempt number shown is calculated at the > frontend by subtask.attempt + 1, which means the attempt number shown on > the web UI is not the same as it is in the runtime, as well as the logs and > the metrics. Users may get confused since they can't find logs or metrics > of the subtask with the same attempt number. > > Fortunately, by now the users don't need to care about the attempt number, > since there can be only one attempt of each subtask. However, the confusion > seems inevitable once the speculative execution[1] or the attempt history > is introduced, since multiple attempts of the same subtask can be executed > or presented at the same time. > > I suggest that the attempt number shown on the web UI should be changed to > align that on the runtime side, which is used in logging and metrics > reporting. To avoid confusion, the column should also be renamed as > "Attempt Number". The changes should only affect the Web UI. No REST API > needs to change. What do you think? > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-168%3A+Speculative+Execution+for+Batch+Job > > Best, > Gen
Re: [DISCUSS] FLIP-247 Bulk fetch of table and column statistics for given partitions
Hi Jingsong, Thanks for clarifying it. Are you suggesting a new method or changing the name of the methods described in the FLIP? Please see my answers and further questions below. Best regards, Jing On Wed, Jul 20, 2022 at 4:28 AM Jingsong Li wrote: > Hi Jing, > > I understand that the statistics for partitions are currently only > used by Hive, so we can look at the Hive implementation: > > See HiveCatalog.getPartitionStatistics. > To get the statistics, we actually get them from the > org.apache.hadoop.hive.metastore.api.Partition object. > Correct. Both new methods will do the same thing but in bulk mode. > > According to HiveMetastore's API, partition-related operations > actually get the partition as well as the statistics information. > I am not really sure which API it is. Methods in HiveMetaStoreClient that are dealing with partitions will either return partitions or statisticObjs, e.g. listPartitions(...) or getPartitionColumnStatistics(...) > > So if the current partition statistics are just for Hive, can we > consider unifying it with Hive? > Yes, we are on the same page. This FLIP is trying to unify it with HiveMetaStoreClient. > > For example, in PushPartitionIntoTableSourceScanRule, just use > `listPartitionWithStats`, and adjust table statistics from partitions. > Does the bulkGetPartitionStatistics work in this case too? Or, do you mean you need both partitions and related statistics returned by a new method called `listPartitionWithStats`? > Best, > Jingsong > > On Tue, Jul 19, 2022 at 8:44 PM Jing Ge wrote: > > > > Thanks Jingsong for the suggestion. > > > > Do you mean using a different naming convention? There is a thought and > > description in the FLIP about using "list" or "bulkGet": > > > >- bulkGetPartitionStatistics(...) 
has been chosen over > >listPartitionStatistics(...), because, comparing to database and > partition > >that are static and can be listed, statistics are more dynamic and > will > >need more computation logic to create, therefore using "get" is > >semantically more feasible than list. The "bulk" gives users the hint > that > >this method will work in the bulk mode and return a collection of > instances. > > > > > > As a reference, we can see that no method in MetaStoreClient, that > > calculates statistics, uses the "list" naming convention. > > > > Best regards, > > Jing > > > > On Fri, Jul 15, 2022 at 5:38 AM Jingsong Li > wrote: > > > > > Thanks for starting this discussion. > > > > > > Have we considered introducing a listPartitionWithStats() in Catalog? > > > > > > Best, > > > Jingsong > > > > > > On Fri, Jul 15, 2022 at 10:08 AM Jark Wu wrote: > > > > > > > > Hi Jing, > > > > > > > > Thanks for starting this discussion. The bulk fetch is a great > > > improvement > > > > for the optimizer. > > > > The FLIP looks good to me. > > > > > > > > Best, > > > > Jark > > > > > > > > On Fri, 8 Jul 2022 at 17:36, Jing Ge wrote: > > > > > > > > > Hi devs, > > > > > > > > > > After having multiple discussions with Jark and Goldfrey, I'd like > to > > > start > > > > > a discussion on the mailing list w.r.t. FLIP-247[1], which will > > > > > significantly improve the performance by providing the bulk fetch > > > > > capability for table and column statistics. > > > > > > > > > > Currently the statistics information about tables can only be > fetched > > > from > > > > > the catalog by each given partition iteratively. Since getting > > > statistics > > > > > information from catalogs is a very heavy operation, in order to > > > improve > > > > > the query performance, we’d better provide functionality to fetch > the > > > > > statistics information of a table for all given partitions in one > shot. 
> > > > > > > > > > Based on the manual performance test, for 2000 partitions, the cost > > > will be > > > > > improved from 10s to 2s. The improvement result is 500%. > > > > > > > > > > [1] > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-247%3A+Bulk+fetch+of+table+and+column+statistics+for+given+partitions > > > > > > > > > > Best regards, > > > > > Jing > > > > > > > > >
[DISCUSS] Replace Attempt column with Attempt Number on the subtask list page of the Web UI
Hi everyone, I'd like to propose a change on the Web UI to replace the Attempt column with an Attempt Number column on the subtask list page. From the very beginning, the attempt number shown has been calculated at the frontend by subtask.attempt + 1, which means the attempt number shown on the web UI is not the same as it is in the runtime, as well as in the logs and the metrics. Users may get confused since they can't find logs or metrics of the subtask with the same attempt number. Fortunately, for now users don't need to care about the attempt number, since there can be only one attempt of each subtask. However, the confusion seems inevitable once speculative execution[1] or the attempt history is introduced, since multiple attempts of the same subtask can be executed or presented at the same time. I suggest that the attempt number shown on the web UI should be changed to align with that on the runtime side, which is used in logging and metrics reporting. To avoid confusion, the column should also be renamed to "Attempt Number". The changes should only affect the Web UI. No REST API needs to change. What do you think? [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-168%3A+Speculative+Execution+for+Batch+Job Best, Gen
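To make the off-by-one concrete, here is a tiny, hypothetical sketch of the mismatch described above — the class and method names are illustrative assumptions, not actual Flink Web UI or runtime code:

```java
// Hypothetical illustration of the attempt-number mismatch: the runtime,
// logs, and metrics use the raw attempt number, while the old Web UI
// column displayed attempt + 1.
public class AttemptNumberSketch {

    // What the Web UI used to display for a given runtime attempt number.
    static int legacyUiAttempt(int attempt) {
        return attempt + 1;
    }

    // Proposed: show the runtime attempt number unchanged, so the UI
    // matches logs and metrics.
    static int proposedUiAttemptNumber(int attempt) {
        return attempt;
    }

    public static void main(String[] args) {
        int runtimeAttempt = 0; // first execution attempt of a subtask
        System.out.println("runtime/logs/metrics: " + runtimeAttempt);
        System.out.println("old Web UI column:    " + legacyUiAttempt(runtimeAttempt));
        System.out.println("proposed UI column:   " + proposedUiAttemptNumber(runtimeAttempt));
    }
}
```

With speculative execution, several attempts of one subtask can be alive at once, so a user searching logs for "attempt 1" after reading the old UI would actually be looking at the wrong attempt.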
Re: [ANNOUNCE] Table Store release-0.2 branch cut
Thanks Jingsong for the great work, looking forward to the release. Best, Leonard > On Jul 20, 2022, at 2:50 PM, Jingsong Li wrote: > > Hi Flink devs! > > The version on the main branch has been updated to 0.3-SNAPSHOT. > > The release-0.2 branch has been forked from main: > https://github.com/apache/flink-table-store/tree/release-0.2 > > Documentation: > https://nightlies.apache.org/flink/flink-table-store-docs-release-0.2/ > > The most important thing about Table Store 0.2 is that it enriches the > Table Store ecosystem. Now not only Flink 1.15, but also Flink 1.14, > Hive 2.3, Spark 2.4, Spark 3+, and Trino are all supported. > > More features are: Catalog (Including Hive Metastore), Rescale bucket, > Append-only mode, Improved documentation etc.. > > You are very welcome to test it! > > We will try to prepare the first RC in the next week. > > Best, > Jingsong
[ANNOUNCE] Table Store release-0.2 branch cut
Hi Flink devs! The version on the main branch has been updated to 0.3-SNAPSHOT. The release-0.2 branch has been forked from main: https://github.com/apache/flink-table-store/tree/release-0.2 Documentation: https://nightlies.apache.org/flink/flink-table-store-docs-release-0.2/ The most important thing about Table Store 0.2 is that it enriches the Table Store ecosystem. Now not only Flink 1.15, but also Flink 1.14, Hive 2.3, Spark 2.4, Spark 3+, and Trino are all supported. Further features include: Catalog (including Hive Metastore), rescale bucket, append-only mode, improved documentation, etc. You are very welcome to test it! We will try to prepare the first RC next week. Best, Jingsong
Re: [VOTE] FLIP-238: Introduce FLIP-27-based Data Generator Source
+1 (non-binding) Thanks for driving this! Best regards, Jing On Wed, Jul 20, 2022 at 7:48 AM Robert Metzger wrote: > +1 > > On Wed, Jul 20, 2022 at 4:41 AM Rui Fan <1996fan...@gmail.com> wrote: > > > +1 (non-binding) > > > > New Source can better support some features, such as > > Unaligned Checkpoint, Watermark alignment, etc. > > The data generator based on the new Source is very helpful > > for daily testing. > > > > Very much looking forward to using it. > > > > Best wishes > > Rui Fan > > > > On Wed, Jul 20, 2022 at 4:22 AM Martijn Visser > > > wrote: > > > +1 (binding) > > > > > > Thanks for the efforts Alex! > > > > > > On Tue, Jul 19, 2022 at 21:31, Alexander Fedulov < > > > alexan...@ververica.com> wrote: > > > > Hi everyone, > > > > following the discussion in [1], I would like to open up a vote for > > > > adding a FLIP-27-based Data Generator Source [2]. > > > > The addition of this source also unblocks the currently pending > > > > efforts for deprecating the Source Function API [3]. > > > > The poll will be open until July 25 (72h + weekend), unless there is > > > > an objection or not enough votes. > > > > [1] https://lists.apache.org/thread/7gjxto1rmkpff4kl54j8nlg5db2rqhkt > > > > [2] https://cwiki.apache.org/confluence/x/9Av1D > > > > [3] > https://github.com/apache/flink/pull/20049#issuecomment-1170948767 > > > > Best, > > > > Alexander Fedulov
Re: [DISCUSS] FLIP-252: Amazon DynamoDB Sink Connector
Thanks a lot for this nice proposal! DynamoDB seems to be a connector that Flink is still lacking, and with the Async Sink interface, it seems that we can implement this fairly easily. +1 to proceed to the formal vote for this FLIP! On Fri, Jul 15, 2022 at 7:51 PM Danny Cranmer wrote: > Hello all, > > We would like to start a discussion thread on FLIP-252: Amazon DynamoDB > Sink Connector [1] where we propose to provide a sink connector for Amazon > DynamoDB [2] based on the Async Sink [3]. Looking forward to comments and > feedback. Thank you. > > [1] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-252%3A+Amazon+DynamoDB+Sink+Connector > [2] https://aws.amazon.com/dynamodb > [3] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink >
Re: Unsubscribe
To unsubscribe, you can send any email to dev-unsubscr...@flink.apache.org Best regards, Yuxia ----- Original Message ----- From: "cason0126" To: "dev" Sent: Wednesday, Jul 20, 2022, 11:49:33 AM Subject: Unsubscribe Unsubscribe cason0126 <cason0...@163.com>