[jira] [Created] (FLINK-28619) flink sql window aggregation using early fire will produce empty data
simenliuxing created FLINK-28619:

Summary: flink sql window aggregation using early fire will produce empty data
Key: FLINK-28619
URL: https://issues.apache.org/jira/browse/FLINK-28619
Project: Flink
Issue Type: Bug
Components: Table SQL / Planner, Table SQL / Runtime
Affects Versions: 1.15.0
Reporter: simenliuxing
Fix For: 1.15.2

The SQL is as follows:

{code:java}
set table.exec.emit.early-fire.enabled=true;
set table.exec.emit.early-fire.delay=1s;
set table.exec.resource.default-parallelism = 1;

CREATE TABLE source_table (
    id int,
    name VARCHAR,
    age int,
    proc_time AS PROCTIME()
) WITH (
    'connector' = 'datagen'
    ,'rows-per-second' = '3'
    ,'number-of-rows' = '1000'
    ,'fields.id.min' = '1'
    ,'fields.id.max' = '1'
    ,'fields.age.min' = '1'
    ,'fields.age.max' = '150'
    ,'fields.name.length' = '3'
);

CREATE TABLE sink_table (
    id int,
    name VARCHAR,
    ageAgg int,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector' = 'upsert-kafka'
    ,'properties.bootstrap.servers' = 'localhost:9092'
    ,'topic' = 'aaa'
    ,'key.format' = 'json'
    ,'value.format' = 'json'
    ,'value.fields-include' = 'ALL'
);

INSERT INTO sink_table
select
    id,
    last_value(name) as name,
    sum(age) as ageAgg
from source_table
group by tumble(proc_time, interval '1' day), id;
{code}

Result received in Kafka:

{code:java}
{"id":1,"name":"efe","ageAgg":455}
null
{"id":1,"name":"96a","ageAgg":701}
null
{"id":1,"name":"d71","ageAgg":1289}
null
{"id":1,"name":"89c","ageAgg":1515}
{code}

Is the extra null normal?

-- This message was sent by Atlassian Jira (v8.20.10#820010)
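One hedged reading of the interleaved nulls (a sketch for context, not a confirmed diagnosis of this issue): in the upsert-kafka format a record whose value is null is a tombstone, i.e. a deletion marker for its key, so an upserting consumer would interpret each null as a delete preceding the next insert for id=1. A minimal sketch of how such a consumer materializes the stream (`materialize` is an illustrative helper, not a Flink API):

```python
def materialize(records):
    """Apply an upsert stream to a key -> value view.
    A None value stands in for a Kafka tombstone and deletes the key;
    any other value overwrites the previous one for that key."""
    view = {}
    for key, value in records:
        if value is None:
            view.pop(key, None)   # tombstone: delete the key
        else:
            view[key] = value     # upsert: overwrite the key
    return view

# The sequence reported in the issue: value, tombstone, value, tombstone, ...
stream = [
    (1, {"id": 1, "name": "efe", "ageAgg": 455}),
    (1, None),
    (1, {"id": 1, "name": "96a", "ageAgg": 701}),
    (1, None),
    (1, {"id": 1, "name": "d71", "ageAgg": 1289}),
    (1, None),
    (1, {"id": 1, "name": "89c", "ageAgg": 1515}),
]

# After consuming the whole stream, only the latest value per key remains.
print(materialize(stream))
```

Under this reading, a consumer that honors tombstones ends up with only the last aggregate per key.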
Re: [DISCUSS] Replace Attempt column with Attempt Number on the subtask list page of the Web UI
Hi user mail list, I'm also forwarding this thread to you. Please let me know if you have any comments or feedback! Best, Gen On Wed, Jul 20, 2022 at 4:25 PM Zhu Zhu wrote: > Thanks for starting this discussion, Gen! > I agree it is confusing or even troublesome to show an attempt id that is > different from the corresponding attempt number in REST, metrics and logs. > It adds burden to users to do the mapping in troubleshooting. Mis-mapping > can be easy to happen and result in a waste of efforts and wrong > conclusion. > > Therefore, +1 for this proposal. > > Thanks, > Zhu > > Gen Luo 于2022年7月20日周三 15:24写道: > > > > Hi everyone, > > > > I'd like to propose a change on the Web UI to replace the Attempt column > > with an Attempt Number column on the subtask list page. > > > > From the very beginning, the attempt number shown is calculated at the > > frontend by subtask.attempt + 1, which means the attempt number shown on > > the web UI is not the same as it is in the runtime, as well as the logs > and > > the metrics. Users may get confused since they can't find logs or metrics > > of the subtask with the same attempt number. > > > > Fortunately, by now the users don't need to care about the attempt > number, > > since there can be only one attempt of each subtask. However, the > confusion > > seems inevitable once the speculative execution[1] or the attempt history > > is introduced, since multiple attempts of the same subtask can be > executed > > or presented at the same time. > > > > I suggest that the attempt number shown on the web UI should be changed > to > > align that on the runtime side, which is used in logging and metrics > > reporting. To avoid confusion, the column should also be renamed as > > "Attempt Number". The changes should only affect the Web UI. No REST API > > needs to change. What do you think? > > > > [1] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-168%3A+Speculative+Execution+for+Batch+Job > > > > Best, > > Gen >
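The off-by-one discussed above — the frontend displaying subtask.attempt + 1 while logs and metrics use the zero-based runtime attempt number — can be sketched as follows (illustrative names only):

```python
def ui_displayed_attempt(runtime_attempt: int) -> int:
    """What the current Web UI shows for a subtask's attempt:
    the zero-based runtime attempt number plus one."""
    return runtime_attempt + 1

runtime_attempt = 0                         # first attempt, as seen in logs/metrics
shown = ui_displayed_attempt(runtime_attempt)

# The UI shows "1" while logs and metrics say "0", so users searching logs
# for the attempt number shown in the UI will not find it.
print(shown, runtime_attempt, shown == runtime_attempt)
```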
[jira] [Created] (FLINK-28618) Cannot use hive.dialect on master
liubo created FLINK-28618:

Summary: Cannot use hive.dialect on master
Key: FLINK-28618
URL: https://issues.apache.org/jira/browse/FLINK-28618
Project: Flink
Issue Type: Bug
Components: Connectors / Hive
Environment: hadoop_class 2.10, openjdk11
Reporter: liubo
Attachments: image-2022-07-21-11-01-12-395.png, image-2022-07-21-11-04-12-552.png

I built the newest master Flink and copied {hive-exec-2.1.1.jar; flink-sql-connector-hive-2.3.9_2.12-1.16-SNAPSHOT.jar; flink-connector-hive_2.12-1.16-SNAPSHOT.jar} to $FLINK_HOME/lib. Then I got a failure in sql-client: !image-2022-07-21-11-01-12-395.png! After also copying opt/flink-table-planner_2.12-1.16-SNAPSHOT.jar, I cannot even open the sql-client: !image-2022-07-21-11-04-12-552.png! So, what's wrong?

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-28617) Support stop job statement in SqlGatewayService
Paul Lin created FLINK-28617: Summary: Support stop job statement in SqlGatewayService Key: FLINK-28617 URL: https://issues.apache.org/jira/browse/FLINK-28617 Project: Flink Issue Type: Sub-task Components: Table SQL / Gateway Reporter: Paul Lin -- This message was sent by Atlassian Jira (v8.20.10#820010)
RE: [VOTE] FLIP-251: Support collecting arbitrary number of streams
+1. (Non-binding) Thanks for driving this. Best Roc Marshal On 2022/07/19 08:26:15 Chesnay Schepler wrote: > I'd like to proceed with the vote for FLIP-251 [1], as no objections or > issues were raised in [2]. > > The vote will last for at least 72 hours unless there is an objection or > insufficient votes. > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-251%3A+Support+collecting+arbitrary+number+of+streams > [2] https://lists.apache.org/thread/ksv71m7rvcwslonw07h2qnw77zpqozvh > >
Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits
Hi Sebastian, Thank you for updating the FLIP and sorry for my delayed response. As Piotr pointed out, we would need to incorporate the fallback flag into the design to reflect the outcome of the previous discussion. Based on the current FLIP and as detailed by Becket, the SourceOperator coordinates the alignment. It is responsible for the pause/resume decision and knows how many splits are assigned. Therefore shouldn't it have all the information needed to efficiently handle the case of UnsupportedOperationException thrown by a reader? Although the fallback requires some extra implementation effort, I think that is more than offset by not surprising users and offering a smoother migration path. Yes, the flag is a temporary feature that will become obsolete in perhaps 2-3 releases (can we please also include that into the FLIP?). But since it would be just a configuration property that can be ignored at that point (for which there is precedence), no code change will be forced on users. As for the property name, perhaps the following would be even more descriptive? coarse.grained.wm.alignment.fallback.enabled Thanks! Thomas On Wed, Jul 13, 2022 at 10:59 AM Becket Qin wrote: > > Thanks for the explanation, Sebastian. I understand your concern now. > > 1. About the major concern. Personally I'd consider the coarse grained > watermark alignment as a special case for backward compatibility. In the > future, if for whatever reason we want to pause a split and that is not > supported, it seems the only thing that makes sense is throwing an exception, > instead of pausing the entire source reader. Regarding this FLIP, if the > logic that determines which split should be paused is in the SourceOperator, > the SourceOperator actually knows the reason why it pauses a split. It also > knows whether there are more than one split assigned to the source reader. 
So > it can just fallback to the coarse grained watermark alignment, without > affecting other reasons of pausing a split, right? And in the future, if > there are more purposes for pausing / resuming a split, the SourceOperator > still needs to understand each of the reasons in order to resume the splits > after all the pausing conditions are no longer met. > > 2. Naming wise, would "coarse.grained.watermark.alignment.enabled" address > your concern? > > The only concern I have for Option A is that people may not be able to > benefit from split level WM alignment until all the sources they need have > that implemented. This seems unnecessarily delaying the adoption of a new > feature, which looks like a more substantive downside compared with the > "coarse.grained.wm.alignment.enabled" option. > > BTW, the SourceOperator doesn't need to invoke the pauseOrResumeSplit() > method and catch the UnsupportedOperation every time. A flag can be set so it > doesn't attempt to pause the split after the first time it sees the exception. > > > Thanks, > > Jiangjie (Becket) Qin > > > > On Wed, Jul 13, 2022 at 5:11 PM Sebastian Mattheis > wrote: >> >> Hi Becket, Hi Thomas, Hi Piotrek, >> >> Thanks for the feedback. I would like to highlight some concerns: >> >> Major: A configuration parameter like "allow coarse grained alignment" >> defines a semantic that mixes two contexts conditionally as follows: "ignore >> incapability to pause splits in SourceReader/SplitReader" IF (conditional) >> we "allow coarse grained watermark alignment". At the same time we said that >> there is no way to check the capability of SourceReader/SplitReader to >> pause/resume other than observing a UnsupportedOperationException during >> runtime such that we cannot disable the trigger for watermark split >> alignment in the SourceOperator. 
Instead, we can only ignore the >> incapability of SourceReader/SplitReader during execution of a pause/resume >> attempt which, consequently, requires to check the "allow coarse grained >> alignment " parameter value (to implement the conditional semantic). >> However, during this execution we actually don't know whether the attempt >> was executed for the purpose of watermark alignment or for some other >> purpose such that the check actually depends on who triggered the >> pause/resume attempt and hides the exception potentially unexpectedly for >> some other use case. Of course, currently there is no other purpose and, >> hence, no other trigger than watermark alignment. However, this breaks, in >> my perspective, the idea of having pauseOrResumeSplits (re)usable for other >> use cases. >> Minor: I'm not aware of any configuration parameter in the format like >> `allow.*` as you suggested with `allow.coarse.grained.watermark.alignment`. >> Would that still be okay to do? >> >> As we have agreed to not have a "supportsPausableSplits" method because of >> potential inconsistencies between return value of this method and the actual >> implementation (and also the difficulty to have a meaningful return value >>
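The catch-once-and-remember fallback that Becket describes earlier in the thread can be sketched as follows (all names are illustrative Python stand-ins, not actual Flink classes; `NotImplementedError` plays the role of Java's `UnsupportedOperationException`, and the flag corresponds to the proposed coarse-grained-alignment configuration property):

```python
class AlignmentCoordinator:
    """Sketch of the proposed fallback logic in the component that
    coordinates watermark alignment (the SourceOperator in the FLIP):
    try split-level pause once; if the reader does not support it,
    remember that and fall back to pausing the entire reader."""

    def __init__(self, reader, coarse_grained_fallback_enabled=True):
        self.reader = reader
        self.fallback_enabled = coarse_grained_fallback_enabled
        self.split_pause_supported = None  # unknown until the first attempt

    def pause_split(self, split_id):
        if self.split_pause_supported is not False:
            try:
                self.reader.pause_split(split_id)
                self.split_pause_supported = True
                return "split-paused"
            except NotImplementedError:
                # Remember the failure; don't retry on every alignment cycle.
                self.split_pause_supported = False
        if not self.fallback_enabled:
            raise RuntimeError(
                "split-level alignment unsupported and fallback disabled")
        self.reader.pause_all()  # coarse-grained: pause the whole reader
        return "reader-paused"


class LegacyReader:
    """A reader built against the old interface: no per-split pausing."""
    def pause_split(self, split_id):
        raise NotImplementedError
    def pause_all(self):
        pass


coord = AlignmentCoordinator(LegacyReader())
print(coord.pause_split("split-0"))  # falls back to pausing the whole reader
```

Note how the coordinator, not the reader, decides what the fallback means, which matches the argument that the SourceOperator knows why a split was paused.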
[GitHub] [flink-connector-redis] knaufk commented on pull request #2: [FLINK-15571][connector][WIP] Redis Stream connector for Flink
knaufk commented on PR #2: URL: https://github.com/apache/flink-connector-redis/pull/2#issuecomment-1190766247 Thanks @sazzad16 for your contribution and patience. I think the community would really benefit from a Redis Connector and I'll try to help you get this in. The main challenge with the current PR is that it uses the old - about to be deprecated - source and sink APIs of Apache Flink (SourceFunction, SinkFunction). Could you try migrating your implementation to the new APIs? For the Source, https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/sources/ contains a description of the interfaces, and there are already a few examples like [Kafka](https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/KafkaSource.java) or [Pulsar](https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-pulsar/src/main/java/org/apache/flink/connector/pulsar/source/PulsarSource.java). For the Sink, there is [ElasticSearch](https://github.com/apache/flink-connector-elasticsearch/blob/main/flink-connector-elasticsearch-base/src/main/java/org/apache/flink/connector/elasticsearch/sink/ElasticsearchSink.java), which uses the new Sink API, as well as the [FileSink](https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/sink/FileSink.java). However, I would recommend you have a look at whether you can leverage the [Async Sink](https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink), like e.g. the DynamoDB Sink is doing, which is also under development right now. As a prerequisite, you would need to be able to write to Redis asynchronously and tell, based on the resulting Future, whether the request was successful or not (see Public Interfaces in the Async Sink FLIP). From what I know about Redis this should be possible, and it would greatly simplify the implementation of a Sink. 
Lastly, I propose we split this contribution up into at least four separate PRs to get this moving more quickly:
* the Source
* the Sink
* the TableSource
* the TableSink
Just start with what seems most relevant to you. I am sorry about these requests to port the implementation to the new APIs, but building on the old APIs will not be sustainable, and with the new APIs we immediately get all the support for both batch and stream execution in the DataStream and Table API, as well as better chances this connector ties in perfectly with checkpointing and watermarking without much work from your side. Thanks again and looking forward to hearing from you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
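The Async Sink prerequisite mentioned above — write asynchronously and decide from the returned Future whether each request succeeded — can be sketched as follows (the Redis client here is a stub for illustration; a real connector would use an asynchronous Redis client and Flink's Async Sink interfaces):

```python
from concurrent.futures import ThreadPoolExecutor

class StubAsyncRedis:
    """Stand-in for an async Redis client; xadd_async appends an entry
    to a named stream and returns a Future for the write."""
    def __init__(self):
        self.streams = {}
        self._pool = ThreadPoolExecutor(max_workers=2)

    def xadd_async(self, stream, entry):
        def write():
            self.streams.setdefault(stream, []).append(entry)
            return True
        return self._pool.submit(write)  # Future[bool]

def write_batch(client, stream, entries):
    """Issue all writes asynchronously, then classify each one as
    succeeded or failed based on its Future; failed entries would be
    handed back to the Async Sink for retry."""
    futures = [client.xadd_async(stream, e) for e in entries]
    failed = []
    for entry, fut in zip(entries, futures):
        try:
            if not fut.result(timeout=5):
                failed.append(entry)
        except Exception:
            failed.append(entry)
    return failed

client = StubAsyncRedis()
retry = write_batch(client, "events", [{"id": 1}, {"id": 2}])
print(len(retry), len(client.streams["events"]))
```

The ability to map one Future per request back to its entry is exactly what lets the Async Sink base implementation handle batching and retries generically.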
Re: [VOTE] FLIP-252: Amazon DynamoDB Sink Connector
+1. Thanks! Am Mi., 20. Juli 2022 um 16:48 Uhr schrieb Tzu-Li (Gordon) Tai < tzuli...@apache.org>: > +1 > > On Wed, Jul 20, 2022 at 6:13 AM Danny Cranmer > wrote: > > > Hi there, > > > > After the discussion in [1], I’d like to open a voting thread for > FLIP-252 > > [2], which proposes the addition of an Amazon DynamoDB sink based on the > > Async Sink [3]. > > > > The vote will be open until July 23rd earliest (72h), unless there are > any > > binding vetos. > > > > Cheers, Danny > > > > [1] https://lists.apache.org/thread/ssmf2c86n3xyd5qqmcdft22sqn4qw8mw > > [2] > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-252%3A+Amazon+DynamoDB+Sink+Connector > > [3] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink > > > -- https://twitter.com/snntrable https://github.com/knaufk
Re: [VOTE] FLIP-251: Support collecting arbitrary number of streams
+1 (binding) Am Mi., 20. Juli 2022 um 15:48 Uhr schrieb David Anderson < dander...@apache.org>: > +1 > > Thank you Chesnay. > > On Tue, Jul 19, 2022 at 3:09 PM Alexander Fedulov > > wrote: > > > +1 > > Looking forward to using the API to simplify tests setups. > > > > Best, > > Alexander Fedulov > > > > On Tue, Jul 19, 2022 at 2:31 PM Martijn Visser > > > wrote: > > > > > Thanks for creating the FLIP and opening the vote Chesnay. > > > > > > +1 (binding) > > > > > > Op di 19 jul. 2022 om 10:26 schreef Chesnay Schepler < > ches...@apache.org > > >: > > > > > > > I'd like to proceed with the vote for FLIP-251 [1], as no objections > or > > > > issues were raised in [2]. > > > > > > > > The vote will last for at least 72 hours unless there is an objection > > or > > > > insufficient votes. > > > > > > > > [1] > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-251%3A+Support+collecting+arbitrary+number+of+streams > > > > [2] > https://lists.apache.org/thread/ksv71m7rvcwslonw07h2qnw77zpqozvh > > > > > > > > > > > > > > -- https://twitter.com/snntrable https://github.com/knaufk
[DISCUSS] FLIP-246: Multi Cluster Kafka Source
Hi all, We would like to start a discussion thread on FLIP-246: Multi Cluster Kafka Source [1], where we propose to provide a source connector for dynamically reading from multiple Kafka clusters without requiring a Flink job restart. This can greatly improve the Kafka migration experience for clusters and topics, and it solves some existing problems with the current KafkaSource. There has been some interest from users [2] from a meetup and the mailing list. Looking forward to comments and feedback, thanks! [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-246%3A+Multi+Cluster+Kafka+Source [2] https://lists.apache.org/thread/zmpnzx6jjsqc0oldvdm5y2n674xzc3jc Best, Mason
[jira] [Created] (FLINK-28616) Quickstart and examples docs should use configured stable version
Gyula Fora created FLINK-28616: -- Summary: Quickstart and examples docs should use configured stable version Key: FLINK-28616 URL: https://issues.apache.org/jira/browse/FLINK-28616 Project: Flink Issue Type: Improvement Components: Kubernetes Operator Reporter: Gyula Fora The Kubernetes Operator documentation contains several parts that refer to the current stable versions. A good example would be a quickstart which uses helm repo link for the last stable release. We should change the logic so that this always points to the configured stable release (part of config doc). This way we would avoid the need for upgrading this as part of every release. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-28615) Add K8s recommended labels to the deployments, services created by operator
Xin Hao created FLINK-28615: --- Summary: Add K8s recommended labels to the deployments, services created by operator Key: FLINK-28615 URL: https://issues.apache.org/jira/browse/FLINK-28615 Project: Flink Issue Type: Improvement Components: Kubernetes Operator Reporter: Xin Hao I'm using the Flink operator with Argo CD; it would be nice if we could add the K8s recommended labels to the deployments and services. Such as: {code:java} labels: app.kubernetes.io/managed-by: apache-flink-operator app.kubernetes.io/part-of: flink-session-cluster-a {code} With this, users can see all the resources created by a FlinkDeployment. See also: https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/#labels -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [VOTE] Apache Flink Kubernetes Operator Release 1.1.0, release candidate #1
Hi there, Thanks a lot for the release! +1 (non-binding) Successfully verified the following: - Checksums and gpg signatures of the tar files. - No binaries in source release - Build from source, build image from source - Helm Repo works, Helm install works - Submit example applications without errors - Check that flink sql/python examples with flink kubernetes operator work as expected - Check licenses in the docs dir in source code Best, Biao Geng * From: Gyula Fóra Date: Wednesday, July 20, 2022 at 5:47 PM To: dev Subject: [VOTE] Apache Flink Kubernetes Operator Release 1.1.0, release candidate #1 Hi everyone, Please review and vote on the release candidate #1 for the version 1.1.0 of Apache Flink Kubernetes Operator, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) **Release Overview** As an overview, the release consists of the following: a) Kubernetes Operator canonical source distribution (including the Dockerfile), to be deployed to the release repository at dist.apache.org b) Kubernetes Operator Helm Chart to be deployed to the release repository at dist.apache.org c) Maven artifacts to be deployed to the Maven Central Repository d) Docker image to be pushed to dockerhub **Staging Areas to Review** The staging areas containing the above mentioned artifacts are as follows, for your review: * All artifacts for a,b) can be found in the corresponding dev repository at dist.apache.org [1] * All artifacts for c) can be found at the Apache Nexus Repository [2] * The docker image for d) is staged on github [3] All artifacts are signed with the key 0B4A34ADDFFA2BB54EB720B221F06303B87DAFF1 [4] Other links for your review: * JIRA release notes [5] * source code tag "release-1.1.0-rc1" [6] * PR to update the website Downloads page to include Kubernetes Operator links [7] **Vote Duration** The voting time will run for at least 72 hours. 
It is adopted by majority approval, with at least 3 PMC affirmative votes. **Note on Verification** You can follow the basic verification guide here[8]. Note that you don't need to verify everything yourself, but please make note of what you have tested together with your +- vote. Thanks, Gyula Fora [1] https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.1.0-rc1/ [2] https://repository.apache.org/content/repositories/orgapacheflink-1518/ [3] ghcr.io/apache/flink-kubernetes-operator:c9dec3f [4] https://dist.apache.org/repos/dist/release/flink/KEYS [5] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12351723 [6] https://github.com/apache/flink-kubernetes-operator/tree/release-1.1.0-rc1 [7] https://github.com/apache/flink-web/pull/560 [8] https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Kubernetes+Operator+Release
Re: [VOTE] FLIP-252: Amazon DynamoDB Sink Connector
+1 On Wed, Jul 20, 2022 at 6:13 AM Danny Cranmer wrote: > Hi there, > > After the discussion in [1], I’d like to open a voting thread for FLIP-252 > [2], which proposes the addition of an Amazon DynamoDB sink based on the > Async Sink [3]. > > The vote will be open until July 23rd earliest (72h), unless there are any > binding vetos. > > Cheers, Danny > > [1] https://lists.apache.org/thread/ssmf2c86n3xyd5qqmcdft22sqn4qw8mw > [2] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-252%3A+Amazon+DynamoDB+Sink+Connector > [3] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink >
Re: [VOTE] FLIP-251: Support collecting arbitrary number of streams
+1 Thank you Chesnay. On Tue, Jul 19, 2022 at 3:09 PM Alexander Fedulov wrote: > +1 > Looking forward to using the API to simplify tests setups. > > Best, > Alexander Fedulov > > On Tue, Jul 19, 2022 at 2:31 PM Martijn Visser > wrote: > > > Thanks for creating the FLIP and opening the vote Chesnay. > > > > +1 (binding) > > > > Op di 19 jul. 2022 om 10:26 schreef Chesnay Schepler >: > > > > > I'd like to proceed with the vote for FLIP-251 [1], as no objections or > > > issues were raised in [2]. > > > > > > The vote will last for at least 72 hours unless there is an objection > or > > > insufficient votes. > > > > > > [1] > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-251%3A+Support+collecting+arbitrary+number+of+streams > > > [2] https://lists.apache.org/thread/ksv71m7rvcwslonw07h2qnw77zpqozvh > > > > > > > > >
Re: [DISCUSS] FLIP-252: Amazon DynamoDB Sink Connector
Thanks for the feedback and support Robert. I have created a vote thread [1] Danny Cranmer [1] https://lists.apache.org/thread/4qq33hxfjf5ms21pbo4mk2q7xn4s2l0h On Wed, Jul 20, 2022 at 7:19 AM Robert Metzger wrote: > Thanks a lot for this nice proposal! > > DynamoDB seems to be a connector that Flink is still lacking, and with the > Async Sink interface, it seems that we can implement this fairly easily. > > +1 to proceed to the formal vote for this FLIP! > > On Fri, Jul 15, 2022 at 7:51 PM Danny Cranmer > wrote: > > > Hello all, > > > > We would like to start a discussion thread on FLIP-252: Amazon DynamoDB > > Sink Connector [1] where we propose to provide a sink connector for > Amazon > > DynamoDB [2] based on the Async Sink [3]. Looking forward to comments and > > feedback. Thank you. > > > > [1] > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-252%3A+Amazon+DynamoDB+Sink+Connector > > [2] https://aws.amazon.com/dynamodb > > [3] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink > > >
[VOTE] FLIP-252: Amazon DynamoDB Sink Connector
Hi there, After the discussion in [1], I’d like to open a voting thread for FLIP-252 [2], which proposes the addition of an Amazon DynamoDB sink based on the Async Sink [3]. The vote will be open until July 23rd earliest (72h), unless there are any binding vetos. Cheers, Danny [1] https://lists.apache.org/thread/ssmf2c86n3xyd5qqmcdft22sqn4qw8mw [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-252%3A+Amazon+DynamoDB+Sink+Connector [3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink
Re: [ANNOUNCE] Table Store release-0.2 branch cut
Awesome! Looking forward to the release. Best, Shuo On Wed, Jul 20, 2022 at 3:12 PM Leonard Xu wrote: > Thanks Jinsong for the great work, look forward to the release. > > Best, > Leonard > > > 2022年7月20日 下午2:50,Jingsong Li 写道: > > > > Hi Flink devs! > > > > The version on the main branch has been updated to 0.3-SNAPSHOT. > > > > The release-0.2 branch has been forked from main: > > https://github.com/apache/flink-table-store/tree/release-0.2 > > > > Documentation: > > https://nightlies.apache.org/flink/flink-table-store-docs-release-0.2/ > > > > The most important thing about Table Store 0.2 is that it enriches the > > Table Store ecosystem. Now not only Flink 1.15, but also Flink 1.14, > > Hive 2.3, Spark 2.4, Spark 3+, and Trino are all supported. > > > > More features are: Catalog (Including Hive Metastore), Rescale bucket, > > Append-only mode, Improved documentation etc.. > > > > You are very welcome to test it! > > > > We will try to prepare the first RC in the next week. > > > > Best, > > Jingsong > >
[VOTE] Apache Flink Kubernetes Operator Release 1.1.0, release candidate #1
Hi everyone, Please review and vote on the release candidate #1 for the version 1.1.0 of Apache Flink Kubernetes Operator, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) **Release Overview** As an overview, the release consists of the following: a) Kubernetes Operator canonical source distribution (including the Dockerfile), to be deployed to the release repository at dist.apache.org b) Kubernetes Operator Helm Chart to be deployed to the release repository at dist.apache.org c) Maven artifacts to be deployed to the Maven Central Repository d) Docker image to be pushed to dockerhub **Staging Areas to Review** The staging areas containing the above mentioned artifacts are as follows, for your review: * All artifacts for a,b) can be found in the corresponding dev repository at dist.apache.org [1] * All artifacts for c) can be found at the Apache Nexus Repository [2] * The docker image for d) is staged on github [3] All artifacts are signed with the key 0B4A34ADDFFA2BB54EB720B221F06303B87DAFF1 [4] Other links for your review: * JIRA release notes [5] * source code tag "release-1.1.0-rc1" [6] * PR to update the website Downloads page to include Kubernetes Operator links [7] **Vote Duration** The voting time will run for at least 72 hours. It is adopted by majority approval, with at least 3 PMC affirmative votes. **Note on Verification** You can follow the basic verification guide here[8]. Note that you don't need to verify everything yourself, but please make note of what you have tested together with your +- vote. 
Thanks, Gyula Fora [1] https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.1.0-rc1/ [2] https://repository.apache.org/content/repositories/orgapacheflink-1518/ [3] ghcr.io/apache/flink-kubernetes-operator:c9dec3f [4] https://dist.apache.org/repos/dist/release/flink/KEYS [5] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12351723 [6] https://github.com/apache/flink-kubernetes-operator/tree/release-1.1.0-rc1 [7] https://github.com/apache/flink-web/pull/560 [8] https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Kubernetes+Operator+Release
Re: [DISCUSS] FLIP-247 Bulk fetch of table and column statistics for given partitions
I agree with Jingsong. There are use cases to get partitions and partition stats in a single call to reduce the IO cost. For example, extending Catalog#listPartitions to Catalog#listPartitionsWithStats, and extending Catalog#listPartitionsByFilter to Catalog#listPartitionsWithStatsByFilter. This allows the planner to choose the cheapest one to request partitions and stats. But of course, this can be introduced in the future. Besides, I found that the FLIP doesn't cover compatibility properly. Introducing two new methods in Catalog will break all third-party catalog implementations. On the other hand, the planner should have a way to identify whether the Catalog supports bulk get. Best, Jark On Wed, 20 Jul 2022 at 16:43, Jingsong Li wrote: > Thanks for your reply. > > - Consider bulkGetPartitionStatistics, partition statistics are > already in HiveMetastoreClient.listPartitions. But on our side, we > need Catalog.getPartitions first, and then > Catalog.bulkGetPartitionStatistics. > > - Consider bulkGetPartitionColumnStatistics, yes, as you said, we need it. > > But to unify them, the current FLIP is acceptable. > > Best, > Jingsong > > On Wed, Jul 20, 2022 at 3:57 PM Jing Ge wrote: > > > > Hi Jingsong, > > > > Thanks for clarifying it. Are you suggesting a new method or changing the > > name of the methods described in the FLIP? > > Please see my answers and further questions below. > > > > Best regards, > > Jing > > > > On Wed, Jul 20, 2022 at 4:28 AM Jingsong Li > wrote: > > > > > Hi Jing, > > > > > > I understand that the statistics for partitions are currently only > > > used by Hive, so we can look at the Hive implementation: > > > > > > See HiveCatalog.getPartitionStatistics. > > > To get the statistics, we actually get them from the > > > org.apache.hadoop.hive.metastore.api.Partition object. > > > > > > > Correct. Both new methods will do the same thing but in bulk mode. 
> > > > > > > > > > According to HiveMetastore's API, partition-related operations > > > actually get the partition as well as the statistics information. > > > > > > > I am not really sure which API it is. Methods in HiveMetaStoreClient that > > are dealing with partitions will either return partitions or > statisticObjs, > > e.g. listPartitions(...) or getPartitionColumnStatistics(...) > > > > > > > > > > So if the current partition statistics are just for Hive, can we > > > consider unifying it with Hive? > > > > > > > Yes, we are on the same page. This FLIP is trying to unify it with > > HiveMetaStoreClient. > > > > > > > > > > For example, in PushPartitionIntoTableSourceScanRule, just use > > > `listPartitionWithStats`, and adjust table statistics from partitions. > > > > > > > Does the bulkGetPartitionStatistics work in this case too? Or, do you > mean > > you need both partitions and related statistics returned by a new method > > called `listPartitionWithStats`? > > > > > > > Best, > > > Jingsong > > > > > > On Tue, Jul 19, 2022 at 8:44 PM Jing Ge wrote: > > > > > > > > Thanks Jingsong for the suggestion. > > > > > > > > Do you mean using a different naming convention? There is a thought > and > > > > description in the FLIP about using "list" or "bulkGet": > > > > > > > >- bulkGetPartitionStatistics(...) has been chosen over > > > >listPartitionStatistics(...), because, comparing to database and > > > partition > > > >that are static and can be listed, statistics are more dynamic and > > > will > > > >need more computation logic to create, therefore using "get" is > > > >semantically more feasible than list. The "bulk" gives users the > hint > > > that > > > >this method will work in the bulk mode and return a collection of > > > instances. > > > > > > > > > > > > As a reference, we can see that no method in MetaStoreClient, that > > > > calculates statistics, uses the "list" naming convention. 
> > > > > > > > Best regards, > > > > Jing > > > > > > > > On Fri, Jul 15, 2022 at 5:38 AM Jingsong Li > > > wrote: > > > > > > > > > Thanks for starting this discussion. > > > > > > > > > > Have we considered introducing a listPartitionWithStats() in > Catalog? > > > > > > > > > > Best, > > > > > Jingsong > > > > > > > > > > On Fri, Jul 15, 2022 at 10:08 AM Jark Wu wrote: > > > > > > > > > > > > Hi Jing, > > > > > > > > > > > > Thanks for starting this discussion. The bulk fetch is a great > > > > > improvement > > > > > > for the optimizer. > > > > > > The FLIP looks good to me. > > > > > > > > > > > > Best, > > > > > > Jark > > > > > > > > > > > > On Fri, 8 Jul 2022 at 17:36, Jing Ge wrote: > > > > > > > > > > > > > Hi devs, > > > > > > > > > > > > > > After having multiple discussions with Jark and Goldfrey, I'd > like > > > to > > > > > start > > > > > > > a discussion on the mailing list w.r.t. FLIP-247[1], which will > > > > > > > significantly improve the performance by providing the bulk > fetch > > > > > > > capability for table and column statistics. > > > > > > > > > > > > > >
Re: [DISCUSS] FLIP-247 Bulk fetch of table and column statistics for given partitions
Thanks for your reply. - Consider bulkGetPartitionStatistics, partition statistics are already in HiveMetastoreClient.listPartitions. But on our side, we need Catalog.getPartitions first, and then Catalog.bulkGetPartitionStatistics. - Consider bulkGetPartitionColumnStatistics, yes, as you said, we need it. But to unify them, the current FLIP is acceptable. Best, Jingsong On Wed, Jul 20, 2022 at 3:57 PM Jing Ge wrote: > > Hi Jingsong, > > Thanks for clarifying it. Are you suggesting a new method or changing the > name of the methods described in the FLIP? > Please see my answers and further questions below. > > Best regards, > Jing > > On Wed, Jul 20, 2022 at 4:28 AM Jingsong Li wrote: > > > Hi Jing, > > > > I understand that the statistics for partitions are currently only > > used by Hive, so we can look at the Hive implementation: > > > > See HiveCatalog.getPartitionStatistics. > > To get the statistics, we actually get them from the > > org.apache.hadoop.hive.metastore.api.Partition object. > > > > Correct. Both new methods will do the same thing but in bulk mode. > > > > > > According to HiveMetastore's API, partition-related operations > > actually get the partition as well as the statistics information. > > > > I am not really sure which API it is. Methods in HiveMetaStoreClient that > are dealing with partitions will either return partitions or statisticObjs, > e.g. listPartitions(...) or getPartitionColumnStatistics(...) > > > > > > So if the current partition statistics are just for Hive, can we > > consider unifying it with Hive? > > > > Yes, we are on the same page. This FLIP is trying to unify it with > HiveMetaStoreClient. > > > > > > For example, in PushPartitionIntoTableSourceScanRule, just use > > `listPartitionWithStats`, and adjust table statistics from partitions. > > > > Does the bulkGetPartitionStatistics work in this case too? 
Or, do you mean > you need both partitions and related statistics returned by a new method > called `listPartitionWithStats`? > > > > Best, > > Jingsong > > > > On Tue, Jul 19, 2022 at 8:44 PM Jing Ge wrote: > > > > > > Thanks Jingsong for the suggestion. > > > > > > Do you mean using a different naming convention? There is a thought and > > > description in the FLIP about using "list" or "bulkGet": > > > > > >- bulkGetPartitionStatistics(...) has been chosen over > > >listPartitionStatistics(...), because, comparing to database and > > partition > > >that are static and can be listed, statistics are more dynamic and > > will > > >need more computation logic to create, therefore using "get" is > > >semantically more feasible than list. The "bulk" gives users the hint > > that > > >this method will work in the bulk mode and return a collection of > > instances. > > > > > > > > > As a reference, we can see that no method in MetaStoreClient, that > > > calculates statistics, uses the "list" naming convention. > > > > > > Best regards, > > > Jing > > > > > > On Fri, Jul 15, 2022 at 5:38 AM Jingsong Li > > wrote: > > > > > > > Thanks for starting this discussion. > > > > > > > > Have we considered introducing a listPartitionWithStats() in Catalog? > > > > > > > > Best, > > > > Jingsong > > > > > > > > On Fri, Jul 15, 2022 at 10:08 AM Jark Wu wrote: > > > > > > > > > > Hi Jing, > > > > > > > > > > Thanks for starting this discussion. The bulk fetch is a great > > > > improvement > > > > > for the optimizer. > > > > > The FLIP looks good to me. > > > > > > > > > > Best, > > > > > Jark > > > > > > > > > > On Fri, 8 Jul 2022 at 17:36, Jing Ge wrote: > > > > > > > > > > > Hi devs, > > > > > > > > > > > > After having multiple discussions with Jark and Goldfrey, I'd like > > to > > > > start > > > > > > a discussion on the mailing list w.r.t. 
FLIP-247[1], which will > > > > > > significantly improve the performance by providing the bulk fetch > > > > > > capability for table and column statistics. > > > > > > > > > > > > Currently the statistics information about tables can only be > > fetched > > > > from > > > > > > the catalog by each given partition iteratively. Since getting > > > > statistics > > > > > > information from catalogs is a very heavy operation, in order to > > > > improve > > > > > > the query performance, we’d better provide functionality to fetch > > the > > > > > > statistics information of a table for all given partitions in one > > shot. > > > > > > > > > > > > Based on the manual performance test, for 2000 partitions, the cost > > > > will be > > > > > > improved from 10s to 2s. The improvement result is 500%. > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-247%3A+Bulk+fetch+of+table+and+column+statistics+for+given+partitions > > > > > > > > > > > > Best regards, > > > > > > Jing > > > > > > > > > > > >
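The cost difference discussed in this thread comes down to metastore round trips: one call per partition versus one call for all partitions. Below is a minimal, hypothetical sketch of that contrast — the method names follow FLIP-247, but the signatures, types, and the mock metastore are illustrative assumptions, not the actual Flink Catalog API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT the real Flink Catalog API. It only illustrates
// why a single bulk round trip beats N per-partition round trips.
public class BulkFetchSketch {

    // Mock metastore: partition spec -> row-count statistic.
    static final Map<String, Long> STORE = new HashMap<>();
    static {
        STORE.put("dt=2022-07-18", 100L);
        STORE.put("dt=2022-07-19", 200L);
        STORE.put("dt=2022-07-20", 300L);
    }

    // Counts simulated round trips to the (mock) metastore.
    static int roundTrips = 0;

    // Current style: one metastore round trip per partition.
    static long getPartitionStatistics(String partition) {
        roundTrips++;
        return STORE.get(partition);
    }

    // Proposed style (name from FLIP-247): one round trip for all partitions.
    static Map<String, Long> bulkGetPartitionStatistics(List<String> partitions) {
        roundTrips++;
        Map<String, Long> result = new HashMap<>();
        for (String p : partitions) {
            result.put(p, STORE.get(p));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> partitions = new ArrayList<>(STORE.keySet());

        roundTrips = 0;
        for (String p : partitions) {
            getPartitionStatistics(p);
        }
        System.out.println("iterative round trips: " + roundTrips); // one per partition

        roundTrips = 0;
        bulkGetPartitionStatistics(partitions);
        System.out.println("bulk round trips: " + roundTrips); // always 1
    }
}
```

With 2000 partitions the iterative variant pays the metastore latency 2000 times, which is where the reported 10s-to-2s improvement comes from.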
Re: [DISCUSS] Replace Attempt column with Attempt Number on the subtask list page of the Web UI
Thanks for starting this discussion, Gen! I agree it is confusing or even troublesome to show an attempt id that is different from the corresponding attempt number in REST, metrics and logs. It adds a burden on users, who must do the mapping themselves during troubleshooting. Mis-mapping can easily happen and result in wasted effort and wrong conclusions. Therefore, +1 for this proposal. Thanks, Zhu On Wed, Jul 20, 2022 at 15:24, Gen Luo wrote: > > Hi everyone, > > I'd like to propose a change on the Web UI to replace the Attempt column > with an Attempt Number column on the subtask list page. > > From the very beginning, the attempt number shown is calculated at the > frontend by subtask.attempt + 1, which means the attempt number shown on > the web UI is not the same as it is in the runtime, as well as the logs and > the metrics. Users may get confused since they can't find logs or metrics > of the subtask with the same attempt number. > > Fortunately, by now the users don't need to care about the attempt number, > since there can be only one attempt of each subtask. However, the confusion > seems inevitable once the speculative execution[1] or the attempt history > is introduced, since multiple attempts of the same subtask can be executed > or presented at the same time. > > I suggest that the attempt number shown on the web UI should be changed to > align that on the runtime side, which is used in logging and metrics > reporting. To avoid confusion, the column should also be renamed as > "Attempt Number". The changes should only affect the Web UI. No REST API > needs to change. What do you think? > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-168%3A+Speculative+Execution+for+Batch+Job > > Best, > Gen
Re: [DISCUSS] FLIP-247 Bulk fetch of table and column statistics for given partitions
Hi Jingsong, Thanks for clarifying it. Are you suggesting a new method or changing the name of the methods described in the FLIP? Please see my answers and further questions below. Best regards, Jing On Wed, Jul 20, 2022 at 4:28 AM Jingsong Li wrote: > Hi Jing, > > I understand that the statistics for partitions are currently only > used by Hive, so we can look at the Hive implementation: > > See HiveCatalog.getPartitionStatistics. > To get the statistics, we actually get them from the > org.apache.hadoop.hive.metastore.api.Partition object. > Correct. Both new methods will do the same thing but in bulk mode. > > According to HiveMetastore's API, partition-related operations > actually get the partition as well as the statistics information. > I am not really sure which API it is. Methods in HiveMetaStoreClient that are dealing with partitions will either return partitions or statisticObjs, e.g. listPartitions(...) or getPartitionColumnStatistics(...) > > So if the current partition statistics are just for Hive, can we > consider unifying it with Hive? > Yes, we are on the same page. This FLIP is trying to unify it with HiveMetaStoreClient. > > For example, in PushPartitionIntoTableSourceScanRule, just use > `listPartitionWithStats`, and adjust table statistics from partitions. > Does the bulkGetPartitionStatistics work in this case too? Or, do you mean you need both partitions and related statistics returned by a new method called `listPartitionWithStats`? > Best, > Jingsong > > On Tue, Jul 19, 2022 at 8:44 PM Jing Ge wrote: > > > > Thanks Jingsong for the suggestion. > > > > Do you mean using a different naming convention? There is a thought and > > description in the FLIP about using "list" or "bulkGet": > > > >- bulkGetPartitionStatistics(...) 
has been chosen over > >listPartitionStatistics(...), because, comparing to database and > partition > >that are static and can be listed, statistics are more dynamic and > will > >need more computation logic to create, therefore using "get" is > >semantically more feasible than list. The "bulk" gives users the hint > that > >this method will work in the bulk mode and return a collection of > instances. > > > > > > As a reference, we can see that no method in MetaStoreClient, that > > calculates statistics, uses the "list" naming convention. > > > > Best regards, > > Jing > > > > On Fri, Jul 15, 2022 at 5:38 AM Jingsong Li > wrote: > > > > > Thanks for starting this discussion. > > > > > > Have we considered introducing a listPartitionWithStats() in Catalog? > > > > > > Best, > > > Jingsong > > > > > > On Fri, Jul 15, 2022 at 10:08 AM Jark Wu wrote: > > > > > > > > Hi Jing, > > > > > > > > Thanks for starting this discussion. The bulk fetch is a great > > > improvement > > > > for the optimizer. > > > > The FLIP looks good to me. > > > > > > > > Best, > > > > Jark > > > > > > > > On Fri, 8 Jul 2022 at 17:36, Jing Ge wrote: > > > > > > > > > Hi devs, > > > > > > > > > > After having multiple discussions with Jark and Goldfrey, I'd like > to > > > start > > > > > a discussion on the mailing list w.r.t. FLIP-247[1], which will > > > > > significantly improve the performance by providing the bulk fetch > > > > > capability for table and column statistics. > > > > > > > > > > Currently the statistics information about tables can only be > fetched > > > from > > > > > the catalog by each given partition iteratively. Since getting > > > statistics > > > > > information from catalogs is a very heavy operation, in order to > > > improve > > > > > the query performance, we’d better provide functionality to fetch > the > > > > > statistics information of a table for all given partitions in one > shot. 
> > > > > > > > > > Based on the manual performance test, for 2000 partitions, the cost > > > will be > > > > > improved from 10s to 2s. The improvement result is 500%. > > > > > > > > > > [1] > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-247%3A+Bulk+fetch+of+table+and+column+statistics+for+given+partitions > > > > > > > > > > Best regards, > > > > > Jing > > > > > > > > >
[DISCUSS] Replace Attempt column with Attempt Number on the subtask list page of the Web UI
Hi everyone, I'd like to propose a change on the Web UI to replace the Attempt column with an Attempt Number column on the subtask list page. From the very beginning, the attempt number shown has been calculated at the frontend by subtask.attempt + 1, which means the attempt number shown on the web UI is not the same as it is in the runtime, as well as in the logs and the metrics. Users may get confused since they can't find logs or metrics of the subtask with the same attempt number. Fortunately, for now users don't need to care about the attempt number, since there can be only one attempt of each subtask. However, the confusion seems inevitable once speculative execution[1] or the attempt history is introduced, since multiple attempts of the same subtask can be executed or presented at the same time. I suggest that the attempt number shown on the web UI should be changed to align with that on the runtime side, which is used in logging and metrics reporting. To avoid confusion, the column should also be renamed to "Attempt Number". The changes should only affect the Web UI. No REST API needs to change. What do you think? [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-168%3A+Speculative+Execution+for+Batch+Job Best, Gen
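To make the off-by-one concrete, here is a tiny, hypothetical sketch of the mismatch described above — the class and method names are illustrative assumptions, not actual Flink Web UI or runtime code:

```java
// Hypothetical illustration of the attempt-number mismatch: the runtime,
// logs, and metrics use the raw attempt number, while the old Web UI
// column displayed attempt + 1.
public class AttemptNumberSketch {

    // What the Web UI used to display for a given runtime attempt number.
    static int legacyUiAttempt(int attempt) {
        return attempt + 1;
    }

    // Proposed: show the runtime attempt number unchanged, so the UI
    // matches logs and metrics.
    static int proposedUiAttemptNumber(int attempt) {
        return attempt;
    }

    public static void main(String[] args) {
        int runtimeAttempt = 0; // first execution attempt of a subtask
        System.out.println("runtime/logs/metrics: " + runtimeAttempt);
        System.out.println("old Web UI column:    " + legacyUiAttempt(runtimeAttempt));
        System.out.println("proposed UI column:   " + proposedUiAttemptNumber(runtimeAttempt));
    }
}
```

With speculative execution, several attempts of one subtask can be alive at once, so a user searching logs for "attempt 1" after reading the old UI would actually be looking at the wrong attempt.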
Re: [ANNOUNCE] Table Store release-0.2 branch cut
Thanks Jingsong for the great work, looking forward to the release. Best, Leonard > On Jul 20, 2022, at 2:50 PM, Jingsong Li wrote: > > Hi Flink devs! > > The version on the main branch has been updated to 0.3-SNAPSHOT. > > The release-0.2 branch has been forked from main: > https://github.com/apache/flink-table-store/tree/release-0.2 > > Documentation: > https://nightlies.apache.org/flink/flink-table-store-docs-release-0.2/ > > The most important thing about Table Store 0.2 is that it enriches the > Table Store ecosystem. Now not only Flink 1.15, but also Flink 1.14, > Hive 2.3, Spark 2.4, Spark 3+, and Trino are all supported. > > More features are: Catalog (Including Hive Metastore), Rescale bucket, > Append-only mode, Improved documentation etc.. > > You are very welcome to test it! > > We will try to prepare the first RC in the next week. > > Best, > Jingsong
[ANNOUNCE] Table Store release-0.2 branch cut
Hi Flink devs! The version on the main branch has been updated to 0.3-SNAPSHOT. The release-0.2 branch has been forked from main: https://github.com/apache/flink-table-store/tree/release-0.2 Documentation: https://nightlies.apache.org/flink/flink-table-store-docs-release-0.2/ The most important thing about Table Store 0.2 is that it enriches the Table Store ecosystem. Now not only Flink 1.15, but also Flink 1.14, Hive 2.3, Spark 2.4, Spark 3+, and Trino are all supported. Further features include: Catalog (including Hive Metastore), rescale bucket, append-only mode, improved documentation, etc. You are very welcome to test it! We will try to prepare the first RC next week. Best, Jingsong
Re: [VOTE] FLIP-238: Introduce FLIP-27-based Data Generator Source
+1 (non-binding) Thanks for driving this! Best regards, Jing On Wed, Jul 20, 2022 at 7:48 AM Robert Metzger wrote: > +1 > > On Wed, Jul 20, 2022 at 4:41 AM Rui Fan <1996fan...@gmail.com> wrote: > > > +1 (non-binding) > > > > New Source can better support some features, such as > > Unaligned Checkpoint, Watermark alignment, etc. > > The data generator based on the new Source is very helpful > > for daily testing. > > > > Very much looking forward to using it. > > > > Best wishes > > Rui Fan > > > > On Wed, Jul 20, 2022 at 4:22 AM Martijn Visser > > > wrote: > > > +1 (binding) > > > > > > Thanks for the efforts Alex! > > > > > > On Tue, Jul 19, 2022 at 21:31, Alexander Fedulov < > > > alexan...@ververica.com> wrote: > > > > Hi everyone, > > > > following the discussion in [1], I would like to open up a vote for > > > > adding a FLIP-27-based Data Generator Source [2]. > > > > The addition of this source also unblocks the currently pending > > > > efforts for deprecating the Source Function API [3]. > > > > The poll will be open until July 25 (72h + weekend), unless there is > > > > an objection or not enough votes. > > > > [1] https://lists.apache.org/thread/7gjxto1rmkpff4kl54j8nlg5db2rqhkt > > > > [2] https://cwiki.apache.org/confluence/x/9Av1D > > > > [3] > https://github.com/apache/flink/pull/20049#issuecomment-1170948767 > > > > Best, > > > > Alexander Fedulov
Re: [DISCUSS] FLIP-252: Amazon DynamoDB Sink Connector
Thanks a lot for this nice proposal! DynamoDB seems to be a connector that Flink is still lacking, and with the Async Sink interface, it seems that we can implement this fairly easily. +1 to proceed to the formal vote for this FLIP! On Fri, Jul 15, 2022 at 7:51 PM Danny Cranmer wrote: > Hello all, > > We would like to start a discussion thread on FLIP-252: Amazon DynamoDB > Sink Connector [1] where we propose to provide a sink connector for Amazon > DynamoDB [2] based on the Async Sink [3]. Looking forward to comments and > feedback. Thank you. > > [1] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-252%3A+Amazon+DynamoDB+Sink+Connector > [2] https://aws.amazon.com/dynamodb > [3] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink >
Re: Unsubscribe
To unsubscribe, you can send any email to dev-unsubscr...@flink.apache.org Best regards, Yuxia ----- Original Message ----- From: "cason0126" To: "dev" Sent: Wednesday, Jul 20, 2022, 11:49:33 AM Subject: Unsubscribe Unsubscribe cason0126 <cason0...@163.com>