Re: [VOTE] Deprecate Per-Job Mode in Flink 1.15

2022-02-02 Thread Xintong Song
Thanks for the clarification, Konstantin.

+1 for deprecating per-job mode in Flink 1.15, and reevaluating when to
drop it after Flink 1.16.

Thank you~

Xintong Song



On Tue, Feb 1, 2022 at 5:27 PM Konstantin Knauf  wrote:

> Hi Xintong, Hi Yang, Hi everyone,
>
> Thank you for speaking up. The vote is formally only about the deprecation
> in Flink 1.15.
>
> We can and should continue to collect blockers for the deletion of per-job
> mode on YARN. Then there should be one release that allows users to switch.
> So, Flink 1.16 indeed is unrealistic for dropping, as we would need to
> address all Blockers still in Flink 1.15.
>
> I think a certain degree of urgency helps us to address these issues and
> encourages users to switch to application mode. So, I would continue to
> target Flink 1.17 for dropping per-job mode, but let's reevaluate after
> Flink 1.16.
>
> Hope this helps,
>
> Konstantin
>
> On Sun, Jan 30, 2022 at 4:13 AM Yang Wang  wrote:
>
> > Hi all,
> >
> >
> >
> > I second Xintong’s comments to not drop the per-job mode too
> aggressively.
> > And I am afraid
> >
> > we need to get more input from users after deprecating the per-job mode
> in
> > release-1.15.
> >
> >
> > Most Flink on YARN users are using CLI command to integrate with the job
> > lifecycle management system.
> >
> > And they are still using the old compatibility mode "flink run -m
> > yarn-cluster", not the generic CLI mode "--target
> > yarn-per-job/yarn-application".
> >
> > Apart from the functionalities, they need some time to upgrade the
> external
> > systems.
> >
> >
> > BTW, the application mode does not support attached mode now. Some users
> > have asked for this in FLINK-25495[1].
> >
> >
> > [1]. https://issues.apache.org/jira/browse/FLINK-25495
> >
> >
> >
> >
> > Best,
> >
> > Yang
> >
> > Xintong Song  于2022年1月30日周日 08:35写道:
> >
> > > Hi Konstantin,
> > >
> > > Could we be more specific about what this vote is for? I'm asking
> > because I
> > > don't think we have consensus on all you have mentioned.
> > >
> > > To be specific, I'd be +1 for deprecating per-job mode in 1.15.
> However,
> > > I'm not sure about the following.
> > > - Targeting to drop it in 1.16 or 1.17. TBH, I'd expect to stay
> > compatible
> > > on the per-job mode a bit longer.
> > > - Targeting Yarn application mode on par with the standalone / K8s. I
> > think
> > > we need the Yarn application mode on par with the Yarn per-job mode, as
> > the
> > > latter is being dropped and users are migrating from.
> > > - FLINK-24897 being the only blocker for dropping the per-job mode. I
> > think
> > > a good time to drop the per-job mode is probably when we know most
> users
> > > have migrated to the application mode. Even if the Yarn application
> mode
> > > provides equivalent functionality as the Yarn per-job mode does, it's
> > > probably nicer to not force users to migrate if the per-job mode is
> still
> > > widely used.
> > >
> > > Discussing the above items is not my purpose here. Just trying to say
> > that
> > > IMHO in the previous discussion [1] we have not reached consensus on
> all
> > > the things mentioned in this voting thread. Consequently, if these are
> > all
> > > included in the scope of the vote, I'm afraid I cannot give my +1 on
> > this.
> > > Sorry if I'm nitpicking.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > > [1] https://lists.apache.org/thread/b8g76cqgtr2c515rd1bs41vy285f317n
> > >
> > > On Sat, Jan 29, 2022 at 2:27 PM Jing Zhang 
> wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > Thanks Konstantin for driving this.
> > > >
> > > > Best,
> > > > Jing Zhang
> > > >
> > > > Chenya Zhang  于2022年1月29日周六 07:04写道:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > On Fri, Jan 28, 2022 at 12:46 PM Thomas Weise 
> > wrote:
> > > > >
> > > > > > +1 (binding)
> > > > > >
> > > > > > On Fri, Jan 28, 2022 at 9:27 AM David Morávek 
> > > wrote:
> > > > > >
> > > > > > > +1 (non-binding)
> > > > > > >
> > > > > > > D.
> > > > > > >
> > > > > > > On Fri 28. 1. 2022 at 17:53, Till Rohrmann <
> trohrm...@apache.org
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > +1 (binding)
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Till
> > > > > > > >
> > > > > > > > On Fri, Jan 28, 2022 at 4:57 PM Gabor Somogyi <
> > > > > > gabor.g.somo...@gmail.com
> > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > +1 (non-binding)
> > > > > > > > >
> > > > > > > > > We intend to run tests when FLINK-24897 is fixed.
> > > > > > > > > In case of further issues we're going to create further
> > jiras.
> > > > > > > > >
> > > > > > > > > BR,
> > > > > > > > > G
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Jan 28, 2022 at 4:30 PM Konstantin Knauf <
> > > > > kna...@apache.org>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi everyone,
> > > > > > > > > >

[jira] [Created] (FLINK-25936) MutableTypeCell is not setting correctly the value's type_name

2022-02-02 Thread Igal Shilman (Jira)
Igal Shilman created FLINK-25936:


 Summary: MutableTypeCell is not setting correctly the value's 
type_name
 Key: FLINK-25936
 URL: https://issues.apache.org/jira/browse/FLINK-25936
 Project: Flink
  Issue Type: Improvement
Reporter: Igal Shilman


In the remote Java SDK, when a type doesn't specify the IMMUTABLE_VALUE type 
characteristic (which is the case for custom type implementations by default), 
the type_name field is not set correctly on the resulting TypedValue.

MutableTypeCell incorrectly assumed that the backend would send a TypedValue 
with the type_name field set even when the value is missing 
(TypedValue.has_value = false). This is not the case, so the type_name needs 
to be set explicitly.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: Statefun async http request via RequestReplyFunctionBuilder

2022-02-02 Thread Galen Warren
Ah, nevermind. I was misunderstanding how maxRequestDuration related to the
retries.

Locally, I had Flink set not to retry on failure, so once
maxRequestDuration expired without a successful result, Flink itself was
stopping, and I wasn't seeing the function get called again. But I see now
that, if Flink is set to retry on failure, then it will start from the last
checkpoint and retry, which is what I wanted to happen.

Sorry for the confusion.

On Wed, Feb 2, 2022 at 2:55 PM Galen Warren  wrote:

> Gotcha, thanks. I may be able to work on that one in a couple weeks if
> you're looking for help.
>
> Unrelated question -- another thing that would be useful for me would be
> the ability to set a maximum backoff interval in BoundedExponentialBackoff
> or the async equivalent. My situation is this. I'd like to set a long
> maxRequestDuration, so it takes a good while for Statefun to "give up" on a
> function call, i.e. perhaps several hours or even days. During that time,
> if the backoff interval doubles on each failure, those backoff intervals
> get pretty long.
>
> Sometimes, in exponential backoff implementations, I've seen the concept
> of a max backoff interval, i.e. once the backoff interval reaches that
> point, it won't go any higher. So I could set it to, say, 60 seconds, and
> no matter how long it would retry the function, the interval between
> retries wouldn't be more than that.
>
> Do you think that would be a useful addition? I could post something to
> the dev list if you want.
>
> On Wed, Feb 2, 2022 at 2:39 PM Igal Shilman  wrote:
>
>> Hi Galen,
>> You are right, it is not possible, but there is no real reason for that.
>> We should fix this, and I've created the following JIRA issue [1]
>>
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-25933
>>
>> On Wed, Feb 2, 2022 at 6:30 PM Galen Warren 
>> wrote:
>>
>> > Is it possible to choose the async HTTP transport using
>> > RequestReplyFunctionBuilder? It looks to me that it is not, but I
>> wanted to
>> > double check. Thanks.
>> >
>>
>


Re: Statefun async http request via RequestReplyFunctionBuilder

2022-02-02 Thread Igal Shilman
Great, ping me when you would like to pick this up.

For the related issue, I think that can be a good addition indeed!

On Wed, Feb 2, 2022 at 8:55 PM Galen Warren  wrote:

> Gotcha, thanks. I may be able to work on that one in a couple weeks if
> you're looking for help.
>
> Unrelated question -- another thing that would be useful for me would be
> the ability to set a maximum backoff interval in BoundedExponentialBackoff
> or the async equivalent. My situation is this. I'd like to set a long
> maxRequestDuration, so it takes a good while for Statefun to "give up" on a
> function call, i.e. perhaps several hours or even days. During that time,
> if the backoff interval doubles on each failure, those backoff intervals
> get pretty long.
>
> Sometimes, in exponential backoff implementations, I've seen the concept of
> a max backoff interval, i.e. once the backoff interval reaches that point,
> it won't go any higher. So I could set it to, say, 60 seconds, and no
> matter how long it would retry the function, the interval between retries
> wouldn't be more than that.
>
> Do you think that would be a useful addition? I could post something to the
> dev list if you want.
>
> On Wed, Feb 2, 2022 at 2:39 PM Igal Shilman  wrote:
>
> > Hi Galen,
> > You are right, it is not possible, but there is no real reason for that.
> > We should fix this, and I've created the following JIRA issue [1]
> >
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-25933
> >
> > On Wed, Feb 2, 2022 at 6:30 PM Galen Warren 
> > wrote:
> >
> > > Is it possible to choose the async HTTP transport using
> > > RequestReplyFunctionBuilder? It looks to me that it is not, but I
> wanted
> > to
> > > double check. Thanks.
> > >
> >
>


[jira] [Created] (FLINK-25935) Add a harness-based entry point to simplify getting started.

2022-02-02 Thread Igal Shilman (Jira)
Igal Shilman created FLINK-25935:


 Summary: Add a harness-based entry point to simplify getting started.
 Key: FLINK-25935
 URL: https://issues.apache.org/jira/browse/FLINK-25935
 Project: Flink
  Issue Type: Improvement
  Components: Stateful Functions
Reporter: Igal Shilman


It would be nice to improve the getting-started experience by providing an 
additional entry point in the StateFun distribution that is built on the 
Harness.

This can be as simple as providing a Main function that configures RocksDB and 
starts the StateFun Flink job.

The rest of the configuration needs to come from the module.yaml.

 

Having something like that will allow us to simplify the playground even further 
by reducing the start time and the memory requirements for a docker-compose 
based example.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25934) Modernize statefun playground examples

2022-02-02 Thread Igal Shilman (Jira)
Igal Shilman created FLINK-25934:


 Summary: Modernize statefun playground examples
 Key: FLINK-25934
 URL: https://issues.apache.org/jira/browse/FLINK-25934
 Project: Flink
  Issue Type: Improvement
  Components: Stateful Functions
Reporter: Igal Shilman


It is about time to touch up the examples in the playground a bit.

Most of the docker-compose/docker files are pretty old and there is a lot of 
room for improvement.
 # Use Redpanda instead of Kafka + ZooKeeper - from local experiments it seems 
to cut the start time and the memory requirements significantly. In addition, it 
also comes with a REST proxy, which can improve the interactivity of the 
examples quite a lot.
 # For the Java examples, there is no reason to use Java 8 for the remote 
functions. We can use at least Java 11, if not higher.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: Statefun async http request via RequestReplyFunctionBuilder

2022-02-02 Thread Galen Warren
Gotcha, thanks. I may be able to work on that one in a couple weeks if
you're looking for help.

Unrelated question -- another thing that would be useful for me would be
the ability to set a maximum backoff interval in BoundedExponentialBackoff
or the async equivalent. My situation is this. I'd like to set a long
maxRequestDuration, so it takes a good while for Statefun to "give up" on a
function call, i.e. perhaps several hours or even days. During that time,
if the backoff interval doubles on each failure, those backoff intervals
get pretty long.

Sometimes, in exponential backoff implementations, I've seen the concept of
a max backoff interval, i.e. once the backoff interval reaches that point,
it won't go any higher. So I could set it to, say, 60 seconds, and no
matter how long it would retry the function, the interval between retries
wouldn't be more than that.
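The proposed cap could look like the following sketch. This is hypothetical, not StateFun's actual BoundedExponentialBackoff implementation:

```python
def capped_backoff_intervals(initial_ms: int, max_ms: int, attempts: int):
    """Yield exponentially growing backoff intervals, capped at max_ms.

    Once the doubled interval would exceed max_ms, every subsequent
    retry waits max_ms instead of continuing to grow.
    """
    interval = initial_ms
    for _ in range(attempts):
        yield interval
        interval = min(interval * 2, max_ms)
```

For example, with a 1-second initial interval and a 60-second cap, the intervals double up to the cap and then stay flat, so even a multi-day maxRequestDuration never produces hour-long gaps between retries.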

Do you think that would be a useful addition? I could post something to the
dev list if you want.

On Wed, Feb 2, 2022 at 2:39 PM Igal Shilman  wrote:

> Hi Galen,
> You are right, it is not possible, but there is no real reason for that.
> We should fix this, and I've created the following JIRA issue [1]
>
>
> [1] https://issues.apache.org/jira/browse/FLINK-25933
>
> On Wed, Feb 2, 2022 at 6:30 PM Galen Warren 
> wrote:
>
> > Is it possible to choose the async HTTP transport using
> > RequestReplyFunctionBuilder? It looks to me that it is not, but I wanted
> to
> > double check. Thanks.
> >
>


Re: Statefun async http request via RequestReplyFunctionBuilder

2022-02-02 Thread Igal Shilman
Hi Galen,
You are right, it is not possible, but there is no real reason for that.
We should fix this, and I've created the following JIRA issue [1]


[1] https://issues.apache.org/jira/browse/FLINK-25933

On Wed, Feb 2, 2022 at 6:30 PM Galen Warren  wrote:

> Is it possible to choose the async HTTP transport using
> RequestReplyFunctionBuilder? It looks to me that it is not, but I wanted to
> double check. Thanks.
>


[jira] [Created] (FLINK-25933) Allow configuring different transports in RequestReplyFunctionBuilder

2022-02-02 Thread Igal Shilman (Jira)
Igal Shilman created FLINK-25933:


 Summary: Allow configuring different transports in 
RequestReplyFunctionBuilder
 Key: FLINK-25933
 URL: https://issues.apache.org/jira/browse/FLINK-25933
 Project: Flink
  Issue Type: Improvement
  Components: Stateful Functions
Reporter: Igal Shilman


Currently it is not possible to configure the type of the transport used while 
using the data stream integration.

It would be useful to do so.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: Need help with finding inner workings of watermark stream idleness

2022-02-02 Thread Jeff Carter
Thanks, Seth. Yea this looks perfect.

I had a feeling I'd need to get deep into things, and no time like the
present haha.

May ask for more guidance with those inner workings to get a bit of a road
map. But that gets into the feature idea and beyond the scope of this
thread's original question so I'll just do that in a jira ticket in a bit.
Just wanted this so I could structure the ticket and plan of attack better.

Thanks!!

On Tue, Feb 1, 2022, 2:03 PM Seth Wiesman  wrote:

> Hi Jeff,
>
> I think the class you're looking for is StatusWatermarkValve. Note that
> this is fairly deep into the runtime stack.
>
> Seth
>
> On Tue, Feb 1, 2022 at 2:34 PM Jeff Carter  wrote:
>
> > Thanks, Till.
> >
> > That definitely helps a bit. I'm still not seeing where there is some
> idle
> > variable that the output.markIdle is setting to true (or whatever it
> sets).
> > Like the ideal thing would be if there is just some "output.isIdle()"
> that
> > could be called to know if the stream is or isnt idle. Since that doesn't
> > exist, what is the variable in "output" that dictates if it is idle or
> not
> > that that I'd just have to make an isIdle() method to make its state
> > visible to other code.
> >
> > I see the checkIfIdle() method in the code (in at least the testing
> piece)
> > you pointed out, but that seems like it's just a way to set a timer and
> > check if the idle state should be set or not. But I don't know if that's
> > setting some isIdle variable or if it's just checked and calculated
> > every time and that method is basically the variable I'm looking for. But
> > that might just be my confusion.
> >
> > On Tue, Jan 11, 2022, 11:05 AM Till Rohrmann 
> wrote:
> >
> > > Hi Jeff,
> > >
> > > I think this happens in the WatermarksWithIdleness [1].
> > >
> > > [1]
> > >
> > >
> >
> https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/eventtime/WatermarksWithIdleness.java#L73
> > >
> > > Cheers,
> > > Till
> > >
> > > On Tue, Jan 11, 2022 at 6:05 PM Jeff Carter 
> > wrote:
> > >
> > > > I'm looking into making a feature for flink related to watermarks and
> > am
> > > > digging into the inner watermark mechanisms, specifically with
> > idleness.
> > > > I'm familiar with idleness, but digging into the root code I can only
> > get
> > > > to where idlenessTimeout gets set in
> > WatermarkStrategyWithIdleness.java.
> > > >
> > > >  But what I'm looking for the pieces beyond that. If I set the
> idleness
> > > to
> > > > 500 milliseconds, where in the code does it actually go "I haven't
> > seen a
> > > > message in 500 milliseconds. I'm setting this stream to idle."?
> > > >
> > > > The reason being that what I'm thinking of would need to be able to
> see
> > > if
> > > > any streams are marked idle, and if so react accordingly.
> > > >
> > > > Thanks for any help in advance.
> > > >
> > >
> >
>
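The idleness mechanism discussed in this thread - a stream is flagged idle once no record has arrived within the configured timeout - can be modeled roughly as follows. This is a hypothetical sketch, not Flink's actual WatermarksWithIdleness or StatusWatermarkValve code:

```python
class IdlenessDetector:
    """Toy model of per-source idleness: if no record arrives within
    the timeout, the stream is flagged idle so downstream watermark
    calculation can ignore it until it becomes active again.
    """

    def __init__(self, timeout_ms: int):
        self.timeout_ms = timeout_ms
        self.last_event_ts = None
        self.idle = False

    def on_event(self, now_ms: int) -> None:
        # any record immediately marks the stream active again
        self.last_event_ts = now_ms
        self.idle = False

    def on_periodic_check(self, now_ms: int) -> bool:
        # called by a timer; flips the idle flag once the timeout elapses
        if (self.last_event_ts is not None
                and now_ms - self.last_event_ts >= self.timeout_ms):
            self.idle = True
        return self.idle
```

In this model, the `idle` flag is exactly the "isIdle()" state Jeff is asking about: it is derived from the last-seen-record timestamp on every periodic check rather than stored by the output itself.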


Statefun async http request via RequestReplyFunctionBuilder

2022-02-02 Thread Galen Warren
Is it possible to choose the async HTTP transport using
RequestReplyFunctionBuilder? It looks to me that it is not, but I wanted to
double check. Thanks.


[jira] [Created] (FLINK-25932) Introduce ExecNodeContext.generateUid()

2022-02-02 Thread Timo Walther (Jira)
Timo Walther created FLINK-25932:


 Summary: Introduce ExecNodeContext.generateUid()
 Key: FLINK-25932
 URL: https://issues.apache.org/jira/browse/FLINK-25932
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Timo Walther


FLINK-25387 introduced an {{ExecNodeContext}} which contains all the information 
needed to generate unique and deterministic identifiers for all created 
{{Transformation}}s.

This issue includes:
- Add {{ExecNodeContext.generateUid(operatorName: String): String}}
- Go through all ExecNodes and give transformations a uid. The name can be 
constant within the ExecNode such that both annotation and method can use it.
- We give all transformations a uid, including stateless ones.
- The final UID should look like: {{13_stream-exec-sink_1_upsert-materializer}}
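A minimal sketch of the proposed generateUid, with the field order inferred from the example UID above (the exact composition is an assumption, not the final design):

```python
def generate_uid(node_id: int, node_name: str, node_version: int,
                 operator_name: str) -> str:
    """Combine the ExecNode's id, name, and version with a
    per-transformation operator name into a deterministic uid,
    matching the pattern of '13_stream-exec-sink_1_upsert-materializer'.
    """
    return f"{node_id}_{node_name}_{node_version}_{operator_name}"
```

Because all inputs come from the plan (node id, exec-node name and version, constant operator name), the resulting uid is stable across plan regenerations, which is what makes savepoint-compatible operator matching possible.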



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25931) Add projection pushdown support for CsvFormatFactory

2022-02-02 Thread Alexander Fedulov (Jira)
Alexander Fedulov created FLINK-25931:
-

 Summary: Add projection pushdown support for CsvFormatFactory
 Key: FLINK-25931
 URL: https://issues.apache.org/jira/browse/FLINK-25931
 Project: Flink
  Issue Type: Sub-task
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Reporter: Alexander Fedulov


FLINK-24703 added support for projection pushdown in 
{_}CsvFileFormatFactory{_}. The same functionality should be added for 
non-file-based connectors based on _CsvFormatFactory_ and on 
_Ser/DeSerialization_ schemas.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] FLIP-212: Introduce Flink Kubernetes Operator

2022-02-02 Thread Gyula Fóra
Hi Danny!

Thanks for the feedback :)

Versioning:
Versioning will be independent from Flink and the operator will depend on a
fixed flink version (in every given operator version).
This should be the exact same setup as with Stateful Functions (
https://github.com/apache/flink-statefun). So independent release cycle but
still within the Flink umbrella.

Deployment error handling:
I think that's a very good point, as general exception handling for the
different failure scenarios is a tricky problem. I think the exception
classifiers and retry strategies could avoid a lot of manual intervention
from the user. We will definitely need to add something like this. Once we
have the repo created with the initial operator code we should open some
tickets for this and put it on the short term roadmap!

Cheers,
Gyula
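The exception-classifier and retry-strategy idea might be sketched like this. All names and categories here are illustrative, not part of the actual operator design:

```python
def classify(exc: Exception) -> str:
    """Hypothetical classifier: decide whether a job-submission
    failure is transient (worth retrying) or permanent."""
    transient = (TimeoutError, ConnectionError)
    return "retry" if isinstance(exc, transient) else "fail"


def submit_with_retries(submit, max_attempts: int):
    """Retry job submission only for failures classified as transient;
    permanent failures surface immediately for manual intervention."""
    for attempt in range(1, max_attempts + 1):
        try:
            return submit()
        except Exception as exc:
            if classify(exc) == "fail" or attempt == max_attempts:
                raise
```

In the operator, the classifier rules and the retry budget would presumably come from operator configuration rather than being hard-coded as above.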

On Wed, Feb 2, 2022 at 4:50 PM Danny Cranmer 
wrote:

> Hey team,
>
> Great work on the FLIP, I am looking forward to this one. I agree that we
> can move forward to the voting stage.
>
> I have general feedback around how we will handle job submission failure
> and retry. As discussed in the Rejected Alternatives section, we can use
> Java to handle job submission failures from the Flink client. It would be
> useful to have the ability to configure exception classifiers and retry
> strategy as part of operator configuration.
>
> Given this will be in a separate GitHub repository I am curious how the
> versioning strategy will work in relation to the Flink version? Do we have
> any other components with a similar setup I can look at? Will the operator
> version track Flink or will it use its own versioning strategy with a Flink
> version support matrix, or similar?
>
> Thanks,
>
>
>
> On Tue, Feb 1, 2022 at 2:33 PM Márton Balassi 
> wrote:
>
> > Hi team,
> >
> > Thank you for the great feedback, Thomas has updated the FLIP page
> > accordingly. If you are comfortable with the currently existing design
> and
> > depth in the FLIP [1] I suggest moving forward to the voting stage - once
> > that reaches a positive conclusion it lets us create the separate code
> > repository under the flink project for the operator.
> >
> > I encourage everyone to keep improving the details in the meantime,
> however
> > I believe given the existing design and the general sentiment on this
> > thread that the most efficient path from here is starting the
> > implementation so that we can collectively iterate over it.
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-212%3A+Introduce+Flink+Kubernetes+Operator
> >
> > On Mon, Jan 31, 2022 at 10:15 PM Thomas Weise  wrote:
> >
> > > HI Xintong,
> > >
> > > Thanks for the feedback and please see responses below -->
> > >
> > > On Fri, Jan 28, 2022 at 12:21 AM Xintong Song 
> > > wrote:
> > >
> > > > Thanks Thomas for drafting this FLIP, and everyone for the
> discussion.
> > > >
> > > > I also have a few questions and comments.
> > > >
> > > > ## Job Submission
> > > > Deploying a Flink session cluster via kubectl & CR and then
> submitting
> > > jobs
> > > > to the cluster via Flink cli / REST is probably the approach that
> > > requires
> > > > the least effort. However, I'd like to point out 2 weaknesses.
> > > > 1. A lot of users use Flink in perjob/application modes. For these
> > users,
> > > > having to run the job in two steps (deploy the cluster, and submit
> the
> > > job)
> > > > is not that convenient.
> > > > 2. One of our motivations is being able to manage Flink applications'
> > > > lifecycles with kubectl. Submitting jobs from cli sounds not aligned
> > with
> > > > this motivation.
> > > > I think it's probably worth it to support submitting jobs via
> kubectl &
> > > CR
> > > > in the first version, both together with deploying the cluster like
> in
> > > > perjob/application mode and after deploying the cluster like in
> session
> > > > mode.
> > > >
> > >
> > > The intention is to support application management through operator and
> > CR,
> > > which means there won't be any 2 step submission process, which as you
> > > allude to would defeat the purpose of this project. The CR example
> shows
> > > the application part. Please note that the bare cluster support is an
> > > *additional* feature for scenarios that require external job
> management.
> > Is
> > > there anything on the FLIP page that creates a different impression?
> > >
> > >
> > > >
> > > > ## Versioning
> > > > Which Flink versions does the operator plan to support?
> > > > 1. Native K8s deployment was firstly introduced in Flink 1.10
> > > > 2. Native K8s HA was introduced in Flink 1.12
> > > > 3. The Pod template support was introduced in Flink 1.13
> > > > 4. There was some changes to the Flink docker image entrypoint script
> > in,
> > > > IIRC, Flink 1.13
> > > >
> > >
> > > Great, thanks for providing this. It is important for the compatibility
> > > going forward also. We are targeting Flink 1.14.x upwards. Before the
> > > operator is ready there will 

Re: [DISCUSS] FLIP-212: Introduce Flink Kubernetes Operator

2022-02-02 Thread Danny Cranmer
Hey team,

Great work on the FLIP, I am looking forward to this one. I agree that we
can move forward to the voting stage.

I have general feedback around how we will handle job submission failure
and retry. As discussed in the Rejected Alternatives section, we can use
Java to handle job submission failures from the Flink client. It would be
useful to have the ability to configure exception classifiers and retry
strategy as part of operator configuration.

Given this will be in a separate GitHub repository I am curious how the
versioning strategy will work in relation to the Flink version? Do we have
any other components with a similar setup I can look at? Will the operator
version track Flink or will it use its own versioning strategy with a Flink
version support matrix, or similar?

Thanks,



On Tue, Feb 1, 2022 at 2:33 PM Márton Balassi 
wrote:

> Hi team,
>
> Thank you for the great feedback, Thomas has updated the FLIP page
> accordingly. If you are comfortable with the currently existing design and
> depth in the FLIP [1] I suggest moving forward to the voting stage - once
> that reaches a positive conclusion it lets us create the separate code
> repository under the flink project for the operator.
>
> I encourage everyone to keep improving the details in the meantime, however
> I believe given the existing design and the general sentiment on this
> thread that the most efficient path from here is starting the
> implementation so that we can collectively iterate over it.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-212%3A+Introduce+Flink+Kubernetes+Operator
>
> On Mon, Jan 31, 2022 at 10:15 PM Thomas Weise  wrote:
>
> > HI Xintong,
> >
> > Thanks for the feedback and please see responses below -->
> >
> > On Fri, Jan 28, 2022 at 12:21 AM Xintong Song 
> > wrote:
> >
> > > Thanks Thomas for drafting this FLIP, and everyone for the discussion.
> > >
> > > I also have a few questions and comments.
> > >
> > > ## Job Submission
> > > Deploying a Flink session cluster via kubectl & CR and then submitting
> > jobs
> > > to the cluster via Flink cli / REST is probably the approach that
> > requires
> > > the least effort. However, I'd like to point out 2 weaknesses.
> > > 1. A lot of users use Flink in perjob/application modes. For these
> users,
> > > having to run the job in two steps (deploy the cluster, and submit the
> > job)
> > > is not that convenient.
> > > 2. One of our motivations is being able to manage Flink applications'
> > > lifecycles with kubectl. Submitting jobs from cli sounds not aligned
> with
> > > this motivation.
> > > I think it's probably worth it to support submitting jobs via kubectl &
> > CR
> > > in the first version, both together with deploying the cluster like in
> > > perjob/application mode and after deploying the cluster like in session
> > > mode.
> > >
> >
> > The intention is to support application management through operator and
> CR,
> > which means there won't be any 2 step submission process, which as you
> > allude to would defeat the purpose of this project. The CR example shows
> > the application part. Please note that the bare cluster support is an
> > *additional* feature for scenarios that require external job management.
> Is
> > there anything on the FLIP page that creates a different impression?
> >
> >
> > >
> > > ## Versioning
> > > Which Flink versions does the operator plan to support?
> > > 1. Native K8s deployment was firstly introduced in Flink 1.10
> > > 2. Native K8s HA was introduced in Flink 1.12
> > > 3. The Pod template support was introduced in Flink 1.13
> > > 4. There was some changes to the Flink docker image entrypoint script
> in,
> > > IIRC, Flink 1.13
> > >
> >
> > Great, thanks for providing this. It is important for the compatibility
> > going forward also. We are targeting Flink 1.14.x upwards. Before the
> > operator is ready there will be another Flink release. Let's see if
> anyone
> > is interested in earlier versions?
> >
> >
> > >
> > > ## Compatibility
> > > What kind of API compatibility we can commit to? It's probably fine to
> > have
> > > alpha / beta version APIs that allow incompatible future changes for
> the
> > > first version. But eventually we would need to guarantee backwards
> > > compatibility, so that an early version CR can work with a new version
> > > operator.
> > >
> >
> > Another great point and please let me include that on the FLIP page. ;-)
> >
> > I think we should allow incompatible changes for the first one or two
> > versions, similar to how other major features have evolved recently, such
> > as FLIP-27.
> >
> > Would be great to get broader feedback on this one.
> >
> > Cheers,
> > Thomas
> >
> >
> >
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Fri, Jan 28, 2022 at 1:18 PM Thomas Weise  wrote:
> > >
> > > > Thanks for the feedback!
> > > >
> > > > >
> > > > > # 1 Flink Native vs Standalone integration
> > > > > Maybe we should make this 

Re: Off for a week starting Friday

2022-02-02 Thread Till Rohrmann
Thanks for letting us know Etienne. Have a nice time off :-)

Cheers,
Till

On Wed, Feb 2, 2022 at 3:56 PM Etienne Chauchot 
wrote:

> Hi all,
>
> I'll be off for a week starting Friday afternoon so I might be
> unresponsive on ongoing PRs/tickets.
>
> Best
>
> Etienne.
>
>


Off for a week starting Friday

2022-02-02 Thread Etienne Chauchot

Hi all,

I'll be off for a week starting Friday afternoon so I might be 
unresponsive on ongoing PRs/tickets.


Best

Etienne.



[jira] [Created] (FLINK-25930) Remove identity casting from ScalarOperatorGens

2022-02-02 Thread Marios Trivyzas (Jira)
Marios Trivyzas created FLINK-25930:
---

 Summary: Remove identity casting from ScalarOperatorGens
 Key: FLINK-25930
 URL: https://issues.apache.org/jira/browse/FLINK-25930
 Project: Flink
  Issue Type: Sub-task
Reporter: Marios Trivyzas


Following: [https://github.com/apache/flink/pull/18582]

we could remove the following code from *ScalarOperatorGens*:

 

 
{noformat}
case (_, _) if isInteroperable(operand.resultType, targetType) =>
operand.copy(resultType = targetType)
 
{noformat}
and use our *IdentityCastRule* instead, but there is an issue.

 

Currently, *isInteroperable* allows casting between types with different 
nullability, whereas the *IdentityCastRule* uses 
*LogicalTypeCasts#supportsAvoidingCast*, which in turn uses the 
*CastAvoidanceChecker*, which doesn't allow casting from a nullable type to the 
same but non-nullable type, i.e. INT -> INT NOT NULL:
{noformat}
if (sourceType.isNullable() && !targetType.isNullable()
|| sourceType.getClass() != targetType.getClass()
|| // TODO drop this line once we remove legacy types
sourceType.getTypeRoot() != targetType.getTypeRoot()) {
return false;
}{noformat}
 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25929) Jepsen tests don't work with Zookeeper 3.4

2022-02-02 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-25929:


 Summary: Jepsen tests don't work with Zookeeper 3.4
 Key: FLINK-25929
 URL: https://issues.apache.org/jira/browse/FLINK-25929
 Project: Flink
  Issue Type: Technical Debt
  Components: Test Infrastructure
Affects Versions: 1.15.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.15.0






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25928) Refactor timestamp<->number validation messages

2022-02-02 Thread Marios Trivyzas (Jira)
Marios Trivyzas created FLINK-25928:
---

 Summary: Refactor timestamp<->number validation messages
 Key: FLINK-25928
 URL: https://issues.apache.org/jira/browse/FLINK-25928
 Project: Flink
  Issue Type: Sub-task
Reporter: Marios Trivyzas
Assignee: Marios Trivyzas


Move the timestamp -> number and number -> timestamp validation messages that 
suggest using dedicated methods into a validation method in 
`{*}LogicalTypeCasts{*}`, and call this validation early, before trying to 
resolve the `{*}CastRule`{*}s.

https://github.com/apache/flink/pull/18582#discussion_r797372485

https://github.com/apache/flink/commit/29d8d03b1be818a64834e5ba670a83d8857111ab
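The proposed flow — validate before resolving any cast rule, so the user sees 
the suggestive message instead of a generic "unsupported cast" error — could be 
sketched as follows. All class and method names here are illustrative, not the 
actual Flink internals:

```java
public class EarlyCastValidationSketch {

    static class ValidationException extends RuntimeException {
        ValidationException(String message) {
            super(message);
        }
    }

    // Runs before any cast rule is resolved, so unsupported combinations
    // fail fast with a message pointing at the dedicated functions.
    static void validateCast(String sourceRoot, String targetRoot) {
        if (sourceRoot.equals("TIMESTAMP") && targetRoot.equals("NUMERIC")) {
            throw new ValidationException(
                    "Casting TIMESTAMP to a numeric type is not supported; "
                            + "use a dedicated conversion function instead.");
        }
        if (sourceRoot.equals("NUMERIC") && targetRoot.equals("TIMESTAMP")) {
            throw new ValidationException(
                    "Casting a numeric type to TIMESTAMP is not supported; "
                            + "use a dedicated conversion function instead.");
        }
    }

    static String resolveCastRule(String sourceRoot, String targetRoot) {
        validateCast(sourceRoot, targetRoot); // early validation, before lookup
        return sourceRoot + "->" + targetRoot; // placeholder for rule resolution
    }
}
```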



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25927) Make flink-connector-base dependency usage consistent across all connectors.

2022-02-02 Thread Alexander Fedulov (Jira)
Alexander Fedulov created FLINK-25927:
-

 Summary: Make flink-connector-base dependency usage consistent 
across all connectors.
 Key: FLINK-25927
 URL: https://issues.apache.org/jira/browse/FLINK-25927
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Common
Reporter: Alexander Fedulov


* {{flink-connector-base}} was inconsistently used in connectors (directly 
shaded in some and transitively pulled in via {{flink-connector-files}} which 
was itself shaded in the table uber jar)
 * FLINK-24687 moved {{flink-connector-files}} out from the {{flink-table}} 
uber jar
 * It is necessary to make usage of {{flink-connector-base}} consistent across 
all connectors



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25926) Update org.postgresql:postgresql to 42.3.2

2022-02-02 Thread Martijn Visser (Jira)
Martijn Visser created FLINK-25926:
--

 Summary: Update org.postgresql:postgresql to 42.3.2
 Key: FLINK-25926
 URL: https://issues.apache.org/jira/browse/FLINK-25926
 Project: Flink
  Issue Type: Technical Debt
  Components: Connectors / JDBC
Affects Versions: 1.14.3, 1.13.5, 1.15.0
Reporter: Martijn Visser
Assignee: Martijn Visser


Security vulnerability CVE-2022-21724 is fixed in 42.3.2. Flink is currently on 
42.2.10.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25925) JobMasterTest.testJobMasterAcceptsExcessSlotsWhenJobIsRestarting

2022-02-02 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-25925:
-

 Summary: 
JobMasterTest.testJobMasterAcceptsExcessSlotsWhenJobIsRestarting
 Key: FLINK-25925
 URL: https://issues.apache.org/jira/browse/FLINK-25925
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.15.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann


The test {{JobMasterTest.testJobMasterAcceptsExcessSlotsWhenJobIsRestarting}} 
fails on AZP with

{code}
Feb 02 02:49:46 [ERROR]   
JobMasterTest.testJobMasterAcceptsExcessSlotsWhenJobIsRestarting:1944 
Feb 02 02:49:46 Expected: is 
Feb 02 02:49:46  but: was 
{code}

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30598=logs=0e7be18f-84f2-53f0-a32d-4a5e4a174679=7c1d86e3-35bd-5fd5-3b7c-30c126a78702=9114



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25924) KDF Integration tests intermittently fail

2022-02-02 Thread Zichen Liu (Jira)
Zichen Liu created FLINK-25924:
--

 Summary: KDF Integration tests intermittently fail
 Key: FLINK-25924
 URL: https://issues.apache.org/jira/browse/FLINK-25924
 Project: Flink
  Issue Type: New Feature
  Components: Connectors / Kinesis
Reporter: Zichen Liu
Assignee: Ahmed Hamdy
 Fix For: 1.15.0


h2. Motivation

*User stories:*
As a Flink user, I’d like to use the Table API for the new Kinesis Data Streams 
 sink.

*Scope:*
 * Introduce {{AsyncDynamicTableSink}} that enables Sinking Tables into Async 
Implementations.
 * Implement a new {{KinesisDynamicTableSink}} that uses 
{{KinesisDataStreamSink}} Async Implementation and implements 
{{{}AsyncDynamicTableSink{}}}.
 * The implementation introduces Async Sink configurations as optional options 
in the table definition, with default values derived from the 
{{KinesisDataStream}} default values.
 * Unit/Integration testing. modify KinesisTableAPI tests for the new 
implementation, add unit tests for {{AsyncDynamicTableSink}} and 
{{KinesisDynamicTableSink}} and {{{}KinesisDynamicTableSinkFactory{}}}.
 * Java / code-level docs.

h2. References

More details to be found 
[https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink]

 

*Update:*

^^ Status Update ^^
__List of all work outstanding for 1.15 release__

[Merged] https://github.com/apache/flink/pull/18165 - KDS DataStream Docs
[Merged] https://github.com/apache/flink/pull/18396 - [hotfix] for infinite loop 
if not flushing during commit
[Merged] https://github.com/apache/flink/pull/18421 - Mark Kinesis Producer as 
deprecated (Prod: FLINK-24227)
[Merged] https://github.com/apache/flink/pull/18348 - KDS Table API Sink & Docs
[Merged] https://github.com/apache/flink/pull/18488 - base sink retry entries 
in order not in reverse
[Merged] https://github.com/apache/flink/pull/18512 - changing failed requests 
handler to accept List in AsyncSinkWriter
[Merged] https://github.com/apache/flink/pull/18483 - Do not expose the element 
converter
[Merged] https://github.com/apache/flink/pull/18468 - Adding Kinesis data 
streams sql uber-jar

Ready for review:
[SUCCESS ] https://github.com/apache/flink/pull/18314 - KDF DataStream Sink & 
Docs
[BLOCKED on ^^ ] https://github.com/apache/flink/pull/18426 - rename 
flink-connector-aws-kinesis-data-* to flink-connector-aws-kinesis-* (module 
names) and KinesisData*Sink to Kinesis*Sink (class names)

Pending PR:
* Firehose Table API Sink & Docs
* KDF Table API SQL jar

TBD:
* FLINK-25846: Not shutting down
* FLINK-25848: Validation during start up
* FLINK-25792: flush() bug
* FLINK-25793: throughput exceeded
* Update the defaults of KDS sink and update the docs + do the same for KDF
* add a `AsyncSinkCommonConfig` class (to wrap the 6 params) to the 
`flink-connector-base` and propagate it to the two connectors
- feature freeze
* KDS performance testing
* KDF performance testing
* Clone the new docs to .../contents.zh/... and add the location to the 
corresponding Chinese translation jira - KDS -
* Rename [Amazon AWS Kinesis Streams] to [Amazon Kinesis Data Streams] in docs 
(legacy issue)
- Flink 1.15 release
* KDS end to end sanity test - hits aws apis rather than local docker images
* KDS Python wrappers
* FLINK-25733 - Create A migration guide for Kinesis Table API connector - can 
happen after 1.15
* If `endpoint` is provided, `region` should not be required like it currently 
is
* Test if Localstack container requires the 1ms timeout
* Adaptive level of logging (in discussion)

FYI:
* FLINK-25661 - Add Custom Fatal Exception handler in AsyncSinkWriter - 
https://github.com/apache/flink/pull/18449
* https://issues.apache.org/jira/browse/FLINK-24229 DDB Sink

Chinese translation:
https://issues.apache.org/jira/browse/FLINK-25735 - KDS DataStream Sink
https://issues.apache.org/jira/browse/FLINK-25736 - KDS Table API Sink
https://issues.apache.org/jira/browse/FLINK-25737 - KDF DataStream Sink



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25923) Add tests for native savepoint format schema evolution

2022-02-02 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-25923:


 Summary: Add tests for native savepoint format schema evolution
 Key: FLINK-25923
 URL: https://issues.apache.org/jira/browse/FLINK-25923
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Checkpointing
Reporter: Dawid Wysakowicz


Check test coverage for:

Schema evolution

https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/fault-tolerance/serialization/schema_evolution/



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25922) KinesisFirehoseSinkITCase hangs on AZP

2022-02-02 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-25922:
-

 Summary: KinesisFirehoseSinkITCase hangs on AZP
 Key: FLINK-25922
 URL: https://issues.apache.org/jira/browse/FLINK-25922
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kinesis
Affects Versions: 1.15.0
Reporter: Till Rohrmann


The test {{KinesisFirehoseSinkITCase}} hangs on AZP.

{code}
2022-02-02T05:37:07.5806959Z "main" #1 prio=5 os_prio=0 tid=0x7f06ec00b800 
nid=0x6145 waiting on condition [0x7f06f4cc5000]
2022-02-02T05:37:07.5807433Zjava.lang.Thread.State: WAITING (parking)
2022-02-02T05:37:07.5807815Zat sun.misc.Unsafe.park(Native Method)
2022-02-02T05:37:07.5808450Z- parking to wait for  <0x83f34570> (a 
java.util.concurrent.CompletableFuture$Signaller)
2022-02-02T05:37:07.5808955Zat 
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
2022-02-02T05:37:07.5809493Zat 
java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
2022-02-02T05:37:07.5810034Zat 
java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
2022-02-02T05:37:07.5810568Zat 
java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
2022-02-02T05:37:07.5811102Zat 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
2022-02-02T05:37:07.5811812Zat 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1989)
2022-02-02T05:37:07.5812487Zat 
org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:69)
2022-02-02T05:37:07.5813165Zat 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1969)
2022-02-02T05:37:07.5813832Zat 
org.apache.flink.connector.firehose.sink.KinesisFirehoseSinkITCase.test(KinesisFirehoseSinkITCase.java:119)
2022-02-02T05:37:07.5814508Zat 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2022-02-02T05:37:07.5815011Zat 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2022-02-02T05:37:07.5815584Zat 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2022-02-02T05:37:07.5816176Zat 
java.lang.reflect.Method.invoke(Method.java:498)
2022-02-02T05:37:07.5816687Zat 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
2022-02-02T05:37:07.5817256Zat 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
2022-02-02T05:37:07.5817806Zat 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
2022-02-02T05:37:07.5818367Zat 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
2022-02-02T05:37:07.5819005Zat 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
2022-02-02T05:37:07.5819542Zat 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
2022-02-02T05:37:07.5820056Zat 
org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
2022-02-02T05:37:07.5820597Zat 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
2022-02-02T05:37:07.5821116Zat 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
2022-02-02T05:37:07.5821625Zat 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
2022-02-02T05:37:07.5822198Zat 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
2022-02-02T05:37:07.5822711Zat 
org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
2022-02-02T05:37:07.5823190Zat 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
2022-02-02T05:37:07.5823696Zat 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
2022-02-02T05:37:07.5824242Zat 
org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
2022-02-02T05:37:07.5824738Zat 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
2022-02-02T05:37:07.5825316Zat 
org.testcontainers.containers.FailureDetectingExternalResource$1.evaluate(FailureDetectingExternalResource.java:30)
2022-02-02T05:37:07.5825962Zat 
org.junit.rules.RunRules.evaluate(RunRules.java:20)
2022-02-02T05:37:07.5826440Zat 
org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
2022-02-02T05:37:07.5826927Zat 
org.junit.runners.ParentRunner.run(ParentRunner.java:413)
2022-02-02T05:37:07.5827380Zat 
org.junit.runner.JUnitCore.run(JUnitCore.java:137)
2022-02-02T05:37:07.5827821Zat 
org.junit.runner.JUnitCore.run(JUnitCore.java:115)
2022-02-02T05:37:07.5828306Zat 
org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42)
2022-02-02T05:37:07.5828883Zat 

[jira] [Created] (FLINK-25921) Support different input parallelism for preCommit topology

2022-02-02 Thread Fabian Paul (Jira)
Fabian Paul created FLINK-25921:
---

 Summary: Support different input parallelism for preCommit topology
 Key: FLINK-25921
 URL: https://issues.apache.org/jira/browse/FLINK-25921
 Project: Flink
  Issue Type: Sub-task
  Components: API / DataStream
Affects Versions: 1.15.0, 1.16.0
Reporter: Fabian Paul


Currently, we assume that the pre-commit topology has the same parallelism as 
the preceding operator when inserting the failover region. To support a 
different parallelism we might need to insert a different identity map to 
customize the mapping.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25920) Allow receiving updates of CommittableSummary

2022-02-02 Thread Fabian Paul (Jira)
Fabian Paul created FLINK-25920:
---

 Summary: Allow receiving updates of CommittableSummary
 Key: FLINK-25920
 URL: https://issues.apache.org/jira/browse/FLINK-25920
 Project: Flink
  Issue Type: Sub-task
  Components: API / DataStream, Connectors / Common
Affects Versions: 1.15.0, 1.16.0
Reporter: Fabian Paul


In the case of unaligned checkpoints, it might happen that the checkpoint 
barrier overtakes the records and an empty committable summary is emitted that 
needs to be corrected at a later point when the records arrive.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25919) Sink V2 improvements and followups

2022-02-02 Thread Fabian Paul (Jira)
Fabian Paul created FLINK-25919:
---

 Summary: Sink V2 improvements and followups
 Key: FLINK-25919
 URL: https://issues.apache.org/jira/browse/FLINK-25919
 Project: Flink
  Issue Type: Improvement
  Components: API / DataStream, Connectors / Common
Affects Versions: 1.16.0
Reporter: Fabian Paul


This is an umbrella ticket for known limitations and improvements we still want 
to do for the Sink V2: 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-191%3A+Extend+unified+Sink+interface+to+support+small+file+compaction



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25918) Use FileEnumerator to implement filter pushdown of filepath metadata

2022-02-02 Thread Francesco Guardiani (Jira)
Francesco Guardiani created FLINK-25918:
---

 Summary: Use FileEnumerator to implement filter pushdown of 
filepath metadata 
 Key: FLINK-25918
 URL: https://issues.apache.org/jira/browse/FLINK-25918
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / FileSystem
Reporter: Francesco Guardiani


Right now, unless you configure partition keys, the table file source will 
ingest all the files in the provided {{path}}.

Which means that a query like:

{code:sql}
SELECT * FROM MyFileTable WHERE filepath LIKE "%.csv"
{code}

will ingest all the files, and only then, after the records are loaded into 
Flink, apply the filtering, discarding all the records not coming from a file 
matching the pattern "%.csv".

Using the filter pushdown feature provided by the DynamicTableSource stack, we 
could instead provide the {{FileSourceBuilder}} directly with a {{FileEnumerator}} 
that filters the input files, so we can effectively skip reading them.
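The enumeration-time filtering could look like the following self-contained 
sketch. The helper class is hypothetical (it is not the Flink FileEnumerator 
API); it only shows how a SQL LIKE pattern such as "%.csv" can be turned into a 
path predicate so non-matching files are skipped before any reading happens:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class LikePatternFileFilter {

    // Translate SQL LIKE wildcards (% and _) into an anchored regex,
    // quoting every other character literally.
    static Predicate<String> fromLikePattern(String likePattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : likePattern.toCharArray()) {
            switch (c) {
                case '%' -> regex.append(".*");
                case '_' -> regex.append('.');
                default -> regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        Pattern compiled = Pattern.compile(regex.toString());
        return path -> compiled.matcher(path).matches();
    }

    // Applied at enumeration time: only matching paths would become
    // source splits, so other files are never read.
    static List<String> enumerate(List<String> candidatePaths, String likePattern) {
        return candidatePaths.stream()
                .filter(fromLikePattern(likePattern))
                .collect(Collectors.toList());
    }
}
```

For example, enumerating {{/data/a.csv}} and {{/data/b.json}} with the pattern 
"%.csv" keeps only the CSV file, instead of reading both and filtering rows 
afterwards.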



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25917) Share RpcSystem across tests

2022-02-02 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-25917:


 Summary: Share RpcSystem across tests
 Key: FLINK-25917
 URL: https://issues.apache.org/jira/browse/FLINK-25917
 Project: Flink
  Issue Type: Technical Debt
  Components: Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.15.0


We currently create a dedicated RpcSystem for each test, meaning that we go 
through the whole cycle of extracting the rpc-akka jar and creating a new 
classloader.

For testing purposes we can re-use the same RpcSystem, which will alleviate 
FLINK-18356 and maybe speed up a test or two.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [ANNOUNCE] Apache Flink Stateful Functions 3.2.0 released

2022-02-02 Thread Etienne Chauchot

Congrats to everyone involved!

Best

Etienne

On 01/02/2022 at 12:23, Till Rohrmann wrote:

The Apache Flink community is very happy to announce the release of Apache
Flink Stateful Functions 3.2.0.

Stateful Functions is an API that simplifies building distributed stateful
applications.
It's based on functions with persistent state that can interact dynamically
with strong consistency guarantees.

Please check out the release blog post for an overview of the release:
https://flink.apache.org/news/2022/01/31/release-statefun-3.2.0.html

The release is available for download at:
https://flink.apache.org/downloads.html

Maven artifacts for Stateful Functions can be found at:
https://search.maven.org/search?q=g:org.apache.flink%20statefun

Python SDK for Stateful Functions published to the PyPI index can be found
at: https://pypi.org/project/apache-flink-statefun/

JavaScript SDK for Stateful Functions published to the NPM registry can be
found at: https://www.npmjs.com/package/apache-flink-statefun

Official Docker image for building Stateful Functions applications can be
found at:
https://hub.docker.com/r/apache/flink-statefun

The full release notes are available in Jira:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12350540

We would like to thank all contributors of the Apache Flink community who
made this release possible!

Cheers,
Till