Re: [DISCUSS] FLIP-381: Deprecate configuration getters/setters that return/set complex Java objects
Thanks Junrui for driving this proposal! ConfigOption is easy for Flink users to use, makes options easy for Flink platform maintainers to manage, and is easy for Flink developers and the Flink community to maintain. So big +1 for this proposal!

Best,
Rui

On Thu, Nov 2, 2023 at 10:10 AM Junrui Lee wrote:
> Hi devs,
>
> I would like to start a discussion on FLIP-381: Deprecate configuration
> getters/setters that return/set complex Java objects [1].
>
> Currently, the job configuration in Flink is spread out across different
> components, which leads to inconsistencies and confusion. To address this
> issue, it is necessary to migrate non-ConfigOption complex Java objects to
> ConfigOption and adopt a single Configuration object to host all of the
> configuration.
>
> However, there is a significant blocker in implementing this solution.
> These complex Java objects in StreamExecutionEnvironment, CheckpointConfig,
> and ExecutionConfig have already been exposed through the public API,
> making it challenging to modify the existing implementation.
>
> Therefore, I propose to deprecate these Java objects and their
> corresponding getter/setter interfaces, ultimately removing them in
> Flink 2.0.
>
> Your feedback and thoughts on this proposal are highly appreciated.
>
> Best regards,
> Junrui Lee
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=278464992
[DISCUSS] FLIP-381: Deprecate configuration getters/setters that return/set complex Java objects
Hi devs,

I would like to start a discussion on FLIP-381: Deprecate configuration getters/setters that return/set complex Java objects [1].

Currently, the job configuration in Flink is spread out across different components, which leads to inconsistencies and confusion. To address this issue, it is necessary to migrate non-ConfigOption complex Java objects to ConfigOption and adopt a single Configuration object to host all of the configuration.

However, there is a significant blocker in implementing this solution. These complex Java objects in StreamExecutionEnvironment, CheckpointConfig, and ExecutionConfig have already been exposed through the public API, making it challenging to modify the existing implementation.

Therefore, I propose to deprecate these Java objects and their corresponding getter/setter interfaces, ultimately removing them in Flink 2.0.

Your feedback and thoughts on this proposal are highly appreciated.

Best regards,
Junrui Lee

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=278464992
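To make the direction concrete for readers, here is a minimal, self-contained sketch of the ConfigOption-style pattern the FLIP moves toward: typed options read from and written to a single flat Configuration, instead of setters on complex objects. This is illustrative only; these are not Flink's actual classes, and the option key and default value are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the ConfigOption pattern (illustrative only).
public class ConfigOptionSketch {

    // A typed, string-keyed option with a default value.
    record ConfigOption<T>(String key, T defaultValue) {}

    // A single flat Configuration hosting all settings.
    static class Configuration {
        private final Map<String, Object> values = new HashMap<>();

        <T> void set(ConfigOption<T> option, T value) {
            values.put(option.key(), value);
        }

        @SuppressWarnings("unchecked")
        <T> T get(ConfigOption<T> option) {
            return (T) values.getOrDefault(option.key(), option.defaultValue());
        }
    }

    // Hypothetical option; the real key names live in Flink's option classes.
    static final ConfigOption<Long> CHECKPOINT_INTERVAL =
            new ConfigOption<>("execution.checkpointing.interval", 60_000L);

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Instead of a setter on a complex CheckpointConfig object,
        // everything goes through the one Configuration:
        conf.set(CHECKPOINT_INTERVAL, 30_000L);
        System.out.println(conf.get(CHECKPOINT_INTERVAL)); // 30000
    }
}
```

The point of the pattern is that every setting becomes serializable, discoverable, and uniform, which is what makes a single Configuration able to host all job configuration.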
RE: [DISCUSS] Confluent Avro support without Schema Registry access
Thanks for the pointer to FLINK-33045 - I hadn’t spotted that. That sounds like it’d address one possible issue (where someone using Flink shouldn’t be registering new schemas, or perhaps doesn’t have access/permission to do so).

I should be clear that I absolutely agree that using a schema registry is optimal. It should be the norm – the default, preferred and recommended option. However, I think that there may still be times where the schema registry isn’t available:

- Maybe you’re using a mirrored copy of the topic on another Kafka cluster and don’t have the original Kafka cluster’s schema registry available.
- Maybe networking restrictions mean that where you are running Flink doesn’t have connectivity to the schema registry.
- Maybe the developer using Flink doesn’t have permission for, or access to, the schema registry.
- Maybe the schema registry is currently unavailable.
- Maybe the developer using Flink is developing their Flink job offline, disconnected from the environment where the schema registry is running (ahead of later deploying the finished job somewhere it will have access to the schema registry).

It is in such circumstances that I think the approach the Avro formatter offers is a useful fallback. Through the table schema, it lets you specify the shape of your data, allowing you to process it without requiring an external dependency.

It seems to me that making it impossible to process Confluent Avro-encoded messages without access to an additional external component is too strict a restriction (as much as there are completely valid reasons for it to be a recommendation). And, with a very small modification to the Avro formatter, it’s a restriction we could remove.

Kind regards

Dale

From: Ryan Skraba
Date: Monday, 30 October 2023 at 16:42
To: dev@flink.apache.org
Subject: [EXTERNAL] Re: [DISCUSS] Confluent Avro support without Schema Registry access

Hello!
I took a look at FLINK-33045, which is somewhat related: in that improvement, the author wants to control who registers schemas. The Flink job would know the Avro schema to use, and would look up the ID to use in framing the Avro binary. It uses but never changes the schema registry.

Here it sounds like you want nearly the same thing with one more step: if the Flink job is configured with the schema to use, it could also be pre-configured with the ID that the schema registry knows. Technically, it could be configured with a *set* of schemas mapped to their IDs when the job starts, but I imagine this would be pretty clunky.

I'm curious if you can share what customer use cases wouldn't want access to the schema registry! One of the reasons it exists is to prevent systems from writing unreadable or corrupted data to a Kafka topic (or other messaging system) -- which I think is what Martijn is asking about. It's unlikely to be a performance gain from hiding it.

Thanks for bringing this up for discussion!

Ryan

[FLINK-33045]: https://issues.apache.org/jira/browse/FLINK-33045
[Single Object Encoding]: https://avro.apache.org/docs/1.11.1/specification/_print/#single-object-encoding-specification

On Fri, Oct 27, 2023 at 3:13 PM Dale Lane wrote:
>
> > if you strip the magic byte, and the schema has
> > evolved when you're consuming it from Flink,
> > you can end up with deserialization errors given
> > that a field might have been deleted/added/
> > changed etc.
>
> Aren’t we already fairly dependent on the schema remaining consistent,
> because otherwise we’d need to update the table schema as well?
>
> > it wouldn't work when you actually want to
> > write avro-confluent, because that requires a
> > check when producing if you're still being compliant.
>
> I’m not sure what you mean here, sorry. Are you thinking about issues if you
> needed to mix-and-match with both formatters at the same time?
> (Rather than just using the Avro formatter as I was describing)
>
> Kind regards
>
> Dale
>
> From: Martijn Visser
> Date: Friday, 27 October 2023 at 14:03
> To: dev@flink.apache.org
> Subject: [EXTERNAL] Re: [DISCUSS] Confluent Avro support without Schema Registry access
>
> Hi Dale,
>
> I'm struggling to understand in what cases you want to read data
> serialized in connection with Confluent Schema Registry, but can't get
> access to the Schema Registry service. It seems like a rather exotic
> situation and it beats the purpose of using a Schema Registry in the
> first place? I also doubt that it's actually really useful: if you
> strip the magic byte, and the schema has evolved when you're consuming
> it from Flink, you can end up with deserialization errors given that a
> field might have been deleted/added/changed etc. Also, it wouldn't
> work when you actually want to write avro-confluent, because that
> requires a check when producing if you're still being compliant.
>
> Best regards,
>
> Martijn
>
> On Fri, Oct 27, 2023 at 2:53 PM Dale Lane wrote:
> >
> > TLDR:
> > We currently require a
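For readers following along: the "magic byte" discussed in this thread refers to Confluent's wire format, which prefixes each Avro payload with a zero magic byte and a 4-byte big-endian schema ID. A minimal sketch of splitting that framing, illustrative only and not the actual Flink formatter code:

```java
import java.nio.ByteBuffer;

// Sketch of splitting Confluent's wire format: 1 magic byte (0x00),
// a 4-byte big-endian schema ID, then the plain Avro binary payload.
public class ConfluentFraming {

    record Framed(int schemaId, byte[] avroPayload) {}

    static Framed strip(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        byte magic = buf.get();
        if (magic != 0x00) {
            throw new IllegalArgumentException("Unknown magic byte: " + magic);
        }
        int schemaId = buf.getInt(); // ByteBuffer is big-endian by default
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);
        return new Framed(schemaId, payload);
    }

    public static void main(String[] args) {
        // magic=0x00, schemaId=7, payload bytes [1, 2, 3]
        byte[] msg = {0, 0, 0, 0, 7, 1, 2, 3};
        Framed f = strip(msg);
        System.out.println(f.schemaId());          // 7
        System.out.println(f.avroPayload().length); // 3
    }
}
```

The "very small modification" Dale mentions amounts to tolerating (and skipping) this 5-byte header so the remaining bytes can be decoded against a schema derived from the table definition, without consulting the registry.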
Re: [DISCUSS] FLIP-378: Support Avro timestamp with local timezone
Thanks @Leonard Xu. Two minor versions are definitely needed for flipping the configs.

On Mon, Oct 30, 2023 at 8:55 PM Leonard Xu wrote:
> Thanks @Peter for driving this FLIP.
>
> +1 from my side, the timestamp semantics mapping looks good to me.
>
> > In the end, the legacy behavior will be dropped in
> > Flink 2.0
>
> I don’t think we can introduce this option in 1.19 and drop it in 2.0;
> API removal requires at least two minor versions.
>
> Best,
> Leonard
>
> On Oct 31, 2023 at 11:18 AM, Peter Huang wrote:
> >
> > Hi Devs,
> >
> > Currently, Flink Avro Format doesn't support the Avro timestamp (millis/micros)
> > with local timezone type. Although the Avro timestamp (millis/micros) type is
> > supported and is mapped to the Flink timestamp without timezone, it is not
> > compliant with the semantics defined in Consistent timestamp types in
> > Hadoop SQL engines
> > <https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit#heading=h.n699ftkvhjlo>.
> >
> > I propose to support Avro timestamps in compliance with the mapping
> > semantics [1] by using a configuration flag.
> > To keep backward compatibility, the current mapping is kept as the default
> > behavior. Users can explicitly turn on the new mapping by setting it to
> > false. In the end, the legacy behavior will be dropped in Flink 2.0.
> >
> > Looking forward to your feedback.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-378%3A+Support+Avro+timestamp+with+local+timezone
> >
> > Best Regards
> >
> > Peter Huang
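For context, the two timestamp semantics under discussion can be illustrated with plain java.time types. This is a sketch independent of Flink's actual type mapping: the same epoch millis either denote a point on the timeline (instant / "with local time zone" semantics) or a zone-dependent wall-clock reading ("without time zone" semantics), and which Flink type an Avro timestamp maps to is what the proposed flag controls.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Illustration of instant semantics vs. local (wall-clock) semantics
// for the same underlying epoch millis.
public class TimestampSemantics {
    public static void main(String[] args) {
        long epochMillis = 0L; // 1970-01-01T00:00:00Z

        // Instant semantics: an unambiguous point in time.
        Instant instant = Instant.ofEpochMilli(epochMillis);
        System.out.println(instant); // 1970-01-01T00:00:00Z

        // Local semantics: the reading depends on the interpreting zone.
        LocalDateTime utcWallClock =
                LocalDateTime.ofInstant(instant, ZoneOffset.UTC);
        LocalDateTime shanghaiWallClock =
                LocalDateTime.ofInstant(instant, ZoneId.of("Asia/Shanghai"));
        System.out.println(utcWallClock);      // 1970-01-01T00:00
        System.out.println(shanghaiWallClock); // 1970-01-01T08:00
    }
}
```

Mapping an instant-semantics Avro timestamp to a without-time-zone Flink type silently shifts meaning across sessions in different zones, which is the inconsistency the FLIP addresses.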
Re: [DISCUSS][FLINK-33240] Document deprecated options as well
Hi Zhanghao,

Thank you for the proposal. +1 from my side. It would be more user-friendly to have the deprecated options in the same section as the non-deprecated ones, so adding them in the same section sounds good to me.

Best regards,
Junrui

On Wed, Nov 1, 2023 at 21:10, Zhanghao Chen wrote:
> Hi Samrat and Ruan,
>
> Thanks for the suggestion. I'm actually in favor of adding the deprecated
> options in the same section as the non-deprecated ones. This would make it
> easier for users to search for descriptions of the replacement options. It
> would be a different story for options deprecated because the related
> API/module is entirely deprecated, e.g. the DataSet API. In that case, users
> would not search for a replacement for an individual option but rather need
> to migrate to a new API, and it would be better to move these options to a
> separate section. WDYT?
>
> Best,
> Zhanghao Chen
>
> From: Samrat Deb
> Sent: Wednesday, November 1, 2023 15:31
> To: dev@flink.apache.org
> Cc: u...@flink.apache.org
> Subject: Re: [DISCUSS][FLINK-33240] Document deprecated options as well
>
> Thanks for the proposal,
> +1 for adding a deprecated identifier
>
> [Thought] Can we have a separate section / page for deprecated configs? WDYT?
>
> Bests,
> Samrat
>
> On Tue, 31 Oct 2023 at 3:44 PM, Alexander Fedulov <
> alexander.fedu...@gmail.com> wrote:
>
> > Hi Zhanghao,
> >
> > Thanks for the proposition.
> > In general +1, this sounds like a good idea as long as it is clear that the
> > usage of these settings is discouraged.
> > Just one minor concern - the configuration page is already very long; do
> > you have a rough estimate of how many more options would be added with this
> > change?
> >
> > Best,
> > Alexander Fedulov
> >
> > On Mon, 30 Oct 2023 at 18:24, Matthias Pohl wrote:
> >
> > > Thanks for your proposal, Zhanghao Chen. I think it adds more
> > > transparency to the configuration documentation.
> > > > > > +1 from my side on the proposal > > > > > > On Wed, Oct 11, 2023 at 2:09 PM Zhanghao Chen < > zhanghao.c...@outlook.com > > > > > > wrote: > > > > > > > Hi Flink users and developers, > > > > > > > > Currently, Flink won't generate doc for the deprecated options. This > > > might > > > > confuse users when upgrading from an older version of Flink: they > have > > to > > > > either carefully read the release notes or check the source code for > > > > upgrade guidance on deprecated options. > > > > > > > > I propose to document deprecated options as well, with a > "(deprecated)" > > > > tag placed at the beginning of the option description to highlight > the > > > > deprecation status [1]. > > > > > > > > Looking forward to your feedbacks on it. > > > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-33240 > > > > > > > > Best, > > > > Zhanghao Chen > > > > > > > > > >
Re: [DISCUSS] Kubernetes Operator 1.7.0 release planning
+1 for targeting the release as soon as possible.

Given the effort that Rui has undergone to decouple the autoscaling implementation, it makes sense to also include an alternative implementation with the release.

In the long run, I wonder whether the standalone implementation should even be part of the Kubernetes operator repository. It could be hosted in a different repository and simply consume the flink-autoscaler jar. But the same applies to the flink-autoscaler module. For this release, we can keep everything together.

I have a minor issue [1] I would like to include in the release.

-Max

[1] https://issues.apache.org/jira/browse/FLINK-33429

On Tue, Oct 31, 2023 at 11:14 AM Rui Fan <1996fan...@gmail.com> wrote:
>
> Thanks Gyula for driving this release!
>
> I'd like to check with you and the community: could we
> postpone the code freeze by a week?
>
> I'm developing FLINK-33099 [1], and the prod code is done.
> I need some time to develop the tests. I hope this feature is included in
> 1.7.0 for two main reasons:
>
> 1. We have completed the decoupling of the autoscaler and
> kubernetes-operator in 1.7.0. During the decoupling period, we modified
> a large number of autoscaler-related interfaces. The standalone autoscaler
> is an autoscaler process that can run independently. It can help us confirm
> whether the new interface is reasonable.
> 2. 1.18.0 was recently released; the standalone autoscaler allows more users
> to try the autoscaler and in-place rescaling.
>
> I have created a draft PR [2] for FLINK-33099; it just includes prod code.
> I have run it manually, and it works well. I will try my best to finish all
> unit tests before Friday, and will finish all unit tests before next Monday
> at the latest.
>
> WDYT?
>
> I'm deeply sorry for the request to postpone the release.
> > [1] https://issues.apache.org/jira/browse/FLINK-33099 > [2] https://github.com/apache/flink-kubernetes-operator/pull/698 > > Best, > Rui > > On Tue, Oct 31, 2023 at 4:10 PM Samrat Deb wrote: > > > Thank you Gyula > > > > (+1 non-binding) in support of you taking on the role of release manager. > > > > > I think this is reasonable as I am not aware of any big features / bug > > fixes being worked on right now. Given the size of the changes related to > > the autoscaler module refactor we should try to focus the remaining time on > > testing. > > > > I completely agree with you. Since the changes are quite extensive, it's > > crucial to allocate more time for thorough testing and verification. > > > > Regarding working with you for the release, I might not have the necessary > > privileges for that. > > > > However, I'd be more than willing to assist with testing the changes, > > validating various features, and checking for any potential regressions in > > the flink-kubernetes-operator. > > Just let me know how I can support the testing efforts. > > > > Bests, > > Samrat > > > > > > On Tue, 31 Oct 2023 at 12:59 AM, Gyula Fóra wrote: > > > > > Hi all! > > > > > > I would like to kick off the release planning for the operator 1.7.0 > > > release. We have added quite a lot of new functionality over the last few > > > weeks and I think the operator is in a good state to kick this off. > > > > > > Based on the original release schedule we had Nov 1 as the proposed > > feature > > > freeze date and Nov 7 as the date for the release cut / rc1. > > > > > > I think this is reasonable as I am not aware of any big features / bug > > > fixes being worked on right now. Given the size of the changes related to > > > the autoscaler module refactor we should try to focus the remaining time > > on > > > testing. > > > > > > I am happy to volunteer as a release manager but I am of course open to > > > working together with someone on this. > > > > > > What do you think? 
> > > > > > Cheers, > > > Gyula > > > > >
[jira] [Created] (FLINK-33429) Metric collection during stabilization phase may error due to missing metrics
Maximilian Michels created FLINK-33429:
------------------------------------------

Summary: Metric collection during stabilization phase may error due to missing metrics
Key: FLINK-33429
URL: https://issues.apache.org/jira/browse/FLINK-33429
Project: Flink
Issue Type: Bug
Components: Autoscaler
Affects Versions: kubernetes-operator-1.7.0
Reporter: Maximilian Michels
Assignee: Maximilian Michels
Fix For: kubernetes-operator-1.7.0

The new code for the 1.7.0 release introduces metric collection during the stabilization phase to allow sampling the observed true processing rate. Metrics might not be fully initialized during that phase, as evident through the error metrics. The following error is thrown:

{noformat}
java.lang.RuntimeException: Could not find required metric NUM_RECORDS_OUT_PER_SEC for 667f5d5aa757fb217b92c06f0f5d2bf2
{noformat}

To prevent these errors from shadowing actual errors, we should detect and ignore this recoverable exception.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
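A sketch of the kind of fix the ticket describes, with hypothetical names (the real change lives in the autoscaler's metric collection code): treat a missing metric as "no sample yet" and skip it, instead of failing the whole collection round.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Illustrative only: during the stabilization phase, a metric that is not
// yet initialized is a recoverable condition, not an error.
public class StabilizationSampling {

    static Optional<Double> sample(Map<String, Double> metrics, String name) {
        // Missing metric -> empty sample, rather than a RuntimeException.
        return Optional.ofNullable(metrics.get(name));
    }

    static double averageObservedRate(List<Map<String, Double>> rounds, String name) {
        return rounds.stream()
                .map(round -> sample(round, name))
                .flatMap(Optional::stream)     // drop the empty samples
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(Double.NaN);           // no samples collected yet
    }

    public static void main(String[] args) {
        List<Map<String, Double>> rounds = List.of(
                Map.of(), // metric not ready in the first round
                Map.of("NUM_RECORDS_OUT_PER_SEC", 100.0),
                Map.of("NUM_RECORDS_OUT_PER_SEC", 300.0));
        System.out.println(
                averageObservedRate(rounds, "NUM_RECORDS_OUT_PER_SEC")); // 200.0
    }
}
```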
Re: [DISCUSS][FLINK-33240] Document deprecated options as well
Hi Samrat and Ruan,

Thanks for the suggestion. I'm actually in favor of adding the deprecated options in the same section as the non-deprecated ones. This would make it easier for users to search for descriptions of the replacement options. It would be a different story for options deprecated because the related API/module is entirely deprecated, e.g. the DataSet API. In that case, users would not search for a replacement for an individual option but rather need to migrate to a new API, and it would be better to move these options to a separate section. WDYT?

Best,
Zhanghao Chen

From: Samrat Deb
Sent: Wednesday, November 1, 2023 15:31
To: dev@flink.apache.org
Cc: u...@flink.apache.org
Subject: Re: [DISCUSS][FLINK-33240] Document deprecated options as well

Thanks for the proposal,
+1 for adding a deprecated identifier

[Thought] Can we have a separate section / page for deprecated configs? WDYT?

Bests,
Samrat

On Tue, 31 Oct 2023 at 3:44 PM, Alexander Fedulov <
alexander.fedu...@gmail.com> wrote:

> Hi Zhanghao,
>
> Thanks for the proposition.
> In general +1, this sounds like a good idea as long as it is clear that the
> usage of these settings is discouraged.
> Just one minor concern - the configuration page is already very long; do
> you have a rough estimate of how many more options would be added with this
> change?
>
> Best,
> Alexander Fedulov
>
> On Mon, 30 Oct 2023 at 18:24, Matthias Pohl wrote:
>
> > Thanks for your proposal, Zhanghao Chen. I think it adds more
> > transparency to the configuration documentation.
> >
> > +1 from my side on the proposal
> >
> > On Wed, Oct 11, 2023 at 2:09 PM Zhanghao Chen <zhanghao.c...@outlook.com>
> > wrote:
> >
> > > Hi Flink users and developers,
> > >
> > > Currently, Flink won't generate doc for the deprecated options. This might
> > > confuse users when upgrading from an older version of Flink: they have to
> > > either carefully read the release notes or check the source code for
> > > upgrade guidance on deprecated options.
> > > > > > I propose to document deprecated options as well, with a "(deprecated)" > > > tag placed at the beginning of the option description to highlight the > > > deprecation status [1]. > > > > > > Looking forward to your feedbacks on it. > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-33240 > > > > > > Best, > > > Zhanghao Chen > > > > > >
Re: [DISCUSS][FLINK-33240] Document deprecated options as well
Hi Alexander,

I haven't done a complete analysis yet, but through a simple code search, roughly 35 options would be added with this change. Also note that some old options defined in a ConfigConstants class won't be added here, as flink-docs won't discover these constant-based options.

Best,
Zhanghao Chen

From: Alexander Fedulov
Sent: Tuesday, October 31, 2023 18:12
To: dev@flink.apache.org
Cc: u...@flink.apache.org
Subject: Re: [DISCUSS][FLINK-33240] Document deprecated options as well

Hi Zhanghao,

Thanks for the proposition.
In general +1, this sounds like a good idea as long as it is clear that the usage of these settings is discouraged.
Just one minor concern - the configuration page is already very long; do you have a rough estimate of how many more options would be added with this change?

Best,
Alexander Fedulov

On Mon, 30 Oct 2023 at 18:24, Matthias Pohl wrote:

Thanks for your proposal, Zhanghao Chen. I think it adds more transparency to the configuration documentation.

+1 from my side on the proposal

On Wed, Oct 11, 2023 at 2:09 PM Zhanghao Chen <zhanghao.c...@outlook.com> wrote:
> Hi Flink users and developers,
>
> Currently, Flink won't generate doc for the deprecated options. This might
> confuse users when upgrading from an older version of Flink: they have to
> either carefully read the release notes or check the source code for
> upgrade guidance on deprecated options.
>
> I propose to document deprecated options as well, with a "(deprecated)"
> tag placed at the beginning of the option description to highlight the
> deprecation status [1].
>
> Looking forward to your feedback on it.
>
> [1] https://issues.apache.org/jira/browse/FLINK-33240
>
> Best,
> Zhanghao Chen
[jira] [Created] (FLINK-33428) Flink sql cep support 'followed','notNext' and 'notFollowedBy' semantics
xiaoran created FLINK-33428:
---------------------------------

Summary: Flink sql cep support 'followed','notNext' and 'notFollowedBy' semantics
Key: FLINK-33428
URL: https://issues.apache.org/jira/browse/FLINK-33428
Project: Flink
Issue Type: New Feature
Components: Table SQL / API, Table SQL / Planner
Affects Versions: 1.16.0
Reporter: xiaoran

Currently, the CEP mode of the Flink API can support next, notNext, followedBy, followedByAny, and notFollowedBy semantics, but Flink SQL only supports next semantics. The remaining notNext and followedBy semantics are implemented by other alternatives, while the notFollowedBy semantics are not currently implemented. This semantic is commonly needed in business scenarios, such as judging that a user has placed an order and then not paid within 15 minutes. Therefore, I suggest providing new functionality to support notFollowedBy in SQL mode, along with the other three semantics.

The proposed syntax of the enhanced MATCH_RECOGNIZE is as follows:

MATCH_RECOGNIZE (
  [ PARTITION BY [, ... ] ]
  [ ORDER BY [, ... ] ]
  [ MEASURES [AS] [, ... ] ]
  [ ONE ROW PER MATCH [ { SHOW TIMEOUT MATCHES } ] | ALL ROWS PER MATCH [ { SHOW TIMEOUT MATCHES } ] ]
  [ AFTER MATCH SKIP { PAST LAST ROW | TO NEXT ROW | TO [ { FIRST | LAST} ] } ]
  PATTERN ( )
  DEFINE AS [, ... ]
)

[^ ] is proposed to express the notNext semantic. For example, A [^B] is translated to A.notNext(B).
{- -} is proposed to express the followedBy semantic. For example, A {- B*? -} C is translated to A.followedBy(C).
{- symbol1 -} with [^ ] is proposed to express the notFollowedBy semantic. For example, A {- B*? -} [^C] is translated to A.notFollowedBy(B).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
Re: [DISCUSS] FLIP-376: Add DISTRIBUTED BY clause
Hi Timo,

Gotcha, let's use passive verbs. I am actually thinking about "BUCKETED BY 6" or "BUCKETED INTO 6". Not really used in SQL, but afaiu Spark uses the concept [1].

[1] https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrameWriter.bucketBy.html

Best regards,
Jing

On Mon, Oct 30, 2023 at 5:25 PM Timo Walther wrote:
> Hi Jing,
>
> > Have you considered using BUCKET BY directly?
>
> Which vendor uses this syntax? Most vendors that I checked call this
> concept "distribution".
>
> In any case, the "BY" is optional, so certain DDL statements would
> declare it like "BUCKET INTO 6 BUCKETS"? And following PARTITIONED,
> we should use the passive voice.
>
> > Did you mean users can use their own algorithm? How to do it?
>
> "own algorithm" only refers to deciding between a list of partitioning
> strategies (namely hash and range partitioning) if the connector offers
> more than one.
>
> Regards,
> Timo
>
> On 30.10.23 12:39, Jing Ge wrote:
> > Hi Timo,
> >
> > The FLIP looks great! Thanks for bringing it to our attention! In order to
> > make sure we are on the same page, I would ask some questions:
> >
> > 1. DISTRIBUTED BY reminds me of DISTRIBUTE BY from Hive, like Benchao
> > mentioned, which is used to distribute rows among reducers, i.e. focusing
> > on the shuffle during the computation. The FLIP is focusing more on
> > storage, if I am not mistaken. Have you considered using BUCKET BY directly?
> >
> > 2. According to the FLIP: "CREATE TABLE MyTable (uid BIGINT, name STRING)
> > DISTRIBUTED BY HASH(uid) INTO 6 BUCKETS
> >
> > - For advanced users, the algorithm can be defined explicitly.
> > - Currently, either HASH() or RANGE().
> >
> > "
> > Did you mean users can use their own algorithm? How to do it?
> > > > Best regards, > > Jing > > > > On Mon, Oct 30, 2023 at 11:13 AM Timo Walther > wrote: > > > >> Let me reply to the feedback from Yunfan: > >> > >> > Distribute by in DML is also supported by Hive > >> > >> I see DISTRIBUTED BY and DISTRIBUTE BY as two separate discussions. This > >> discussion is about DDL. For DDL, we have more freedom as every vendor > >> has custom syntax for CREATE TABLE clauses. Furthermore, this is tightly > >> connector to the connector implementation, not the engine. However, for > >> DML we need to watch out for standard compliance and introduce changes > >> with high caution. > >> > >> How a LookupTableSource interprets the DISTRIBUTED BY is > >> connector-dependent in my opinion. In general this FLIP is a sink > >> ability, but we could have a follow FLIP that helps in distributing load > >> of lookup joins. > >> > >> > to avoid data skew problem > >> > >> I understand the use case and that it is important to solve it > >> eventually. Maybe a solution might be to introduce helper Polymorphic > >> Table Functions [1] in the future instead of new syntax. > >> > >> [1] > >> > >> > https://www.ifis.uni-luebeck.de/~moeller/Lectures/WS-19-20/NDBDM/12-Literature/Michels-SQL-2016.pdf > >> > >> > >> Let me reply to the feedback from Benchao: > >> > >> > Do you think it's useful to add some extensibility for the hash > >> strategy > >> > >> The hash strategy is fully determined by the connector, not the Flink > >> SQL engine. We are not using Flink's hash strategy in any way. If the > >> hash strategy for the regular Flink file system connector should be > >> changed, this should be expressed via config option. Otherwise we should > >> offer a dedicated `hive-filesystem` or `spark-filesystem` connector. > >> > >> Regards, > >> Timo > >> > >> > >> On 30.10.23 10:44, Timo Walther wrote: > >>> Hi Jark, > >>> > >>> my intention was to avoid too complex syntax in the first version. 
In > >>> the past years, we could enable use cases also without this clause, so > >>> we should be careful with overloading it with too functionality in the > >>> first version. We can still iterate on it later, the interfaces are > >>> flexible enough to support more in the future. > >>> > >>> I agree that maybe an explicit HASH and RANGE doesn't harm. Also making > >>> the bucket number optional. > >>> > >>> I updated the FLIP accordingly. Now the SupportsBucketing interface > >>> declares more methods that help in validation and proving helpful error > >>> messages to users. > >>> > >>> Let me know what you think. > >>> > >>> Regards, > >>> Timo > >>> > >>> > >>> On 27.10.23 10:20, Jark Wu wrote: > Hi Timo, > > Thanks for starting this discussion. I really like it! > The FLIP is already in good shape, I only have some minor comments. > > 1. Could we also support HASH and RANGE distribution kind on the DDL > syntax? > I noticed that HASH and UNKNOWN are introduced in the Java API, but > not in > the syntax. > > 2. Can we make "INTO n BUCKETS" optional in CREATE TABLE and ALTER > >> TABLE? > Some storage engines support automatically determining the bucket > number >
[jira] [Created] (FLINK-33427) Mark new relocated autoscaler configs IGNORE in the operator
Gyula Fora created FLINK-33427:
----------------------------------

Summary: Mark new relocated autoscaler configs IGNORE in the operator
Key: FLINK-33427
URL: https://issues.apache.org/jira/browse/FLINK-33427
Project: Flink
Issue Type: Bug
Components: Kubernetes Operator
Reporter: Gyula Fora
Assignee: Gyula Fora
Fix For: kubernetes-operator-1.7.0

The operator currently only ignores "kubernetes.operator"-prefixed configs so that they do not trigger upgrades. Autoscaler configs should also fall into this category.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
Re: [DISCUSS] FLIP-379: Dynamic source parallelism inference for batch jobs
Thanks Lijie for the comments!

1. For the Hive source, dynamic parallelism inference in batch scenarios is a superset of static parallelism inference. As a follow-up task, we can consider changing the default value of 'table.exec.hive.infer-source-parallelism' to false.

2. I think that both dynamic parallelism inference and static parallelism inference have their own use cases. Currently, for streaming sources and other sources that are not sensitive to dynamic information, the benefits of dynamic parallelism inference may not be significant. In such cases, we can continue to use static parallelism inference.

Thanks,
Xia

On Wed, Nov 1, 2023 at 14:52, Lijie Wang wrote:
> Hi Xia,
>
> Thanks for driving this FLIP, +1 for the proposal.
>
> I have 2 questions about the relationship between static inference and
> dynamic inference:
>
> 1. AFAIK, currently the Hive table source enables static inference by
> default. In this case, which one (static vs dynamic) will take effect? I
> think it would be better if we can point this out in the FLIP.
>
> 2. As you mentioned above, dynamic inference is the most ideal way, so do
> we have a plan to deprecate static inference in the future?
>
> Best,
> Lijie
>
> On Tue, Oct 31, 2023 at 20:19, Zhu Zhu wrote:
> > Thanks for opening the FLIP and kicking off this discussion, Xia!
> > The proposed changes make up an important missing part of the dynamic
> > parallelism inference of the adaptive batch scheduler.
> >
> > Besides that, it is also one good step towards supporting dynamic
> > parallelism inference for streaming sources, e.g. allowing Kafka
> > sources to determine their parallelism automatically based on the
> > number of partitions.
> >
> > +1 for the proposal.
> >
> > Thanks,
> > Zhu
> >
> > On Tue, Oct 31, 2023 at 16:01, Xia Sun wrote:
> > > Hi everyone,
> > > I would like to start a discussion on FLIP-379: Dynamic source
> > > parallelism inference for batch jobs [1].
> > > > > > In general, there are three main ways to set source parallelism for > batch > > > jobs: > > > (1) User-defined source parallelism. > > > (2) Connector static parallelism inference. > > > (3) Dynamic parallelism inference. > > > > > > Compared to manually setting parallelism, automatic parallelism > inference > > > is easier to use and can better adapt to varying data volumes each day. > > > However, static parallelism inference cannot leverage runtime > > information, > > > resulting in inaccurate parallelism inference. Therefore, for batch > jobs, > > > dynamic parallelism inference is the most ideal, but currently, the > > support > > > for adaptive batch scheduler is not very comprehensive. > > > > > > Therefore, we aim to introduce a general interface that enables the > > > adaptive batch scheduler to dynamically infer the source parallelism at > > > runtime. Please refer to the FLIP[1] document for more details about > the > > > proposed design and implementation. > > > > > > I also thank Zhu Zhu and LiJie Wang for their suggestions during the > > > pre-discussion. > > > Looking forward to your feedback and suggestions, thanks. > > > > > > [1] > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs > > > > > > Best regards, > > > Xia > > > > > >
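The idea of dynamic inference described above can be sketched as follows. The formula and names are hypothetical (the FLIP defines the actual interface); the point is simply that parallelism is derived from runtime data volume and clamped by a configured cap, which static inference cannot do because the volume is only known at runtime.

```java
// Illustrative sketch of dynamic source parallelism inference
// (hypothetical formula and names, not the FLIP's actual interface).
public class ParallelismInference {
    static int inferParallelism(long totalBytes, long bytesPerTask, int maxParallelism) {
        // Ceiling division: how many tasks of bytesPerTask cover totalBytes.
        long tasks = (totalBytes + bytesPerTask - 1) / bytesPerTask;
        // Clamp to [1, maxParallelism].
        return (int) Math.max(1, Math.min(tasks, maxParallelism));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(inferParallelism(10_000 * mb, 128 * mb, 256)); // 79
        System.out.println(inferParallelism(1 * mb, 128 * mb, 256));      // 1
    }
}
```

On a small day the same job runs with a handful of tasks; on a large day it scales up automatically, bounded by the cap.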
[jira] [Created] (FLINK-33426) If the directory does not have read permission, no exception is thrown when reading this path.
wenhao.yu created FLINK-33426:
------------------------------

Summary: If the directory does not have read permission, no exception is thrown when reading this path.
Key: FLINK-33426
URL: https://issues.apache.org/jira/browse/FLINK-33426
Project: Flink
Issue Type: Bug
Components: API / DataStream
Affects Versions: 1.17.1, 1.13.5
Reporter: wenhao.yu
Fix For: 1.17.1

When I use the StreamExecutionEnvironment.readFile() API to read a directory on HDFS, no exception is thrown if the directory does not grant read permission. The task keeps running, and the outside world cannot perceive the real task status.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
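Until such an exception is thrown by Flink itself, user code can guard against silent no-op reads with a pre-flight check. Below is a minimal, hedged sketch using the local filesystem API (java.nio); for HDFS one would use the Hadoop FileSystem API instead, which is not shown here. The class and method names are made up for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadableCheck {

    // Throws eagerly if the input directory cannot be read, so a job
    // fails fast instead of running silently without consuming any data.
    public static void requireReadable(String dir) {
        Path path = Paths.get(dir);
        if (!Files.isReadable(path)) {
            throw new IllegalArgumentException("Directory is not readable: " + dir);
        }
    }

    public static void main(String[] args) {
        // The system temp directory is readable, so this passes silently.
        requireReadable(System.getProperty("java.io.tmpdir"));
        System.out.println("tmpdir is readable");
    }
}
```

Calling such a check before `env.readFile(...)` would surface the permission problem at submission time rather than leaving the job running with no visible error.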
Re: [DISCUSS][FLINK-33240] Document deprecated options as well
Thanks for the proposal. +1 from my side and +1 for putting them to a separate section.

Best,
Hang

Samrat Deb 于2023年11月1日周三 15:32写道:

> Thanks for the proposal ,
> +1 for adding deprecated identifier
>
> [Thought] Can we have seperate section / page for deprecated configs ?
> Wdut ?
>
> Bests,
> Samrat
>
> On Tue, 31 Oct 2023 at 3:44 PM, Alexander Fedulov <
> alexander.fedu...@gmail.com> wrote:
>
> > Hi Zhanghao,
> >
> > Thanks for the proposition.
> > In general +1, this sounds like a good idea as long it is clear that
> > the usage of these settings is discouraged.
> > Just one minor concern - the configuration page is already very long,
> > do you have a rough estimate of how many more options would be added
> > with this change?
> >
> > Best,
> > Alexander Fedulov
> >
> > On Mon, 30 Oct 2023 at 18:24, Matthias Pohl .invalid>
> > wrote:
> >
> > > Thanks for your proposal, Zhanghao Chen. I think it adds more
> > > transparency to the configuration documentation.
> > >
> > > +1 from my side on the proposal
> > >
> > > On Wed, Oct 11, 2023 at 2:09 PM Zhanghao Chen <
> > > zhanghao.c...@outlook.com> wrote:
> > >
> > > > Hi Flink users and developers,
> > > >
> > > > Currently, Flink won't generate doc for the deprecated options.
> > > > This might confuse users when upgrading from an older version of
> > > > Flink: they have to either carefully read the release notes or
> > > > check the source code for upgrade guidance on deprecated options.
> > > >
> > > > I propose to document deprecated options as well, with a
> > > > "(deprecated)" tag placed at the beginning of the option
> > > > description to highlight the deprecation status [1].
> > > >
> > > > Looking forward to your feedbacks on it.
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-33240
> > > >
> > > > Best,
> > > > Zhanghao Chen
> > > >
> > >
[jira] [Created] (FLINK-33425) Flink SQL doesn't support an inline field in struct type as primary key
Elakiya created FLINK-33425:
----------------------------

Summary: Flink SQL doesn't support an inline field in struct type as primary key
Key: FLINK-33425
URL: https://issues.apache.org/jira/browse/FLINK-33425
Project: Flink
Issue Type: New Feature
Components: Table SQL / API
Affects Versions: 1.16.2
Reporter: Elakiya

I have a Kafka topic named employee which uses a Confluent Avro schema and emits payloads like the one below:

{
  "employee": {
    "id": "123456",
    "name": "sampleName"
  }
}

I am using the upsert-kafka connector to consume the events from the above Kafka topic with the Flink SQL DDL statement below, where I want to use the id field as the primary key. But I am unable to use the id field since it is inside the object, and currently Flink doesn't support this feature. I am using Apache Flink 1.16.2 and its dependent versions.

DDL statement:

CREATE TABLE Employee (
  employee ROW(id STRING, name STRING),
  PRIMARY KEY (employee.id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'employee',
  'properties.bootstrap.servers' = 'kafka-cp-kafka:9092',
  'key.format' = 'raw',
  'value.format' = 'avro-confluent',
  'value.avro-confluent.url' = 'http://kafka-cp-schema-registry:8081'
);

A new feature to use a property of a ROW datatype (in this case employee.id) as a primary key would be helpful in many scenarios. Let me know if more details are required.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
Re: [DISCUSS][FLINK-33240] Document deprecated options as well
Thanks for the proposal,
+1 for adding a deprecated identifier.

[Thought] Can we have a separate section / page for deprecated configs? Wdyt?

Bests,
Samrat

On Tue, 31 Oct 2023 at 3:44 PM, Alexander Fedulov <
alexander.fedu...@gmail.com> wrote:

> Hi Zhanghao,
>
> Thanks for the proposition.
> In general +1, this sounds like a good idea as long it is clear that the
> usage of these settings is discouraged.
> Just one minor concern - the configuration page is already very long, do
> you have a rough estimate of how many more options would be added with
> this change?
>
> Best,
> Alexander Fedulov
>
> On Mon, 30 Oct 2023 at 18:24, Matthias Pohl .invalid>
> wrote:
>
> > Thanks for your proposal, Zhanghao Chen. I think it adds more
> > transparency to the configuration documentation.
> >
> > +1 from my side on the proposal
> >
> > On Wed, Oct 11, 2023 at 2:09 PM Zhanghao Chen
> > wrote:
> >
> > > Hi Flink users and developers,
> > >
> > > Currently, Flink won't generate doc for the deprecated options. This
> > > might confuse users when upgrading from an older version of Flink:
> > > they have to either carefully read the release notes or check the
> > > source code for upgrade guidance on deprecated options.
> > >
> > > I propose to document deprecated options as well, with a
> > > "(deprecated)" tag placed at the beginning of the option description
> > > to highlight the deprecation status [1].
> > >
> > > Looking forward to your feedbacks on it.
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-33240
> > >
> > > Best,
> > > Zhanghao Chen
> > >
> >
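The proposed "(deprecated)" tag is straightforward to apply at doc-generation time. Here is a minimal sketch of the idea with hypothetical names; it is not Flink's actual configuration docs generator, only an illustration of the tagging proposed in FLINK-33240.

```java
public class ConfigDocSketch {

    // Hypothetical renderer: prefix the description of a deprecated option
    // with a "(deprecated)" tag, as proposed in FLINK-33240. Non-deprecated
    // descriptions are passed through unchanged.
    public static String renderDescription(String description, boolean deprecated) {
        return deprecated ? "(deprecated) " + description : description;
    }

    public static void main(String[] args) {
        System.out.println(renderDescription("Defines the restart strategy to use.", true));
    }
}
```

A separate section or page for deprecated options, as suggested above, could be built on the same flag by partitioning options into two lists before rendering.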
[jira] [Created] (FLINK-33424) Resolve the problem that yarnClient cannot load yarn configurations
zhengzhili created FLINK-33424:
-------------------------------

Summary: Resolve the problem that yarnClient cannot load yarn configurations
Key: FLINK-33424
URL: https://issues.apache.org/jira/browse/FLINK-33424
Project: Flink
Issue Type: Bug
Components: Client / Job Submission
Affects Versions: 1.17.1
Reporter: zhengzhili
Fix For: 1.19.0

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
Re: [DISCUSS] FLIP-379: Dynamic source parallelism inference for batch jobs
Hi Xia,

Thanks for driving this FLIP, +1 for the proposal.

I have 2 questions about the relationship between static inference and dynamic inference:

1. AFAIK, the Hive table source currently enables static inference by default. In this case, which one (static vs. dynamic) will take effect? I think it would be better if we point this out in the FLIP.

2. As you mentioned above, dynamic inference is the most ideal way, so do we have a plan to deprecate static inference in the future?

Best,
Lijie

Zhu Zhu 于2023年10月31日周二 20:19写道:

> Thanks for opening the FLIP and kicking off this discussion, Xia!
> The proposed changes make up an important missing part of the dynamic
> parallelism inference of adaptive batch scheduler.
>
> Besides that, it is also one good step towards supporting dynamic
> parallelism inference for streaming sources, e.g. allowing Kafka
> sources to determine its parallelism automatically based on the
> number of partitions.
>
> +1 for the proposal.
>
> Thanks,
> Zhu
>
> Xia Sun 于2023年10月31日周二 16:01写道:
>
> > Hi everyone,
> > I would like to start a discussion on FLIP-379: Dynamic source
> > parallelism inference for batch jobs[1].
> >
> > In general, there are three main ways to set source parallelism for
> > batch jobs:
> > (1) User-defined source parallelism.
> > (2) Connector static parallelism inference.
> > (3) Dynamic parallelism inference.
> >
> > Compared to manually setting parallelism, automatic parallelism
> > inference is easier to use and can better adapt to varying data
> > volumes each day. However, static parallelism inference cannot
> > leverage runtime information, resulting in inaccurate parallelism
> > inference. Therefore, for batch jobs, dynamic parallelism inference
> > is the most ideal, but currently, the support for adaptive batch
> > scheduler is not very comprehensive.
> >
> > Therefore, we aim to introduce a general interface that enables the
> > adaptive batch scheduler to dynamically infer the source parallelism
> > at runtime. Please refer to the FLIP[1] document for more details
> > about the proposed design and implementation.
> >
> > I also thank Zhu Zhu and LiJie Wang for their suggestions during the
> > pre-discussion.
> > Looking forward to your feedback and suggestions, thanks.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs
> >
> > Best regards,
> > Xia
> >
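To make the idea of dynamic inference concrete, here is a minimal sketch of how a scheduler could derive a source parallelism from runtime information such as the observed input size. The formula, names, and constants are assumptions for illustration only, not the actual adaptive batch scheduler logic or the interface proposed in FLIP-379.

```java
public class SourceParallelismSketch {

    // Hypothetical inference: one task per bytesPerTask of input, clamped
    // to the range [1, maxParallelism]. Real dynamic inference would use
    // information reported by the source and scheduler at runtime.
    public static int inferParallelism(long inputBytes, long bytesPerTask, int maxParallelism) {
        long tasks = (inputBytes + bytesPerTask - 1) / bytesPerTask; // ceiling division
        return (int) Math.max(1, Math.min(tasks, maxParallelism));
    }

    public static void main(String[] args) {
        // 10 GiB of input with 1 GiB per task, capped at 8 -> parallelism 8
        System.out.println(inferParallelism(10L << 30, 1L << 30, 8));
    }
}
```

The point of the FLIP is precisely that inputs to such a computation (data volume, partition counts) are only known reliably at runtime, which is why a static, compile-time inference tends to be inaccurate.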
[jira] [Created] (FLINK-33423) Resolve the problem that yarnClient cannot load yarn configurations
zhengzhili created FLINK-33423:
-------------------------------

Summary: Resolve the problem that yarnClient cannot load yarn configurations
Key: FLINK-33423
URL: https://issues.apache.org/jira/browse/FLINK-33423
Project: Flink
Issue Type: Bug
Components: Client / Job Submission
Reporter: zhengzhili

--
This message was sent by Atlassian Jira
(v8.20.10#820010)