[jira] [Created] (FLINK-31194) Introduces savepoint mechanism of Table Store
Nicholas Jiang created FLINK-31194: -- Summary: Introduces savepoint mechanism of Table Store Key: FLINK-31194 URL: https://issues.apache.org/jira/browse/FLINK-31194 Project: Flink Issue Type: New Feature Components: Table Store Affects Versions: table-store-0.4.0 Reporter: Nicholas Jiang Fix For: table-store-0.4.0 Disaster recovery is mission critical for any software. Especially when it comes to data systems, the impact can be serious, leading to delayed or even wrong business decisions. Flink Table Store could introduce a savepoint mechanism to assist users in recovering data from a previous state. As the name suggests, a "savepoint" saves the table as of a given snapshot, so that you can restore the table to this savepoint at a later point in time if need be. Care is taken to ensure that the cleaner will not clean up any files that are savepointed. Along the same lines, a savepoint cannot be triggered on a snapshot that has already been cleaned up. In simpler terms, this is synonymous with taking a backup, except that we don't make a new copy of the table; we just save the state of the table elegantly so that we can restore it later when in need. -- This message was sent by Atlassian Jira (v8.20.10#820010)
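For illustration only, here is a minimal Java sketch of the semantics the ticket describes — a savepoint pins a snapshot so the cleaner skips its files, and cannot be created from an already-cleaned snapshot. The class, methods, and bookkeeping are hypothetical, not the actual Table Store API:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of savepoint semantics (not the Table Store API):
// a savepoint pins a snapshot id so the snapshot cleaner skips its files,
// and a savepoint cannot be created from an already-cleaned snapshot.
class SavepointManager {
    private final Set<Long> savepointedSnapshots = new HashSet<>();
    private final Set<Long> liveSnapshots;

    SavepointManager(Set<Long> liveSnapshots) {
        this.liveSnapshots = liveSnapshots;
    }

    void createSavepoint(long snapshotId) {
        if (!liveSnapshots.contains(snapshotId)) {
            throw new IllegalArgumentException(
                    "Snapshot " + snapshotId + " was already cleaned up");
        }
        savepointedSnapshots.add(snapshotId);
    }

    /** The cleaner must consult this before deleting a snapshot's files. */
    boolean isPinned(long snapshotId) {
        return savepointedSnapshots.contains(snapshotId);
    }
}
```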
[jira] [Created] (FLINK-31193) The option `table.exec.hive.native-agg-function.enabled` should work at job level when used on the SqlClient side
dalongliu created FLINK-31193: - Summary: The option `table.exec.hive.native-agg-function.enabled` should work at job level when used on the SqlClient side Key: FLINK-31193 URL: https://issues.apache.org/jira/browse/FLINK-31193 Project: Flink Issue Type: Sub-task Components: Connectors / Hive Affects Versions: 1.17.0 Reporter: dalongliu Fix For: 1.18.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [DISCUSS] FLIP-296: Watermark options for table API & SQL
Hi Kui, Thanks for your answer, and +1 to yuxia too. > we should not bind the watermark-related options to a connector to ensure semantic clarity. In my opinion, adding watermark-related options to a connector is much clearer. Currently users can define a simple watermark strategy in DDL; adding more configuration items in the connector options is easy to understand. Best, Shammon On Thu, Feb 23, 2023 at 10:52 AM Jingsong Li wrote: > Thanks for your proposal. > > +1 to yuxia, consider watermark-related hints as option hints. > > Personally, I am cautious about adding SQL syntax; WATERMARK_PARAMS is > also SQL syntax to some extent. > > We can use OPTIONS to meet this requirement if possible. > > Best, > Jingsong
[jira] [Created] (FLINK-31192) dataGen takes too long to initialize under sequence
xzw0223 created FLINK-31192: --- Summary: dataGen takes too long to initialize under sequence Key: FLINK-31192 URL: https://issues.apache.org/jira/browse/FLINK-31192 Project: Flink Issue Type: Improvement Affects Versions: 1.16.1, 1.16.0 Reporter: xzw0223 Fix For: 1.16.1, 1.16.0 The SequenceGenerator preloads all sequence values in open(). If the totalElement number is too large, this takes too long. [https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/datagen/SequenceGenerator.java#L91] The reason is that the capacity of the Deque is doubled whenever the current capacity is full, which requires an array copy and is time-consuming. Here is what I propose: do not preload the full amount of data for a sequence; instead, generate one element each time next() is called, which solves the slow initialization caused by loading the full amount of data. Record the currently emitted sequence position in the checkpoint, and continue sending data from the recorded position after an abnormal restart to ensure fault tolerance. -- This message was sent by Atlassian Jira (v8.20.10#820010)
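A simplified sketch of the proposed behavior — the names here are illustrative, not the actual SequenceGenerator API. Values are derived lazily from a cursor instead of being preloaded into a Deque in open(), and the cursor is what gets checkpointed for fault tolerance:

```java
// Illustrative sketch only (not the actual SequenceGenerator API).
public class LazySequence {
    private final long start;
    private final long end;
    private long nextOffset; // restored from checkpointed state on restart

    public LazySequence(long start, long end, long restoredOffset) {
        this.start = start;
        this.end = end;
        this.nextOffset = restoredOffset;
    }

    public boolean hasNext() {
        return start + nextOffset <= end;
    }

    public long next() {
        // O(1) per element; no O(totalElement) preload in open()
        return start + nextOffset++;
    }

    public long snapshotOffset() {
        // persisted on checkpoint; after an abnormal restart the generator
        // resumes from here instead of re-emitting earlier values
        return nextOffset;
    }
}
```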
[jira] [Created] (FLINK-31191) VectorIndexer should check whether doublesByColumn is null before snapshot
Zhipeng Zhang created FLINK-31191: - Summary: VectorIndexer should check whether doublesByColumn is null before snapshot Key: FLINK-31191 URL: https://issues.apache.org/jira/browse/FLINK-31191 Project: Flink Issue Type: Bug Components: Library / Machine Learning Affects Versions: ml-2.2.0 Reporter: Zhipeng Zhang Currently VectorIndexer can lead to an NPE when taking a checkpoint. It should check whether `doublesByColumn` is null before calling snapshot. logview: [https://github.com/apache/flink-ml/actions/runs/4249415318/jobs/7389547039] details:
Caused by: java.lang.NullPointerException
    at org.apache.flink.ml.feature.vectorindexer.VectorIndexer$ComputeDistinctDoublesOperator.convertToListArray(VectorIndexer.java:232)
    at org.apache.flink.ml.feature.vectorindexer.VectorIndexer$ComputeDistinctDoublesOperator.snapshotState(VectorIndexer.java:228)
    at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:222)
    ... 33 more
-- This message was sent by Atlassian Jira (v8.20.10#820010)
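A minimal sketch of the proposed guard; the state handle name `doublesState` and the surrounding operator plumbing are assumptions — only `convertToListArray`, `doublesByColumn`, and the `snapshotState` entry point come from the stack trace above:

```java
// Hedged sketch of the fix; doublesState is an assumed state handle name.
@Override
public void snapshotState(StateSnapshotContext context) throws Exception {
    super.snapshotState(context);
    // A checkpoint may fire before the operator has processed any input,
    // in which case doublesByColumn is still null; guard the conversion
    // instead of letting convertToListArray throw an NPE.
    if (doublesByColumn != null) {
        doublesState.update(convertToListArray(doublesByColumn));
    }
}
```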
Re: [DISCUSS] FLIP-297: Improve Auxiliary Sql Statements
Hi Jingsong, thanks, I got it. In this way, there is no need to introduce new API changes. Best Regards, Ran Tao On Thu, Feb 23, 2023 at 12:26 PM, Jingsong Li wrote: > Hi Ran, > > I mean we can just use > TableEnvironment.getCatalog(getCurrentCatalog).get().listDatabases(). > > We don't need to provide new APIs just for utils. > > Best, > Jingsong
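For reference, a minimal runnable sketch of the approach Jingsong describes, using only the existing public API (`TableEnvironment.getCatalog` returns an `Optional<Catalog>`); the streaming-mode setup is just for the example:

```java
import java.util.List;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.Catalog;

public class ListDatabasesViaCatalogApi {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());
        // Resolve the current catalog and list its databases directly;
        // no new TableEnvironment method is required.
        Catalog catalog =
                tEnv.getCatalog(tEnv.getCurrentCatalog())
                        .orElseThrow(() -> new IllegalStateException("no current catalog"));
        List<String> databases = catalog.listDatabases();
        databases.forEach(System.out::println);
    }
}
```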
[jira] [Created] (FLINK-31190) Supports Spark call procedure command on Table Store
Nicholas Jiang created FLINK-31190: -- Summary: Supports Spark call procedure command on Table Store Key: FLINK-31190 URL: https://issues.apache.org/jira/browse/FLINK-31190 Project: Flink Issue Type: New Feature Components: Table Store Affects Versions: table-store-0.4.0 Reporter: Nicholas Jiang Fix For: table-store-0.4.0 At present, Hudi and Iceberg support the Spark call procedure command to execute table service actions etc. Flink Table Store could also support the Spark call procedure command to run compaction etc. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [DISCUSS] FLIP-297: Improve Auxiliary Sql Statements
Hi Ran, I mean we can just use TableEnvironment.getCatalog(getCurrentCatalog).get().listDatabases(). We don't need to provide new APIs just for utils. Best, Jingsong On Thu, Feb 23, 2023 at 12:11 PM Ran Tao wrote: > > Hi Jingsong, thanks. > > The implementation of these statements in TableEnvironmentImpl is called > through the catalog API. > > But it does need to support some new override methods on the catalog API side, and > I will update it later. Thank you. > > e.g. > TableEnvironmentImpl > @Override > public String[] listDatabases() { > return catalogManager > .getCatalog(catalogManager.getCurrentCatalog()) > .get() > .listDatabases() > .toArray(new String[0]); > } > > Best Regards, > Ran Tao
Re: [DISCUSS] FLIP-297: Improve Auxiliary Sql Statements
Hi Jingsong, thanks. The implementation of these statements in TableEnvironmentImpl is called through the catalog API. But it does need to support some new override methods on the catalog API side, and I will update it later. Thank you. e.g. TableEnvironmentImpl

@Override
public String[] listDatabases() {
    return catalogManager
            .getCatalog(catalogManager.getCurrentCatalog())
            .get()
            .listDatabases()
            .toArray(new String[0]);
}

Best Regards, Ran Tao On Thu, Feb 23, 2023 at 11:47 AM, Jingsong Li wrote: > Thanks for the proposal. > > +1 for the proposal. > > I am confused about "Proposed TableEnvironment SQL API Changes"; can > we just use the catalog API for this requirement? > > Best, > Jingsong
RE: Re: Re: [discussion] To introduce a formatter for Markdown files
Hi Jing, Prettier is a versatile formatting tool with very limited options for markdown files. If we need to specify the rules, I think other tools, such as markdownlint [1], can be a better choice. With markdownlint, we can specify many kinds of rules. Here is an example for the `unordered list` rule, whose style ID is MD004 [2].

```json
{
  "default": true,
  "MD004": { "style": "asterisk" }
}
```

Then markdown files will only accept '*' for making a list item. The specific formatting tool may be left to a vote later. [1] https://github.com/DavidAnson/markdownlint [2] https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md On 2023/02/22 21:22:02 Jing Ge wrote: > Hi Zhongpu, > > Thanks for the clarification. Prettier looks fine for formatting md files. > Is there any way to validate rules, e.g. only using '*' for listing > items, in the CI pipeline? > > Best regards, > Jing
Re: [DISCUSS] FLIP-297: Improve Auxiliary Sql Statements
Thanks for the proposal. +1 for the proposal. I am confused about "Proposed TableEnvironment SQL API Changes"; can we just use the catalog API for this requirement? Best, Jingsong On Thu, Feb 23, 2023 at 10:48 AM Jacky Lau wrote: > > Hi Ran: > Thanks for driving the FLIP. The google doc looks really good. It is > important to improve the user's interactive experience. +1 to support this > feature.
Re: [DISCUSS] FLIP-297: Improve Auxiliary Sql Statements
Thanks Jing. There is a small mistake: useLike means whether to enable LIKE. e.g.

SHOW TABLES from cat1.db1 not like 't%' // useLike is true, notLike is true
SHOW TABLES from cat1.db1 like 't%' // useLike is true, notLike is false

In both cases useLike is true. I have updated the FLIP for this. Question 2: I think we currently do not support 'ILIKE' in this FLIP. Because other statements are currently based on the implementation of LIKE, the implementation of this FLIP is compatible with other SQL syntax. Also, in the doc in this FLIP you can find that some other engines only support LIKE. What do you think? Best Regards, Ran Tao https://github.com/chucheng92 On Thu, Feb 23, 2023 at 00:50, Jing Ge wrote: > Hi Ran, > > Thanks for driving the FLIP. It looks overall good. Would you like to add > a description of useLike and notLike? I guess useLike true is for "LIKE" > and notLike true is for "NOT LIKE" but I am not sure if I understood it > correctly. Furthermore, does it make sense to support "ILIKE" too? > > Best regards, > Jing
[jira] [Created] (FLINK-31189) Allow ignoring less frequent values in StringIndexer
Fan Hong created FLINK-31189: Summary: Allow ignoring less frequent values in StringIndexer Key: FLINK-31189 URL: https://issues.apache.org/jira/browse/FLINK-31189 Project: Flink Issue Type: Improvement Components: Library / Machine Learning Reporter: Fan Hong -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [DISCUSS] FLIP-296: Watermark options for table API & SQL
Thanks for your proposal. +1 to yuxia, consider watermark-related hints as option hints. Personally, I am cautious about adding SQL syntax; WATERMARK_PARAMS is also SQL syntax to some extent. We can use OPTIONS to meet this requirement if possible. Best, Jingsong On Thu, Feb 23, 2023 at 10:41 AM yuxia wrote: > > Hi, Yuan Kui. > Thanks for driving it. > IMO, the 'OPTIONS' hint may not be specific only to the connector options. > Just as a reference, we also have `sink.parallelism`[1] as a connector > option. It enables users to specify the writer's parallelism dynamically per-query. > > Personally, I prefer to consider watermark-related hints as option hints. So, > users can define a default watermark strategy for the table, and if they > don't need to change it, they need to do nothing in their query instead of > specifying it every time. > > [1] > https://nightlies.apache.org/flink/flink-docs-master/zh/docs/connectors/table/filesystem/#sink-parallelism > > Best regards, > Yuxia
Re: [DISCUSS] FLIP-297: Improve Auxiliary Sql Statements
Hi Ran: Thanks for driving the FLIP. The google doc looks really good. It is important to improve the user's interactive experience. +1 to support this feature. On Thu, Feb 23, 2023 at 00:51, Jing Ge wrote: > Hi Ran, > > Thanks for driving the FLIP. It looks overall good. Would you like to add > a description of useLike and notLike? I guess useLike true is for "LIKE" > and notLike true is for "NOT LIKE" but I am not sure if I understood it > correctly. Furthermore, does it make sense to support "ILIKE" too? > > Best regards, > Jing
Re: [DISCUSS] FLIP-296: Watermark options for table API & SQL
Hi, Yuan Kui. Thanks for driving it. IMO, the 'OPTIONS' hint may not be specific only to the connector options. Just as a reference, we also have `sink.parallelism`[1] as a connector option. It enables users to specify the writer's parallelism dynamically per-query. Personally, I prefer to consider watermark-related hints as option hints. So, users can define a default watermark strategy for the table, and if they don't need to change it, they need to do nothing in their query instead of specifying it every time. [1] https://nightlies.apache.org/flink/flink-docs-master/zh/docs/connectors/table/filesystem/#sink-parallelism Best regards, Yuxia ----- Original Message ----- From: "kui yuan" To: "dev" Cc: "Jark Wu" Sent: Wednesday, February 22, 2023, 10:08:11 PM Subject: Re: [DISCUSS] FLIP-296: Watermark options for table API & SQL Hi all, Thanks for the lively discussion; I will respond to these questions one by one. However, there are also some common questions, which I will answer together. @郑 Thanks for your reply. The features mentioned in this FLIP are only for those source connectors that implement the SupportsWatermarkPushDown interface; generating watermarks at other locations in the graph is not in the scope of this discussion. Perhaps another FLIP can be proposed later to implement this feature. @Shammon Thanks for your reply. In FLIP-296, a rejected alternative is adding watermark-related options in the connector options; we believe that we should not bind the watermark-related options to a connector, to ensure semantic clarity. > What will happen if we add watermark related options in `the connector > options`? Will the connector ignore these options or throw an exception? > How can we support this? If a user defines different watermark configurations for one table in two places, I tend to prefer that the first place prevails, but we could also throw an exception or just print logs to prompt the user; these are implementation details. > If one table is used by two operators with different watermark params, > what will happen? @Martijn Thanks for your reply. I'm sorry that we were not particularly accurate; this hint is mainly for SQL, not the Table API. > While the FLIP talks about watermark options for Table API & SQL, I only > see proposed syntax for SQL, not for the Table API. What is your proposal > for the Table API @Jane Thanks for your reply. For the first question, if the user uses this hint on sources that do not implement the SupportsWatermarkPushDown interface, it will be completely invalid. The task will run as normal, as if the hint had not been used. > What's the behavior if there are multiple table sources, among which > some do not support `SupportsWatermarkPushDown`? @Jane's feedback is that 'WATERMARK_PARAMS' is difficult to remember; perhaps the naming issue can be put to the end of the discussion, because more people like @Martijn @Shuo are considering whether these configurations should be put into the DDL or the 'OPTIONS' hint. Here is what I think: putting these configs into DDL or putting them into the 'OPTIONS' hint is actually the same thing, because the 'OPTIONS' hint is mainly used to configure the properties of the connector. The reason why I want to use a new hint is to keep the semantics clear; in my opinion the configuration of the watermark should not be mixed up with the connector. However, a new hint does make it somewhat more difficult to use, for example, when a user uses both the 'OPTIONS' hint and the 'WATERMARK_PARAMS' hint. For this point, maybe it is more appropriate to use a uniform 'OPTIONS' hint. On the other hand, if we enrich more watermark option keys in 'OPTIONS' hints, the question will be how we treat the definition of the 'OPTIONS' hint: is it specific only to the connector options, or could it be more? Maybe @Jark could share more insights here. In my opinion, 'OPTIONS' is only related to the connector options, which is not like the general watermark options.
[jira] [Created] (FLINK-31188) Expose kubernetes scheduler configOption when running flink on kubernetes
Kelu Tao created FLINK-31188: Summary: Expose kubernetes scheduler configOption when running flink on kubernetes Key: FLINK-31188 URL: https://issues.apache.org/jira/browse/FLINK-31188 Project: Flink Issue Type: Improvement Components: Deployment / Kubernetes Reporter: Kelu Tao Currently, when we deploy a Flink job on Kubernetes, the default Kubernetes scheduler is used. But users sometimes need a custom Kubernetes scheduler. Can we add a config option for the Kubernetes scheduler setting? Thanks. -- This message was sent by Atlassian Jira (v8.20.10#820010)
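A sketch of what such an option could look like; the key `kubernetes.pod-scheduler-name`, the class name, and the description are assumptions, not an agreed-upon Flink option:

```java
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

// Hypothetical option definition in the usual Flink ConfigOptions style.
public class KubernetesSchedulerOptions {
    public static final ConfigOption<String> POD_SCHEDULER_NAME =
            ConfigOptions.key("kubernetes.pod-scheduler-name")
                    .stringType()
                    .noDefaultValue()
                    .withDescription(
                            "Name of the custom Kubernetes scheduler assigned to "
                                    + "JobManager and TaskManager pods via "
                                    + "spec.schedulerName.");
}
```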
Re: Re: [discussion] To introduce a formatter for Markdown files
Hi Zhongpu, Thanks for the clarification. Prettier looks fine for formatting md files. Is there any way to validate rules, e.g. only using '*' for listing items, in the CI pipeline? Best regards, Jing On Wed, Feb 22, 2023 at 2:28 PM Zhongpu Chen wrote: > Hi Jing, > > Sorry for the last reply in a messed-up formatting, as I am not quite > familiar with the mailing list usage. > > First, as long as contributors install Prettier with the same version, > markdown files can be auto-formatted in the same way without extra work > via the `prettier --write **/*.md` command, and this can also be set as > part of a pre-commit hook. To check whether markdown files satisfy the formatting > requirements, we can use `prettier --check **/*.md`, where exit code 0 means > the files do follow the styles. > > Secondly, the format check can be integrated with the CI pipeline easily. > Here is an example using GitHub Actions, and I think the setting in Azure > pipelines is similar. > > ```yml > jobs: > Check-MD-Format-Actions: > runs-on: ubuntu-latest > steps: > - name: Check out repo code > uses: actions/checkout@v3 > - name: Use Node.js > uses: actions/setup-node@v3 > with: > node-version: "12.x" > - name: Install prettier > run: npm i -g prettier > - name: Prettify code > run: prettier --check **/*.md > ``` > > At last, it will never cause any conflict. Prettier is compatible with > CommonMark (https://commonmark.org/) and GitHub Flavored Markdown.
[jira] [Created] (FLINK-31187) Standalone HA mode does not work if dynamic properties are supplied
Mate Czagany created FLINK-31187: Summary: Standalone HA mode does not work if dynamic properties are supplied Key: FLINK-31187 URL: https://issues.apache.org/jira/browse/FLINK-31187 Project: Flink Issue Type: Bug Components: Kubernetes Operator Affects Versions: kubernetes-operator-1.4.0 Reporter: Mate Czagany Attachments: standalone-ha.yaml With FLINK-30518, '--host $(POD_IP)' has been added to the arguments of the JMs, which fixes the issue with HA in standalone mode, but it always gets appended to the end of the final JM arguments: https://github.com/usamj/flink-kubernetes-operator/blob/72ec9d384def3091ce50c2a3e2a06cded3b572e6/flink-kubernetes-standalone/src/main/java/org/apache/flink/kubernetes/operator/kubeclient/decorators/CmdStandaloneJobManagerDecorator.java#L107 This will not be parsed properly if any dynamic properties were set in the arguments, e.g.: {code:java} Program Arguments: --configDir /opt/flink/conf -D jobmanager.memory.off-heap.size=134217728b -D jobmanager.memory.jvm-overhead.min=201326592b -D jobmanager.memory.jvm-metaspace.size=268435456b -D jobmanager.memory.heap.size=469762048b -D jobmanager.memory.jvm-overhead.max=201326592b --job-classname org.apache.flink.streaming.examples.statemachine.StateMachineExample --test test --host 172.17.0.11{code} You can verify this bug by using the YAML I've attached; in the JM logs you can see this line: {code:java} Remoting started; listening on addresses :[akka.tcp://flink@flink-example-statemachine.flink:6123]{code} Without any program arguments supplied this would correctly be: {code:java} Remoting started; listening on addresses :[akka.tcp://flink@172.17.0.8:6123]{code} I believe this could be easily fixed by putting the --host parameter before JobSpec.args. If a committer can assign this ticket to me, I can create a PR for this. -- This message was sent by Atlassian Jira (v8.20.10#820010)
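A simplified sketch of the proposed argument ordering (the real change would live in CmdStandaloneJobManagerDecorator; method and variable names here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: --host must precede the user-supplied JobSpec.args so that
// trailing "-D key=value" pairs cannot swallow its value during parsing.
public class JobManagerArgsSketch {
    static List<String> buildArgs(String confDir, String podIp, List<String> userArgs) {
        List<String> cmdArgs = new ArrayList<>();
        cmdArgs.add("--configDir");
        cmdArgs.add(confDir);
        cmdArgs.add("--host");
        cmdArgs.add(podIp);       // e.g. resolved from the POD_IP env var
        cmdArgs.addAll(userArgs); // JobSpec.args appended last
        return cmdArgs;
    }
}
```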
Re: [DISCUSS] FLIP-297: Improve Auxiliary Sql Statements
Hi Ran, Thanks for driving the FLIP. It looks overall good. Would you like to add a description of useLike and notLike? I guess useLike true is for "LIKE" and notLike true is for "NOT LIKE" but I am not sure if I understood it correctly. Furthermore, does it make sense to support "ILIKE" too? Best regards, Jing On Wed, Feb 22, 2023 at 1:17 PM Ran Tao wrote: > Currently Flink SQL auxiliary statements have supported some good features > such as catalog/database/table support. > > But these features are not very complete compared with other popular > engines such as Spark, Presto, Hive and commercial engines such as > Snowflake. > > For example, many engines support the show operation with filtering, except > Flink, and support describing other objects (Flink only supports describe > table). > > I wonder whether we can add these useful features to Flink? > You can find details in this doc.[1] or FLIP.[2] > > Also, please let me know if there is a mistake. Looking forward to your > reply. > > [1] > > https://docs.google.com/document/d/1hAiOfPx14VTBTOlpyxG7FA2mB1k5M31VnKYad2XpJ1I/ > [2] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-297%3A+Improve+Auxiliary+Sql+Statements > > Best Regards, > Ran Tao >
Re: [ANNOUNCE] New Apache Flink Committer - Sergey Nuyanzin
Congrats Sergey! Well deserved!!! Best Regards, Andriy Redko On Wed, Feb 22, 2023, 9:03 AM Samrat Deb wrote: > Congratulations Sergey
Re: [DISCUSS] FLIP-296: Watermark options for table API & SQL
Hi all, Thanks for the lively discussion; I will respond to these questions one by one. However, there are also some common questions, which I will answer together. @郑 Thanks for your reply. The features mentioned in this FLIP are only for those source connectors that implement the SupportsWatermarkPushDown interface; generating watermarks at other locations in the graph is not in the scope of this discussion. Perhaps another FLIP can be proposed later to implement this feature. @Shammon Thanks for your reply. In FLIP-296, a rejected alternative is adding watermark-related options in the connector options; we believe that we should not bind the watermark-related options to a connector, to ensure semantic clarity. > What will happen if we add watermark related options in `the connector > options`? Will the connector ignore these options or throw an exception? > How can we support this? If a user defines different watermark configurations for one table in two places, I tend to prefer that the first place prevails, but we could also throw an exception or just print logs to prompt the user; these are implementation details. > If one table is used by two operators with different watermark params, > what will happen? @Martijn Thanks for your reply. I'm sorry that we were not particularly accurate; this hint is mainly for SQL, not the Table API. > While the FLIP talks about watermark options for Table API & SQL, I only > see proposed syntax for SQL, not for the Table API. What is your proposal > for the Table API @Jane Thanks for your reply. For the first question, if the user uses this hint on sources that do not implement the SupportsWatermarkPushDown interface, it will be completely invalid. The task will run as normal, as if the hint had not been used. > What's the behavior if there are multiple table sources, among which > some do not support `SupportsWatermarkPushDown`? @Jane's feedback is that 'WATERMARK_PARAMS' is difficult to remember; perhaps the naming issue can be put to the end of the discussion, because more people like @Martijn @Shuo are considering whether these configurations should be put into the DDL or the 'OPTIONS' hint. Here is what I think: putting these configs into DDL or putting them into the 'OPTIONS' hint is actually the same thing, because the 'OPTIONS' hint is mainly used to configure the properties of the connector. The reason why I want to use a new hint is to keep the semantics clear; in my opinion the configuration of the watermark should not be mixed up with the connector. However, a new hint does make it somewhat more difficult to use, for example, when a user uses both the 'OPTIONS' hint and the 'WATERMARK_PARAMS' hint. For this point, maybe it is more appropriate to use a uniform 'OPTIONS' hint. On the other hand, if we enrich more watermark option keys in 'OPTIONS' hints, the question will be how we treat the definition of the 'OPTIONS' hint: is it specific only to the connector options, or could it be more? Maybe @Jark could share more insights here. In my opinion, 'OPTIONS' is only related to the connector options, which is not like the general watermark options. On Wed, Feb 22, 2023 at 19:17, Shuo Cheng wrote: > Hi Kui, > > Thanks for driving the discussion. It's quite useful to introduce watermark > options. I have some questions: > > What kind of hint is "WATERMARK_PARAMS"? > Currently, we have two kinds of hints in Flink: Dynamic Table Options & > Query Hints. As described in the FLIP, "WATERMARK_PARAMS" is more like > Dynamic Table Options. So two questions arise here: > > 1) Are these watermark options to be exposed as connector WITH options? As > described in the SQL Hints doc[1], "Dynamic Table Options allow to specify or > override table options dynamically", which implies that these options can > also be configured in WITH options. > > 2) Do we really need a new hint name like 'WATERMARK_PARAMS'? Table > options use "OPTIONS" as the hint name, like '/*+ OPTIONS('csv.ignore-parse-errors'='true') */'; maybe we can enrich more > table option keys for watermark, e.g., /*+ OPTIONS('watermark.emit-strategy'='ON_PERIODIC') */. > > [1] > https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/sql/queries/hints/ > > On Wed, Feb 22, 2023 at 10:22 AM kui yuan wrote: > > Hi devs, > > > > I'd like to start a discussion thread for FLIP-296[1]. This comes from an > > offline discussion with @Yun Tang, and we hope to enrich Table API & SQL to > > support many watermark-related features which were previously only implemented at > > the DataStream API level. > > > > Basically, we want to introduce watermark options in Table API & SQL via a > > SQL hint named 'WATERMARK_PARAMS' to support these features: > > 1. Configurable watermark emit strategy > > 2. Dealing with idle sources > > 3. Watermark alignment > > > > Last but not least, thanks to Qingsheng and Jing Zhang for the initial > > reviews. > > > > Looking forward to your thoughts and any feedback is appreciated!
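For context, a sketch of the DataStream-level counterparts of the three features the FLIP wants to surface in Table API & SQL; `Event` is a placeholder type and the alignment group name is arbitrary:

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;

public class WatermarkFeaturesSketch {
    // Placeholder for any POJO with an event-time field.
    public static class Event {}

    public static void main(String[] args) {
        WatermarkStrategy<Event> strategy =
                WatermarkStrategy.<Event>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                        // 2. dealing with idle sources
                        .withIdleness(Duration.ofMinutes(1))
                        // 3. watermark alignment across sources in one group
                        .withWatermarkAlignment("alignment-group", Duration.ofSeconds(20));
        // 1. the emit strategy: periodic emission is governed by
        // pipeline.auto-watermark-interval at the job level.
        System.out.println(strategy);
    }
}
```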
Re: [ANNOUNCE] New Apache Flink Committer - Sergey Nuyanzin
Congratulations Sergey On Wed, Feb 22, 2023 at 3:31 PM Hang Ruan wrote: > Congratulations Sergey! > > Jane Chan 于2023年2月22日周三 15:53写道: > > > Congratulations, Sergey! > > > > Best regards, > > Jane > > > > > > On Wed, Feb 22, 2023 at 9:44 AM Dian Fu wrote: > > > > > Congratulations Sergey! > > > > > > On Tue, Feb 21, 2023 at 9:07 PM Rui Fan wrote: > > > > > > > Congratulations, Sergey! > > > > > > > > Best > > > > Rui Fan > > > > > > > > On Tue, Feb 21, 2023 at 20:55 Junrui Lee > wrote: > > > > > > > > > Congratulations Sergey! > > > > > > > > > > weijie guo 于2023年2月21日周二 20:54写道: > > > > > > > > > > > Congratulations, Sergey~ > > > > > > > > > > > > Best regards, > > > > > > > > > > > > Weijie > > > > > > > > > > > > > > > > > > Jing Ge 于2023年2月21日周二 20:52写道: > > > > > > > > > > > > > congrats Sergey! > > > > > > > > > > > > > > On Tue, Feb 21, 2023 at 1:15 PM Matthias Pohl > > > > > > > wrote: > > > > > > > > > > > > > > > Congratulations, Sergey! Good job & well-deserved! :) > > > > > > > > > > > > > > > > On Tue, Feb 21, 2023 at 1:03 PM yuxia < > > > luoyu...@alumni.sjtu.edu.cn > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Congratulations Sergey! > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > Yuxia > > > > > > > > > > > > > > > > > > - 原始邮件 - > > > > > > > > > 发件人: "Martijn Visser" > > > > > > > > > 收件人: "dev" > > > > > > > > > 发送时间: 星期二, 2023年 2 月 21日 下午 7:58:35 > > > > > > > > > 主题: Re: [ANNOUNCE] New Apache Flink Committer - Sergey > > Nuyanzin > > > > > > > > > > > > > > > > > > Congrats Sergey, well deserved :) > > > > > > > > > > > > > > > > > > On Tue, Feb 21, 2023 at 12:53 PM Benchao Li < > > > > libenc...@apache.org> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Congratulations Sergey! > > > > > > > > > > > > > > > > > > > > Timo Walther 于2023年2月21日周二 19:51写道: > > > > > > > > > > > > > > > > > > > > > Hi everyone, > > > > > > > > > > > > > > > > > > > > > > On behalf of the PMC, I'm very happy to announce Sergey > > > > > Nuyanzin > > > > > > > as a > > > > > > > > > > > new Flink Committer. > > > > > > > > > > > > > > > > > > > > > > Sergey started contributing small improvements to the > > > project > > > > > in > > > > > > > > 2018. > > > > > > > > > > > Over the past 1.5 years, he has become more active and > > > > focused > > > > > on > > > > > > > > > adding > > > > > > > > > > > and reviewing changes to the Flink SQL ecosystem. > > > > > > > > > > > > > > > > > > > > > > Currently, he is upgrading Flink's SQL engine to the > > latest > > > > > > Apache > > > > > > > > > > > Calcite version [1][2][3] and helps in updating other > > > > > > project-wide > > > > > > > > > > > dependencies as well. > > > > > > > > > > > > > > > > > > > > > > Please join me in congratulating Sergey Nuyanzin for > > > > becoming a > > > > > > > Flink > > > > > > > > > > > Committer! > > > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > > Timo Walther (on behalf of the Flink PMC) > > > > > > > > > > > > > > > > > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-29932 > > > > > > > > > > > [2] https://issues.apache.org/jira/browse/FLINK-21239 > > > > > > > > > > > [3] https://issues.apache.org/jira/browse/FLINK-20873 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > Benchao Li > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
RE: Re: [discussion] To introduce a formatter for Markdown files
Hi Jing,

Sorry that my last reply was in messed-up formatting, as I am not quite familiar with mailing list usage.

First, as long as contributors install Prettier with the same version, markdown files can be auto-formatted in the same way without extra work via the `prettier --write **/*.md` command, and this can also be set up as part of a pre-commit hook. To check whether markdown files satisfy the formatting requirements, we can use `prettier --check **/*.md`, where an exit code of 0 means the files follow the style.

Secondly, the format check can be integrated with the CI pipeline easily. Here is an example using GitHub Actions, and I think the setup in Azure Pipelines is similar.

```yml
jobs:
  Check-MD-Format-Actions:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repo code
        uses: actions/checkout@v3
      - name: Use Node.js
        uses: actions/setup-node@v3
        with:
          node-version: "12.x"
      - name: Install prettier
        run: npm i -g prettier
      - name: Prettify code
        run: prettier --check **/*.md
```

Lastly, it should not cause any conflict: Prettier is compatible with CommonMark (https://commonmark.org/) and GitHub Flavored Markdown.

On 2023/02/22 10:22:46 Jing Ge wrote: > Hi Zhongpu, > > Thanks for starting this discussion. I was wondering how we could let every > contributor follow the same md rule. I am not familiar with Prettier. Would > you like to help me understand some basic questions? Thanks. > > Is there any way to integrate the format check into our CI pipeline? > Otherwise, it will be hard to control it. Another thing is that some docs > are written with Hugo syntax [1]. Would they work together with no conflict? > > > Best regards, > Jing > > [1] https://gohugo.io/ > > On Wed, Feb 22, 2023 at 9:50 AM Zhongpu Chen wrote: > > > As I mentioned in FLINK-31177 ( > > https://issues.apache.org/jira/browse/FLINK-31177): > > > > Currently, markdown files in *docs* are maintained and updated by many > > contributors, and different people have varying code style tastes. Besides, > > as the syntax of markdown is not really strict, the styles tend to be > > inconsistent. > > > > To name a few: > >- Some prefer `*` to make a list item, while others may prefer `-`. > >- It is common to leave many unnecessary blank lines and spaces. > >- To make a divider, the number of `-` characters can vary. > > > > To this end, I think it would be nicer to encourage or require contributors > > to format their markdown files before making a pull request. Personally, I > > think Prettier (https://prettier.io/) is a good candidate. > > What do you think? > > > > -- > > Zhongpu Chen > > >
[jira] [Created] (FLINK-31186) Removing topic from kafka source does nothing
Exidex created FLINK-31186: -- Summary: Removing topic from kafka source does nothing Key: FLINK-31186 URL: https://issues.apache.org/jira/browse/FLINK-31186 Project: Flink Issue Type: Bug Affects Versions: 1.15.3 Reporter: Exidex

As far as I can tell, there is no good way to remove a topic from the list of topics that the Kafka source consumes from. We use the {{StreamExecutionEnvironment.fromSource}} API with {{KafkaSource.setTopics}}, which accepts a list of topics, but when we remove a topic from the list, the Flink Kafka source still consumes from it even after some time. My guess is that it relates to this TODO in the code: [GitHub|https://github.com/apache/flink/blob/cc66d4855e6f8ee9986809a18f68a458bcfe3c12/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/enumerator/KafkaSourceEnumerator.java#L305] You can partially work around this by removing the whole job state or by changing the uid of the Kafka source, but that affects either the whole job or the whole source. The other way is to use the State Processor API, but it doesn't expose the source operator state, which in turn can be worked around using reflection and copying code from SourceCoordinator. None of these options is satisfactory.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
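For context, a minimal sketch of the setup described above (broker address, topic names, and group id are placeholders): a job originally built with two topics and then redeployed from a checkpoint or savepoint with one topic removed keeps consuming the removed topic, because the restored enumerator state still holds its splits.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TopicRemovalRepro {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // First deployment used .setTopics("topic-a", "topic-b"). After a
        // restart from state with the list below, splits for "topic-b" are
        // restored from the enumerator state and are never unassigned.
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")
                .setTopics("topic-a")
                .setGroupId("my-group")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source").print();
        env.execute("topic-removal-repro");
    }
}
```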
RE: Re: [discussion] To introduce a formatter for Markdown files
> I was wondering how we could let every contributor follow the same md rule.

As long as contributors install Prettier with the same version, markdown files can be auto-formatted in the same way without extra work via the `prettier --write **/*.md` command, and this can also be set up as part of a pre-commit hook. To check whether markdown files satisfy the formatting requirements, we can use `prettier --check **/*.md`, where an exit code of 0 means the files follow the style.

> Is there any way to integrate the format check into our CI pipeline?

Yes, it is possible. I have tested it using GitHub Actions, and I think Azure Pipelines also supports this.

```yml
jobs:
  Check-MD-Format-Actions:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repo code
        uses: actions/checkout@v3
      - name: Prettify code
        uses: creyD/prettier_action@v4.2
        with:
          prettier_options: --check **/*.md
```

> Another thing is that some docs are written with Hugo syntax. Would they work together with no conflict?

Well, it should not cause any conflict, since Hugo syntax is a superset of regular markdown syntax, and Prettier is compatible with CommonMark (https://commonmark.org/) and GitHub Flavored Markdown.

On 2023/02/22 10:22:46 Jing Ge wrote: > Hi Zhongpu, > > Thanks for starting this discussion. I was wondering how we could let every > contributor follow the same md rule. I am not familiar with Prettier. Would > you like to help me understand some basic questions? Thanks. > > Is there any way to integrate the format check into our CI pipeline? > Otherwise, it will be hard to control it. Another thing is that some docs > are written with Hugo syntax [1]. Would they work together with no conflict? > > > Best regards, > Jing > > [1] https://gohugo.io/ > > On Wed, Feb 22, 2023 at 9:50 AM Zhongpu Chen wrote: > > > As I mentioned in FLINK-31177 ( > > https://issues.apache.org/jira/browse/FLINK-31177): > > > > Currently, markdown files in *docs* are maintained and updated by many > > contributors, and different people have varying code style tastes. Besides, > > as the syntax of markdown is not really strict, the styles tend to be > > inconsistent. > > > > To name a few: > >- Some prefer `*` to make a list item, while others may prefer `-`. > >- It is common to leave many unnecessary blank lines and spaces. > >- To make a divider, the number of `-` characters can vary. > > > > To this end, I think it would be nicer to encourage or require contributors > > to format their markdown files before making a pull request. Personally, I > > think Prettier (https://prettier.io/) is a good candidate. > > What do you think? > > > > -- > > Zhongpu Chen > > >
[DISCUSS] FLIP-297: Improve Auxiliary Sql Statements
Currently, Flink SQL auxiliary statements support some good features, such as catalog/database/table support. But these features are not very complete compared with other popular engines such as Spark, Presto and Hive, or commercial engines such as Snowflake. For example, many engines except Flink support SHOW operations with filtering, and they support describing other objects (Flink only supports DESCRIBE for tables). I wonder whether we can add these useful features to Flink. You can find the details in this doc [1] or the FLIP [2]. Also, please let me know if there is a mistake. Looking forward to your reply. [1] https://docs.google.com/document/d/1hAiOfPx14VTBTOlpyxG7FA2mB1k5M31VnKYad2XpJ1I/ [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-297%3A+Improve+Auxiliary+Sql+Statements Best Regards, Ran Tao
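To make the gap concrete, here is a sketch of the kind of filtered auxiliary statements being proposed, run through the Table API. The exact grammar is defined in the FLIP, so the LIKE-filtered statement below is illustrative rather than final syntax, and `db1` is a placeholder database name.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class AuxiliaryStatementsSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Supported today: plain listing without any filtering.
        tEnv.executeSql("SHOW TABLES").print();

        // Proposed by FLIP-297 (illustrative syntax): listing with an explicit
        // database and a LIKE filter, as Spark, Presto and Hive already allow.
        tEnv.executeSql("SHOW TABLES FROM db1 LIKE 'order%'").print();
    }
}
```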
[jira] [Created] (FLINK-31185) Python BroadcastProcessFunction not support side output
Juntao Hu created FLINK-31185: - Summary: Python BroadcastProcessFunction not support side output Key: FLINK-31185 URL: https://issues.apache.org/jira/browse/FLINK-31185 Project: Flink Issue Type: Bug Components: API / Python Affects Versions: 1.16.1 Reporter: Juntao Hu Fix For: 1.17.0, 1.16.2 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[RESULT] [VOTE] Apache Flink Kubernetes Operator Release 1.4.0, release candidate #1
I'm happy to announce that we have unanimously approved this release. There are 6 approving votes, 3 of which are binding: * Marton Balassi (binding) * Gyula Fora (binding) * Maximilian Michels (binding) * Jim Busche * Peter Huang * Matt Wang There are no disapproving votes. Thanks everyone! Gyula
Re: [VOTE] Apache Flink Kubernetes Operator Release 1.4.0, release candidate #1
Thank you everyone, closing this vote now! Gyula On Tue, Feb 21, 2023 at 2:19 PM Matt Wang wrote: > Thank you, Gyula. > > > +1 (non-binding) > > I tested the following: > > 1. Downloaded the archives, checksums, signatures and README file > 2. Built the source distribution to ensure all source files have Apache > headers > 3. Verified that all POM files point to the same version > 4. Helm repo install from > https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.4.0-rc1/ > 5. Ran example jobs with V1_16, and verified operator logs > > > > -- > > Best, > Matt Wang > > > Replied Message > | From | Márton Balassi | > | Date | 02/21/2023 21:07 | > | To | | > | Subject | Re: [VOTE] Apache Flink Kubernetes Operator Release 1.4.0, > release candidate #1 | > Thank you, Gyula. > > +1 (binding) > > - Verified Helm repo works as expected, points to correct image tag, build, > version > - Verified basic examples + checked operator logs, everything looks as > expected > - Verified hashes, signatures and source release contains no binaries > - Ran built-in tests, built jars + docker image from source successfully > > Best, > Marton > > On Mon, Feb 20, 2023 at 2:58 PM Gyula Fóra wrote: > > +1 (binding) > > - Downloaded archives, validated binary content, checksum signatures, > NOTICE files > - Installed to local k8s cluster from Helm repo, verified docker image, > logs > - Ran some basic examples, ran autoscaler example > > Cheers > Gyula > > On Sun, Feb 19, 2023 at 5:09 AM Peter Huang > wrote: > > Thanks for preparing the release! > > +1 (non-binding) > > 1. Downloaded the archives, checksums, and signatures > 2. Extracted and inspected the source code for binaries > 3. Compiled and tested the source code via mvn verify > 4. Deployed helm chart from flink-kubernetes-operator-1.4.0-helm.tgz to > local minikube cluster > 5. Ran example jobs with V1_14, V1_15, V1_16, and verified operator logs > 6. Built the docker container locally and verified it through repeating > step 5 > > Best Regards > Peter Huang > > On Fri, Feb 17, 2023 at 8:53 PM Jim Busche wrote: > > Thanks Gyula for the release. > > +1 (non-binding) > > > I tested the following: > > * Helm repo install from flink-kubernetes-operator-1.4.0-helm.tgz > * Podman Dockerfile build from source, looked good. > * Twistlock security scans of the proposed image look good. No > currently > known vulnerabilities with the built image or > ghcr.io/apache/flink-kubernetes-operator:7fc23a1 > * UI, basic sample, basic session jobs look good. Logs look as > expected. > * Checksums looked good > * Tested OLM build/install on OpenShift 4.12 > * Verified that sessionjob correctly can write in the operator’s > /opt/flink/artifacts filesystem on OpenShift even in a non-default > namespace. > > > > Thanks, Jim > > > >
[jira] [Created] (FLINK-31184) Failed to get python udf runner directory via running GET_RUNNER_DIR_SCRIPT
Wei Zhong created FLINK-31184: - Summary: Failed to get python udf runner directory via running GET_RUNNER_DIR_SCRIPT Key: FLINK-31184 URL: https://issues.apache.org/jira/browse/FLINK-31184 Project: Flink Issue Type: Bug Components: API / Python Affects Versions: 1.16.1, 1.15.3, 1.17.0 Reporter: Wei Zhong

The following exception is thrown when using a Python UDF in a user job:

{code:java}
Caused by: java.io.IOException: Cannot run program "ERROR: ld.so: object '/usr/lib64/libjemalloc.so.1' from LD_PRELOAD cannot be preloaded: ignored. /mnt/ssd/0/yarn/nm-local-dir/usercache/flink/appcache/application_1670838323719_705777/python-dist-fe870981-4de7-4229-ad0b-f51881e80d90/python-archives/pipeline_venv_v5.tar.gz/lib/python3.7/site-packages/pyflink/bin/pyflink-udf-runner.sh": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.beam.runners.fnexecution.environment.ProcessManager.startProcess(ProcessManager.java:147)
    at org.apache.beam.runners.fnexecution.environment.ProcessManager.startProcess(ProcessManager.java:122)
    at org.apache.beam.runners.fnexecution.environment.ProcessEnvironmentFactory.createEnvironment(ProcessEnvironmentFactory.java:106)
    at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:252)
    at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:231)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.get(LocalCache.java:3952)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4964)
    ... 19 more
    Suppressed: java.lang.NullPointerException: Process for id does not exist: 1-1
        at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:895)
        at org.apache.beam.runners.fnexecution.environment.ProcessManager.stopProcess(ProcessManager.java:172)
        at org.apache.beam.runners.fnexecution.environment.ProcessEnvironmentFactory.createEnvironment(ProcessEnvironmentFactory.java:126)
        ... 29 more
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.(UNIXProcess.java:247)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 32 more
{code}

This is because SRE introduced an environment variable:

{code:java}
LD_PRELOAD=/usr/lib64/libjemalloc.so.1
{code}

The Python process itself executes normally, but an extra error message is printed, so the whole output looks like:

{code:java}
ERROR: ld.so: object '/usr/lib64/libjemalloc.so.1' from LD_PRELOAD cannot be preloaded: ignored.
/mnt/ssd/0/yarn/nm-local-dir/usercache/flink/appcache/application_1670838323719_705777/python-dist-fe870981-4de7-4229-ad0b-f51881e80d90/python-archives/pipeline_venv_v5.tar.gz/lib/python3.7/site-packages/pyflink/bin/
{code}

The whole output is treated as a command, which causes the exception. It seems the output is not very reliable; maybe we need to find another way to transfer the data, or filter the output before using it.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
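One possible mitigation, as a hypothetical sketch only (the class name, helper name and filtering rule below are assumptions for illustration, not the actual fix): instead of treating the raw script output as the runner path, keep only the output line that resolves to an existing directory, which would drop noise such as the LD_PRELOAD warning.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class RunnerDirResolver {
    /** Runs the given script and returns the last output line that is an existing directory. */
    static String resolveRunnerDir(String... command) throws Exception {
        Process process = new ProcessBuilder(command).start();
        String result = null;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Skip anything that is not a real path, e.g. loader warnings
                // injected by LD_PRELOAD; keep only existing directories.
                if (Files.isDirectory(Paths.get(line.trim()))) {
                    result = line.trim();
                }
            }
        }
        process.waitFor();
        return result;
    }
}
```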
Re: [DISCUSS] FLIP-296: Watermark options for table API & SQL
Hi Kui, Thanks for driving the discussion. It's quite useful to introduce watermark options. I have some questions: What kind of hint is "WATERMARK_PARAMS"? Currently, we have two kinds of hints in Flink: Dynamic Table Options & Query Hints. As described in the FLIP, "WATERMARK_PARAMS" is more like Dynamic Table Options. So two questions arise here: 1) Are these watermark options to be exposed as connector WITH options? As described in the SQL Hints doc [1], "Dynamic Table Options allow to specify or override table options dynamically", which implies that these options could also be configured in WITH options. 2) Do we really need a new hint name like 'WATERMARK_PARAMS'? Table options use "OPTIONS" as the hint name, like '/*+ OPTIONS('csv.ignore-parse-errors'='true') */', so maybe we can instead enrich the table option keys with watermark options, e.g., /*+ OPTIONS('watermark.emit-strategy'='ON_PERIODIC') */. [1] https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/sql/queries/hints/ On Wed, Feb 22, 2023 at 10:22 AM kui yuan wrote: > Hi devs, > > > I'd like to start a discussion thread for FLIP-296[1]. This comes from an > offline discussion with @Yun Tang, and we hope to enrich table API & SQL to > support many watermark-related features which were only implemented at the > datastream API level. > > > Basically, we want to introduce watermark options in table API & SQL via > SQL hint named 'WATERMARK_PARAMS' to support features: > > 1、Configurable watermark emit strategy > > 2、Dealing with idle sources > > 3、Watermark alignment > > > Last but not least, thanks to Qingsheng and Jing Zhang for the initial > reviews. > > > Looking forward to your thoughts and any feedback is appreciated! > > > [1] > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240884405 > > > Best > > Yuan Kui >
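For comparison, the two hint spellings under discussion would look roughly as follows. Neither exists in Flink yet; the table name and the option keys are placeholders approximated from the FLIP and this thread.

```java
import org.apache.flink.table.api.TableEnvironment;

public class WatermarkHintSpellings {
    public static void demo(TableEnvironment tEnv) {
        // Alternative 1 (FLIP-296 proposal): a dedicated hint name.
        tEnv.executeSql(
                "SELECT * FROM orders /*+ WATERMARK_PARAMS('emit-strategy' = 'ON_EVENT') */");

        // Alternative 2 (raised in this thread): reuse the existing OPTIONS
        // hint with watermark-prefixed table option keys.
        tEnv.executeSql(
                "SELECT * FROM orders /*+ OPTIONS('watermark.emit-strategy' = 'ON_EVENT') */");
    }
}
```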
[jira] [Created] (FLINK-31183) Flink Kinesis EFO Consumer can fail to stop gracefully
Danny Cranmer created FLINK-31183: - Summary: Flink Kinesis EFO Consumer can fail to stop gracefully Key: FLINK-31183 URL: https://issues.apache.org/jira/browse/FLINK-31183 Project: Flink Issue Type: Bug Components: Connectors / Kinesis Affects Versions: aws-connector-4.0.0, 1.16.1, aws-connector-3.0.0, 1.15.3 Reporter: Danny Cranmer Fix For: 1.17.0, 1.15.4, aws-connector-4.1.0, 1.16.2

*Background*

When stopping a Flink job using the stop-with-savepoint API, the EFO Kinesis source can fail to close gracefully. Sample stack trace:

{code:java}
2023-02-16 20:45:40
org.apache.flink.runtime.checkpoint.CheckpointException: Task has failed.
    at org.apache.flink.runtime.messages.checkpoint.SerializedCheckpointException.unwrap(SerializedCheckpointException.java:51)
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveDeclineMessage(CheckpointCoordinator.java:1013)
    at org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$declineCheckpoint$2(ExecutionGraphHandler.java:103)
    at org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$processCheckpointCoordinatorMessage$3(ExecutionGraphHandler.java:119)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Task name with subtask : Source: vas_source_stream (38/48)#0 Failure reason: Task has failed.
    at org.apache.flink.runtime.taskmanager.Task.declineCheckpoint(Task.java:1395)
    at org.apache.flink.runtime.taskmanager.Task.lambda$triggerCheckpointBarrier$3(Task.java:1338)
    at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
    at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
    at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
    at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:343)
Caused by: java.util.concurrent.CompletionException: java.util.concurrent.RejectedExecutionException: event executor terminated
    at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
    at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
    at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1063)
    ... 3 more
Caused by: java.util.concurrent.RejectedExecutionException: event executor terminated
    at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:923)
    at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:350)
    at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:343)
    at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:825)
    at org.apache.flink.kinesis.shaded.io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:815)
    at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher$ChannelSubscription.cancel(HandlerPublisher.java:502)
    at org.apache.flink.kinesis.shaded.software.amazon.awssdk.utils.async.DelegatingSubscription.cancel(DelegatingSubscription.java:37)
    at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.http2.Http2ResetSendingSubscription.cancel(Http2ResetSendingSubscription.java:41)
    at org.apache.flink.kinesis.shaded.software.amazon.awssdk.utils.async.DelegatingSubscription.cancel(DelegatingSubscription.java:37)
    at org.apache.flink.kinesis.shaded.software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$OnCancelSubscription.cancel(ResponseHandler.java:409)
    at org.apache.flink.kinesis.shaded.software.amazon.awssdk.utils.async.FlatteningSubscriber$1.cancel(FlatteningSubscriber.java:98)
    at org.apache.flink.kinesis.shaded.software.amazon.awssdk.utils.async.FlatteningSubscriber.handleStateUpdate(FlatteningSubscriber.java:170)
    at org.apache.flink.kinesis.shaded.software.amazon.awssdk.utils.async.FlatteningSubscriber.access$100(FlatteningSubscriber.java:29)
    at
Re: [discussion] To introduce a formatter for Markdown files
Hi Zhongpu, Thanks for starting this discussion. I was wondering how we could let every contributor follow the same md rule. I am not familiar with Prettier. Would you like to help me understand some basic questions? Thanks. Is there any way to integrate the format check into our CI pipeline? Otherwise, it will be hard to control it. Another thing is that some docs are written with Hugo syntax [1]. Would they work together with no conflict? Best regards, Jing [1] https://gohugo.io/ On Wed, Feb 22, 2023 at 9:50 AM Zhongpu Chen wrote: > As I mentioned in FLINK-31177 ( > https://issues.apache.org/jira/browse/FLINK-31177): > > Currently, markdown files in *docs* are maintained and updated by many > contributors, and different people have varying code style tastes. Besides, > as the syntax of markdown is not really strict, the styles tend to be > inconsistent. > > To name a few: > >- Some prefer `*` to make a list item, while others may prefer `-`. >- It is common to leave many unnecessary blank lines and spaces. >- To make a divider, the number of `-` characters can vary. > > To this end, I think it would be nicer to encourage or require contributors > to format their markdown files before making a pull request. Personally, I > think Prettier (https://prettier.io/) is a good candidate. > What do you think? > > -- > Zhongpu Chen >
[jira] [Created] (FLINK-31182) CompiledPlan cannot deserialize BridgingSqlFunction correctly
Jane Chan created FLINK-31182: - Summary: CompiledPlan cannot deserialize BridgingSqlFunction correctly Key: FLINK-31182 URL: https://issues.apache.org/jira/browse/FLINK-31182 Project: Flink Issue Type: Bug Components: Table SQL / Planner Affects Versions: 1.17.0, 1.18.0, 1.17.1 Reporter: Jane Chan

This issue was reported on the [user mailing list|https://lists.apache.org/thread/dglf8zgtt1yx3vrdytn0dlv7b3pw1nq6]. The stack trace is:

{code:java}
Caused by: org.apache.flink.table.api.TableException: Could not resolve internal system function '$UNNEST_ROWS$1'. This is a bug, please file an issue.
    at org.apache.flink.table.planner.plan.nodes.exec.serde.RexNodeJsonDeserializer.deserializeInternalFunction(RexNodeJsonDeserializer.java:392)
    at org.apache.flink.table.planner.plan.nodes.exec.serde.RexNodeJsonDeserializer.deserializeSqlOperator(RexNodeJsonDeserializer.java:337)
    at org.apache.flink.table.planner.plan.nodes.exec.serde.RexNodeJsonDeserializer.deserializeCall(RexNodeJsonDeserializer.java:307)
    at org.apache.flink.table.planner.plan.nodes.exec.serde.RexNodeJsonDeserializer.deserialize(RexNodeJsonDeserializer.java:146)
    at org.apache.flink.table.planner.plan.nodes.exec.serde.RexNodeJsonDeserializer.deserialize(RexNodeJsonDeserializer.java:128)
    at org.apache.flink.table.planner.plan.nodes.exec.serde.RexNodeJsonDeserializer.deserialize(RexNodeJsonDeserializer.java:115)
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
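For readers wanting to exercise the round trip where the failure surfaces, here is a minimal sketch. The table definitions and the UNNEST query are illustrative, not the reporter's exact job; UNNEST is simply one way to end up with an internal function such as $UNNEST_ROWS$1 in the serialized plan.

```java
import org.apache.flink.table.api.CompiledPlan;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.PlanReference;
import org.apache.flink.table.api.TableEnvironment;

public class CompiledPlanRoundTrip {
    public static void main(String[] args) throws Exception {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());
        tEnv.executeSql(
                "CREATE TABLE src (arr ARRAY<ROW<f INT>>) WITH ('connector' = 'datagen')");
        tEnv.executeSql(
                "CREATE TABLE snk (f INT) WITH ('connector' = 'blackhole')");

        // UNNEST is rewritten to an internal system function in the plan JSON.
        CompiledPlan plan = tEnv.compilePlanSql(
                "INSERT INTO snk SELECT t.f FROM src CROSS JOIN UNNEST(arr) AS t (f)");
        plan.writeToFile("/tmp/plan.json");

        // Deserialization is where the reported function lookup failure occurs.
        tEnv.loadPlan(PlanReference.fromFile("/tmp/plan.json")).execute();
    }
}
```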
Re: [ANNOUNCE] New Apache Flink Committer - Sergey Nuyanzin
Congratulations Sergey! Jane Chan 于2023年2月22日周三 15:53写道: > Congratulations, Sergey! > > Best regards, > Jane > > > On Wed, Feb 22, 2023 at 9:44 AM Dian Fu wrote: > > > Congratulations Sergey! > > > > On Tue, Feb 21, 2023 at 9:07 PM Rui Fan wrote: > > > > > Congratulations, Sergey! > > > > > > Best > > > Rui Fan > > > > > > On Tue, Feb 21, 2023 at 20:55 Junrui Lee wrote: > > > > > > > Congratulations Sergey! > > > > > > > > weijie guo 于2023年2月21日周二 20:54写道: > > > > > > > > > Congratulations, Sergey~ > > > > > > > > > > Best regards, > > > > > > > > > > Weijie > > > > > > > > > > > > > > > Jing Ge 于2023年2月21日周二 20:52写道: > > > > > > > > > > > congrats Sergey! > > > > > > > > > > > > On Tue, Feb 21, 2023 at 1:15 PM Matthias Pohl > > > > > > wrote: > > > > > > > > > > > > > Congratulations, Sergey! Good job & well-deserved! :) > > > > > > > > > > > > > > On Tue, Feb 21, 2023 at 1:03 PM yuxia < > > luoyu...@alumni.sjtu.edu.cn > > > > > > > > > > wrote: > > > > > > > > > > > > > > > Congratulations Sergey! > > > > > > > > > > > > > > > > Best regards, > > > > > > > > Yuxia > > > > > > > > > > > > > > > > - 原始邮件 - > > > > > > > > 发件人: "Martijn Visser" > > > > > > > > 收件人: "dev" > > > > > > > > 发送时间: 星期二, 2023年 2 月 21日 下午 7:58:35 > > > > > > > > 主题: Re: [ANNOUNCE] New Apache Flink Committer - Sergey > Nuyanzin > > > > > > > > > > > > > > > > Congrats Sergey, well deserved :) > > > > > > > > > > > > > > > > On Tue, Feb 21, 2023 at 12:53 PM Benchao Li < > > > libenc...@apache.org> > > > > > > > wrote: > > > > > > > > > > > > > > > > > Congratulations Sergey! > > > > > > > > > > > > > > > > > > Timo Walther 于2023年2月21日周二 19:51写道: > > > > > > > > > > > > > > > > > > > Hi everyone, > > > > > > > > > > > > > > > > > > > > On behalf of the PMC, I'm very happy to announce Sergey > > > > Nuyanzin > > > > > > as a > > > > > > > > > > new Flink Committer. > > > > > > > > > > > > > > > > > > > > Sergey started contributing small improvements to the > > project > > > > in > > > > > > > 2018. > > > > > > > > > > Over the past 1.5 years, he has become more active and > > > focused > > > > on > > > > > > > > adding > > > > > > > > > > and reviewing changes to the Flink SQL ecosystem. > > > > > > > > > > > > > > > > > > > > Currently, he is upgrading Flink's SQL engine to the > latest > > > > > Apache > > > > > > > > > > Calcite version [1][2][3] and helps in updating other > > > > > project-wide > > > > > > > > > > dependencies as well. > > > > > > > > > > > > > > > > > > > > Please join me in congratulating Sergey Nuyanzin for > > > becoming a > > > > > > Flink > > > > > > > > > > Committer! > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > Timo Walther (on behalf of the Flink PMC) > > > > > > > > > > > > > > > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-29932 > > > > > > > > > > [2] https://issues.apache.org/jira/browse/FLINK-21239 > > > > > > > > > > [3] https://issues.apache.org/jira/browse/FLINK-20873 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > Benchao Li > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
[VOTE] Flink minor version support policy for old releases
I am starting a vote to update the "Update Policy for old releases" [1] to include additional bugfix support for end of life versions. As per the discussion thread [2], the change we are voting on is: - Support policy: updated to include: "Upon release of a new Flink minor version, the community will perform one final bugfix release for resolved critical/blocker issues in the Flink minor version losing support." - Release process: add a step to start the discussion thread for the final patch version, if there are resolved critical/blocking issues to flush. Voting schema: since our bylaws [3] do not cover this particular scenario, and releases require PMC involvement, we will use a consensus vote with PMC binding votes. Thanks, Danny [1] https://flink.apache.org/downloads.html#update-policy-for-old-releases [2] https://lists.apache.org/thread/szq23kr3rlkm80rw7k9n95js5vqpsnbv [3] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws
[discussion] To introduce a formatter for Markdown files
As I mentioned in FLINK-31177 ( https://issues.apache.org/jira/browse/FLINK-31177):

Currently, markdown files in *docs* are maintained and updated by many contributors, and different people have varying code style tastes. Besides, as the syntax of markdown is not really strict, the styles tend to be inconsistent.

To name a few:

- Some prefer `*` to make a list item, while others may prefer `-`.
- It is common to leave many unnecessary blank lines and spaces.
- To make a divider, the number of `-` characters can vary.

To this end, I think it would be nicer to encourage or require contributors to format their markdown files before making a pull request. Personally, I think Prettier (https://prettier.io/) is a good candidate. What do you think?

-- Zhongpu Chen
Re: [DISCUSS] FLIP-296: Watermark options for table API & SQL
Hi Kui, Thanks for bringing this to the discussion. Toward the FLIP, I have several questions. 1. What's the behavior if there are multiple table sources, among which some do not support `SupportsWatermarkPushDown`? > So the features that this flip intends to support are only for those > sources that implement the `SupportsWatermarkPushDown` interface. > 2. Is there any reason for having a different parameter name for the global and hint configuration? Personally, I feel this is a little hard to remember. Can we unify the parameter naming? > For the 'ON_EVENT' strategy, option ‘emit-gap-on-event’ can configure how > many events to emit a watermark, the default value is 1. We will also add > a global parameter 'table.exec.watermark-emit.gap' to achieve the same > goal, which will be valid for each source and will ease the user's > configuration to some extent. > 3. As Martijn mentioned, the API changes are absent in the FLIP. Can you add more details? Best, Jane On Wed, Feb 22, 2023 at 4:29 PM Martijn Visser wrote: > Hi Yuan Kui, > > Thanks for creating the FLIP. A couple of questions / remarks > > 1. While the FLIP talks about watermark options for Table API & SQL, I only > see proposed syntax for SQL, not for the Table API. What is your proposal > for the Table API? > > 2. A rejected alternative is adding watermark related options in the SQL > DDL because it would make the DDL complex and lengthy. But looking at the > current hints for dynamic table options, we already provide the option to > specify or override table options dynamically [1]. So I would think that > these watermark options actually should be part of the SQL DDL and that a > user can use the dynamic table hints to specify/override these options if > needed. The advantage of putting these options in the SQL DDL is that the > user has one location where these types of options need to be specified. If > we would only do this via hints, the user would need to specify these > options in two places. What do you think? > > Best regards, > > Martijn > > [1] > > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#dynamic-table-options > > On Wed, Feb 22, 2023 at 7:58 AM Shammon FY wrote: > > > Hi kui > > > > Thanks for initializing this discussion. I have two questions > > > > 1. What will happen if we add watermark related options in `the connector > > options`? Will the connector ignore these options or throw an exception? > > How can we support this? > > > > 2. If one table is used by two operators with different watermark params, > > what will happen? 
For example, there is a source table T and user submit > > the following job > > > > SELECT T1.id, T1.cnt_val, T2.sum_val FROM (SELECT id, count(val) as > cnt_val > > FROM T /*+ WATERMARK PARAMS(‘rowtime’ = ’time_column’, > ‘rowtime_expression’ > > = ’time_column - INTERVAL 5 SECOND') */ GROUP BY id) T1 JOIN (SELECT id, > > sum(val) as sum_val FROM T /*+ WATERMARK PARAMS(‘rowtime’ = > ’time_column’, > > ‘rowtime_expression’ = ’time_column - INTERVAL 10 SECOND') */ GROUP BY > id) > > T2 ON T1.id=T2.id > > > > Best, > > Shammon > > > > > > On Wed, Feb 22, 2023 at 11:28 AM 郑舒力 wrote: > > > > > Hi kui, > > > > > > There is a scenario using watermark that cannot be implemented through > > SQL > > > now: > > > We only support specify rowtime column in table DDL,if the rowtime > field > > > is generated by join dimension table , it cannot be implemented > > > > > > Can we consider implement through HINTS like : > > > Select * from t1 join t2 /*+ WATERMARK PARAMS(‘rowtime’ = > ’time_column’, > > > ‘rowtime_expression’ = ’time_column - INTERVAL 5 SECOND') */ > > > > > > > > > > > > > 2023年2月22日 10:22,kui yuan 写道: > > > > > > > > Hi devs, > > > > > > > > > > > > I'd like to start a discussion thread for FLIP-296[1]. This comes > from > > an > > > > offline discussion with @Yun Tang, and we hope to enrich table API & > > SQL > > > to > > > > support many watermark-related features which were only implemented > at > > > the > > > > datastream API level. > > > > > > > > > > > > Basically, we want to introduce watermark options in table API & SQL > > via > > > > SQL hint named 'WATERMARK_PARAMS' to support features: > > > > > > > > 1、Configurable watermark emit strategy > > > > > > > > 2、Dealing with idle sources > > > > > > > > 3、Watermark alignment > > > > > > > > > > > > Last but not least, thanks to Qingsheng and Jing Zhang for the > initial > > > > reviews. > > > > > > > > > > > > Looking forward to your thoughts and any feedback is appreciated! > > > > > > > > > > > > [1] > > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240884405 > > > > > > > > > > > > Best > > > > > > > > Yuan Kui > > > > > > > > >
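For readers less familiar with the DataStream side, the three features the FLIP wants to surface in SQL already exist there, roughly as below. This is a sketch: `MyEvent` is a placeholder type, and the periodic/per-event emit strategy on DataStream is governed by `pipeline.auto-watermark-interval` rather than a per-source option.

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;

public class DataStreamWatermarkFeatures {
    public static WatermarkStrategy<MyEvent> strategy() {
        return WatermarkStrategy
                // Bounded-out-of-orderness watermarks with a 5s delay.
                .<MyEvent>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                .withTimestampAssigner((event, recordTimestamp) -> recordTimestamp)
                // Feature 2 of the FLIP: dealing with idle sources.
                .withIdleness(Duration.ofMinutes(1))
                // Feature 3 of the FLIP: watermark alignment across sources in
                // the same group (available on the DataStream API since 1.15).
                .withWatermarkAlignment("alignment-group", Duration.ofSeconds(20));
    }

    public static class MyEvent {}
}
```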
Re: [DISCUSS] FLIP-296: Watermark options for table API & SQL
Hi Yuan Kui, Thanks for creating the FLIP. A couple of questions / remarks 1. While the FLIP talks about watermark options for Table API & SQL, I only see proposed syntax for SQL, not for the Table API. What is your proposal for the Table API? 2. A rejected alternative is adding watermark related options in the SQL DDL because it would make the DDL complex and lengthy. But looking at the current hints for dynamic table options, we already provide the option to specify or override table options dynamically [1]. So I would think that these watermark options actually should be part of the SQL DDL and that a user can use the dynamic table hints to specify/override these options if needed. The advantage of putting these options in the SQL DDL is that the user has one location where these types of options need to be specified. If we would only do this via hints, the user would need to specify these options in two places. What do you think? Best regards, Martijn [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#dynamic-table-options On Wed, Feb 22, 2023 at 7:58 AM Shammon FY wrote: > Hi kui > > Thanks for initializing this discussion. I have two questions > > 1. What will happen if we add watermark related options in `the connector > options`? Will the connector ignore these options or throw an exception? > How can we support this? > > 2. If one table is used by two operators with different watermark params, > what will happen? For example, there is a source table T and user submit > the following job > > SELECT T1.id, T1.cnt_val, T2.sum_val FROM (SELECT id, count(val) as cnt_val > FROM T /*+ WATERMARK PARAMS(‘rowtime’ = ’time_column’, ‘rowtime_expression’ > = ’time_column - INTERVAL 5 SECOND') */ GROUP BY id) T1 JOIN (SELECT id, > sum(val) as sum_val FROM T /*+ WATERMARK PARAMS(‘rowtime’ = ’time_column’, > ‘rowtime_expression’ = ’time_column - INTERVAL 10 SECOND') */ GROUP BY id) > T2 ON T1.id=T2.id > > Best, > Shammon > > > On Wed, Feb 22, 2023 at 11:28 AM 郑舒力 wrote: > > > Hi kui, > > > > There is a scenario using watermark that cannot be implemented through > SQL > > now: > > We only support specify rowtime column in table DDL,if the rowtime field > > is generated by join dimension table , it cannot be implemented > > > > Can we consider implement through HINTS like : > > Select * from t1 join t2 /*+ WATERMARK PARAMS(‘rowtime’ = ’time_column’, > > ‘rowtime_expression’ = ’time_column - INTERVAL 5 SECOND') */ > > > > > > > > > 2023年2月22日 10:22,kui yuan 写道: > > > > > > Hi devs, > > > > > > > > > I'd like to start a discussion thread for FLIP-296[1]. This comes from > an > > > offline discussion with @Yun Tang, and we hope to enrich table API & > SQL > > to > > > support many watermark-related features which were only implemented at > > the > > > datastream API level. > > > > > > > > > Basically, we want to introduce watermark options in table API & SQL > via > > > SQL hint named 'WATERMARK_PARAMS' to support features: > > > > > > 1、Configurable watermark emit strategy > > > > > > 2、Dealing with idle sources > > > > > > 3、Watermark alignment > > > > > > > > > Last but not least, thanks to Qingsheng and Jing Zhang for the initial > > > reviews. > > > > > > > > > Looking forward to your thoughts and any feedback is appreciated! > > > > > > > > > [1] > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240884405 > > > > > > > > > Best > > > > > > Yuan Kui > > > > >
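To illustrate the model being argued for here (a sketch; the `watermark.idle-timeout` key is hypothetical and would not validate against any connector today): the default lives in the table's WITH clause, and the existing OPTIONS hint overrides it per query, so no new hint name is needed.

```java
import org.apache.flink.table.api.TableEnvironment;

public class DdlDefaultWithHintOverride {
    public static void demo(TableEnvironment tEnv) {
        // Default watermark behavior declared once, in the table definition.
        tEnv.executeSql(
                "CREATE TABLE orders ("
                        + "  ts TIMESTAMP(3), amount INT,"
                        + "  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND"
                        + ") WITH ("
                        + "  'connector' = 'kafka',"
                        + "  'watermark.idle-timeout' = '1 min'" // hypothetical option key
                        + ")");

        // Per-query override via the existing dynamic table options hint.
        tEnv.executeSql(
                "SELECT * FROM orders /*+ OPTIONS('watermark.idle-timeout' = '5 min') */");
    }
}
```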
[jira] [Created] (FLINK-31181) Support LIKE operator pushdown
Grzegorz Kołakowski created FLINK-31181: --- Summary: Support LIKE operator pushdown Key: FLINK-31181 URL: https://issues.apache.org/jira/browse/FLINK-31181 Project: Flink Issue Type: Improvement Components: Connectors / JDBC Reporter: Grzegorz Kołakowski Filter pushdown was introduced in [FLINK-16024|https://issues.apache.org/jira/browse/FLINK-16024] for selected unary and binary operators. We can extend the functionality to support pushdown for the LIKE operator as well. -- This message was sent by Atlassian Jira (v8.20.10#820010)
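A hypothetical sketch of what the extension could look like (the class and method names below are assumptions for illustration, not the connector's actual API): the pushdown logic would render a LIKE call as a parameterized dialect predicate, and fall back to evaluating the filter in Flink whenever it cannot be pushed, mirroring how the comparison operators were handled in FLINK-16024.

```java
import java.util.Optional;

public class LikePushdownSketch {
    /**
     * Renders a LIKE predicate for the dialect, e.g. ("name", "Jo%") -> "name" LIKE ?.
     * Returning Optional.empty() keeps the filter in Flink when it cannot be pushed.
     */
    static Optional<String> renderLike(String column, String pattern) {
        if (column == null || pattern == null) {
            return Optional.empty();
        }
        // The pattern itself is bound as a prepared-statement parameter,
        // so no escaping of user input happens here.
        return Optional.of(quoteIdentifier(column) + " LIKE ?");
    }

    static String quoteIdentifier(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }
}
```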