Re: [PR] [FLINK-33449][table]Support array_contains_seq function [flink]
xuyangzhong commented on PR #23656: URL: https://github.com/apache/flink/pull/23656#issuecomment-1855089363 Hi, @leoyy0316 and @MartijnVisser . Let's talk in the jira instead of this pr. I shared my view there. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33449][table]Support array_contains_seq function [flink]
leoyy0316 commented on PR #23656: URL: https://github.com/apache/flink/pull/23656#issuecomment-1807423595 > > I have a question, why we always follow hive and spark? if we thinks is a feature need to support, we can implement it firstly. i think having rich functions is better > > Because we want to make sure that we have a consistent set of functions available, with function names that are consistent. If we pick random function names from different tools, that because more problematic. It might be that Calcite or any of the other tech solves the same problem but with a different function and/or different function signature. That's why we always try to look at what others are also doing. Then again, this discussion should have happened before a PR was opened. Either in the Jira, or on the Dev mailing list. See https://flink.apache.org/how-to-contribute/contribute-code/#1-create-jira-ticket-and-reach-consensus yeah, it is good. trino ,clickhouse, starrocks has the same function. if flink teams think we not need to support this function now. pls close this jira and pr. thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33449][table]Support array_contains_seq function [flink]
MartijnVisser commented on PR #23656: URL: https://github.com/apache/flink/pull/23656#issuecomment-1805337202 > I have a question, why we always follow hive and spark? if we thinks is a feature need to support, we can implement it firstly. i think having rich functions is better Because we want to make sure that we have a consistent set of functions available, with function names that are consistent. If we pick random function names from different tools, that because more problematic. It might be that Calcite or any of the other tech solves the same problem but with a different function and/or different function signature. That's why we always try to look at what others are also doing. Then again, this discussion should have happened before a PR was opened. Either in the Jira, or on the Dev mailing list. See https://flink.apache.org/how-to-contribute/contribute-code/#1-create-jira-ticket-and-reach-consensus -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33449][table]Support array_contains_seq function [flink]
leoyy0316 commented on PR #23656: URL: https://github.com/apache/flink/pull/23656#issuecomment-1805324411 > > trino support this function, we need use this function, so can we suuport this function? > > I don't think Trino is the primary engine we compare ourselves against. Does the equivalent of this function exist in other databases/tech like Postgres, MySQL, Hive or Spark? I have a question, why we always follow hive and spark? if we thinks is a feature need to support, we can implement it firstly. i think having rich functions is better -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33449][table]Support array_contains_seq function [flink]
MartijnVisser commented on PR #23656: URL: https://github.com/apache/flink/pull/23656#issuecomment-1805286458 > trino support this function, we need use this function, so can we suuport this function? I don't think Trino is the primary engine we compare ourselves against. Does the equivalent of this function exist in other databases/tech like Postgres, MySQL, Hive or Spark? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33449][table]Support array_contains_seq function [flink]
leoyy0316 commented on PR #23656: URL: https://github.com/apache/flink/pull/23656#issuecomment-1793390742 > > I'm not sure that we want to support this function (I have no strong opinion there, but I don't see this in other technologies like Hive or Spark). However, without tests, this PR shouldn't get merged > > i will add ut. trino support this function, we need use this function, so can we suuport this function? also add ut, thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33449][table]Support array_contains_seq function [flink]
MartijnVisser commented on PR #23656: URL: https://github.com/apache/flink/pull/23656#issuecomment-1792398513 > we need use this function, so can we suuport this function? That's not something that we should decide over a PR, but before opening a Jira ticket and a PR should be brought up on the Dev mailing list -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33449][table]Support array_contains_seq function [flink]
leoyy0316 commented on PR #23656: URL: https://github.com/apache/flink/pull/23656#issuecomment-1792384316 > I'm not sure that we want to support this function (I have no strong opinion there, but I don't see this in other technologies like Hive or Spark). However, without tests, this PR shouldn't get merged i will add ut. trino support this function, we need use this function, so can we suuport this function? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-33449][table]Support array_contains_seq function [flink]
flinkbot commented on PR #23656: URL: https://github.com/apache/flink/pull/23656#issuecomment-1792360510 ## CI report: * 31d5c1e7a3896f8b5958dc56486f510c9bfdd263 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] [FLINK-33449][table]Support array_contains_seq function [flink]
leoyy0316 opened a new pull request, #23656: URL: https://github.com/apache/flink/pull/23656 ## What is the purpose of the change *(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)* This is an implementation of ARRAY_Contains_seq ## Brief change log ARRAY_CONTAINS_SEQ for Table API and SQL Syntax: ARRAY_CONTAINS_SEQ (arr1, arr2) Returns: The function returns boolean Examples: *(for example:)* - *The TaskInfo is stored in the blob store on job creation time as a persistent artifact* - *Deployments RPC transmits only the blob storage reference* - *TaskManagers retrieve the TaskInfo from the blob cache* Flink SQL> select array_contains_seq(array[1,1,2,3],array[1,2]); ## Verifying this change Please make sure both new and modified tests in this PR follows the conventions defined in our code quality guide: https://flink.apache.org/contributing/code-style-and-quality-common.html#testing *(Please pick either of the following options)* This change is a trivial rework / code cleanup without any test coverage. *(or)* This change is already covered by existing tests, such as *(please describe tests)*. *(or)* This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end deployment with large payloads (100MB)* - *Extended integration test for recovery after master (JobManager) failure* - *Added test that validates that TaskInfo is transferred only once across recoveries* - *Manually verified the change by running a 4 node cluster with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / no) - The serializers: (yes / no / don't know) - The runtime per-record code paths (performance sensitive): (yes / no / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know) - The S3 file system connector: (yes / no / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / no) - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org