[jira] [Updated] (FLINK-34345) Remove TaskExecutorManager related logic
[ https://issues.apache.org/jira/browse/FLINK-34345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Fan updated FLINK-34345:
----------------------------
    Affects Version/s: 1.19.0

> Remove TaskExecutorManager related logic
> ----------------------------------------
>
>              Key: FLINK-34345
>              URL: https://issues.apache.org/jira/browse/FLINK-34345
>          Project: Flink
>       Issue Type: Technical Debt
> Affects Versions: 1.19.0
>         Reporter: Caican Cai
>         Assignee: Caican Cai
>         Priority: Minor
>           Labels: pull-request-available
>
> Remove TaskExecutorManager related logic

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Assigned] (FLINK-34345) Remove TaskExecutorManager related logic
[ https://issues.apache.org/jira/browse/FLINK-34345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Fan reassigned FLINK-34345:
-------------------------------
    Assignee: Caican Cai

> Remove TaskExecutorManager related logic
> ----------------------------------------
>
>         Key: FLINK-34345
>         URL: https://issues.apache.org/jira/browse/FLINK-34345
>     Project: Flink
>  Issue Type: Technical Debt
>    Reporter: Caican Cai
>    Assignee: Caican Cai
>    Priority: Minor
>      Labels: pull-request-available
>
> Remove TaskExecutorManager related logic
Re: [PR] [FLINK-34345][runtime] Remove TaskExecutorManager related logic [flink]
caicancai commented on PR #24257: URL: https://github.com/apache/flink/pull/24257#issuecomment-1925163414 @RocMarshal @1996fanrui If you have time, can you help me review it? Thank you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-34345][runtime] Remove TaskExecutorManager related logic [flink]
caicancai commented on PR #24257: URL: https://github.com/apache/flink/pull/24257#issuecomment-1925162917

related to https://issues.apache.org/jira/browse/FLINK-31449
Re: [PR] [FLINK-34345][runtime] Remove TaskExecutorManager related logic [flink]
flinkbot commented on PR #24257: URL: https://github.com/apache/flink/pull/24257#issuecomment-1925162848

## CI report:

* 38fe004b350cad98d065f0619409fcf120de8c69 UNKNOWN

Bot commands

The @flinkbot bot supports the following commands:
- `@flinkbot run azure` re-run the last Azure build
[jira] [Updated] (FLINK-34345) Remove TaskExecutorManager related logic
[ https://issues.apache.org/jira/browse/FLINK-34345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated FLINK-34345:
-----------------------------------
    Labels: pull-request-available  (was: )

> Remove TaskExecutorManager related logic
> ----------------------------------------
>
>         Key: FLINK-34345
>         URL: https://issues.apache.org/jira/browse/FLINK-34345
>     Project: Flink
>  Issue Type: Technical Debt
>    Reporter: Caican Cai
>    Priority: Minor
>      Labels: pull-request-available
>
> Remove TaskExecutorManager related logic
[PR] [FLINK-34345][runtime] Remove TaskExecutorManager related logic [flink]
caicancai opened a new pull request, #24257: URL: https://github.com/apache/flink/pull/24257

## What is the purpose of the change

*(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)*

## Brief change log

*(for example:)*
- *The TaskInfo is stored in the blob store on job creation time as a persistent artifact*
- *Deployments RPC transmits only the blob storage reference*
- *TaskManagers retrieve the TaskInfo from the blob cache*

## Verifying this change

Please make sure both new and modified tests in this PR follow the conventions defined in our code quality guide: https://flink.apache.org/contributing/code-style-and-quality-common.html#testing

*(Please pick either of the following options)*

This change is a trivial rework / code cleanup without any test coverage.

*(or)*

This change is already covered by existing tests, such as *(please describe tests)*.

*(or)*

This change added tests and can be verified as follows:

*(example:)*
- *Added integration tests for end-to-end deployment with large payloads (100MB)*
- *Extended integration test for recovery after master (JobManager) failure*
- *Added test that validates that TaskInfo is transferred only once across recoveries*
- *Manually verified the change by running a 4 node cluster with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.*

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): (yes / no)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / no)
- The serializers: (yes / no / don't know)
- The runtime per-record code paths (performance sensitive): (yes / no / don't know)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
- The S3 file system connector: (yes / no / don't know)

## Documentation

- Does this pull request introduce a new feature? (yes / no)
- If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
[jira] [Updated] (FLINK-34345) Remove TaskExecutorManager related logic
[ https://issues.apache.org/jira/browse/FLINK-34345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Caican Cai updated FLINK-34345:
-------------------------------
    Issue Type: Technical Debt  (was: Improvement)

> Remove TaskExecutorManager related logic
> ----------------------------------------
>
>         Key: FLINK-34345
>         URL: https://issues.apache.org/jira/browse/FLINK-34345
>     Project: Flink
>  Issue Type: Technical Debt
>    Reporter: Caican Cai
>    Priority: Minor
>
> Remove TaskExecutorManager related logic
[jira] [Created] (FLINK-34345) Remove TaskExecutorManager related logic
Caican Cai created FLINK-34345:
-------------------------------

     Summary: Remove TaskExecutorManager related logic
         Key: FLINK-34345
         URL: https://issues.apache.org/jira/browse/FLINK-34345
     Project: Flink
  Issue Type: Improvement
    Reporter: Caican Cai

Remove TaskExecutorManager related logic

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (FLINK-33734) Merge unaligned checkpoint state handle
[ https://issues.apache.org/jira/browse/FLINK-33734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813874#comment-17813874 ]

Rui Fan commented on FLINK-33734:
---------------------------------

Thanks [~Feifan Wang] for the effort. IIUC, the checkpoint avg time is reduced from 82s to 30s, right?

IMHO, the checkpoint time is still too long. WDYT?

> Merge unaligned checkpoint state handle
> ---------------------------------------
>
>                 Key: FLINK-33734
>                 URL: https://issues.apache.org/jira/browse/FLINK-33734
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>            Reporter: Feifan Wang
>            Assignee: Feifan Wang
>            Priority: Major
>              Labels: pull-request-available
>
> h3. Background
> An unaligned checkpoint writes the inflight-data of all InputChannels and ResultSubpartitions of the same subtask to the same file. The InputChannelStateHandle and ResultSubpartitionStateHandle organize the metadata of inflight-data at channel granularity, which causes the file name to be repeated many times. When a job is under backpressure and task parallelism is high, the metadata of unaligned checkpoints bloats. This results in:
> # The amount of data reported by taskmanager to jobmanager increases, and jobmanager takes longer to process these RPC requests.
> # The metadata of the entire checkpoint becomes very large, and it takes longer to serialize and write it to dfs.
> Both of the above points ultimately lead to longer checkpoint duration.
> h3. A production example
> Take our production job with a parallelism of 4800 as an example:
> # When there is no backpressure, checkpoint end-to-end duration is within 7 seconds.
> # When under backpressure, checkpoint end-to-end duration often exceeds 1 minute. We found that jobmanager took more than 40 seconds to process RPC requests, and serializing metadata took more than 20 seconds. Some checkpoint statistics:
> |metadata file size|950 MB|
> |channel state count|12,229,854|
> |channel file count|5536|
> Of the 950 MB in the metadata file, 68% are redundant file paths.
> We enabled log-based checkpoint on this job and hoped that the checkpoint could be completed within 30 seconds. This problem made it difficult to achieve that goal.
> h3. Proposed changes
> I suggest introducing MergedInputChannelStateHandle and MergedResultSubpartitionStateHandle to eliminate redundant file paths.
> The taskmanager merges all InputChannelStateHandles with the same delegated StreamStateHandle in the same subtask into one MergedInputChannelStateHandle before reporting. When recovering from a checkpoint, the jobmanager converts each MergedInputChannelStateHandle back into a collection of InputChannelStateHandles before assigning state handles; the rest of the process does not need to change.
> Structure of MergedInputChannelStateHandle:
> {code:java}
> { // MergedInputChannelStateHandle
>   "delegate": {
>     "filePath": "viewfs://hadoop-meituan/flink-yg15/checkpoints/retained/1234567/ab8d0c2f02a47586490b15e7a2c30555/chk-31/ffe54c0a-9b6e-4724-aae7-61b96bf8b1cf",
>     "stateSize": 123456
>   },
>   "size": 2000,
>   "subtaskIndex": 0,
>   "channels": [ // One InputChannel per element
>     {
>       "info": { "gateIdx": 0, "inputChannelIdx": 0 },
>       "offsets": [ 100, 200, 300, 400 ],
>       "size": 1400
>     },
>     {
>       "info": { "gateIdx": 0, "inputChannelIdx": 1 },
>       "offsets": [ 500, 600 ],
>       "size": 600
>     }
>   ]
> }
> {code}
> MergedResultSubpartitionStateHandle is similar.
>
> WDYT [~roman] , [~pnowojski] , [~fanrui] ?
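The merging step proposed above — collapse all per-channel handles that share one delegate file into a single handle so the file path is stored once — can be sketched as follows. This is a minimal illustration with made-up stand-in types (`ChannelHandle`, `MergedHandle`), not Flink's actual state-handle classes:

```java
import java.util.*;
import java.util.stream.*;

public class MergeSketch {

    // One handle per InputChannel, as reported today (delegate path repeated per channel).
    record ChannelHandle(String delegatePath, int gateIdx, int channelIdx,
                         List<Long> offsets, long size) {}

    // The merged form: one delegate path shared by many channels.
    record MergedHandle(String delegatePath, long totalSize, List<ChannelHandle> channels) {}

    // Merge all handles of one subtask that share the same delegate file.
    static List<MergedHandle> merge(List<ChannelHandle> handles) {
        Map<String, List<ChannelHandle>> byFile = handles.stream()
                .collect(Collectors.groupingBy(ChannelHandle::delegatePath,
                        LinkedHashMap::new, Collectors.toList()));
        List<MergedHandle> merged = new ArrayList<>();
        for (var e : byFile.entrySet()) {
            long total = e.getValue().stream().mapToLong(ChannelHandle::size).sum();
            merged.add(new MergedHandle(e.getKey(), total, e.getValue()));
        }
        return merged;
    }

    public static void main(String[] args) {
        // Mirrors the example from the ticket: two channels, one shared file.
        var h1 = new ChannelHandle("chk-31/ffe54c0a", 0, 0, List.of(100L, 200L, 300L, 400L), 1400);
        var h2 = new ChannelHandle("chk-31/ffe54c0a", 0, 1, List.of(500L, 600L), 600);
        var merged = merge(List.of(h1, h2));
        // prints "1 merged handle(s), total 2000"
        System.out.println(merged.size() + " merged handle(s), total " + merged.get(0).totalSize());
    }
}
```

Recovery would invert this: expand each merged handle back into one handle per channel, each referencing the shared delegate, as the ticket describes for the jobmanager side.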
[jira] [Resolved] (FLINK-26369) Translate the part zh-page mixed with not be translated.
[ https://issues.apache.org/jira/browse/FLINK-26369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Fan resolved FLINK-26369.
-----------------------------
    Resolution: Fixed

> Translate the part zh-page mixed with not be translated.
> --------------------------------------------------------
>
>                 Key: FLINK-26369
>                 URL: https://issues.apache.org/jira/browse/FLINK-26369
>             Project: Flink
>          Issue Type: Bug
>          Components: Documentation
>    Affects Versions: 1.15.0
>            Reporter: Aiden Gong
>            Assignee: Zhongqiang Gong
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.19.0
>
> These files should be translated.
> Files:
> docs/content.zh/docs/deployment/ha/overview.md
[jira] [Commented] (FLINK-26369) Translate the part zh-page mixed with not be translated.
[ https://issues.apache.org/jira/browse/FLINK-26369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813872#comment-17813872 ]

Rui Fan commented on FLINK-26369:
---------------------------------

Merged to master (1.19.0) via 5fe6f2024f77c03b2f2a897827765b209c42d5b0

> Translate the part zh-page mixed with not be translated.
> --------------------------------------------------------
>
>                 Key: FLINK-26369
>                 URL: https://issues.apache.org/jira/browse/FLINK-26369
>             Project: Flink
>          Issue Type: Bug
>          Components: Documentation
>    Affects Versions: 1.15.0
>            Reporter: Aiden Gong
>            Assignee: Zhongqiang Gong
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.19.0
>
> These files should be translated.
> Files:
> docs/content.zh/docs/deployment/ha/overview.md
Re: [PR] [FLINK-26369][doc-zh] Translate the part zh-page mixed with not be translated. [flink]
1996fanrui merged PR #20340: URL: https://github.com/apache/flink/pull/20340
Re: [PR] [FLINK-26369][doc-zh] Translate the part zh-page mixed with not be translated. [flink]
GOODBOY008 commented on PR #20340: URL: https://github.com/apache/flink/pull/20340#issuecomment-1925022155

@1996fanrui The branch has been rebased, thanks for your review.
Re: [PR] [FLINK-31836][table] Upgrade Calcite version to 1.34.0 [flink]
flinkbot commented on PR #24256: URL: https://github.com/apache/flink/pull/24256#issuecomment-1925021000

## CI report:

* 6650bc89e66820688b8507d870762aa5f3e016b3 UNKNOWN

Bot commands

The @flinkbot bot supports the following commands:
- `@flinkbot run azure` re-run the last Azure build
[jira] [Updated] (FLINK-31836) Upgrade Calcite version to 1.34.0
[ https://issues.apache.org/jira/browse/FLINK-31836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated FLINK-31836:
-----------------------------------
    Labels: pull-request-available  (was: )

> Upgrade Calcite version to 1.34.0
> ---------------------------------
>
>                 Key: FLINK-31836
>                 URL: https://issues.apache.org/jira/browse/FLINK-31836
>             Project: Flink
>          Issue Type: Technical Debt
>          Components: Table SQL / API
>            Reporter: Sergey Nuyanzin
>            Assignee: Sergey Nuyanzin
>            Priority: Major
>              Labels: pull-request-available
>
> Calcite 1.34.0 has been released:
> https://calcite.apache.org/news/2023/03/14/release-1.34.0/
[PR] [FLINK-31836][table] Upgrade Calcite version to 1.34.0 [flink]
snuyanzin opened a new pull request, #24256: URL: https://github.com/apache/flink/pull/24256

## What is the purpose of the change

This PR upgrades Calcite to 1.34.0. It is based on https://github.com/apache/flink/pull/24255; once the upgrade to 1.33.0 is approved and merged, I will rebase this one. Since 1.34.0 was a short release, the upgrade is relatively simple, so only the latest commit is for the 1.34.0 upgrade.

## Verifying this change

This change is already covered by existing tests.

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): (yes)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (yes / no / don't know)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
- The S3 file system connector: (no)

## Documentation

- Does this pull request introduce a new feature? (no)
- If yes, how is the feature documented? (not applicable)
[jira] [Assigned] (FLINK-26369) Translate the part zh-page mixed with not be translated.
[ https://issues.apache.org/jira/browse/FLINK-26369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Fan reassigned FLINK-26369:
-------------------------------
    Assignee: Zhongqiang Gong

> Translate the part zh-page mixed with not be translated.
> --------------------------------------------------------
>
>                 Key: FLINK-26369
>                 URL: https://issues.apache.org/jira/browse/FLINK-26369
>             Project: Flink
>          Issue Type: Bug
>          Components: Documentation
>    Affects Versions: 1.15.0
>            Reporter: Aiden Gong
>            Assignee: Zhongqiang Gong
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.19.0
>
> These files should be translated.
> Files:
> docs/content.zh/docs/deployment/ha/overview.md
Re: [PR] [FLINK-26369][doc-zh] Translate the part zh-page mixed with not be translated. [flink]
1996fanrui commented on PR #20340: URL: https://github.com/apache/flink/pull/20340#issuecomment-1925001112

Hi @GOODBOY008, would you mind rebasing onto the master branch? If I remember correctly, this CI issue has been fixed in the master branch.
Re: [PR] [FLINK-26369][doc-zh] Translate the part zh-page mixed with not be translated. [flink]
GOODBOY008 commented on PR #20340: URL: https://github.com/apache/flink/pull/20340#issuecomment-1924979780

@flinkbot run azure
Re: [PR] [FLINK-31362][table] Upgrade to Calcite version to 1.33.0 [flink]
flinkbot commented on PR #24255: URL: https://github.com/apache/flink/pull/24255#issuecomment-1924979141

## CI report:

* b9d77b02be5ddc59ed2596babcbcce7d49f92c4a UNKNOWN

Bot commands

The @flinkbot bot supports the following commands:
- `@flinkbot run azure` re-run the last Azure build
[jira] [Updated] (FLINK-31362) Upgrade to Calcite version to 1.33.0
[ https://issues.apache.org/jira/browse/FLINK-31362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated FLINK-31362:
-----------------------------------
    Labels: pull-request-available  (was: )

> Upgrade to Calcite version to 1.33.0
> ------------------------------------
>
>                 Key: FLINK-31362
>                 URL: https://issues.apache.org/jira/browse/FLINK-31362
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / API
>            Reporter: Aitozi
>            Assignee: Sergey Nuyanzin
>            Priority: Major
>              Labels: pull-request-available
>
> In Calcite 1.33.0, C-style escape strings are supported. We could leverage this to enhance our string literal usage.
> issue: https://issues.apache.org/jira/browse/CALCITE-5305
[PR] [FLINK-31362][table] Upgrade to Calcite version to 1.33.0 [flink]
snuyanzin opened a new pull request, #24255: URL: https://github.com/apache/flink/pull/24255

## What is the purpose of the change

This PR bumps Calcite to 1.33.0.

## Brief change log

*(for example:)*
- *The TaskInfo is stored in the blob store on job creation time as a persistent artifact*
- *Deployments RPC transmits only the blob storage reference*
- *TaskManagers retrieve the TaskInfo from the blob cache*

## Verifying this change

This change is already covered by existing tests; some tests were changed because of changes in Calcite.

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): (yes)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
- The S3 file system connector: (no)

## Documentation

- Does this pull request introduce a new feature? (no)
- If yes, how is the feature documented? (not applicable)
[jira] [Updated] (FLINK-34014) Jdbc connector can avoid send empty insert to database when there's no buffer data
[ https://issues.apache.org/jira/browse/FLINK-34014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated FLINK-34014:
-----------------------------------
    Labels: pull-request-available  (was: )

> Jdbc connector can avoid send empty insert to database when there's no buffer data
> ----------------------------------------------------------------------------------
>
>                 Key: FLINK-34014
>                 URL: https://issues.apache.org/jira/browse/FLINK-34014
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / JDBC
>            Reporter: luoyuxia
>            Priority: Major
>              Labels: pull-request-available
>
> In the jdbc connector, a background thread flushes buffered data to the database, but when no data is in the buffer, we can avoid the flush to the database.
> We can avoid it in the method JdbcOutputFormat#attemptFlush, or in a JdbcBatchStatementExecutor like TableBufferedStatementExecutor, which can avoid calling {{statementExecutor.executeBatch()}} when the buffer is empty.
Re: [PR] [FLINK-34014][jdbc] Avoid executeBatch when buffer is empty [flink-connector-jdbc]
boring-cyborg[bot] commented on PR #100: URL: https://github.com/apache/flink-connector-jdbc/pull/100#issuecomment-1924778857

Thanks for opening this pull request! Please check out our contributing guidelines. (https://flink.apache.org/contributing/how-to-contribute.html)
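The empty-buffer guard that FLINK-34014 describes can be sketched as below. The class and method names here are illustrative stand-ins (not the connector's actual `TableBufferedStatementExecutor` API): a buffered executor only delegates to `executeBatch()` when it actually holds rows, so the periodic background flush becomes a no-op on an empty buffer.

```java
import java.util.ArrayList;
import java.util.List;

public class BufferedExecutorSketch {

    interface BatchExecutor { void executeBatch(); }

    // Hypothetical buffered wrapper: skips the database round trip when empty.
    static class BufferedExecutor {
        private final List<String> buffer = new ArrayList<>();
        private final BatchExecutor delegate;
        int batchesSent = 0;

        BufferedExecutor(BatchExecutor delegate) { this.delegate = delegate; }

        void addToBatch(String row) { buffer.add(row); }

        // Called both on demand and by the background flush thread.
        void executeBatch() {
            if (buffer.isEmpty()) {
                return; // nothing buffered: avoid sending an empty batch to the database
            }
            delegate.executeBatch();
            batchesSent++;
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        BufferedExecutor exec = new BufferedExecutor(() -> {});
        exec.executeBatch();     // empty buffer: no batch sent
        exec.addToBatch("row-1");
        exec.executeBatch();     // one batch sent
        // prints "batches sent: 1"
        System.out.println("batches sent: " + exec.batchesSent);
    }
}
```

The same guard could equally live in `JdbcOutputFormat#attemptFlush`, as the ticket notes; placing it in the executor keeps every caller of the flush path covered.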
[jira] [Commented] (FLINK-34152) Tune memory of autoscaled jobs
[ https://issues.apache.org/jira/browse/FLINK-34152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813772#comment-17813772 ]

Emanuele Pirro commented on FLINK-34152:
----------------------------------------

Thanks Maximilian for reporting this. It's indeed an issue, especially the `OutOfMemoryErrors` on scale-in.

> Tune memory of autoscaled jobs
> ------------------------------
>
>                 Key: FLINK-34152
>                 URL: https://issues.apache.org/jira/browse/FLINK-34152
>             Project: Flink
>          Issue Type: New Feature
>          Components: Autoscaler, Kubernetes Operator
>            Reporter: Maximilian Michels
>            Assignee: Maximilian Michels
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: kubernetes-operator-1.8.0
>
> The current autoscaling algorithm adjusts the parallelism of the job task vertices according to the processing needs. By adjusting the parallelism, we systematically scale the amount of CPU for a task. At the same time, we also indirectly change the amount of memory tasks have at their disposal. However, there are some problems with this.
> # Memory is overprovisioned: On scale-up we may add more memory than we actually need. Even on scale-down, the memory / cpu ratio can still be off and too much memory is used.
> # Memory is underprovisioned: For stateful jobs, we risk running into OutOfMemoryErrors on scale-down. Even before running out of memory, too little memory can have a negative impact on the effectiveness of the scaling.
> We lack the capability to tune memory proportionally to the processing needs. In the same way that we measure CPU usage and size the tasks accordingly, we need to evaluate memory usage and adjust the heap memory size.
> A tuning algorithm could look like this:
> h2. 1. Establish a memory baseline
> We observe the average heap memory usage at task managers.
> h2. 2. Calculate memory usage per record
> The memory requirements per record can be estimated by calculating this ratio:
> {noformat}
> memory_per_rec = sum(heap_usage) / sum(records_processed)
> {noformat}
> This ratio is surprisingly constant based on empirical data.
> h2. 3. Scale memory proportionally to the per-record memory needs
> {noformat}
> memory_per_tm = expected_records_per_sec * memory_per_rec / num_task_managers
> {noformat}
> A minimum memory limit needs to be added to avoid scaling down memory too much. The max memory per TM should be equal to the initially defined user-specified limit from the ResourceSpec:
> {noformat}
> memory_per_tm = max(min_limit, memory_per_tm)
> memory_per_tm = min(max_limit, memory_per_tm)
> {noformat}
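The three tuning steps from the ticket reduce to two small formulas; here is a numeric sketch with made-up input values (the 8 GB heap sample, 2M records, and the min/max limits are assumptions for illustration, not values from the ticket):

```java
public class MemoryTuningSketch {

    // Step 2: estimated bytes of heap needed per processed record.
    static double memoryPerRecord(double sumHeapUsage, double sumRecords) {
        return sumHeapUsage / sumRecords;
    }

    // Step 3: proportional heap target per TM, clamped to [minLimit, maxLimit].
    static double memoryPerTm(double expectedRecordsPerSec, double memPerRec,
                              int numTaskManagers, double minLimit, double maxLimit) {
        double target = expectedRecordsPerSec * memPerRec / numTaskManagers;
        return Math.min(maxLimit, Math.max(minLimit, target));
    }

    public static void main(String[] args) {
        // Assumed observations: 8 GB total heap used while processing 2M records.
        double memPerRec = memoryPerRecord(8e9, 2e6); // 4000 bytes/record
        // Target rate 1M rec/s across 4 TMs, clamped to [512 MB, 4 GB] per TM.
        double perTm = memoryPerTm(1e6, memPerRec, 4, 512e6, 4e9);
        // prints "heap per TM: 1000.0 MB"
        System.out.println("heap per TM: " + perTm / 1e6 + " MB");
    }
}
```

The clamp order matters: applying the max-limit after the min-limit (as the ticket's two formulas do) guarantees the result never exceeds the user-specified ResourceSpec limit even when min_limit is configured above max_limit.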
Re: [PR] [FLINK-26088][Connectors/ElasticSearch] Add Elasticsearch 8.0 support [flink-connector-elasticsearch]
reta commented on PR #53: URL: https://github.com/apache/flink-connector-elasticsearch/pull/53#issuecomment-1924270597

@MartijnVisser @mtfelisb it looks great, I have no more comments. It would be great to have e2e tests added, but we could also do that in a separate pull request.
[jira] [Updated] (FLINK-34325) Inconsistent state with data loss after OutOfMemoryError
[ https://issues.apache.org/jira/browse/FLINK-34325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexis Sarda-Espinosa updated FLINK-34325:
------------------------------------------
    Description:

I have a job that uses broadcast state to maintain a cache of required metadata. I am currently evaluating memory requirements of my specific use case, and I ran into a weird situation that seems worrisome.

All sources in my job are Kafka sources. I wrote a large amount of messages in Kafka to force the broadcast state's cache to grow. At some point, this caused an "{{java.lang.OutOfMemoryError: Java heap space}}" error in the -Job- Task Manager. I would have expected the whole java process of the TM to crash, but the job was simply restarted. What's worrisome is that, after 2 checkpoint failures ^1^, the job restarted and subsequently resumed from the latest successful checkpoint and completely ignored all the events I wrote to Kafka, which I can verify because I have a custom metric that exposes the approximate size of this cache, and because the job didn't crashloop at this point after reading all the messages from Kafka over and over again.

I'm attaching an excerpt of the Job Manager's logs. My main concerns are:
# It seems the memory error somehow prevented the Kafka offsets from being "rolled back", so eventually the Kafka events that should have ended up in the broadcast state's cache were ignored.
# -Is it normal that the state is somehow "materialized" in the JM and is thus affected by the size of the JM's heap? Is this something particular due to the use of broadcast state? I found this very surprising.- See comments

^1^ Two failures are expected since {{execution.checkpointing.tolerable-failed-checkpoints=1}}

    was:

I have a job that uses broadcast state to maintain a cache of required metadata. I am currently evaluating memory requirements of my specific use case, and I ran into a weird situation that seems worrisome.

All sources in my job are Kafka sources. I wrote a large amount of messages in Kafka to force the broadcast state's cache to grow. At some point, this caused an "{{java.lang.OutOfMemoryError: Java heap space}}" error in the -Job- Task Manager. I would have expected the whole java process of the TM to crash, but the job was simply restarted. What's worrisome is that, after 2 checkpoint failures ^1^, the job restarted and subsequently resumed from the latest successful checkpoint and completely ignored all the events I wrote to Kafka, which I can verify because I have a custom metric that exposes the approximate size of this cache, and because the job didn't crashloop at this point after reading all the messages from Kafka over and over again.

I'm attaching an excerpt of the Job Manager's logs. My main concerns are:
# It seems the memory error from the JM didn't prevent the Kafka offsets from being "rolled back", so eventually the Kafka events that should have ended up in the broadcast state's cache were ignored.
# -Is it normal that the state is somehow "materialized" in the JM and is thus affected by the size of the JM's heap? Is this something particular due to the use of broadcast state? I found this very surprising.- See comments

^1^ Two failures are expected since {{execution.checkpointing.tolerable-failed-checkpoints=1}}

> Inconsistent state with data loss after OutOfMemoryError
> --------------------------------------------------------
>
>                 Key: FLINK-34325
>                 URL: https://issues.apache.org/jira/browse/FLINK-34325
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.17.1
>        Environment: Flink on Kubernetes with HA, RocksDB with incremental checkpoints on Azure
>            Reporter: Alexis Sarda-Espinosa
>            Priority: Major
>        Attachments: jobmanager_log.txt
>
> I have a job that uses broadcast state to maintain a cache of required metadata. I am currently evaluating memory requirements of my specific use case, and I ran into a weird situation that seems worrisome.
> All sources in my job are Kafka sources. I wrote a large amount of messages in Kafka to force the broadcast state's cache to grow. At some point, this caused an "{{java.lang.OutOfMemoryError: Java heap space}}" error in the -Job- Task Manager. I would have expected the whole java process of the TM to crash, but the job was simply restarted. What's worrisome is that, after 2 checkpoint failures ^1^, the job restarted and subsequently resumed from the latest successful checkpoint and completely ignored all the events I wrote to Kafka, which I can verify because I have a custom metric that exposes the approximate size of this cache, and the fact that the job didn't crashloop at this point after reading all the messages from Kafka over and over again.
> I'm attaching an excerpt of the Job Manager's logs. My main concerns
[jira] [Updated] (FLINK-34325) Inconsistent state with data loss after OutOfMemoryError
[ https://issues.apache.org/jira/browse/FLINK-34325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexis Sarda-Espinosa updated FLINK-34325:
------------------------------------------
    Description:

I have a job that uses broadcast state to maintain a cache of required metadata. I am currently evaluating memory requirements of my specific use case, and I ran into a weird situation that seems worrisome.

All sources in my job are Kafka sources. I wrote a large amount of messages in Kafka to force the broadcast state's cache to grow. At some point, this caused an "{{java.lang.OutOfMemoryError: Java heap space}}" error in the -Job- Task Manager. I would have expected the whole java process of the TM to crash, but the job was simply restarted. What's worrisome is that, after 2 checkpoint failures ^1^, the job restarted and subsequently resumed from the latest successful checkpoint and completely ignored all the events I wrote to Kafka, which I can verify because I have a custom metric that exposes the approximate size of this cache, and because the job didn't crashloop at this point after reading all the messages from Kafka over and over again.

I'm attaching an excerpt of the Job Manager's logs. My main concerns are:
# It seems the memory error from the JM didn't prevent the Kafka offsets from being "rolled back", so eventually the Kafka events that should have ended up in the broadcast state's cache were ignored.
# -Is it normal that the state is somehow "materialized" in the JM and is thus affected by the size of the JM's heap? Is this something particular due to the use of broadcast state? I found this very surprising.- See comments

^1^ Two failures are expected since {{execution.checkpointing.tolerable-failed-checkpoints=1}}

    was:

I have a job that uses broadcast state to maintain a cache of required metadata. I am currently evaluating memory requirements of my specific use case, and I ran into a weird situation that seems worrisome.

All sources in my job are Kafka sources. I wrote a large amount of messages in Kafka to force the broadcast state's cache to grow. At some point, this caused an "{{java.lang.OutOfMemoryError: Java heap space}}" error in the Job Manager. I would have expected the whole java process of the JM to crash, but the job was simply restarted. What's worrisome is that, after 2 checkpoint failures ^1^, the job restarted and subsequently resumed from the latest successful checkpoint and completely ignored all the events I wrote to Kafka, which I can verify because I have a custom metric that exposes the approximate size of this cache, and because the job didn't crashloop at this point after reading all the messages from Kafka over and over again.

I'm attaching an excerpt of the Job Manager's logs. My main concerns are:
# It seems the memory error from the JM didn't prevent the Kafka offsets from being "rolled back", so eventually the Kafka events that should have ended up in the broadcast state's cache were ignored.
# -Is it normal that the state is somehow "materialized" in the JM and is thus affected by the size of the JM's heap? Is this something particular due to the use of broadcast state? I found this very surprising.- See comments

^1^ Two failures are expected since {{execution.checkpointing.tolerable-failed-checkpoints=1}}

> Inconsistent state with data loss after OutOfMemoryError
> --------------------------------------------------------
>
>                 Key: FLINK-34325
>                 URL: https://issues.apache.org/jira/browse/FLINK-34325
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.17.1
>        Environment: Flink on Kubernetes with HA, RocksDB with incremental checkpoints on Azure
>            Reporter: Alexis Sarda-Espinosa
>            Priority: Major
>        Attachments: jobmanager_log.txt
>
> I have a job that uses broadcast state to maintain a cache of required metadata. I am currently evaluating memory requirements of my specific use case, and I ran into a weird situation that seems worrisome.
> All sources in my job are Kafka sources. I wrote a large amount of messages in Kafka to force the broadcast state's cache to grow. At some point, this caused an "{{java.lang.OutOfMemoryError: Java heap space}}" error in the -Job- Task Manager. I would have expected the whole java process of the TM to crash, but the job was simply restarted. What's worrisome is that, after 2 checkpoint failures ^1^, the job restarted and subsequently resumed from the latest successful checkpoint and completely ignored all the events I wrote to Kafka, which I can verify because I have a custom metric that exposes the approximate size of this cache, and the fact that the job didn't crashloop at this point after reading all the messages from Kafka over and over again.
> I'm attaching an excerpt of the Job Manager's logs. My main
Re: [PR] [FLINK-33396][table-planner] Fix table alias not be cleared in subquery [flink]
jeyhunkarimov commented on code in PR #24239: URL: https://github.com/apache/flink/pull/24239#discussion_r1476267210

## flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/utils/RelTreeWriterImpl.scala:

@@ -286,4 +297,45 @@ class RelTreeWriterImpl(
       })
   }
 }
+
+  private def copy(pw: PrintWriter): RelTreeWriterImpl = {
+    new RelTreeWriterImpl(
+      pw,
+      explainLevel,
+      withIdPrefix,
+      withChangelogTraits,
+      withRowType,
+      withTreeStyle,
+      withUpsertKey,
+      withQueryHint,
+      withQueryBlockAlias,
+      statementNum,
+      withAdvice,
+      withRicherDetailInSubQuery)
+  }
+
+  /**
+   * Mainly copy from [[RexSubQuery#computeDigest]].
+   *
+   * Modified to support explain sub-query with richer additional detail by [[RelTreeWriterImpl]] in
+   * Flink rather than Calcite.
+   */
+  private def computeSubQueryRicherDigest(subQuery: RexSubQuery): String = {
+    val sb = new StringBuilder(subQuery.getOperator.getName);
+    sb.append("(")
+    subQuery.getOperands.forEach(
+      operand => {
+        sb.append(operand)
+        sb.append(", ")
+      })
+    sb.append("{\n")
+
+    val sw = new StringWriter
+    val newRelTreeWriter = copy(new PrintWriter(sw))
+    subQuery.rel.explain(newRelTreeWriter)
+    sb.append(sw.toString)
+
+    sb.append("})")
+    sb.toString()
+  }

Review Comment: IMO we should keep the scope of this PR to remove the hints about alias. And in the tests, we should only see the change, `query_plan_with_alias_hint` -> `the_exact_same_query_plan_without_unnecessary_alias`. Would that make sense to you?
## flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/utils/RelTreeWriterImpl.scala:

@@ -189,7 +191,16 @@ class RelTreeWriterImpl(
         case value =>
           if (j == 0) s.append("(") else s.append(", ")
           j = j + 1
-          s.append(value.left).append("=[").append(value.right).append("]")
+          val rightStr = value.right match {
+            case subQuery: RexSubQuery =>
+              if (withRicherDetailInSubQuery) {
+                computeSubQueryRicherDigest(subQuery)
+              } else {
+                value.right.toString
+              }
+            case _ => value.right.toString
+          }
+          s.append(value.left).append("=[").append(rightStr).append("]")

Review Comment: Is this block needed for fixing the table alias clearance?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-34343) ResourceManager registration is not completed when registering the JobMaster
[ https://issues.apache.org/jira/browse/FLINK-34343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34343: -- Priority: Critical (was: Blocker) > ResourceManager registration is not completed when registering the JobMaster > > > Key: FLINK-34343 > URL: https://issues.apache.org/jira/browse/FLINK-34343 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination, Runtime / RPC >Affects Versions: 1.17.2, 1.19.0, 1.18.1 >Reporter: Matthias Pohl >Priority: Critical > Labels: test-stability > Attachments: FLINK-34343_k8s_application_cluster_e2e_test.log > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57203=logs=64debf87-ecdb-5aef-788d-8720d341b5cb=2302fb98-0839-5df2-3354-bbae636f81a7=8066 > The test run failed due to a NullPointerException: > {code} > Feb 02 01:11:55 2024-02-02 01:11:47,791 INFO > org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor [] - The rpc > endpoint > org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager has not > been started yet. Discarding message LocalFencedMessage(000 > 0, > LocalRpcInvocation(ResourceManagerGateway.registerJobMaster(JobMasterId, > ResourceID, String, JobID, Time))) until processing is started. > Feb 02 01:11:55 2024-02-02 01:11:47,797 WARN > org.apache.flink.runtime.rpc.pekko.SupervisorActor [] - RpcActor > pekko://flink/user/rpc/resourcemanager_2 has failed. Shutting it down now. 
> Feb 02 01:11:55 java.lang.NullPointerException: Cannot invoke > "org.apache.flink.runtime.rpc.RpcServer.getAddress()" because > "this.rpcServer" is null > Feb 02 01:11:55 at > org.apache.flink.runtime.rpc.RpcEndpoint.getAddress(RpcEndpoint.java:322) > ~[flink-dist-1.19-SNAPSHOT.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleMessage(PekkoRpcActor.java:182) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > scala.PartialFunction.applyOrElse(PartialFunction.scala:127) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > scala.PartialFunction.applyOrElse$(PartialFunction.scala:126) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:29) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > 
org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:545) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.actor.AbstractActor.aroundReceive(AbstractActor.scala:229) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.actor.ActorCell.receiveMessage(ActorCell.scala:590) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.actor.ActorCell.invoke(ActorCell.scala:557) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.dispatch.Mailbox.processMailbox(Mailbox.scala:280) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.dispatch.Mailbox.run(Mailbox.scala:241) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.dispatch.Mailbox.exec(Mailbox.scala:253) >
[jira] [Updated] (FLINK-34343) ResourceManager registration is not completed when registering the JobMaster
[ https://issues.apache.org/jira/browse/FLINK-34343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34343: -- Affects Version/s: 1.18.1 1.17.2 > ResourceManager registration is not completed when registering the JobMaster > > > Key: FLINK-34343 > URL: https://issues.apache.org/jira/browse/FLINK-34343 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination, Runtime / RPC >Affects Versions: 1.17.2, 1.19.0, 1.18.1 >Reporter: Matthias Pohl >Priority: Blocker > Labels: test-stability > Attachments: FLINK-34343_k8s_application_cluster_e2e_test.log > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57203=logs=64debf87-ecdb-5aef-788d-8720d341b5cb=2302fb98-0839-5df2-3354-bbae636f81a7=8066 > The test run failed due to a NullPointerException: > {code} > Feb 02 01:11:55 2024-02-02 01:11:47,791 INFO > org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor [] - The rpc > endpoint > org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager has not > been started yet. Discarding message LocalFencedMessage(000 > 0, > LocalRpcInvocation(ResourceManagerGateway.registerJobMaster(JobMasterId, > ResourceID, String, JobID, Time))) until processing is started. > Feb 02 01:11:55 2024-02-02 01:11:47,797 WARN > org.apache.flink.runtime.rpc.pekko.SupervisorActor [] - RpcActor > pekko://flink/user/rpc/resourcemanager_2 has failed. Shutting it down now. 
> Feb 02 01:11:55 java.lang.NullPointerException: Cannot invoke > "org.apache.flink.runtime.rpc.RpcServer.getAddress()" because > "this.rpcServer" is null > Feb 02 01:11:55 at > org.apache.flink.runtime.rpc.RpcEndpoint.getAddress(RpcEndpoint.java:322) > ~[flink-dist-1.19-SNAPSHOT.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleMessage(PekkoRpcActor.java:182) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > scala.PartialFunction.applyOrElse(PartialFunction.scala:127) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > scala.PartialFunction.applyOrElse$(PartialFunction.scala:126) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:29) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > 
org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:545) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.actor.AbstractActor.aroundReceive(AbstractActor.scala:229) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.actor.ActorCell.receiveMessage(ActorCell.scala:590) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.actor.ActorCell.invoke(ActorCell.scala:557) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.dispatch.Mailbox.processMailbox(Mailbox.scala:280) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.dispatch.Mailbox.run(Mailbox.scala:241) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.dispatch.Mailbox.exec(Mailbox.scala:253) >
[jira] [Commented] (FLINK-33819) Support setting CompressType in RocksDBStateBackend
[ https://issues.apache.org/jira/browse/FLINK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813740#comment-17813740 ] Piotr Nowojski commented on FLINK-33819: Thanks [~mayuehappy]. {quote} merged 4f7725aa into master which is not blocked by the performance result. I also think it's worth discussing and we could find a better default behaviour in the next version. {quote} +1 Yes > Support setting CompressType in RocksDBStateBackend > --- > > Key: FLINK-33819 > URL: https://issues.apache.org/jira/browse/FLINK-33819 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.18.0 >Reporter: Yue Ma >Assignee: Yue Ma >Priority: Major > Labels: pull-request-available > Fix For: 1.19.0 > > Attachments: image-2023-12-14-11-32-32-968.png, > image-2023-12-14-11-35-22-306.png > > > Currently, RocksDBStateBackend does not support setting the compression > level, and Snappy is used for compression by default. But we have some > scenarios where compression will use a lot of CPU resources. Turning off > compression can significantly reduce CPU overhead. So we may need to support > a parameter for users to set the CompressType of RocksDB. > !image-2023-12-14-11-35-22-306.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-33819) Support setting CompressType in RocksDBStateBackend
[ https://issues.apache.org/jira/browse/FLINK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813740#comment-17813740 ] Piotr Nowojski edited comment on FLINK-33819 at 2/2/24 3:52 PM: Thanks [~mayuehappy]. {quote} merged 4f7725aa into master which is not blocked by the performance result. I also think it's worth discussing and we could find a better default behaviour in the next version. {quote} +1 was (Author: pnowojski): Thanks [~mayuehappy]. {quote} merged 4f7725aa into master which is not blocked by the performance result. I also think it's worthy discussing and we could find a better default behavious in the next version. {quote} +1 Yes > Support setting CompressType in RocksDBStateBackend > --- > > Key: FLINK-33819 > URL: https://issues.apache.org/jira/browse/FLINK-33819 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.18.0 >Reporter: Yue Ma >Assignee: Yue Ma >Priority: Major > Labels: pull-request-available > Fix For: 1.19.0 > > Attachments: image-2023-12-14-11-32-32-968.png, > image-2023-12-14-11-35-22-306.png > > > Currently, RocksDBStateBackend does not support setting the compression > level, and Snappy is used for compression by default. But we have some > scenarios where compression will use a lot of CPU resources. Turning off > compression can significantly reduce CPU overhead. So we may need to support > a parameter for users to set the CompressType of RocksDB. > !image-2023-12-14-11-35-22-306.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
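Side note for readers of this thread: even before a dedicated compression option lands, RocksDB column-family options can already be tuned per job through the options factory exposed by the state backend. The sketch below is illustrative only; it assumes the public `RocksDBOptionsFactory` API of recent Flink releases, and turning compression off entirely mirrors the trade-off discussed above rather than being a recommended default:

```java
import org.apache.flink.contrib.streaming.state.RocksDBOptionsFactory;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.CompressionType;
import org.rocksdb.DBOptions;

import java.util.Collection;

/** Sketch: an options factory that turns off block compression for all column families. */
public class NoCompressionOptionsFactory implements RocksDBOptionsFactory {

    @Override
    public DBOptions createDBOptions(
            DBOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        // DB-level options are left unchanged; compression is a column-family setting.
        return currentOptions;
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(
            ColumnFamilyOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        // Replace the Snappy default, trading larger SST files for lower CPU usage.
        return currentOptions.setCompressionType(CompressionType.NO_COMPRESSION);
    }
}
```

Such a factory would then be registered with `EmbeddedRocksDBStateBackend#setRocksDBOptions(...)` or through the `state.backend.rocksdb.options-factory` configuration key.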
[jira] [Reopened] (FLINK-33696) FLIP-385: Add OpenTelemetryTraceReporter and OpenTelemetryMetricReporter
[ https://issues.apache.org/jira/browse/FLINK-33696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Piotr Nowojski reopened FLINK-33696: No, it's not done yet. I still have to open a PR adding the OpenTelemetry reporters; 7db2ecad only added the slf4j reporter. > FLIP-385: Add OpenTelemetryTraceReporter and OpenTelemetryMetricReporter > > > Key: FLINK-33696 > URL: https://issues.apache.org/jira/browse/FLINK-33696 > Project: Flink > Issue Type: New Feature > Components: Runtime / Metrics >Reporter: Piotr Nowojski >Assignee: Piotr Nowojski >Priority: Major > Fix For: 1.19.0 > > > h1. Motivation > [FLIP-384|https://cwiki.apache.org/confluence/display/FLINK/FLIP-384%3A+Introduce+TraceReporter+and+use+it+to+create+checkpointing+and+recovery+traces] > adds the TraceReporter interface. However, with > [FLIP-384|https://cwiki.apache.org/confluence/display/FLINK/FLIP-384%3A+Introduce+TraceReporter+and+use+it+to+create+checkpointing+and+recovery+traces] > alone, Log4jTraceReporter would be the only available implementation of the > TraceReporter interface, which is not very helpful. > In this FLIP I'm proposing to contribute both MetricExporter and > TraceReporter implementations using OpenTelemetry. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-34169) [benchmark] CI fails during test running
[ https://issues.apache.org/jira/browse/FLINK-34169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zakelly Lan closed FLINK-34169. --- Resolution: Fixed > [benchmark] CI fails during test running > > > Key: FLINK-34169 > URL: https://issues.apache.org/jira/browse/FLINK-34169 > Project: Flink > Issue Type: Bug > Components: Benchmarks >Reporter: Zakelly Lan >Assignee: Yunfeng Zhou >Priority: Critical > > [CI link: > https://github.com/apache/flink-benchmarks/actions/runs/7580834955/job/20647663157?pr=85|https://github.com/apache/flink-benchmarks/actions/runs/7580834955/job/20647663157?pr=85] > which says: > {code:java} > // omit some stack traces > Caused by: java.util.concurrent.ExecutionException: Boxed Error > 1115 at scala.concurrent.impl.Promise$.resolver(Promise.scala:87) > 1116 at > scala.concurrent.impl.Promise$.scala$concurrent$impl$Promise$$resolveTry(Promise.scala:79) > 1117 at > scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284) > 1118 at > org.apache.pekko.pattern.PromiseActorRef.$bang(AskSupport.scala:629) > 1119 at org.apache.pekko.actor.ActorRef.tell(ActorRef.scala:141) > 1120 at > org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcInvocation(PekkoRpcActor.java:317) > 1121 at > org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcMessage(PekkoRpcActor.java:222) > 1122 ... 
22 more > 1123 Caused by: java.lang.NoClassDefFoundError: > javax/activation/UnsupportedDataTypeException > 1124 at > org.apache.flink.runtime.io.network.partition.consumer.SingleInputGateFactory.createKnownInputChannel(SingleInputGateFactory.java:387) > 1125 at > org.apache.flink.runtime.io.network.partition.consumer.SingleInputGateFactory.lambda$createInputChannel$2(SingleInputGateFactory.java:353) > 1126 at > org.apache.flink.runtime.shuffle.ShuffleUtils.applyWithShuffleTypeCheck(ShuffleUtils.java:51) > 1127 at > org.apache.flink.runtime.io.network.partition.consumer.SingleInputGateFactory.createInputChannel(SingleInputGateFactory.java:333) > 1128 at > org.apache.flink.runtime.io.network.partition.consumer.SingleInputGateFactory.createInputChannelsAndTieredStorageService(SingleInputGateFactory.java:284) > 1129 at > org.apache.flink.runtime.io.network.partition.consumer.SingleInputGateFactory.create(SingleInputGateFactory.java:204) > 1130 at > org.apache.flink.runtime.io.network.NettyShuffleEnvironment.createInputGates(NettyShuffleEnvironment.java:265) > 1131 at > org.apache.flink.runtime.taskmanager.Task.(Task.java:418) > 1132 at > org.apache.flink.runtime.taskexecutor.TaskExecutor.submitTask(TaskExecutor.java:821) > 1133 at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 1134 at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 1135 at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 1136 at java.base/java.lang.reflect.Method.invoke(Method.java:566) > 1137 at > org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.lambda$handleRpcInvocation$1(PekkoRpcActor.java:309) > 1138 at > org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83) > 1139 at > org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcInvocation(PekkoRpcActor.java:307) > 1140 ... 
23 more > 1141 Caused by: java.lang.ClassNotFoundException: > javax.activation.UnsupportedDataTypeException > 1142 at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581) > 1143 at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) > 1144 at > java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527) > 1145 ... 39 more {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34169) [benchmark] CI fails during test running
[ https://issues.apache.org/jira/browse/FLINK-34169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813730#comment-17813730 ] Zakelly Lan commented on FLINK-34169: - Thanks, closing now. > [benchmark] CI fails during test running > > > Key: FLINK-34169 > URL: https://issues.apache.org/jira/browse/FLINK-34169 > Project: Flink > Issue Type: Bug > Components: Benchmarks >Reporter: Zakelly Lan >Assignee: Yunfeng Zhou >Priority: Critical > > [CI link: > https://github.com/apache/flink-benchmarks/actions/runs/7580834955/job/20647663157?pr=85|https://github.com/apache/flink-benchmarks/actions/runs/7580834955/job/20647663157?pr=85] > which says: > {code:java} > // omit some stack traces > Caused by: java.util.concurrent.ExecutionException: Boxed Error > 1115 at scala.concurrent.impl.Promise$.resolver(Promise.scala:87) > 1116 at > scala.concurrent.impl.Promise$.scala$concurrent$impl$Promise$$resolveTry(Promise.scala:79) > 1117 at > scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284) > 1118 at > org.apache.pekko.pattern.PromiseActorRef.$bang(AskSupport.scala:629) > 1119 at org.apache.pekko.actor.ActorRef.tell(ActorRef.scala:141) > 1120 at > org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcInvocation(PekkoRpcActor.java:317) > 1121 at > org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcMessage(PekkoRpcActor.java:222) > 1122 ... 
22 more > 1123 Caused by: java.lang.NoClassDefFoundError: > javax/activation/UnsupportedDataTypeException > 1124 at > org.apache.flink.runtime.io.network.partition.consumer.SingleInputGateFactory.createKnownInputChannel(SingleInputGateFactory.java:387) > 1125 at > org.apache.flink.runtime.io.network.partition.consumer.SingleInputGateFactory.lambda$createInputChannel$2(SingleInputGateFactory.java:353) > 1126 at > org.apache.flink.runtime.shuffle.ShuffleUtils.applyWithShuffleTypeCheck(ShuffleUtils.java:51) > 1127 at > org.apache.flink.runtime.io.network.partition.consumer.SingleInputGateFactory.createInputChannel(SingleInputGateFactory.java:333) > 1128 at > org.apache.flink.runtime.io.network.partition.consumer.SingleInputGateFactory.createInputChannelsAndTieredStorageService(SingleInputGateFactory.java:284) > 1129 at > org.apache.flink.runtime.io.network.partition.consumer.SingleInputGateFactory.create(SingleInputGateFactory.java:204) > 1130 at > org.apache.flink.runtime.io.network.NettyShuffleEnvironment.createInputGates(NettyShuffleEnvironment.java:265) > 1131 at > org.apache.flink.runtime.taskmanager.Task.(Task.java:418) > 1132 at > org.apache.flink.runtime.taskexecutor.TaskExecutor.submitTask(TaskExecutor.java:821) > 1133 at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 1134 at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 1135 at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 1136 at java.base/java.lang.reflect.Method.invoke(Method.java:566) > 1137 at > org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.lambda$handleRpcInvocation$1(PekkoRpcActor.java:309) > 1138 at > org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83) > 1139 at > org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcInvocation(PekkoRpcActor.java:307) > 1140 ... 
23 more > 1141 Caused by: java.lang.ClassNotFoundException: > javax.activation.UnsupportedDataTypeException > 1142 at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581) > 1143 at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) > 1144 at > java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527) > 1145 ... 39 more {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
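A note for anyone hitting the same failure: the `javax.activation` classes were removed from the JDK starting with Java 11, so a `NoClassDefFoundError`/`ClassNotFoundException` for `javax.activation.UnsupportedDataTypeException` usually means the API jar must be put on the classpath explicitly. A hedged sketch of a Maven dependency that commonly resolves this (the coordinates and version are a suggestion, not taken from this thread):

```xml
<!-- javax.activation is no longer bundled with JDK 11+; add the API jar explicitly -->
<dependency>
    <groupId>javax.activation</groupId>
    <artifactId>javax.activation-api</artifactId>
    <version>1.2.0</version>
</dependency>
```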
[jira] [Closed] (FLINK-23582) Add documentation for session window
[ https://issues.apache.org/jira/browse/FLINK-23582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn Visser closed FLINK-23582. -- Resolution: Duplicate > Add documentation for session window > > > Key: FLINK-23582 > URL: https://issues.apache.org/jira/browse/FLINK-23582 > Project: Flink > Issue Type: Sub-task > Components: Documentation, Table SQL / API >Reporter: Jing Zhang >Assignee: Jing Zhang >Priority: Minor > Labels: pull-request-available, stale-assigned > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33856) Add metrics to monitor the interaction performance between task and external storage system in the process of checkpoint making
[ https://issues.apache.org/jira/browse/FLINK-33856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813723#comment-17813723 ] Martijn Visser commented on FLINK-33856: There's been a change at the ASF on who can create Confluence accounts; we're discussing how to proceed. Hopefully I'll have an answer soon. > Add metrics to monitor the interaction performance between task and external > storage system in the process of checkpoint making > --- > > Key: FLINK-33856 > URL: https://issues.apache.org/jira/browse/FLINK-33856 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing >Affects Versions: 1.18.0 >Reporter: Jufang He >Assignee: Jufang He >Priority: Major > Labels: pull-request-available > > When Flink makes a checkpoint, the interaction performance with the external > file system has a great impact on the overall time spent. Therefore, it is > easy to spot the bottleneck by adding performance metrics where the task > interacts with the external file storage system. These include: the file > write rate, the latency of writing the file, and the latency of closing the > file. > Adding these metrics on the Flink side has the following advantages: it is > convenient to compare E2E checkpoint times across different tasks, and there > is no need to distinguish the type of external storage system, since the > metrics can be collected uniformly in the > FsCheckpointStreamFactory. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [FLINK-33138] DataStream API implementation [flink-connector-prometheus]
hlteoh37 commented on code in PR #1: URL: https://github.com/apache/flink-connector-prometheus/pull/1#discussion_r1476176456

## example-datastream-job/pom.xml:

@@ -0,0 +1,156 @@ New module "Flink : Connectors : Prometheus : Sample Application" (artifactId `example-datastream-job`) under parent `org.apache.flink:flink-connector-prometheus-parent:1.0.0-SNAPSHOT`, with properties: source encoding UTF-8, `flink.version` 1.17.2, `log4j.version` 2.17.1, and dependencies:

- `org.apache.flink:flink-connector-prometheus:${project.version}`
- `org.apache.flink.connector.prometheus:amp-request-signer:${project.version}`
- `org.apache.flink:flink-streaming-java:${flink.version}` (provided)
- `org.apache.flink:flink-clients:${flink.version}` (provided)
- `org.apache.flink:flink-connector-base:${flink.version}` (provided)
- `org.apache.logging.log4j:log4j-slf4j-impl:${log4j.version}` (runtime)
- `org.apache.logging.log4j:log4j-api:${log4j.version}` (runtime)
- `org.apache.logging.log4j:log4j-core:${log4j.version}` (runtime)
- `org.apache.flink:flink-runtime-web`

Review Comment: why do we need flink-runtime-web?

## amp-request-signer/pom.xml:

@@ -0,0 +1,60 @@ New module "Flink : Connectors : Prometheus : AMP request signer" (`org.apache.flink.connector.prometheus:amp-request-signer`) under the same parent, with properties: source encoding UTF-8, 1.17.0, depending on `org.apache.flink:flink-connector-prometheus:${project.version}` (provided) and `com.amazonaws:aws-java-sdk-core:1.12.570`.

Review Comment: can we delegate this to a bom in parent?

## prometheus-connector/src/main/java/org/apache/flink/connector/prometheus/sink/errorhandling/SinkWriterErrorHandlingBehaviorConfiguration.java:

@@ -0,0 +1,105 @@
```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.flink.connector.prometheus.sink.errorhandling;

import java.io.Serializable;
import java.util.Optional;

import static org.apache.flink.connector.prometheus.sink.errorhandling.OnErrorBehavior.FAIL;

/**
 * Configure the error-handling behavior of the writer, for different types of error. Also defines
 * default behaviors.
 */
public class SinkWriterErrorHandlingBehaviorConfiguration implements Serializable {

    public static final OnErrorBehavior ON_MAX_RETRY_EXCEEDED_DEFAULT_BEHAVIOR = FAIL;
    public static final OnErrorBehavior ON_HTTP_CLIENT_IO_FAIL_DEFAULT_BEHAVIOR = FAIL;
    public static final OnErrorBehavior ON_PROMETHEUS_NON_RETRIABLE_ERROR_DEFAULT_BEHAVIOR = FAIL;

    /** Behaviour when the max retries is exceeded on Prometheus retriable errors. */
    private final OnErrorBehavior onMaxRetryExceeded;

    /** Behaviour when the HTTP client fails, for an I/O problem. */
    private final OnErrorBehavior onHttpClientIOFail;

    /** Behaviour when Prometheus Remote-Write respond with a non-retriable error. */
    private final OnErrorBehavior onPrometheusNonRetriableError;

    public
```
Re: [PR] [FLINK-34344] Pass JobID to CheckpointStatsTracker [flink]
flinkbot commented on PR #24254: URL: https://github.com/apache/flink/pull/24254#issuecomment-1924070918 ## CI report: * 03d4cfb0053a784e0f88677dcf1e8a6e4dc3c9f4 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-34122) Deprecate old serialization config methods and options
[ https://issues.apache.org/jira/browse/FLINK-34122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813714#comment-17813714 ] Martijn Visser commented on FLINK-34122: Whatever works for you! > Deprecate old serialization config methods and options > -- > > Key: FLINK-34122 > URL: https://issues.apache.org/jira/browse/FLINK-34122 > Project: Flink > Issue Type: Sub-task > Components: API / Type Serialization System, Runtime / Configuration >Affects Versions: 1.19.0 >Reporter: Zhanghao Chen >Assignee: Zhanghao Chen >Priority: Major > Labels: pull-request-available > Fix For: 1.19.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34344) Wrong JobID in CheckpointStatsTracker
[ https://issues.apache.org/jira/browse/FLINK-34344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-34344: --- Labels: pull-request-available (was: ) > Wrong JobID in CheckpointStatsTracker > - > > Key: FLINK-34344 > URL: https://issues.apache.org/jira/browse/FLINK-34344 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.19.0 >Reporter: Roman Khachatryan >Assignee: Roman Khachatryan >Priority: Major > Labels: pull-request-available > Fix For: 1.19.0 > > > The job id is generated randomly: > ``` > public CheckpointStatsTracker(int numRememberedCheckpoints, MetricGroup > metricGroup) { > this(numRememberedCheckpoints, metricGroup, new JobID(), > Integer.MAX_VALUE); > } > ``` > This affects how it is logged (or reported elsewhere).
[PR] [FLINK-34344] Pass JobID to CheckpointStatsTracker [flink]
rkhachatryan opened a new pull request, #24254: URL: https://github.com/apache/flink/pull/24254 ## What is the purpose of the change *(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)* ## Brief change log *(for example:)* - *The TaskInfo is stored in the blob store on job creation time as a persistent artifact* - *Deployments RPC transmits only the blob storage reference* - *TaskManagers retrieve the TaskInfo from the blob cache* ## Verifying this change Please make sure both new and modified tests in this PR follows the conventions defined in our code quality guide: https://flink.apache.org/contributing/code-style-and-quality-common.html#testing *(Please pick either of the following options)* This change is a trivial rework / code cleanup without any test coverage. *(or)* This change is already covered by existing tests, such as *(please describe tests)*. 
*(or)* This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end deployment with large payloads (100MB)* - *Extended integration test for recovery after master (JobManager) failure* - *Added test that validates that TaskInfo is transferred only once across recoveries* - *Manually verified the change by running a 4 node cluster with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / no) - The serializers: (yes / no / don't know) - The runtime per-record code paths (performance sensitive): (yes / no / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know) - The S3 file system connector: (yes / no / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / no) - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
Re: [PR] [FLINK-21949][table] Support ARRAY_AGG aggregate function [flink]
snuyanzin commented on PR #23411: URL: https://github.com/apache/flink/pull/23411#issuecomment-1924052949 I think we could wait until Monday or even longer, since right now there is a feature freeze and we need to wait for the release branch to be cut.
[jira] [Created] (FLINK-34344) Wrong JobID in CheckpointStatsTracker
Roman Khachatryan created FLINK-34344: - Summary: Wrong JobID in CheckpointStatsTracker Key: FLINK-34344 URL: https://issues.apache.org/jira/browse/FLINK-34344 Project: Flink Issue Type: Bug Components: Runtime / Checkpointing Affects Versions: 1.19.0 Reporter: Roman Khachatryan Assignee: Roman Khachatryan Fix For: 1.19.0 The job id is generated randomly: ``` public CheckpointStatsTracker(int numRememberedCheckpoints, MetricGroup metricGroup) { this(numRememberedCheckpoints, metricGroup, new JobID(), Integer.MAX_VALUE); } ``` This affects how it is logged (or reported elsewhere).
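The bug quoted in the ticket can be reproduced in miniature. The sketch below is hypothetical: `JobId` and `TrackerSketch` are simplified stand-ins, not Flink's actual `JobID` or `CheckpointStatsTracker` classes. It only illustrates the mechanism the ticket describes: a convenience constructor that invents a fresh random ID makes the tracker report an ID that matches no real job, while threading the actual ID through (as the fix does) keeps reporting consistent.

```java
import java.util.UUID;

// Simplified stand-in for a job identifier: a fresh random value per instance.
final class JobId {
    final String value = UUID.randomUUID().toString();
}

// Hypothetical miniature of the tracker; names are illustrative only.
final class TrackerSketch {
    final JobId jobId;

    // Problematic convenience overload: silently fabricates a random JobId.
    TrackerSketch(int numRememberedCheckpoints) {
        this(numRememberedCheckpoints, new JobId());
    }

    // The fix: callers pass the actual JobId of the job being tracked.
    TrackerSketch(int numRememberedCheckpoints, JobId jobId) {
        this.jobId = jobId;
    }
}

public class Main {
    public static void main(String[] args) {
        // Two trackers built via the convenience overload carry different
        // random IDs, so logs and metrics reference jobs that do not exist.
        TrackerSketch a = new TrackerSketch(10);
        TrackerSketch b = new TrackerSketch(10);
        if (a.jobId.value.equals(b.jobId.value)) throw new AssertionError();

        // Passing the real JobId through preserves it end to end.
        JobId real = new JobId();
        TrackerSketch c = new TrackerSketch(10, real);
        if (!c.jobId.value.equals(real.value)) throw new AssertionError();
        System.out.println("ok");
    }
}
```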
[jira] [Closed] (FLINK-34295) Release Testing Instructions: Verify FLINK-33712 Deprecate RuntimeContext#getExecutionConfig
[ https://issues.apache.org/jira/browse/FLINK-34295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lincoln lee closed FLINK-34295. --- Resolution: Fixed [~JunRuiLi] Thanks for confirming! > Release Testing Instructions: Verify FLINK-33712 Deprecate > RuntimeContext#getExecutionConfig > > > Key: FLINK-34295 > URL: https://issues.apache.org/jira/browse/FLINK-34295 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Configuration >Affects Versions: 1.19.0 >Reporter: lincoln lee >Assignee: Junrui Li >Priority: Blocker > Labels: release-testing > Fix For: 1.19.0 > >
[jira] [Closed] (FLINK-34296) Release Testing Instructions: Verify FLINK-33581 Deprecate configuration getters/setters that return/set complex Java objects
[ https://issues.apache.org/jira/browse/FLINK-34296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lincoln lee closed FLINK-34296. --- Resolution: Fixed > Release Testing Instructions: Verify FLINK-33581 Deprecate configuration > getters/setters that return/set complex Java objects > - > > Key: FLINK-34296 > URL: https://issues.apache.org/jira/browse/FLINK-34296 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Configuration >Affects Versions: 1.19.0 >Reporter: lincoln lee >Assignee: Junrui Li >Priority: Blocker > Labels: release-testing > Fix For: 1.19.0 > >
[jira] [Commented] (FLINK-34296) Release Testing Instructions: Verify FLINK-33581 Deprecate configuration getters/setters that return/set complex Java objects
[ https://issues.apache.org/jira/browse/FLINK-34296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813709#comment-17813709 ] lincoln lee commented on FLINK-34296: - [~JunRuiLi] Thanks for confirming, will close it. > Release Testing Instructions: Verify FLINK-33581 Deprecate configuration > getters/setters that return/set complex Java objects > - > > Key: FLINK-34296 > URL: https://issues.apache.org/jira/browse/FLINK-34296 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Configuration >Affects Versions: 1.19.0 >Reporter: lincoln lee >Assignee: Junrui Li >Priority: Blocker > Labels: release-testing > Fix For: 1.19.0 > >
[jira] [Commented] (FLINK-34299) Release Testing Instructions: Verify FLINK-33203 Adding a separate configuration for specifying Java Options of the SQL Gateway
[ https://issues.apache.org/jira/browse/FLINK-34299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813708#comment-17813708 ] lincoln lee commented on FLINK-34299: - [~guoyangze] Thanks for confirming! > Release Testing Instructions: Verify FLINK-33203 Adding a separate > configuration for specifying Java Options of the SQL Gateway > > > Key: FLINK-34299 > URL: https://issues.apache.org/jira/browse/FLINK-34299 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Gateway >Affects Versions: 1.19.0 >Reporter: lincoln lee >Assignee: Yangze Guo >Priority: Blocker > Labels: release-testing > Fix For: 1.19.0 > >
Re: [PR] [FLINK-24239] Event time temporal join should support values from array, map, row, etc. as join key [flink]
flinkbot commented on PR #24253: URL: https://github.com/apache/flink/pull/24253#issuecomment-1924021657 ## CI report: * e06aee059e610ce72fb153eff7c8f466cde82bfa UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build
[PR] [FLINK-24239] Event time temporal join should support values from array, map, row, etc. as join key [flink]
dawidwys opened a new pull request, #24253: URL: https://github.com/apache/flink/pull/24253 ## What is the purpose of the change This makes it possible to use more complex keys for temporal join. It tries pushing down join predicates before extracting join keys which makes more cases produce an equi-join condition required for a temporal join. ## Verifying this change Added tests in `TemporalJoinRestoreTest` covering keys from: * a nested row * value from a map ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / **no**) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**) - The serializers: (yes / **no** / don't know) - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't know) - The S3 file system connector: (yes / **no** / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / **no**) - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)
[jira] [Updated] (FLINK-24239) Event time temporal join should support values from array, map, row, etc. as join key
[ https://issues.apache.org/jira/browse/FLINK-24239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-24239: --- Labels: pull-request-available (was: ) > Event time temporal join should support values from array, map, row, etc. as > join key > - > > Key: FLINK-24239 > URL: https://issues.apache.org/jira/browse/FLINK-24239 > Project: Flink > Issue Type: Improvement > Components: Table SQL / Planner >Affects Versions: 1.12.6, 1.13.3, 1.14.1, 1.15.0 >Reporter: Caizhi Weng >Assignee: Dawid Wysakowicz >Priority: Major > Labels: pull-request-available > > This ticket is from the [mailing > list|https://lists.apache.org/thread.html/r90cab9c5026e527357d58db70d7e9b5875e57b942738f032bd54bfd3%40%3Cuser-zh.flink.apache.org%3E]. > Currently in event time temporal join when join keys are from an array, map > or row, an exception will be thrown saying "Currently the join key in > Temporal Table Join can not be empty". This is quite confusing for users as > they've already set the join keys. > Add the following test case to {{TableEnvironmentITCase}} to reproduce this > issue. > {code:scala} > @Test > def myTest(): Unit = { > tEnv.executeSql( > """ > |CREATE TABLE A ( > | a MAP, > | ts TIMESTAMP(3), > | WATERMARK FOR ts AS ts > |) WITH ( > | 'connector' = 'values' > |) > |""".stripMargin) > tEnv.executeSql( > """ > |CREATE TABLE B ( > | id INT, > | ts TIMESTAMP(3), > | WATERMARK FOR ts AS ts > |) WITH ( > | 'connector' = 'values' > |) > |""".stripMargin) > tEnv.executeSql("SELECT * FROM A LEFT JOIN B FOR SYSTEM_TIME AS OF A.ts AS > b ON A.a['ID'] = id").print() > } > {code} > The exception stack is > {code:java} > org.apache.flink.table.api.ValidationException: Currently the join key in > Temporal Table Join can not be empty. 
> at > org.apache.flink.table.planner.plan.rules.logical.LogicalCorrelateToJoinFromGeneralTemporalTableRule.onMatch(LogicalCorrelateToJoinFromTemporalTableRule.scala:272) > at > org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333) > at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542) > at > org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407) > at > org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243) > at > org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127) > at > org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202) > at > org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkHepProgram.optimize(FlinkHepProgram.scala:69) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkHepRuleSetProgram.optimize(FlinkHepRuleSetProgram.scala:87) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1$$anonfun$apply$1.apply(FlinkGroupProgram.scala:63) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1$$anonfun$apply$1.apply(FlinkGroupProgram.scala:60) > at > scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157) > at > scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at > scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157) > at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104) > at > 
org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1.apply(FlinkGroupProgram.scala:60) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1.apply(FlinkGroupProgram.scala:55) > at > scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157) > at > scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157) > at scala.collection.immutable.Range.foreach(Range.scala:160) > at > scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157) > at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104) > at >
Re: [PR] [FLINK-21949][table] Support ARRAY_AGG aggregate function [flink]
dawidwys commented on PR #23411: URL: https://github.com/apache/flink/pull/23411#issuecomment-1924010094 Sorry it is taking such a long time on my side. I had a vacation in the meantime. I'll try to check it on Monday. Nevertheless, if you're comfortable with the PR, @snuyanzin, feel free to merge it without waiting for my review.
Re: [PR] [FLINK-34007][k8s] Add graceful shutdown to KubernetesLeaderElector [flink]
flinkbot commented on PR #24252: URL: https://github.com/apache/flink/pull/24252#issuecomment-1923919766 ## CI report: * 4d84982a1376edabc04108045e4cbd32eb9e59da UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build
[PR] [FLINK-34007][k8s] Add graceful shutdown to KubernetesLeaderElector [flink]
XComp opened a new pull request, #24252: URL: https://github.com/apache/flink/pull/24252 ## What is the purpose of the change There was a test failure in [this build](https://github.com/XComp/flink/actions/runs/7745486791/job/21121947504#step:14:7309) which was testing FLINK-34333 (the 1.18 backport of FLINK-34007). `KubernetesLeaderElectorITCase.testMultipleKubernetesLeaderElectors` failed with the following error: ``` 18:38:52,290 [KubernetesLeaderElector-ExecutorService-thread-1] ERROR io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Exception occurred while releasing lock 'ConfigMapLock: default - kubernetesleaderelectoritcase-configmap-2e3f2dee-c41e-4f6f-830e-426018db34 a7 (406ed8e6-5cc6-4777-81b3-2ca706b83d72)' on cancel io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [ConfigMap] with name: [kubernetesleaderelectoritcase-configmap-2e3f2dee-c41e-4f6f-830e-426018db34a7] in namespace: [default] failed. at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:159) ~[kubernetes-client-api-6.9.2.jar:?] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.requireFromServer(BaseOperation.java:194) ~[kubernetes-client-6.9.2.jar:?] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.get(BaseOperation.java:148) ~[kubernetes-client-6.9.2.jar:?] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.get(BaseOperation.java:97) ~[kubernetes-client-6.9.2.jar:?] at io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.ResourceLock.get(ResourceLock.java:49) ~[kubernetes-client-api-6.9.2.jar:?] at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.release(LeaderElector.java:148) ~[kubernetes-client-api-6.9.2.jar:?] at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.stopLeading(LeaderElector.java:126) ~[kubernetes-client-api-6.9.2.jar:?] 
at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$null$1(LeaderElector.java:96) ~[kubernetes-client-api-6.9.2.jar:?] at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_402] at java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:792) ~[?:1.8.0_402] at java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2153) ~[?:1.8.0_402] at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$start$2(LeaderElector.java:95) ~[kubernetes-client-api-6.9.2.jar:?] at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_402] at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_402] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_402] at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) ~[?:1.8.0_402] at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$acquire$4(LeaderElector.java:173) ~[kubernetes-client-api-6.9.2.jar:?] at io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$loop$8(LeaderElector.java:282) ~[kubernetes-client-api-6.9.2.jar:?] at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640) [?:1.8.0_402] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_402] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_402] at java.lang.Thread.run(Thread.java:750) [?:1.8.0_402] Caused by: java.io.InterruptedIOException at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:494) ~[kubernetes-client-6.9.2.jar:?] at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:524) ~[kubernetes-client-6.9.2.jar:?] 
at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleGet(OperationSupport.java:467) ~[kubernetes-client-6.9.2.jar:?] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleGet(BaseOperation.java:791) ~[kubernetes-client-6.9.2.jar:?] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.requireFromServer(BaseOperation.java:192) ~[kubernetes-client-6.9.2.jar:?] ... 20 more Caused by: java.lang.InterruptedException at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:347) ~[?:1.8.0_402] at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) ~[?:1.8.0_402] at
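The failure above comes from interrupting the elector's worker thread while it is still releasing the lock, so the in-flight release dies with `InterruptedIOException`. The graceful-shutdown idea can be sketched as: signal the run loop to stop, let it release leadership on its own thread, and only then shut the executor down, never calling `shutdownNow()`. The sketch below is a hypothetical miniature, not the fabric8 or Flink implementation; all class and method names are illustrative.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical miniature of a leader elector with a cooperative stop flag.
class GracefulLeaderElector {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private volatile boolean running = true;
    volatile boolean leadershipReleased = false;

    void start() {
        executor.submit(() -> {
            while (running) {
                // ... renew the leadership lease periodically ...
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    return; // would skip the release; this is what we avoid
                }
            }
            // The release runs on the worker thread, un-interrupted.
            leadershipReleased = true;
        });
    }

    void stopGracefully() throws InterruptedException {
        running = false;     // signal the loop instead of interrupting it
        executor.shutdown(); // no shutdownNow(): the release stays uninterrupted
        executor.awaitTermination(5, TimeUnit.SECONDS);
    }
}

public class Main {
    public static void main(String[] args) throws Exception {
        GracefulLeaderElector elector = new GracefulLeaderElector();
        elector.start();
        Thread.sleep(50);
        elector.stopGracefully();
        System.out.println("released: " + elector.leadershipReleased);
    }
}
```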
Re: [PR] [FLINK-34217][docs] Update user doc for type serialization with FLIP-398 [flink]
X-czh commented on PR #24230: URL: https://github.com/apache/flink/pull/24230#issuecomment-1923775660 @flinkbot run azure
[jira] [Assigned] (FLINK-24239) Event time temporal join should support values from array, map, row, etc. as join key
[ https://issues.apache.org/jira/browse/FLINK-24239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Wysakowicz reassigned FLINK-24239: Assignee: Dawid Wysakowicz > Event time temporal join should support values from array, map, row, etc. as > join key > - > > Key: FLINK-24239 > URL: https://issues.apache.org/jira/browse/FLINK-24239 > Project: Flink > Issue Type: Improvement > Components: Table SQL / Planner >Affects Versions: 1.12.6, 1.13.3, 1.14.1, 1.15.0 >Reporter: Caizhi Weng >Assignee: Dawid Wysakowicz >Priority: Major > > This ticket is from the [mailing > list|https://lists.apache.org/thread.html/r90cab9c5026e527357d58db70d7e9b5875e57b942738f032bd54bfd3%40%3Cuser-zh.flink.apache.org%3E]. > Currently in event time temporal join when join keys are from an array, map > or row, an exception will be thrown saying "Currently the join key in > Temporal Table Join can not be empty". This is quite confusing for users as > they've already set the join keys. > Add the following test case to {{TableEnvironmentITCase}} to reproduce this > issue. > {code:scala} > @Test > def myTest(): Unit = { > tEnv.executeSql( > """ > |CREATE TABLE A ( > | a MAP, > | ts TIMESTAMP(3), > | WATERMARK FOR ts AS ts > |) WITH ( > | 'connector' = 'values' > |) > |""".stripMargin) > tEnv.executeSql( > """ > |CREATE TABLE B ( > | id INT, > | ts TIMESTAMP(3), > | WATERMARK FOR ts AS ts > |) WITH ( > | 'connector' = 'values' > |) > |""".stripMargin) > tEnv.executeSql("SELECT * FROM A LEFT JOIN B FOR SYSTEM_TIME AS OF A.ts AS > b ON A.a['ID'] = id").print() > } > {code} > The exception stack is > {code:java} > org.apache.flink.table.api.ValidationException: Currently the join key in > Temporal Table Join can not be empty. 
> at > org.apache.flink.table.planner.plan.rules.logical.LogicalCorrelateToJoinFromGeneralTemporalTableRule.onMatch(LogicalCorrelateToJoinFromTemporalTableRule.scala:272) > at > org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333) > at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542) > at > org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407) > at > org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243) > at > org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127) > at > org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202) > at > org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkHepProgram.optimize(FlinkHepProgram.scala:69) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkHepRuleSetProgram.optimize(FlinkHepRuleSetProgram.scala:87) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1$$anonfun$apply$1.apply(FlinkGroupProgram.scala:63) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1$$anonfun$apply$1.apply(FlinkGroupProgram.scala:60) > at > scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157) > at > scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at > scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157) > at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104) > at > 
org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1.apply(FlinkGroupProgram.scala:60) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1.apply(FlinkGroupProgram.scala:55) > at > scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157) > at > scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157) > at scala.collection.immutable.Range.foreach(Range.scala:160) > at > scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157) > at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram.optimize(FlinkGroupProgram.scala:55) >
[jira] [Commented] (FLINK-33734) Merge unaligned checkpoint state handle
[ https://issues.apache.org/jira/browse/FLINK-33734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813646#comment-17813646 ] Feifan Wang commented on FLINK-33734: - I found that only merging handles was not enough. Although the metadata was obviously smaller, it still took a long time for the jobmanager to process the RPC and serialize the metadata. I speculate the reason is that although the merged handle avoids duplicate file paths, the number of objects in the metadata is not significantly reduced (mainly channel infos). So I tried serializing the channel infos directly on the taskmanager side, and the test results showed that this works well. Below are the results of my test:
{code:java}
# control group:
checkpoint time (s)          --- avg: 82.73   max: 155.88  min: 49.90
store metadata time (s)      --- avg: 23.45   max: 47.60   min: 9.61
metadata size (MB)           --- avg: 696.55  max: 954.98  min: 461.35
store metadata speed (MB/s)  --- avg: 32.41   max: 61.82   min: 18.29

# only merge handles:
checkpoint time (s)          --- avg: 66.22   max: 123.12  min: 38.68
store metadata time (s)      --- avg: 12.76   max: 26.18   min: 4.02
metadata size (MB)           --- avg: 269.14  max: 394.26  min: 159.86
store metadata speed (MB/s)  --- avg: 23.93   max: 46.55   min: 11.33

# not only merge handles, but also serialize channel infos on the TaskManager:
checkpoint time (s)          --- avg: 30.63   max: 74.27   min: 5.16
store metadata time (s)      --- avg: 0.87    max: 11.23   min: 0.12
metadata size (MB)           --- avg: 232.22  max: 392.86  min: 45.34
store metadata speed (MB/s)  --- avg: 291.00  max: 386.80  min: 23.18{code}
Based on the results of the above test, I think serializing channel infos on the taskmanager side should be done together. I submitted a PR to implement this solution, please have a look [~pnowojski], [~fanrui], [~roman], [~Zakelly].
> Merge unaligned checkpoint state handle > --- > > Key: FLINK-33734 > URL: https://issues.apache.org/jira/browse/FLINK-33734 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing >Reporter: Feifan Wang >Assignee: Feifan Wang >Priority: Major > Labels: pull-request-available > > h3. Background > Unaligned checkpoint will write the inflight-data of all InputChannel and > ResultSubpartition of the same subtask to the same file during checkpoint. > The InputChannelStateHandle and ResultSubpartitionStateHandle organize the > metadata of inflight-data at the channel granularity, which causes the file > name to be repeated many times. When a job is under backpressure and task > parallelism is high, the metadata of unaligned checkpoints will bloat. This > will result in: > # The amount of data reported by taskmanager to jobmanager increases, and > jobmanager takes longer to process these RPC requests. > # The metadata of the entire checkpoint becomes very large, and it takes > longer to serialize and write it to dfs. > Both of the above points ultimately lead to longer checkpoint duration. > h3. A Production example > Take our production job with a parallelism of 4800 as an example: > # When there is no back pressure, checkpoint end-to-end duration is within 7 > seconds. > # When under pressure: checkpoint end-to-end duration often exceeds 1 > minute. We found that jobmanager took more than 40 seconds to process rpc > requests, and serialized metadata took more than 20 seconds.Some checkpoint > statistics: > |metadata file size|950 MB| > |channel state count|12,229,854| > |channel file count|5536| > Of the 950MB in the metadata file, 68% are redundant file paths. > We enabled log-based checkpoint on this job and hoped that the checkpoint > could be completed within 30 seconds. This problem made it difficult to > achieve this goal. > h3. 
Propose changes > I suggest introducing MergedInputChannelStateHandle and > MergedResultSubpartitionStateHandle to eliminate redundant file paths. > The taskmanager merges all InputChannelStateHandles with the same delegated > StreamStateHandle in the same subtask into one MergedInputChannelStateHandle > before reporting. When recovering from checkpoint, jobmangager converts > MergedInputChannelStateHandle to InputChannelStateHandle collection before > assigning state handle, and the rest of the process does not need to be > changed. > Structure of MergedInputChannelStateHandle : > >
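The redundancy the ticket describes (one file path repeated once per channel in the metadata) can be sketched as a simple grouping step. The record types below are hypothetical, simplified stand-ins, not Flink's actual `InputChannelStateHandle` or the proposed `MergedInputChannelStateHandle`; the sketch only shows that grouping per-channel entries by their shared delegated file path stores each path once.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-channel handle: each one repeats the path of the shared
// underlying checkpoint file.
record ChannelHandle(String delegatedFilePath, int channelIndex, long offset) {}

// Hypothetical merged handle: the shared path is stored once, with one
// {channelIndex, offset} entry per channel.
record MergedHandle(String delegatedFilePath, List<long[]> channelOffsets) {}

public class Main {
    // Group the per-channel handles of one subtask by delegated file path.
    static List<MergedHandle> merge(List<ChannelHandle> handles) {
        Map<String, List<long[]>> byPath = new LinkedHashMap<>();
        for (ChannelHandle h : handles) {
            byPath.computeIfAbsent(h.delegatedFilePath(), p -> new ArrayList<>())
                    .add(new long[] {h.channelIndex(), h.offset()});
        }
        List<MergedHandle> merged = new ArrayList<>();
        byPath.forEach((path, offsets) -> merged.add(new MergedHandle(path, offsets)));
        return merged;
    }

    public static void main(String[] args) {
        List<ChannelHandle> handles = List.of(
                new ChannelHandle("dfs://cp-17/subtask-3/data", 0, 0L),
                new ChannelHandle("dfs://cp-17/subtask-3/data", 1, 4096L),
                new ChannelHandle("dfs://cp-17/subtask-3/data", 2, 8192L));
        List<MergedHandle> merged = merge(handles);
        // Three per-channel handles collapse into one merged handle: the file
        // path now appears once instead of three times.
        if (merged.size() != 1) throw new AssertionError();
        if (merged.get(0).channelOffsets().size() != 3) throw new AssertionError();
        System.out.println("ok");
    }
}
```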
Re: [PR] [FLINK-25421] Port JDBC Sink to new Unified Sink API (FLIP-143) [flink-connector-jdbc]
eskabetxe commented on PR #2: URL: https://github.com/apache/flink-connector-jdbc/pull/2#issuecomment-1923662905 @snuyanzin @MartijnVisser I already changed a test to mock Sink.InitContext, since 1.19 adds a method with a new class. But now all dialects are failing for TIMESTAMP_LTZ, which does not fail in 1.19 but fails in 1.18. Is this a new feature in 1.19? How should I proceed with this? Remove the failing test? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-34343) ResourceManager registration is not completed when registering the JobMaster
[ https://issues.apache.org/jira/browse/FLINK-34343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813629#comment-17813629 ] Matthias Pohl commented on FLINK-34343: --- [~chesnay] do you have capacity to look at that one? I'm still investigating the FLINK-34007 test instability. > ResourceManager registration is not completed when registering the JobMaster > > > Key: FLINK-34343 > URL: https://issues.apache.org/jira/browse/FLINK-34343 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination, Runtime / RPC >Affects Versions: 1.19.0 >Reporter: Matthias Pohl >Priority: Blocker > Labels: test-stability > Attachments: FLINK-34343_k8s_application_cluster_e2e_test.log > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57203=logs=64debf87-ecdb-5aef-788d-8720d341b5cb=2302fb98-0839-5df2-3354-bbae636f81a7=8066 > The test run failed due to a NullPointerException: > {code} > Feb 02 01:11:55 2024-02-02 01:11:47,791 INFO > org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor [] - The rpc > endpoint > org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager has not > been started yet. Discarding message LocalFencedMessage(000 > 0, > LocalRpcInvocation(ResourceManagerGateway.registerJobMaster(JobMasterId, > ResourceID, String, JobID, Time))) until processing is started. > Feb 02 01:11:55 2024-02-02 01:11:47,797 WARN > org.apache.flink.runtime.rpc.pekko.SupervisorActor [] - RpcActor > pekko://flink/user/rpc/resourcemanager_2 has failed. Shutting it down now. 
> Feb 02 01:11:55 java.lang.NullPointerException: Cannot invoke > "org.apache.flink.runtime.rpc.RpcServer.getAddress()" because > "this.rpcServer" is null > Feb 02 01:11:55 at > org.apache.flink.runtime.rpc.RpcEndpoint.getAddress(RpcEndpoint.java:322) > ~[flink-dist-1.19-SNAPSHOT.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleMessage(PekkoRpcActor.java:182) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > scala.PartialFunction.applyOrElse(PartialFunction.scala:127) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > scala.PartialFunction.applyOrElse$(PartialFunction.scala:126) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:29) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > 
org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:545) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.actor.AbstractActor.aroundReceive(AbstractActor.scala:229) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.actor.ActorCell.receiveMessage(ActorCell.scala:590) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.actor.ActorCell.invoke(ActorCell.scala:557) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.dispatch.Mailbox.processMailbox(Mailbox.scala:280) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at > org.apache.pekko.dispatch.Mailbox.run(Mailbox.scala:241) > ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT] > Feb 02 01:11:55 at >
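The trace above shows getAddress() dereferencing rpcServer before the endpoint finished starting. The general pattern that avoids this kind of NPE can be sketched as follows; this is a simplified stand-alone model, not Flink's actual RpcEndpoint or PekkoRpcActor code, and the names (EndpointSketch, serverAddress) are invented for illustration:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified model of an RPC endpoint that may receive messages before start():
// instead of dereferencing a not-yet-initialized server reference (the source
// of the NPE in the trace), an unstarted endpoint queues messages until
// processing begins.
public class EndpointSketch {
    private String serverAddress;                     // null until start(), like rpcServer
    private final Queue<String> pending = new ArrayDeque<>();
    private boolean started = false;

    // Returns the handling result, or null when the message had to be queued.
    String receive(String message) {
        if (!started) {
            pending.add(message);                     // queue instead of touching null state
            return null;
        }
        String result = serverAddress + " handles " + message;
        System.out.println(result);
        return result;
    }

    void start(String address) {
        serverAddress = address;
        started = true;
        while (!pending.isEmpty()) {                  // drain messages that arrived early
            receive(pending.poll());
        }
    }

    public static void main(String[] args) {
        EndpointSketch e = new EndpointSketch();
        e.receive("registerJobMaster");               // arrives early: queued, no NPE
        e.start("pekko://flink/user/rpc/resourcemanager_2");
        // prints: pekko://flink/user/rpc/resourcemanager_2 handles registerJobMaster
    }
}
```

The log's "Discarding message ... until processing is started" suggests such a guard exists on the message path; the bug report is about a code path that reached the null reference anyway.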
[jira] [Commented] (FLINK-34343) ResourceManager registration is not completed when registering the JobMaster
[ https://issues.apache.org/jira/browse/FLINK-34343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813627#comment-17813627 ] Matthias Pohl commented on FLINK-34343: --- I attached the logs of the corresponding failure to this issue. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34343) ResourceManager registration is not completed when registering the JobMaster
[ https://issues.apache.org/jira/browse/FLINK-34343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34343: -- Attachment: FLINK-34343_k8s_application_cluster_e2e_test.log -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34007) Flink Job stuck in suspend state after losing leadership in HA Mode
[ https://issues.apache.org/jira/browse/FLINK-34007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813626#comment-17813626 ] Matthias Pohl commented on FLINK-34007: --- The Azure test failure seems to be unrelated. I created FLINK-34343 to cover this one separately. The GitHub Actions failure is still concerning, though. > Flink Job stuck in suspend state after losing leadership in HA Mode > --- > > Key: FLINK-34007 > URL: https://issues.apache.org/jira/browse/FLINK-34007 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.19.0, 1.18.1, 1.18.2 >Reporter: Zhenqiu Huang >Assignee: Matthias Pohl >Priority: Blocker > Labels: pull-request-available > Fix For: 1.19.0 > > Attachments: Debug.log, LeaderElector-Debug.json, job-manager.log > > > The observation is that the Job manager goes to the suspended state, with a failed > container unable to register itself with the resource manager after the timeout. > JM Log, see attached > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34343) ResourceManager registration is not completed when registering the JobMaster
[ https://issues.apache.org/jira/browse/FLINK-34343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813624#comment-17813624 ] Matthias Pohl commented on FLINK-34343: --- Initially, I thought that it would be caused by the FLINK-34007 changes. But that's not the case. The "Run kubernetes application test" doesn't rely on HA but uses {{StandaloneLeaderElection}} (sessionId: {{}}). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34343) ResourceManager registration is not completed when registering the JobMaster
Matthias Pohl created FLINK-34343: - Summary: ResourceManager registration is not completed when registering the JobMaster Key: FLINK-34343 URL: https://issues.apache.org/jira/browse/FLINK-34343 Project: Flink Issue Type: Bug Components: Runtime / Coordination, Runtime / RPC Affects Versions: 1.19.0 Reporter: Matthias Pohl https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57203=logs=64debf87-ecdb-5aef-788d-8720d341b5cb=2302fb98-0839-5df2-3354-bbae636f81a7=8066 The test run failed due to a NullPointerException: {code} Feb 02 01:11:55 2024-02-02 01:11:47,791 INFO org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor [] - The rpc endpoint org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager has not been started yet. Discarding message LocalFencedMessage(000 0, LocalRpcInvocation(ResourceManagerGateway.registerJobMaster(JobMasterId, ResourceID, String, JobID, Time))) until processing is started. Feb 02 01:11:55 2024-02-02 01:11:47,797 WARN org.apache.flink.runtime.rpc.pekko.SupervisorActor [] - RpcActor pekko://flink/user/rpc/resourcemanager_2 has failed. Shutting it down now. 
Feb 02 01:11:55 java.lang.NullPointerException: Cannot invoke "org.apache.flink.runtime.rpc.RpcServer.getAddress()" because "this.rpcServer" is null Feb 02 01:11:55 at org.apache.flink.runtime.rpc.RpcEndpoint.getAddress(RpcEndpoint.java:322) ~[flink-dist-1.19-SNAPSHOT.jar:1.19-SNAPSHOT] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [FLINK-34247][doc] Document FLIP-366: Support standard YAML for FLINK configuration [flink]
flinkbot commented on PR #24251: URL: https://github.com/apache/flink/pull/24251#issuecomment-1923581075 ## CI report: * e2f376490d7922436989cf89a66269f2cd73272c UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-34247) Document FLIP-366: Support standard YAML for FLINK configuration
[ https://issues.apache.org/jira/browse/FLINK-34247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-34247: --- Labels: pull-request-available (was: ) > Document FLIP-366: Support standard YAML for FLINK configuration > > > Key: FLINK-34247 > URL: https://issues.apache.org/jira/browse/FLINK-34247 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: Junrui Li >Priority: Major > Labels: pull-request-available > Fix For: 1.19.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] [FLINK-34247][doc] Document FLIP-366: Support standard YAML for FLINK configuration [flink]
JunRuiLee opened a new pull request, #24251: URL: https://github.com/apache/flink/pull/24251 ## What is the purpose of the change Document FLIP-366: Support standard YAML for FLINK configuration ## Brief change log - Add documentation of new Flink configuration file config.yaml. - Update the usage of flink-conf.yaml in doc. - Update the usage of "env.java.home" in doc. The updated document is shown in the figure: English version: ![config-1](https://github.com/apache/flink/assets/107924572/9d662f0d-c066-4aeb-83ca-51cdb2827942) ![config-2](https://github.com/apache/flink/assets/107924572/2aaf6ec6-cf82-4de5-b54b-32e973a75108) Chinese version: ![config-zh-1](https://github.com/apache/flink/assets/107924572/25dc226e-96f7-49a6-9f82-ab53dc46856a) ![config-zh-2](https://github.com/apache/flink/assets/107924572/159546ca-3da9-4ed0-9d0c-dc760bee79d3) ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / **no**) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**) - The serializers: (yes / **no** / don't know) - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't know) - The S3 file system connector: (yes / **no** / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / **no**) - If yes, how is the feature documented? (not applicable / **docs** / JavaDocs / not documented) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
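The FLIP-366 change documented in the PR above introduces a new config.yaml file that accepts standard YAML syntax. A minimal sketch of the difference (the option values here are illustrative, not recommended settings; see the documentation added by this PR for the authoritative syntax):

```yaml
# Legacy flink-conf.yaml style: flat dotted keys, one per line.
# taskmanager.numberOfTaskSlots: 2
# jobmanager.memory.process.size: 1600m

# Standard-YAML config.yaml style (FLIP-366): the same options can be nested.
taskmanager:
  numberOfTaskSlots: 2
jobmanager:
  memory:
    process:
      size: 1600m
```

The flat dotted spelling remains valid in config.yaml because it is itself standard YAML; the nested form is the new optional spelling enabled by the standard parser.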
[jira] [Commented] (FLINK-34229) Duplicate entry in InnerClasses attribute in class file FusionStreamOperator
[ https://issues.apache.org/jira/browse/FLINK-34229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813613#comment-17813613 ] xingbe commented on FLINK-34229: [~FrankZou] Thanks for resolving this. I have checked that after cherry-picking the commit, q35.sql is now able to run to completion without exceptions. It looks good to me. > Duplicate entry in InnerClasses attribute in class file FusionStreamOperator > > > Key: FLINK-34229 > URL: https://issues.apache.org/jira/browse/FLINK-34229 > Project: Flink > Issue Type: Bug > Components: Table SQL / Runtime >Affects Versions: 1.19.0 >Reporter: xingbe >Priority: Major > Labels: pull-request-available > Attachments: image-2024-01-24-17-05-47-883.png, taskmanager_log.txt > > > I noticed a runtime error happening in 10TB TPC-DS (q35.sql) benchmarks in > 1.19; the problem did not happen in 1.18.0. This issue may have been newly > introduced recently. !image-2024-01-24-17-05-47-883.png|width=589,height=279! -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [FLINK-26369][doc-zh] Translate the part zh-page mixed with not be translated. [flink]
1996fanrui commented on PR #20340: URL: https://github.com/apache/flink/pull/20340#issuecomment-1923500149 @flinkbot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-26088][Connectors/ElasticSearch] Add Elasticsearch 8.0 support [flink-connector-elasticsearch]
StefanXiepj commented on PR #74: URL: https://github.com/apache/flink-connector-elasticsearch/pull/74#issuecomment-1923495793 > @StefanXiepj Are you still active on this PR? There is another PR ([#53](https://github.com/apache/flink-connector-elasticsearch/pull/53)) doing it; I don't know which one is better. If necessary, I can continue working on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-34333) Fix FLINK-34007 LeaderElector bug in 1.18
[ https://issues.apache.org/jira/browse/FLINK-34333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813604#comment-17813604 ] Yang Wang commented on FLINK-34333: --- +1 for what Chesnay said. We already shade the fabric8 k8s client in kubernetes-client module. It should not be used directly in other projects. > Fix FLINK-34007 LeaderElector bug in 1.18 > - > > Key: FLINK-34333 > URL: https://issues.apache.org/jira/browse/FLINK-34333 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.18.1 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Blocker > Labels: pull-request-available > > FLINK-34007 revealed a bug in the k8s client v6.6.2 which we're using since > Flink 1.18. This issue was fixed with FLINK-34007 for Flink 1.19 which > required an update of the k8s client to v6.9.0. > This Jira issue is about finding a solution in Flink 1.18 for the very same > problem FLINK-34007 covered. It's a dedicated Jira issue because we want to > unblock the release of 1.19 by resolving FLINK-34007. > Just to summarize why the upgrade to v6.9.0 is desired: There's a bug in > v6.6.2 which might prevent the leadership lost event being forwarded to the > client ([#5463|https://github.com/fabric8io/kubernetes-client/issues/5463]). > An initial proposal where the release call was handled in Flink's > {{KubernetesLeaderElector}} didn't work due to the leadership lost event > being triggered twice (see [FLINK-34007 PR > comment|https://github.com/apache/flink/pull/24132#discussion_r1467175902]) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34333) Fix FLINK-34007 LeaderElector bug in 1.18
[ https://issues.apache.org/jira/browse/FLINK-34333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813603#comment-17813603 ] Yang Wang commented on FLINK-34333: --- I am not aware of other potential issues and in favor of upgrading the k8s client version. > Fix FLINK-34007 LeaderElector bug in 1.18 > - > > Key: FLINK-34333 > URL: https://issues.apache.org/jira/browse/FLINK-34333 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.18.1 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Blocker > Labels: pull-request-available > > FLINK-34007 revealed a bug in the k8s client v6.6.2 which we're using since > Flink 1.18. This issue was fixed with FLINK-34007 for Flink 1.19 which > required an update of the k8s client to v6.9.0. > This Jira issue is about finding a solution in Flink 1.18 for the very same > problem FLINK-34007 covered. It's a dedicated Jira issue because we want to > unblock the release of 1.19 by resolving FLINK-34007. > Just to summarize why the upgrade to v6.9.0 is desired: There's a bug in > v6.6.2 which might prevent the leadership lost event being forwarded to the > client ([#5463|https://github.com/fabric8io/kubernetes-client/issues/5463]). > An initial proposal where the release call was handled in Flink's > {{KubernetesLeaderElector}} didn't work due to the leadership lost event > being triggered twice (see [FLINK-34007 PR > comment|https://github.com/apache/flink/pull/24132#discussion_r1467175902]) -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [FLINK-25421] Port JDBC Sink to new Unified Sink API (FLIP-143) [flink-connector-jdbc]
eskabetxe commented on code in PR #2: URL: https://github.com/apache/flink-connector-jdbc/pull/2#discussion_r1475842140 ## .github/workflows/weekly.yml: ## @@ -27,25 +27,13 @@ jobs: strategy: matrix: flink_branches: [{ - flink: 1.16-SNAPSHOT, - branch: main -}, { - flink: 1.17-SNAPSHOT, - branch: main -}, { flink: 1.18-SNAPSHOT, jdk: '8, 11, 17', branch: main }, { flink: 1.19-SNAPSHOT, jdk: '8, 11, 17, 21', branch: main -}, { - flink: 1.16.2, - branch: v3.1 -}, { - flink: 1.17.1, - branch: v3.1 }, { Review Comment: We could release 3.2 (or 3.1.X) with the current main code, giving support for 1.16, 1.17 and 1.18 (1.18 is still not supported by any release). I don't know why a 1.18-compatible version still hasn't been released, is there any blocker? And then merge this and release again (minor or major). Would this work? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-31989) Update documentation
[ https://issues.apache.org/jira/browse/FLINK-31989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813602#comment-17813602 ] Danny Cranmer commented on FLINK-31989: --- We could split this task and deliver incrementally with each subtask, rather than all at once > Update documentation > > > Key: FLINK-31989 > URL: https://issues.apache.org/jira/browse/FLINK-31989 > Project: Flink > Issue Type: Sub-task >Reporter: Hong Liang Teoh >Priority: Major > > Update Flink documentation to explain the new KDS source. Include > * Improvements available in new KDS source > * Incompatible changes made > * Example implementation > * Example customisations -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34342) Address Shard Consistency Issue for DDB Streams Source
[ https://issues.apache.org/jira/browse/FLINK-34342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Cranmer updated FLINK-34342: -- Description: *Problem* We call [DescribeStream|https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_streams_DescribeStream.html] with the ExclusiveStartShardId parameter set to the last seen shard ID. The issue is that the API is eventually consistent, meaning if we call the API multiple times we might get different results, for example: * Call 1: [A, C] * Call 2: [A, B, C] Since we would set ExclusiveStartShardId to {{{}C{}}}, in the above example the connector would miss shard {{B}} *Solution* We need to find a solution to support this gap. This could be to periodically list all shards to find gaps and not start processing new shards until their parents are complete. This feature does not need to be applied to the KDS source. was: *Problem* We call [DescribeStream|https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_streams_DescribeStream.html] with the ExclusiveStartShardId parameter set to the last seen shard ID. > Address Shard Consistency Issue for DDB Streams Source > -- > > Key: FLINK-34342 > URL: https://issues.apache.org/jira/browse/FLINK-34342 > Project: Flink > Issue Type: Sub-task > Components: Connectors / DynamoDB >Reporter: Danny Cranmer >Priority: Major > > *Problem* > We call > [DescribeStream|https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_streams_DescribeStream.html] > with the ExclusiveStartShardId parameter set to the last seen shard ID. The > issue is that the API is eventually consistent, meaning if we call the API > multiple times we might get different results, for example: > * Call 1: [A, C] > * Call 2: [A, B, C] > Since we would set ExclusiveStartShardId to {{{}C{}}}, in the above example > the connector would miss shard {{B}} > *Solution* > We need to find a solution to support this gap. 
This could be to periodically > list all shards to find gaps and not start processing new shards until their > parents are complete. This feature does not need to be applied to the KDS > source. -- This message was sent by Atlassian Jira (v8.20.10#820010)
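The proposed solution above (periodically list all shards and hold back a child shard until its parent is complete) can be sketched as follows. This is a minimal illustration of the idea, not the connector's actual implementation; the shard IDs and the `ready_shards` helper are hypothetical:

```python
# Sketch: defer starting a shard until its parent shard has been fully
# consumed, so an eventually consistent listing that temporarily omits a
# shard (e.g. B between A and C) cannot cause it to be skipped.

def ready_shards(discovered, completed):
    """Return shard IDs that are safe to start consuming.

    discovered: mapping of shard ID -> parent shard ID (None for roots),
                accumulated from periodic DescribeStream/ListShards calls.
    completed:  set of shard IDs that have been fully consumed.
    """
    ready = []
    for shard_id, parent_id in discovered.items():
        if shard_id in completed:
            continue  # already done
        if parent_id is None or parent_id in completed:
            ready.append(shard_id)  # root shard, or parent finished
    return sorted(ready)

# First (inconsistent) listing misses shard B, so C's parent is unknown:
listing1 = {"A": None, "C": "B"}
print(ready_shards(listing1, completed=set()))        # ['A'] — C is held back

# A later listing reveals B; once A and B complete, C becomes eligible:
listing2 = {"A": None, "B": "A", "C": "B"}
print(ready_shards(listing2, completed={"A", "B"}))   # ['C']
```

Because C is never started while its (possibly unseen) parent B is incomplete, the gap in the first listing is tolerated rather than silently skipped.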
Re: [PR] [FLINK-33396][table-planner] Fix table alias not be cleared in subquery [flink]
xuyangzhong commented on code in PR #24239: URL: https://github.com/apache/flink/pull/24239#discussion_r1475829008 ## flink-table/flink-table-planner/src/test/resources/org/apache/flink/table/planner/plan/hints/batch/NestLoopJoinHintTest.xml: ## @@ -679,8 +680,8 @@ HashJoin(joinType=[LeftSemiJoin], where=[=(a1, a2)], select=[a1, b1], build=[rig +- Calc(select=[a2]) +- NestedLoopJoin(joinType=[InnerJoin], where=[=(a2, a3)], select=[a2, a3], build=[left]) :- Exchange(distribution=[broadcast]) - : +- TableSourceScan(table=[[default_catalog, default_database, T2, project=[a2], metadata=[]]], fields=[a2], hints=[[[ALIAS options:[T2) - +- TableSourceScan(table=[[default_catalog, default_database, T3, project=[a3], metadata=[]]], fields=[a3], hints=[[[ALIAS options:[T3) Review Comment: Got it! I have added two tests for table hints: 1. When query hints exist, alias hints are present and table hints are not cleared. 2. When no query hints exist, there are no alias hints and table hints are still not cleared. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-34342) Address ListShards Consistency Issue for DDB Streams Source
[ https://issues.apache.org/jira/browse/FLINK-34342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Cranmer updated FLINK-34342: -- Description: *Problem* We call [DescribeStream|https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_streams_DescribeStream.html] with the ExclusiveStartShardId parameter set to the last seen shard ID. > Address ListShards Consistency Issue for DDB Streams Source > --- > > Key: FLINK-34342 > URL: https://issues.apache.org/jira/browse/FLINK-34342 > Project: Flink > Issue Type: Sub-task > Components: Connectors / DynamoDB >Reporter: Danny Cranmer >Priority: Major > > *Problem* > We call > [DescribeStream|https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_streams_DescribeStream.html] > with the ExclusiveStartShardId parameter set to the last seen shard ID. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34342) Address Shard Consistency Issue for DDB Streams Source
[ https://issues.apache.org/jira/browse/FLINK-34342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Cranmer updated FLINK-34342: -- Summary: Address Shard Consistency Issue for DDB Streams Source (was: Address ListShards Consistency Issue for DDB Streams Source) > Address Shard Consistency Issue for DDB Streams Source > -- > > Key: FLINK-34342 > URL: https://issues.apache.org/jira/browse/FLINK-34342 > Project: Flink > Issue Type: Sub-task > Components: Connectors / DynamoDB >Reporter: Danny Cranmer >Priority: Major > > *Problem* > We call > [DescribeStream|https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_streams_DescribeStream.html] > with the ExclusiveStartShardId parameter set to the last seen shard ID. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34342) Address ListShards Consistency Issue for DDB Streams Source
[ https://issues.apache.org/jira/browse/FLINK-34342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Cranmer updated FLINK-34342: -- Summary: Address ListShards Consistency Issue for DDB Streams Source (was: Address ListShards Consistency for DDB Streams Source) > Address ListShards Consistency Issue for DDB Streams Source > --- > > Key: FLINK-34342 > URL: https://issues.apache.org/jira/browse/FLINK-34342 > Project: Flink > Issue Type: Sub-task > Components: Connectors / DynamoDB >Reporter: Danny Cranmer >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34342) Address ListShards Consistency for DDB Streams Source
Danny Cranmer created FLINK-34342: - Summary: Address ListShards Consistency for DDB Streams Source Key: FLINK-34342 URL: https://issues.apache.org/jira/browse/FLINK-34342 Project: Flink Issue Type: Sub-task Components: Connectors / DynamoDB Reporter: Danny Cranmer -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34340) Add support for DDB Streams for DataStream API
[ https://issues.apache.org/jira/browse/FLINK-34340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Cranmer updated FLINK-34340: -- Summary: Add support for DDB Streams for DataStream API (was: Add support for DDB Streams) > Add support for DDB Streams for DataStream API > -- > > Key: FLINK-34340 > URL: https://issues.apache.org/jira/browse/FLINK-34340 > Project: Flink > Issue Type: Sub-task > Components: Connectors / DynamoDB >Reporter: Danny Cranmer >Priority: Major > > In the legacy KDS source we support Amazon DynamoDB streams via an adapter > shim. Both KDS and DDB streams have a similar API. > This task builds upon https://issues.apache.org/jira/browse/FLINK-34339 and > will add a {{DynamoDBStreamsSource}} which will setup a DDB SDK client shim. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [FLINK-25421] Port JDBC Sink to new Unified Sink API (FLIP-143) [flink-connector-jdbc]
snuyanzin commented on code in PR #2: URL: https://github.com/apache/flink-connector-jdbc/pull/2#discussion_r1475815795 ## .github/workflows/weekly.yml: ## @@ -27,25 +27,13 @@ jobs: strategy: matrix: flink_branches: [{ - flink: 1.16-SNAPSHOT, - branch: main -}, { - flink: 1.17-SNAPSHOT, - branch: main -}, { flink: 1.18-SNAPSHOT, jdk: '8, 11, 17', branch: main }, { flink: 1.19-SNAPSHOT, jdk: '8, 11, 17, 21', branch: main -}, { - flink: 1.16.2, - branch: v3.1 -}, { - flink: 1.17.1, - branch: v3.1 }, { Review Comment: IIRC ideally we should support the currently supported Flink versions (1.17, 1.18, 1.19, and 1.16, which will probably no longer be supported after the 1.19 release). I don't know another way to have all of them supported without branching (4.0.0 or 3.2.0 doesn't really matter) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-34341) Implement DDB Streams Table API support
[ https://issues.apache.org/jira/browse/FLINK-34341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Cranmer updated FLINK-34341: -- Description: Implement Table API support for DDB Streams Source. Consider: * Configurations to support. Should have customisation parity with DataStream API * Testing should include both SQL client + Table API via Java was: Implement Table API support for KDS Source. Consider: * Configurations to support. Should have customisation parity with DataStream API * Testing should include both SQL client + Table API via Java > Implement DDB Streams Table API support > --- > > Key: FLINK-34341 > URL: https://issues.apache.org/jira/browse/FLINK-34341 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Kinesis >Reporter: Hong Liang Teoh >Priority: Major > Labels: pull-request-available > > Implement Table API support for DDB Streams Source. > > Consider: > * Configurations to support. Should have customisation parity with > DataStream API > * Testing should include both SQL client + Table API via Java -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34341) Implement DDB Streams Table API support
[ https://issues.apache.org/jira/browse/FLINK-34341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Cranmer updated FLINK-34341: -- Reporter: Danny Cranmer (was: Hong Liang Teoh) > Implement DDB Streams Table API support > --- > > Key: FLINK-34341 > URL: https://issues.apache.org/jira/browse/FLINK-34341 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Kinesis >Reporter: Danny Cranmer >Priority: Major > Labels: pull-request-available > > Implement Table API support for DDB Streams Source. > > Consider: > * Configurations to support. Should have customisation parity with > DataStream API > * Testing should include both SQL client + Table API via Java -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34341) Implement DDB Streams Table API support
Danny Cranmer created FLINK-34341: - Summary: Implement DDB Streams Table API support Key: FLINK-34341 URL: https://issues.apache.org/jira/browse/FLINK-34341 Project: Flink Issue Type: Sub-task Components: Connectors / Kinesis Reporter: Hong Liang Teoh Implement Table API support for KDS Source. Consider: * Configurations to support. Should have customisation parity with DataStream API * Testing should include both SQL client + Table API via Java -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-31987) Implement KDS Table API support
[ https://issues.apache.org/jira/browse/FLINK-31987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Cranmer updated FLINK-31987: -- Component/s: Connectors / Kinesis > Implement KDS Table API support > --- > > Key: FLINK-31987 > URL: https://issues.apache.org/jira/browse/FLINK-31987 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Kinesis >Reporter: Hong Liang Teoh >Priority: Major > Labels: pull-request-available > > Implement Table API support for KDS Source. > > Consider: > * Configurations to support. Should have customisation parity with > DataStream API > * Testing should include both SQL client + Table API via Java -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-31987) Implement KDS Table API support
[ https://issues.apache.org/jira/browse/FLINK-31987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Cranmer updated FLINK-31987: -- Summary: Implement KDS Table API support (was: Implement Table API support) > Implement KDS Table API support > --- > > Key: FLINK-31987 > URL: https://issues.apache.org/jira/browse/FLINK-31987 > Project: Flink > Issue Type: Sub-task >Reporter: Hong Liang Teoh >Priority: Major > Labels: pull-request-available > > Implement Table API support for KDS Source. > > Consider: > * Configurations to support. Should have customisation parity with > DataStream API > * Testing should include both SQL client + Table API via Java -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-31980) Implement support for EFO in new KDS Source
[ https://issues.apache.org/jira/browse/FLINK-31980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Cranmer updated FLINK-31980: -- Summary: Implement support for EFO in new KDS Source (was: Implement support for EFO in new Source) > Implement support for EFO in new KDS Source > --- > > Key: FLINK-31980 > URL: https://issues.apache.org/jira/browse/FLINK-31980 > Project: Flink > Issue Type: Sub-task >Reporter: Hong Liang Teoh >Priority: Major > Labels: pull-request-available > > Implement support for reading from Kinesis Stream using Enhanced Fan Out > mechanism -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-31980) Implement support for EFO in new KDS Source
[ https://issues.apache.org/jira/browse/FLINK-31980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Cranmer updated FLINK-31980: -- Component/s: Connectors / Kinesis > Implement support for EFO in new KDS Source > --- > > Key: FLINK-31980 > URL: https://issues.apache.org/jira/browse/FLINK-31980 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Kinesis >Reporter: Hong Liang Teoh >Priority: Major > Labels: pull-request-available > > Implement support for reading from Kinesis Stream using Enhanced Fan Out > mechanism -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34330) Specify code owners for .github/workflows folder
[ https://issues.apache.org/jira/browse/FLINK-34330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813596#comment-17813596 ] Chesnay Schepler commented on FLINK-34330: -- Specifically, what will not fly is that a codeowner review is REQUIRED for something to be merged. But you can use codeowners to auto-request reviews from people (WITHOUT blocking the PR on it). > Specify code owners for .github/workflows folder > > > Key: FLINK-34330 > URL: https://issues.apache.org/jira/browse/FLINK-34330 > Project: Flink > Issue Type: Sub-task >Affects Versions: 1.19.0, 1.18.1 >Reporter: Matthias Pohl >Priority: Major > > Currently, the workflow files can be modified by any committer. We have to > discuss whether we want to limit access to the PMC (or a subset of it) here. > That might be a means to protect self-hosted runners. > See the [codeowner > documentation|https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners] > for further details. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34301) Release Testing Instructions: Verify FLINK-20281 Window aggregation supports changelog stream input
[ https://issues.apache.org/jira/browse/FLINK-34301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813595#comment-17813595 ] xuyang commented on FLINK-34301: Window TVF aggregation support for changelog stream input is ready for testing. Users can add a window TVF aggregation downstream of a CDC source or of nodes that produce CDC records. This feature can be verified as follows: # Prepare a MySQL table and insert some data. # Start sql-client and prepare the DDL for this MySQL table as a CDC source. # Verify the plan via `EXPLAIN PLAN_ADVICE`, checking that there is a window aggregate node and that the changelog upstream of it contains "UA", "UB" or "D". # Use different kinds of window TVFs to test window TVF aggregation while updating the source data, and check the correctness of the results. > Release Testing Instructions: Verify FLINK-20281 Window aggregation supports > changelog stream input > --- > > Key: FLINK-34301 > URL: https://issues.apache.org/jira/browse/FLINK-34301 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Affects Versions: 1.19.0 >Reporter: lincoln lee >Assignee: xuyang >Priority: Blocker > Labels: release-testing > Fix For: 1.19.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
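The verification steps above could look roughly like the following sql-client session. This is a sketch only: the table schema, connector options, and window size are illustrative assumptions, not taken from the ticket:

```sql
-- Hypothetical CDC source backed by the prepared MySQL table.
CREATE TABLE orders_cdc (
  order_id BIGINT,
  amount DECIMAL(10, 2),
  order_time TIMESTAMP(3),
  PRIMARY KEY (order_id) NOT ENFORCED,
  WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'localhost',
  'username' = 'flink',
  'password' = '***',
  'database-name' = 'test',
  'table-name' = 'orders'
);

-- Inspect the plan: there should be a window aggregate node, and the
-- changelog upstream of it should contain UA/UB/D records.
EXPLAIN PLAN_ADVICE
SELECT window_start, window_end, SUM(amount)
FROM TABLE(TUMBLE(TABLE orders_cdc, DESCRIPTOR(order_time), INTERVAL '1' MINUTES))
GROUP BY window_start, window_end;
```

Running the `SELECT` itself while updating or deleting rows in the MySQL table then exercises the retraction handling of the window TVF aggregation.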
[jira] [Comment Edited] (FLINK-34300) Release Testing Instructions: Verify FLINK-24024 Support session Window TVF
[ https://issues.apache.org/jira/browse/FLINK-34300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813590#comment-17813590 ] xuyang edited comment on FLINK-34300 at 2/2/24 9:29 AM: Session window TVF is ready. Users can use session window TVF aggregation instead of the legacy session group window aggregation. Someone can verify this feature by following the [doc]([https://github.com/apache/flink/pull/24250]), although it is still being reviewed. Furthermore, although session window join, session window rank and session window deduplicate are in an experimental state, if someone finds bugs in them, they can also open a Jira linked to this one to report them. was (Author: xuyangzhong): Session window TVF is ready. Users can use Session window TVF aggregation instead of using legacy session group window aggregation. Someone can verify this feature by the [doc](https://github.com/apache/flink/pull/24250) although it is preparing. Further more, although session window join, session window rank and session window deduplicate are in experimental state, If someone finds some bugs about them, you could also open a Jira linked this one to report it. > Release Testing Instructions: Verify FLINK-24024 Support session Window TVF > --- > > Key: FLINK-34300 > URL: https://issues.apache.org/jira/browse/FLINK-34300 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Affects Versions: 1.19.0 >Reporter: lincoln lee >Assignee: xuyang >Priority: Blocker > Labels: release-testing > Fix For: 1.19.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34340) Add support for DDB Streams
Danny Cranmer created FLINK-34340: - Summary: Add support for DDB Streams Key: FLINK-34340 URL: https://issues.apache.org/jira/browse/FLINK-34340 Project: Flink Issue Type: Sub-task Components: Connectors / DynamoDB Reporter: Danny Cranmer In the legacy KDS source we support Amazon DynamoDB streams via an adapter shim. Both KDS and DDB streams have a similar API. This task builds upon https://issues.apache.org/jira/browse/FLINK-34339 and will add a {{DynamoDBStreamsSource}} which will setup a DDB SDK client shim. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34339) Add connector abstraction layer to remove reliance on AWS SDK classes
[ https://issues.apache.org/jira/browse/FLINK-34339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Cranmer updated FLINK-34339: -- Component/s: Connectors / Kinesis > Add connector abstraction layer to remove reliance on AWS SDK classes > - > > Key: FLINK-34339 > URL: https://issues.apache.org/jira/browse/FLINK-34339 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Kinesis >Reporter: Danny Cranmer >Priority: Major > > In order to shim DDB streams we need to be able to support the > Stream/Shard/Record etc concepts without tying to a specific implementation. > This will allow us to mimic the KDS/DDB streams support in the old connector > by providing a shim at the AWS SDK client. > # Model {{software.amazon.awssdk.services.kinesis}} classes as native > concepts > # Push down any usage of {{software.amazon.awssdk.services.kinesis}} to a > KDS specific class > # Ensure that the bulk of the connector logic is reusable, the top level > class would be implementation specific and shim in the write client factories > and configuration -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34300) Release Testing Instructions: Verify FLINK-24024 Support session Window TVF
[ https://issues.apache.org/jira/browse/FLINK-34300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813590#comment-17813590 ] xuyang commented on FLINK-34300: Session window TVF is ready. Users can use session window TVF aggregation instead of the legacy session group window aggregation. Someone can verify this feature by the [doc](https://github.com/apache/flink/pull/24250), although it is still in preparation. Furthermore, although session window join, session window rank and session window deduplicate are in an experimental state, if someone finds bugs in them, they can also open a Jira linked to this one to report them. > Release Testing Instructions: Verify FLINK-24024 Support session Window TVF > --- > > Key: FLINK-34300 > URL: https://issues.apache.org/jira/browse/FLINK-34300 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Affects Versions: 1.19.0 >Reporter: lincoln lee >Assignee: xuyang >Priority: Blocker > Labels: release-testing > Fix For: 1.19.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34330) Specify code owners for .github/workflows folder
[ https://issues.apache.org/jira/browse/FLINK-34330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813589#comment-17813589 ] Chesnay Schepler commented on FLINK-34330: -- Chances are they just weren't caught; no one is actively looking out this after all. But anytime this gets brought up on ASF mailing lists it gets shut down quickly. > Specify code owners for .github/workflows folder > > > Key: FLINK-34330 > URL: https://issues.apache.org/jira/browse/FLINK-34330 > Project: Flink > Issue Type: Sub-task >Affects Versions: 1.19.0, 1.18.1 >Reporter: Matthias Pohl >Priority: Major > > Currently, the workflow files can be modified by any committer. We have to > discuss whether we want to limit access to the PMC (or a subset of it) here. > That might be a means to protect self-hosted runners. > See the [codeowner > documentation|https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners] > for further details. -- This message was sent by Atlassian Jira (v8.20.10#820010)