[jira] [Updated] (FLINK-34345) Remove TaskExecutorManager related logic

2024-02-02 Thread Rui Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan updated FLINK-34345:

Affects Version/s: 1.19.0

> Remove TaskExecutorManager related logic
> 
>
> Key: FLINK-34345
> URL: https://issues.apache.org/jira/browse/FLINK-34345
> Project: Flink
>  Issue Type: Technical Debt
>Affects Versions: 1.19.0
>Reporter: Caican Cai
>Assignee: Caican Cai
>Priority: Minor
>  Labels: pull-request-available
>
> Remove TaskExecutorManager related logic



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-34345) Remove TaskExecutorManager related logic

2024-02-02 Thread Rui Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan reassigned FLINK-34345:
---

Assignee: Caican Cai

> Remove TaskExecutorManager related logic
> 
>
> Key: FLINK-34345
> URL: https://issues.apache.org/jira/browse/FLINK-34345
> Project: Flink
>  Issue Type: Technical Debt
>Reporter: Caican Cai
>Assignee: Caican Cai
>Priority: Minor
>  Labels: pull-request-available
>
> Remove TaskExecutorManager related logic





Re: [PR] [FLINK-34345][runtime] Remove TaskExecutorManager related logic [flink]

2024-02-02 Thread via GitHub


caicancai commented on PR #24257:
URL: https://github.com/apache/flink/pull/24257#issuecomment-1925163414

   @RocMarshal @1996fanrui If you have time, can you help me review it? Thank 
you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34345][runtime] Remove TaskExecutorManager related logic [flink]

2024-02-02 Thread via GitHub


caicancai commented on PR #24257:
URL: https://github.com/apache/flink/pull/24257#issuecomment-1925162917

   related to https://issues.apache.org/jira/browse/FLINK-31449





Re: [PR] [FLINK-34345][runtime] Remove TaskExecutorManager related logic [flink]

2024-02-02 Thread via GitHub


flinkbot commented on PR #24257:
URL: https://github.com/apache/flink/pull/24257#issuecomment-1925162848

   
   ## CI report:
   
   * 38fe004b350cad98d065f0619409fcf120de8c69 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   





[jira] [Updated] (FLINK-34345) Remove TaskExecutorManager related logic

2024-02-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-34345:
---
Labels: pull-request-available  (was: )

> Remove TaskExecutorManager related logic
> 
>
> Key: FLINK-34345
> URL: https://issues.apache.org/jira/browse/FLINK-34345
> Project: Flink
>  Issue Type: Technical Debt
>Reporter: Caican Cai
>Priority: Minor
>  Labels: pull-request-available
>
> Remove TaskExecutorManager related logic





[PR] [FLINK-34345][runtime] Remove TaskExecutorManager related logic [flink]

2024-02-02 Thread via GitHub


caicancai opened a new pull request, #24257:
URL: https://github.com/apache/flink/pull/24257

   
   
   ## What is the purpose of the change
   
   *(For example: This pull request makes task deployment go through the blob 
server, rather than through RPC. That way we avoid re-transferring them on each 
deployment (during recovery).)*
   
   
   ## Brief change log
   
   *(for example:)*
 - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
 - *Deployments RPC transmits only the blob storage reference*
 - *TaskManagers retrieve the TaskInfo from the blob cache*
   
   
   ## Verifying this change
   
   Please make sure both new and modified tests in this PR follow the 
conventions defined in our code quality guide: 
https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Added test that validates that TaskInfo is transferred only once across 
recoveries*
 - *Manually verified the change by running a 4 node cluster with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
 - The serializers: (yes / no / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
 - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   





[jira] [Updated] (FLINK-34345) Remove TaskExecutorManager related logic

2024-02-02 Thread Caican Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caican Cai updated FLINK-34345:
---
Issue Type: Technical Debt  (was: Improvement)

> Remove TaskExecutorManager related logic
> 
>
> Key: FLINK-34345
> URL: https://issues.apache.org/jira/browse/FLINK-34345
> Project: Flink
>  Issue Type: Technical Debt
>Reporter: Caican Cai
>Priority: Minor
>
> Remove TaskExecutorManager related logic





[jira] [Created] (FLINK-34345) Remove TaskExecutorManager related logic

2024-02-02 Thread Caican Cai (Jira)
Caican Cai created FLINK-34345:
--

 Summary: Remove TaskExecutorManager related logic
 Key: FLINK-34345
 URL: https://issues.apache.org/jira/browse/FLINK-34345
 Project: Flink
  Issue Type: Improvement
Reporter: Caican Cai


Remove TaskExecutorManager related logic





[jira] [Commented] (FLINK-33734) Merge unaligned checkpoint state handle

2024-02-02 Thread Rui Fan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813874#comment-17813874
 ] 

Rui Fan commented on FLINK-33734:
-

Thanks [~Feifan Wang] for the effort. IIUC, the checkpoint avg time is reduced 
from 82 s to 30s, right? IMHO, I think the checkpoint time is still too long. 
WDYT?

> Merge unaligned checkpoint state handle
> ---
>
> Key: FLINK-33734
> URL: https://issues.apache.org/jira/browse/FLINK-33734
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Reporter: Feifan Wang
>Assignee: Feifan Wang
>Priority: Major
>  Labels: pull-request-available
>
> h3. Background
> Unaligned checkpoint will write the inflight-data of all InputChannel and 
> ResultSubpartition of the same subtask to the same file during checkpoint. 
> The InputChannelStateHandle and ResultSubpartitionStateHandle organize the 
> metadata of inflight-data at the channel granularity, which causes the file 
> name to be repeated many times. When a job is under backpressure and task 
> parallelism is high, the metadata of unaligned checkpoints will bloat. This 
> will result in:
>  # The amount of data reported by taskmanager to jobmanager increases, and 
> jobmanager takes longer to process these RPC requests.
>  # The metadata of the entire checkpoint becomes very large, and it takes 
> longer to serialize and write it to dfs.
> Both of the above points ultimately lead to longer checkpoint duration.
> h3. A Production example
> Take our production job with a parallelism of 4800 as an example:
>  # When there is no back pressure, checkpoint end-to-end duration is within 7 
> seconds.
>  # When under pressure: checkpoint end-to-end duration often exceeds 1 
> minute. We found that jobmanager took more than 40 seconds to process rpc 
> requests, and serialized metadata took more than 20 seconds. Some checkpoint 
> statistics:
> |metadata file size|950 MB|
> |channel state count|12,229,854|
> |channel file count|5536|
> Of the 950MB in the metadata file, 68% are redundant file paths.
> We enabled log-based checkpoint on this job and hoped that the checkpoint 
> could be completed within 30 seconds. This problem made it difficult to 
> achieve this goal.
> h3. Propose changes
> I suggest introducing MergedInputChannelStateHandle and 
> MergedResultSubpartitionStateHandle to eliminate redundant file paths.
> The taskmanager merges all InputChannelStateHandles with the same delegated 
> StreamStateHandle in the same subtask into one MergedInputChannelStateHandle 
> before reporting. When recovering from a checkpoint, the jobmanager converts 
> each MergedInputChannelStateHandle back into a collection of 
> InputChannelStateHandles before assigning state handles; the rest of the 
> process does not need to be changed.
> Structure of MergedInputChannelStateHandle :
>  
> {code:java}
> {   // MergedInputChannelStateHandle
>     "delegate": {
>         "filePath": 
> "viewfs://hadoop-meituan/flink-yg15/checkpoints/retained/1234567/ab8d0c2f02a47586490b15e7a2c30555/chk-31/ffe54c0a-9b6e-4724-aae7-61b96bf8b1cf",
>         "stateSize": 123456
>     },
>     "size": 2000,
>     "subtaskIndex":0,
>     "channels": [ // One InputChannel per element
>         {
>             "info": {
>                 "gateIdx": 0,
>                 "inputChannelIdx": 0
>             },
>             "offsets": [
>                 100,200,300,400
>             ],
>             "size": 1400
>         },
>         {
>             "info": {
>                 "gateIdx": 0,
>                 "inputChannelIdx": 1
>             },
>             "offsets": [
>                 500,600
>             ],
>             "size": 600
>         }
>     ]
> }
>  {code}
> MergedResultSubpartitionStateHandle is similar.
>  
>  
> WDYT [~roman] , [~pnowojski] , [~fanrui] ?
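
To make the proposed merge concrete, here is a minimal Java sketch, assuming
simplified stand-in types rather than the actual Flink state-handle classes:
handles that share a delegated file are grouped so each file path is stored
once per file instead of once per channel.

{code:java}
// Hypothetical simplified types; the real InputChannelStateHandle and the
// proposed MergedInputChannelStateHandle carry more information.
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MergeSketch {

    record ChannelHandle(String filePath, int gateIdx, int channelIdx,
                         List<Long> offsets, long size) {}

    record MergedHandle(String filePath, long totalSize,
                        List<ChannelHandle> channels) {}

    /** Group per-channel handles by delegated file, storing each path once. */
    static List<MergedHandle> merge(List<ChannelHandle> handles) {
        Map<String, List<ChannelHandle>> byFile =
                handles.stream().collect(Collectors.groupingBy(ChannelHandle::filePath));
        return byFile.entrySet().stream()
                .map(e -> new MergedHandle(
                        e.getKey(),
                        e.getValue().stream().mapToLong(ChannelHandle::size).sum(),
                        e.getValue()))
                .collect(Collectors.toList());
    }
}
{code}

With roughly 12 million channel states referencing about 5,500 files (the
production numbers above), storing the path once per file rather than once per
channel is where the reported 68% of redundant metadata would be eliminated.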





[jira] [Resolved] (FLINK-26369) Translate the part zh-page mixed with not be translated.

2024-02-02 Thread Rui Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan resolved FLINK-26369.
-
Resolution: Fixed

> Translate the part zh-page mixed with not be translated. 
> -
>
> Key: FLINK-26369
> URL: https://issues.apache.org/jira/browse/FLINK-26369
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.15.0
>Reporter: Aiden Gong
>Assignee: Zhongqiang Gong
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> These files should be translated.
> Files:
> docs/content.zh/docs/deployment/ha/overview.md
>  





[jira] [Commented] (FLINK-26369) Translate the part zh-page mixed with not be translated.

2024-02-02 Thread Rui Fan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813872#comment-17813872
 ] 

Rui Fan commented on FLINK-26369:
-

Merged to master (1.19.0) via 5fe6f2024f77c03b2f2a897827765b209c42d5b0

> Translate the part zh-page mixed with not be translated. 
> -
>
> Key: FLINK-26369
> URL: https://issues.apache.org/jira/browse/FLINK-26369
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.15.0
>Reporter: Aiden Gong
>Assignee: Zhongqiang Gong
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> These files should be translated.
> Files:
> docs/content.zh/docs/deployment/ha/overview.md
>  





Re: [PR] [FLINK-26369][doc-zh] Translate the part zh-page mixed with not be translated. [flink]

2024-02-02 Thread via GitHub


1996fanrui merged PR #20340:
URL: https://github.com/apache/flink/pull/20340





Re: [PR] [FLINK-26369][doc-zh] Translate the part zh-page mixed with not be translated. [flink]

2024-02-02 Thread via GitHub


GOODBOY008 commented on PR #20340:
URL: https://github.com/apache/flink/pull/20340#issuecomment-1925022155

   @1996fanrui The branch has been rebased, thanks for your review.





Re: [PR] [FLINK-31836][table] Upgrade Calcite version to 1.34.0 [flink]

2024-02-02 Thread via GitHub


flinkbot commented on PR #24256:
URL: https://github.com/apache/flink/pull/24256#issuecomment-1925021000

   
   ## CI report:
   
   * 6650bc89e66820688b8507d870762aa5f3e016b3 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   





[jira] [Updated] (FLINK-31836) Upgrade Calcite version to 1.34.0

2024-02-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-31836:
---
Labels: pull-request-available  (was: )

> Upgrade Calcite version to 1.34.0
> -
>
> Key: FLINK-31836
> URL: https://issues.apache.org/jira/browse/FLINK-31836
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Table SQL / API
>Reporter: Sergey Nuyanzin
>Assignee: Sergey Nuyanzin
>Priority: Major
>  Labels: pull-request-available
>
> Calcite 1.34.0 has been released 
> https://calcite.apache.org/news/2023/03/14/release-1.34.0/





[PR] [FLINK-31836][table] Upgrade Calcite version to 1.34.0 [flink]

2024-02-02 Thread via GitHub


snuyanzin opened a new pull request, #24256:
URL: https://github.com/apache/flink/pull/24256

   ## What is the purpose of the change
   
   This PR upgrades Calcite to 1.34.0.
   It is based on https://github.com/apache/flink/pull/24255;
   once the upgrade to 1.33.0 is approved and merged, I will rebase this one.
   
   Since 1.34.0 was a short release, the upgrade is relatively simple, so only 
the latest commit is for the 1.34.0 upgrade.
   
   
   ## Verifying this change
   
   This change is already covered by existing tests
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): ( yes)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: ( no)
 - The serializers: ( no)
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: ( no )
 - The S3 file system connector: ( no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? ( no)
 - If yes, how is the feature documented? (not applicable)
   





[jira] [Assigned] (FLINK-26369) Translate the part zh-page mixed with not be translated.

2024-02-02 Thread Rui Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Fan reassigned FLINK-26369:
---

Assignee: Zhongqiang Gong

> Translate the part zh-page mixed with not be translated. 
> -
>
> Key: FLINK-26369
> URL: https://issues.apache.org/jira/browse/FLINK-26369
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.15.0
>Reporter: Aiden Gong
>Assignee: Zhongqiang Gong
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> These files should be translated.
> Files:
> docs/content.zh/docs/deployment/ha/overview.md
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-26369][doc-zh] Translate the part zh-page mixed with not be translated. [flink]

2024-02-02 Thread via GitHub


1996fanrui commented on PR #20340:
URL: https://github.com/apache/flink/pull/20340#issuecomment-1925001112

   Hi @GOODBOY008 , would you mind rebasing onto the master branch? 
   
   If I remember correctly, this CI issue has been fixed in the master branch.





Re: [PR] [FLINK-26369][doc-zh] Translate the part zh-page mixed with not be translated. [flink]

2024-02-02 Thread via GitHub


GOODBOY008 commented on PR #20340:
URL: https://github.com/apache/flink/pull/20340#issuecomment-1924979780

   @flinkbot run azure





Re: [PR] [FLINK-31362][table] Upgrade to Calcite version to 1.33.0 [flink]

2024-02-02 Thread via GitHub


flinkbot commented on PR #24255:
URL: https://github.com/apache/flink/pull/24255#issuecomment-1924979141

   
   ## CI report:
   
   * b9d77b02be5ddc59ed2596babcbcce7d49f92c4a UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   





[jira] [Updated] (FLINK-31362) Upgrade to Calcite version to 1.33.0

2024-02-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-31362:
---
Labels: pull-request-available  (was: )

> Upgrade to Calcite version to 1.33.0
> 
>
> Key: FLINK-31362
> URL: https://issues.apache.org/jira/browse/FLINK-31362
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / API
>Reporter: Aitozi
>Assignee: Sergey Nuyanzin
>Priority: Major
>  Labels: pull-request-available
>
> In Calcite 1.33.0, C-style escape strings have been supported. We could 
> leverage it to enhance our string literals usage.
> issue: https://issues.apache.org/jira/browse/CALCITE-5305





[PR] [FLINK-31362][table] Upgrade to Calcite version to 1.33.0 [flink]

2024-02-02 Thread via GitHub


snuyanzin opened a new pull request, #24255:
URL: https://github.com/apache/flink/pull/24255

   ## What is the purpose of the change
   
   This PR bumps Calcite to 1.33.0.
   
   ## Brief change log
   
   *(for example:)*
 - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
 - *Deployments RPC transmits only the blob storage reference*
 - *TaskManagers retrieve the TaskInfo from the blob cache*
   
   
   ## Verifying this change
   
   This change is already covered by existing tests; some tests were updated 
because of changes in Calcite.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes )
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: ( no )
 - The runtime per-record code paths (performance sensitive): (no )
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no )
 - The S3 file system connector: (no )
   
   ## Documentation
   
 - Does this pull request introduce a new feature? ( no)
 - If yes, how is the feature documented? (not applicable)
   





[jira] [Updated] (FLINK-34014) Jdbc connector can avoid send empty insert to database when there's no buffer data

2024-02-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-34014:
---
Labels: pull-request-available  (was: )

> Jdbc connector can avoid send empty insert to database when there's no buffer 
> data
> --
>
> Key: FLINK-34014
> URL: https://issues.apache.org/jira/browse/FLINK-34014
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / JDBC
>Reporter: luoyuxia
>Priority: Major
>  Labels: pull-request-available
>
> In the JDBC connector, a background thread flushes buffered data to the 
> database, but when no data is in the buffer the flush can be skipped.
> We can avoid it in JdbcOutputFormat#attemptFlush, or in a
> JdbcBatchStatementExecutor such as TableBufferedStatementExecutor, which can 
> avoid calling {{statementExecutor.executeBatch()}} when the buffer is empty.
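
A minimal sketch of the proposed guard, assuming a simplified executor shape
rather than the actual TableBufferedStatementExecutor API: executeBatch()
becomes a no-op while nothing is buffered.

{code:java}
// Sketch only: hypothetical buffered executor, not the Flink JDBC classes.
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class BufferedExecutorSketch {

    private final PreparedStatement statement;
    private final List<String> buffer = new ArrayList<>();

    BufferedExecutorSketch(PreparedStatement statement) {
        this.statement = statement;
    }

    void addToBatch(String record) {
        buffer.add(record);
    }

    void executeBatch() throws SQLException {
        if (buffer.isEmpty()) {
            return; // nothing buffered: skip the empty round trip to the database
        }
        for (String record : buffer) {
            statement.setString(1, record); // bind each buffered record
            statement.addBatch();
        }
        statement.executeBatch();
        buffer.clear();
    }
}
{code}

On an idle stream, this also spares the database one empty batch per flush
interval.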





Re: [PR] [FLINK-34014][jdbc] Avoid executeBatch when buffer is empty [flink-connector-jdbc]

2024-02-02 Thread via GitHub


boring-cyborg[bot] commented on PR #100:
URL: 
https://github.com/apache/flink-connector-jdbc/pull/100#issuecomment-1924778857

   Thanks for opening this pull request! Please check out our contributing 
guidelines. (https://flink.apache.org/contributing/how-to-contribute.html)
   





[jira] [Commented] (FLINK-34152) Tune memory of autoscaled jobs

2024-02-02 Thread Emanuele Pirro (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813772#comment-17813772
 ] 

Emanuele Pirro commented on FLINK-34152:


Thanks Maximilian for reporting this. It's indeed an issue, especially the 
`OutOfMemoryErrors` on scale-in.

> Tune memory of autoscaled jobs
> --
>
> Key: FLINK-34152
> URL: https://issues.apache.org/jira/browse/FLINK-34152
> Project: Flink
>  Issue Type: New Feature
>  Components: Autoscaler, Kubernetes Operator
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-1.8.0
>
>
> The current autoscaling algorithm adjusts the parallelism of the job task 
> vertices according to the processing needs. By adjusting the parallelism, we 
> systematically scale the amount of CPU for a task. At the same time, we also 
> indirectly change the amount of memory tasks have at their dispense. However, 
> there are some problems with this.
>  # Memory is overprovisioned: On scale up we may add more memory than we 
> actually need. Even on scale down, the memory / cpu ratio can still be off 
> and too much memory is used.
>  # Memory is underprovisioned: For stateful jobs, we risk running into 
> OutOfMemoryErrors on scale down. Even before running out of memory, too 
> little memory can have a negative impact on the effectiveness of the scaling.
> We lack the capability to tune memory proportionally to the processing needs. 
> In the same way that we measure CPU usage and size the tasks accordingly, we 
> need to evaluate memory usage and adjust the heap memory size.
> A tuning algorithm could look like this:
> h2. 1. Establish a memory baseline
> We observe the average heap memory usage at task managers.
> h2. 2. Calculate memory usage per record
> The memory requirements per record can be estimated by calculating this ratio:
> {noformat}
> memory_per_rec = sum(heap_usage) / sum(records_processed)
> {noformat}
> This ratio is surprisingly constant based off empirical data.
> h2. 3. Scale memory proportionally to the per-record memory needs
> {noformat}
> memory_per_tm = expected_records_per_sec * memory_per_rec / num_task_managers 
> {noformat}
> A minimum memory limit needs to be added to avoid scaling down memory too 
> much. The max memory per TM should be equal to the initially defined 
> user-specified limit from the ResourceSpec. 
> {noformat}
> memory_per_tm = max(min_limit, memory_per_tm)
> memory_per_tm = min(max_limit, memory_per_tm) {noformat}
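
A minimal sketch of the sizing rule above in Java; the names and byte units
are assumptions for illustration, not autoscaler code.

{code:java}
// Sketch of the proposed formula; all identifiers are illustrative.
class MemoryTuningSketch {

    static long tunedMemoryPerTm(long sumHeapUsageBytes,
                                 long sumRecordsProcessed,
                                 double expectedRecordsPerSec,
                                 int numTaskManagers,
                                 long minLimitBytes,
                                 long maxLimitBytes) {
        // memory_per_rec = sum(heap_usage) / sum(records_processed)
        double memoryPerRec = (double) sumHeapUsageBytes / sumRecordsProcessed;
        // memory_per_tm = expected_records_per_sec * memory_per_rec / num_task_managers
        double memoryPerTm = expectedRecordsPerSec * memoryPerRec / numTaskManagers;
        // clamp to [min_limit, max_limit]
        return (long) Math.max(minLimitBytes, Math.min(maxLimitBytes, memoryPerTm));
    }
}
{code}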





Re: [PR] [FLINK-26088][Connectors/ElasticSearch] Add Elasticsearch 8.0 support [flink-connector-elasticsearch]

2024-02-02 Thread via GitHub


reta commented on PR #53:
URL: 
https://github.com/apache/flink-connector-elasticsearch/pull/53#issuecomment-1924270597

   @MartijnVisser  @mtfelisb It looks great; I have no more comments. It would 
be great to have e2e tests added, but we could also do that in a separate pull 
request.





[jira] [Updated] (FLINK-34325) Inconsistent state with data loss after OutOfMemoryError

2024-02-02 Thread Alexis Sarda-Espinosa (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexis Sarda-Espinosa updated FLINK-34325:
--
Description: 
I have a job that uses broadcast state to maintain a cache of required 
metadata. I am currently evaluating memory requirements of my specific use 
case, and I ran into a weird situation that seems worrisome.

All sources in my job are Kafka sources. I wrote a large amount of messages in 
Kafka to force the broadcast state's cache to grow. At some point, this caused 
an "{{java.lang.OutOfMemoryError: Java heap space}}" error in the -Job- Task 
Manager. I would have expected the whole java process of the TM to crash, but 
the job was simply restarted. What's worrisome is that, after 2 checkpoint 
failures ^1^, the job restarted and subsequently resumed from the latest 
successful checkpoint and completely ignored all the events I wrote to Kafka, 
which I can verify because I have a custom metric that exposes the approximate 
size of this cache, and the fact that the job didn't crashloop at this point 
after reading all the messages from Kafka over and over again.

I'm attaching an excerpt of the Job Manager's logs. My main concerns are:

# It seems the memory error somehow prevented the Kafka offsets from being 
"rolled back", so eventually the Kafka events that should have ended in the 
broadcast state's cache were ignored.
# -Is it normal that the state is somehow "materialized" in the JM and is thus 
affected by the size of the JM's heap? Is this something particular due to the 
use of broadcast state? I found this very surprising.- See comments

^1^ Two failures are expected since 
{{execution.checkpointing.tolerable-failed-checkpoints=1}}

  was:
I have a job that uses broadcast state to maintain a cache of required 
metadata. I am currently evaluating memory requirements of my specific use 
case, and I ran into a weird situation that seems worrisome.

All sources in my job are Kafka sources. I wrote a large amount of messages in 
Kafka to force the broadcast state's cache to grow. At some point, this caused 
an "{{java.lang.OutOfMemoryError: Java heap space}}" error in the -Job- Task 
Manager. I would have expected the whole java process of the TM to crash, but 
the job was simply restarted. What's worrisome is that, after 2 checkpoint 
failures ^1^, the job restarted and subsequently resumed from the latest 
successful checkpoint and completely ignored all the events I wrote to Kafka, 
which I can verify because I have a custom metric that exposes the approximate 
size of this cache, and the fact that the job didn't crashloop at this point 
after reading all the messages from Kafka over and over again.

I'm attaching an excerpt of the Job Manager's logs. My main concerns are:

# It seems the memory error from the JM didn't prevent the Kafka offsets from 
being "rolled back", so eventually the Kafka events that should have ended in 
the broadcast state's cache were ignored.
# -Is it normal that the state is somehow "materialized" in the JM and is thus 
affected by the size of the JM's heap? Is this something particular due to the 
use of broadcast state? I found this very surprising.- See comments

^1^ Two failures are expected since 
{{execution.checkpointing.tolerable-failed-checkpoints=1}}


> Inconsistent state with data loss after OutOfMemoryError
> 
>
> Key: FLINK-34325
> URL: https://issues.apache.org/jira/browse/FLINK-34325
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.17.1
> Environment: Flink on Kubernetes with HA, RocksDB with incremental 
> checkpoints on Azure
>Reporter: Alexis Sarda-Espinosa
>Priority: Major
> Attachments: jobmanager_log.txt
>
>
> I have a job that uses broadcast state to maintain a cache of required 
> metadata. I am currently evaluating memory requirements of my specific use 
> case, and I ran into a weird situation that seems worrisome.
> All sources in my job are Kafka sources. I wrote a large amount of messages 
> in Kafka to force the broadcast state's cache to grow. At some point, this 
> caused an "{{java.lang.OutOfMemoryError: Java heap space}}" error in the 
> -Job- Task Manager. I would have expected the whole java process of the TM to 
> crash, but the job was simply restarted. What's worrisome is that, after 2 
> checkpoint failures ^1^, the job restarted and subsequently resumed from the 
> latest successful checkpoint and completely ignored all the events I wrote to 
> Kafka, which I can verify because I have a custom metric that exposes the 
> approximate size of this cache, and the fact that the job didn't crashloop at 
> this point after reading all the messages from Kafka over and over again.
> I'm attaching an excerpt of the Job Manager's logs. My main concerns 

[jira] [Updated] (FLINK-34325) Inconsistent state with data loss after OutOfMemoryError

2024-02-02 Thread Alexis Sarda-Espinosa (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexis Sarda-Espinosa updated FLINK-34325:
--
Description: 
I have a job that uses broadcast state to maintain a cache of required 
metadata. I am currently evaluating memory requirements of my specific use 
case, and I ran into a weird situation that seems worrisome.

All sources in my job are Kafka sources. I wrote a large amount of messages in 
Kafka to force the broadcast state's cache to grow. At some point, this caused 
an "{{java.lang.OutOfMemoryError: Java heap space}}" error in the -Job- Task 
Manager. I would have expected the whole java process of the TM to crash, but 
the job was simply restarted. What's worrisome is that, after 2 checkpoint 
failures ^1^, the job restarted and subsequently resumed from the latest 
successful checkpoint and completely ignored all the events I wrote to Kafka, 
which I can verify because I have a custom metric that exposes the approximate 
size of this cache, and the fact that the job didn't crashloop at this point 
after reading all the messages from Kafka over and over again.

I'm attaching an excerpt of the Job Manager's logs. My main concerns are:

# It seems the memory error from the JM didn't prevent the Kafka offsets from 
being "rolled back", so eventually the Kafka events that should have ended in 
the broadcast state's cache were ignored.
# -Is it normal that the state is somehow "materialized" in the JM and is thus 
affected by the size of the JM's heap? Is this something particular due to the 
use of broadcast state? I found this very surprising.- See comments

^1^ Two failures are expected since 
{{execution.checkpointing.tolerable-failed-checkpoints=1}}

  was:
I have a job that uses broadcast state to maintain a cache of required 
metadata. I am currently evaluating memory requirements of my specific use 
case, and I ran into a weird situation that seems worrisome.

All sources in my job are Kafka sources. I wrote a large amount of messages in 
Kafka to force the broadcast state's cache to grow. At some point, this caused 
an "{{java.lang.OutOfMemoryError: Java heap space}}" error in the Job Manager. 
I would have expected the whole java process of the JM to crash, but the job 
was simply restarted. What's worrisome is that, after 2 checkpoint failures 
^1^, the job restarted and subsequently resumed from the latest successful 
checkpoint and completely ignored all the events I wrote to Kafka, which I can 
verify because I have a custom metric that exposes the approximate size of this 
cache, and the fact that the job didn't crashloop at this point after reading 
all the messages from Kafka over and over again.

I'm attaching an excerpt of the Job Manager's logs. My main concerns are:

# It seems the memory error from the JM didn't prevent the Kafka offsets from 
being "rolled back", so eventually the Kafka events that should have ended in 
the broadcast state's cache were ignored.
# -Is it normal that the state is somehow "materialized" in the JM and is thus 
affected by the size of the JM's heap? Is this something particular due to the 
use of broadcast state? I found this very surprising.- See comments

^1^ Two failures are expected since 
{{execution.checkpointing.tolerable-failed-checkpoints=1}}


> Inconsistent state with data loss after OutOfMemoryError
> 
>
> Key: FLINK-34325
> URL: https://issues.apache.org/jira/browse/FLINK-34325
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.17.1
> Environment: Flink on Kubernetes with HA, RocksDB with incremental 
> checkpoints on Azure
>Reporter: Alexis Sarda-Espinosa
>Priority: Major
> Attachments: jobmanager_log.txt
>
>
> I have a job that uses broadcast state to maintain a cache of required 
> metadata. I am currently evaluating memory requirements of my specific use 
> case, and I ran into a weird situation that seems worrisome.
> All sources in my job are Kafka sources. I wrote a large amount of messages 
> in Kafka to force the broadcast state's cache to grow. At some point, this 
> caused an "{{java.lang.OutOfMemoryError: Java heap space}}" error in the 
> -Job- Task Manager. I would have expected the whole java process of the TM to 
> crash, but the job was simply restarted. What's worrisome is that, after 2 
> checkpoint failures ^1^, the job restarted and subsequently resumed from the 
> latest successful checkpoint and completely ignored all the events I wrote to 
> Kafka, which I can verify because I have a custom metric that exposes the 
> approximate size of this cache, and the fact that the job didn't crashloop at 
> this point after reading all the messages from Kafka over and over again.
> I'm attaching an excerpt of the Job Manager's logs. My main 

Re: [PR] [FLINK-33396][table-planner] Fix table alias not be cleared in subquery [flink]

2024-02-02 Thread via GitHub


jeyhunkarimov commented on code in PR #24239:
URL: https://github.com/apache/flink/pull/24239#discussion_r1476267210


##
flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/utils/RelTreeWriterImpl.scala:
##
@@ -286,4 +297,45 @@ class RelTreeWriterImpl(
   })
 }
   }
+
+  private def copy(pw: PrintWriter): RelTreeWriterImpl = {
+new RelTreeWriterImpl(
+  pw,
+  explainLevel,
+  withIdPrefix,
+  withChangelogTraits,
+  withRowType,
+  withTreeStyle,
+  withUpsertKey,
+  withQueryHint,
+  withQueryBlockAlias,
+  statementNum,
+  withAdvice,
+  withRicherDetailInSubQuery)
+  }
+
+  /**
+   * Mainly copy from [[RexSubQuery#computeDigest]].
+   *
+   * Modified to support explain sub-query with richer additional detail by 
[[RelTreeWriterImpl]] in
+   * Flink rather than Calcite.
+   */
+  private def computeSubQueryRicherDigest(subQuery: RexSubQuery): String = {
+val sb = new StringBuilder(subQuery.getOperator.getName);
+sb.append("(")
+subQuery.getOperands.forEach(
+  operand => {
+sb.append(operand)
+sb.append(", ")
+  })
+sb.append("{\n")
+
+val sw = new StringWriter
+val newRelTreeWriter = copy(new PrintWriter(sw))
+subQuery.rel.explain(newRelTreeWriter)
+sb.append(sw.toString)
+
+sb.append("})")
+sb.toString()
+  }

Review Comment:
   IMO we should keep the scope of this PR to removing the alias hints. In 
the tests, we should then only see the change `query_plan_with_alias_hint` -> 
`the_exact_same_query_plan_without_unnecessary_alias`. Would that make sense to 
you?



##
flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/utils/RelTreeWriterImpl.scala:
##
@@ -189,7 +191,16 @@ class RelTreeWriterImpl(
 case value =>
   if (j == 0) s.append("(") else s.append(", ")
   j = j + 1
-  s.append(value.left).append("=[").append(value.right).append("]")
+  val rightStr = value.right match {
+case subQuery: RexSubQuery =>
+  if (withRicherDetailInSubQuery) {
+computeSubQueryRicherDigest(subQuery)
+  } else {
+value.right.toString
+  }
+case _ => value.right.toString
+  }
+  s.append(value.left).append("=[").append(rightStr).append("]")

Review Comment:
   Is this block needed for fixing the table alias clearance? 






[jira] [Updated] (FLINK-34343) ResourceManager registration is not completed when registering the JobMaster

2024-02-02 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-34343:
--
Priority: Critical  (was: Blocker)

> ResourceManager registration is not completed when registering the JobMaster
> 
>
> Key: FLINK-34343
> URL: https://issues.apache.org/jira/browse/FLINK-34343
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Runtime / RPC
>Affects Versions: 1.17.2, 1.19.0, 1.18.1
>Reporter: Matthias Pohl
>Priority: Critical
>  Labels: test-stability
> Attachments: FLINK-34343_k8s_application_cluster_e2e_test.log
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57203=logs=64debf87-ecdb-5aef-788d-8720d341b5cb=2302fb98-0839-5df2-3354-bbae636f81a7=8066
> The test run failed due to a NullPointerException:
> {code}
> Feb 02 01:11:55 2024-02-02 01:11:47,791 INFO  
> org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor   [] - The rpc 
> endpoint 
> org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager has not 
> been started yet. Discarding message LocalFencedMessage(000
> 0, 
> LocalRpcInvocation(ResourceManagerGateway.registerJobMaster(JobMasterId, 
> ResourceID, String, JobID, Time))) until processing is started.
> Feb 02 01:11:55 2024-02-02 01:11:47,797 WARN  
> org.apache.flink.runtime.rpc.pekko.SupervisorActor   [] - RpcActor 
> pekko://flink/user/rpc/resourcemanager_2 has failed. Shutting it down now.
> Feb 02 01:11:55 java.lang.NullPointerException: Cannot invoke 
> "org.apache.flink.runtime.rpc.RpcServer.getAddress()" because 
> "this.rpcServer" is null
> Feb 02 01:11:55 at 
> org.apache.flink.runtime.rpc.RpcEndpoint.getAddress(RpcEndpoint.java:322) 
> ~[flink-dist-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleMessage(PekkoRpcActor.java:182)
>  ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction.applyOrElse(PartialFunction.scala:127) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction.applyOrElse$(PartialFunction.scala:126) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:29)
>  ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:545) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.AbstractActor.aroundReceive(AbstractActor.scala:229) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.ActorCell.receiveMessage(ActorCell.scala:590) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.ActorCell.invoke(ActorCell.scala:557) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.dispatch.Mailbox.processMailbox(Mailbox.scala:280) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.dispatch.Mailbox.run(Mailbox.scala:241) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.dispatch.Mailbox.exec(Mailbox.scala:253) 
> 

[jira] [Updated] (FLINK-34343) ResourceManager registration is not completed when registering the JobMaster

2024-02-02 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-34343:
--
Affects Version/s: 1.18.1
   1.17.2

> ResourceManager registration is not completed when registering the JobMaster
> 
>
> Key: FLINK-34343
> URL: https://issues.apache.org/jira/browse/FLINK-34343
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Runtime / RPC
>Affects Versions: 1.17.2, 1.19.0, 1.18.1
>Reporter: Matthias Pohl
>Priority: Blocker
>  Labels: test-stability
> Attachments: FLINK-34343_k8s_application_cluster_e2e_test.log
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57203=logs=64debf87-ecdb-5aef-788d-8720d341b5cb=2302fb98-0839-5df2-3354-bbae636f81a7=8066
> The test run failed due to a NullPointerException:
> {code}
> Feb 02 01:11:55 2024-02-02 01:11:47,791 INFO  
> org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor   [] - The rpc 
> endpoint 
> org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager has not 
> been started yet. Discarding message LocalFencedMessage(000
> 0, 
> LocalRpcInvocation(ResourceManagerGateway.registerJobMaster(JobMasterId, 
> ResourceID, String, JobID, Time))) until processing is started.
> Feb 02 01:11:55 2024-02-02 01:11:47,797 WARN  
> org.apache.flink.runtime.rpc.pekko.SupervisorActor   [] - RpcActor 
> pekko://flink/user/rpc/resourcemanager_2 has failed. Shutting it down now.
> Feb 02 01:11:55 java.lang.NullPointerException: Cannot invoke 
> "org.apache.flink.runtime.rpc.RpcServer.getAddress()" because 
> "this.rpcServer" is null
> Feb 02 01:11:55 at 
> org.apache.flink.runtime.rpc.RpcEndpoint.getAddress(RpcEndpoint.java:322) 
> ~[flink-dist-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleMessage(PekkoRpcActor.java:182)
>  ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction.applyOrElse(PartialFunction.scala:127) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction.applyOrElse$(PartialFunction.scala:126) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:29)
>  ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:545) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.AbstractActor.aroundReceive(AbstractActor.scala:229) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.ActorCell.receiveMessage(ActorCell.scala:590) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.ActorCell.invoke(ActorCell.scala:557) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.dispatch.Mailbox.processMailbox(Mailbox.scala:280) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.dispatch.Mailbox.run(Mailbox.scala:241) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.dispatch.Mailbox.exec(Mailbox.scala:253) 
> 

[jira] [Commented] (FLINK-33819) Support setting CompressType in RocksDBStateBackend

2024-02-02 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813740#comment-17813740
 ] 

Piotr Nowojski commented on FLINK-33819:


Thanks [~mayuehappy].

{quote}
merged 4f7725aa into master, which is not blocked by the performance result.

I also think it's worth discussing, and we could find a better default 
behaviour in the next version.
{quote}

+1


Yes

> Support setting CompressType in RocksDBStateBackend
> ---
>
> Key: FLINK-33819
> URL: https://issues.apache.org/jira/browse/FLINK-33819
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.18.0
>Reporter: Yue Ma
>Assignee: Yue Ma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
> Attachments: image-2023-12-14-11-32-32-968.png, 
> image-2023-12-14-11-35-22-306.png
>
>
> Currently, RocksDBStateBackend does not support setting the compression 
> level, and Snappy is used for compression by default. But we have some 
> scenarios where compression will use a lot of CPU resources. Turning off 
> compression can significantly reduce CPU overhead. So we may need to support 
> a parameter for users to set the CompressType of RocksDB.
>   !image-2023-12-14-11-35-22-306.png!
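
For context, compression can already be turned off today with a custom options
factory; below is a sketch, assuming the RocksDBOptionsFactory interface of
recent Flink releases (the ticket asks for a first-class config option
instead of requiring custom code).

{code:java}
// Sketch assuming Flink's RocksDBOptionsFactory interface.
import java.util.Collection;
import org.apache.flink.contrib.streaming.state.RocksDBOptionsFactory;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.CompressionType;
import org.rocksdb.DBOptions;

public class NoCompressionOptionsFactory implements RocksDBOptionsFactory {

    @Override
    public DBOptions createDBOptions(
            DBOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        return currentOptions; // leave DB-level options unchanged
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(
            ColumnFamilyOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        // Snappy is the default; disabling trades larger SST files for lower CPU.
        return currentOptions.setCompressionType(CompressionType.NO_COMPRESSION);
    }
}
{code}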





[jira] [Comment Edited] (FLINK-33819) Support setting CompressType in RocksDBStateBackend

2024-02-02 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813740#comment-17813740
 ] 

Piotr Nowojski edited comment on FLINK-33819 at 2/2/24 3:52 PM:


Thanks [~mayuehappy].

{quote}
merged 4f7725aa into master, which is not blocked by the performance result.

I also think it's worth discussing, and we could find a better default 
behaviour in the next version.
{quote}

+1


was (Author: pnowojski):
Thanks [~mayuehappy].

{quote}
merged 4f7725aa into master, which is not blocked by the performance result.

I also think it's worth discussing, and we could find a better default 
behaviour in the next version.
{quote}

+1


Yes

> Support setting CompressType in RocksDBStateBackend
> ---
>
> Key: FLINK-33819
> URL: https://issues.apache.org/jira/browse/FLINK-33819
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.18.0
>Reporter: Yue Ma
>Assignee: Yue Ma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
> Attachments: image-2023-12-14-11-32-32-968.png, 
> image-2023-12-14-11-35-22-306.png
>
>
> Currently, RocksDBStateBackend does not support setting the compression 
> level, and Snappy is used for compression by default. But we have some 
> scenarios where compression will use a lot of CPU resources. Turning off 
> compression can significantly reduce CPU overhead. So we may need to support 
> a parameter for users to set the CompressType of RocksDB.
>   !image-2023-12-14-11-35-22-306.png!





[jira] [Reopened] (FLINK-33696) FLIP-385: Add OpenTelemetryTraceReporter and OpenTelemetryMetricReporter

2024-02-02 Thread Piotr Nowojski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski reopened FLINK-33696:


No, it's not done yet. I still have to open a PR adding the OpenTelemetry 
reporters.

7db2ecad added only the slf4j reporter.

> FLIP-385: Add OpenTelemetryTraceReporter and OpenTelemetryMetricReporter
> 
>
> Key: FLINK-33696
> URL: https://issues.apache.org/jira/browse/FLINK-33696
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Metrics
>Reporter: Piotr Nowojski
>Assignee: Piotr Nowojski
>Priority: Major
> Fix For: 1.19.0
>
>
> h1. Motivation
> [FLIP-384|https://cwiki.apache.org/confluence/display/FLINK/FLIP-384%3A+Introduce+TraceReporter+and+use+it+to+create+checkpointing+and+recovery+traces]
>  is adding TraceReporter interface. However with 
> [FLIP-384|https://cwiki.apache.org/confluence/display/FLINK/FLIP-384%3A+Introduce+TraceReporter+and+use+it+to+create+checkpointing+and+recovery+traces]
>  alone, Log4jTraceReporter would be the only available implementation of 
> TraceReporter interface, which is not very helpful.
> In this FLIP I’m proposing to contribute both MetricExporter and 
> TraceReporter implementation using OpenTelemetry.





[jira] [Closed] (FLINK-34169) [benchmark] CI fails during test running

2024-02-02 Thread Zakelly Lan (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zakelly Lan closed FLINK-34169.
---
Resolution: Fixed

> [benchmark] CI fails during test running
> 
>
> Key: FLINK-34169
> URL: https://issues.apache.org/jira/browse/FLINK-34169
> Project: Flink
>  Issue Type: Bug
>  Components: Benchmarks
>Reporter: Zakelly Lan
>Assignee: Yunfeng Zhou
>Priority: Critical
>
> [CI link: 
> https://github.com/apache/flink-benchmarks/actions/runs/7580834955/job/20647663157?pr=85|https://github.com/apache/flink-benchmarks/actions/runs/7580834955/job/20647663157?pr=85]
> which says:
> {code:java}
> // omit some stack traces
>   Caused by: java.util.concurrent.ExecutionException: Boxed Error
> 1115  at scala.concurrent.impl.Promise$.resolver(Promise.scala:87)
> 1116  at 
> scala.concurrent.impl.Promise$.scala$concurrent$impl$Promise$$resolveTry(Promise.scala:79)
> 1117  at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
> 1118  at 
> org.apache.pekko.pattern.PromiseActorRef.$bang(AskSupport.scala:629)
> 1119  at org.apache.pekko.actor.ActorRef.tell(ActorRef.scala:141)
> 1120  at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcInvocation(PekkoRpcActor.java:317)
> 1121  at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcMessage(PekkoRpcActor.java:222)
> 1122  ... 22 more
> 1123  Caused by: java.lang.NoClassDefFoundError: 
> javax/activation/UnsupportedDataTypeException
> 1124  at 
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGateFactory.createKnownInputChannel(SingleInputGateFactory.java:387)
> 1125  at 
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGateFactory.lambda$createInputChannel$2(SingleInputGateFactory.java:353)
> 1126  at 
> org.apache.flink.runtime.shuffle.ShuffleUtils.applyWithShuffleTypeCheck(ShuffleUtils.java:51)
> 1127  at 
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGateFactory.createInputChannel(SingleInputGateFactory.java:333)
> 1128  at 
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGateFactory.createInputChannelsAndTieredStorageService(SingleInputGateFactory.java:284)
> 1129  at 
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGateFactory.create(SingleInputGateFactory.java:204)
> 1130  at 
> org.apache.flink.runtime.io.network.NettyShuffleEnvironment.createInputGates(NettyShuffleEnvironment.java:265)
> 1131  at 
> org.apache.flink.runtime.taskmanager.Task.<init>(Task.java:418)
> 1132  at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.submitTask(TaskExecutor.java:821)
> 1133  at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 1134  at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 1135  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 1136  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> 1137  at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.lambda$handleRpcInvocation$1(PekkoRpcActor.java:309)
> 1138  at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
> 1139  at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcInvocation(PekkoRpcActor.java:307)
> 1140  ... 23 more
> 1141  Caused by: java.lang.ClassNotFoundException: 
> javax.activation.UnsupportedDataTypeException
> 1142  at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
> 1143  at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
> 1144  at 
> java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
> 1145  ... 39 more {code}
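This ClassNotFoundException typically surfaces on JDK 11+, where the java.activation module was removed from the JDK (JEP 320). A dependency along these lines usually resolves it (shown as a sketch; not necessarily the fix applied for this ticket):

{code:xml}
<!-- Provides javax.activation.* on JDK 11+, where the java.activation
     module no longer ships with the JDK. -->
<dependency>
    <groupId>javax.activation</groupId>
    <artifactId>javax.activation-api</artifactId>
    <version>1.2.0</version>
</dependency>
{code}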



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34169) [benchmark] CI fails during test running

2024-02-02 Thread Zakelly Lan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813730#comment-17813730
 ] 

Zakelly Lan commented on FLINK-34169:
-

Thanks, closing now.

> [benchmark] CI fails during test running
> 
>
> Key: FLINK-34169
> URL: https://issues.apache.org/jira/browse/FLINK-34169
> Project: Flink
>  Issue Type: Bug
>  Components: Benchmarks
>Reporter: Zakelly Lan
>Assignee: Yunfeng Zhou
>Priority: Critical
>
> [CI link: 
> https://github.com/apache/flink-benchmarks/actions/runs/7580834955/job/20647663157?pr=85|https://github.com/apache/flink-benchmarks/actions/runs/7580834955/job/20647663157?pr=85]
> which says:
> {code:java}
> // omit some stack traces
>   Caused by: java.util.concurrent.ExecutionException: Boxed Error
> 1115  at scala.concurrent.impl.Promise$.resolver(Promise.scala:87)
> 1116  at 
> scala.concurrent.impl.Promise$.scala$concurrent$impl$Promise$$resolveTry(Promise.scala:79)
> 1117  at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
> 1118  at 
> org.apache.pekko.pattern.PromiseActorRef.$bang(AskSupport.scala:629)
> 1119  at org.apache.pekko.actor.ActorRef.tell(ActorRef.scala:141)
> 1120  at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcInvocation(PekkoRpcActor.java:317)
> 1121  at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcMessage(PekkoRpcActor.java:222)
> 1122  ... 22 more
> 1123  Caused by: java.lang.NoClassDefFoundError: 
> javax/activation/UnsupportedDataTypeException
> 1124  at 
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGateFactory.createKnownInputChannel(SingleInputGateFactory.java:387)
> 1125  at 
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGateFactory.lambda$createInputChannel$2(SingleInputGateFactory.java:353)
> 1126  at 
> org.apache.flink.runtime.shuffle.ShuffleUtils.applyWithShuffleTypeCheck(ShuffleUtils.java:51)
> 1127  at 
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGateFactory.createInputChannel(SingleInputGateFactory.java:333)
> 1128  at 
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGateFactory.createInputChannelsAndTieredStorageService(SingleInputGateFactory.java:284)
> 1129  at 
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGateFactory.create(SingleInputGateFactory.java:204)
> 1130  at 
> org.apache.flink.runtime.io.network.NettyShuffleEnvironment.createInputGates(NettyShuffleEnvironment.java:265)
> 1131  at 
> org.apache.flink.runtime.taskmanager.Task.<init>(Task.java:418)
> 1132  at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.submitTask(TaskExecutor.java:821)
> 1133  at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 1134  at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 1135  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 1136  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> 1137  at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.lambda$handleRpcInvocation$1(PekkoRpcActor.java:309)
> 1138  at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
> 1139  at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcInvocation(PekkoRpcActor.java:307)
> 1140  ... 23 more
> 1141  Caused by: java.lang.ClassNotFoundException: 
> javax.activation.UnsupportedDataTypeException
> 1142  at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
> 1143  at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
> 1144  at 
> java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
> 1145  ... 39 more {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-23582) Add documentation for session window

2024-02-02 Thread Martijn Visser (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-23582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn Visser closed FLINK-23582.
--
Resolution: Duplicate

> Add documentation for session window
> 
>
> Key: FLINK-23582
> URL: https://issues.apache.org/jira/browse/FLINK-23582
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation, Table SQL / API
>Reporter: Jing Zhang
>Assignee: Jing Zhang
>Priority: Minor
>  Labels: pull-request-available, stale-assigned
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33856) Add metrics to monitor the interaction performance between task and external storage system in the process of checkpoint making

2024-02-02 Thread Martijn Visser (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813723#comment-17813723
 ] 

Martijn Visser commented on FLINK-33856:


There's been a change at the ASF on who can create Confluence accounts, we're 
discussing on how to proceed. Hopefully I'll have an answer soon.

> Add metrics to monitor the interaction performance between task and external 
> storage system in the process of checkpoint making
> ---
>
> Key: FLINK-33856
> URL: https://issues.apache.org/jira/browse/FLINK-33856
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.18.0
>Reporter: Jufang He
>Assignee: Jufang He
>Priority: Major
>  Labels: pull-request-available
>
> When Flink makes a checkpoint, the performance of the interaction with the 
> external file system has a great impact on the overall checkpoint duration. 
> Adding performance indicators to the points where the task interacts with 
> the external file storage system therefore makes it easy to spot the 
> bottleneck. These include: the rate of file writes, the latency to write 
> the file, and the latency to close the file.
> Adding the above metrics on the Flink side has two advantages: it makes it 
> convenient to compare the E2E checkpoint duration of different tasks, and it 
> does not need to distinguish between types of external storage systems, 
> since it can be implemented uniformly in the FsCheckpointStreamFactory.
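A rough sketch of how such instrumentation could look (names and wiring here are assumptions, not the actual patch):

{code:java}
import java.io.IOException;
import java.io.OutputStream;

import org.apache.flink.metrics.Histogram;
import org.apache.flink.metrics.Meter;

/**
 * Wraps the checkpoint output stream and records write rate plus
 * write/close latency, independent of the file system implementation.
 */
public class MeteredCheckpointOutputStream extends OutputStream {

    private final OutputStream delegate;
    private final Meter writeRate;              // rate of file writes
    private final Histogram writeLatencyNanos;  // latency per write call
    private final Histogram closeLatencyNanos;  // latency of close()

    public MeteredCheckpointOutputStream(
            OutputStream delegate,
            Meter writeRate,
            Histogram writeLatencyNanos,
            Histogram closeLatencyNanos) {
        this.delegate = delegate;
        this.writeRate = writeRate;
        this.writeLatencyNanos = writeLatencyNanos;
        this.closeLatencyNanos = closeLatencyNanos;
    }

    @Override
    public void write(int b) throws IOException {
        long start = System.nanoTime();
        delegate.write(b);
        writeLatencyNanos.update(System.nanoTime() - start);
        writeRate.markEvent();
    }

    @Override
    public void close() throws IOException {
        long start = System.nanoTime();
        delegate.close();
        closeLatencyNanos.update(System.nanoTime() - start);
    }
}
{code}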



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33138] DataStream API implementation [flink-connector-prometheus]

2024-02-02 Thread via GitHub


hlteoh37 commented on code in PR #1:
URL: 
https://github.com/apache/flink-connector-prometheus/pull/1#discussion_r1476176456


##
example-datastream-job/pom.xml:
##
@@ -0,0 +1,156 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+    <modelVersion>4.0.0</modelVersion>
+    <parent>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-connector-prometheus-parent</artifactId>
+        <version>1.0.0-SNAPSHOT</version>
+    </parent>
+
+    <name>Flink : Connectors : Prometheus : Sample Application</name>
+    <artifactId>example-datastream-job</artifactId>
+
+    <properties>
+        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+        <flink.version>1.17.2</flink.version>
+        <log4j.version>2.17.1</log4j.version>
+    </properties>
+
+    <dependencies>
+
+        <dependency>
+            <groupId>org.apache.flink</groupId>
+            <artifactId>flink-connector-prometheus</artifactId>
+            <version>${project.version}</version>
+        </dependency>
+
+        <dependency>
+            <groupId>org.apache.flink.connector.prometheus</groupId>
+            <artifactId>amp-request-signer</artifactId>
+            <version>${project.version}</version>
+        </dependency>
+
+        <dependency>
+            <groupId>org.apache.flink</groupId>
+            <artifactId>flink-streaming-java</artifactId>
+            <version>${flink.version}</version>
+            <scope>provided</scope>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.flink</groupId>
+            <artifactId>flink-clients</artifactId>
+            <version>${flink.version}</version>
+            <scope>provided</scope>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.flink</groupId>
+            <artifactId>flink-connector-base</artifactId>
+            <version>${flink.version}</version>
+            <scope>provided</scope>
+        </dependency>
+
+        <dependency>
+            <groupId>org.apache.logging.log4j</groupId>
+            <artifactId>log4j-slf4j-impl</artifactId>
+            <version>${log4j.version}</version>
+            <scope>runtime</scope>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.logging.log4j</groupId>
+            <artifactId>log4j-api</artifactId>
+            <version>${log4j.version}</version>
+            <scope>runtime</scope>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.logging.log4j</groupId>
+            <artifactId>log4j-core</artifactId>
+            <version>${log4j.version}</version>
+            <scope>runtime</scope>
+        </dependency>
+
+        <dependency>
+            <groupId>org.apache.flink</groupId>
+            <artifactId>flink-runtime-web</artifactId>

Review Comment:
   why do we need flink-runtime-web?



##
amp-request-signer/pom.xml:
##
@@ -0,0 +1,60 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+    <modelVersion>4.0.0</modelVersion>
+    <parent>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-connector-prometheus-parent</artifactId>
+        <version>1.0.0-SNAPSHOT</version>
+    </parent>
+
+    <name>Flink : Connectors : Prometheus : AMP request signer</name>
+    <groupId>org.apache.flink.connector.prometheus</groupId>
+    <artifactId>amp-request-signer</artifactId>
+
+    <properties>
+        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+        <flink.version>1.17.0</flink.version>
+    </properties>
+
+    <dependencies>
+
+        <dependency>
+            <groupId>org.apache.flink</groupId>
+            <artifactId>flink-connector-prometheus</artifactId>
+            <version>${project.version}</version>
+            <scope>provided</scope>
+        </dependency>
+
+        <dependency>
+            <groupId>com.amazonaws</groupId>
+            <artifactId>aws-java-sdk-core</artifactId>
+            <version>1.12.570</version>

Review Comment:
   can we delegate this to a bom in parent?



##
prometheus-connector/src/main/java/org/apache/flink/connector/prometheus/sink/errorhandling/SinkWriterErrorHandlingBehaviorConfiguration.java:
##
@@ -0,0 +1,105 @@
+/*
+ *  Licensed to the Apache Software Foundation (ASF) under one or more
+ *  contributor license agreements.  See the NOTICE file distributed with
+ *  this work for additional information regarding copyright ownership.
+ *  The ASF licenses this file to You under the Apache License, Version 2.0
+ *  (the "License"); you may not use this file except in compliance with
+ *  the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+
+package org.apache.flink.connector.prometheus.sink.errorhandling;
+
+import java.io.Serializable;
+import java.util.Optional;
+
+import static 
org.apache.flink.connector.prometheus.sink.errorhandling.OnErrorBehavior.FAIL;
+
+/**
+ * Configure the error-handling behavior of the writer, for different types of 
error. Also defines
+ * default behaviors.
+ */
+public class SinkWriterErrorHandlingBehaviorConfiguration implements 
Serializable {
+
+public static final OnErrorBehavior ON_MAX_RETRY_EXCEEDED_DEFAULT_BEHAVIOR 
= FAIL;
+public static final OnErrorBehavior 
ON_HTTP_CLIENT_IO_FAIL_DEFAULT_BEHAVIOR = FAIL;
+public static final OnErrorBehavior 
ON_PROMETHEUS_NON_RETRIABLE_ERROR_DEFAULT_BEHAVIOR = FAIL;
+
+/** Behaviour when the max retries is exceeded on Prometheus retriable 
errors. */
+private final OnErrorBehavior onMaxRetryExceeded;
+
+/** Behaviour when the HTTP client fails, for an I/O problem. */
+private final OnErrorBehavior onHttpClientIOFail;
+
+/** Behaviour when Prometheus Remote-Write responds with a non-retriable 
error. */
+private final OnErrorBehavior onPrometheusNonRetriableError;
+
+public 

Re: [PR] [FLINK-34344] Pass JobID to CheckpointStatsTracker [flink]

2024-02-02 Thread via GitHub


flinkbot commented on PR #24254:
URL: https://github.com/apache/flink/pull/24254#issuecomment-1924070918

   
   ## CI report:
   
   * 03d4cfb0053a784e0f88677dcf1e8a6e4dc3c9f4 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-34122) Deprecate old serialization config methods and options

2024-02-02 Thread Martijn Visser (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813714#comment-17813714
 ] 

Martijn Visser commented on FLINK-34122:


Whatever works for you! 

> Deprecate old serialization config methods and options
> --
>
> Key: FLINK-34122
> URL: https://issues.apache.org/jira/browse/FLINK-34122
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / Type Serialization System, Runtime / Configuration
>Affects Versions: 1.19.0
>Reporter: Zhanghao Chen
>Assignee: Zhanghao Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34344) Wrong JobID in CheckpointStatsTracker

2024-02-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-34344:
---
Labels: pull-request-available  (was: )

> Wrong JobID in CheckpointStatsTracker
> -
>
> Key: FLINK-34344
> URL: https://issues.apache.org/jira/browse/FLINK-34344
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.19.0
>Reporter: Roman Khachatryan
>Assignee: Roman Khachatryan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>
> The job id is generated randomly:
> ```
> public CheckpointStatsTracker(int numRememberedCheckpoints, MetricGroup 
> metricGroup) {
> this(numRememberedCheckpoints, metricGroup, new JobID(), 
> Integer.MAX_VALUE);
> }
> ```
> This affects how it is logged (or reported elsewhere).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [FLINK-34344] Pass JobID to CheckpointStatsTracker [flink]

2024-02-02 Thread via GitHub


rkhachatryan opened a new pull request, #24254:
URL: https://github.com/apache/flink/pull/24254

   
   
   ## What is the purpose of the change
   
   *(For example: This pull request makes task deployment go through the blob 
server, rather than through RPC. That way we avoid re-transferring them on each 
deployment (during recovery).)*
   
   
   ## Brief change log
   
   *(for example:)*
 - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
 - *Deployments RPC transmits only the blob storage reference*
 - *TaskManagers retrieve the TaskInfo from the blob cache*
   
   
   ## Verifying this change
   
   Please make sure both new and modified tests in this PR follows the 
conventions defined in our code quality guide: 
https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Added test that validates that TaskInfo is transferred only once across 
recoveries*
 - *Manually verified the change by running a 4 node cluster with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
 - The serializers: (yes / no / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
 - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-21949][table] Support ARRAY_AGG aggregate function [flink]

2024-02-02 Thread via GitHub


snuyanzin commented on PR #23411:
URL: https://github.com/apache/flink/pull/23411#issuecomment-1924052949

   I think we could wait until Monday or even longer, since there is a feature 
freeze right now and we need to wait for the release branch to be cut.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-34344) Wrong JobID in CheckpointStatsTracker

2024-02-02 Thread Roman Khachatryan (Jira)
Roman Khachatryan created FLINK-34344:
-

 Summary: Wrong JobID in CheckpointStatsTracker
 Key: FLINK-34344
 URL: https://issues.apache.org/jira/browse/FLINK-34344
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing
Affects Versions: 1.19.0
Reporter: Roman Khachatryan
Assignee: Roman Khachatryan
 Fix For: 1.19.0


The job id is generated randomly:
```
public CheckpointStatsTracker(int numRememberedCheckpoints, MetricGroup 
metricGroup) {
this(numRememberedCheckpoints, metricGroup, new JobID(), 
Integer.MAX_VALUE);
}
```
This affects how it is logged (or reported elsewhere).
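Per the PR title ("Pass JobID to CheckpointStatsTracker"), the natural fix is to thread the job's actual ID through; a hedged sketch of such a constructor (illustrative, not necessarily the exact patch):
```
// Sketch: callers hand in the real JobID of the job being tracked,
// so logs and reports carry the correct ID.
public CheckpointStatsTracker(int numRememberedCheckpoints, MetricGroup metricGroup, JobID jobID) {
    this(numRememberedCheckpoints, metricGroup, jobID, Integer.MAX_VALUE);
}
```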



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34295) Release Testing Instructions: Verify FLINK-33712 Deprecate RuntimeContext#getExecutionConfig

2024-02-02 Thread lincoln lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lincoln lee closed FLINK-34295.
---
Resolution: Fixed

[~JunRuiLi] Thanks for confirming!

> Release Testing Instructions: Verify FLINK-33712 Deprecate 
> RuntimeContext#getExecutionConfig
> 
>
> Key: FLINK-34295
> URL: https://issues.apache.org/jira/browse/FLINK-34295
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Junrui Li
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (FLINK-34296) Release Testing Instructions: Verify FLINK-33581 Deprecate configuration getters/setters that return/set complex Java objects

2024-02-02 Thread lincoln lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lincoln lee closed FLINK-34296.
---
Resolution: Fixed

> Release Testing Instructions: Verify FLINK-33581 Deprecate configuration 
> getters/setters that return/set complex Java objects
> -
>
> Key: FLINK-34296
> URL: https://issues.apache.org/jira/browse/FLINK-34296
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Junrui Li
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34296) Release Testing Instructions: Verify FLINK-33581 Deprecate configuration getters/setters that return/set complex Java objects

2024-02-02 Thread lincoln lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813709#comment-17813709
 ] 

lincoln lee commented on FLINK-34296:
-

[~JunRuiLi] Thanks for confirming, will close it.

> Release Testing Instructions: Verify FLINK-33581 Deprecate configuration 
> getters/setters that return/set complex Java objects
> -
>
> Key: FLINK-34296
> URL: https://issues.apache.org/jira/browse/FLINK-34296
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Configuration
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Junrui Li
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34299) Release Testing Instructions: Verify FLINK-33203 Adding a separate configuration for specifying Java Options of the SQL Gateway

2024-02-02 Thread lincoln lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813708#comment-17813708
 ] 

lincoln lee commented on FLINK-34299:
-

[~guoyangze] Thanks for confirming!

> Release Testing Instructions: Verify FLINK-33203 Adding a separate 
> configuration for specifying Java Options of the SQL Gateway 
> 
>
> Key: FLINK-34299
> URL: https://issues.apache.org/jira/browse/FLINK-34299
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Gateway
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: Yangze Guo
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-24239] Event time temporal join should support values from array, map, row, etc. as join key [flink]

2024-02-02 Thread via GitHub


flinkbot commented on PR #24253:
URL: https://github.com/apache/flink/pull/24253#issuecomment-1924021657

   
   ## CI report:
   
   * e06aee059e610ce72fb153eff7c8f466cde82bfa UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [FLINK-24239] Event time temporal join should support values from array, map, row, etc. as join key [flink]

2024-02-02 Thread via GitHub


dawidwys opened a new pull request, #24253:
URL: https://github.com/apache/flink/pull/24253

   ## What is the purpose of the change
   
   This makes it possible to use more complex keys for a temporal join. It 
pushes join predicates down before extracting join keys, which lets more cases 
produce the equi-join condition required for a temporal join.
   
   
   ## Verifying this change
   
   Added tests in `TemporalJoinRestoreTest` covering keys from:
   * a nested row
   * value from a map
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't 
know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-24239) Event time temporal join should support values from array, map, row, etc. as join key

2024-02-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-24239:
---
Labels: pull-request-available  (was: )

> Event time temporal join should support values from array, map, row, etc. as 
> join key
> -
>
> Key: FLINK-24239
> URL: https://issues.apache.org/jira/browse/FLINK-24239
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Planner
>Affects Versions: 1.12.6, 1.13.3, 1.14.1, 1.15.0
>Reporter: Caizhi Weng
>Assignee: Dawid Wysakowicz
>Priority: Major
>  Labels: pull-request-available
>
> This ticket is from the [mailing 
> list|https://lists.apache.org/thread.html/r90cab9c5026e527357d58db70d7e9b5875e57b942738f032bd54bfd3%40%3Cuser-zh.flink.apache.org%3E].
> Currently in event time temporal join when join keys are from an array, map 
> or row, an exception will be thrown saying "Currently the join key in 
> Temporal Table Join can not be empty". This is quite confusing for users as 
> they've already set the join keys.
> Add the following test case to {{TableEnvironmentITCase}} to reproduce this 
> issue.
> {code:scala}
> @Test
> def myTest(): Unit = {
>   tEnv.executeSql(
> """
>   |CREATE TABLE A (
>   |  a MAP<STRING, INT>,
>   |  ts TIMESTAMP(3),
>   |  WATERMARK FOR ts AS ts
>   |) WITH (
>   |  'connector' = 'values'
>   |)
>   |""".stripMargin)
>   tEnv.executeSql(
> """
>   |CREATE TABLE B (
>   |  id INT,
>   |  ts TIMESTAMP(3),
>   |  WATERMARK FOR ts AS ts
>   |) WITH (
>   |  'connector' = 'values'
>   |)
>   |""".stripMargin)
>   tEnv.executeSql("SELECT * FROM A LEFT JOIN B FOR SYSTEM_TIME AS OF A.ts AS 
> b ON A.a['ID'] = id").print()
> }
> {code}
> The exception stack is
> {code:java}
> org.apache.flink.table.api.ValidationException: Currently the join key in 
> Temporal Table Join can not be empty.
>   at 
> org.apache.flink.table.planner.plan.rules.logical.LogicalCorrelateToJoinFromGeneralTemporalTableRule.onMatch(LogicalCorrelateToJoinFromTemporalTableRule.scala:272)
>   at 
> org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333)
>   at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243)
>   at 
> org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkHepProgram.optimize(FlinkHepProgram.scala:69)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkHepRuleSetProgram.optimize(FlinkHepRuleSetProgram.scala:87)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1$$anonfun$apply$1.apply(FlinkGroupProgram.scala:63)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1$$anonfun$apply$1.apply(FlinkGroupProgram.scala:60)
>   at 
> scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
>   at 
> scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at 
> scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
>   at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1.apply(FlinkGroupProgram.scala:60)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1.apply(FlinkGroupProgram.scala:55)
>   at 
> scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
>   at 
> scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
>   at scala.collection.immutable.Range.foreach(Range.scala:160)
>   at 
> scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
>   at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104)
>   at 
> 

Re: [PR] [FLINK-21949][table] Support ARRAY_AGG aggregate function [flink]

2024-02-02 Thread via GitHub


dawidwys commented on PR #23411:
URL: https://github.com/apache/flink/pull/23411#issuecomment-1924010094

   Sorry it's taking such a long time on my side; I was on vacation in the 
meantime. I'll try to check it on Monday. Nevertheless, if you're comfortable 
with the PR, @snuyanzin, feel free to merge it without waiting for my review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34007][k8s] Add graceful shutdown to KubernetesLeaderElector [flink]

2024-02-02 Thread via GitHub


flinkbot commented on PR #24252:
URL: https://github.com/apache/flink/pull/24252#issuecomment-1923919766

   
   ## CI report:
   
   * 4d84982a1376edabc04108045e4cbd32eb9e59da UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [FLINK-34007][k8s] Add graceful shutdown to KubernetesLeaderElector [flink]

2024-02-02 Thread via GitHub


XComp opened a new pull request, #24252:
URL: https://github.com/apache/flink/pull/24252

   ## What is the purpose of the change
   
   There was a test failure in [this 
build](https://github.com/XComp/flink/actions/runs/7745486791/job/21121947504#step:14:7309)
 which was testing FLINK-34333 (the 1.18 backport of FLINK-34007). 
`KubernetesLeaderElectorITCase.testMultipleKubernetesLeaderElectors` failed 
with the following error:
   ```
   18:38:52,290 [KubernetesLeaderElector-ExecutorService-thread-1] ERROR 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - 
Exception occurred while releasing lock 'ConfigMapLock: default - 
kubernetesleaderelectoritcase-configmap-2e3f2dee-c41e-4f6f-830e-426018db34
   a7 (406ed8e6-5cc6-4777-81b3-2ca706b83d72)' on cancel
   io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get]  
for kind: [ConfigMap]  with name: 
[kubernetesleaderelectoritcase-configmap-2e3f2dee-c41e-4f6f-830e-426018db34a7]  
in namespace: [default]  failed.
   at 
io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:159)
 ~[kubernetes-client-api-6.9.2.jar:?]
   at 
io.fabric8.kubernetes.client.dsl.internal.BaseOperation.requireFromServer(BaseOperation.java:194)
 ~[kubernetes-client-6.9.2.jar:?]
   at 
io.fabric8.kubernetes.client.dsl.internal.BaseOperation.get(BaseOperation.java:148)
 ~[kubernetes-client-6.9.2.jar:?]
   at 
io.fabric8.kubernetes.client.dsl.internal.BaseOperation.get(BaseOperation.java:97)
 ~[kubernetes-client-6.9.2.jar:?]
   at 
io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.ResourceLock.get(ResourceLock.java:49)
 ~[kubernetes-client-api-6.9.2.jar:?]
   at 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.release(LeaderElector.java:148)
 ~[kubernetes-client-api-6.9.2.jar:?]
   at 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.stopLeading(LeaderElector.java:126)
 ~[kubernetes-client-api-6.9.2.jar:?]
   at 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$null$1(LeaderElector.java:96)
 ~[kubernetes-client-api-6.9.2.jar:?]
   at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
 ~[?:1.8.0_402]
   at 
java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:792)
 ~[?:1.8.0_402]
   at 
java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2153)
 ~[?:1.8.0_402]
   at 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$start$2(LeaderElector.java:95)
 ~[kubernetes-client-api-6.9.2.jar:?]
   at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
 ~[?:1.8.0_402]
   at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
 ~[?:1.8.0_402]
   at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) 
~[?:1.8.0_402]
   at 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) 
~[?:1.8.0_402]
   at 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$acquire$4(LeaderElector.java:173)
 ~[kubernetes-client-api-6.9.2.jar:?]
   at 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$loop$8(LeaderElector.java:282)
 ~[kubernetes-client-api-6.9.2.jar:?]
   at 
java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
 [?:1.8.0_402]
   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_402]
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_402]
   at java.lang.Thread.run(Thread.java:750) [?:1.8.0_402]
   Caused by: java.io.InterruptedIOException
   at 
io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:494)
 ~[kubernetes-client-6.9.2.jar:?]
   at 
io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:524)
 ~[kubernetes-client-6.9.2.jar:?]
   at 
io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleGet(OperationSupport.java:467)
 ~[kubernetes-client-6.9.2.jar:?]
   at 
io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleGet(BaseOperation.java:791)
 ~[kubernetes-client-6.9.2.jar:?]
   at 
io.fabric8.kubernetes.client.dsl.internal.BaseOperation.requireFromServer(BaseOperation.java:192)
 ~[kubernetes-client-6.9.2.jar:?]
   ... 20 more
   Caused by: java.lang.InterruptedException
   at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:347) 
~[?:1.8.0_402]
   at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) 
~[?:1.8.0_402]
   at 
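The shape of such a graceful shutdown, sketched with illustrative names (not the actual Flink code): cancel leadership first, then drain the elector's executor so the lock-release callback seen in the stack trace is not interrupted mid-flight.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

public final class GracefulElectorShutdownSketch {

    private final CompletableFuture<?> leaderElectionFuture;
    private final ExecutorService electorExecutor;

    public GracefulElectorShutdownSketch(
            CompletableFuture<?> leaderElectionFuture, ExecutorService electorExecutor) {
        this.leaderElectionFuture = leaderElectionFuture;
        this.electorExecutor = electorExecutor;
    }

    public void shutdownGracefully() throws InterruptedException {
        // Ask the elector to stop leading; fabric8 releases the lock
        // as part of handling the cancellation.
        leaderElectionFuture.cancel(false);
        // Stop accepting new tasks, but let the in-flight release finish ...
        electorExecutor.shutdown();
        if (!electorExecutor.awaitTermination(10, TimeUnit.SECONDS)) {
            // ... and interrupt only as a last resort.
            electorExecutor.shutdownNow();
        }
    }
}
```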

Re: [PR] [FLINK-34217][docs] Update user doc for type serialization with FLIP-398 [flink]

2024-02-02 Thread via GitHub


X-czh commented on PR #24230:
URL: https://github.com/apache/flink/pull/24230#issuecomment-1923775660

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (FLINK-24239) Event time temporal join should support values from array, map, row, etc. as join key

2024-02-02 Thread Dawid Wysakowicz (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-24239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz reassigned FLINK-24239:


Assignee: Dawid Wysakowicz

> Event time temporal join should support values from array, map, row, etc. as 
> join key
> -
>
> Key: FLINK-24239
> URL: https://issues.apache.org/jira/browse/FLINK-24239
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Planner
>Affects Versions: 1.12.6, 1.13.3, 1.14.1, 1.15.0
>Reporter: Caizhi Weng
>Assignee: Dawid Wysakowicz
>Priority: Major
>
> This ticket is from the [mailing 
> list|https://lists.apache.org/thread.html/r90cab9c5026e527357d58db70d7e9b5875e57b942738f032bd54bfd3%40%3Cuser-zh.flink.apache.org%3E].
> Currently in event time temporal join when join keys are from an array, map 
> or row, an exception will be thrown saying "Currently the join key in 
> Temporal Table Join can not be empty". This is quite confusing for users as 
> they've already set the join keys.
> Add the following test case to {{TableEnvironmentITCase}} to reproduce this 
> issue.
> {code:scala}
> @Test
> def myTest(): Unit = {
>   tEnv.executeSql(
> """
>   |CREATE TABLE A (
>   |  a MAP<STRING, INT>,
>   |  ts TIMESTAMP(3),
>   |  WATERMARK FOR ts AS ts
>   |) WITH (
>   |  'connector' = 'values'
>   |)
>   |""".stripMargin)
>   tEnv.executeSql(
> """
>   |CREATE TABLE B (
>   |  id INT,
>   |  ts TIMESTAMP(3),
>   |  WATERMARK FOR ts AS ts
>   |) WITH (
>   |  'connector' = 'values'
>   |)
>   |""".stripMargin)
>   tEnv.executeSql("SELECT * FROM A LEFT JOIN B FOR SYSTEM_TIME AS OF A.ts AS 
> b ON A.a['ID'] = id").print()
> }
> {code}
> The exception stack is
> {code:java}
> org.apache.flink.table.api.ValidationException: Currently the join key in 
> Temporal Table Join can not be empty.
>   at 
> org.apache.flink.table.planner.plan.rules.logical.LogicalCorrelateToJoinFromGeneralTemporalTableRule.onMatch(LogicalCorrelateToJoinFromTemporalTableRule.scala:272)
>   at 
> org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333)
>   at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243)
>   at 
> org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkHepProgram.optimize(FlinkHepProgram.scala:69)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkHepRuleSetProgram.optimize(FlinkHepRuleSetProgram.scala:87)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1$$anonfun$apply$1.apply(FlinkGroupProgram.scala:63)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1$$anonfun$apply$1.apply(FlinkGroupProgram.scala:60)
>   at 
> scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
>   at 
> scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at 
> scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
>   at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1.apply(FlinkGroupProgram.scala:60)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram$$anonfun$optimize$1.apply(FlinkGroupProgram.scala:55)
>   at 
> scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
>   at 
> scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
>   at scala.collection.immutable.Range.foreach(Range.scala:160)
>   at 
> scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
>   at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram.optimize(FlinkGroupProgram.scala:55)
>  

[jira] [Commented] (FLINK-33734) Merge unaligned checkpoint state handle

2024-02-02 Thread Feifan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813646#comment-17813646
 ] 

Feifan Wang commented on FLINK-33734:
-

I found that merging handles alone was not enough. Although the metadata was 
obviously smaller, it still took a long time for the jobmanager to process the 
RPC and serialize the metadata. I speculate the reason is that although the 
merged handle avoids duplicate file paths, the number of objects in the 
metadata is not significantly reduced (mainly channel infos). So I tried 
serializing the channel infos directly on the taskmanager side, and the test 
results showed that this worked well. Below are the results of my test:
{code:java}
# control group :
checkpoint time         (    s ) --- avg: 82.73     max: 155.88    min: 49.90
store metadata time     (    s ) --- avg: 23.45     max: 47.60     min: 9.61
metadata size           (   MB ) --- avg: 696.55    max: 954.98    min: 461.35
store metadata speed    ( MB/s ) --- avg: 32.41     max: 61.82     min: 18.29

# only merge handles :
checkpoint time         (    s ) --- avg: 66.22     max: 123.12    min: 38.68
store metadata time     (    s ) --- avg: 12.76     max: 26.18     min: 4.02
metadata size           (   MB ) --- avg: 269.14    max: 394.26    min: 159.86
store metadata speed    ( MB/s ) --- avg: 23.93     max: 46.55     min: 11.33

# not only merge handles, but also serialize channel infos on TaskManager :
checkpoint time         (    s ) --- avg: 30.63     max: 74.27     min: 5.16
store metadata time     (    s ) --- avg: 0.87      max: 11.23     min: 0.12
metadata size           (   MB ) --- avg: 232.22    max: 392.86    min: 45.34
store metadata speed    ( MB/s ) --- avg: 291.00    max: 386.80    min: 23.18{code}
Based on the results of the above test, I think serializing channel infos on 
the taskmanger side should be done together. I submitted a PR to implement this 
solution, please have a look [~pnowojski] ,[~fanrui] ,[~roman] , [~Zakelly] .

> Merge unaligned checkpoint state handle
> ---
>
> Key: FLINK-33734
> URL: https://issues.apache.org/jira/browse/FLINK-33734
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Reporter: Feifan Wang
>Assignee: Feifan Wang
>Priority: Major
>  Labels: pull-request-available
>
> h3. Background
> Unaligned checkpoint will write the inflight-data of all InputChannel and 
> ResultSubpartition of the same subtask to the same file during checkpoint. 
> The InputChannelStateHandle and ResultSubpartitionStateHandle organize the 
> metadata of inflight-data at the channel granularity, which causes the file 
> name to be repeated many times. When a job is under backpressure and task 
> parallelism is high, the metadata of unaligned checkpoints will bloat. This 
> will result in:
>  # The amount of data reported by taskmanager to jobmanager increases, and 
> jobmanager takes longer to process these RPC requests.
>  # The metadata of the entire checkpoint becomes very large, and it takes 
> longer to serialize and write it to dfs.
> Both of the above points ultimately lead to longer checkpoint duration.
> h3. A Production example
> Take our production job with a parallelism of 4800 as an example:
>  # When there is no back pressure, checkpoint end-to-end duration is within 7 
> seconds.
>  # When under pressure: checkpoint end-to-end duration often exceeds 1 
> minute. We found that jobmanager took more than 40 seconds to process rpc 
> requests, and serialized metadata took more than 20 seconds.Some checkpoint 
> statistics:
> |metadata file size|950 MB|
> |channel state count|12,229,854|
> |channel file count|5536|
> Of the 950MB in the metadata file, 68% are redundant file paths.
> We enabled log-based checkpoint on this job and hoped that the checkpoint 
> could be completed within 30 seconds. This problem made it difficult to 
> achieve this goal.
> h3. Propose changes
> I suggest introducing MergedInputChannelStateHandle and 
> MergedResultSubpartitionStateHandle to eliminate redundant file paths.
> The taskmanager merges all InputChannelStateHandles with the same delegated 
> StreamStateHandle in the same subtask into one MergedInputChannelStateHandle 
> before reporting. When recovering from checkpoint, jobmangager converts 
> MergedInputChannelStateHandle to InputChannelStateHandle collection before 
> assigning state handle, and the rest of the process does not need to be 
> changed. 
> Structure of MergedInputChannelStateHandle :
>  
> 
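As a purely hypothetical illustration of the idea (the actual proposed structure may differ): all channels of a subtask share one delegate handle, one file path, and keep only their per-channel offsets.

{code:java}
import java.util.List;

import org.apache.flink.runtime.state.StreamStateHandle;

/**
 * Hypothetical sketch, not the actual proposal: all channels of a subtask
 * share one delegate handle (one file path) and keep only their offsets.
 */
public class MergedInputChannelStateHandleSketch {

    /** The single underlying checkpoint file shared by all channels. */
    private final StreamStateHandle delegate;

    /** One entry per input channel, instead of one full handle each. */
    private final List<ChannelOffsets> channels;

    public MergedInputChannelStateHandleSketch(
            StreamStateHandle delegate, List<ChannelOffsets> channels) {
        this.delegate = delegate;
        this.channels = channels;
    }

    /** Offsets of one channel's buffers inside the shared file. */
    public static final class ChannelOffsets {
        final int gateIndex;
        final int channelIndex;
        final long[] offsets;

        public ChannelOffsets(int gateIndex, int channelIndex, long[] offsets) {
            this.gateIndex = gateIndex;
            this.channelIndex = channelIndex;
            this.offsets = offsets;
        }
    }
}
{code}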

Re: [PR] [FLINK-25421] Port JDBC Sink to new Unified Sink API (FLIP-143) [flink-connector-jdbc]

2024-02-02 Thread via GitHub


eskabetxe commented on PR #2:
URL: 
https://github.com/apache/flink-connector-jdbc/pull/2#issuecomment-1923662905

   @snuyanzin @MartijnVisser 
   I already changed a test to mock Sink.InitContext, since 1.19 adds a method 
with a new class.
   But now the TIMESTAMP_LTZ tests are failing for all dialects: they pass on 
1.19 but fail on 1.18. Is this a new feature in 1.19? How should we proceed 
with this? Remove the failing test?
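   For reference, a minimal sketch of the mocking approach (illustrative; the 
actual test change may look different):

   ```java
   import org.apache.flink.api.connector.sink2.Sink;
   import org.junit.jupiter.api.Test;

   import static org.mockito.Mockito.mock;

   class InitContextMockSketch {

       @Test
       void mockCompilesAgainstBothFlinkVersions() {
           // Mockito stubs every method of the interface, including methods
           // added in 1.19, so the same source builds against 1.18 and 1.19.
           Sink.InitContext context = mock(Sink.InitContext.class);
           // ... hand `context` to the SinkWriter under test ...
       }
   }
   ```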


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-34343) ResourceManager registration is not completed when registering the JobMaster

2024-02-02 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813629#comment-17813629
 ] 

Matthias Pohl commented on FLINK-34343:
---

[~chesnay] do you have capacity to look at that one? I'm still investigating 
the FLINK-34007 test instability.

> ResourceManager registration is not completed when registering the JobMaster
> 
>
> Key: FLINK-34343
> URL: https://issues.apache.org/jira/browse/FLINK-34343
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Runtime / RPC
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Priority: Blocker
>  Labels: test-stability
> Attachments: FLINK-34343_k8s_application_cluster_e2e_test.log
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57203=logs=64debf87-ecdb-5aef-788d-8720d341b5cb=2302fb98-0839-5df2-3354-bbae636f81a7=8066
> The test run failed due to a NullPointerException:
> {code}
> Feb 02 01:11:55 2024-02-02 01:11:47,791 INFO  
> org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor   [] - The rpc 
> endpoint 
> org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager has not 
> been started yet. Discarding message LocalFencedMessage(000
> 0, 
> LocalRpcInvocation(ResourceManagerGateway.registerJobMaster(JobMasterId, 
> ResourceID, String, JobID, Time))) until processing is started.
> Feb 02 01:11:55 2024-02-02 01:11:47,797 WARN  
> org.apache.flink.runtime.rpc.pekko.SupervisorActor   [] - RpcActor 
> pekko://flink/user/rpc/resourcemanager_2 has failed. Shutting it down now.
> Feb 02 01:11:55 java.lang.NullPointerException: Cannot invoke 
> "org.apache.flink.runtime.rpc.RpcServer.getAddress()" because 
> "this.rpcServer" is null
> Feb 02 01:11:55 at 
> org.apache.flink.runtime.rpc.RpcEndpoint.getAddress(RpcEndpoint.java:322) 
> ~[flink-dist-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleMessage(PekkoRpcActor.java:182)
>  ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction.applyOrElse(PartialFunction.scala:127) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction.applyOrElse$(PartialFunction.scala:126) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:29)
>  ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:545) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.AbstractActor.aroundReceive(AbstractActor.scala:229) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.ActorCell.receiveMessage(ActorCell.scala:590) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.ActorCell.invoke(ActorCell.scala:557) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.dispatch.Mailbox.processMailbox(Mailbox.scala:280) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.dispatch.Mailbox.run(Mailbox.scala:241) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> 

[jira] [Commented] (FLINK-34343) ResourceManager registration is not completed when registering the JobMaster

2024-02-02 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813627#comment-17813627
 ] 

Matthias Pohl commented on FLINK-34343:
---

I attached the logs of the corresponding failure to this issue.

> ResourceManager registration is not completed when registering the JobMaster
> 
>
> Key: FLINK-34343
> URL: https://issues.apache.org/jira/browse/FLINK-34343
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Runtime / RPC
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Priority: Blocker
>  Labels: test-stability
> Attachments: FLINK-34343_k8s_application_cluster_e2e_test.log
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57203=logs=64debf87-ecdb-5aef-788d-8720d341b5cb=2302fb98-0839-5df2-3354-bbae636f81a7=8066
> The test run failed due to a NullPointerException:
> {code}
> Feb 02 01:11:55 2024-02-02 01:11:47,791 INFO  
> org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor   [] - The rpc 
> endpoint 
> org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager has not 
> been started yet. Discarding message LocalFencedMessage(000
> 0, 
> LocalRpcInvocation(ResourceManagerGateway.registerJobMaster(JobMasterId, 
> ResourceID, String, JobID, Time))) until processing is started.
> Feb 02 01:11:55 2024-02-02 01:11:47,797 WARN  
> org.apache.flink.runtime.rpc.pekko.SupervisorActor   [] - RpcActor 
> pekko://flink/user/rpc/resourcemanager_2 has failed. Shutting it down now.
> Feb 02 01:11:55 java.lang.NullPointerException: Cannot invoke 
> "org.apache.flink.runtime.rpc.RpcServer.getAddress()" because 
> "this.rpcServer" is null
> Feb 02 01:11:55 at 
> org.apache.flink.runtime.rpc.RpcEndpoint.getAddress(RpcEndpoint.java:322) 
> ~[flink-dist-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleMessage(PekkoRpcActor.java:182)
>  ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction.applyOrElse(PartialFunction.scala:127) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction.applyOrElse$(PartialFunction.scala:126) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:29)
>  ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:545) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.AbstractActor.aroundReceive(AbstractActor.scala:229) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.ActorCell.receiveMessage(ActorCell.scala:590) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.ActorCell.invoke(ActorCell.scala:557) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.dispatch.Mailbox.processMailbox(Mailbox.scala:280) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.dispatch.Mailbox.run(Mailbox.scala:241) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.dispatch.Mailbox.exec(Mailbox.scala:253) 

[jira] [Updated] (FLINK-34343) ResourceManager registration is not completed when registering the JobMaster

2024-02-02 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-34343:
--
Attachment: FLINK-34343_k8s_application_cluster_e2e_test.log

> ResourceManager registration is not completed when registering the JobMaster
> 
>
> Key: FLINK-34343
> URL: https://issues.apache.org/jira/browse/FLINK-34343
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Runtime / RPC
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Priority: Blocker
>  Labels: test-stability
> Attachments: FLINK-34343_k8s_application_cluster_e2e_test.log
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57203=logs=64debf87-ecdb-5aef-788d-8720d341b5cb=2302fb98-0839-5df2-3354-bbae636f81a7=8066
> The test run failed due to a NullPointerException:
> {code}
> Feb 02 01:11:55 2024-02-02 01:11:47,791 INFO  
> org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor   [] - The rpc 
> endpoint 
> org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager has not 
> been started yet. Discarding message LocalFencedMessage(000
> 0, 
> LocalRpcInvocation(ResourceManagerGateway.registerJobMaster(JobMasterId, 
> ResourceID, String, JobID, Time))) until processing is started.
> Feb 02 01:11:55 2024-02-02 01:11:47,797 WARN  
> org.apache.flink.runtime.rpc.pekko.SupervisorActor   [] - RpcActor 
> pekko://flink/user/rpc/resourcemanager_2 has failed. Shutting it down now.
> Feb 02 01:11:55 java.lang.NullPointerException: Cannot invoke 
> "org.apache.flink.runtime.rpc.RpcServer.getAddress()" because 
> "this.rpcServer" is null
> Feb 02 01:11:55 at 
> org.apache.flink.runtime.rpc.RpcEndpoint.getAddress(RpcEndpoint.java:322) 
> ~[flink-dist-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleMessage(PekkoRpcActor.java:182)
>  ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction.applyOrElse(PartialFunction.scala:127) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction.applyOrElse$(PartialFunction.scala:126) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:29)
>  ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:545) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.AbstractActor.aroundReceive(AbstractActor.scala:229) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.ActorCell.receiveMessage(ActorCell.scala:590) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.ActorCell.invoke(ActorCell.scala:557) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.dispatch.Mailbox.processMailbox(Mailbox.scala:280) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.dispatch.Mailbox.run(Mailbox.scala:241) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.dispatch.Mailbox.exec(Mailbox.scala:253) 
> 

[jira] [Commented] (FLINK-34007) Flink Job stuck in suspend state after losing leadership in HA Mode

2024-02-02 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813626#comment-17813626
 ] 

Matthias Pohl commented on FLINK-34007:
---

The Azure test failure seems to be unrelated. I created FLINK-34343 to cover 
this one separately. The GitHub Actions failures are still concerning, though.

> Flink Job stuck in suspend state after losing leadership in HA Mode
> ---
>
> Key: FLINK-34007
> URL: https://issues.apache.org/jira/browse/FLINK-34007
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0, 1.18.1, 1.18.2
>Reporter: Zhenqiu Huang
>Assignee: Matthias Pohl
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.19.0
>
> Attachments: Debug.log, LeaderElector-Debug.json, job-manager.log
>
>
> The observation is that Job manager goes to suspend state with a failed 
> container not able to register itself to resource manager after timeout.
> JM Log, see attached
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34343) ResourceManager registration is not completed when registering the JobMaster

2024-02-02 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813624#comment-17813624
 ] 

Matthias Pohl commented on FLINK-34343:
---

Initially, I thought that it would be caused by the FLINK-34007 changes. But 
that's not the case. The "Run kubernetes application test" doesn't rely on HA 
but uses {{StandaloneLeaderElection}} (sessionId: 
{{}}).

> ResourceManager registration is not completed when registering the JobMaster
> 
>
> Key: FLINK-34343
> URL: https://issues.apache.org/jira/browse/FLINK-34343
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Runtime / RPC
>Affects Versions: 1.19.0
>Reporter: Matthias Pohl
>Priority: Blocker
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57203=logs=64debf87-ecdb-5aef-788d-8720d341b5cb=2302fb98-0839-5df2-3354-bbae636f81a7=8066
> The test run failed due to a NullPointerException:
> {code}
> Feb 02 01:11:55 2024-02-02 01:11:47,791 INFO  
> org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor   [] - The rpc 
> endpoint 
> org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager has not 
> been started yet. Discarding message LocalFencedMessage(000
> 0, 
> LocalRpcInvocation(ResourceManagerGateway.registerJobMaster(JobMasterId, 
> ResourceID, String, JobID, Time))) until processing is started.
> Feb 02 01:11:55 2024-02-02 01:11:47,797 WARN  
> org.apache.flink.runtime.rpc.pekko.SupervisorActor   [] - RpcActor 
> pekko://flink/user/rpc/resourcemanager_2 has failed. Shutting it down now.
> Feb 02 01:11:55 java.lang.NullPointerException: Cannot invoke 
> "org.apache.flink.runtime.rpc.RpcServer.getAddress()" because 
> "this.rpcServer" is null
> Feb 02 01:11:55 at 
> org.apache.flink.runtime.rpc.RpcEndpoint.getAddress(RpcEndpoint.java:322) 
> ~[flink-dist-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleMessage(PekkoRpcActor.java:182)
>  ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction.applyOrElse(PartialFunction.scala:127) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction.applyOrElse$(PartialFunction.scala:126) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:29)
>  ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:545) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.AbstractActor.aroundReceive(AbstractActor.scala:229) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.ActorCell.receiveMessage(ActorCell.scala:590) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.actor.ActorCell.invoke(ActorCell.scala:557) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.dispatch.Mailbox.processMailbox(Mailbox.scala:280) 
> ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
> Feb 02 01:11:55 at 
> org.apache.pekko.dispatch.Mailbox.run(Mailbox.scala:241) 
> 

[jira] [Created] (FLINK-34343) ResourceManager registration is not completed when registering the JobMaster

2024-02-02 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-34343:
-

 Summary: ResourceManager registration is not completed when 
registering the JobMaster
 Key: FLINK-34343
 URL: https://issues.apache.org/jira/browse/FLINK-34343
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination, Runtime / RPC
Affects Versions: 1.19.0
Reporter: Matthias Pohl


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57203=logs=64debf87-ecdb-5aef-788d-8720d341b5cb=2302fb98-0839-5df2-3354-bbae636f81a7=8066

The test run failed due to a NullPointerException:
{code}
Feb 02 01:11:55 2024-02-02 01:11:47,791 INFO  
org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor   [] - The rpc 
endpoint org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager 
has not been started yet. Discarding message 
LocalFencedMessage(000
0, 
LocalRpcInvocation(ResourceManagerGateway.registerJobMaster(JobMasterId, 
ResourceID, String, JobID, Time))) until processing is started.
Feb 02 01:11:55 2024-02-02 01:11:47,797 WARN  
org.apache.flink.runtime.rpc.pekko.SupervisorActor   [] - RpcActor 
pekko://flink/user/rpc/resourcemanager_2 has failed. Shutting it down now.
Feb 02 01:11:55 java.lang.NullPointerException: Cannot invoke 
"org.apache.flink.runtime.rpc.RpcServer.getAddress()" because "this.rpcServer" 
is null
Feb 02 01:11:55 at 
org.apache.flink.runtime.rpc.RpcEndpoint.getAddress(RpcEndpoint.java:322) 
~[flink-dist-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleMessage(PekkoRpcActor.java:182)
 ~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
scala.PartialFunction.applyOrElse(PartialFunction.scala:127) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
scala.PartialFunction.applyOrElse$(PartialFunction.scala:126) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
org.apache.pekko.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:29) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:545) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
org.apache.pekko.actor.AbstractActor.aroundReceive(AbstractActor.scala:229) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
org.apache.pekko.actor.ActorCell.receiveMessage(ActorCell.scala:590) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
org.apache.pekko.actor.ActorCell.invoke(ActorCell.scala:557) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
org.apache.pekko.dispatch.Mailbox.processMailbox(Mailbox.scala:280) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
org.apache.pekko.dispatch.Mailbox.run(Mailbox.scala:241) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at 
org.apache.pekko.dispatch.Mailbox.exec(Mailbox.scala:253) 
~[flink-rpc-akka06a9bb81-2e68-483a-b236-a283d0b1d097.jar:1.19-SNAPSHOT]
Feb 02 01:11:55 at java.util.concurrent.ForkJoinTask.doExec(Unknown 
Source) ~[?:?]
Feb 02 01:11:55 at 
java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source) ~[?:?]
Feb 02 01:11:55 at java.util.concurrent.ForkJoinPool.scan(Unknown 
Source) ~[?:?]
Feb 02 01:11:55 at java.util.concurrent.ForkJoinPool.runWorker(Unknown 
Source) ~[?:?]
Feb 02 01:11:55 at 
java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source) ~[?:?]
{code}



--
This message 
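
For context on the failure mode above, a minimal, self-contained sketch of how 
getAddress() can throw once a message reaches an endpoint whose rpcServer field 
was never assigned. The classes below are simplified stand-ins for 
illustration, not Flink's actual implementation:

{code}
// Simplified stand-ins for illustration only; NOT Flink's real RpcServer/RpcEndpoint.
final class RpcServer {
    String getAddress() {
        return "pekko://flink/user/rpc/resourcemanager_2";
    }
}

class RpcEndpoint {
    // Remains null until start() runs; the actor may still route messages here.
    private RpcServer rpcServer;

    void start() {
        this.rpcServer = new RpcServer();
    }

    String getAddress() {
        // Mirrors the failing call in the stack trace: NPE when start() never ran.
        return rpcServer.getAddress();
    }
}

public class NpeIllustration {
    public static void main(String[] args) {
        RpcEndpoint endpoint = new RpcEndpoint();
        // start() is intentionally skipped, as in the discarded-message scenario:
        endpoint.getAddress(); // throws NullPointerException: rpcServer is null
    }
}
{code}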

Re: [PR] [FLINK-34247][doc] Document FLIP-366: Support standard YAML for FLINK configuration [flink]

2024-02-02 Thread via GitHub


flinkbot commented on PR #24251:
URL: https://github.com/apache/flink/pull/24251#issuecomment-1923581075

   
   ## CI report:
   
   * e2f376490d7922436989cf89a66269f2cd73272c UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-34247) Document FLIP-366: Support standard YAML for FLINK configuration

2024-02-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-34247:
---
Labels: pull-request-available  (was: )

> Document FLIP-366: Support standard YAML for FLINK configuration
> 
>
> Key: FLINK-34247
> URL: https://issues.apache.org/jira/browse/FLINK-34247
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Junrui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.19.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [FLINK-34247][doc] Document FLIP-366: Support standard YAML for FLINK configuration [flink]

2024-02-02 Thread via GitHub


JunRuiLee opened a new pull request, #24251:
URL: https://github.com/apache/flink/pull/24251

   
   
   ## What is the purpose of the change
   
   Document FLIP-366: Support standard YAML for FLINK configuration
   
   
   ## Brief change log
   
 - Add documentation for the new Flink configuration file config.yaml.
 - Update the usage of flink-conf.yaml in the docs.
 - Update the usage of "env.java.home" in the docs.
   
   The updated document is shown in the figure:
   
   English version:
   
   
![config-1](https://github.com/apache/flink/assets/107924572/9d662f0d-c066-4aeb-83ca-51cdb2827942)
   
![config-2](https://github.com/apache/flink/assets/107924572/2aaf6ec6-cf82-4de5-b54b-32e973a75108)
   
   Chinese version:
   
   
![config-zh-1](https://github.com/apache/flink/assets/107924572/25dc226e-96f7-49a6-9f82-ab53dc46856a)
   
![config-zh-2](https://github.com/apache/flink/assets/107924572/159546ca-3da9-4ed0-9d0c-dc760bee79d3)
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't 
know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (not applicable / **docs** / 
JavaDocs / not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
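
For readers of this thread, a rough illustration of the difference the new 
documentation covers. This is an assumption based on FLIP-366, not an excerpt 
from the PR: the legacy flink-conf.yaml only allowed flat dotted keys, while 
the new config.yaml accepts standard YAML, including nested mappings:

{code}
# Legacy flink-conf.yaml style (flat dotted keys):
#   jobmanager.memory.process.size: 1600m
#   taskmanager.numberOfTaskSlots: 2

# Standard-YAML config.yaml style described by FLIP-366 (nested mapping);
# the option names are real Flink options, the values are example values:
jobmanager:
  memory:
    process:
      size: 1600m
taskmanager:
  numberOfTaskSlots: 2
env:
  java:
    home: /usr/lib/jvm/java-17   # example path only
{code}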



[jira] [Commented] (FLINK-34229) Duplicate entry in InnerClasses attribute in class file FusionStreamOperator

2024-02-02 Thread xingbe (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813613#comment-17813613
 ] 

xingbe commented on FLINK-34229:


[~FrankZou] Thanks for resolving this. I have checked that after cherry-picking 
the commit, q35.sql is now able to run to completion without exceptions. It 
looks good to me.

> Duplicate entry in InnerClasses attribute in class file FusionStreamOperator
> 
>
> Key: FLINK-34229
> URL: https://issues.apache.org/jira/browse/FLINK-34229
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.19.0
>Reporter: xingbe
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-01-24-17-05-47-883.png, taskmanager_log.txt
>
>
> I noticed a runtime error happens in 10TB TPC-DS (q35.sql) benchmarks in 
> 1.19, the problem did not happen in 1.18.0. This issue may have been newly 
> introduced recently. !image-2024-01-24-17-05-47-883.png|width=589,height=279!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-26369][doc-zh] Translate the part zh-page mixed with not be translated. [flink]

2024-02-02 Thread via GitHub


1996fanrui commented on PR #20340:
URL: https://github.com/apache/flink/pull/20340#issuecomment-1923500149

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-26088][Connectors/ElasticSearch] Add Elasticsearch 8.0 support [flink-connector-elasticsearch]

2024-02-02 Thread via GitHub


StefanXiepj commented on PR #74:
URL: 
https://github.com/apache/flink-connector-elasticsearch/pull/74#issuecomment-1923495793

   > @StefanXiepj Are you still active on this PR?
   
   There is another PR, 
[#53](https://github.com/apache/flink-connector-elasticsearch/pull/53), doing 
it; I don't know which one is better. If necessary, I can continue working on it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-34333) Fix FLINK-34007 LeaderElector bug in 1.18

2024-02-02 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813604#comment-17813604
 ] 

Yang Wang commented on FLINK-34333:
---

+1 for what Chesnay said. We already shade the fabric8 k8s client in the 
kubernetes-client module. It should not be used directly in other projects.

> Fix FLINK-34007 LeaderElector bug in 1.18
> -
>
> Key: FLINK-34333
> URL: https://issues.apache.org/jira/browse/FLINK-34333
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.18.1
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Blocker
>  Labels: pull-request-available
>
> FLINK-34007 revealed a bug in the k8s client v6.6.2 which we're using since 
> Flink 1.18. This issue was fixed with FLINK-34007 for Flink 1.19 which 
> required an update of the k8s client to v6.9.0.
> This Jira issue is about finding a solution in Flink 1.18 for the very same 
> problem FLINK-34007 covered. It's a dedicated Jira issue because we want to 
> unblock the release of 1.19 by resolving FLINK-34007.
> Just to summarize why the upgrade to v6.9.0 is desired: There's a bug in 
> v6.6.2 which might prevent the leadership lost event being forwarded to the 
> client ([#5463|https://github.com/fabric8io/kubernetes-client/issues/5463]). 
> An initial proposal where the release call was handled in Flink's 
> {{KubernetesLeaderElector}} didn't work due to the leadership lost event 
> being triggered twice (see [FLINK-34007 PR 
> comment|https://github.com/apache/flink/pull/24132#discussion_r1467175902])



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34333) Fix FLINK-34007 LeaderElector bug in 1.18

2024-02-02 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813603#comment-17813603
 ] 

Yang Wang commented on FLINK-34333:
---

I am not aware of other potential issues and am in favor of upgrading the k8s 
client version. 

> Fix FLINK-34007 LeaderElector bug in 1.18
> -
>
> Key: FLINK-34333
> URL: https://issues.apache.org/jira/browse/FLINK-34333
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.18.1
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Blocker
>  Labels: pull-request-available
>
> FLINK-34007 revealed a bug in the k8s client v6.6.2 which we're using since 
> Flink 1.18. This issue was fixed with FLINK-34007 for Flink 1.19 which 
> required an update of the k8s client to v6.9.0.
> This Jira issue is about finding a solution in Flink 1.18 for the very same 
> problem FLINK-34007 covered. It's a dedicated Jira issue because we want to 
> unblock the release of 1.19 by resolving FLINK-34007.
> Just to summarize why the upgrade to v6.9.0 is desired: There's a bug in 
> v6.6.2 which might prevent the leadership lost event being forwarded to the 
> client ([#5463|https://github.com/fabric8io/kubernetes-client/issues/5463]). 
> An initial proposal where the release call was handled in Flink's 
> {{KubernetesLeaderElector}} didn't work due to the leadership lost event 
> being triggered twice (see [FLINK-34007 PR 
> comment|https://github.com/apache/flink/pull/24132#discussion_r1467175902])



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-25421] Port JDBC Sink to new Unified Sink API (FLIP-143) [flink-connector-jdbc]

2024-02-02 Thread via GitHub


eskabetxe commented on code in PR #2:
URL: 
https://github.com/apache/flink-connector-jdbc/pull/2#discussion_r1475842140


##
.github/workflows/weekly.yml:
##
@@ -27,25 +27,13 @@ jobs:
 strategy:
   matrix:
 flink_branches: [{
-  flink: 1.16-SNAPSHOT,
-  branch: main
-}, {
-  flink: 1.17-SNAPSHOT,
-  branch: main
-}, {
   flink: 1.18-SNAPSHOT,
   jdk: '8, 11, 17',
   branch: main
 }, {
   flink: 1.19-SNAPSHOT,
   jdk: '8, 11, 17, 21',
   branch: main
-}, {
-  flink: 1.16.2,
-  branch: v3.1
-}, {
-  flink: 1.17.1,
-  branch: v3.1
 }, {

Review Comment:
   We could release 3.2 (or 3.1.x) with the current main code, giving support 
for 1.17 and 1.18 (1.18 is still not supported by any release). I don't know 
why a 1.18-compatible version hasn't been released yet; is there any blocker?
   
   And then merge this and release again (minor or major)..



##
.github/workflows/weekly.yml:
##
@@ -27,25 +27,13 @@ jobs:
 strategy:
   matrix:
 flink_branches: [{
-  flink: 1.16-SNAPSHOT,
-  branch: main
-}, {
-  flink: 1.17-SNAPSHOT,
-  branch: main
-}, {
   flink: 1.18-SNAPSHOT,
   jdk: '8, 11, 17',
   branch: main
 }, {
   flink: 1.19-SNAPSHOT,
   jdk: '8, 11, 17, 21',
   branch: main
-}, {
-  flink: 1.16.2,
-  branch: v3.1
-}, {
-  flink: 1.17.1,
-  branch: v3.1
 }, {

Review Comment:
   We could release 3.2 (or 3.1.x) with the current main code, giving support 
for 1.17 and 1.18 (1.18 is still not supported by any release). I don't know 
why a 1.18-compatible version hasn't been released yet; is there any blocker?
   
   And then merge this and release again (minor or major)..
   Would this work?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-25421] Port JDBC Sink to new Unified Sink API (FLIP-143) [flink-connector-jdbc]

2024-02-02 Thread via GitHub


eskabetxe commented on code in PR #2:
URL: 
https://github.com/apache/flink-connector-jdbc/pull/2#discussion_r1475842140


##
.github/workflows/weekly.yml:
##
@@ -27,25 +27,13 @@ jobs:
 strategy:
   matrix:
 flink_branches: [{
-  flink: 1.16-SNAPSHOT,
-  branch: main
-}, {
-  flink: 1.17-SNAPSHOT,
-  branch: main
-}, {
   flink: 1.18-SNAPSHOT,
   jdk: '8, 11, 17',
   branch: main
 }, {
   flink: 1.19-SNAPSHOT,
   jdk: '8, 11, 17, 21',
   branch: main
-}, {
-  flink: 1.16.2,
-  branch: v3.1
-}, {
-  flink: 1.17.1,
-  branch: v3.1
 }, {

Review Comment:
   We could release 3.2 (or 3.1.x) with the current main code, giving support 
for 1.16, 1.17 and 1.18 (1.18 is still not supported by any release). I don't 
know why a 1.18-compatible version hasn't been released yet; is there any 
blocker?
   
   And then merge this and release again (minor or major)..
   Would this work?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-31989) Update documentation

2024-02-02 Thread Danny Cranmer (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813602#comment-17813602
 ] 

Danny Cranmer commented on FLINK-31989:
---

We could split this task and deliver incrementally with each subtask, rather 
than all at once.

> Update documentation
> 
>
> Key: FLINK-31989
> URL: https://issues.apache.org/jira/browse/FLINK-31989
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Hong Liang Teoh
>Priority: Major
>
> Update Flink documentation to explain the new KDS source. Include
>  * Improvements available in new KDS source
>  * Incompatible changes made
>  * Example implementation
>  * Example customisations



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34342) Address Shard Consistency Issue for DDB Streams Source

2024-02-02 Thread Danny Cranmer (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Cranmer updated FLINK-34342:
--
Description: 
*Problem*

We call 
[DescribeStream|https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_streams_DescribeStream.html]
 with the ExclusiveStartShardId parameter set to the last seen shard ID. The 
issue is that the API is eventually consistent, meaning if we call the API 
multiple times we might get different results, for example: 
 * Call 1: [A, C]
 * Call 2: [A, B, C]

Since we would set ExclusiveStartShardId to {{C}}, in the above example the 
connector would miss shard {{B}}.

*Solution*

We need to find a solution to close this gap. This could be to periodically 
list all shards to detect gaps and to not start processing new shards until 
their parents are complete. This feature does not need to be applied to the KDS 
source.

  was:
*Problem*

We call 
[DescribeStream|https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_streams_DescribeStream.html]
 with the ExclusiveStartShardId parameter set to the last seen shard ID. 


> Address Shard Consistency Issue for DDB Streams Source
> --
>
> Key: FLINK-34342
> URL: https://issues.apache.org/jira/browse/FLINK-34342
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / DynamoDB
>Reporter: Danny Cranmer
>Priority: Major
>
> *Problem*
> We call 
> [DescribeStream|https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_streams_DescribeStream.html]
>  with the ExclusiveStartShardId parameter set to the last seen shard ID. The 
> issue is that the API is eventually consistent, meaning if we call the API 
> multiple times we might get different results, for example: 
>  * Call 1: [A, C]
>  * Call 2: [A, B, C]
> Since we would set ExclusiveStartShardId to {{C}}, in the above example 
> the connector would miss shard {{B}}.
> *Solution*
> We need to find a solution to close this gap. This could be to periodically 
> list all shards to detect gaps and to not start processing new shards until 
> their parents are complete. This feature does not need to be applied to the 
> KDS source (see the sketch after this message).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
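
A minimal sketch of the mitigation described in the Solution section above. 
The names and structure here are hypothetical, not the connector's actual 
code: track shard lineage across periodic full listings (each shard entry 
carries its parent shard ID) and only release a shard for processing once its 
parent is known to be complete.

{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: "don't start a child shard before its parent is done".
final class ShardGapTracker {
    private final Map<String, String> parentOf = new HashMap<>(); // shardId -> parent (null for roots)
    private final Set<String> completed = new HashSet<>();

    /** Record the result of a periodic full DescribeStream/ListShards sweep. */
    void recordSweep(Map<String, String> shardToParent) {
        parentOf.putAll(shardToParent);
    }

    void markCompleted(String shardId) {
        completed.add(shardId);
    }

    /** A shard may start only if it is a root or its parent has completed. */
    boolean readyToProcess(String shardId) {
        String parent = parentOf.get(shardId);
        return parent == null || completed.contains(parent);
    }
}

public class ShardGapTrackerExample {
    public static void main(String[] args) {
        ShardGapTracker tracker = new ShardGapTracker();

        // First (eventually consistent) sweep returns [A, C] and misses B.
        Map<String, String> sweep1 = new HashMap<>();
        sweep1.put("A", null); // A is a root shard
        sweep1.put("C", "B");  // C's entry names parent B, which was not listed
        tracker.recordSweep(sweep1);
        tracker.markCompleted("A");
        System.out.println(tracker.readyToProcess("C")); // false: B is not done

        // A later sweep returns the full lineage [A, B, C].
        Map<String, String> sweep2 = new HashMap<>();
        sweep2.put("B", "A");
        tracker.recordSweep(sweep2);
        tracker.markCompleted("B");
        System.out.println(tracker.readyToProcess("C")); // true
    }
}
{code}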


Re: [PR] [FLINK-33396][table-planner] Fix table alias not be cleared in subquery [flink]

2024-02-02 Thread via GitHub


xuyangzhong commented on code in PR #24239:
URL: https://github.com/apache/flink/pull/24239#discussion_r1475829008


##
flink-table/flink-table-planner/src/test/resources/org/apache/flink/table/planner/plan/hints/batch/NestLoopJoinHintTest.xml:
##
@@ -679,8 +680,8 @@ HashJoin(joinType=[LeftSemiJoin], where=[=(a1, a2)], 
select=[a1, b1], build=[rig
+- Calc(select=[a2])
   +- NestedLoopJoin(joinType=[InnerJoin], where=[=(a2, a3)], select=[a2, 
a3], build=[left])
  :- Exchange(distribution=[broadcast])
- :  +- TableSourceScan(table=[[default_catalog, default_database, T2, 
project=[a2], metadata=[]]], fields=[a2], hints=[[[ALIAS options:[T2)
- +- TableSourceScan(table=[[default_catalog, default_database, T3, 
project=[a3], metadata=[]]], fields=[a3], hints=[[[ALIAS options:[T3)

Review Comment:
   Got it! I have added two tests for table hints.
   1. When query hints exist, some alias hints are present, and the table 
hints will not be cleared.
   2. When query hints do not exist, there are no alias hints, and the table 
hints will also not be cleared.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-34342) Address ListShards Consistency Issue for DDB Streams Source

2024-02-02 Thread Danny Cranmer (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Cranmer updated FLINK-34342:
--
Description: 
*Problem*

We call 
[DescribeStream|https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_streams_DescribeStream.html]
 with the ExclusiveStartShardId parameter set to the last seen shard ID. 

> Address ListShards Consistency Issue for DDB Streams Source
> ---
>
> Key: FLINK-34342
> URL: https://issues.apache.org/jira/browse/FLINK-34342
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / DynamoDB
>Reporter: Danny Cranmer
>Priority: Major
>
> *Problem*
> We call 
> [DescribeStream|https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_streams_DescribeStream.html]
>  with the ExclusiveStartShardId parameter set to the last seen shard ID. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34342) Address Shard Consistency Issue for DDB Streams Source

2024-02-02 Thread Danny Cranmer (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Cranmer updated FLINK-34342:
--
Summary: Address Shard Consistency Issue for DDB Streams Source  (was: 
Address ListShards Consistency Issue for DDB Streams Source)

> Address Shard Consistency Issue for DDB Streams Source
> --
>
> Key: FLINK-34342
> URL: https://issues.apache.org/jira/browse/FLINK-34342
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / DynamoDB
>Reporter: Danny Cranmer
>Priority: Major
>
> *Problem*
> We call 
> [DescribeStream|https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_streams_DescribeStream.html]
>  with the ExclusiveStartShardId parameter set to the last seen shard ID. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34342) Address ListShards Consistency Issue for DDB Streams Source

2024-02-02 Thread Danny Cranmer (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Cranmer updated FLINK-34342:
--
Summary: Address ListShards Consistency Issue for DDB Streams Source  (was: 
Address ListShards Consistency for DDB Streams Source)

> Address ListShards Consistency Issue for DDB Streams Source
> ---
>
> Key: FLINK-34342
> URL: https://issues.apache.org/jira/browse/FLINK-34342
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / DynamoDB
>Reporter: Danny Cranmer
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34342) Address ListShards Consistency for DDB Streams Source

2024-02-02 Thread Danny Cranmer (Jira)
Danny Cranmer created FLINK-34342:
-

 Summary: Address ListShards Consistency for DDB Streams Source
 Key: FLINK-34342
 URL: https://issues.apache.org/jira/browse/FLINK-34342
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / DynamoDB
Reporter: Danny Cranmer






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34340) Add support for DDB Streams for DataStream API

2024-02-02 Thread Danny Cranmer (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Cranmer updated FLINK-34340:
--
Summary: Add support for DDB Streams for DataStream API  (was: Add support 
for DDB Streams)

> Add support for DDB Streams for DataStream API
> --
>
> Key: FLINK-34340
> URL: https://issues.apache.org/jira/browse/FLINK-34340
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / DynamoDB
>Reporter: Danny Cranmer
>Priority: Major
>
> In the legacy KDS source we support Amazon DynamoDB streams via an adapter 
> shim. Both KDS and DDB streams have a similar API.
> This task builds upon https://issues.apache.org/jira/browse/FLINK-34339 and 
> will add a {{DynamoDBStreamsSource}} which will set up a DDB SDK client shim.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-25421] Port JDBC Sink to new Unified Sink API (FLIP-143) [flink-connector-jdbc]

2024-02-02 Thread via GitHub


snuyanzin commented on code in PR #2:
URL: 
https://github.com/apache/flink-connector-jdbc/pull/2#discussion_r1475815795


##
.github/workflows/weekly.yml:
##
@@ -27,25 +27,13 @@ jobs:
 strategy:
   matrix:
 flink_branches: [{
-  flink: 1.16-SNAPSHOT,
-  branch: main
-}, {
-  flink: 1.17-SNAPSHOT,
-  branch: main
-}, {
   flink: 1.18-SNAPSHOT,
   jdk: '8, 11, 17',
   branch: main
 }, {
   flink: 1.19-SNAPSHOT,
   jdk: '8, 11, 17, 21',
   branch: main
-}, {
-  flink: 1.16.2,
-  branch: v3.1
-}, {
-  flink: 1.17.1,
-  branch: v3.1
 }, {

Review Comment:
   IIRC ideally we should support the currently supported Flink versions (1.17, 
1.18, 1.19, and 1.16, which will probably no longer be supported after the 1.19 
release).
   I don't know another way to have all of them supported without branching 
(4.0.0 or 3.2.0 doesn't really matter)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-25421] Port JDBC Sink to new Unified Sink API (FLIP-143) [flink-connector-jdbc]

2024-02-02 Thread via GitHub


snuyanzin commented on code in PR #2:
URL: 
https://github.com/apache/flink-connector-jdbc/pull/2#discussion_r1475815795


##
.github/workflows/weekly.yml:
##
@@ -27,25 +27,13 @@ jobs:
 strategy:
   matrix:
 flink_branches: [{
-  flink: 1.16-SNAPSHOT,
-  branch: main
-}, {
-  flink: 1.17-SNAPSHOT,
-  branch: main
-}, {
   flink: 1.18-SNAPSHOT,
   jdk: '8, 11, 17',
   branch: main
 }, {
   flink: 1.19-SNAPSHOT,
   jdk: '8, 11, 17, 21',
   branch: main
-}, {
-  flink: 1.16.2,
-  branch: v3.1
-}, {
-  flink: 1.17.1,
-  branch: v3.1
 }, {

Review Comment:
   IIRC ideally we should support the currently supported Flink versions (1.17, 
1.18, 1.19, and 1.16, which will probably no longer be supported after the 1.19 
release).
   I don't know another way to have all of them supported without branching 
(4.0.0 or 3.2.0 doesn't really matter)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-34341) Implement DDB Streams Table API support

2024-02-02 Thread Danny Cranmer (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Cranmer updated FLINK-34341:
--
Description: 
Implement Table API support for DDB Streams Source.

 

Consider:
 * Configurations to support. Should have customisation parity with DataStream 
API
 * Testing should include both SQL client + Table API via Java

  was:
Implement Table API support for KDS Source.

 

Consider:
 * Configurations to support. Should have customisation parity with DataStream 
API
 * Testing should include both SQL client + Table API via Java


> Implement DDB Streams Table API support
> ---
>
> Key: FLINK-34341
> URL: https://issues.apache.org/jira/browse/FLINK-34341
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Kinesis
>Reporter: Hong Liang Teoh
>Priority: Major
>  Labels: pull-request-available
>
> Implement Table API support for DDB Streams Source.
>  
> Consider:
>  * Configurations to support. Should have customisation parity with 
> DataStream API
>  * Testing should include both SQL client + Table API via Java



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34341) Implement DDB Streams Table API support

2024-02-02 Thread Danny Cranmer (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Cranmer updated FLINK-34341:
--
Reporter: Danny Cranmer  (was: Hong Liang Teoh)

> Implement DDB Streams Table API support
> ---
>
> Key: FLINK-34341
> URL: https://issues.apache.org/jira/browse/FLINK-34341
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Kinesis
>Reporter: Danny Cranmer
>Priority: Major
>  Labels: pull-request-available
>
> Implement Table API support for DDB Streams Source.
>  
> Consider:
>  * Configurations to support. Should have customisation parity with 
> DataStream API
>  * Testing should include both SQL client + Table API via Java



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34341) Implement DDB Streams Table API support

2024-02-02 Thread Danny Cranmer (Jira)
Danny Cranmer created FLINK-34341:
-

 Summary: Implement DDB Streams Table API support
 Key: FLINK-34341
 URL: https://issues.apache.org/jira/browse/FLINK-34341
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Kinesis
Reporter: Hong Liang Teoh


Implement Table API support for KDS Source.

 

Consider:
 * Configurations to support. Should have customisation parity with DataStream 
API
 * Testing should include both SQL client + Table API via Java



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31987) Implement KDS Table API support

2024-02-02 Thread Danny Cranmer (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Cranmer updated FLINK-31987:
--
Component/s: Connectors / Kinesis

> Implement KDS Table API support
> ---
>
> Key: FLINK-31987
> URL: https://issues.apache.org/jira/browse/FLINK-31987
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Kinesis
>Reporter: Hong Liang Teoh
>Priority: Major
>  Labels: pull-request-available
>
> Implement Table API support for KDS Source.
>  
> Consider:
>  * Configurations to support. Should have customisation parity with 
> DataStream API
>  * Testing should include both SQL client + Table API via Java



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31987) Implement KDS Table API support

2024-02-02 Thread Danny Cranmer (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Cranmer updated FLINK-31987:
--
Summary: Implement KDS Table API support  (was: Implement Table API support)

> Implement KDS Table API support
> ---
>
> Key: FLINK-31987
> URL: https://issues.apache.org/jira/browse/FLINK-31987
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Hong Liang Teoh
>Priority: Major
>  Labels: pull-request-available
>
> Implement Table API support for KDS Source.
>  
> Consider:
>  * Configurations to support. Should have customisation parity with 
> DataStream API
>  * Testing should include both SQL client + Table API via Java



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31980) Implement support for EFO in new KDS Source

2024-02-02 Thread Danny Cranmer (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Cranmer updated FLINK-31980:
--
Summary: Implement support for EFO in new KDS Source  (was: Implement 
support for EFO in new Source)

> Implement support for EFO in new KDS Source
> ---
>
> Key: FLINK-31980
> URL: https://issues.apache.org/jira/browse/FLINK-31980
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Hong Liang Teoh
>Priority: Major
>  Labels: pull-request-available
>
> Implement support for reading from Kinesis Stream using Enhanced Fan Out 
> mechanism



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-31980) Implement support for EFO in new KDS Source

2024-02-02 Thread Danny Cranmer (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-31980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Cranmer updated FLINK-31980:
--
Component/s: Connectors / Kinesis

> Implement support for EFO in new KDS Source
> ---
>
> Key: FLINK-31980
> URL: https://issues.apache.org/jira/browse/FLINK-31980
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Kinesis
>Reporter: Hong Liang Teoh
>Priority: Major
>  Labels: pull-request-available
>
> Implement support for reading from Kinesis Stream using Enhanced Fan Out 
> mechanism



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34330) Specify code owners for .github/workflows folder

2024-02-02 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813596#comment-17813596
 ] 

Chesnay Schepler commented on FLINK-34330:
--

Specifically, what will not fly is that a codeowner review is REQUIRED for 
something to be merged. But you can use codeowners to auto-request reviews from 
people (WITHOUT blocking the PR on it).

> Specify code owners for .github/workflows folder
> 
>
> Key: FLINK-34330
> URL: https://issues.apache.org/jira/browse/FLINK-34330
> Project: Flink
>  Issue Type: Sub-task
>Affects Versions: 1.19.0, 1.18.1
>Reporter: Matthias Pohl
>Priority: Major
>
> Currently, the workflow files can be modified by any committer. We have to 
> discuss whether we want to limit access to the PMC (or a subset of it) here. 
> That might be a means to protect self-hosted runners.
> See the [codeowner 
> documentation|https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners]
>  for further details.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
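
For illustration, the kind of CODEOWNERS entry being discussed. The team 
handle below is hypothetical, and whether a codeowner review blocks merging is 
a branch-protection setting, which is the part noted above as something that 
would not fly as a hard requirement:

{code}
# .github/CODEOWNERS: auto-requests reviews for workflow changes.
# "@apache/flink-workflow-owners" is a hypothetical team handle.
/.github/workflows/ @apache/flink-workflow-owners
{code}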


[jira] [Commented] (FLINK-34301) Release Testing Instructions: Verify FLINK-20281 Window aggregation supports changelog stream input

2024-02-02 Thread xuyang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813595#comment-17813595
 ] 

xuyang commented on FLINK-34301:


Window TVF aggregation support for changelog stream input is ready for 
testing. Users can add a window TVF aggregation downstream of a CDC source or 
of other nodes that produce CDC records.

Someone can verify this feature with:
 # Prepare a MySQL table and insert some data first.
 # Start sql-client and prepare the DDL for this MySQL table as a CDC source.
 # Verify the plan with `EXPLAIN PLAN_ADVICE` to check that there is a window 
aggregate node and that the changelog contains "UA", "UB", or "D" in its 
upstream (see the sketch after this message).
 # Use different kinds of window TVF to test window TVF aggregation while 
updating the source data, to check data correctness.

> Release Testing Instructions: Verify FLINK-20281 Window aggregation supports 
> changelog stream input
> ---
>
> Key: FLINK-34301
> URL: https://issues.apache.org/jira/browse/FLINK-34301
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: xuyang
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
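
A condensed, self-contained sketch of steps 3 and 4 above. Using 
{{fromChangelogStream}} to fake the changelog input is an assumption for 
illustration only; the instructions use a real MySQL CDC table instead. The 
window TVF SQL itself is a plain tumbling window aggregation:

{code}
import java.time.LocalDateTime;

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Schema;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;
import org.apache.flink.types.RowKind;

public class WindowAggOverChangelog {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Tiny in-memory changelog (an INSERT followed by a -U/+U price update),
        // standing in for the MySQL CDC source from the instructions above.
        DataStream<Row> changelog = env.fromElements(
                Row.ofKind(RowKind.INSERT, "apple", 10, LocalDateTime.parse("2024-02-02T09:01:00")),
                Row.ofKind(RowKind.UPDATE_BEFORE, "apple", 10, LocalDateTime.parse("2024-02-02T09:01:00")),
                Row.ofKind(RowKind.UPDATE_AFTER, "apple", 12, LocalDateTime.parse("2024-02-02T09:01:00")))
            .returns(Types.ROW_NAMED(
                new String[] {"item", "price", "ts"},
                Types.STRING, Types.INT, Types.LOCAL_DATE_TIME));

        Table orders = tEnv.fromChangelogStream(changelog,
                Schema.newBuilder()
                        .column("item", "STRING")
                        .column("price", "INT")
                        .column("ts", "TIMESTAMP(3)")
                        .watermark("ts", "ts - INTERVAL '1' SECOND")
                        .build());
        tEnv.createTemporaryView("orders", orders);

        // Step 3: EXPLAIN PLAN_ADVICE shows the window aggregate node and the
        // changelog kinds (UA/UB/D) of its input. Swap EXPLAIN PLAN_ADVICE for
        // a plain SELECT (step 4) to check the aggregated results.
        tEnv.executeSql("EXPLAIN PLAN_ADVICE "
                + "SELECT window_start, window_end, SUM(price) AS total_price "
                + "FROM TABLE(TUMBLE(TABLE orders, DESCRIPTOR(ts), INTERVAL '10' MINUTES)) "
                + "GROUP BY window_start, window_end").print();
    }
}
{code}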


[jira] [Comment Edited] (FLINK-34300) Release Testing Instructions: Verify FLINK-24024 Support session Window TVF

2024-02-02 Thread xuyang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813590#comment-17813590
 ] 

xuyang edited comment on FLINK-34300 at 2/2/24 9:29 AM:


Session window TVF is ready. Users can use session window TVF aggregation 
instead of the legacy session group window aggregation.

Someone can verify this feature by following the 
[doc](https://github.com/apache/flink/pull/24250), although it is still being 
reviewed (see the sketch after this message).

Furthermore, although session window join, session window rank, and session 
window deduplicate are in an experimental state, if someone finds bugs in 
them, you could also open a Jira linked to this one to report them.


was (Author: xuyangzhong):
Session window TVF is ready. Users can use session window TVF aggregation 
instead of the legacy session group window aggregation.

Someone can verify this feature by the 
[doc](https://github.com/apache/flink/pull/24250) although it is still in 
preparation.

Furthermore, although session window join, session window rank, and session 
window deduplicate are in an experimental state, if someone finds bugs in 
them, you could also open a Jira linked to this one to report it.

> Release Testing Instructions: Verify FLINK-24024 Support session Window TVF
> ---
>
> Key: FLINK-34300
> URL: https://issues.apache.org/jira/browse/FLINK-34300
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: xuyang
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
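
To make the above concrete, a minimal sketch of the new session window TVF 
syntax run through a TableEnvironment. It is based on the in-review docs 
linked above, so details may still shift; the datagen table merely keeps the 
snippet self-contained:

{code}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class SessionWindowTvfExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Throwaway datagen source so the snippet runs without external systems.
        tEnv.executeSql(
                "CREATE TABLE bids ("
                        + "  item STRING,"
                        + "  price INT,"
                        + "  bidtime AS LOCALTIMESTAMP,"
                        + "  WATERMARK FOR bidtime AS bidtime"
                        + ") WITH ('connector' = 'datagen', 'rows-per-second' = '5')");

        // Session window TVF aggregation, replacing the legacy
        // SESSION(...) group window aggregation:
        tEnv.executeSql(
                "SELECT window_start, window_end, item, SUM(price) AS total "
                        + "FROM TABLE(SESSION(TABLE bids PARTITION BY item, "
                        + "  DESCRIPTOR(bidtime), INTERVAL '5' MINUTES)) "
                        + "GROUP BY window_start, window_end, item").print();
    }
}
{code}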


[jira] [Created] (FLINK-34340) Add support for DDB Streams

2024-02-02 Thread Danny Cranmer (Jira)
Danny Cranmer created FLINK-34340:
-

 Summary: Add support for DDB Streams
 Key: FLINK-34340
 URL: https://issues.apache.org/jira/browse/FLINK-34340
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / DynamoDB
Reporter: Danny Cranmer


In the legacy KDS source we support Amazon DynamoDB streams via an adapter 
shim. Both KDS and DDB streams have a similar API.

This task builds upon https://issues.apache.org/jira/browse/FLINK-34339 and 
will add a {{DynamoDBStreamsSource}} which will set up a DDB SDK client shim.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-34339) Add connector abstraction layer to remove reliance on AWS SDK classes

2024-02-02 Thread Danny Cranmer (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-34339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Cranmer updated FLINK-34339:
--
Component/s: Connectors / Kinesis

> Add connector abstraction layer to remove reliance on AWS SDK classes
> -
>
> Key: FLINK-34339
> URL: https://issues.apache.org/jira/browse/FLINK-34339
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Kinesis
>Reporter: Danny Cranmer
>Priority: Major
>
> In order to shim DDB streams we need to be able to support the 
> Stream/Shard/Record etc. concepts without tying them to a specific 
> implementation. This will allow us to mimic the KDS/DDB streams support in the 
> old connector by providing a shim at the AWS SDK client (see the sketch after 
> this message).
>  # Model {{software.amazon.awssdk.services.kinesis}} classes as native 
> concepts
>  # Push down any usage of {{software.amazon.awssdk.services.kinesis}} to a 
> KDS-specific class
>  # Ensure that the bulk of the connector logic is reusable; the top-level 
> class would be implementation-specific and shim in the right client factories 
> and configuration



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
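
One way to read points 1-3, as a hypothetical sketch (the names are 
illustrative, not the eventual design): model shards and records as 
connector-native types and hide each service's SDK behind a small proxy 
interface.

{code}
import java.util.List;

// Point 1: connector-native models; no AWS SDK types appear here.
record StreamShard(String shardId, String parentShardId) {}
record StreamRecord(byte[] payload, String sequenceNumber) {}

// Points 2 and 3: the reusable connector logic talks only to this interface;
// each service supplies its own implementation that translates SDK classes.
interface StreamProxy {
    List<StreamShard> listShards(String streamArn, String lastSeenShardId);

    List<StreamRecord> getRecords(String streamArn, String shardId, String startingPosition);
}

// A KDS-specific class would wrap software.amazon.awssdk.services.kinesis here;
// a DDB Streams twin would wrap the DynamoDB Streams SDK client instead.
final class KinesisStreamProxy implements StreamProxy {
    @Override
    public List<StreamShard> listShards(String streamArn, String lastSeenShardId) {
        throw new UnsupportedOperationException("sketch only");
    }

    @Override
    public List<StreamRecord> getRecords(String streamArn, String shardId, String startingPosition) {
        throw new UnsupportedOperationException("sketch only");
    }
}
{code}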


[jira] [Commented] (FLINK-34300) Release Testing Instructions: Verify FLINK-24024 Support session Window TVF

2024-02-02 Thread xuyang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813590#comment-17813590
 ] 

xuyang commented on FLINK-34300:


Session window TVF is ready. Users can use session window TVF aggregation 
instead of the legacy session group window aggregation.

Someone can verify this feature by the 
[doc](https://github.com/apache/flink/pull/24250) although it is still in 
preparation.

Furthermore, although session window join, session window rank, and session 
window deduplicate are in an experimental state, if someone finds bugs in 
them, you could also open a Jira linked to this one to report it.

> Release Testing Instructions: Verify FLINK-24024 Support session Window TVF
> ---
>
> Key: FLINK-34300
> URL: https://issues.apache.org/jira/browse/FLINK-34300
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Affects Versions: 1.19.0
>Reporter: lincoln lee
>Assignee: xuyang
>Priority: Blocker
>  Labels: release-testing
> Fix For: 1.19.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-34330) Specify code owners for .github/workflows folder

2024-02-02 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813589#comment-17813589
 ] 

Chesnay Schepler commented on FLINK-34330:
--

Chances are they just weren't caught; no one is actively looking out for this, 
after all.
But anytime this gets brought up on ASF mailing lists it gets shut down quickly.

> Specify code owners for .github/workflows folder
> 
>
> Key: FLINK-34330
> URL: https://issues.apache.org/jira/browse/FLINK-34330
> Project: Flink
>  Issue Type: Sub-task
>Affects Versions: 1.19.0, 1.18.1
>Reporter: Matthias Pohl
>Priority: Major
>
> Currently, the workflow files can be modified by any committer. We have to 
> discuss whether we want to limit access to the PMC (or a subset of it) here. 
> That might be a means to protect self-hosted runners.
> See the [codeowner 
> documentation|https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners]
>  for further details.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

