flink checkpoint 延迟的性能问题讨论

2024-06-16 Thread 15868861416
各位大佬,
背景:
实际测试flink读Kafka 数据写入hudi, checkpoint的间隔时间是1min, 
state.backend分别为filesystem,测试结果如下:



写hudi的checkpoint 的延迟





写iceberg得延迟:



疑问: hudi的checkpoint的文件数据比iceberg要大很多,如何降低flink写hudi的checkpoint的延迟?


| |
博星
|
|
15868861...@163.com
|



??????Flink????????????????join??????????????n??????

2024-06-16 Thread ????

1.left
 join
2.n??



  
| ?? | <1227581...@qq.com.INVALID> |
|  | 2024??06??16?? 21:08 |
| ?? | user-zh |
| ?? | |
|  | ??Flinkjoin??n?? |
??SQL??datastreamDWD??ClickHouse/Doris??



1227581...@qq.com








----
??: 
   "user-zh"



??????Flink????????????????join??????????????n??????

2024-06-16 Thread ????
??flink sql apidatastream api??



  
| ?? | <1227581...@qq.com.INVALID> |
|  | 2024??06??16?? 20:35 |
| ?? | user-zh |
| ?? | |
|  | Flinkjoin??n?? |
??
1DWD??KafkaDWD
2Kafka
3??FlinkKafka1,2??FlinkKafka??DWD12DWS??


DWS







|
|

1227581...@qq.com
|
 

Flink????????????????join??????????????n??????

2024-06-16 Thread ????
??
1DWD??KafkaDWD
2Kafka
3??FlinkKafka1,2??FlinkKafka??DWD12DWS??


DWS










1227581...@qq.com





Re: Exception: Coordinator of operator xxxx does not exist or the job vertex this operator belongs to is not initialized.

2024-06-16 Thread Junrui Lee
Hi Biao,

I agree with you that this exception is not very meaningful and can be
noisy in the JM logs, especially when running large-scale batch jobs in a
session cluster.

IIRC, there isn't a current config to filter out or silence such exceptions
in batch mode. So I've created a JIRA ticket (
https://issues.apache.org/jira/browse/FLINK-35622) to track this issue for
possible future optimizations.

Best,
Junrui

Geng Biao  于2024年6月16日周日 13:41写道:

> Hi Junrui,
> Thanks for your answer! Since this exception is not very meaningful, is
> there a solution or a flink config to filter out or silent such exception
> in batch mode? When I run some large scale batch jobs in a session cluster,
> it turns out that the JM log will be fulfilled with this exception which
> makes it difficult to find detailed execution information about the job.
>
> Best,
> Biao Geng
>
> 发送自 Outlook for iOS 
> --
> *发件人:* Junrui Lee 
> *发送时间:* Sunday, June 16, 2024 12:49:10 PM
> *收件人:* Corin 
> *抄送:* user@flink.apache.org 
> *主题:* Re: Exception: Coordinator of operator  does not exist or the
> job vertex this operator belongs to is not initialized.
>
> Hi,
>
> This exception is common in batch jobs and is caused by the collect sink
> attempting to fetch data from the corresponding operator coordinator on the
> JM based on the operator ID. However, due to the sequential scheduling of
> batch jobs, if a job vertex has not been initialized yet, the corresponding
> operator coordinator cannot be found, leading to the printing of this
> message. This log does not impact the normal execution of the job because
> the collect sink will keep retrying to send the request.
>
> Best,
> Junrui
>
> Corin  于2024年6月16日周日 12:45写道:
>
> When I run a batch job using Flink 1.19, I used collect() in the job, and
> many times the following error appears in the JobManager log: Caused by:
> org.apache.flink.util.FlinkException: Coordinator of operator  does not
> exist or the job vertex this operator belongs to is not initialized. What
> is the cause of this exception?
>
>


raw issues

2024-06-16 Thread Fokou Toukam, Thierry

hello, i am trying to do vector assembling with flink 1.15 but i have this. How 
can i solve it please?   2024-06-16 03:47:24 DEBUG Main:114 - Assembled Data 
Table Schema: root
 |-- tripId: INT
 |-- stopId: INT
 |-- routeId: INT
 |-- stopSequence: INT
 |-- speed: DOUBLE
 |-- currentStatus: INT
 |-- temp_max: DOUBLE
 |-- temp_min: DOUBLE
 |-- visibility: INT
 |-- dayOfWeek: INT
 |-- distance: DOUBLE
 |-- hour: DOUBLE
 |-- weatherConditionId: INT
 |-- bearing: DOUBLE
 |-- features: RAW('org.apache.flink.ml.linalg.Vector', '...'), 2024-06-15 
00:47:15 Cause: Incompatible types for sink column 'features' at position 0.
2024-06-15 00:47:15
2024-06-15 00:47:15 Query schema: [features: 
RAW('org.apache.flink.ml.linalg.Vector', '...')]
2024-06-15 00:47:15 Sink schema:  [features: 
RAW('org.apache.flink.ml.linalg.Vector', '...')]

Thierry FOKOU |  IT M.A.Sc Student

Département de génie logiciel et TI

École de technologie supérieure  |  Université du Québec

1100, rue Notre-Dame Ouest

Montréal (Québec)  H3C 1K3


[image001]