Questions regarding adaptive scheduler with YARN and application mode

2023-06-26 Thread Leon Xu
Hi Flink users,

I am trying to use the Adaptive Scheduler to autoscale our Flink streaming
jobs (not batch jobs). Our jobs run on YARN in application mode.
There isn't much documentation on how the Adaptive Scheduler works, so I
have some questions:


   1. How does Adaptive Scheduler work with YARN/Application mode? If the
   scheduler decides to request more tasks, will it trigger the request to
   YARN while the job is already running?

   2. What are the evaluation criteria to trigger a scale-up? Is it possible
   to manually trigger a scale-up for testing purposes?


Thanks


Identifying a flink dashboard

2023-06-26 Thread Mike Phillips
G'day all,

Not sure if this is the correct place but...
We have a number of Flink dashboards and it is difficult to know which
dashboard we are looking at.
Is there a configurable way to change the 'Apache Flink Dashboard' heading
on the dashboard?
Or is there some other way of uniquely identifying which dashboard I am
currently looking at?
Flink is running in k8s and we use kubectl port forwarding to connect to
the dashboard, so we can't identify it by the URL.

-- 
Kind Regards

Mike


Re: Very long launch of the Flink application in BATCH mode

2023-06-26 Thread Brendan Cortez
No, I'm using a collection source + 20 identical JDBC lookups + Kafka sink.

On Mon, 26 Jun 2023 at 19:17, Yaroslav Tkachenko wrote:

> Hey Brendan,
>
> Do you use a file source by any chance?


Re: Very long launch of the Flink application in BATCH mode

2023-06-26 Thread Yaroslav Tkachenko
Hey Brendan,

Do you use a file source by any chance?



Very long launch of the Flink application in BATCH mode

2023-06-26 Thread Brendan Cortez
Hi all!

I'm trying to submit a Flink job in Application Mode on a Kubernetes
cluster.

I see a problem when an application has a large number of operators (more
than 20 identical operators): it freezes for ~6 minutes, after

2023-06-21 15:46:45,082 WARN org.apache.flink.connector.kafka.sink.KafkaSinkBuilder [] - Property [transaction.timeout.ms] not specified. Setting it to PT1H

and until

2023-06-21 15:53:20,002 INFO org.apache.flink.streaming.api.graph.StreamGraphGenerator [] - Disabled Checkpointing. Checkpointing is not supported and not needed when executing jobs in BATCH mode.

(logs in attachment)

When I set log.level=DEBUG, I see only this message every 10 seconds:

2023-06-21 14:51:30,921 DEBUG org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Trigger heartbeat request.

Could you please help me understand the cause of this problem and how to
fix it? I am using Flink 1.15.3.

Thank you in advance!

Best regards,
Brendan Cortez.


flink-k8s-app.log
Description: Binary data