Re: Why is task manager shutting down?

2022-09-30 Thread Congxian Qiu
Hi,
You can increase this timeout with the configuration key `task.cancellation.timeout`[1];
the code that implements this logic is here[2].

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#task-cancellation-timeout
[2]
https://github.com/apache/flink/blob/f543b8ac690b1dee58bc3cb345a1c8ad0db0941e/flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java#L1775
Best,
Congxian
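The two numbers discussed in this thread (30 s and 180 s) can be illustrated with a small model of the timeline the logs show. This is a sketch, not Flink's implementation. It assumes, based on the quoted log messages, that a task still stuck after cancellation is interrupted once per `task.cancellation.interval` (30000 ms by default), and that the TaskManager is failed once `task.cancellation.timeout` (180000 ms by default; 0 disables the watchdog) has elapsed. The class `CancellationTimeline` and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the cancellation timeline seen in the logs; not Flink's implementation. */
public final class CancellationTimeline {

    private CancellationTimeline() {}

    /**
     * Lists the events a task would log if it stays blocked for blockedMs after being cancelled.
     *
     * @param blockedMs  how long the task ignores cancellation (e.g. stuck in a JDBC socket read)
     * @param intervalMs assumed value of task.cancellation.interval
     * @param timeoutMs  assumed value of task.cancellation.timeout; 0 disables the watchdog
     */
    static List<String> events(long blockedMs, long intervalMs, long timeoutMs) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("intervalMs must be positive");
        }
        List<String> log = new ArrayList<>();
        // One interrupt attempt per interval while the task is still stuck and the watchdog has not fired.
        for (long t = intervalMs; t < blockedMs && (timeoutMs == 0 || t < timeoutMs); t += intervalMs) {
            log.add("interrupting; stuck for " + t / 1000 + " seconds");
        }
        if (timeoutMs > 0 && blockedMs >= timeoutMs) {
            // Watchdog fires: the whole TaskManager is shut down, not just the task.
            log.add("notifying TM; stuck for " + timeoutMs / 1000 + " seconds -> fatal");
        } else {
            log.add("task exited after " + blockedMs / 1000 + " seconds");
        }
        return log;
    }

    public static void main(String[] args) {
        // Defaults (30 s interval, 180 s timeout) with an insert blocked for 200 s.
        events(200_000, 30_000, 180_000).forEach(System.out::println);
        System.out.println("---");
        // Same insert with task.cancellation.timeout raised to 300 s.
        events(200_000, 30_000, 300_000).forEach(System.out::println);
    }
}
```

With the defaults and an insert blocked for 200 s, the model yields five "interrupting" events and then the fatal one, mirroring the warnings quoted in this thread; with the timeout raised to 300000 ms the task finishes instead. Raising the timeout only helps if the blocking call eventually returns.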


Re: Why is task manager shutting down?

2022-09-29 Thread John Smith
Sorry, I meant the 180 seconds. Where does Flink decide that 180 seconds is
the cutoff point, and can I increase it?


Re: Why is task manager shutting down?

2022-09-29 Thread John Smith
Is there a way to increase the 30 seconds to 60? Where is that 30-second
timeout set?

I have a JDBC query timeout, but at some point at night the insert takes a bit
longer because of index rebuilding.


Re: Why is task manager shutting down?

2022-09-28 Thread Congxian Qiu
Hi John,

Yes, the whole TaskManager exited because the task did not react to the
cancellation signal in time:

```
2022-08-30 09:14:22,138 ERROR org.apache.flink.runtime.taskexecutor.TaskExecutor   [] - Task did not exit gracefully within 180 + seconds.
org.apache.flink.util.FlinkRuntimeException: Task did not exit gracefully within 180 + seconds.
    at org.apache.flink.runtime.taskmanager.Task$TaskCancelerWatchDog.run(Task.java:1791) [flink-dist_2.12-1.14.4.jar:1.14.4]
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_342]
2022-08-30 09:14:22,139 ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner  [] - Fatal error occurred while executing the TaskManager. Shutting it down...
```


And the following task stack was logged when cancelling the sink task:

```
2022-08-30 09:14:22,135 WARN  org.apache.flink.runtime.taskmanager.Task [] - Task 'Sink: jdbc (1/1)#359' did not react to cancelling signal - notifying TM; it is stuck for 180 seconds in method:
 java.net.SocketInputStream.socketRead0(Native Method)
java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
java.net.SocketInputStream.read(SocketInputStream.java:171)
java.net.SocketInputStream.read(SocketInputStream.java:141)
com.microsoft.sqlserver.jdbc.TDSChannel.read(IOBuffer.java:2023)
com.microsoft.sqlserver.jdbc.TDSReader.readPacket(IOBuffer.java:6418)
com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:7579)
com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:592)
com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:524)
com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7194)
com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2979)
com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:248)
com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:223)
com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.execute(SQLServerPreparedStatement.java:505)
com.xx.common.flink.connectors.jdbc.xxJdbcJsonOutputFormat.flush(xxJdbcJsonOutputFormat.java:111)
com.xx.common.flink.connectors.jdbc.xxJdbcJsonSink.snapshotState(xxJdbcJsonSink.java:33)
```


Best,
Congxian


Re: Why is task manager shutting down?

2022-09-23 Thread John Smith
Sorry new file:
https://www.dropbox.com/s/mm9521crwvevzgl/flink-flink-taskexecutor-274-flink-prod-v-task-0001.log?dl=0


Why is task manager shutting down?

2022-09-23 Thread John Smith
Hi, I have attached the logs here:

https://www.dropbox.com/s/12gwlps52lvxdhz/flink-flink-taskexecutor-274-flink-prod-v-task-0001.log?dl=0

1- It looks like a timeout issue. Can someone confirm?
2- The task manager is restarted, since I have restart-on-failure configured in
systemd. But it seems that after a few restarts it stops. Does systemd have an
internal counter of how many times it will restart a service before it gives
up?


Re: Task manager shutting down.

2022-05-05 Thread John Smith
Actually, what's happening is that there's a nightly indexing job, so when we
call the insert it takes longer than the specified checkpoint threshold.
JDBC will happily keep waiting for a response from the DB until it's done.
So the checkpoint threshold is reached and the job tries to shut down and
restart, but the job is blocked in the JDBC driver, which causes all kinds of
crazy exceptions, as you can see in the logs.

So a stopgap solution was to call setQueryTimeout with a value a bit shorter
than the checkpoint threshold. This allows the job to fail "gracefully" and
restart until indexing is done.

1- We can review the indexing policy to see whether it's really required
nightly. That just means that instead of having the job fail every night, it
will only fail when the indexing happens.
2- The other option is to figure out a way to pause the job, maybe through
cron and savepoints. But that seems overly complicated.
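The cron-and-savepoint idea in point 2 could look roughly like the following. It is a hypothetical sketch: the indexing window (23:00 to 02:00), job ID, savepoint directory, and jar name are placeholders, and the code only builds `flink stop --savepointPath` and `flink run --fromSavepoint` command lines rather than running them. The resume step needs the concrete savepoint path reported by `flink stop`, not just the parent directory.

```java
import java.time.LocalTime;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of pausing a job during a nightly maintenance window; not a Flink API. */
public final class MaintenanceWindow {

    private final LocalTime start;
    private final LocalTime end;

    MaintenanceWindow(LocalTime start, LocalTime end) {
        this.start = start;
        this.end = end;
    }

    /** True if now falls in [start, end); start == end is treated as the whole day. */
    boolean contains(LocalTime now) {
        if (start.isBefore(end)) {
            // Same-day window, e.g. 01:00-03:00.
            return !now.isBefore(start) && now.isBefore(end);
        }
        // Window that wraps past midnight, e.g. 23:00-02:00.
        return !now.isBefore(start) || now.isBefore(end);
    }

    /** Command a cron entry could run at the window start: stop the job with a savepoint. */
    static List<String> stopCommand(String jobId, String savepointDir) {
        return Arrays.asList("flink", "stop", "--savepointPath", savepointDir, jobId);
    }

    /** Command a cron entry could run at the window end: resubmit the job from that savepoint. */
    static List<String> resumeCommand(String savepointPath, String jobJar) {
        return Arrays.asList("flink", "run", "--detached", "--fromSavepoint", savepointPath, jobJar);
    }

    public static void main(String[] args) {
        MaintenanceWindow indexing = new MaintenanceWindow(LocalTime.of(23, 0), LocalTime.of(2, 0));
        System.out.println(indexing.contains(LocalTime.of(0, 30)));
        System.out.println(String.join(" ", stopCommand("<jobId>", "/savepoints")));
        System.out.println(String.join(" ", resumeCommand("<savepointPath>", "job.jar")));
    }
}
```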


Re: Task manager shutting down.

2022-05-04 Thread Martijn Visser
Hi John,

In an ideal scenario you would be able to leverage Flink's backpressure
mechanism. That would effectively slow down the processing until the reason
for backpressure has been resolved. However, given that indexing happens
after you've written your result to the sink, from Flink's perspective the
action is completed. Perhaps someone else has a different idea on how to
achieve this.

Best regards,

Martijn


Re: Task manager shutting down.

2022-05-04 Thread John Smith
So I know specifically that it's the indexing, and I added setQueryTimeout. So
the job fails and goes into retry. That's fine.
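The setQueryTimeout workaround can be sketched with the standard JDBC `Statement.setQueryTimeout(int seconds)` call, where 0 means "no limit" and an exceeded limit is reported as an `SQLTimeoutException`. The helper below is hypothetical: it assumes the "checkpoint threshold" is Flink's checkpoint timeout (`execution.checkpointing.timeout`, 10 minutes by default) and derives a query timeout a fixed margin shorter. Whether a particular driver can abort a statement blocked in a socket read is driver-specific and worth verifying for the SQL Server driver.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.time.Duration;

/** Hypothetical helper: bound each JDBC statement so it fails before the checkpoint times out. */
public final class QueryTimeouts {

    private QueryTimeouts() {}

    /** Query timeout in whole seconds, `margin` shorter than the checkpoint timeout. */
    static int queryTimeoutSeconds(Duration checkpointTimeout, Duration margin) {
        long seconds = checkpointTimeout.minus(margin).getSeconds();
        // setQueryTimeout(0) means "no limit", so never go below 1 second.
        return (int) Math.max(1L, Math.min(seconds, Integer.MAX_VALUE));
    }

    /** Applies the derived timeout, e.g. in a custom output format's flush() before execute(). */
    static void applyTimeout(PreparedStatement statement, Duration checkpointTimeout, Duration margin)
            throws SQLException {
        statement.setQueryTimeout(queryTimeoutSeconds(checkpointTimeout, margin));
    }

    public static void main(String[] args) {
        // 10-minute checkpoint timeout with a 30-second safety margin.
        System.out.println(queryTimeoutSeconds(Duration.ofMinutes(10), Duration.ofSeconds(30)));
    }
}
```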

But I'm just wondering: is there a way to pause the stream at a specified
time/checkpoint and then resume after a specified time?


Re: Task manager shutting down.

2022-05-04 Thread Martijn Visser
Hi John,

It is generic, but each database has its own dialect implementation because
they all have their differences, unfortunately :)
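To illustrate the point about dialects: the connector logic can stay generic while small per-database pieces, such as identifier quoting, differ. The sketch below is hypothetical and is not Flink's `JdbcDialect` interface; it only shows one INSERT builder producing different SQL for SQL Server, MySQL, and PostgreSQL.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.stream.Collectors;

/** Hypothetical per-database dialects behind one generic INSERT builder; not Flink's JdbcDialect. */
public final class Dialects {

    /** The only database-specific piece in this sketch: how identifiers are quoted. */
    @FunctionalInterface
    interface SqlDialect {
        String quoteIdentifier(String identifier);

        /** Generic part shared by every database (does not escape embedded quote characters). */
        default String insertStatement(String table, String... columns) {
            String quotedColumns = Arrays.stream(columns)
                    .map(this::quoteIdentifier)
                    .collect(Collectors.joining(", "));
            String placeholders = String.join(", ", Collections.nCopies(columns.length, "?"));
            return "INSERT INTO " + quoteIdentifier(table)
                    + " (" + quotedColumns + ") VALUES (" + placeholders + ")";
        }
    }

    static final SqlDialect SQL_SERVER = id -> "[" + id + "]";
    static final SqlDialect MYSQL = id -> "`" + id + "`";
    static final SqlDialect POSTGRES = id -> "\"" + id + "\"";

    private Dialects() {}

    public static void main(String[] args) {
        System.out.println(SQL_SERVER.insertStatement("orders", "id", "amount"));
        System.out.println(MYSQL.insertStatement("orders", "id", "amount"));
        System.out.println(POSTGRES.insertStatement("orders", "id", "amount"));
    }
}
```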

I wish I knew how I could help you out here. Perhaps some of the JDBC
maintainers could chip in.

Best regards,

Martijn


Re: Task manager shutting down.

2022-04-30 Thread John Smith
Plus, isn't the flink-jdbc connector kind of generic in a way? At least the
older one didn't seem to be server-specific.


Re: Task manager shutting down.

2022-04-30 Thread John Smith
Hi Martijn, is there anything I need to check for?



Re: Task manager shutting down.

2022-04-26 Thread John Smith
Yeah, it's based on the Flink JDBC output format...




Re: Task manager shutting down.

2022-04-26 Thread Martijn Visser
Hi John,

Have you built your own JDBC MSSQL source or sink, or perhaps a CDC driver?
I'm not aware of a Flink Microsoft SQL Server JDBC driver.

Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82
https://github.com/MartijnVisser




Task manager shutting down.

2022-04-26 Thread John Smith
Hi, I'm running Flink 1.14.4.

Logs included:
https://www.dropbox.com/s/8zjndt5rzd9o80f/flink-flink-taskexecutor-138-task-0002.log?dl=0

1- My task managers shut down with: Terminating TaskManagerRunner with exit
code 1.
2- It seems to happen at the same time every day, which leads me to believe
it's caused by our database indexing (see below for my reasoning).
3- Most of our jobs are ETL from Kafka to SQL Server.
4- We see the following exceptions in the logs:
  - Task 'Sink: jdbc (1/1)#10' did not react to cancelling signal -
interrupting; it is stuck for 30 seconds in method:
... com.microsoft.sqlserver.jdbc.TDSChannel ...
  - Sink: jdbc (1/1)#9 (3aaf6d8a45df6c43198bc8297b42354c) switched from
RUNNING to FAILED with failure cause: org.apache.flink.util.FlinkException:
Disconnect from JobManager responsible for ...
5- We are also seeing this: Failed to close consumer network client with type
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient
java.lang.NoClassDefFoundError:
org/apache/kafka/common/network/Selector$CloseMode

So my guess is that the indexing blocks the job, the task manager cannot
cleanly cancel it, and after a while the task manager decides to shut down
completely?
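The escalation described above matches Flink's task cancellation watchdog. The task thread is interrupted periodically (the "stuck for 30 seconds" messages correspond to `task.cancellation.interval`, default 30 s). If the thread still has not exited after `task.cancellation.timeout` (default 180 s), the TaskManager is shut down with a fatal error. The plain-Java sketch below is a simplified model of that behaviour, not Flink's actual `Task` code, and the class and method names are hypothetical. It shows why the timeout is reached: a thread that does not react to `Thread.interrupt()` never exits no matter how often it is interrupted. A socket read inside a JDBC driver is one example of a call that may not react to an interrupt.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of a cancellation watchdog (not Flink code).
final class CancellationWatchdogSketch {

    /**
     * Interrupts the task thread every intervalMs until it exits or timeoutMs
     * has elapsed. Returns true if the thread exited in time, false if it is
     * still alive at the deadline (the point where Flink reports
     * "Task did not exit gracefully" and shuts the TaskManager down).
     */
    static boolean cancelWithWatchdog(Thread task, long intervalMs, long timeoutMs) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("intervalMs must be positive");
        }
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (task.isAlive()) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                return false; // still running after the timeout: give up
            }
            task.interrupt(); // "did not react to cancelling signal - interrupting"
            try {
                task.join(Math.min(intervalMs, remainingMs)); // wait up to one interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the caller's interrupt status
                return !task.isAlive();
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Honours interrupts: sleep() throws and the thread exits promptly.
        Thread cooperative = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // exit on cancellation
            }
        });
        // Swallows interrupts, standing in for a call that ignores Thread.interrupt().
        Thread stubborn = new Thread(() -> {
            long end = System.currentTimeMillis() + 60_000;
            while (System.currentTimeMillis() < end) {
                try {
                    Thread.sleep(10);
                } catch (InterruptedException ignored) {
                    // keep "working" as if nothing happened
                }
            }
        });
        stubborn.setDaemon(true); // let the JVM exit even though this thread keeps running
        cooperative.start();
        stubborn.start();

        System.out.println("cooperative exited in time: " + cancelWithWatchdog(cooperative, 50, 300));
        System.out.println("stubborn exited in time: " + cancelWithWatchdog(stubborn, 50, 300));
    }
}
```

Given this model, raising `task.cancellation.timeout` only buys more time. The more direct lever is to bound the blocking call itself, for example with the standard `java.sql.Statement.setQueryTimeout(int seconds)`, so the driver aborts a long-running statement before the watchdog gives up.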

Is there a way to pause the stream and restart it later, given that this
always happens at the same wall-clock time? Or maybe allow the JDBC sink to
shut down cleanly with a timeout?
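One possible workaround for the pause question assumes the index rebuild runs on a fixed schedule. An external scheduler could stop the job with a savepoint shortly before the rebuild and resume it from that savepoint afterwards. The sketch below covers only the scheduling logic: it decides whether "now" falls inside a daily maintenance window and how long remains until the window closes. The class name and the 02:00-03:00 window are hypothetical, and wiring it to Flink's stop-with-savepoint and job submission is left out.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;

/** Hypothetical daily window during which the database rebuilds its indexes. */
final class MaintenanceWindow {
    private final LocalTime start;
    private final LocalTime end;

    MaintenanceWindow(LocalTime start, LocalTime end) {
        this.start = start;
        this.end = end;
    }

    /** True if t is in [start, end); a window such as 23:00-01:00 wraps past midnight. */
    boolean contains(LocalTime t) {
        if (start.isBefore(end)) {
            return !t.isBefore(start) && t.isBefore(end);
        }
        // start >= end: the window crosses midnight (start == end means all day)
        return !t.isBefore(start) || t.isBefore(end);
    }

    /** Time until the window closes, or Duration.ZERO when now is outside it. */
    Duration remaining(LocalDateTime now) {
        if (!contains(now.toLocalTime())) {
            return Duration.ZERO;
        }
        LocalDateTime close = now.toLocalDate().atTime(end);
        if (!close.isAfter(now)) {
            close = close.plusDays(1); // the window closes tomorrow
        }
        return Duration.between(now, close);
    }

    public static void main(String[] args) {
        // Hypothetical 02:00-03:00 index rebuild; substitute the real schedule.
        MaintenanceWindow window = new MaintenanceWindow(LocalTime.of(2, 0), LocalTime.of(3, 0));
        LocalDateTime now = LocalDateTime.now();
        if (window.contains(now.toLocalTime())) {
            System.out.println("Inside maintenance window; resume in " + window.remaining(now));
        } else {
            System.out.println("Outside maintenance window; safe to write");
        }
    }
}
```

Stopping with a savepoint, rather than cancelling, lets the Kafka source resume from the offsets recorded in the savepoint. This avoids relying on the sink being cancelled mid-insert, which is what triggers the watchdog in the first place.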