flink cdc 在做全量同步时Lock wait timeout

2021-03-21 Thread 王敏超
2021-03-22 09:33:19.554 [flink-akka.actor.default-dispatcher-4] INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Flink Streaming
Job (e0f59fb577ea3d275451163cf5cc479b) switched from state RUNNING to
FAILING.
org.apache.flink.runtime.JobException: Recovery is suppressed by
FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=1,
backoffTimeMS=1)
at
org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:116)
~[flink-dist_2.11-1.11.0.jar:1.11.0]
at
org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:78)
~[flink-dist_2.11-1.11.0.jar:1.11.0]
at
org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:192)
~[flink-dist_2.11-1.11.0.jar:1.11.0]
at
org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:185)
~[flink-dist_2.11-1.11.0.jar:1.11.0]
at
org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:179)
~[flink-dist_2.11-1.11.0.jar:1.11.0]
at
org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:503)
~[flink-dist_2.11-1.11.0.jar:1.11.0]
at
org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:392)
~[flink-dist_2.11-1.11.0.jar:1.11.0]
at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source) ~[?:?]
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
~[?:1.8.0_232]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_232]
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:284)
~[flink-dist_2.11-1.11.0.jar:1.11.0]
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:199)
~[flink-dist_2.11-1.11.0.jar:1.11.0]
at
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
~[flink-dist_2.11-1.11.0.jar:1.11.0]
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
~[flink-dist_2.11-1.11.0.jar:1.11.0]
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
[flink-dist_2.11-1.11.0.jar:1.11.0]
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
[flink-dist_2.11-1.11.0.jar:1.11.0]
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
[flink-dist_2.11-1.11.0.jar:1.11.0]
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
[flink-dist_2.11-1.11.0.jar:1.11.0]
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
[flink-dist_2.11-1.11.0.jar:1.11.0]
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
[flink-dist_2.11-1.11.0.jar:1.11.0]
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
[flink-dist_2.11-1.11.0.jar:1.11.0]
at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
[flink-dist_2.11-1.11.0.jar:1.11.0]
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
[flink-dist_2.11-1.11.0.jar:1.11.0]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
[flink-dist_2.11-1.11.0.jar:1.11.0]
at akka.actor.ActorCell.invoke(ActorCell.scala:561)
[flink-dist_2.11-1.11.0.jar:1.11.0]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
[flink-dist_2.11-1.11.0.jar:1.11.0]
at akka.dispatch.Mailbox.run(Mailbox.scala:225)
[flink-dist_2.11-1.11.0.jar:1.11.0]
at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
[flink-dist_2.11-1.11.0.jar:1.11.0]
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
[flink-dist_2.11-1.11.0.jar:1.11.0]
at
akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
[flink-dist_2.11-1.11.0.jar:1.11.0]
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
[flink-dist_2.11-1.11.0.jar:1.11.0]
at
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[flink-dist_2.11-1.11.0.jar:1.11.0]
Caused by: org.apache.kafka.connect.errors.ConnectException: Lock wait
timeout exceeded; try restarting transaction Error code: 1205; SQLSTATE:
40001.
at 
io.debezium.connector.mysql.AbstractReader.wrap(AbstractReader.java:230)
~[1616376681494-flink-connector-mysql-cdc.jar:?]
at
io.debezium.connector.mysql.AbstractReader.failed(AbstractReader.java:207)
~[1616376681494-flink-connector-mysql-cdc.jar:?]
at
io.debezium.connector.mysql.SnapshotReader.execute(SnapshotReader.java:831)
~[1616376681494-flink-connector-mysql-cdc.jar:?]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
~[?:1.8.0_232]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo

Re: flink 使用yarn部署,报错:Maximum Memory: 8192 Requested: 10240MB. Please check the 'yarn.scheduler.maximum-allocation-mb' and the 'yarn.nodemanager.resource.memory-mb' configuration values

2021-03-21 Thread Xintong Song
报错信息里已经说明了:你的 Yarn 集群配置允许的最大 container 是 2g,而你的 flink 配置的 TM 大小是 10g。

Thank you~

Xintong Song



On Sat, Mar 20, 2021 at 7:52 PM william <712677...@qq.com> wrote:

> org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't
> deploy Yarn session cluster
> at
>
> org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:425)
> at
>
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:606)
> at
>
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$4(FlinkYarnSessionCli.java:860)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
>
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
> at
>
> org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
> at
>
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:860)
> Caused by:
> org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The
> cluster does not have the requested resources for the TaskManagers
> available!
> Maximum Memory: 8192 Requested: 10240MB. Please check the
> 'yarn.scheduler.maximum-allocation-mb' and the
> 'yarn.nodemanager.resource.memory-mb' configuration values
>
>
>
>
> --
> Sent from: http://apache-flink.147419.n8.nabble.com/
>


Re: flink 1.11.2 使用rocksdb时出现org.apache.flink.util.SerializedThrowable错误

2021-03-21 Thread yidan zhao
分享下原因呗。

hdxg1101300...@163.com  于2021年3月21日周日 上午12:36写道:

> 知道原因了
>
>
>
> hdxg1101300...@163.com
>
> 发件人: hdxg1101300...@163.com
> 发送时间: 2021-03-20 22:07
> 收件人: user-zh
> 主题: flink 1.11.2 使用rocksdb时出现org.apache.flink.util.SerializedThrowable错误
> 你好:
> 最近升级flink版本从flink 1.10.2 升级到flink.1.11.2;主要是考虑日志太大查看不方便的原因;
> 代码没有变动只是从1.10.2.编译为1.11.2 ,集群客户端版本升级到1.11.2;任务提交到yarn 使用per job方式;
> 之前时一个taskmanager一个slot,现在使用一个taskmanager 2个slot;程序运行一段时间(1个小时左右)后就会出现
> Caused by: org.apache.flink.util.SerializedThrowable
>
> org.apache.flink.runtime.checkpoint.CheckpointException: Could not
> complete snapshot 53 for operator Sink: 发送短信 (5/8). Failure reason:
> Checkpoint was declined.
> at
> org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:215)
> ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at
> org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:156)
> ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:314)
> ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointStreamOperator(SubtaskCheckpointCoordinatorImpl.java:614)
> ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.buildOperatorSnapshotFutures(SubtaskCheckpointCoordinatorImpl.java:540)
> ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.takeSnapshotSync(SubtaskCheckpointCoordinatorImpl.java:507)
> ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:266)
> ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$8(StreamTask.java:921)
> ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47)
> ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:911)
> ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:879)
> ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> org.apache.flink.streaming.runtime.io.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:113)
> ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> org.apache.flink.streaming.runtime.io.CheckpointBarrierAligner.processBarrier(CheckpointBarrierAligner.java:198)
> ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> org.apache.flink.streaming.runtime.io.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:93)
> ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:158)
> ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at 
> org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:67)
> ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:351)
> ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> at
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:191)
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:181)
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:566)
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:536)
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721)
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546)
> [flink-dist_2.11-1.11.2.jar:1.11.2]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
> Caused by: org.apache.flink.util.SerializedThrowable
> at com.com.functions.Transaction.firstPhase(Transaction.java:193)
> ~[dc_cbssbroadband-1.0.4.3.1-jar-with-dependencies.jar:?]
> at com.com.functions.TransactionData.flush(TransactionData.java:37)
> ~[dc_cbssbroadband-1.0.4.3.1-jar-with-dependencies.jar:?]
> at com.com.utils.TwoPhaseHttpSink.preCommit(TwoPhaseHttpSink.java:105)
> ~[dc_cbssbroadband-1.0.4.3.1-jar-with-dependencies.jar:?]
> at com.com.utils.TwoPhaseHttpSink.preCommit(TwoPhaseHttpSink.java:39)
> ~[dc_cbssbroadband-1.0.4.3.1-jar-with-dependencies.jar:?]
> at
> org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.snapshotState(TwoPhaseCommitSinkFunction.java:321)
> ~[flink-dist_2.11-1.11.2.jar:1.11.2]
> 

Flink job manager HA 是否可以像 Hadoop Name Node 一样手动重启?

2021-03-21 Thread macdoor
Flink job manager HA 是否可以像 Hadoop Name Node 一样手动重启,同时保证集群正常运行?

我发现  job manager 占用内存似乎总是在缓慢不断增长,Hadoop Name Node 也有这个问题,我通过隔一段时间轮动重启Hadoop
Name Node 解决这个问题,在HA模式下Flink job manager 是否可以轮动重启?





--
Sent from: http://apache-flink.147419.n8.nabble.com/

flink1.11.2集群出现了3种连接拒绝,导致任务失败

2021-03-21 Thread chaiyi
你好:
最近建立一个3台机子的flink集群,版本是 zk-3.6.2 + hadoop-3.3.0 + 
flink-1.11.2。3台机制是在同一个物理机上建立的虚拟机,应该来说不会出现网络波动导致的网络拒绝,但是为什么一直会出现网络拒绝
项目在运行一段时间以后,短则几个小时,长则3到5天,任务就会挂掉,一共出现了一下3种异常,全是网络连接方法的,请帮忙看看,是不是flink网络配置方面有问题。
1. 集群之间通信连接拒绝:
2021-03-03 08:50:42,851 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - 
Window(ProcessingTimeSessionWindows(9), ProcessingTimeTrigger, 
FlightTrackAggregate, FlightTrackSectorResult) -> Sink: Unnamed (4/4) 
(3097c00c09b475b35c23782a3b4a8eaa) switched from RUNNING to FAILED on 
org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@4df09503.
org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: 
readAddress(..) failed: Connection reset by peer (connection to 
'10.100.1.222/10.100.1.222:43156')
at 
org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.exceptionCaught(CreditBasedPartitionRequestClientHandler.java:173)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:297)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:276)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:268)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.exceptionCaught(DefaultChannelPipeline.java:1388)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:297)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:276)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:918)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.handleReadException(AbstractEpollStreamChannel.java:730)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:820)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:424)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:326)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_152]
Caused by: 
org.apache.flink.shaded.netty4.io.netty.channel.unix.Errors$NativeIoException: 
readAddress(..) failed: Connection reset by peer


2. 连接到ZK的请求失败,
2021-03-02 20:27:13,487 INFO  
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] - Unable 
to read additional data from server sessionid 0x30018710a580007, likely server 
has closed socket, closing socket connection and attempting reconnect
2021-03-02 20:27:13,588 INFO  
org.apache.flink.shaded.curator4.org.apache.curator.framework.state.ConnectionStateManager
 [] - State change: SUSPENDED
2021-03-02 20:27:13,590 WARN  
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService [] - 
Connection to ZooKeeper suspended. The contender LeaderContender: 
DefaultDispatcherRunner no longer participates in the leader election.
2021-03-02 20:27:13,591 WARN  
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService [] - 
Connection to ZooKeeper suspended. Can no longer retrieve the leader from 
ZooKeeper.
2021-03-02 20:27:13,591 WARN  
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService [] - 
Connection to ZooKeeper suspended. The contender LeaderContender: 
StandaloneResourceManager no longer participates in the leader election.
2021-03-02 20:27:13,591 WARN  
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService [] - 
Connection to ZooKeeper suspended. The contender http://flink-02:8081 no longer 
participates in the leader election.
2021-03-02 20:27:13,591 WARN  
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService [] - 
Connection to ZooKeeper suspended. Can no longer retrieve the leader from 
ZooKeeper.
2021-03-02 20:27:13,830 WARN  
org.apache.flink.shade

flink1.11.2集群出现了3种连接拒绝,导致任务失败

2021-03-21 Thread chaiyi
你好:
最近建立一个3台机子的flink集群,版本是 zk-3.6.2 + hadoop-3.3.0 + 
flink-1.11.2。3台机制是在同一个物理机上建立的虚拟机,应该来说不会出现网络波动导致的网络拒绝,但是为什么一直会出现网络拒绝
项目在运行一段时间以后,短则几个小时,长则3到5天,任务就会挂掉,一共出现了一下3种异常,全是网络连接方法的,请帮忙看看,是不是flink网络配置方面有问题。
1. 集群之间通信连接拒绝:
2021-03-03 08:50:42,851 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - 
Window(ProcessingTimeSessionWindows(9), ProcessingTimeTrigger, 
FlightTrackAggregate, FlightTrackSectorResult) -> Sink: Unnamed (4/4) 
(3097c00c09b475b35c23782a3b4a8eaa) switched from RUNNING to FAILED on 
org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@4df09503.
org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: 
readAddress(..) failed: Connection reset by peer (connection to 
'10.100.1.222/10.100.1.222:43156')
at 
org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.exceptionCaught(CreditBasedPartitionRequestClientHandler.java:173)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:297)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:276)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:268)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.exceptionCaught(DefaultChannelPipeline.java:1388)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:297)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:276)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:918)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.handleReadException(AbstractEpollStreamChannel.java:730)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:820)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:424)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:326)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at 
org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
 ~[flink-dist_2.11-1.11.2.jar:1.11.2]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_152]
Caused by: 
org.apache.flink.shaded.netty4.io.netty.channel.unix.Errors$NativeIoException: 
readAddress(..) failed: Connection reset by peer


2. 连接到ZK的请求失败,
2021-03-02 20:27:13,487 INFO  
org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] - Unable 
to read additional data from server sessionid 0x30018710a580007, likely server 
has closed socket, closing socket connection and attempting reconnect
2021-03-02 20:27:13,588 INFO  
org.apache.flink.shaded.curator4.org.apache.curator.framework.state.ConnectionStateManager
 [] - State change: SUSPENDED
2021-03-02 20:27:13,590 WARN  
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService [] - 
Connection to ZooKeeper suspended. The contender LeaderContender: 
DefaultDispatcherRunner no longer participates in the leader election.
2021-03-02 20:27:13,591 WARN  
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService [] - 
Connection to ZooKeeper suspended. Can no longer retrieve the leader from 
ZooKeeper.
2021-03-02 20:27:13,591 WARN  
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService [] - 
Connection to ZooKeeper suspended. The contender LeaderContender: 
StandaloneResourceManager no longer participates in the leader election.
2021-03-02 20:27:13,591 WARN  
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService [] - 
Connection to ZooKeeper suspended. The contender http://flink-02:8081 no longer 
participates in the leader election.
2021-03-02 20:27:13,591 WARN  
org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService [] - 
Connection to ZooKeeper suspended. Can no longer retrieve the leader from 
ZooKeeper.
2021-03-02 20:27:13,830 WARN  
org.apache.flink.shade

Re: Flink 1.12.0 隔几个小时Checkpoint就会失败

2021-03-21 Thread Congxian Qiu
从日志看 checkpoint 超时了,可以尝试看一下是哪个算子的哪个并发没有做完 checkpoint,可以看看这篇文章[1] 能否帮助你

[1] https://www.infoq.cn/article/g8ylv3i2akmmzgccz8ku
Best,
Congxian


Frost Wong  于2021年3月18日周四 下午12:28写道:

> 哦哦,我看到了有个
>
> setTolerableCheckpointFailureNumber
>
> 之前不知道有这个方法,倒是可以试一下,不过我就是不太理解为什么会失败,也没有任何报错
> 
> 发件人: yidan zhao 
> 发送时间: 2021年3月18日 3:47
> 收件人: user-zh 
> 主题: Re: Flink 1.12.0 隔几个小时Checkpoint就会失败
>
> 设置下检查点失败不影响任务呀,你这貌似还导致任务重启了?
>
> Frost Wong  于2021年3月18日周四 上午10:38写道:
>
> > Hi 大家好
> >
> > 我用的Flink on yarn模式运行的一个任务,每隔几个小时就会出现一次错误
> >
> > 2021-03-18 08:52:37,019 INFO
> > org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] -
> Completed
> > checkpoint 661818 for job 4fa72fc414f53e5ee062f9fbd5a2f4d5 (562357 bytes
> in
> > 4699 ms).
> > 2021-03-18 08:52:37,637 INFO
> > org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] -
> > Triggering checkpoint 661819 (type=CHECKPOINT) @ 1616028757520 for job
> > 4fa72fc414f53e5ee062f9fbd5a2f4d5.
> > 2021-03-18 08:52:42,956 INFO
> > org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] -
> Completed
> > checkpoint 661819 for job 4fa72fc414f53e5ee062f9fbd5a2f4d5 (2233389 bytes
> > in 4939 ms).
> > 2021-03-18 08:52:43,528 INFO
> > org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] -
> > Triggering checkpoint 661820 (type=CHECKPOINT) @ 1616028763457 for job
> > 4fa72fc414f53e5ee062f9fbd5a2f4d5.
> > 2021-03-18 09:12:43,528 INFO
> > org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] -
> > Checkpoint 661820 of job 4fa72fc414f53e5ee062f9fbd5a2f4d5 expired before
> > completing.
> > 2021-03-18 09:12:43,615 INFO
> > org.apache.flink.runtime.jobmaster.JobMaster [] - Trying
> to
> > recover from a global failure.
> > org.apache.flink.util.FlinkRuntimeException: Exceeded checkpoint
> tolerable
> > failure threshold.
> > at
> >
> org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleCheckpointException(CheckpointFailureManager.java:90)
> > ~[flink-dist_2.12-1.12.0.jar:1.12.0]
> > at
> >
> org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleJobLevelCheckpointException(CheckpointFailureManager.java:65)
> > ~[flink-dist_2.12-1.12.0.jar:1.12.0]
> > at
> >
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:1760)
> > ~[flink-dist_2.12-1.12.0.jar:1.12.0]
> > at
> >
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:1733)
> > ~[flink-dist_2.12-1.12.0.jar:1.12.0]
> > at
> >
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.access$600(CheckpointCoordinator.java:93)
> > ~[flink-dist_2.12-1.12.0.jar:1.12.0]
> > at
> >
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator$CheckpointCanceller.run(CheckpointCoordinator.java:1870)
> > ~[flink-dist_2.12-1.12.0.jar:1.12.0]
> > at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> > ~[?:1.8.0_231]
> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ~[?:1.8.0_231]
> > at
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> > ~[?:1.8.0_231]
> > at
> >
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> > ~[?:1.8.0_231]
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > ~[?:1.8.0_231]
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > ~[?:1.8.0_231]
> > at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_231]
> > 2021-03-18 09:12:43,618 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - Job
> > csmonitor_comment_strategy (4fa72fc414f53e5ee062f9fbd5a2f4d5) switched
> from
> > state RUNNING to RESTARTING.
> > 2021-03-18 09:12:43,619 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - Flat
> Map
> > (43/256) (18dec1f23b95f741f5266594621971d5) switched from RUNNING to
> > CANCELING.
> > 2021-03-18 09:12:43,622 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - Flat
> Map
> > (44/256) (3f2ec60b2f3042ceea6e1d660c78d3d7) switched from RUNNING to
> > CANCELING.
> > 2021-03-18 09:12:43,622 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - Flat
> Map
> > (45/256) (66d411c2266ab025b69196dfec30d888) switched from RUNNING to
> > CANCELING.
> > 然后就自己恢复了。用的是Unaligned
> >
> Checkpoint,rocksdb存储后端,在这个错误前后也没有什么其他报错信息。从Checkpoint的metrics看,总是剩最后一个无法完成,调整过parallelism也无法解决问题。
> >
> > 谢谢大家!
> >
>


flink1.12 Standalone模式发送python脚本任务报错: java.lang.ClassNotFoundException: org.apache.flink.connector.jdbc.table.JdbcRowDataInputFormat

2021-03-21 Thread xiaoyue
flink1.12.2 部署standalone集群模式,任务是pyflink实现链接Mysql数据库完成计算任务:

1. 已在 /user/local/flink-1.12.2/lib目录下,添加相关依赖:

mysql-connector-java-8.0.12.jar,

flink-connector-jdbc_2.11-1.12.0.jar,

flink-table-api-java-1.12.0.jar

2.发送任务命令:

   bin/flink run  -py ../test.py -p 8

3.附报错信息如下;在线等路过部署过的大佬,指点一下~ 谢谢!

Traceback (most recent call last):

  File "../test.py", line 57, in 

env.execute('Test')

  File 
"/usr/local/env/flink1.12_py3_env/flink-1.12.2/opt/python/pyflink.zip/pyflink/table/table_environment.py",
 line 1276, in execute

  File 
"/usr/local/env/flink1.12_py3_env/flink-1.12.2/opt/python/py4j-0.10.8.1-src.zip/py4j/java_gateway.py",
 line 1286, in __call__

  File 
"/usr/local/env/flink1.12_py3_env/flink-1.12.2/opt/python/pyflink.zip/pyflink/util/exceptions.py",
 line 147, in deco

  File 
"/usr/local/env/flink1.12_py3_env/flink-1.12.2/opt/python/py4j-0.10.8.1-src.zip/py4j/protocol.py",
 line 328, in get_return_value

py4j.protocol.Py4JJavaError: An error occurred while calling o5.execute.

: org.apache.flink.util.FlinkException: Failed to execute job 
'Pyflink1.12_Query_Time_Test'.

at 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1918)

at 
org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:135)

at 
org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:76)

at 
org.apache.flink.table.planner.delegation.ExecutorBase.execute(ExecutorBase.java:50)

at 
org.apache.flink.table.api.internal.TableEnvironmentImpl.execute(TableEnvironmentImpl.java:1277)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at 
org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)

at 
org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)

at 
org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)

at 
org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)

at 
org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)

at 
org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)

at java.lang.Thread.run(Thread.java:748)

Caused by: java.lang.RuntimeException: 
org.apache.flink.runtime.client.JobInitializationException: Could not 
instantiate JobManager.

at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:316)

at 
org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:75)

at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)

at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)

at 
java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:457)

at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)

at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)

at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)

at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

Caused by: org.apache.flink.runtime.client.JobInitializationException: Could 
not instantiate JobManager.

at 
org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:494)

at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)

Caused by: org.apache.flink.runtime.client.JobExecutionException: Cannot 
initialize task 'Source: TableSourceScan(table=[[default_catalog, 
default_database, TP_GL_DAY, project=[DAY_ID]]], fields=[DAY_ID])': Loading the 
input/output formats failed: 

at 
org.apache.flink.runtime.executiongraph.ExecutionGraphBuilder.buildGraph(ExecutionGraphBuilder.java:239)

at 
org.apache.flink.runtime.scheduler.SchedulerBase.createExecutionGraph(SchedulerBase.java:322)

at 
org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:276)

at 
org.apache.flink.runtime.scheduler.SchedulerBase.(SchedulerBase.java:249)

at 
org.apache.flink.runtime.scheduler.DefaultScheduler.(DefaultScheduler.java:133)

at 
org.apache.flink.runtime

Re:flink 使用yarn部署,报错:Maximum Memory: 8192 Requested: 10240MB. Please check the 'yarn.scheduler.maximum-allocation-mb' and the 'yarn.nodemanager.resource.memory-mb' configuration values

2021-03-21 Thread Michael Ran
超过了 yarn 容器 配置吧
At 2021-03-20 10:57:23, "william" <712677...@qq.com> wrote:
>org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't
>deploy Yarn session cluster
>at
>org.apache.flink.yarn.YarnClusterDescriptor.deploySessionCluster(YarnClusterDescriptor.java:425)
>at
>org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:606)
>at
>org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$4(FlinkYarnSessionCli.java:860)
>at java.security.AccessController.doPrivileged(Native Method)
>at javax.security.auth.Subject.doAs(Subject.java:422)
>at
>org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
>at
>org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>at
>org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:860)
>Caused by:
>org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The
>cluster does not have the requested resources for the TaskManagers
>available!
>Maximum Memory: 8192 Requested: 10240MB. Please check the
>'yarn.scheduler.maximum-allocation-mb' and the
>'yarn.nodemanager.resource.memory-mb' configuration values
>
>
>
>
>--
>Sent from: http://apache-flink.147419.n8.nabble.com/