Re: flink1.10.1/1.11.1 使用sql 进行group 和 时间窗口 操作后 状态越来越大
在测试环境: 关闭增量chk,全量的state大小大约在:100M左右; 之前开启:我观察了一段时间,膨胀到5G,而且还一直在增长; sql: select sum(xx) group by 1 分钟窗口 过期时间设置的为:5-30min -- Sent from: http://apache-flink.147419.n8.nabble.com/
Re: flink1.10.1/1.11.1 使用sql 进行group 和 时间窗口 操作后 状态越来越大
唐云大佬好, 我关闭了chk的增量模式之后,chkstate确实不会再无线膨胀了。这个是我配置的不准确,还是一个已知问题呢 -- Sent from: http://apache-flink.147419.n8.nabble.com/
Re: flink1.10.1/1.11.1 使用sql 进行group 和 时间窗口 操作后 状态越来越大
"计算机的解决方案是 出现问题,大都是保护现场,等问题解决后,释放现场 " 那么解决问题的方法是?生产上state还在不断膨胀。 简单一个问题,生产上发生OOM了,短时间内无法排查出原因,请问如何处理? -- Sent from: http://apache-flink.147419.n8.nabble.com/
Re: 执行mvn构建错误
看了下找不到的包是example相关的包(不影响core相关代码),可以从别处下载下来,然后添加到本地maven库内。 -- Sent from: http://apache-flink.147419.n8.nabble.com/
Re: 请教一下flink1.12可以指定时间清除state吗?
增加一个定时器,可以指定时间做清理动作。可以参考flink中window trigger的相关代码和实现。 -- Sent from: http://apache-flink.147419.n8.nabble.com/
Flink1.10.1代码自定义rocksdb backend webui中还是yaml显示默认的配置
flink1.10.1 在代码中通过如下方式指定backend streamEnv.setStateBackend(new RocksDBStateBackend("hdfs:///user/flink/flink-checkpoints", false)); 问题:flink web ui jm中显示的依然为yaml中默认配置 实际上面代码中的配置已经生效 -- Sent from: http://apache-flink.147419.n8.nabble.com/
Re: flink1.10.1/1.11.1 使用sql 进行group 和 时间窗口 操作后 状态越来越大
state.backend.incremental 出现问题的时候增量模式是开启的吗? -- Sent from: http://apache-flink.147419.n8.nabble.com/
Re: Re:Re: flink sql作业state size一直增加
mini batch默认为false 。题主问题找到了吗 -- Sent from: http://apache-flink.147419.n8.nabble.com/
Re: 使用sql时候,设置了idle过期时间,但是状态还是一直变大
flink 1.10.1 同样遇到这个问题 设置了ttl但是没有生效,请问题主解决该问题了吗? *sql*: select * from xx group by TUMBLE(monitor_processtime, INTERVAL '60' SECOND),topic_identity *60s的窗口,设置的过期时间是2分钟,但是checkpoint中状态还是在变大* *tEnv.getConfig().setIdleStateRetentionTime(Time.minutes(2), Time.minutes(5)); * -- Sent from: http://apache-flink.147419.n8.nabble.com/
Re: flink1.11日志上报
我们也是用的kafkaappender进行日志上报,然后在ES中提供日志检索 -- Sent from: http://apache-flink.147419.n8.nabble.com/
Re: Re:Re: Flink 1.10.1 checkpoint失败问题
非常感谢。 后续我关注下这个问题,有结论反馈给大家,供参考。 -- Sent from: http://apache-flink.147419.n8.nabble.com/
Re: Flink 1.10.1 checkpoint失败问题
flink版本:Flink1.10.1 部署方式:flink on yarn hadoop版本:cdh5.15.2-2.6.0 现状:Checkpoint CountsTriggered: 9339In Progress: 0Completed: 8439Failed: 900Restored: 7 错误信息: ava.lang.Exception: Could not perform checkpoint 1194 for operator Map (3/3). at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:816) at org.apache.flink.streaming.runtime.io.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:86) at org.apache.flink.streaming.runtime.io.CheckpointBarrierTracker.processBarrier(CheckpointBarrierTracker.java:99) at org.apache.flink.streaming.runtime.io.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:155) at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:133) at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:69) at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:310) at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:187) at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:485) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:469) at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:708) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:533) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1382) at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:974) at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$5(StreamTask.java:870) at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:94) at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:843) at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:803) ... 12 more 同样的程序在11.2的版本上,chk是完全正常的。 -- Sent from: http://apache-flink.147419.n8.nabble.com/
Re: Flink 1.10.1 checkpoint失败问题
尝试了将jdk升级到了261,报错依然还有。 -- Sent from: http://apache-flink.147419.n8.nabble.com/
Re: Flink 1.10.1 checkpoint失败问题
谢谢 我看了那个issue,有问题的是jdk 1.8_060版本的,我们用的是074版本的。 我测试环境尝试升级一下jdk到251版本。 -- Sent from: http://apache-flink.147419.n8.nabble.com/
Flink 1.10.1 checkpoint失败问题
各位好,checkpoint相关问题L flink版本1.10.1:,个别的checkpoint过程发生问题: java.lang.Exception: Could not perform checkpoint 1194 for operator Map (3/3). at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:816) at org.apache.flink.streaming.runtime.io.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:86) at org.apache.flink.streaming.runtime.io.CheckpointBarrierTracker.processBarrier(CheckpointBarrierTracker.java:99) at org.apache.flink.streaming.runtime.io.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:155) at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:133) at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:69) at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:310) at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:187) at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:485) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:469) at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:708) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:533) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1382) at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:974) at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$5(StreamTask.java:870) at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:94) at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:843) at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:803) ... 12 mor 绝大部分是正常完成的,但是小部分比如上面的情况,就会失败,还会导致suspending-->restart. -- Sent from: http://apache-flink.147419.n8.nabble.com/
Re: Flink 1.10.1 cdh-hdfs启用HA时,flink job提交报错
问题找到了; hdfs-site.xml配置文件冲突导致的。 原因:通过-yt上传了 外部集群的hdfs-site.xml文件。 flink10初始化taskmanager读取 hdfs-site.xml配置的时候被外部的hdfs-site.xml文件干扰。 此问题终结。 -- Sent from: http://apache-flink.147419.n8.nabble.com/
Re: flink-1.11 sql写ES6问题
可能是jar包冲突。 -- Sent from: http://apache-flink.147419.n8.nabble.com/
Re: Re:Flink 1.10.1 cdh-hdfs启用HA时,flink job提交报错
现在的配置是这样的,没有添加namenode+ip; jobmanager.archive.fs.dir: hdfs:///completed-jobs/ 需要改成: hdfs://nameservice2/completed-jobs/ 这样的吗? 感觉是创建fs的时候错了。看到这部分异常: createNonHAProxy -- Sent from: http://apache-flink.147419.n8.nabble.com/
Flink 1.10.1 cdh-hdfs启用HA时,flink job提交报错
各位老师好,在HDFS上开启HA的时候,向yarn提交任务的时候,遇到点问题。 cdh版本:5.15.2 hdfs版本:2.6.0 启动模式:flink-on-yarn 配置了HADOOP_CONF_DIR=/etc/hadoop/conf 命令: ./bin/flink run -m yarn-cluster -yt /yarn-conf -p 3 -ytm 2048 -ys 1 -ynm xxx /jars/flink10.jar xxx HDFS不启用HA的时候,能正常提交。 提交任务到yarn的时候,出现如下异常:nameservice2 是HA配置时候自定义的nameservice 2020-09-02 14:53:08,118 DEBUG org.apache.flink.yarn.YarnResourceManager - TM:remote keytab path obtained null 2020-09-02 14:53:08,119 DEBUG org.apache.flink.yarn.YarnResourceManager - TM:remote keytab principal obtained null 2020-09-02 14:53:08,119 DEBUG org.apache.flink.yarn.YarnResourceManager - TM:remote yarn conf path obtained null 2020-09-02 14:53:08,119 DEBUG org.apache.flink.yarn.YarnResourceManager - TM:remote krb5 path obtained null 2020-09-02 14:53:08,120 ERROR org.apache.flink.yarn.YarnResourceManager - Could not start TaskManager in container container_1598944802155_0042_01_06. java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice2 at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374) at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:312) at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:178) at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:665) at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:601) at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:148) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at org.apache.flink.yarn.Utils.createTaskExecutorContext(Utils.java:469) at org.apache.flink.yarn.YarnResourceManager.createTaskExecutorLaunchContext(YarnResourceManager.java:582) at org.apache.flink.yarn.YarnResourceManager.startTaskExecutorInContainer(YarnResourceManager.java:384) at java.lang.Iterable.forEach(Iterable.java:75) at org.apache.flink.yarn.YarnResourceManager.lambda$onContainersAllocated$1(YarnResourceManager.java:366) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195) at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) at akka.actor.Actor.aroundReceive(Actor.scala:517) at akka.actor.Actor.aroundReceive$(Actor.scala:515) at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) at akka.actor.ActorCell.invoke(ActorCell.scala:561) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) at akka.dispatch.Mailbox.run(Mailbox.scala:225) at akka.dispatch.Mailbox.exec(Mailbox.scala:235) at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused by: java.net.UnknownHostException: nameservice2 ... 41 more -- Sent from: http://apache-flink.147419.n8.nabble.com/
Flink 1.10.1 cdh-hdfs启用HA时,flink job提交报错
各位老师,好。 Flink on yarn 模式提交 hadoop:2.6.0 cdh 5.15.2 HADOOP_CONF_DIR=/etc/hadoop/conf 在cdh上开启hdfs的HA之后提交任务报错,不开启HA能正常提交任务。 启动方式: /bin/flink run -m yarn-cluster -yt /yarn-conf -p 3 -ytm 2048 -ys 1 -ynm xxx /jars/flink10.jar xx 报错信息r如下: 2020-09-02 14:53:08,118 DEBUG org.apache.flink.yarn.YarnResourceManager - TM:remote keytab path obtained null 2020-09-02 14:53:08,119 DEBUG org.apache.flink.yarn.YarnResourceManager - TM:remote keytab principal obtained null 2020-09-02 14:53:08,119 DEBUG org.apache.flink.yarn.YarnResourceManager - TM:remote yarn conf path obtained null 2020-09-02 14:53:08,119 DEBUG org.apache.flink.yarn.YarnResourceManager - TM:remote krb5 path obtained null 2020-09-02 14:53:08,120 ERROR org.apache.flink.yarn.YarnResourceManager - Could not start TaskManager in container container_1598944802155_0042_01_06. java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice2 at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374) at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:312) at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:178) at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:665) at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:601) at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:148) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at org.apache.flink.yarn.Utils.createTaskExecutorContext(Utils.java:469) at org.apache.flink.yarn.YarnResourceManager.createTaskExecutorLaunchContext(YarnResourceManager.java:582) at org.apache.flink.yarn.YarnResourceManager.startTaskExecutorInContainer(YarnResourceManager.java:384) at java.lang.Iterable.forEach(Iterable.java:75) at org.apache.flink.yarn.YarnResourceManager.lambda$onContainersAllocated$1(YarnResourceManager.java:366) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195) at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) at akka.actor.Actor.aroundReceive(Actor.scala:517) at akka.actor.Actor.aroundReceive$(Actor.scala:515) at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) at akka.actor.ActorCell.invoke(ActorCell.scala:561) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) at akka.dispatch.Mailbox.run(Mailbox.scala:225) at akka.dispatch.Mailbox.exec(Mailbox.scala:235) at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused by: java.net.UnknownHostException: nameservice2 ... 41 more -- Sent from: http://apache-flink.147419.n8.nabble.com/
ceshi
测试内容 -- Sent from: http://apache-flink.147419.n8.nabble.com/