Re: Flink 1.10 memory problem

2021-07-27, Ada Luna
In the end I found the root cause: no TTL was configured on the two-stream JOIN. The JOIN task's output buffer fills up, and Flink then sits in a zombie state, no longer consuming any data.
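
For anyone who hits the same thing: in Flink 1.10 the join state TTL can be set through the Table config's idle state retention time. A minimal sketch; the 12h/24h values are only illustrative, tune them to your data:

import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class JoinTtlExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        EnvironmentSettings settings =
                EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env, settings);

        // Without a TTL, a regular two-stream JOIN keeps every row in state forever,
        // so state grows without bound. With idle state retention, state untouched
        // for 12h becomes eligible for cleanup and is gone after 24h at the latest.
        tableEnv.getConfig().setIdleStateRetentionTime(Time.hours(12), Time.hours(24));

        // ... register the two source tables and submit the JOIN query here ...
    }
}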

Ada Luna wrote on Mon, Jul 19, 2021 at 7:06 PM:
>
> Could the async I/O ordered queue be filling up and wedging the operator?

Re: Flink 1.10 memory problem

2021-07-19, Ada Luna
Could the async I/O ordered queue be filling up and wedging the operator?
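
If it is the async I/O queue, the place to look is where the async stream is wired up: the queue capacity and the timeout are both parameters of orderedWait/unorderedWait. A sketch; DimLookup and the numbers are made up for illustration:

import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

public class AsyncQueueExample {

    /** Hypothetical enrichment; stands in for the real external lookup. */
    static class DimLookup extends RichAsyncFunction<String, String> {
        @Override
        public void asyncInvoke(String key, ResultFuture<String> resultFuture) {
            CompletableFuture
                    .supplyAsync(() -> key + "-enriched")
                    .thenAccept(v -> resultFuture.complete(Collections.singleton(v)));
        }
    }

    static DataStream<String> enrich(DataStream<String> input) {
        // orderedWait preserves input order, so a single future that never
        // completes blocks every entry queued behind it. The capacity (100 here)
        // bounds the in-flight queue; once it is full, processElement blocks in
        // AsyncWaitOperator.addAsyncBufferEntry, which is exactly the wait shown
        // in the thread dumps below. The 60s timeout fails stuck requests instead
        // of letting them hang forever.
        return AsyncDataStream.orderedWait(
                input, new DimLookup(), 60_000, TimeUnit.MILLISECONDS, 100);
    }
}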

Ada Luna wrote on Mon, Jul 19, 2021 at 2:02 PM:
>
> From the back-pressure metrics I can see that everything upstream of this async wait operator is severely back-pressured. Very likely the operator is deadlocked, stuck in an infinite loop, or something similar, but I don't yet know how to dig deeper.

Re: Flink 1.10 memory problem

2021-07-19, Ada Luna
From the back-pressure metrics I can see that everything upstream of this async wait operator is severely back-pressured. Very likely the operator is deadlocked, stuck in an infinite loop, or something similar, but I don't yet know how to dig deeper.
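
One generic way to dig deeper, independent of Flink: capture repeated thread dumps and check whether the task threads stay parked on the same monitor across dumps. This is plain JVM tooling (the programmatic twin of jstack), not a Flink API; a sketch:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpProbe {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // true, true: also resolve owned monitors and synchronizers
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            Thread.State state = info.getThreadState();
            if (state == Thread.State.WAITING || state == Thread.State.TIMED_WAITING) {
                System.out.println(info); // name, state and the top stack frames
            }
        }
        // findDeadlockedThreads() only reports JVM-level monitor deadlocks.
        // A job stalled on back pressure or a full async queue will NOT show up
        // here, which is why repeated dumps and reading the frames are still needed.
        long[] deadlocked = mx.findDeadlockedThreads();
        System.out.println("JVM-level deadlocks: "
                + (deadlocked == null ? "none" : deadlocked.length));
    }
}

Against that background, the two task threads below are both parked in Object.wait():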

"async wait operator -> (where: (=(CASE(>(ABS(Z), WaterRoseMax_5), 1,
0), 0)), select: (ID, STID, _UTF-16LE'' AS STCD, ITEM, VAL, UNIT,
DATATIME AS DT, _UTF-16LE'2.6' AS DATAINDEX, currtime2(DATATIME,
_UTF-16LE'yyyy-MM-dd HH:mm:ss') AS CREATETIME, 0 AS ISFILL, 1 AS
STATE, 0E0 AS P5) -> to: Tuple2, where: (=(CASE(>(ABS(Z),
WaterRoseMax_5), 1, 0), 1)), select: (STID, DATATIME, _UTF-16LE'2' AS
DATATYPE, currtime2(DATATIME, _UTF-16LE'yyyy-MM-dd HH:mm:ss') AS
VALTIME, _UTF-16LE'' AS CORTIME, VAL AS CORBERVAL, _UTF-16LE'' AS
CORAFTVAL, _UTF-16LE'' AS CORNAME, _UTF-16LE'' AS CORPHONE, 0 AS
STATE, 0 AS ISFILL, 1 AS TYPE) -> to: Tuple2, where: (=(CASE(>(ABS(Z),
WaterRoseMax_5), 1, 0), 0)), select: (ID, STID, _UTF-16LE'' AS STCD,
ITEM, VAL, UNIT, DATATIME AS DT, _UTF-16LE'2.6' AS DATAINDEX,
currtime2(DATATIME, _UTF-16LE'yyyy-MM-dd HH:mm:ss') AS CREATETIME, 0
AS ISFILL, 1 AS STATE, 0E0 AS P5) -> to: Tuple2) (2/2)" #82 prio=5
os_prio=0 tid=0x7fd4c4ac5000 nid=0x21c3 in Object.wait()
[0x7fd4d5416000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at 
org.apache.flink.streaming.api.operators.async.AsyncWaitOperator.addAsyncBufferEntry(AsyncWaitOperator.java:403)
at 
org.apache.flink.streaming.api.operators.async.AsyncWaitOperator.processElement(AsyncWaitOperator.java:224)
at 
org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202)
- locked <0x00074cb5b3a0> (a java.lang.Object)
at 
org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:302)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)

"async wait operator -> (where: (=(CASE(>(ABS(Z), WaterRoseMax_5), 1,
0), 0)), select: (ID, STID, _UTF-16LE'' AS STCD, ITEM, VAL, UNIT,
DATATIME AS DT, _UTF-16LE'2.6' AS DATAINDEX, currtime2(DATATIME,
_UTF-16LE'yyyy-MM-dd HH:mm:ss') AS CREATETIME, 0 AS ISFILL, 1 AS
STATE, 0E0 AS P5) -> to: Tuple2, where: (=(CASE(>(ABS(Z),
WaterRoseMax_5), 1, 0), 1)), select: (STID, DATATIME, _UTF-16LE'2' AS
DATATYPE, currtime2(DATATIME, _UTF-16LE'yyyy-MM-dd HH:mm:ss') AS
VALTIME, _UTF-16LE'' AS CORTIME, VAL AS CORBERVAL, _UTF-16LE'' AS
CORAFTVAL, _UTF-16LE'' AS CORNAME, _UTF-16LE'' AS CORPHONE, 0 AS
STATE, 0 AS ISFILL, 1 AS TYPE) -> to: Tuple2, where: (=(CASE(>(ABS(Z),
WaterRoseMax_5), 1, 0), 0)), select: (ID, STID, _UTF-16LE'' AS STCD,
ITEM, VAL, UNIT, DATATIME AS DT, _UTF-16LE'2.6' AS DATAINDEX,
currtime2(DATATIME, _UTF-16LE'yyyy-MM-dd HH:mm:ss') AS CREATETIME, 0
AS ISFILL, 1 AS STATE, 0E0 AS P5) -> to: Tuple2) (1/2)" #81 prio=5
os_prio=0 tid=0x7fd4c4ac3000 nid=0x21c2 in Object.wait()
[0x7fd4d5517000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at 
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:539)
- locked <0x00074cb5d560> (a java.util.ArrayDeque)
at 
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:508)
at 
org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:165)
at 
org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:209)
at 
org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:302)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)

Yun Tang wrote on Tue, Jul 6, 2021 at 4:01 PM:
>
> Hi,
>
> Quite possibly. If the downstream is deadlocked and cannot consume any data, the whole job appears dead. To locate the root cause you still need to look at what is actually going on in the downstream task.

Re: Flink 1.10 memory problem

2021-07-06, Yun Tang
Hi,

Quite possibly. If the downstream is deadlocked and cannot consume any data, the whole job appears dead. To locate the root cause you still need to look at what is actually going on in the downstream task.

Best regards,
Yun Tang

From: Ada Luna 
Sent: Tuesday, July 6, 2021 12:04
To: user-zh@flink.apache.org 
Subject: Re: Flink 1.10 memory problem

Can back pressure make the whole Flink job appear dead? Not a single Kafka record gets consumed; it goes on for days and never recovers without a restart.

Yun Tang wrote on Tue, Jul 6, 2021 at 11:12 AM:
>
> Hi,
>
> LocalBufferPool.requestMemorySegment is not actually allocating memory here. The job is back-pressured: the downstream has not consumed in time, so the buffers stay occupied and the upstream gets stuck in requestMemorySegment.
>
> To fix it, you still need to find out why the downstream is back-pressured.

Re: Flink 1.10 memory problem

2021-07-05, yidan zhao
Mark. I've run into this too and never figured out why. It puzzled me as well: my jobs used to do exactly this, back-pressuring until they stalled outright, with CPU usage dropping. By the usual reasoning, back pressure means heavy load, so CPU usage should be high; in my case the back pressure stalled the whole job, so CPU usage ended up very low.

Ada Luna wrote on Tue, Jul 6, 2021 at 12:05 PM:
>
> Can back pressure make the whole Flink job appear dead? Not a single Kafka record gets consumed; it goes on for days and never recovers without a restart.

Re: Flink 1.10 memory problem

2021-07-05, Ada Luna
Can back pressure make the whole Flink job appear dead? Not a single Kafka record gets consumed; it goes on for days and never recovers without a restart.

Yun Tang wrote on Tue, Jul 6, 2021 at 11:12 AM:
>
> Hi,
>
> LocalBufferPool.requestMemorySegment is not actually allocating memory here. The job is back-pressured: the downstream has not consumed in time, so the buffers stay occupied and the upstream gets stuck in requestMemorySegment.
>
> To fix it, you still need to find out why the downstream is back-pressured.

Re: Flink 1.10 memory problem

2021-07-05, Yun Tang
Hi,

LocalBufferPool.requestMemorySegment is not actually allocating memory here. The job is back-pressured: the downstream has not consumed in time, so the buffers stay occupied and the upstream gets stuck in requestMemorySegment.

To fix it, you still need to find out why the downstream is back-pressured.
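
As a deliberately simplified sketch of that mechanism (not Flink's actual LocalBufferPool code): the pool is bounded, a request blocks while the pool is empty, and only a recycle from the consuming side wakes it up. That is the Object.wait() visible in the dump.

import java.util.ArrayDeque;

/** Toy bounded buffer pool illustrating why a back-pressured producer parks. */
class BoundedBufferPool {
    private final ArrayDeque<byte[]> available = new ArrayDeque<>();

    BoundedBufferPool(int numBuffers, int bufferSize) {
        for (int i = 0; i < numBuffers; i++) {
            available.add(new byte[bufferSize]);
        }
    }

    /** Blocks while the pool is empty: the wait the thread dump shows. */
    synchronized byte[] request() throws InterruptedException {
        while (available.isEmpty()) {
            wait(); // producer parks here until recycle() is called
        }
        return available.poll();
    }

    /** Called once the downstream has consumed a buffer. */
    synchronized void recycle(byte[] buffer) {
        available.add(buffer);
        notifyAll(); // wake any producer blocked in request()
    }
}

If the downstream never consumes (deadlock, stuck operator), recycle() is never called and the producer waits forever, which is why the job looks alive but moves no data.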


Best regards,
Yun Tang

From: Ada Luna 
Sent: Tuesday, July 6, 2021 10:43
To: user-zh@flink.apache.org 
Subject: Re: Flink 1.10 memory problem

"Source: test_records (2/3)" #78 prio=5 os_prio=0
tid=0x7fd4c4a24800 nid=0x21bf in Object.wait()
[0x7fd4d581a000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at 
org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestMemorySegment(LocalBufferPool.java:251)
- locked <0x00074d8b0df0> (a java.util.ArrayDeque)
at 
org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBufferBuilderBlocking(LocalBufferPool.java:218)
at 
org.apache.flink.runtime.io.network.api.writer.RecordWriter.requestNewBufferBuilder(RecordWriter.java:264)
at 
org.apache.flink.runtime.io.network.api.writer.RecordWriter.copyFromSerializerToTargetChannel(RecordWriter.java:192)
at 
org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:162)
at 
org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:128)
at 
org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:107)
at 
org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:89)
at 
org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:45)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
at 
org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
- locked <0x00074cbd3be0> (a java.lang.Object)
at 
org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collectWithTimestamp(StreamSourceContexts.java:111)
at 
org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:398)
- locked <0x00074cbd3be0> (a java.lang.Object)
at 
org.apache.flink.streaming.connectors.kafka.internal.Kafka010Fetcher.emitRecord(Kafka010Fetcher.java:91)
at 
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.runFetchLoop(Kafka09Fetcher.java:156)
at 
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:711)
at 
com.dtstack.flink.sql.source.kafka.KafkaConsumer011.run(KafkaConsumer011.java:66)
at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:93)
at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:57)
at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:97)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:302)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)

Ada Luna  于2021年7月6日周二 上午10:13写道:
>
> 下面报错调大TaskManager内存即可解决,但是我不知道为什么Flink内存不够大会出现如下假死情况。申请内存卡住。整个任务状态为RUNNING但是不再消费数据。
>
>
>
> "Map -> to: Tuple2 -> Map -> (from: (id, sid, item, val, unit, dt,
> after_index, tablename, PROCTIME) -> where: (AND(=(tablename,
> CONCAT(_UTF-16LE't_real', currtime2(dt, _UTF-16LE'MMdd'))),
> OR(=(after_index, _UTF-16LE'2.6'), AND(=(sid, _UTF-16LE'7296'),
> =(after_index, _UTF-16LE'2.10'))), LIKE(item, _UTF-16LE'??5%'),
> =(MOD(getminutes(dt), 5), 0))), select: (id AS ID, sid AS STID, item
> AS ITEM, val AS VAL, unit AS UNIT, dt AS DATATIME), from: (id, sid,
> item, val, unit, dt, after_index, tablename, PROCTIME) -> where:
> (AND(=(tablename, CONCAT(_UTF-16LE't_real', currtime2(dt,
> _UTF-16LE'MMdd'))), OR(=(after_index, _UTF-16LE'2.6'), AND(=(sid,
> _UTF-16LE'7296'), =(after_index, _UTF-16LE'2.10'))), LIKE(item,
> _UTF-16LE'??5%'), =(MOD(getminutes(dt), 5), 0))), select: (sid AS
> STID, val AS VAL, dt AS DATATIME)) (1/1)" #79 prio=5 os_prio=0
> tid=0x7fd4c4a94000 nid=0x21c0 in Object.wait()
> [0x7fd4d5719000]
> java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at 
> org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestMemorySegment(LocalBufferPool.java:251)
> - locked <0x00074e6c8b98> (a java.util.ArrayDeque)
> at 
> org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBufferBuilderBlocking(LocalBufferPool.java:218)
> at 
> org.apache.flink.runtime.io.network.api.writer.RecordWriter.requestNewBufferBuilder(RecordWriter.java:264)
> at 
> org.apache.flink.runtime.io.network.api.writer.RecordWriter.copyFromSerializerToTargetChannel(RecordWriter.java:192)
> at 
> or

Re: Flink 1.10 memory problem

2021-07-05, Ada Luna
"Source: test_records (2/3)" #78 prio=5 os_prio=0
tid=0x7fd4c4a24800 nid=0x21bf in Object.wait()
[0x7fd4d581a000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at 
org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestMemorySegment(LocalBufferPool.java:251)
- locked <0x00074d8b0df0> (a java.util.ArrayDeque)
at 
org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBufferBuilderBlocking(LocalBufferPool.java:218)
at 
org.apache.flink.runtime.io.network.api.writer.RecordWriter.requestNewBufferBuilder(RecordWriter.java:264)
at 
org.apache.flink.runtime.io.network.api.writer.RecordWriter.copyFromSerializerToTargetChannel(RecordWriter.java:192)
at 
org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:162)
at 
org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:128)
at 
org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:107)
at 
org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:89)
at 
org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:45)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
at 
org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
- locked <0x00074cbd3be0> (a java.lang.Object)
at 
org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collectWithTimestamp(StreamSourceContexts.java:111)
at 
org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:398)
- locked <0x00074cbd3be0> (a java.lang.Object)
at 
org.apache.flink.streaming.connectors.kafka.internal.Kafka010Fetcher.emitRecord(Kafka010Fetcher.java:91)
at 
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.runFetchLoop(Kafka09Fetcher.java:156)
at 
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:711)
at 
com.dtstack.flink.sql.source.kafka.KafkaConsumer011.run(KafkaConsumer011.java:66)
at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:93)
at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:57)
at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:97)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:302)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)

Ada Luna wrote on Tue, Jul 6, 2021 at 10:13 AM:
>
> The problem below goes away once I increase the TaskManager memory, but I don't understand why insufficient memory puts Flink into this zombie state: the memory request blocks, the job status stays RUNNING, yet no data is consumed any more.
>
>
>
> "Map -> to: Tuple2 -> Map -> (from: (id, sid, item, val, unit, dt,
> after_index, tablename, PROCTIME) -> where: (AND(=(tablename,
> CONCAT(_UTF-16LE't_real', currtime2(dt, _UTF-16LE'yyyyMMdd'))),
> OR(=(after_index, _UTF-16LE'2.6'), AND(=(sid, _UTF-16LE'7296'),
> =(after_index, _UTF-16LE'2.10'))), LIKE(item, _UTF-16LE'??5%'),
> =(MOD(getminutes(dt), 5), 0))), select: (id AS ID, sid AS STID, item
> AS ITEM, val AS VAL, unit AS UNIT, dt AS DATATIME), from: (id, sid,
> item, val, unit, dt, after_index, tablename, PROCTIME) -> where:
> (AND(=(tablename, CONCAT(_UTF-16LE't_real', currtime2(dt,
> _UTF-16LE'yyyyMMdd'))), OR(=(after_index, _UTF-16LE'2.6'), AND(=(sid,
> _UTF-16LE'7296'), =(after_index, _UTF-16LE'2.10'))), LIKE(item,
> _UTF-16LE'??5%'), =(MOD(getminutes(dt), 5), 0))), select: (sid AS
> STID, val AS VAL, dt AS DATATIME)) (1/1)" #79 prio=5 os_prio=0
> tid=0x7fd4c4a94000 nid=0x21c0 in Object.wait()
> [0x7fd4d5719000]
> java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at 
> org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestMemorySegment(LocalBufferPool.java:251)
> - locked <0x00074e6c8b98> (a java.util.ArrayDeque)
> at 
> org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBufferBuilderBlocking(LocalBufferPool.java:218)
> at 
> org.apache.flink.runtime.io.network.api.writer.RecordWriter.requestNewBufferBuilder(RecordWriter.java:264)
> at 
> org.apache.flink.runtime.io.network.api.writer.RecordWriter.copyFromSerializerToTargetChannel(RecordWriter.java:192)
> at 
> org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:162)
> at 
> org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:128)
> at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:107)
> at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:89)
> at 
>