Jinstarry commented on issue #5343: Kafka indexing tasks occasionally fail with 'Stream closed' URL: https://github.com/apache/incubator-druid/issues/5343#issuecomment-475091973

Hi @jihoonson, I found a similar error in Druid 0.13.0. This problem seems to occur when the server runs low on free memory: when the error occurred, available memory jumped from 162056 to 4570104. During the test, the client kept sending data and none of it was lost, but memory usage was slowly creeping up. Perhaps there was not enough memory left for the segment to be created.

The middleManager log is below.

```
2019-03-18T10:21:45,196 INFO [forking-task-runner-6] org.apache.druid.indexing.overlord.ForkingTaskRunner - Removing task directory: var/druid/task/index_kafka_ds_test2_9d87293708d7ae2_pdpdjhlh
2019-03-18T10:21:46,079 INFO [WorkerTaskManager-NoticeHandler] org.apache.druid.indexing.worker.WorkerTaskManager - Job's finished. Completed [index_kafka_ds_test4_ee46e16cbb67785_okdcidna] with status [FAILED]
2019-03-18T10:21:46,097 INFO [WorkerTaskManager-NoticeHandler] org.apache.druid.indexing.worker.WorkerTaskManager - Job's finished. Completed [index_kafka_ds_test2_9d87293708d7ae2_pdpdjhlh] with status [FAILED]
2019-03-18T10:21:53,750 INFO [qtp1431064234-77] org.apache.druid.indexing.overlord.ForkingTaskRunner - Killing process for task: index_kafka_ds_test2_4d88b40775ab307_cdkfbhbk
2019-03-18T10:21:53,756 INFO [forking-task-runner-10] org.apache.druid.indexing.common.tasklogs.FileTaskLogs - Wrote task log to: var/druid/indexing-logs/index_kafka_ds_test2_4d88b40775ab307_cdkfbhbk.log
2019-03-18T10:21:53,761 INFO [forking-task-runner-10] org.apache.druid.indexing.overlord.ForkingTaskRunner - Exception caught during execution
java.io.IOException: Stream closed
	at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170) ~[?:1.8.0_181]
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:291) ~[?:1.8.0_181]
	at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[?:1.8.0_181]
	at java.io.FilterInputStream.read(FilterInputStream.java:107) ~[?:1.8.0_181]
	at com.google.common.io.ByteStreams.copy(ByteStreams.java:175) ~[guava-16.0.1.jar:?]
	at org.apache.druid.indexing.overlord.ForkingTaskRunner$1.call(ForkingTaskRunner.java:438) [druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
	at org.apache.druid.indexing.overlord.ForkingTaskRunner$1.call(ForkingTaskRunner.java:223) [druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
2019-03-18T10:21:53,792 INFO [forking-task-runner-10] org.apache.druid.indexing.overlord.ForkingTaskRunner - Removing task directory: var/druid/task/index_kafka_ds_test2_4d88b40775ab307_cdkfbhbk
2019-03-18T10:21:53,806 INFO [WorkerTaskManager-NoticeHandler] org.apache.druid.indexing.worker.WorkerTaskManager - Job's finished. Completed [index_kafka_ds_test2_4d88b40775ab307_cdkfbhbk] with status [FAILED]
2019-03-18T10:21:54,362 INFO [WorkerTaskManager-NoticeHandler] org.apache.druid.indexing.worker.WorkerTaskManager - Task[index_kafka_ds_test2_4d88b40775ab307_knbgkefd] started.
2019-03-18T10:21:57,111 INFO [forking-task-runner-15] org.apache.druid.indexing.overlord.ForkingTaskRunner - Running command: java -cp
```

And the overlord log for the same period is below.
```
2019-03-18T10:21:53,939 INFO [KafkaSupervisor-ds_test2] org.apache.druid.indexing.overlord.TaskLockbox - TaskLock is now empty: TaskLock{type=EXCLUSIVE, groupId=index_kafka_ds_test2, dataSource=ds_test2, interval=2019-03-18T00:00:00.000Z/2019-03-19T00:00:00.000Z, version=2019-03-18T10:21:47.300Z, priority=75, revoked=false}
2019-03-18T10:21:53,946 INFO [KafkaSupervisor-ds_test2] org.apache.druid.indexing.overlord.MetadataTaskStorage - Deleting TaskLock with id[232]: TaskLock{type=EXCLUSIVE, groupId=index_kafka_ds_test2, dataSource=ds_test2, interval=2019-03-18T00:00:00.000Z/2019-03-19T00:00:00.000Z, version=2019-03-18T10:21:47.300Z, priority=75, revoked=false}
2019-03-18T10:21:53,957 INFO [KafkaSupervisor-ds_test2] org.apache.druid.indexing.overlord.MetadataTaskStorage - Updating task index_kafka_ds_test2_4d88b40775ab307_cdkfbhbk to status: TaskStatus{id=index_kafka_ds_test2_4d88b40775ab307_cdkfbhbk, status=FAILED, duration=-1, errorMsg=null}
2019-03-18T10:21:53,964 INFO [KafkaSupervisor-ds_test2] org.apache.druid.indexing.overlord.TaskQueue - Task done: AbstractTask{id='index_kafka_ds_test2_4d88b40775ab307_cdkfbhbk', groupId='index_kafka_ds_test2', taskResource=TaskResource{availabilityGroup='index_kafka_ds_test2_4d88b40775ab307', requiredCapacity=1}, dataSource='ds_test2', context={checkpoints={"0":{"0":8275}}, IS_INCREMENTAL_HANDOFF_SUPPORTED=true}}
2019-03-18T10:21:53,964 INFO [KafkaSupervisor-ds_test2] org.apache.druid.indexing.kafka.supervisor.KafkaSupervisor - Killing task [index_kafka_ds_test2_4d88b40775ab307_hbdjcjcn] in the task group
2019-03-18T10:21:53,964 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.RemoteTaskRunner - Cleaning up task[index_kafka_ds_test2_4d88b40775ab307_cdkfbhbk] on worker[localhost:8091]
2019-03-18T10:21:53,968 WARN [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.TaskQueue - Unknown task completed: index_kafka_ds_test2_4d88b40775ab307_cdkfbhbk
2019-03-18T10:21:53,968 INFO [KafkaSupervisor-ds_test2] org.apache.druid.indexing.overlord.RemoteTaskRunner - Removed task from pending queue: index_kafka_ds_test2_4d88b40775ab307_hbdjcjcn
2019-03-18T10:21:53,968 INFO [KafkaSupervisor-ds_test2] org.apache.druid.indexing.overlord.TaskLockbox - Removing task[index_kafka_ds_test2_4d88b40775ab307_hbdjcjcn] from activeTasks
2019-03-18T10:21:53,968 INFO [Curator-PathChildrenCache-1] org.apache.druid.java.util.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2019-03-18T10:21:53.968Z","service":"druid/overlord","host":"localhost:8090","version":"0.13.0-incubating","metric":"task/run/time","value":34710,"dataSource":"ds_test2","taskId":"index_kafka_ds_test2_4d88b40775ab307_cdkfbhbk","taskStatus":"FAILED","taskType":"index_kafka"}]
2019-03-18T10:21:53,969 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.TaskQueue - Task FAILED: AbstractTask{id='index_kafka_ds_test2_4d88b40775ab307_cdkfbhbk', groupId='index_kafka_ds_test2', taskResource=TaskResource{availabilityGroup='index_kafka_ds_test2_4d88b40775ab307', requiredCapacity=1}, dataSource='ds_test2', context={checkpoints={"0":{"0":8275}}, IS_INCREMENTAL_HANDOFF_SUPPORTED=true}} (34710 run duration)
2019-03-18T10:21:53,969 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.TaskRunnerUtils - Task [index_kafka_ds_test2_4d88b40775ab307_cdkfbhbk] status changed to [FAILED].
2019-03-18T10:21:53,970 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.RemoteTaskRunner - Task[index_kafka_ds_test2_4d88b40775ab307_cdkfbhbk] went bye bye.
2019-03-18T10:21:53,972 INFO [KafkaSupervisor-ds_test2] org.apache.druid.indexing.overlord.MetadataTaskStorage - Updating task index_kafka_ds_test2_4d88b40775ab307_hbdjcjcn to status: TaskStatus{id=index_kafka_ds_test2_4d88b40775ab307_hbdjcjcn, status=FAILED, duration=-1, errorMsg=null}
2019-03-18T10:21:53,977 INFO [KafkaSupervisor-ds_test2] org.apache.druid.indexing.overlord.TaskQueue - Task done: AbstractTask{id='index_kafka_ds_test2_4d88b40775ab307_hbdjcjcn', groupId='index_kafka_ds_test2', taskResource=TaskResource{availabilityGroup='index_kafka_ds_test2_4d88b40775ab307', requiredCapacity=1}, dataSource='ds_test2', context={checkpoints={"0":{"0":8275}}, IS_INCREMENTAL_HANDOFF_SUPPORTED=true}}
2019-03-18T10:21:53,978 INFO [KafkaSupervisor-ds_test2] org.apache.druid.indexing.kafka.supervisor.KafkaSupervisor - [ds_test2] supervisor is running.
2019-03-18T10:21:53,978 INFO [KafkaSupervisor-ds_test2] org.apache.druid.indexing.kafka.supervisor.KafkaSupervisor - Creating new task group [0] for partitions [0]
```

And the ZooKeeper log for the same period is below.
```
2019-03-18 10:21:45,993 INFO [SyncThread:0] org.apache.zookeeper.server.ZooKeeperServer - Established session 0x169904864e50013 with negotiated timeout 30000 for client /127.0.0.1:50068
2019-03-18 10:21:53,880 INFO [ProcessThread(sid:0 cport:2181):] org.apache.zookeeper.server.PrepRequestProcessor - Processed session termination for sessionid: 0x169904864e5000d
2019-03-18 10:21:53,886 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] org.apache.zookeeper.server.NIOServerCnxn - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x169904864e5000d, likely client has closed socket
	at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
	at java.lang.Thread.run(Thread.java:748)
2019-03-18 10:21:53,891 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] org.apache.zookeeper.server.NIOServerCnxn - Closed socket connection for client /127.0.0.1:49798 which had sessionid 0x169904864e5000d
2019-03-18 10:21:56,640 INFO [ProcessThread(sid:0 cport:2181):] org.apache.zookeeper.server.PrepRequestProcessor - Processed session termination for sessionid: 0x169904864e50013
```
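For what it's worth, the ordering in the middleManager log ("Killing process for task" immediately before the exception) suggests the `Stream closed` IOException may just be a symptom: the copy of the forked task's log stream fails once the stream is closed out from under it. This is a minimal standalone sketch (not Druid code) showing that reading a `BufferedInputStream` after it has been closed produces exactly this exception:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class StreamClosedDemo {
    public static void main(String[] args) {
        // Stand-in for the forked task's output stream that the runner copies.
        BufferedInputStream taskOutput =
                new BufferedInputStream(new ByteArrayInputStream("log line\n".getBytes()));
        try {
            taskOutput.close(); // simulates the stream being closed when the task process is killed
            taskOutput.read();  // the copy loop's next read(): BufferedInputStream.getBufIfOpen throws here
        } catch (IOException e) {
            System.out.println("IOException: " + e.getMessage()); // prints: IOException: Stream closed
        }
    }
}
```

If that's what is happening, the exception is a consequence of the kill, not the root cause of the task failure.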
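To check the low-memory hypothesis above, it may help to log available memory alongside the task lifecycle so failures can be correlated with dips. A rough sketch (Linux-only, reading `MemAvailable` from `/proc/meminfo`; the 5-second interval is an arbitrary choice, not anything Druid-specific):

```shell
#!/bin/sh
# Sample available memory once; wrap in a loop (e.g. `while sleep 5; do ...; done`)
# to build a timeline next to the middleManager log timestamps.
ts=$(date -u +%Y-%m-%dT%H:%M:%S)
avail=$(awk '/MemAvailable:/ {print $2}' /proc/meminfo)
echo "$ts MemAvailable=${avail} kB"
```

If `MemAvailable` drops sharply right before the overlord kills the task, that would support the theory that the segment could not be created for lack of memory.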
