[jira] [Commented] (KAFKA-6188) Broker fails with FATAL Shutdown - log dirs have failed

Di Campo (JIRA) Wed, 30 May 2018 03:30:44 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494982#comment-16494982
 ]


Di Campo commented on KAFKA-6188:
---------------------------------

Hi,

I faced the same issue as [~chubao] when deleting a topic produced by Kafka 
Streams. In my case the failing topic backs a State Store managed through the 
Processor API; it is curious that we both got the error on an internally 
created changelog topic backing a StateStore. 
I am on Amazon EFS too, with Kafka 1.1.0, Java 9 and a 3-broker cluster.

Deletion of the topic backing a state store failed. It failed with around 100M 
events and again with 40M events (each event <1k). 
{quote}{{[2018-05-30 08:56:36,193] INFO [ReplicaFetcher replicaId=2, 
leaderId=1, fetcherId=0] Error sending fetch request (sessionId=1854198522, 
epoch=2329887) to node 1: java.io.IOException: Connection to 1 was disconnected 
before the response was read. (org.apache.kafka.clients.FetchSessionHandler)}}
{{[2018-05-30 08:56:36,195] INFO Deleted offset index 
/kafka/kafka-logs-2/stream-processor-lastSessionByChannelStoreName-changelog-13.b9486f127f05418787972d5823506db4-delete/00000000000002914722.index.
 (kafka.log.LogSegment)}}
{{[2018-05-30 08:56:36,203] WARN [ReplicaFetcher replicaId=2, leaderId=1, 
fetcherId=0] Error in response for fetch request (type=FetchRequest, 
replicaId=2, maxWait=500, minBytes=1, maxBytes=10485760, fetchData={}, 
isolationLevel=READ_UNCOMMITTED, toForget=, metadata=(sessionId=1854198522, 
epoch=2329887)) (kafka.server.ReplicaFetcherThread)}}
{{java.io.IOException: Connection to 1 was disconnected before the response was 
read}}
{{ at 
org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:97)}}{quote}

The first time I saw this, the topic held about 100M events. I then tried with 
10M events, and the topic was deleted without errors. I tried again with 40M, 
and it failed.

We have read that there 
[are|https://www.slideshare.net/HadoopSummit/apache-kafka-best-practices] 
[issues|https://github.com/strimzi/strimzi/issues/441] with [memory 
mapped|https://thehoard.blog/how-kafkas-storage-internals-work-3a29b02e026] 
files, which Kafka uses for its indexes, when they live on networked file 
systems (such as EFS). Could this be related? 
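
To make the term concrete, here is a minimal sketch of what writing index 
entries through a memory-mapped file looks like. It is Python, not Kafka code: 
the 8-byte entry layout (4-byte relative offset, 4-byte file position), the 
file suffix and the function names are assumptions chosen for illustration. 
On a local disk the mapped pages are written back to the file by the kernel; 
whether a networked file system behaves the same way when such a file is 
renamed or deleted is the open question above.

```python
import mmap
import os
import struct
import tempfile

# Assumed entry layout for illustration: two big-endian 32-bit integers,
# (relative offset, physical position in the log file).
ENTRY = struct.Struct(">ii")


def write_entry(mm, slot, relative_offset, position):
    """Write one fixed-size entry into the mapping at the given slot."""
    ENTRY.pack_into(mm, slot * ENTRY.size, relative_offset, position)


def read_entry(mm, slot):
    """Read one fixed-size entry back from the mapping."""
    return ENTRY.unpack_from(mm, slot * ENTRY.size)


def demo():
    """Map a preallocated temp file, write two entries, read one back."""
    fd, path = tempfile.mkstemp(suffix=".index")
    try:
        os.ftruncate(fd, 4096)          # an empty file cannot be mapped
        with mmap.mmap(fd, 4096) as mm:
            write_entry(mm, 0, 0, 0)
            write_entry(mm, 1, 100, 4096)
            mm.flush()                  # ask the OS to write pages back
            entry = read_entry(mm, 1)
    finally:
        os.close(fd)
        os.remove(path)                 # delete only after unmapping
    return entry
```

Note that the sketch unmaps the file before removing it. Deleting or renaming 
a file while it is still mapped is the case where file systems can differ.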

 

> Broker fails with FATAL Shutdown - log dirs have failed
> -------------------------------------------------------
>
>                 Key: KAFKA-6188
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6188
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, log
>    Affects Versions: 1.0.0, 1.0.1
>         Environment: Windows 10
>            Reporter: Valentina Baljak
>            Priority: Blocker
>              Labels: windows
>         Attachments: Segments are opened before deletion, 
> kafka_2.10-0.10.2.1.zip, output.txt
>
>
> Just started with version 1.0.0 after 4-5 months of using 0.10.2.1. The 
> test environment is very simple, with only one producer and one consumer. 
> Initially, everything started fine, and standalone tests worked as expected. 
> However, when running my code, Kafka clients fail after approximately 10 
> minutes. Kafka won't start after that and it fails with the same error. 
> Deleting the logs lets it start again, but then the same problem recurs.
> Here is the error traceback:
> [2017-11-08 08:21:57,532] INFO Starting log cleanup with a period of 300000 
> ms. (kafka.log.LogManager)
> [2017-11-08 08:21:57,548] INFO Starting log flusher with a default period of 
> 9223372036854775807 ms. (kafka.log.LogManager)
> [2017-11-08 08:21:57,798] INFO Awaiting socket connections on 0.0.0.0:9092. 
> (kafka.network.Acceptor)
> [2017-11-08 08:21:57,813] INFO [SocketServer brokerId=0] Started 1 acceptor 
> threads (kafka.network.SocketServer)
> [2017-11-08 08:21:57,829] INFO [ExpirationReaper-0-Produce]: Starting 
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2017-11-08 08:21:57,845] INFO [ExpirationReaper-0-DeleteRecords]: Starting 
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2017-11-08 08:21:57,845] INFO [ExpirationReaper-0-Fetch]: Starting 
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2017-11-08 08:21:57,845] INFO [LogDirFailureHandler]: Starting 
> (kafka.server.ReplicaManager$LogDirFailureHandler)
> [2017-11-08 08:21:57,860] INFO [ReplicaManager broker=0] Stopping serving 
> replicas in dir C:\Kafka\kafka_2.12-1.0.0\kafka-logs 
> (kafka.server.ReplicaManager)
> [2017-11-08 08:21:57,860] INFO [ReplicaManager broker=0] Partitions  are 
> offline due to failure on log directory C:\Kafka\kafka_2.12-1.0.0\kafka-logs 
> (kafka.server.ReplicaManager)
> [2017-11-08 08:21:57,860] INFO [ReplicaFetcherManager on broker 0] Removed 
> fetcher for partitions  (kafka.server.ReplicaFetcherManager)
> [2017-11-08 08:21:57,892] INFO [ReplicaManager broker=0] Broker 0 stopped 
> fetcher for partitions  because they are in the failed log dir 
> C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.server.ReplicaManager)
> [2017-11-08 08:21:57,892] INFO Stopping serving logs in dir 
> C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.log.LogManager)
> [2017-11-08 08:21:57,892] FATAL Shutdown broker because all log dirs in 
> C:\Kafka\kafka_2.12-1.0.0\kafka-logs have failed (kafka.log.LogManager)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
