[
https://issues.apache.org/jira/browse/KAFKA-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16476061#comment-16476061
]
M. Manna edited comment on KAFKA-6188 at 5/15/18 4:01 PM:
----------------------------------------------------------
[~TeilaRei] and [~darion] I am not sure if this version makes any difference.
'inter.broker.protocol.version' defaults to the correct version, 1.1-IV0,
which means that on 1.1.0 the broker shouldn't halt. I have tried it with
fresh 3-node and 1-node clusters using the out-of-the-box setup. I kept the
log and offset cleanup sizes and the retention period small so that the test
would trigger quickly. It still breaks on Windows.
The stack trace which [[email protected]] and I have investigated
shows that this will almost certainly happen if the segment file channels are
being opened/closed at the same time as the cleaner thread is trying to
clean/read them. Closing the log and index files didn't help. When you start
the broker it cleans the files nicely, but the problem arises when expired
offsets are being cleaned by LogCleaner$CleanerThread. I hope this helps.
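To make the suspected race concrete, here is a minimal sketch (not Kafka
code) that renames a file while a channel on it is still open. The class
name, method name and file names are hypothetical and chosen only for
illustration. Whether the rename succeeds depends on the platform and on how
the handle was opened, so the sketch only reports the outcome rather than
assuming one.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.FileSystemException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; it does not use any Kafka API.
class OpenChannelRename {

    // Creates a small stand-in "segment" file in dir, keeps a read channel
    // open on it, and tries to rename it. Returns true if the rename worked,
    // false if the file system refused it.
    static boolean renameWhileOpen(Path dir) throws IOException {
        Path segment = dir.resolve("00000000000000000000.log");
        Path renamed = dir.resolve("00000000000000000000.log.deleted");
        Files.write(segment, new byte[] {1, 2, 3});

        try (FileChannel channel = FileChannel.open(segment, StandardOpenOption.READ)) {
            try {
                // The channel is still open at this point.
                Files.move(segment, renamed, StandardCopyOption.ATOMIC_MOVE);
                return true;
            } catch (FileSystemException e) {
                // For example, the file is reported as in use by another handle.
                System.err.println("rename refused: " + e.getMessage());
                return false;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("segment-demo");
        System.out.println("rename with open channel succeeded: " + renameWhileOpen(dir));
    }
}
```

On POSIX file systems a rename of an open file normally succeeds. If the
sketch prints false on a Windows machine, that would be consistent with the
theory above, but it would not prove that the broker fails for the same
reason.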
was (Author: [email protected]):
[~TeilaRei] and [~darion] I am not sure if this version makes any difference.
'inter.broker.protocol.version' defaults to the correct version, 1.1-IV0,
which means that on 1.1.0 the broker shouldn't halt.
The stack trace which [[email protected]] and I have investigated shows
that this will almost certainly happen if the segment file channels are
being opened/closed at the same time as the cleaner thread is trying to
clean/read them. Closing the log and index files didn't help. When you start
the broker it cleans the files nicely, but the problem arises when expired
offsets are being cleaned by LogCleaner$CleanerThread. I hope this helps.
> Broker fails with FATAL Shutdown - log dirs have failed
> -------------------------------------------------------
>
> Key: KAFKA-6188
> URL: https://issues.apache.org/jira/browse/KAFKA-6188
> Project: Kafka
> Issue Type: Bug
> Components: clients, log
> Affects Versions: 1.0.0, 1.0.1
> Environment: Windows 10
> Reporter: Valentina Baljak
> Priority: Blocker
> Labels: windows
>
> Just started with version 1.0.0 after 4-5 months of using 0.10.2.1. The
> test environment is very simple, with only one producer and one consumer.
> Initially, everything started fine and standalone tests worked as expected.
> However, when running my code, Kafka clients fail after approximately 10
> minutes. Kafka won't start after that and it fails with the same error.
> Deleting logs helps it start again, but then the same problem occurs.
> Here is the error traceback:
> [2017-11-08 08:21:57,532] INFO Starting log cleanup with a period of 300000
> ms. (kafka.log.LogManager)
> [2017-11-08 08:21:57,548] INFO Starting log flusher with a default period of
> 9223372036854775807 ms. (kafka.log.LogManager)
> [2017-11-08 08:21:57,798] INFO Awaiting socket connections on 0.0.0.0:9092.
> (kafka.network.Acceptor)
> [2017-11-08 08:21:57,813] INFO [SocketServer brokerId=0] Started 1 acceptor
> threads (kafka.network.SocketServer)
> [2017-11-08 08:21:57,829] INFO [ExpirationReaper-0-Produce]: Starting
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2017-11-08 08:21:57,845] INFO [ExpirationReaper-0-DeleteRecords]: Starting
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2017-11-08 08:21:57,845] INFO [ExpirationReaper-0-Fetch]: Starting
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2017-11-08 08:21:57,845] INFO [LogDirFailureHandler]: Starting
> (kafka.server.ReplicaManager$LogDirFailureHandler)
> [2017-11-08 08:21:57,860] INFO [ReplicaManager broker=0] Stopping serving
> replicas in dir C:\Kafka\kafka_2.12-1.0.0\kafka-logs
> (kafka.server.ReplicaManager)
> [2017-11-08 08:21:57,860] INFO [ReplicaManager broker=0] Partitions are
> offline due to failure on log directory C:\Kafka\kafka_2.12-1.0.0\kafka-logs
> (kafka.server.ReplicaManager)
> [2017-11-08 08:21:57,860] INFO [ReplicaFetcherManager on broker 0] Removed
> fetcher for partitions (kafka.server.ReplicaFetcherManager)
> [2017-11-08 08:21:57,892] INFO [ReplicaManager broker=0] Broker 0 stopped
> fetcher for partitions because they are in the failed log dir
> C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.server.ReplicaManager)
> [2017-11-08 08:21:57,892] INFO Stopping serving logs in dir
> C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.log.LogManager)
> [2017-11-08 08:21:57,892] FATAL Shutdown broker because all log dirs in
> C:\Kafka\kafka_2.12-1.0.0\kafka-logs have failed (kafka.log.LogManager)