Petr Pchelko created KAFKA-7156:
-----------------------------------

             Summary: Deleting topics with long names can bring all brokers to 
unrecoverable state
                 Key: KAFKA-7156
                 URL: https://issues.apache.org/jira/browse/KAFKA-7156
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 1.1.0
            Reporter: Petr Pchelko


Kafka limits topic names to 249 characters, so creating a topic with a 
248-character name is possible. However, when the topic is deleted, Kafka 
renames its data directory by appending a dot, a 32-character hex string and 
`-delete`, so the resulting file name exceeds the 255-character file name 
limit of most Unix file systems. This raises a 
java.nio.file.FileSystemException, which in turn immediately shuts down all the 
brokers. Further attempts to restart the brokers fail with the same exception. 
The only way to recover the cluster is to manually delete the affected topic 
from ZooKeeper and from the disk on all the broker machines.
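
The length arithmetic can be checked with a short sketch. It is not Kafka's code: the helper name `renamed_dir_name` is hypothetical, and the rename pattern (`<topic>-<partition>.<32 hex chars>-delete`) is inferred from the log output in this report. The 255 limit is assumed to apply per path component.

```python
import uuid

NAME_MAX = 255  # assumed per-component file name limit on the file system


def renamed_dir_name(topic, partition):
    """Mimic the rename seen in the logs: <topic>-<partition>.<hex>-delete."""
    token = uuid.uuid4().hex  # 32 hex characters, like the token in the logs
    return f"{topic}-{partition}.{token}-delete"


topic = "a" * 248                    # allowed: below the 249-character limit
live = f"{topic}-0"                  # directory name while the topic exists
renamed = renamed_dir_name(topic, 0)

print(len(live), len(live) <= NAME_MAX)        # 250 True
print(len(renamed), len(renamed) <= NAME_MAX)  # 290 False
```

Under these assumptions the rename adds 40 characters, so any partition directory name longer than 215 characters would exceed the limit once the topic is deleted.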

 

Steps to reproduce:

(Note: delete.topic.enable=true must be set in the config)

 
{code:java}
> kafka-topics.sh --zookeeper localhost:2181 --create --topic 
> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
>  --partitions 1 --replication-factor 1
> kafka-topics.sh --zookeeper localhost:2181 --delete --topic 
> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
{code}
 

 

After these two commands are executed, all the brokers where this topic is 
replicated immediately shut down with the following logs:

 
{code:java}
ERROR Error while renaming dir for 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-0
 in log dir /tmp/kafka-logs (kafka.server.LogDirFailureChannel)

java.nio.file.FileSystemException: 
/tmp/kafka-logs/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-0
 -> 
/tmp/kafka-logs/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-0.093fd1e1728f438ea990cbad8a514b9f-delete:
 File name too long

at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)

at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)

at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:457)

at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)

at java.nio.file.Files.move(Files.java:1395)

...

Suppressed: java.nio.file.FileSystemException: 
/tmp/kafka-logs/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-0
 -> 
/tmp/kafka-logs/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-0.093fd1e1728f438ea990cbad8a514b9f-delete:
 File name too long

at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)

at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)

at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)

at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)

at java.nio.file.Files.move(Files.java:1395)

at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:694)

... 23 more

[2018-07-12 13:34:45,847] INFO [ReplicaManager broker=0] Stopping serving 
replicas in dir /tmp/kafka-logs (kafka.server.ReplicaManager)

[2018-07-12 13:34:45,848] INFO [ReplicaFetcherManager on broker 0] Removed 
fetcher for partitions  (kafka.server.ReplicaFetcherManager)

[2018-07-12 13:34:45,849] INFO [ReplicaAlterLogDirsManager on broker 0] Removed 
fetcher for partitions  (kafka.server.ReplicaAlterLogDirsManager)

[2018-07-12 13:34:45,851] INFO [ReplicaManager broker=0] Broker 0 stopped 
fetcher for partitions  and stopped moving logs for partitions  because they 
are in the failed log directory /tmp/kafka-logs. (kafka.server.ReplicaManager)

[2018-07-12 13:34:45,851] INFO Stopping serving logs in dir /tmp/kafka-logs 
(kafka.log.LogManager)

[2018-07-12 13:34:45,854] ERROR Shutdown broker because all log dirs in 
/tmp/kafka-logs have failed (kafka.log.LogManager)

[2018-07-12 13:34:46,264] WARN Exception causing close of session 
0x1648e0b3ec80004 due to java.io.IOException: Connection reset by peer 
(org.apache.zookeeper.server.NIOServerCnxn)

[2018-07-12 13:34:46,264] INFO Closed socket connection for client 
/0:0:0:0:0:0:0:1:63972 which had sessionid 0x1648e0b3ec80004 
(org.apache.zookeeper.server.NIOServerCnxn)
{code}
 

Note that the following error occurs regardless of whether the topic with the 
long name is the only one on the broker:
{code:java}
[2018-07-12 13:34:45,854] ERROR Shutdown broker because all log dirs in 
/tmp/kafka-logs have failed (kafka.log.LogManager){code}

 

Further attempts to restart the brokers fail with the same error until all 
mentions of the deleted topic are removed from ZooKeeper and the files are 
removed from the data directories on all the brokers.
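
To find the affected directories on a broker, or to spot topics that would trigger the failure before anyone deletes them, a scan along the following lines may help. This is only a sketch: `at_risk` and `scan_log_dir` are hypothetical helpers, the 40-character overhead is inferred from the rename shown in the logs above, and the 255 limit is an assumption about the file system.

```python
import os

NAME_MAX = 255  # assumed per-component file name limit
# Overhead inferred from the logs: "." + 32 hex characters + "-delete"
RENAME_OVERHEAD = 1 + 32 + len("-delete")


def at_risk(dir_names, name_max=NAME_MAX):
    """Return directory names whose renamed form would exceed name_max."""
    return [n for n in dir_names if len(n) + RENAME_OVERHEAD > name_max]


def scan_log_dir(log_dir):
    """List partition directories under log_dir that could not be renamed."""
    names = [entry.name for entry in os.scandir(log_dir) if entry.is_dir()]
    return at_risk(names)
```

For example, `scan_log_dir("/tmp/kafka-logs")` would return the partition directory of the topic from the reproduction steps, if it is present.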



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
