[jira] [Created] (KAFKA-12164) ssue when kafka connect worker pod restart, during creation of nested partition directories in hdfs file system.

kaushik srinivas (Jira) Thu, 07 Jan 2021 23:24:16 -0800

kaushik srinivas created KAFKA-12164:
----------------------------------------

Summary: ssue when kafka connect worker pod restart, during
creation of nested partition directories in hdfs file system.
Key: KAFKA-12164
URL: https://issues.apache.org/jira/browse/KAFKA-12164
Project: Kafka
Issue Type: Bug
Components: KafkaConnect
Reporter: kaushik srinivas

In our production labs, an issue is observed. Below is the sequence of the same.
# hdfs connector is added to the connect worker.
# hdfs connector is creating folders in hdfs /test1=1/test2=2/
Based on the custom partitioner. Here test1 and test2 are two separate nested
directories derived from multiple fields in the record using a custom
partitioner.
# Now kafka connect hdfs connector uses below function calls to create the
directories in the hdfs file system.
fs.mkdirs(new Path(filename));
ref:
[https://github.com/confluentinc/kafka-connect-hdfs/blob/master/src/main/java/io/confluent/connect/hdfs/storage/HdfsStorage.java]

Now the important thing to note is that if mkdirs() is a non atomic operation
(i.e can result in partial execution if interrupted)
then suppose the first directory ie test1 is created and just before creation
of test2 in hdfs happens if there is a restart to the connect worker pod. Then
the hdfs file system will remain with partial folders created for partitions
during the restart time frames.

So we might have conditions in hdfs as below
/test1=0/test2=0/
/test1=1/
/test1=2/test2=2
/test1=3/test2=3

So the second partition has a missing directory in it. And if hive integration
is enabled, hive metastore exceptions will occur since there is a partition
expected from hive table is missing for few partitions in hdfs.

*This can occur to any connector with some ongoing non atomic operation and a
restart is triggered to kafka connect worker pod. This will result in some
partially completed states in the system and may cause issues for the connector
to continue its operation*.

*This is a very critical issue and needs some attention on ways for handling
the same.*

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (KAFKA-12164) ssue when kafka connect worker pod restart, during creation of nested partition directories in hdfs file system.

Reply via email to