Hi Sergey Onuchin,

Sorry to hear that. Unfortunately, we cannot offer concrete suggestions based
only on the information you have provided.
More on-site details would make this easier to trace, such as the
deployment architecture, the NameNode log, and jstack output.
In my experience, I have not come across a case where a directory was deleted
without leaving any trace.
Have you checked for operations (rename and delete) on the parent
directories?
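As a rough illustration of that check, here is a minimal Python sketch that
scans audit log lines for delete or rename commands whose source is the lost
directory or any of its ancestors. It assumes the usual hdfs-audit.log layout
of key=value fields (cmd=..., src=..., dst=...) and paths without spaces; the
function names and the example paths are hypothetical, so please adapt it to
your own log format.

```python
import re

# Assumption: each audit line carries whitespace-separated key=value fields
# such as cmd=delete and src=/some/path (paths without spaces).
FIELD_RE = re.compile(r"(\w+)=(\S+)")
DESTRUCTIVE = {"delete", "rename"}


def ancestors(path):
    """Return the path itself plus every parent directory except the root."""
    parts = path.rstrip("/").split("/")
    return ["/".join(parts[:i]) for i in range(len(parts), 1, -1)]


def find_suspects(lines, lost_path):
    """Collect audit lines that delete or rename lost_path or one of its parents."""
    targets = set(ancestors(lost_path))
    hits = []
    for line in lines:
        fields = dict(FIELD_RE.findall(line))
        if fields.get("cmd") in DESTRUCTIVE and fields.get("src") in targets:
            hits.append(line.rstrip("\n"))
    return hits
```

It could be run over the audit log file, for example
`find_suspects(open("hdfs-audit.log"), "/warehouse/table/dt=2023-10-14")`,
where both the file name and the partition path are placeholders. A rename of a
parent directory would show up here even though the partition path itself never
appears in a delete or rename entry.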
Good luck!

Best Regards,
- He Xiaoqiao


On Mon, Oct 16, 2023 at 11:58 PM <
hdfs-issues-reject-1697471875.2027154.pkchcedhioidkhech...@hadoop.apache.org>
wrote:

>
> ---------- Forwarded message ----------
> From: Sergey Onuchin <sergey.onuc...@acronis.com.invalid>
> To: "hdfs-iss...@hadoop.apache.org" <hdfs-iss...@hadoop.apache.org>
> Cc:
> Bcc:
> Date: Mon, 16 Oct 2023 15:57:47 +0000
> Subject: HDFS loses directories with production data
>
> Hello,
>
>
>
> We’ve been using Hadoop (+Spark) for 3 years on production w/o major
> issues.
>
>
>
> Lately we observe that whole non-empty directories (table partitions) are
> disappearing in random ways.
>
> We see in the application logs (and in the hdfs-audit logs) the creation of
> the directory + data files.
>
> Later, this directory is no longer present in HDFS.
>
>
>
> hdfs-audit.log shows no traces of deletes or renames for the disappeared
> directories.
>
> We can trust these logs, as we see our manual operations are present in
> the logs.
>
>
>
> Time between creation and disappearing is 1-2 days.
>
>
>
> Maybe we are losing individual files as well, we just cannot find this out
> reliably.
>
>
>
> This is a blocker issue for us, we have to stop production data processing
> until we find out and fix data loss root cause.
>
>
>
> Please help to identify the root cause or find the right direction for
> search/further questions.
>
>
>
>
>
> -- Hadoop version: ------------------------------
>
> Hadoop 3.2.1
>
> Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r
> b3cbbb467e22ea829b3808f4b7b01d07e0bf3842
>
> Compiled by rohithsharmaks on 2019-09-10T15:56Z
>
> Compiled with protoc 2.5.0
>
> From source with checksum 776eaf9eee9c0ffc370bcbc1888737
>
>
>
> Thank you!
>
> Sergey Onuchin
>
>
>
