Re: [PR] HDDS-11463. Track and display failed DataNode storage locations in SCM. [ozone]

via GitHub Wed, 28 May 2025 19:17:47 -0700


slfan1989 commented on PR #7266:
URL: https://github.com/apache/ozone/pull/7266#issuecomment-2918041727


   @errose28 Thank you for your message! I'd like to share some thoughts from a 
different perspective. As it stands, this feature does not conflict with the 
proposal in #8405. #8405 represents a more innovative and forward-looking 
design, and although it's still under discussion, it will certainly be valuable 
if implemented as planned.
   
   At the same time, I believe this feature does not impact HDDS-13096 or 
HDDS-13097. My comment on #8405 was meant to express expectations for the 
system's future capabilities (I hope Ozone can gradually support such 
features), not to raise any objection to #8405 itself.
   
   The design of #7266 is inspired by HDFS's disk failure detection mechanism, 
with the goal of improving the system's ability to identify and locate failed 
disks. For users migrating from HDFS to Ozone, using the volume command to 
directly view failed disks can offer a more intuitive and convenient 
operational experience.
   
   From my perspective, we all play different roles in this project. Your team 
focuses on evolving and optimizing the system's architecture, while we, as 
external users, are more focused on refining specific functional details based 
on real-world use. Ultimately, however, we share the same goal: to make Ozone 
more robust, more user-friendly, and more widely adopted.
   
   Naturally, it is not easy to fully align these detail-oriented changes with 
larger, ongoing feature work. For example, making #7266 fully consistent with 
#8405 is difficult mainly because #8405 is broader in scope and has a longer 
timeline, whereas #7266 focuses on one specific aspect. While we fully respect 
the overall direction, we also hope to move forward with smaller, incremental 
improvements that address current practical issues.
   
   In addition to this PR, we're also working on several other enhancements. 
For instance, we've implemented mechanisms to collect DataNode I/O statistics 
to more precisely manage container replication. We've also introduced 
time-based peak/off-peak control logic for various DataNode management 
operations (such as deletion, container replication, and EC container 
reconstruction). These improvements are driven by real-world production needs, 
and from our perspective, they've shown positive results.
   
   However, since many of these PRs have some degree of code coupling with our 
previous contributions, it's difficult for us to combine everything into a 
single, unified patch for upstream submission.
   
   Therefore, we hope to proceed with #7266 for now. If #8405 later results in 
a more complete or improved solution, we’d be happy to continue refining things 
in that direction. In the meantime, this also gives us a valuable opportunity 
to participate in the community and contribute to Ozone’s development.

