slfan1989 commented on PR #7266: URL: https://github.com/apache/ozone/pull/7266#issuecomment-2918041727
@errose28 Thank you for your message! I'd like to share some thoughts from a different perspective. As it stands, this feature does not conflict with the proposal in #8405. #8405 represents a more innovative and forward-looking design, and although it's still under discussion, it will certainly be valuable if implemented as planned. At the same time, I believe this feature does not impact HDDS-13096 or HDDS-13097. My comment on #8405 was more about expressing expectations for the system’s future capabilities — I hope Ozone can gradually support such features — rather than raising any objections to #8405 itself. The design of #7266 is inspired by HDFS's disk failure detection mechanism, with the goal of improving the system's ability to identify and locate failed disks. For users migrating from HDFS to Ozone, using the volume command to directly view failed disks can offer a more intuitive and convenient operational experience. From my perspective, we all play different roles in this project. Your team focuses on evolving and optimizing the system's architecture, while we, as external users, are more focused on refining specific functional details based on real-world use. Ultimately, however, we share the same goal: to make Ozone more robust, more user-friendly, and more widely adopted. Naturally, it's not easy to fully align these detail-oriented changes with larger, ongoing feature developments — for example, making #7266 fully consistent with #8405. This is mainly because #8405 is broader in scope, with a longer timeline, whereas #7266 focuses on a very specific aspect. While we fully respect the overall direction, we also hope to move forward with some smaller, incremental improvements to address current practical issues. In addition to this PR, we're also working on several other enhancements. For instance, we've implemented mechanisms to collect DataNode I/O statistics to more precisely manage container replication. We've also introduced time-based peak/off-peak control logic for various DataNode management operations (such as deletion, container replication, and EC container reconstruction). These improvements are driven by real-world production needs, and from our perspective, they've shown positive results. However, since many of these PRs have some degree of code coupling with our previous contributions, it's difficult for us to combine everything into a single, unified patch for upstream submission. Therefore, we hope to proceed with #7266 for now. If #8405 later results in a more complete or improved solution, we’d be happy to continue refining things in that direction. In the meantime, this also gives us a valuable opportunity to participate in the community and contribute to Ozone’s development. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
