Hey folks,

We at LinkedIn have been working for a while on a scale testing and performance evaluation tool for HDFS and particularly the NameNode, which we call Dynamometer. It is now open source: you can view it on our GitHub page [1], and read about our motivations and design in our blog post [2].

The Dynamometer framework sets up a NameNode and DataNodes inside of YARN containers to create a full-scale HDFS cluster, just without any actual data, and then starts a MapReduce job which is used to replay audit log traces to generate realistic load.

We’ve been using this internally for quite a while and have found it to be very useful for verifying changes before they go live on our production clusters, quantifying the performance of releases, and investigating the performance implications of potential patches. We hope that you will all find it useful as well and invite your contributions and feedback.

Thanks,
Erik Krogen
HDFS @ LinkedIn

[1]: https://github.com/linkedin/dynamometer
[2]: https://engineering.linkedin.com/blog/2018/02/dynamometer--scale-testing-hdfs-on-minimal-hardware-with-maximum