Google is currently in the middle of upgrading from ext2 to a more up to date file system. We ended up choosing ext4. This thread touches upon many of the issues we wrestled with, so I thought it would be interesting to share. We should be sending out more details soon.
The driving performance reason to upgrade is that while ext2 had been "good enough" for a very long time the metadata arrangement on a stale file system was leading to what we call "read inflation". This is where we end up doing many seeks to read one block of data. In general latency from poor block allocation was causing performance hiccups. We spent a lot of time with unix standard benchmarks (dbench, compile bench, et al) on xfs, ext4, jfs to try to see which one was going to perform the best. In the end we mostly ended up using the benchmarks to validate our assumptions and do functional testing. Larry is completely right IMHO. These benchmarks were instrumental in helping us understand how the file systems worked in controlled situations and gain confidence from our customers. For our workloads we saw ext4 and xfs as "close enough" in performance in the areas we cared about. The fact that we had a much smoother upgrade path with ext4 clinched the deal. The only upgrade option we have is online. ext4 is already moving the bottleneck away from the storage stack for some of our most intensive applications. It was not until we moved from benchmarks to customer workload that we were able to make detailed performance comparisons and find bugs in our implementation. "Iterate often" seems to be the winning strategy for SW dev. But when it involves rebooting a cloud of systems and making a one way conversion of their data it can get messy. That said I see benchmarks as tools to build confidence before running traffic on redundant live systems. mrubin PS for some reason "dbench" holds mythical power over many folks I have met. They just believe it's the most trusted and standard benchmark for file systems. In my experience it often acts as a random number generator. It has found some bugs in our code as it exercises the VFS layer very well. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html