Nigel Daley wrote:
So shouldn't fixing this test to conform to the new model in HADOOP-1134 be the concern of the patch for HADOOP-1134?

Yes, but, as it stands, this patch would silently stop working correctly once HADOOP-1134 is committed. It should instead be written in a more robust way, one that can survive expected changes. Relying on HDFS being wrapped in a ChecksumFileSystem isn't as reliable as an explicit constructor that says "I want an unchecksummed FileSystem."
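For concreteness, the fragile pattern in question looks roughly like this (a sketch only; it assumes HDFS currently comes back wrapped in a ChecksumFileSystem and uses its getRawFileSystem() accessor):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.ChecksumFileSystem;
    import org.apache.hadoop.fs.FileSystem;

    public class RawFsSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Fragile: depends on HDFS happening to be wrapped in a
        // ChecksumFileSystem. Once HADOOP-1134 moves checksums into
        // DFS itself, the instanceof test fails and the benchmark
        // silently runs against the checksummed path again.
        if (fs instanceof ChecksumFileSystem) {
          fs = ((ChecksumFileSystem) fs).getRawFileSystem();
        }
        System.out.println("Benchmarking against " + fs.getClass().getName());
      }
    }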

As it stands, I can't run NNBench at scale without using a raw file system, which is what this patch is intended to allow.

It seems strange to disable things in an undocumented and unsupported way in order to get a benchmark to complete. How does that prove scalability? Rather, leaving NNBench alone seems like a strong argument for implementing HADOOP-1134 sooner.

Still, if you want to be able to disable checksums, for benchmarks or whatever, we can permit that, but should do so explicitly.
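As a sketch of what "explicitly" could look like, a documented configuration switch consulted when the client opens the filesystem (the property name below is hypothetical, not an existing Hadoop option):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class NoChecksumSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical key: an explicit, supported way to request an
        // unchecksummed FileSystem, instead of casting to an
        // implementation detail like ChecksumFileSystem.
        conf.setBoolean("fs.checksum.enabled", false);
        FileSystem fs = FileSystem.get(conf);
        // fs would then skip checksums per the setting, regardless of
        // how checksumming is implemented underneath.
      }
    }

That would keep the benchmark's intent ("no checksums, please") stable across implementation changes like HADOOP-1134.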

HADOOP-928 caused this test to use a ChecksumFileSystem and subsequently we saw our "read" TPS metric plummet from 20,000 to a couple hundred.

Ah, NNBench used the 'raw' methods before, which was kind of sneaky on its part, since it didn't benchmark the typical user experience. Although namenode performance should only halve at worst with checksums as currently implemented (each data file gains a companion checksum file, so namespace operations at most double), no?

Let's get our current benchmark back on track before we commit HADOOP-1134 (which will likely take a while before it is "Patch Available").

I'd argue that we should fix the benchmark to accurately reflect what users see, so that we see real improvement when HADOOP-1134 is committed. That would make it a more useful and realistic benchmark. However, if you believe that a checksum-free benchmark is still useful, I think it should be made more future-proof.

Doug
