Nigel Daley wrote:
> So shouldn't fixing this test to conform to the new model in HADOOP-1134
> be the concern of the patch for HADOOP-1134?
Yes, but, as it stands, this patch would silently stop working correctly
once HADOOP-1134 is committed. It should instead be written in a more
robust way, one that can survive expected changes. Relying on HDFS using
ChecksumFileSystem isn't as reliable as an explicit constructor that
says "I want an unchecksummed FileSystem."
> As it stands, I can't run NNBench at scale without using a raw file
> system, which is what this patch is intended to allow.
It seems strange to disable things in an undocumented and unsupported
way in order to get a benchmark to complete. How does that prove
scalability? Rather, leaving NNBench alone seems like a strong argument
for implementing HADOOP-1134 sooner.
Still, if you want to be able to disable checksums, for benchmarks or
whatever, we can permit that, but we should do so explicitly.
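Something along these lines, say (the method name is made up; the point is
only that the switch is explicit and supported, rather than an undocumented
cast):

    // Hypothetical knob -- name is illustrative, not an existing API.
    FileSystem fs = FileSystem.get(conf);
    fs.setVerifyChecksum(false);  // explicit, documented, survives HADOOP-1134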
> HADOOP-928 caused this test to use a ChecksumFileSystem and subsequently
> we saw our "read" TPS metric plummet from 20,000 to a couple hundred.
Ah, NNBench used the 'raw' methods before, which was kind of sneaky on
its part, since it didn't benchmark the typical user experience.
Still, namenode performance should at worst halve with checksums as
currently implemented, no?
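Back-of-the-envelope: each data-file operation gains at most one companion
.crc operation under ChecksumFileSystem, so namenode transactions at most
double:

    20,000 TPS / 2 = 10,000 TPS expected floor

A fall to a couple hundred suggests the time is going somewhere other than
extra namenode transactions, which seems worth understanding before we
paper over it.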
> Let's get our current benchmark back on track before we commit
> HADOOP-1134 (which will likely take a while before it is "Patch
> Available").
I'd argue that we should fix the benchmark to accurately reflect what
users see, so that we see real improvement when HADOOP-1134 is
committed. That would make it a more useful and realistic benchmark.
However, if you believe that a checksum-free benchmark is still useful, I
think it should be more future-proof.
Doug