Scott Carey wrote:
HTTP is very useful and typically performs very well.  It has lots of
things built-in too.  In addition to what you mention, it  has a
caching mechanism built-in, range queries, and all sorts of ways to
tag along state if needed.  To top it off there are a lot of testing
and debugging tools available for it.  So from that front using it is
very attractive.

Glad you agree!
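
For anyone following along, the range queries mentioned above are just standard HTTP Range headers, so a client can ask a server for a slice of a block without any custom protocol. A rough sketch in Java using the stock HttpURLConnection (the URL and byte offsets here are placeholders for illustration, not real datanode endpoints):

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class RangeGetExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical block URL; a real deployment would have its own scheme.
    URL url = new URL("http://datanode.example.com:50075/block?id=1234");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    // Request only the first 64 KB of the block via a standard Range header.
    conn.setRequestProperty("Range", "bytes=0-65535");
    int status = conn.getResponseCode(); // 206 Partial Content if the range was honored
    try (InputStream in = conn.getInputStream()) {
      byte[] buf = new byte[8192];
      long total = 0;
      for (int n; (n = in.read(buf)) != -1; ) {
        total += n;
      }
      System.out.println("status=" + status + " bytes=" + total);
    }
  }
}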

However, in my experience zero-copy is not going to be much of a gain
performance-wise for this sort of application, and will limit what
can be done.  As long as a servlet doesn't transcode data and mostly
copies, it will be very fast - many multiples of gigabit ethernet
speed per CPU - far more than most disk setups will handle for a
while.

In MapReduce, datanodes are also running map and reduce tasks, so we'd like datanodes not only to keep up with disks and networks, but also to use minimal CPU doing so. Zero-copy on the datanode has been shown to significantly help MapReduce benchmarks. That said, zero-copy may or may not be significantly better than one-copy. I intend to benchmark that. But the important thing to measure is not just throughput but also idle CPU.
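
As a rough illustration of the difference (a sketch using the standard NIO APIs, not the datanode's actual code path), one-copy pulls each buffer through user space, while zero-copy asks the kernel to move file bytes straight to the socket via FileChannel.transferTo (sendfile on Linux):

import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

public class TransferExample {
  // One-copy: read into a user-space buffer, then write it to the socket.
  static void copyLoop(FileChannel file, SocketChannel sock) throws Exception {
    ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
    while (file.read(buf) != -1) {
      buf.flip();
      while (buf.hasRemaining()) {
        sock.write(buf);
      }
      buf.clear();
    }
  }

  // Zero-copy: the kernel moves bytes from the file to the socket directly.
  static void zeroCopy(FileChannel file, SocketChannel sock) throws Exception {
    long pos = 0, size = file.size();
    while (pos < size) {
      pos += file.transferTo(pos, size - pos, sock);
    }
  }
}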

Additionally, I'm not sure CRC checking should occur on the
client.  TCP/IP already checksums packets, so network data corruption
over HTTP is not a primary concern.   The big concern is silent data
corruption on the disk.

I believe that disks are the largest source of data corruption, but I am not confident they are the only source. HDFS uses end-to-end checksums. As data is written to HDFS it is immediately checksummed on the client. This checksum then lives with the data and is validated on the client immediately before the data is returned to the application. The goal is to catch corruption wherever it may occur: on disks, on the network, or while buffered in memory. In addition, the checksum is validated after data is transmitted to datanodes but before blocks are stored, so that initial network and memory corruptions are caught early and the writing process fails, rather than permitting an application to write corrupt data. Finally, datanodes periodically scan for corrupt blocks on disks, replacing them with non-corrupt replicas, decreasing the chance that over time all replicas become corrupt.
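
To make the end-to-end idea concrete, here is a minimal sketch of a writer-side checksum and reader-side verification. CRC32 over a chunk of bytes is used for illustration; this is not a statement of HDFS's exact chunk size or on-disk/wire format:

import java.util.zip.CRC32;

public class ChecksumExample {
  // Writer side: checksum each chunk as it is produced.
  static long checksum(byte[] data, int off, int len) {
    CRC32 crc = new CRC32();
    crc.update(data, off, len);
    return crc.getValue();
  }

  // Reader side: recompute and compare before returning data to the caller.
  static void verify(byte[] data, int off, int len, long expected) {
    if (checksum(data, off, len) != expected) {
      throw new RuntimeException("checksum mismatch: data corrupted in transit or at rest");
    }
  }

  public static void main(String[] args) {
    byte[] chunk = "example data".getBytes();
    long crc = checksum(chunk, 0, chunk.length);   // computed when the data is written
    verify(chunk, 0, chunk.length, crc);           // recomputed just before the data is used
    System.out.println("chunk verified, crc=" + crc);
  }
}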

Additionally, embedding Tomcat tends to be more tricky than Jetty,
though that can be overcome.  One might argue that we don't even want
a servlet container, we just want an HTTP connector.  The Servlet API
is familiar, but for a high performance transport it might just be
overhead and restrictive.  Direct access to Tomcat's NIO connector
might be significantly lighter-weight and more flexible. Tomcat's NIO
connector implementation works great and I have had great success
with up to 10K connections with the pure Java connector using
ordinary byte buffers and about 20 servlet threads.

I hope to start benchmarking bulk data RPC over the next few weeks. I'll probably start with a servlet using Jetty, then see if I can increase throughput and decrease CPU utilization through the use of things like Tomcat's NIO connector, Grizzly, etc.
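
Something along these lines is probably where I'd start: an embedded Jetty server with one servlet that streams a file back to the client. This assumes a Jetty version exposing the org.eclipse.jetty embedded API with javax.servlet; the port, path, and file location are placeholders:

import java.io.FileInputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.servlet.ServletContextHandler;
import org.eclipse.jetty.servlet.ServletHolder;

public class BlockServer {
  public static class BlockServlet extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
        throws ServletException, IOException {
      resp.setContentType("application/octet-stream");
      try (FileInputStream in = new FileInputStream("/tmp/testblock"); // placeholder file
           OutputStream out = resp.getOutputStream()) {
        byte[] buf = new byte[64 * 1024];
        for (int n; (n = in.read(buf)) != -1; ) {
          out.write(buf, 0, n);
        }
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Server server = new Server(8080); // placeholder port
    ServletContextHandler ctx = new ServletContextHandler();
    ctx.addServlet(new ServletHolder(new BlockServlet()), "/block");
    server.setHandler(ctx);
    server.start();
    server.join();
  }
}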

Doug
