HBase is a database that runs on top of HDFS, so that's one more use case beyond 
MapReduce jobs. Its usage pattern is append-only, which makes it a good fit.

I don't see how not-so-commodity hardware could achieve the same as HDFS without 
replication. It's not only about data safety, but also about availability. HDFS 
can survive complete machines dying, RAM banks going bad, motherboards going 
haywire, and network partitions caused by switch failures. Anything that needs to 
survive a single-box or single-rack failure needs replication, I guess. That said, 
I think when you have two boxes doing 2PC (two-phase commit) for writes and each 
box is itself set up with redundant storage (RAID or otherwise), you get a faster 
filesystem that is fully redundant. It would not be a nice fit for MapReduce, 
though.


Friso




On 25 Jan 2011, at 21:37, Nathan Rutman wrote:

I have a very general question on the usefulness of HDFS for purposes other 
than running distributed compute jobs for Hadoop.  Hadoop and HDFS seem very 
popular these days, but the use of HDFS for other purposes (database backend, 
records archiving, etc) confuses me, since there are other free distributed 
filesystems out there (I personally work on Lustre), with significantly better 
general-purpose performance.

So please tell me if I'm wrong about any of this.  Note I've gathered most of 
my info from documentation rather than reading the source code.

As I understand it, HDFS was written specifically for Hadoop compute jobs, with 
the following design factors in mind:

  *   write-once-read-many (WORM) access model
  *   use commodity hardware with relatively high failure rates (i.e., failures 
are expected)
  *   long, sequential streaming data access
  *   large files
  *   hardware/OS agnostic
  *   moving computation is cheaper than moving data

While appropriate for processing many large-input Hadoop data-processing jobs, 
there are significant penalties to be paid when trying to use these design 
factors for more general-purpose storage:

  *   Commodity hardware requires data replication for safety.  The HDFS 
implementation has three penalties: storage redundancy, network loading, and 
blocking writes.  By default, HDFS blocks are replicated 3x: local, "nearby", 
and "far away" to minimize the impact of data center catastrophe.  In addition 
to the obvious 3x cost for storage, the result is that every data block must be 
written "far away" - exactly the opposite of the "Move Computation to Data" 
mantra.  Furthermore, these over-network writes are synchronous; the client 
write blocks until all copies are complete on disk, with the longest latency 
path of two network hops plus a disk write gating the overall write speed.  Note 
that while this would be disastrous for a general-purpose filesystem, with true 
WORM usage it may be acceptable to penalize writes this way.  (Replication and 
block size can be set per file through the client API; see the per-file create 
sketch after this list.)
  *   Large block size implies relatively few, large files.  Because the NameNode 
holds all file and block metadata in memory, HDFS reaches practical limits in the 
tens of millions of files.
  *   Large block size wastes space for small files.  The minimum allocation per 
file is one block.
  *   There is no data caching.  When delivering large contiguous streaming 
data, this doesn't matter.  But when the read load is random, seeky, or 
partial, this is a missing high-impact performance feature (see the 
positioned-read sketch after this list).
  *   In a WORM model, changing a small part of a file requires all the file 
data to be copied, so e.g. database record modifications would be very 
expensive.
  *   There are no hardlinks, softlinks, or quotas.
  *   HDFS isn't directly mountable, and therefore requires a non-standard API 
to use.  (FUSE workaround exists.)
  *   Java source code is very portable and easy to install, but not very quick.
  *   Moving computation is cheaper than moving data.  But the data nonetheless 
always has to be moved: either read off of a local hard drive or read over the 
network into the compute node's memory.  It is not necessarily the case that 
reading a local hard drive is faster than reading a distributed (striped) file 
over a fast network.  Over a commodity network (e.g. 1GigE), probably yes.  But a 
fast (and expensive) network (e.g. 4xDDR InfiniBand) can deliver data 
significantly faster than a local commodity hard drive (rough numbers after this 
list).
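
For concreteness, here is a minimal sketch of how a client can choose the 
replication factor and block size per file through the Java FileSystem API. The 
path and values are made up; cluster-wide defaults come from dfs.replication and 
dfs.block.size in hdfs-site.xml on the classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // reads *-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);

        Path p = new Path("/tmp/example.dat");      // hypothetical path
        short replication = 2;                      // fewer copies than the default of 3
        long blockSize = 64L * 1024 * 1024;         // 64 MB block size for this file

        FSDataOutputStream out =
            fs.create(p, true, conf.getInt("io.file.buffer.size", 4096),
                      replication, blockSize);
        out.write("hello".getBytes("UTF-8"));
        // close() blocks until all replicas in the write pipeline have acknowledged
        out.close();

        // Replication can also be changed after the fact:
        fs.setReplication(p, (short) 3);
      }
    }

Lowering the replication factor trades the "far away" copy for cheaper, faster 
writes, which may be reasonable for scratch data.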
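
And a sketch of the "random, seeky, or partial" read case through the same API, 
assuming a hypothetical file large enough to seek into. The reads work, but each 
one goes back to a DataNode; HDFS keeps no client-side block cache:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RandomReadSketch {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/tmp/big-example.dat");  // hypothetical large file

        FSDataInputStream in = fs.open(p);
        byte[] buf = new byte[4096];

        // Positioned read: 4 KB starting at offset 1 MB.
        // Served by a DataNode over the network (or local disk); not cached by HDFS.
        in.readFully(1024L * 1024, buf);

        // Streaming from an arbitrary offset works too:
        in.seek(2048L * 1024);
        int n = in.read(buf, 0, buf.length);
        in.close();
        System.out.println("read " + n + " bytes");
      }
    }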
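
Rough numbers behind that last point (theoretical peaks, not measured throughput):

    1 GigE                ~ 1 Gbit/s          ~ 125 MB/s
    commodity SATA drive  ~ 100 MB/s sequential read
    4xDDR InfiniBand      ~ 16 Gbit/s data    ~ 2 GB/s

So a commodity network is roughly on par with a single local drive, while a fast 
interconnect can outrun it by an order of magnitude, provided the remote storage 
can actually feed data at that rate.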

If I'm missing other points, pro- or con-, I would appreciate hearing them.  
Note again I'm not questioning the success of HDFS in achieving those stated 
design choices, but rather trying to understand HDFS's applicability to other 
storage domains beyond Hadoop.

Thanks for your time.

