Re: Python DFS/DHT project

Darren Govoni Thu, 18 Feb 2010 15:55:33 -0800

Hi,
  Thanks for the remarks. One thing about mogilefs: like hadoop, it
requires a database and a manager component to track the distributed
blocks. That is a design requirement I wanted to avoid (though
avoiding it comes with some tradeoffs).


But actually, from what I have read, HDFS divides large files into
blocks and can store redundant copies of files by replicating those
blocks.

This is one definition of striping, where each stripe spans multiple
servers with copies of blocks located elsewhere (e.g. stripe 0 and
stripe 1 contain the same blocks but on different servers). 

Perhaps HDFS does not work exactly like this, but the point is the same
in RAID, in my system, and in HDFS: fault tolerance through redundancy
and distribution across servers.
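
To make the striping idea above concrete, here is a minimal sketch. It
is not code from my project and not how HDFS places blocks; it assumes
fixed-size blocks and a simple rotated placement, and the names
split_blocks, place_stripes and survives are hypothetical, chosen only
for illustration.

```python
from typing import Dict, List, Set


def split_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Chop data into fixed-size blocks (the last block may be shorter)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_stripes(num_blocks: int, servers: List[str],
                  copies: int) -> Dict[int, List[str]]:
    """Map each block index to `copies` distinct servers.

    Stripe s stores block b on servers[(b + s) % len(servers)], so every
    stripe contains all blocks, shifted by one server from the previous
    stripe. This rotation rule is an assumption made for this sketch.
    """
    if not 1 <= copies <= len(servers):
        raise ValueError("copies must be between 1 and the number of servers")
    return {
        b: [servers[(b + s) % len(servers)] for s in range(copies)]
        for b in range(num_blocks)
    }


def survives(placement: Dict[int, List[str]], failed: Set[str]) -> bool:
    """Return True if every block still has a copy on a live server."""
    return all(
        any(server not in failed for server in holders)
        for holders in placement.values()
    )


if __name__ == "__main__":
    data = b"abcdefghij"
    blocks = split_blocks(data, 4)
    placement = place_stripes(len(blocks), ["a", "b", "c"], copies=2)
    for index, holders in placement.items():
        print(index, blocks[index], holders)
    print("survives loss of 'a':", survives(placement, {"a"}))
```

With two copies on three servers, any single server can fail and every
block is still reachable; losing two servers that hold both copies of
the same block loses that block.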

Cheers,
Darren

On Thu, 2010-02-18 at 16:52 -0500, Edward Capriolo wrote:

> On Thu, Feb 18, 2010 at 4:29 PM, Darren Govoni <dar...@ontrenet.com> wrote:
> > Hi,
> >  I'm developing a python DFS/DHT and Software RAID file system that
> > resembles Hadoop (among others).
> > I wanted to convey the traits of my filesystem and see how it compares
> > to HDFS but my aim is to develop different capabilities, not the exact
> > same. Basically, what my DFS can do now is:
> >
> > - zero-conf distributed file system. No node manager or database.
> > - fully de-centralized, distributed. Peer like. No single point of
> > failure.
> > - Can stripe and stagger blocks across servers with any level of
> > RAID/redundancy.
> > - Can retrieve files by key from any node in the mesh.
> > - Reconstructs ordered blocks from the mesh on-the-fly.
> > - Flat keyspace
> > - Can reconstruct the keyspace on-the-fly, from any node. There is no
> > database.
> > - Entirely stateless. Requires no database or persistent information.
> > - Fault-tolerant streaming.
> > - Automatic performance load balancing
> > - Automatic diskspace load balancing
> > - Pluggable blocking classes. When adding files to the DFS, they can be
> > chopped into blocks using pluggable classes with different rules.
> > - Small and fast footprint (less than 20k of Python code)
> > - HTTP/web friendly. Any URL client can use it.
> >
> > There's more, but I wanted to craft my requirements to provide new or
> > different capabilities from HDFS.
> > So is this identical to HDFS? Does it sound useful? Thanks for any
> > thoughts.
> >
> > Darren
> >
> 
> Your description sounds much like http://www.danga.com/mogilefs/
> 
> 
> Many of your points like:
> > - Can stripe and stagger blocks across servers with any level of
> > RAID/redundancy.
> 
> Seem like carryover ideas from typical large/multi-disk storage systems.
> That is different than hadoop in many ways. HDFS was created to be a
> file system for batch systems. Supporting striping or RAID levels has
> not been the target of hadoop.
> 
> If you want to compare what you are looking to do with what hadoop does
> http://hadoop.apache.org/common/docs/current/hdfs_design.html
> should help you.
