Forgot to add: this JIRA details the latest security features being worked on in Hadoop trunk: https://issues.apache.org/jira/browse/HADOOP-4487. This document describes the current status and limitations of the permissions mechanism: http://hadoop.apache.org/core/docs/current/hdfs_permissions_guide.html.
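For what it's worth, the Unix-like permissions are visible straight through the client API, so that is an easy place to poke around. A minimal sketch, assuming a reachable namenode and a 0.19-era classpath; the path, group name, and mode below are made up:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class PermissionsSketch {
      public static void main(String[] args) throws Exception {
        // Picks up fs.default.name from the usual config files on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path p = new Path("/user/amandeep/report.txt"); // hypothetical path

        // The namenode stores a Unix-like owner, group, and mode for every file.
        FileStatus st = fs.getFileStatus(p);
        System.out.println(st.getOwner() + ":" + st.getGroup() + " " + st.getPermission());

        // Tighten the mode; only the file owner (or the superuser) may do this.
        fs.setPermission(p, new FsPermission((short) 0640));

        // Change just the group (null leaves the owner untouched); name is made up.
        fs.setOwner(p, null, "secproject");
      }
    }

Note that the user identity itself is simply taken from the client's Unix account, which is exactly the weakness the JIRA above is about.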
On Sun, Feb 15, 2009 at 2:35 PM, Matei Zaharia <ma...@cloudera.com> wrote:

> I think it's safe to assume that Hadoop works like MapReduce/GFS at the
> level described in those papers. In particular, in HDFS there is a master
> node containing metadata and a number of slave nodes (datanodes)
> containing blocks, as in GFS. Clients start by talking to the master to
> list directories, etc. When they want to read a region of some file, they
> tell the master the filename and offset, and they receive a list of block
> locations (datanodes). They then contact the individual datanodes to read
> the blocks. When clients write a file, they first obtain a new block ID
> and a list of nodes to write it to from the master, then contact the
> datanodes to write it (actually, the datanodes pipeline the write as in
> GFS) and report back when the write is complete. HDFS actually has some
> security mechanisms built in, authenticating users based on their Unix ID
> and providing Unix-like file permissions. I don't know much about how
> these are implemented, but they would be a good place to start looking.
>
> On Sun, Feb 15, 2009 at 1:36 PM, Amandeep Khurana <ama...@gmail.com> wrote:
>
>> Thanks Matei
>>
>> I had gone through the architecture document online. I am currently
>> working on a project on security in Hadoop. I know how the data moves
>> around in GFS, but I wasn't sure how much of that HDFS follows and how
>> different it is from GFS. Can you throw some light on that?
>>
>> Security would also involve the MapReduce jobs following the same
>> protocols. That's why I asked how the Hadoop framework integrates with
>> HDFS, and how different it is from MapReduce and GFS. The GFS and
>> MapReduce papers give good information on how those systems are
>> designed, but I have not been able to find anything that concrete for
>> Hadoop.
>>
>> Amandeep
>>
>> Amandeep Khurana
>> Computer Science Graduate Student
>> University of California, Santa Cruz
>>
>> On Sun, Feb 15, 2009 at 12:07 PM, Matei Zaharia <ma...@cloudera.com> wrote:
>>
>> > Hi Amandeep,
>> > Hadoop is definitely inspired by MapReduce/GFS and aims to provide
>> > those capabilities as an open-source project. HDFS is similar to GFS
>> > (large blocks, replication, etc.); some notable things missing are
>> > read-write support in the middle of a file (unlikely to be provided,
>> > because few Hadoop applications require it) and multiple appenders
>> > (the record append operation). You can read about the HDFS
>> > architecture at
>> > http://hadoop.apache.org/core/docs/current/hdfs_design.html. The
>> > MapReduce part of Hadoop interacts with HDFS in the same way that
>> > Google's MapReduce interacts with GFS (shipping computation to the
>> > data), although Hadoop MapReduce also supports running over other
>> > distributed filesystems.
>> >
>> > Matei
>> >
>> > On Sun, Feb 15, 2009 at 11:57 AM, Amandeep Khurana <ama...@gmail.com> wrote:
>> >
>> > > Hi
>> > >
>> > > Is the HDFS architecture completely based on the Google File System?
>> > > If it isn't, what are the differences between the two?
>> > >
>> > > Secondly, is the coupling between Hadoop and HDFS the same as that
>> > > between Google's version of MapReduce and GFS?
>> > >
>> > > Amandeep
>> > >
>> > > Amandeep Khurana
>> > > Computer Science Graduate Student
>> > > University of California, Santa Cruz
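To put the read path described at the top of the thread into API terms, here is a rough sketch of what a client does through the public FileSystem interface; the namenode/datanode RPCs are hidden inside the client library, and the file path below is made up:

    import java.util.Arrays;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadPathSketch {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/data/logs/part-00000"); // hypothetical file

        // Ask the namenode which datanodes hold the blocks covering the file.
        FileStatus st = fs.getFileStatus(p);
        for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
          System.out.println("offset " + loc.getOffset() + " len " + loc.getLength()
              + " on " + Arrays.toString(loc.getHosts()));
        }

        // Reading: the client library picks a datanode that holds the block
        // covering the current position and streams the bytes from it.
        FSDataInputStream in = fs.open(p);
        byte[] buf = new byte[4096];
        int n = in.read(buf);
        System.out.println("read " + n + " bytes");
        in.close();
      }
    }

The same block-location call is what the MapReduce scheduler uses to ship tasks to the nodes holding their input, which is the "computation to the data" coupling Matei mentions.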