Kevin,

You present a good discussion of architectural alternatives here.  But my
comment really had more to do with whether a particular HDFS patch would
provide what the original poster seemed to be asking about.  This is
especially pertinent since the patch was intended to scratch a different
itch.
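
For context, what the original poster is after is plain java.nio memory
mapping against a local java.io.File, roughly like the sketch below.  The
path is hypothetical, and with stock HDFS there is no supported way to get
such a local File for a whole file (only for an individual block), which is
exactly the gap under discussion:

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MmapSketch {
      public static void main(String[] args) throws Exception {
        // Hypothetical local path; HDFS does not hand out such a File today.
        RandomAccessFile raf = new RandomAccessFile("/data/big-file.bin", "r");
        FileChannel ch = raf.getChannel();
        try {
          // Map (up to) the first 2GB read-only; FileChannel.map() caps a
          // single mapping at Integer.MAX_VALUE bytes, so a truly huge file
          // needs several mappings at different offsets.
          long len = Math.min(ch.size(), Integer.MAX_VALUE);
          MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, len);
          System.out.println("first byte: " + buf.get(0));
        } finally {
          ch.close();
          raf.close();
        }
      }
    }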

On Tue, Apr 12, 2011 at 5:51 AM, <kevin.le...@thomsonreuters.com> wrote:

> This is the age-old argument of what to share in a partitioned
> environment. IBM and Teradata have always used "shared nothing," which is
> what only getting one chunk of the file in each hadoop node is doing.
> Oracle has always used "shared disk," which is not an easy thing to do,
> especially at scale, and seems to have varying results depending on the
> application, transactional or DSS. Here are a couple of web references.
>
> http://www.informatik.uni-trier.de/~ley/db/conf/vldb/Bhide88.html
>
> http://jhingran.typepad.com/anant_jhingrans_musings/2010/02/shared-nothing-vs-shared-disks-the-cloud-sequel.html
>
> Rather than saying shared nothing isn't useful, hadoop should look at how
> others make this work. The two key problems to avoid are data skew, where
> one node sees too much data and becomes the slow node, and large
> intra-partition joins, where data is needed from more than one partition
> and potentially gets copied around.
>
> Rather than hybridizing into shared disk, I think hadoop should hybridize
> into the shared-data solution others use, replication of select data, for
> solving intra-partition joins in a "shared nothing" architecture. This may
> be more of a database concern that could be addressed by hbase, but I
> think it is good background for the questions about memory mapping files
> in hadoop.
>
> Kevin
>
>
> -----Original Message-----
> From: Ted Dunning [mailto:tdunn...@maprtech.com]
> Sent: Tuesday, April 12, 2011 12:09 AM
> To: Jason Rutherglen
> Cc: common-user@hadoop.apache.org; Edward Capriolo
> Subject: Re: Memory mapped resources
>
> Yes.  But only one such block. That is what I meant by chunk.
>
> That is fine if you want that chunk, but if you want to mmap the entire
> file, it isn't really useful.
>
> On Mon, Apr 11, 2011 at 6:48 PM, Jason Rutherglen <
> jason.rutherg...@gmail.com> wrote:
>
> > What do you mean by local chunk?  I think it's providing access to the
> > underlying file block?
> >
> > On Mon, Apr 11, 2011 at 6:30 PM, Ted Dunning <tdunn...@maprtech.com>
> > wrote:
> > > Also, it only provides access to a local chunk of a file which isn't
> > > very useful.
> > >
> > > On Mon, Apr 11, 2011 at 5:32 PM, Edward Capriolo <edlinuxg...@gmail.com>
> > > wrote:
> > >>
> > >> On Mon, Apr 11, 2011 at 7:05 PM, Jason Rutherglen
> > >> <jason.rutherg...@gmail.com> wrote:
> > >> > Yes, you can; however, it will require customization of HDFS.  Take
> > >> > a look at HDFS-347, specifically the HDFS-347-branch-20-append.txt
> > >> > patch.  I have been altering it for use with HBASE-3529.  Note that
> > >> > the patch noted is for the -append branch, which is mainly for HBase.
> > >> >
> > >> > On Mon, Apr 11, 2011 at 3:57 PM, Benson Margulies
> > >> > <bimargul...@gmail.com> wrote:
> > >> >> We have some very large files that we access via memory mapping in
> > >> >> Java. Someone's asked us about how to make this conveniently
> > >> >> deployable in Hadoop. If we tell them to put the files into hdfs,
> > >> >> can we obtain a File for the underlying file on any given node?
> > >> >>
> > >> >
> > >>
> > >> This feature is not yet part of hadoop, so doing this is not
> > >> "convenient".
> > >
> > >
> >
>
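
On Kevin's point about replicating select data to handle intra-partition
joins: in map-reduce terms that is essentially a map-side join, where a
small "dimension" file is copied to every node via the DistributedCache and
the big table is joined against it locally, with no shuffle.  A minimal
sketch in the 0.20-era API, with hypothetical paths and a hypothetical file
layout (tab-separated key/value lines):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.net.URI;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MapSideJoinSketch {

      public static class JoinMapper
          extends Mapper<LongWritable, Text, Text, Text> {
        private final Map<String, String> lookup = new HashMap<String, String>();

        @Override
        protected void setup(Context ctx)
            throws IOException, InterruptedException {
          // Each node gets its own local copy of the replicated small file.
          Path[] cached =
              DistributedCache.getLocalCacheFiles(ctx.getConfiguration());
          BufferedReader in =
              new BufferedReader(new FileReader(cached[0].toString()));
          String line;
          while ((line = in.readLine()) != null) {
            String[] kv = line.split("\t", 2);   // key <tab> value
            if (kv.length == 2) {
              lookup.put(kv[0], kv[1]);
            }
          }
          in.close();
        }

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
            throws IOException, InterruptedException {
          // Join each record of the big table against the in-memory table.
          String[] kv = value.toString().split("\t", 2);
          if (kv.length < 2) {
            return;                              // skip malformed lines
          }
          String hit = lookup.get(kv[0]);
          if (hit != null) {
            ctx.write(new Text(kv[0]), new Text(kv[1] + "\t" + hit));
          }
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "map-side join sketch");
        job.setJarByClass(MapSideJoinSketch.class);
        job.setMapperClass(JoinMapper.class);
        job.setNumReduceTasks(0);                // map-only, no shuffle
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Hypothetical paths: big fact table in, joined output out.
        FileInputFormat.addInputPath(job, new Path("/facts"));
        FileOutputFormat.setOutputPath(job, new Path("/joined"));
        // Replicate the small side of the join to every node.
        DistributedCache.addCacheFile(new URI("/lookup/dimension.txt"),
                                      job.getConfiguration());
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }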
