Kevin,

You present a good discussion of architectural alternatives here. But my comment really had more to do with whether a particular HDFS patch would provide what the original poster seemed to be asking about. This is especially pertinent since the patch was intended to scratch a different itch.
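To make the block-versus-file distinction in the thread below concrete, here is a minimal Java sketch of mmapping a single local block file. The block path is hypothetical; stock HDFS does not hand it out, which is exactly what patches like HDFS-347 are about. Note also that a single map() call is limited to 2 GB, though HDFS blocks are typically only 64-128 MB.

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MmapOneBlock {
        public static void main(String[] args) throws Exception {
            // Hypothetical path to a single HDFS block file on the local
            // datanode; stock HDFS does not expose this.
            String blockPath = args[0];
            try (RandomAccessFile raf = new RandomAccessFile(blockPath, "r");
                 FileChannel ch = raf.getChannel()) {
                // The mapping covers this one block only; the rest of the
                // HDFS file lives in other block files, possibly on other
                // nodes.
                MappedByteBuffer buf =
                    ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                System.out.println("mapped " + buf.capacity() + " bytes");
                if (buf.capacity() > 0) {
                    System.out.println("first byte: " + buf.get(0));
                }
            }
        }
    }

The mapping ends at the block boundary. To mmap the entire file this way, every block of the file would have to be local, which HDFS does not guarantee.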
On Tue, Apr 12, 2011 at 5:51 AM, <kevin.le...@thomsonreuters.com> wrote:

> This is the age-old argument of what to share in a partitioned
> environment. IBM and Teradata have always used "shared nothing," which is
> what only getting one chunk of the file in each hadoop node is doing.
> Oracle has always used "shared disk," which is not an easy thing to do,
> especially at scale, and seems to have varying results depending on the
> application, transactional or DSS. Here are a couple of web references.
>
> http://www.informatik.uni-trier.de/~ley/db/conf/vldb/Bhide88.html
>
> http://jhingran.typepad.com/anant_jhingrans_musings/2010/02/shared-nothing-vs-shared-disks-the-cloud-sequel.html
>
> Rather than say shared nothing isn't useful, hadoop should look to how
> others make this work. The two key problems to avoid are data skew, where
> one node sees too much data and becomes the slow node, and large
> intra-partition joins, where large data is needed from more than one
> partition and potentially gets copied around.
>
> Rather than hybridizing into shared disk, I think hadoop should hybridize
> into the shared-data solutions others use, replication of select data,
> for solving intra-partition joins in a "shared nothing" architecture.
> This may be more database terminology that could be addressed by hbase,
> but I think it is good background for the questions of memory mapping
> files in hadoop.
>
> Kevin
>
>
> -----Original Message-----
> From: Ted Dunning [mailto:tdunn...@maprtech.com]
> Sent: Tuesday, April 12, 2011 12:09 AM
> To: Jason Rutherglen
> Cc: common-user@hadoop.apache.org; Edward Capriolo
> Subject: Re: Memory mapped resources
>
> Yes. But only one such block. That is what I meant by chunk.
>
> That is fine if you want that chunk, but if you want to mmap the entire
> file, it isn't really useful.
>
> On Mon, Apr 11, 2011 at 6:48 PM, Jason Rutherglen <
> jason.rutherg...@gmail.com> wrote:
>
> > What do you mean by local chunk? I think it's providing access to the
> > underlying file block?
> >
> > On Mon, Apr 11, 2011 at 6:30 PM, Ted Dunning <tdunn...@maprtech.com>
> > wrote:
> > > Also, it only provides access to a local chunk of a file, which isn't
> > > very useful.
> > >
> > > On Mon, Apr 11, 2011 at 5:32 PM, Edward Capriolo
> > > <edlinuxg...@gmail.com> wrote:
> > >>
> > >> On Mon, Apr 11, 2011 at 7:05 PM, Jason Rutherglen
> > >> <jason.rutherg...@gmail.com> wrote:
> > >> > Yes, you can, but it will require customization of HDFS. Take a
> > >> > look at HDFS-347, specifically the HDFS-347-branch-20-append.txt
> > >> > patch. I have been altering it for use with HBASE-3529. Note that
> > >> > the patch noted is for the -append branch, which is mainly for
> > >> > HBase.
> > >> >
> > >> > On Mon, Apr 11, 2011 at 3:57 PM, Benson Margulies
> > >> > <bimargul...@gmail.com> wrote:
> > >> >> We have some very large files that we access via memory mapping in
> > >> >> Java. Someone's asked us about how to make this conveniently
> > >> >> deployable in Hadoop. If we tell them to put the files into hdfs,
> > >> >> can we obtain a File for the underlying file on any given node?
> > >>
> > >> This feature is not yet part of hadoop, so doing this is not
> > >> "convenient".
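As a footnote to Kevin's point about replicating select data: that idea is essentially a map-side hash join. A minimal sketch follows, with the file names and the key<TAB>value layout assumed purely for illustration. Every node holds a full copy of the small table and streams only its local partition of the large one, so nothing crosses partitions at join time.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.HashMap;
    import java.util.Map;

    public class ReplicatedJoinSketch {
        public static void main(String[] args) throws Exception {
            // Build a hash table from the small table, a full copy of
            // which has been replicated to every node ahead of time.
            Map<String, String> small = new HashMap<String, String>();
            try (BufferedReader r =
                     new BufferedReader(new FileReader("small_table.tsv"))) {
                String line;
                while ((line = r.readLine()) != null) {
                    String[] kv = line.split("\t", 2);
                    if (kv.length == 2) {
                        small.put(kv[0], kv[1]);
                    }
                }
            }
            // Stream this node's local partition of the big table past
            // the hash table; no rows move between partitions, so the
            // join stays "shared nothing".
            try (BufferedReader r =
                     new BufferedReader(new FileReader("big_partition.tsv"))) {
                String line;
                while ((line = r.readLine()) != null) {
                    String[] kv = line.split("\t", 2);
                    if (kv.length == 2 && small.containsKey(kv[0])) {
                        System.out.println(
                            kv[0] + "\t" + kv[1] + "\t" + small.get(kv[0]));
                    }
                }
            }
        }
    }

This is the same pattern Hadoop users often get from the DistributedCache: the cost is one full copy of the small table per node, in exchange for avoiding the data movement of a large intra-partition join.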