Comments inline below.

---
Jim Kellerman, Senior Engineer; Powerset
> -----Original Message-----
> From: Naama Kraus [mailto:[EMAIL PROTECTED]
> Sent: Sunday, June 15, 2008 3:39 AM
> To: [email protected]
> Subject: HBase and locality issues
>
> Hi,
>
> I have some questions regarding HBase and locality issues -
> I'd appreciate some explanations and clarifications.
>
> I understand HBase is built on top of HDFS.
> Say an HRegionServer creates a HStoreFile where it puts some
> column family content. Does HDFS split the file to multiple
> HDFS blocks and distributes them around bunch of machines ?

Yes. HStoreFile is currently implemented using org.apache.hadoop.io.MapFile

> If that's the case, when the region server needs to actually
> access the files, does HDFS underneath communicates remote
> machines to read the various blocks ?

Sometimes. If a requested block is local, HDFS will try to get that one.

> Doesn't it hurt performance since there is no locality in data access
> (region server actually works on remote blocks).

Somewhat. We have other areas that we have identified as larger
performance bottlenecks that need to be addressed first.

> Or is the HStoreFile implemented in some other way which
> writes it to the local disks of the region server node
> machine that owns it ?

No. Blocks are placed according to HDFS strategies.

> If so, then how ? Does this code overrides the HDFS behavior ?

It doesn't.

> Another related question is about Map Reduce and HBase. When
> a MapReduce job runs on top of HBase - i.e. gets a table as
> an input. How does the MapReduce framework know how to
> schedule map tasks near data ? Does it have any knowledge of
> the actual location of the data pieces composing the table to
> be processed ?

No. It is on our list of things to do. See HBASE-57

> I'd be also glad to get pointers to the related source code (classes).
>
> Thanks for any information,
> Naama
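
To make the "Sometimes" answer above concrete, here is a toy sketch (not HBase or HDFS code) of the idea that a reader prefers a replica on its own host and otherwise falls back to a remote one. The names `pick_replica` and `local_fraction`, the node names, and the block layout are all hypothetical, and the sketch ignores rack awareness and whatever else the real HDFS client takes into account.

```python
from typing import Dict, List


def pick_replica(replica_hosts: List[str], reader_host: str) -> str:
    """Return the host to read a block from, preferring the reader's own host."""
    if not replica_hosts:
        raise ValueError("block has no replicas")
    if reader_host in replica_hosts:
        return reader_host  # local read, no network hop
    return replica_hosts[0]  # assumption: any remote replica will do


def local_fraction(block_replicas: Dict[str, List[str]], reader_host: str) -> float:
    """Fraction of a file's blocks that have a replica on reader_host."""
    if not block_replicas:
        return 0.0
    local = sum(1 for hosts in block_replicas.values() if reader_host in hosts)
    return local / len(block_replicas)


# Hypothetical layout: one store file split into three blocks, each with
# three replicas placed by HDFS rather than by the region server.
blocks = {
    "blk_1": ["node-a", "node-b", "node-c"],
    "blk_2": ["node-b", "node-d", "node-e"],
    "blk_3": ["node-a", "node-d", "node-f"],
}

if __name__ == "__main__":
    for name, hosts in blocks.items():
        print(name, "read from", pick_replica(hosts, "node-a"))
    print("local fraction:", local_fraction(blocks, "node-a"))
```

With this made-up layout, a region server on node-a would read two of the three blocks locally and the third over the network, which is the "Somewhat" cost discussed above.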
