Another question is how data locality is maintained while storing a file's data (considering we use HDFS as the layer between the hardware and HBase). Per the Bigtable paper, data is stored per column family, and I assume HBase does the same. Does each column family really get its own HFile(s)? If yes, how is the relation to the row key maintained across all such files?
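To make the question concrete, here is a toy sketch of what I imagine (plain Python, not HBase internals; `flush` and `get_row` are made-up names). The idea: every cell carries its full row key, so each family's file can be read independently and the row is reassembled by merging across families:

```python
# Toy sketch (NOT HBase's actual on-disk format): one sorted "HFile"
# per column family, with the row key stored inside every cell, so no
# cross-file pointer is needed to tie the files back together.

from collections import defaultdict

def flush(cells):
    """Partition cells by column family into separate sorted 'files'.
    Each cell is (row, family, qualifier, value)."""
    files = defaultdict(list)
    for row, family, qualifier, value in cells:
        files[family].append((row, qualifier, value))
    for family in files:
        files[family].sort()          # entries kept key-sorted, as in an HFile
    return dict(files)

def get_row(files, row):
    """Reassemble a logical row by probing every family's file."""
    result = {}
    for family, entries in files.items():
        for r, qualifier, value in entries:
            if r == row:
                result[(family, qualifier)] = value
    return result

cells = [
    ("row1", "cf1", "a", "v1"),
    ("row2", "cf1", "a", "v2"),
    ("row1", "cf2", "b", "v3"),
]
files = flush(cells)                  # one sorted file per column family
row1 = get_row(files, "row1")         # merges cf1 and cf2 entries for row1
```

So the "relation with the row key" is maintained simply because the row key is part of every stored key, in every family's files.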
[Zhu, I hope you don't mind my stealing your thread... but my questions are somewhat related to yours, so I did it that way (smile)...]

Thanks,
~Himanshu

On Sat, Aug 28, 2010 at 10:21 PM, zhixuan zhu <[email protected]> wrote:

> Ryan,
>
> Thanks for your quick response.
>
> Since HFiles cannot be modified in HDFS once written, I guess the write
> buffer takes all this modified data in a buffer and overwrites the whole
> HDFS data block corresponding to the HFile that changed before.
>
> I need to reread the Bigtable paper; I always have questions...
>
> Thanks
>
> On Sat, Aug 28, 2010 at 11:58 PM, Ryan Rawson <[email protected]> wrote:
>
> > HFiles are write once, read many. Once written they cannot be modified,
> > so there is no way to move things around.
> >
> > HBase deals with this by having a robust write buffer and writing large
> > files.
> >
> > For more architectural details check out the Bigtable paper.
> >
> > On Aug 28, 2010 8:32 PM, "zhixuan zhu" <[email protected]> wrote:
> > > Hey guys,
> > >
> > > I am studying HFiles now and have a couple of questions.
> > >
> > > The keys in an HFile are sorted. Suppose a key is inserted into a data
> > > block which is full, and the key is smaller than the greatest key in
> > > this data block but greater than the smallest key in it. In this
> > > case, does the data block need to be reorganized, say by moving keys
> > > greater than the inserted key into the next data block?
> > >
> > > And if a value for a key needs an update, how is this achieved in an
> > > HFile?
> > >
> > > Appreciate your time answering my questions!
> > >
> > > Thanks
> > >
> > > Tim Zhu
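For what it's worth, Ryan's answer can be sketched in a few lines (toy Python with made-up names, not the real MemStore/HFile code). The point: a full or out-of-order data block is never reorganized, because writes, including updates, go to an in-memory sorted buffer that is flushed as a brand-new immutable file, and reads merge all files newest-first:

```python
# Toy sketch of the write-buffer idea behind write-once HFiles.
# MiniStore is a made-up stand-in; assumptions: the buffer plays the
# role of the MemStore, and each flush produces one immutable file.

class MiniStore:
    def __init__(self):
        self.memstore = {}    # in-memory write buffer (mutable)
        self.hfiles = []      # flushed files (immutable once written)

    def put(self, key, value):
        self.memstore[key] = value      # updates never touch old files

    def flush(self):
        # Write the buffer out as one new sorted, immutable file.
        self.hfiles.append(dict(sorted(self.memstore.items())))
        self.memstore = {}

    def get(self, key):
        # Newest data wins: check the buffer, then files newest-to-oldest.
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            if key in hfile:
                return hfile[key]
        return None

store = MiniStore()
store.put("k1", "v1")
store.flush()                 # "k1" now lives in an immutable file
store.put("k1", "v2")         # the update lands in the buffer; the old
store.flush()                 # file is never rewritten, just shadowed
```

An update is therefore just a newer version written to a newer file; reclaiming the shadowed old versions is left to a later rewrite of the files (compaction, in Bigtable/HBase terms).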
