An HDFS-347 that does checksums is in the Hadoop branch that FB published on GitHub (Dhruba and Jon pointed me at it); I've been meaning to put the patch up on the HDFS-347 issue.
St.Ack

On Fri, Jun 3, 2011 at 4:42 PM, Jason Rutherglen wrote:
> I think one would need to checksum only once: on the first file system
> instantiation, or on first access of the file? As mentioned in
> HDFS-2004, HBase's usage of HDFS is outside of the initial design
> motivation. E.g., the rules may need to be bent in order to enable
> performant use of HBase with HDFS. The idea of working with HDFS at
> the block level becomes [likely] more important.
>
> On Fri, Jun 3, 2011 at 3:57 PM, Kihwal Lee wrote:
>> When I tried HDFS-941, the new bottleneck was checksum. So the performance
>> may drop significantly if checksum is added and enabled in HDFS-347.
>>
>> Kihwal
>>
>>
>> On 6/3/11 5:46 PM, "Andrew Purtell" wrote:
>>
>> Yes, though I have patches, and I'm happy to provide them if you want...
>>
>> Indeed, 347 doesn't do security or checksums, so it needs work, to say the least.
>> We use it with HBase given a privileged role such that it shares
>> group-readable DFS data directories with the DataNodes. It works for us,
>> though checksumming is on the to-do list.
>>
>> And I agree 947 is scary. However, I did pull the last incarnation of 947
>> attached to the JIRA into CDH3U0 for some ongoing testing with real load,
>> combined with 918, which we did put into production.
>>
>> - Andy
>>
>> --- On Fri, 6/3/11, Todd Lipcon wrote:
>>
>>> From: Todd Lipcon
>>> Subject: Re: HDFS-1599 status? (HDFS tickets to improve HBase)
>>> Date: Friday, June 3, 2011, 1:09 PM
>>> On Fri, Jun 3, 2011 at 12:50 PM, Doug Meil wrote:
>>> > Thanks everybody for commenting on this thread.
>>> >
>>> > We'd certainly like to lobby for movement on these two tickets, and
>>> > although we don't have anybody that is familiar with the source code,
>>> > we'd be happy to perform some tests and get some performance numbers.
>>> >
>>> > Per Kihwal's comments, it sounds like HDFS-941 needs to get re-worked
>>> > because the patch is stale.
>>> >
>>>
>>> Yes. bc Wong, the original contributor, works with me but on
>>> unrelated projects. HDFS-941 was something he did as part of a
>>> "hackathon", and he only gets occasional time to circle back on it. As we
>>> last left it, there were just a few things that had to be addressed.
>>> If someone wants to finish it up and volunteer to test it under some
>>> real load, I'd be happy to review and commit.
>>>
>>> > The patch for HDFS-347 sounds like it's still usable.
>>>
>>> The current patch for 347 is unworkable since it doesn't do
>>> checksums or security. The FD-passing approach was working at some
>>> point but basically needs to be re-done on trunk.
>>>
>>> I think doing HDFS-941 and HDFS-918 first is best; then more drastic
>>> things like 347 can be considered.
