One scenario I've seen in practice is an HFile corrupted by an incomplete write: no trailer, for example after an incomplete memstore flush. But one can still scan what is available, recover the KVs, and write a new storefile with a complete and valid structure.
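A minimal sketch of that recovery idea. This uses a toy length-prefixed key/value format, not the real HFile layout: scan records until the first truncated one, then rewrite the survivors as a complete file.

```python
import struct

def recover_records(data: bytes):
    """Scan length-prefixed key/value records, stopping at the first
    truncated record. Toy format for illustration, not real HFile layout:
    each record is [4-byte key len][key][4-byte value len][value]."""
    records, pos = [], 0
    while pos + 4 <= len(data):
        (klen,) = struct.unpack_from(">I", data, pos)
        if pos + 4 + klen + 4 > len(data):
            break  # key bytes or value-length field truncated
        key = data[pos + 4 : pos + 4 + klen]
        (vlen,) = struct.unpack_from(">I", data, pos + 4 + klen)
        if pos + 8 + klen + vlen > len(data):
            break  # value bytes truncated
        value = data[pos + 8 + klen : pos + 8 + klen + vlen]
        records.append((key, value))
        pos += 8 + klen + vlen
    return records

def write_records(records) -> bytes:
    """Re-serialize recovered records into a complete, valid file image."""
    out = bytearray()
    for key, value in records:
        out += struct.pack(">I", len(key)) + key
        out += struct.pack(">I", len(value)) + value
    return bytes(out)
```

The point is only that a partial file is still scannable up to the damage, so everything before the truncation is recoverable into a fresh, well-formed file.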
Other scenarios all involve reconciliation between META contents and what is actually on disk. Maybe a failed split where the daughters were created but META was not updated. On disk, maybe one daughter was fully created but the other is incomplete. These scenarios all involve region inspection and sanity checking, and decisions on whether to roll a failed split forward or back. Also, maybe even total META reconstruction, if it was hosed somehow.

> if a regionserver crashes during an upload, how do I know what has been
> lost? From where do I restart the upload?

If we can guarantee that when a client-side flush completes successfully, everything has for sure been written, then the uploader can track this. It can control its flush strategy according to its own needs and can consider each successful flush a checkpoint. Right?

   - Andy

________________________________
From: stack <[email protected]>
To: [email protected]
Sent: Friday, August 7, 2009 9:13:21 AM
Subject: Re: roadmap: data integrity

On Thu, Aug 6, 2009 at 10:25 AM, Andrew Purtell <[email protected]> wrote:

> I updated the roadmap up on the wiki:
>
> * Data integrity
>   * Ensure that proper append() support in HDFS actually closes the
>     WAL last block write hole
>   * HBase-FSCK (HBASE-7) -- Suggest making this a blocker for 0.21
>
> I have had several recent conversations on my travels with people in
> Fortune 100 companies (based on this list:
> http://www.wageproject.org/content/fortune/index.php).

I like that link's topic matter. The question, I think, is whether the above aligns with project goals.

> Making HBase-FSCK a blocker will probably knock something someone
> wants for the 0.21 timeframe off the list.

I think the topic of integrity is a good one to raise at this time. It's about time for a (re)visit.

Is there enough information in the filesystem for an hbase fsck tool to do its reconstruction work?
Regions now get a .regioninfo file written to them on creation, and the hfiles carry first key, last key, and sequence ids as metadata. What else do we need to fully reconstruct tables when, ${deity} forbid (<- I like this one), there is a catastrophic crash?

A requirement of any hbase fsck is that it finish promptly (MR job?). It should not be one of those tools that chews for hours on end, spinning disks while a progress bar crawls to completion.

One area that for sure could do with review is log splitting and then replay of edits on region redeploy. We've not given this the attention it deserves in ensuring we are not dropping edits, mostly because up to now, with no working flush/append, we've just presumed loss.

Another interesting question I was asked recently was: if a regionserver crashes during an upload, how do I know what has been lost? From where do I restart the upload?

St.Ack
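One sanity check such a fsck tool could run over recovered region metadata, sketched here with plain (start key, end key) byte-string pairs standing in for the real region descriptors: sort regions by start key and flag any holes or overlaps in the table's key space.

```python
def check_region_chain(regions):
    """Given (start_key, end_key) pairs recovered from region metadata,
    report holes and overlaps in the key space. An empty start or end
    key marks the first or last region of the table."""
    problems = []
    regions = sorted(regions, key=lambda r: r[0])
    if regions and regions[0][0] != b"":
        problems.append(("hole", b"", regions[0][0]))      # missing first region
    for (s1, e1), (s2, e2) in zip(regions, regions[1:]):
        if e1 < s2:
            problems.append(("hole", e1, s2))              # gap between regions
        elif e1 > s2:
            problems.append(("overlap", s2, e1))           # e.g. both split daughters
    if regions and regions[-1][1] != b"":
        problems.append(("hole", regions[-1][1], b""))     # missing last region
    return problems
```

A hole suggests an incomplete split or a lost region directory; an overlap suggests a split that was rolled neither fully forward nor fully back. Each finding feeds the roll-forward/roll-back decision Andy describes.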
