That sounds like a very effective way for developers to kill clusters with compactions :)
J-D

On Wed, Jun 19, 2013 at 2:39 PM, Kevin O'dell <[email protected]> wrote:
> JD,
>
> What about adding a flag for the delete, something like -full or
> -true (it is early). Once we issue the delete to the proper row/region we
> run a flush, then execute a single-region major compaction. That way, if
> it is a single record, or a subset of data, the impact is minimal. If the
> delete happens to hit every region we will compact every region (not ideal).
> Another thought would be an overwrite, but with versions this logic
> becomes more complicated.
>
>
> On Wed, Jun 19, 2013 at 8:31 AM, Jean-Daniel Cryans <[email protected]> wrote:
>
>> Hey devs,
>>
>> I was presenting at GOTO Amsterdam yesterday and I got a question
>> about a scenario that I've never thought about before. I'm wondering
>> what others think.
>>
>> How do you efficiently wipe out random data in HBase?
>>
>> For example, you have a website and a user asks you to close their
>> account and get rid of the data.
>>
>> Would you say "sure can do, lemme just issue a couple of Deletes!" and
>> call it a day? What if you really have to delete the data, not just
>> mask it, because of contractual obligations or local laws?
>>
>> Major compacting is the obvious solution but it seems really
>> inefficient. Let's say you've got some truly random data to delete and
>> it so happens that you have at least one row per region to get rid
>> of... then you need to basically rewrite the whole table?
>>
>> My answer was such, and I told the attendee that it's not an easy use
>> case to manage in HBase.
>>
>> Thoughts?
>>
>> J-D
>
>
>
> --
> Kevin O'Dell
> Systems Engineer, Cloudera
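
A minimal sketch of the sequence Kevin describes (issue the delete, flush the
affected region, then major-compact just that region) using the HBase Java
client's Admin API. The table name and row key below are hypothetical, and
exact method names differ across HBase versions (for example,
HRegionLocation.getRegion() vs. getRegionInfo()), so treat this as an outline
rather than a tested implementation:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionLocation;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.RegionLocator;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HardDeleteSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        TableName tableName = TableName.valueOf("users");   // hypothetical table
        byte[] row = Bytes.toBytes("user-42");               // hypothetical row key

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(tableName);
             RegionLocator locator = conn.getRegionLocator(tableName);
             Admin admin = conn.getAdmin()) {

          // 1. Issue the Delete. On its own this only writes a tombstone
          //    marker; the old cells are still on disk.
          table.delete(new Delete(row));

          // 2. Find the region holding the row and flush it, so the
          //    tombstone is persisted to an HFile rather than sitting
          //    only in the memstore.
          HRegionLocation loc = locator.getRegionLocation(row);
          byte[] regionName = loc.getRegion().getRegionName();
          admin.flushRegion(regionName);

          // 3. Major-compact only that region. The compaction rewrites the
          //    region's HFiles and physically drops the deleted cells and
          //    the tombstone, without touching the rest of the table.
          admin.majorCompactRegion(regionName);
        }
      }
    }

Note that majorCompactRegion() only requests the compaction; the deleted cells
are not physically gone until the region server finishes rewriting that
region's HFiles.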
