A plain Delete would be fine if I were always comfortable deleting a row whether or not it existed. In my application, though, I need to perform a check on a cell, and the result of that check may be the cell's deletion. Say I read in a cell, determine that it's supposed to be deleted, then commit a Delete. I want to ensure that, between the check and the Delete, no one overwrites that cell with a new value that should not be deleted. I don't see a way to solve that problem without some form of atomicity (be it RowLocks, a new atomic checkAndDelete primitive, etc.).
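To make the race concrete, here is a toy sketch in plain Java. It does not use the HBase client API at all: the class CheckAndDeleteSketch and its methods are hypothetical, and a ConcurrentHashMap stands in for the table. ConcurrentMap.remove(key, value) removes an entry only if it still maps to the given value, which is roughly the semantics I'd want from a server-side checkAndDelete.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Toy in-memory stand-in for one column of cells. This is NOT the HBase API;
// every name here is hypothetical and only illustrates the desired semantics.
final class CheckAndDeleteSketch {
    // Values are Strings rather than byte[] so that equals() compares contents.
    private final ConcurrentMap<String, String> cells = new ConcurrentHashMap<>();

    public void put(String row, String value) {
        cells.put(row, value);
    }

    public String get(String row) {
        return cells.get(row);
    }

    // Atomic compare-and-delete: the cell is removed only if its current
    // value still equals the value the caller saw when it made its decision.
    public boolean checkAndDelete(String row, String expected) {
        return cells.remove(row, expected);
    }

    public static void main(String[] args) {
        CheckAndDeleteSketch table = new CheckAndDeleteSketch();
        table.put("row1", "stale");

        String seen = table.get("row1");   // read the cell, decide to delete it
        table.put("row1", "fresh");        // intervening Put from another client

        // The conditional delete fails because the value changed under us.
        boolean deleted = table.checkAndDelete("row1", seen);
        System.out.println("deleted=" + deleted + ", value=" + table.get("row1"));
        // prints "deleted=false, value=fresh"
    }
}
```

With an unconditional delete in place of checkAndDelete, the "fresh" value would have been removed, which is exactly the case I'm worried about.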
Best regards,
Mike

On Fri, Apr 30, 2010 at 2:58 PM, Jonathan Gray <jg...@facebook.com> wrote:
> One option would be to just do the delete. Deletes are cheap and nothing
> bad will happen if you delete data which doesn't exist (unless you do the
> delete latest version which does require a value to exist).
>
> > -----Original Message-----
> > From: Michael Dalton [mailto:mwdal...@gmail.com]
> > Sent: Friday, April 30, 2010 2:51 PM
> > To: hbase-user@hadoop.apache.org
> > Subject: HTable checkAndPut equivalent for Deletes
> >
> > Hi everyone,
> >
> > I have a quick question -- I'd like to do a simple atomic
> > check-and-Delete for a row. For Put operations, HTable.checkAndPut
> > appears to allow a simple atomic compare-and-update, which is great.
> > However, there doesn't seem to be an equivalent function for deletes.
> >
> > I was thinking about approximating this by writing NULL or a
> > zero-length byte array as a value in a Put to emulate deleting a cell.
> > It appears that checkAndPut already treats a zero-length array as
> > equivalent to a non-existent value when performing its comparison
> > (before committing the Put). The only drawback I can see to this is
> > that I never truly remove rows, I just end up with 'dead' rows
> > containing empty byte arrays, so I'd imagine that every N hours or days
> > I would need to garbage collect these empty rows somehow (which brings
> > us back full circle to the issue of how to atomically check and delete
> > a row).
> >
> > The only real alternative I can see for doing this would be to emulate
> > checkAndDelete by using RowLocks to lock the row, perform a Get, verify
> > that the row contains the expected value, then perform a delete, and
> > then unlock the row itself. Correct me if I'm wrong, but this should
> > definitely emulate the semantics of atomic compare-and-Delete (assuming
> > the compare and delete operate on the same row and use the RowLock).
> > However, I'm not sure what the performance would be for using RowLocks
> > to emulate checkAndDelete on the client side vs. using Put+checkAndPut
> > to emulate checkAndDelete on the server side. Does anyone have any
> > advice on this issue, or any idea what the relative tradeoffs are?
> >
> > In the long run, it seems to me that the clearly optimal solution would
> > be to have a checkAndDelete function in HTable, and I'd be interested
> > in adding this functionality if no one else is currently working on it.
> > Is that something that would be interesting to integrate and worth
> > committing back to mainline? Are there any hidden pitfalls I should be
> > aware of, or some technical/design reason for why this API call doesn't
> > already exist? If not, I'll take a hard look at the delete and
> > checkAndPut code in the regionserver and sometime soon open an issue in
> > JIRA and start coding.
> >
> > Best regards,
> >
> > Mike