Re: HTable checkAndPut equivalent for Deletes

Michael Dalton Fri, 30 Apr 2010 15:27:08 -0700

Deletes would be fine if I were always comfortable deleting a row, whether or
not the row existed. In my application, I'd need to perform a check on a
cell which may result in that cell's deletion. So let's say I read in a
cell, determine that it's supposed to be deleted, then commit a Delete. I
want to ensure that, in between the check and Delete, someone doesn't
overwrite that cell with a new value which actually should not be deleted.
I'm concerned about intervening Puts updating a cell's value to a new value
that I don't want to delete. I don't see a way to solve that problem without
some form of atomicity (be it RowLocks, a new atomic checkAndDelete
primitive, etc.).
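
To make the race concrete, here is a minimal in-memory sketch. It is only an
analogy and not HBase code: a ConcurrentMap stands in for the table, the cell
key "row1/cf:q" is made up, and both helper methods (racyCheckThenDelete and
checkAndDelete) are hypothetical names for illustration. The second helper
relies on ConcurrentMap.remove(key, value), which removes the entry only if it
still maps to the expected value.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// In-memory analogy only: a ConcurrentMap stands in for a table of cells.
// Nothing here is HBase API; both helpers are hypothetical.
final class CheckAndDeleteSketch {

    // Racy version: the check and the delete are two separate steps, so
    // another writer can replace the value in between.
    static boolean racyCheckThenDelete(ConcurrentMap<String, String> cells,
                                       String cell, String expected) {
        String current = cells.get(cell);
        if (expected.equals(current)) {
            // A put() landing here would have its new value deleted.
            cells.remove(cell);
            return true;
        }
        return false;
    }

    // Atomic version: remove(key, value) deletes the entry only if it still
    // maps to the expected value, as a single operation.
    static boolean checkAndDelete(ConcurrentMap<String, String> cells,
                                  String cell, String expected) {
        return cells.remove(cell, expected);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> cells = new ConcurrentHashMap<>();
        cells.put("row1/cf:q", "stale");

        // The reader has decided that "stale" should be deleted, but a
        // writer overwrites the cell before the delete is issued.
        cells.put("row1/cf:q", "fresh");

        boolean deleted = checkAndDelete(cells, "row1/cf:q", "stale");
        System.out.println("deleted=" + deleted
                + ", value=" + cells.get("row1/cf:q"));
    }
}
```

With the atomic version the overwritten cell survives, because the expected
value no longer matches at the moment of the delete. That is the behavior I
would want from a server-side checkAndDelete.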


Best regards,

Mike

On Fri, Apr 30, 2010 at 2:58 PM, Jonathan Gray <jg...@facebook.com> wrote:

> One option would be to just do the delete.  Deletes are cheap and nothing
> bad will happen if you delete data which doesn't exist (unless you use the
> delete-latest-version form, which does require a value to exist).
>
> > -----Original Message-----
> > From: Michael Dalton [mailto:mwdal...@gmail.com]
> > Sent: Friday, April 30, 2010 2:51 PM
> > To: hbase-user@hadoop.apache.org
> > Subject: HTable checkAndPut equivalent for Deletes
> >
> > Hi everyone,
> >
> > I have a quick question -- I'd like to do a simple atomic check-and-
> > Delete
> > for a row. For Put operations, HTable.checkAndPut appears to allow a
> > simple
> > atomic compare-and-update, which is great. However, there doesn't seem
> > to be
> > an equivalent function for deletes.
> >
> > I was thinking about approximating this by writing NULL or a zero-length
> > byte
> > array as the value in a Put to emulate deleting a cell. It appears that
> > checkAndPut already treats a zero-length array as equivalent to a
> > non-existent value when performing its comparison (before committing
> > the
> > Put). The only drawback I can see to this is that I never truly remove
> > rows,
> > I just end up with 'dead' rows containing empty byte arrays, so I'd
> > imagine
> > that every N hours or days I would need to garbage collect these empty
> > rows
> > somehow (which brings us back full circle to the issue of how to
> > atomically
> > check and delete a row).
> >
> > The only real alternative I can see for doing this would be to emulate
> > checkAndDelete by using RowLocks to lock the row, perform a Get, verify
> > that
> > the row contains the expected value, then perform a delete, and then
> > unlock
> > the row itself. Correct me if I'm wrong, but this should definitely
> > emulate
> > the semantics of atomic compare-and-Delete (assuming the compare and
> > delete
> > operate on the same row and use the RowLock). However, I'm not sure
> > what the
> > performance would be for using RowLocks to emulate checkAndDelete on
> > the
> > client side vs. using Put+checkAndPut to emulate checkAndDelete on the
> > server side. Does anyone have any advice on this issue, or any idea
> > what the
> > relative tradeoffs are?
> >
> > In the long run, it seems to me that the clearly optimal solution would
> > be
> > to have a checkAndDelete function in HTable, and I'd be interested in
> > adding this functionality if no one else is currently working on it. Is
> > that
> > something that would be interesting to integrate and worth committing
> > back
> > to mainline? Are there any hidden pitfalls I should be aware of, or
> > some
> > technical/design reason for why this API call doesn't already exist? If
> > not,
> > I'll take a hard look at the delete and checkAndPut code in the
> > regionserver
> > and sometime soon open an issue in JIRA and start coding.
> >
> > Best regards,
> >
> > Mike
>
