I understand - yes, if constraints are limited to local variables and
there is no "read & put" being done, we're fine. I think I
misunderstood the original intent of "constraint" to include read &
put and missed the "intercept and check whether the column values are
of the right data type / format" use case.

I think what you explained sounds very reasonable and we can defer the
check & put constraint to HBASE-4999

Thanks for the detailed clarification.
--Suraj

On Sun, Dec 11, 2011 at 10:26 PM, Jesse Yates <[email protected]> wrote:
> Adding constraints was originally developed with the idea of checking
> incoming writes for validity, based on their internal properties. For
> instance, checking to make sure that a value is in the range 1-10, or that
> it is an integer, or not over a certain length (you get the idea).
>
> If the row lock is released between the time the coprocessor finishes
>> "preXXXX" checks and the core mutation method is invoked (as has been
>> discussed in this thread), how can the Constraint be ensured? If two
>> requests are being processed in parallel, there is every possibility
>> that both requests pass the "Constraint" check individually, but break
>> it together (e.g. even simple checks like column value == 10 would
>> break if two requests fire concurrently).
>>
>> So - I'm questioning whether a pure Coprocessor implementation alone
>> would be sufficient?
>>
>
> If you have a situation where are just doing the straight put (or
> checkAndPut) then the constraint will not affect the atomicity or
> consistency of writes to the table. Even if one of the puts finishes the
> constraint first, the single row will not be corrupted and the 'fastest'
> write will win. The underlying region takes care of breaking the tie and
> ensuring that writes get serialized to the table.
>
> Its true that the transactional nature will be an issue if you are doing a
> read on the table first. From HRegion.internalPut(), we can see that we
> first get a readlock and then, when doing the put actually do serialize the
> writes into the memstore, enforcing the consistency and atomicity of the
> update. However, there are no check that the prePut checking is going to
> enforce that atomicity. From internal put, we see:
>
> /* run pre put hook outside of lock to avoid deadlock */
>    if (coprocessorHost != null) {
>      if (coprocessorHost.prePut(put, walEdit, writeToWAL)) {
>        return;
>      }
>    }
>
> So yes, this doesn't ensure that we are going to get specific ordering or
> even a fully consistent view of the underlying data. However, as long as
> each constraint just uses local variables, each thread interacting with the
> constraint will be completely fine.
>
> Yeah, some documentation needs to be done on constraints as far as not
> using static fields or expecting multiple thread interaction. However, the
> checking ensures that the field passes - the underlying store takes care of
> ensuring atomicity and consistency.
>
> Take your example of column value == 10. Two different requests go into the
> HRegion. They each run the Equals10Constraint in a different thread. The
> constraint allows each to pass (lets assume here that they both do ==10).
> Then it will hit the remainder of HRegion.internalPut(), go through the
> updatesLock and then also get serialized in the call to
> HRegion.applyFamilyMapToMemstore(familyMap, null). So the atomicity of the
> writes are preserved. This also works for n concurrent writes and where not
> all writes pass the constraint.
>
>
>
>>
>> I think we'll need an approach that makes the constraint checking and
>> mutation to be _atomically_ achieved
>> a) either by taking a row lock and passing that into put / checkAndPut
>> b) referencing & checking the constraint directly from within the put
>> / checkAndPut methods (like we do with the comparator, for instance)
>>
>>
> However, if we wanted to do a read of the table, within the coprocessor, we
> will want to actually pass in the rowlock so we can ensure that we don't
> get a contention/atomicity issue. This could be pretty easily handled by
> setting the rowlock on the Put that is passed into prePut (which either
> resets that current rowlock or sets it to the found rowlock). Then the
> underlying constraint could make that accessible if we wanted to scan the
> region.
>
> I also just did a quick test to modifying TestSimpleRegionObserver to do a
> read from the region when doing the prePut() and came up working fine - no
> deadlock issues. Checked for doing bothPut and checkAndPut, and didn't come
> up with anything locking.
>
> Heavy concurrent use only will be slowed down by the reads, but nothing
> should be broken. We might be able to do some internal work on constraints
> and create a CheckAndWriteConstriant that will be used by makes it easier
> to check the value already present. Doing something like that would be best
> handled in HBASE-4999.
>
>
> Does that seem reasonable?
>
> -Jesse
>
>
>>
>>
>> On Fri, Dec 9, 2011 at 1:31 PM, Suraj Varma <[email protected]> wrote:
>> > Hi:
>> > I opened a jira ticket on this:
>> https://issues.apache.org/jira/browse/HBASE-4999
>> >
>> > I have linked to HBASE-4605 in the description to show related work on
>> > Constraints by Jesse.
>> >
>> > Thanks!
>> > --Suraj
>> >
>> > On Sun, Dec 4, 2011 at 1:10 PM, Ted Yu <[email protected]> wrote:
>> >> Currently ConstraintProcessor latches onto prePut() to perform
>> validation
>> >> check.
>> >>
>> >> From HRegion.doMiniBatchPut() where prePut() is called:
>> >>    /* Run coprocessor pre hook outside of locks to avoid deadlock */
>> >> So to make use of Constraint in Suraj's scenario, we have some
>> decisions to
>> >> make about various factors.
>> >>
>> >> Cheers
>> >>
>> >> On Sun, Dec 4, 2011 at 8:39 AM, Suraj Varma <[email protected]>
>> wrote:
>> >>
>> >>> Jesse:
>> >>> >> Quick soln - write a CP to check the single row (blocking the put).
>> >>>
>> >>> Yeah - given that I want this to be atomically done, I'm wondering if
>> >>> this would even work (because, I believe I'd need to unlock the row so
>> >>> that the checkAndMutate can take the lock - so, there is a brief
>> >>> window between where there is no lock being held - and some other
>> >>> thread could take that lock). One option would be to pass in a lock to
>> >>> checkAndMutate ... but that would increase the locking period and may
>> >>> have performance implications, I think.
>> >>>
>> >>> I see a lot of potential in the Constraints implementation - it would
>> >>> really open up CAS operations to do functional constraint checking,
>> >>> rather than just value comparisons.
>> >>>
>> >>> --Suraj
>> >>>
>> >>> On Sun, Dec 4, 2011 at 8:32 AM, Suraj Varma <[email protected]>
>> wrote:
>> >>> > Thanks - I see that the lock is taken internal to checkAndMutate.
>> >>> >
>> >>> > I'm wondering whether it is a better idea to actually pass in a
>> >>> > Constraint (or even Constraints) as the checkAndMutate argument.
>> Right
>> >>> > now it is taking in an Comparator and a CompareOp for verification.
>> >>> > But, this could just be a special case of Constraint which is
>> >>> > evaluated within the lock.
>> >>> >
>> >>> > In other words, we could open up a richer Constraint checking api
>> >>> > where any "functional" Constraint check can be performed in the
>> >>> > checkAndPut operation.
>> >>> >
>> >>> > This would also not have the same performance impact of taking a
>> >>> > rowLock in preCheckAndPut and release in postCheckAndPut. And - it is
>> >>> > really (in my mind) implementing the compare-and-set more
>> generically.
>> >>> >
>> >>> > I also see the potential of passing in multiple constraints (say
>> >>> > upper/lower bounds in Increment/Decrement operations) etc.
>> >>> >
>> >>> > --Suraj
>> >>> >
>> >>> >
>> >>> > On Sat, Dec 3, 2011 at 7:44 PM, Ted Yu <[email protected]> wrote:
>> >>> >> From HRegionServer.checkAndPut():
>> >>> >>    if (region.getCoprocessorHost() != null) {
>> >>> >>      Boolean result = region.getCoprocessorHost()
>> >>> >>        .preCheckAndPut(row, family, qualifier, CompareOp.EQUAL,
>> >>> comparator,
>> >>> >>          put);
>> >>> >> ...
>> >>> >>    boolean result = checkAndMutate(regionName, row, family,
>> qualifier,
>> >>> >>      CompareOp.EQUAL, new BinaryComparator(value), put,
>> >>> >>      lock);
>> >>> >> We can see that the lock isn't taken for preCheckAndPut().
>> >>> >>
>> >>> >> To satisfy Suraj's requirement, I think a slight change to
>> >>> checkAndPut() is
>> >>> >> needed so that atomicity can be achieved across preCheckAndPut() and
>> >>> >> checkAndMutate().
>> >>> >>
>> >>> >> Cheers
>> >>> >>
>> >>> >> On Sat, Dec 3, 2011 at 4:54 PM, Suraj Varma <[email protected]>
>> >>> wrote:
>> >>> >>
>> >>> >>> Just so my question is clear ... everything I'm suggesting is in
>> the
>> >>> >>> context of a single row (not cross row / table). - so, yes, I'm
>> >>> >>> guessing obtaining a RowLock on the region side during
>> preCheckAndPut
>> >>> >>> / postCheckAndPut would certainly work. Which was why I was asking
>> >>> >>> whether the pre/postCheckAndPut obtains the row lock or whether the
>> >>> >>> row lock is only obtained within checkAndPut.
>> >>> >>>
>> >>> >>> Let's say the coprocessor takes a rowlock in preCheckAndPut ...
>> will
>> >>> >>> that even work? i.e. can the same rowlock be inherited by the
>> >>> >>> checkAndPut api within that thread's context? Or will
>> preCheckAndPut
>> >>> >>> have to release the lock so that checkAndPut can take it (which
>> won't
>> >>> >>> work for my case, as it has to be atomic between the preCheck and
>> >>> >>> Put.)
>> >>> >>>
>> >>> >>> Thanks for pointing me to the Constraints functionality - I'll
>> take a
>> >>> >>> look at whether it could potentially work.
>> >>> >>> --Suraj
>> >>> >>>
>> >>> >>> On Sat, Dec 3, 2011 at 10:25 AM, Jesse Yates <
>> [email protected]>
>> >>> >>> wrote:
>> >>> >>> > I think the feature you are looking for is a Constraint.
>> Currently
>> >>> they
>> >>> >>> are
>> >>> >>> > being added to 0.94 in
>> >>> >>> > HBASE-4605<https://issues.apache.org/jira/browse/HBASE-4605>;
>> >>> >>> > they are almost ready to be rolled in, and backporting to 0.92 is
>> >>> >>> > definitely doable.
>> >>> >>> >
>> >>> >>> > However, Constraints aren't going to be quite flexible enough to
>> >>> >>> > efficiently support what you are describing. For instance, with a
>> >>> >>> > constraint, you are ideally just checking the put value against
>> some
>> >>> >>> simple
>> >>> >>> > constraint (never over 10 or always an integer), but looking at
>> the
>> >>> >>> current
>> >>> >>> > state of the table before allowing the put would currently
>> require
>> >>> >>> creating
>> >>> >>> > a full blown connection to the local table through another
>> HTable.
>> >>> >>> >
>> >>> >>> > In the short term, you could write a simple coprocessor to do
>> this
>> >>> >>> checking
>> >>> >>> > and then move over to constraints (which are a simpler, more
>> >>> flexible,
>> >>> >>> way
>> >>> >>> > of doing this) when the necessary features have been added.
>> >>> >>> >
>> >>> >>> > It is worth discussing if it makes sense to have access to the
>> local
>> >>> >>> region
>> >>> >>> > through a constraint, though that breaks the idea a little bit,
>> it
>> >>> would
>> >>> >>> > certainly be useful and not overly wasteful in terms of runtime.
>> >>> >>> >
>> >>> >>> > Supposing the feature would be added to talk to the local table,
>> and
>> >>> >>> since
>> >>> >>> > the puts are going to be serialized on the regionserver (at
>> least to
>> >>> that
>> >>> >>> > single row you are trying to update), you will never get a
>> situation
>> >>> >>> where
>> >>> >>> > the value added is over the threshold. If you were really worried
>> >>> about
>> >>> >>> the
>> >>> >>> > atomicity of the operation, then when doing the put, first get
>> the
>> >>> >>> RowLock,
>> >>> >>> > then do the put and release the RowLock. However, that latter
>> method
>> >>> is
>> >>> >>> > going to be really slow, so should only be used as a stop gap if
>> the
>> >>> >>> > constraint doesn't work as expected, until a patch is made for
>> >>> >>> constraints.
>> >>> >>> >
>> >>> >>> > Feel free to open up a ticket and link it to 4605 for adding the
>> >>> local
>> >>> >>> > table access functionality, and we can discuss the de/merits of
>> >>> adding
>> >>> >>> the
>> >>> >>> > access.
>> >>> >>> >
>> >>> >>> > -Jesse
>> >>> >>> >
>> >>> >>> > On Sat, Dec 3, 2011 at 6:24 AM, Suraj Varma <[email protected]
>> >
>> >>> wrote:
>> >>> >>> >
>> >>> >>> >> I'm looking at the preCheckAndPut / postCheckAndPut api with
>> >>> >>> >> coprocessors and I'm wondering ... are these pre/post checks
>> done
>> >>> >>> >> _after_ taking the row lock or is the row lock only done within
>> the
>> >>> >>> >> checkAndPut api.
>> >>> >>> >>
>> >>> >>> >> I'm interested in seeing if we can implement something like:
>> >>> >>> >> (in pseudo sql)
>> >>> >>> >> update table-name
>> >>> >>> >> set column-name = new-value
>> >>> >>> >> where (column-value - new-value) > threshold-value
>> >>> >>> >>
>> >>> >>> >> Basically ... I want to enhance the checkAndPut to not just
>> compare
>> >>> >>> >> "values" ... but apply an arbitrary function on the value
>> >>> _atomically_
>> >>> >>> >> in the Put call. Multiple threads would be firing these
>> mutations
>> >>> and
>> >>> >>> >> I'd like the threshold-value above to never be breached under
>> any
>> >>> >>> >> circumstance.
>> >>> >>> >>
>> >>> >>> >> Is there a solution that can be implemented either via
>> checkAndPut
>> >>> or
>> >>> >>> >> using coprocessors preCheckAndPut? If not, would this be a
>> useful
>> >>> >>> >> feature to build in HBase?
>> >>> >>> >>
>> >>> >>> >> Thanks,
>> >>> >>> >> --Suraj
>> >>> >>> >>
>> >>> >>> >
>> >>> >>> >
>> >>> >>> >
>> >>> >>> > --
>> >>> >>> > -------------------
>> >>> >>> > Jesse Yates
>> >>> >>> > 240-888-2200
>> >>> >>> > @jesse_yates
>> >>> >>>
>> >>>
>>
>
> --
> -------------------
> Jesse Yates
> 240-888-2200
> @jesse_yates

Reply via email to