I understand - yes, if constraints are limited to local variables and there is no "read & put" being done, we're fine. I think I misunderstood the original intent of "constraint" to include read & put and missed the "intercept and check whether the column values are of the right data type / format" use case.
I think what you explained sounds very reasonable and we can defer the check & put constraint to HBASE-4999 Thanks for the detailed clarification. --Suraj On Sun, Dec 11, 2011 at 10:26 PM, Jesse Yates <[email protected]> wrote: > Adding constraints was originally developed with the idea of checking > incoming writes for validity, based on their internal properties. For > instance, checking to make sure that a value is in the range 1-10, or that > it is an integer, or not over a certain length (you get the idea). > > If the row lock is released between the time the coprocessor finishes >> "preXXXX" checks and the core mutation method is invoked (as has been >> discussed in this thread), how can the Constraint be ensured? If two >> requests are being processed in parallel, there is every possibility >> that both requests pass the "Constraint" check individually, but break >> it together (e.g. even simple checks like column value == 10 would >> break if two requests fire concurrently). >> >> So - I'm questioning whether a pure Coprocessor implementation alone >> would be sufficient? >> > > If you have a situation where are just doing the straight put (or > checkAndPut) then the constraint will not affect the atomicity or > consistency of writes to the table. Even if one of the puts finishes the > constraint first, the single row will not be corrupted and the 'fastest' > write will win. The underlying region takes care of breaking the tie and > ensuring that writes get serialized to the table. > > Its true that the transactional nature will be an issue if you are doing a > read on the table first. From HRegion.internalPut(), we can see that we > first get a readlock and then, when doing the put actually do serialize the > writes into the memstore, enforcing the consistency and atomicity of the > update. However, there are no check that the prePut checking is going to > enforce that atomicity. From internal put, we see: > > /* run pre put hook outside of lock to avoid deadlock */ > if (coprocessorHost != null) { > if (coprocessorHost.prePut(put, walEdit, writeToWAL)) { > return; > } > } > > So yes, this doesn't ensure that we are going to get specific ordering or > even a fully consistent view of the underlying data. However, as long as > each constraint just uses local variables, each thread interacting with the > constraint will be completely fine. > > Yeah, some documentation needs to be done on constraints as far as not > using static fields or expecting multiple thread interaction. However, the > checking ensures that the field passes - the underlying store takes care of > ensuring atomicity and consistency. > > Take your example of column value == 10. Two different requests go into the > HRegion. They each run the Equals10Constraint in a different thread. The > constraint allows each to pass (lets assume here that they both do ==10). > Then it will hit the remainder of HRegion.internalPut(), go through the > updatesLock and then also get serialized in the call to > HRegion.applyFamilyMapToMemstore(familyMap, null). So the atomicity of the > writes are preserved. This also works for n concurrent writes and where not > all writes pass the constraint. > > > >> >> I think we'll need an approach that makes the constraint checking and >> mutation to be _atomically_ achieved >> a) either by taking a row lock and passing that into put / checkAndPut >> b) referencing & checking the constraint directly from within the put >> / checkAndPut methods (like we do with the comparator, for instance) >> >> > However, if we wanted to do a read of the table, within the coprocessor, we > will want to actually pass in the rowlock so we can ensure that we don't > get a contention/atomicity issue. This could be pretty easily handled by > setting the rowlock on the Put that is passed into prePut (which either > resets that current rowlock or sets it to the found rowlock). Then the > underlying constraint could make that accessible if we wanted to scan the > region. > > I also just did a quick test to modifying TestSimpleRegionObserver to do a > read from the region when doing the prePut() and came up working fine - no > deadlock issues. Checked for doing bothPut and checkAndPut, and didn't come > up with anything locking. > > Heavy concurrent use only will be slowed down by the reads, but nothing > should be broken. We might be able to do some internal work on constraints > and create a CheckAndWriteConstriant that will be used by makes it easier > to check the value already present. Doing something like that would be best > handled in HBASE-4999. > > > Does that seem reasonable? > > -Jesse > > >> >> >> On Fri, Dec 9, 2011 at 1:31 PM, Suraj Varma <[email protected]> wrote: >> > Hi: >> > I opened a jira ticket on this: >> https://issues.apache.org/jira/browse/HBASE-4999 >> > >> > I have linked to HBASE-4605 in the description to show related work on >> > Constraints by Jesse. >> > >> > Thanks! >> > --Suraj >> > >> > On Sun, Dec 4, 2011 at 1:10 PM, Ted Yu <[email protected]> wrote: >> >> Currently ConstraintProcessor latches onto prePut() to perform >> validation >> >> check. >> >> >> >> From HRegion.doMiniBatchPut() where prePut() is called: >> >> /* Run coprocessor pre hook outside of locks to avoid deadlock */ >> >> So to make use of Constraint in Suraj's scenario, we have some >> decisions to >> >> make about various factors. >> >> >> >> Cheers >> >> >> >> On Sun, Dec 4, 2011 at 8:39 AM, Suraj Varma <[email protected]> >> wrote: >> >> >> >>> Jesse: >> >>> >> Quick soln - write a CP to check the single row (blocking the put). >> >>> >> >>> Yeah - given that I want this to be atomically done, I'm wondering if >> >>> this would even work (because, I believe I'd need to unlock the row so >> >>> that the checkAndMutate can take the lock - so, there is a brief >> >>> window between where there is no lock being held - and some other >> >>> thread could take that lock). One option would be to pass in a lock to >> >>> checkAndMutate ... but that would increase the locking period and may >> >>> have performance implications, I think. >> >>> >> >>> I see a lot of potential in the Constraints implementation - it would >> >>> really open up CAS operations to do functional constraint checking, >> >>> rather than just value comparisons. >> >>> >> >>> --Suraj >> >>> >> >>> On Sun, Dec 4, 2011 at 8:32 AM, Suraj Varma <[email protected]> >> wrote: >> >>> > Thanks - I see that the lock is taken internal to checkAndMutate. >> >>> > >> >>> > I'm wondering whether it is a better idea to actually pass in a >> >>> > Constraint (or even Constraints) as the checkAndMutate argument. >> Right >> >>> > now it is taking in an Comparator and a CompareOp for verification. >> >>> > But, this could just be a special case of Constraint which is >> >>> > evaluated within the lock. >> >>> > >> >>> > In other words, we could open up a richer Constraint checking api >> >>> > where any "functional" Constraint check can be performed in the >> >>> > checkAndPut operation. >> >>> > >> >>> > This would also not have the same performance impact of taking a >> >>> > rowLock in preCheckAndPut and release in postCheckAndPut. And - it is >> >>> > really (in my mind) implementing the compare-and-set more >> generically. >> >>> > >> >>> > I also see the potential of passing in multiple constraints (say >> >>> > upper/lower bounds in Increment/Decrement operations) etc. >> >>> > >> >>> > --Suraj >> >>> > >> >>> > >> >>> > On Sat, Dec 3, 2011 at 7:44 PM, Ted Yu <[email protected]> wrote: >> >>> >> From HRegionServer.checkAndPut(): >> >>> >> if (region.getCoprocessorHost() != null) { >> >>> >> Boolean result = region.getCoprocessorHost() >> >>> >> .preCheckAndPut(row, family, qualifier, CompareOp.EQUAL, >> >>> comparator, >> >>> >> put); >> >>> >> ... >> >>> >> boolean result = checkAndMutate(regionName, row, family, >> qualifier, >> >>> >> CompareOp.EQUAL, new BinaryComparator(value), put, >> >>> >> lock); >> >>> >> We can see that the lock isn't taken for preCheckAndPut(). >> >>> >> >> >>> >> To satisfy Suraj's requirement, I think a slight change to >> >>> checkAndPut() is >> >>> >> needed so that atomicity can be achieved across preCheckAndPut() and >> >>> >> checkAndMutate(). >> >>> >> >> >>> >> Cheers >> >>> >> >> >>> >> On Sat, Dec 3, 2011 at 4:54 PM, Suraj Varma <[email protected]> >> >>> wrote: >> >>> >> >> >>> >>> Just so my question is clear ... everything I'm suggesting is in >> the >> >>> >>> context of a single row (not cross row / table). - so, yes, I'm >> >>> >>> guessing obtaining a RowLock on the region side during >> preCheckAndPut >> >>> >>> / postCheckAndPut would certainly work. Which was why I was asking >> >>> >>> whether the pre/postCheckAndPut obtains the row lock or whether the >> >>> >>> row lock is only obtained within checkAndPut. >> >>> >>> >> >>> >>> Let's say the coprocessor takes a rowlock in preCheckAndPut ... >> will >> >>> >>> that even work? i.e. can the same rowlock be inherited by the >> >>> >>> checkAndPut api within that thread's context? Or will >> preCheckAndPut >> >>> >>> have to release the lock so that checkAndPut can take it (which >> won't >> >>> >>> work for my case, as it has to be atomic between the preCheck and >> >>> >>> Put.) >> >>> >>> >> >>> >>> Thanks for pointing me to the Constraints functionality - I'll >> take a >> >>> >>> look at whether it could potentially work. >> >>> >>> --Suraj >> >>> >>> >> >>> >>> On Sat, Dec 3, 2011 at 10:25 AM, Jesse Yates < >> [email protected]> >> >>> >>> wrote: >> >>> >>> > I think the feature you are looking for is a Constraint. >> Currently >> >>> they >> >>> >>> are >> >>> >>> > being added to 0.94 in >> >>> >>> > HBASE-4605<https://issues.apache.org/jira/browse/HBASE-4605>; >> >>> >>> > they are almost ready to be rolled in, and backporting to 0.92 is >> >>> >>> > definitely doable. >> >>> >>> > >> >>> >>> > However, Constraints aren't going to be quite flexible enough to >> >>> >>> > efficiently support what you are describing. For instance, with a >> >>> >>> > constraint, you are ideally just checking the put value against >> some >> >>> >>> simple >> >>> >>> > constraint (never over 10 or always an integer), but looking at >> the >> >>> >>> current >> >>> >>> > state of the table before allowing the put would currently >> require >> >>> >>> creating >> >>> >>> > a full blown connection to the local table through another >> HTable. >> >>> >>> > >> >>> >>> > In the short term, you could write a simple coprocessor to do >> this >> >>> >>> checking >> >>> >>> > and then move over to constraints (which are a simpler, more >> >>> flexible, >> >>> >>> way >> >>> >>> > of doing this) when the necessary features have been added. >> >>> >>> > >> >>> >>> > It is worth discussing if it makes sense to have access to the >> local >> >>> >>> region >> >>> >>> > through a constraint, though that breaks the idea a little bit, >> it >> >>> would >> >>> >>> > certainly be useful and not overly wasteful in terms of runtime. >> >>> >>> > >> >>> >>> > Supposing the feature would be added to talk to the local table, >> and >> >>> >>> since >> >>> >>> > the puts are going to be serialized on the regionserver (at >> least to >> >>> that >> >>> >>> > single row you are trying to update), you will never get a >> situation >> >>> >>> where >> >>> >>> > the value added is over the threshold. If you were really worried >> >>> about >> >>> >>> the >> >>> >>> > atomicity of the operation, then when doing the put, first get >> the >> >>> >>> RowLock, >> >>> >>> > then do the put and release the RowLock. However, that latter >> method >> >>> is >> >>> >>> > going to be really slow, so should only be used as a stop gap if >> the >> >>> >>> > constraint doesn't work as expected, until a patch is made for >> >>> >>> constraints. >> >>> >>> > >> >>> >>> > Feel free to open up a ticket and link it to 4605 for adding the >> >>> local >> >>> >>> > table access functionality, and we can discuss the de/merits of >> >>> adding >> >>> >>> the >> >>> >>> > access. >> >>> >>> > >> >>> >>> > -Jesse >> >>> >>> > >> >>> >>> > On Sat, Dec 3, 2011 at 6:24 AM, Suraj Varma <[email protected] >> > >> >>> wrote: >> >>> >>> > >> >>> >>> >> I'm looking at the preCheckAndPut / postCheckAndPut api with >> >>> >>> >> coprocessors and I'm wondering ... are these pre/post checks >> done >> >>> >>> >> _after_ taking the row lock or is the row lock only done within >> the >> >>> >>> >> checkAndPut api. >> >>> >>> >> >> >>> >>> >> I'm interested in seeing if we can implement something like: >> >>> >>> >> (in pseudo sql) >> >>> >>> >> update table-name >> >>> >>> >> set column-name = new-value >> >>> >>> >> where (column-value - new-value) > threshold-value >> >>> >>> >> >> >>> >>> >> Basically ... I want to enhance the checkAndPut to not just >> compare >> >>> >>> >> "values" ... but apply an arbitrary function on the value >> >>> _atomically_ >> >>> >>> >> in the Put call. Multiple threads would be firing these >> mutations >> >>> and >> >>> >>> >> I'd like the threshold-value above to never be breached under >> any >> >>> >>> >> circumstance. >> >>> >>> >> >> >>> >>> >> Is there a solution that can be implemented either via >> checkAndPut >> >>> or >> >>> >>> >> using coprocessors preCheckAndPut? If not, would this be a >> useful >> >>> >>> >> feature to build in HBase? >> >>> >>> >> >> >>> >>> >> Thanks, >> >>> >>> >> --Suraj >> >>> >>> >> >> >>> >>> > >> >>> >>> > >> >>> >>> > >> >>> >>> > -- >> >>> >>> > ------------------- >> >>> >>> > Jesse Yates >> >>> >>> > 240-888-2200 >> >>> >>> > @jesse_yates >> >>> >>> >> >>> >> > > -- > ------------------- > Jesse Yates > 240-888-2200 > @jesse_yates
