>> so that child rows are always in the same region as the parent rows

Should the user expect abnormal growth for certain parent(s) ?
I think even HFile v2 has a limit on the file size beyond which operations would become less efficient.

On Tue, Jan 17, 2012 at 4:48 PM, lars hofhansl <[email protected]> wrote:

> Yes, it's hard constraint, but the building blocks are there.
> User can disable automatic splitting and pre-split the table.
>
> For example one could have a table that hosts a parent child relationship
> in a single table, by prefixing all child child row keys with the parent
> row key,
> Now it is possible to presplit the table (or use a custom local balancer)
> so that child rows are always in the same region as the parent rows.
> And then it would be possible to do cross parent/child transactions.
>
> Using the same scheme it is possible to do consistent parent/child indexes
> (consistent indexes within the same parent prefix).
> (I just made this up, but this is somewhat similar to the Megastore
> design, I think)
>
>
> Anyway, I set out asking whether this would be a useful endeavor, seems
> the answer is resounding "maybe". :)
>
>
> -- Lars
>
>
>
> ----- Original Message -----
> From: Mikael Sitruk <[email protected]>
> To: [email protected]
> Cc:
> Sent: Tuesday, January 17, 2012 3:07 PM
> Subject: Re: Limited cross row transactions
>
> Well i understand the limitation now, asking to be in the same region is
> really hard constraint.
> Even if this is on the same RS this is not enough, because after a restart,
> regions may be allocated differently and now part of the data may be in one
> region under server A and the other part under server B.
>
> Well perhaps we need use case for better understanding, and perhaps finding
> alternative.
>
> The first use case i was thinking of is as follow -
> I need to insert data with different access criteria, but the data inserted
> should be inserted in atomic way.
> In RDBMS i would have two table, insert data in the first one with key#1
> and then in the second one with key #2 then commit.
> In HBase i need to use different column family with key #1 (for atomicity)
> then to manage a kind of secondary index to map key#2 to key #1 (perhaps
> via co-processor) to have quick access to the data of key#2.
> Having cross row trx, i would think of sing different keys under the same
> table (and probably different cf too), without the need to have secondary
> index, but again with the limitation it does not seems to be easily
> feasible.
>
> Mik.
>
> On Wed, Jan 18, 2012 at 12:22 AM, Ted Yu <[email protected]> wrote:
>
> > People rely on RDBMS for the transaction support.
> >
> > Consider the following example:
> > A highly de-normalized schema puts related users in the same region where
> > this 'limited cross row transactions' works.
> > After some time, the region has to be split (maybe due to good business
> > condition).
> > What should the HBase user do now ?
> >
> > Cheers
> >
> > On Tue, Jan 17, 2012 at 2:13 PM, Mikael Sitruk <[email protected]> wrote:
> >
> > > Ted - My 2 cents as a user.
> > > The user should know what he is doing, this is like a 'delete' operation,
> > > this is less intuitive that the original delete in RDBMS, so the same will
> > > be for this light transaction.
> > > If the transaction fails because of cross region server then the design of
> > > the user was wrong
> > > if the transaction fails because of concurrent access, then he should be
> > > able to re-read and reprocess its request.
> > > The only problem is how to make sure in advance that the different rows
> > > will be in the same RS?
> > >
> > > Lars - is the limitation is at the region or at the region server? It was
> > > not so clear.
> > >
> > > Mikael.S
> > >
> > > On Tue, Jan 17, 2012 at 11:52 PM, Ted Yu <[email protected]> wrote:
> > >
> > > > Back to original proposal:
> > > > If client side grouping reveals that the batch of operations cannot be
> > > > supported by 'limited cross row transactions', what should the user do ?
> > > >
> > > > Cheers
> > > >
> > > > On Tue, Jan 17, 2012 at 1:49 PM, Ted Yu <[email protected]> wrote:
> > > >
> > > > > Whether Omid fits the bill is open to discussion.
> > > > >
> > > > > We should revisit HBASE-2315 and provide the support Flavio, et al need.
> > > > >
> > > > > Cheers
> > > > >
> > > > >
> > > > > On Tue, Jan 17, 2012 at 1:41 PM, Lars George <[email protected]> wrote:
> > > > >
> > > > >> Hi Ted,
> > > > >>
> > > > >> Wouldn't Omid (https://github.com/yahoo/omid) help there? Or is that too
> > > > >> broad? Just curious.
> > > > >>
> > > > >> Lars
> > > > >>
> > > > >> On Jan 17, 2012, at 4:36 PM, Ted Yu wrote:
> > > > >>
> > > > >> > Can we collect use case for 'limited cross row transactions' first ?
> > > > >> >
> > > > >> > I have been thinking about (unlimited) multi-row transaction support in
> > > > >> > HBase. It may not be a one-man task. But we should definitely implement it
> > > > >> > someday.
> > > > >> >
> > > > >> > Cheers
> > > > >> >
> > > > >> > On Tue, Jan 17, 2012 at 1:27 PM, lars hofhansl <[email protected]> wrote:
> > > > >> >
> > > > >> >> I just committed HBASE-5203 (together with HBASE-3584 this implements
> > > > >> >> atomic row operations).
> > > > >> >> Although a relatively small patch it lays the groundwork for heterogeneous
> > > > >> >> operations in a single WALEdit.
> > > > >> >>
> > > > >> >> The interesting part is that even though the code enforced the atomic
> > > > >> >> operation to be a for single row, this is not required.
> > > > >> >> It is enough if all involved KVs reside in the same region.
> > > > >> >>
> > > > >> >> I am not saying that we should add any high level concept to HBase (such
> > > > >> >> as the EntityGroups of Megastore).
> > > > >> >>
> > > > >> >> But, with a slight addition to the API (allowing a grouping of multiple
> > > > >> >> row operations) client applications have all the building blocks to do
> > > > >> >> limited cross row atomic operations.
> > > > >> >> The client application would be responsible for either correctly
> > > > >> >> pre-splitting the table, or a custom balancer has to be provided.
> > > > >> >>
> > > > >> >> The operation would fail if the regionserver determines that it would need
> > > > >> >> data from multiple region servers.
> > > > >> >>
> > > > >> >> I think this needs at least minimal support from HBase and cannot
> > > > >> >> (efficiently or without adding more moving parts) by a client API only.
> > > > >> >>
> > > > >> >>
> > > > >> >> Comments? Is this worth pursuing? If so, I'll file a jira and provide a
> > > > >> >> patch.
> > > > >> >>
> > > > >> >> Thanks.
> > > > >> >>
> > > > >> >>
> > > > >> >> -- Lars
> > > > >> >>
> > > > >> >>
> > > > >>
> > > > >>
> > > > >
> > > >
> > >
> > >
> > > --
> > > Mikael.S
> > >
> >
>
>
> --
> Mikael.S
>
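
For readers following the thread, here is a rough sketch of the parent-prefix scheme described above. It is a toy model in plain Python, not HBase client code: the "/" separator, the function names, and the sample keys are all made up for illustration, and it assumes no parent key is a prefix of another parent key (for example, fixed-width parent keys). Regions are modeled as sorted split points with an inclusive start key, which is how a pre-split table lays out its regions.

```python
from bisect import bisect_right

SEP = b"/"  # hypothetical separator between the parent key and the child suffix


def child_row_key(parent_key, child_id):
    """Build a child row key by prefixing it with the parent row key."""
    return parent_key + SEP + child_id


def region_index(split_points, row_key):
    """Return the index of the region that would hold row_key.

    split_points must be sorted. Region 0 covers everything below the first
    split point; region i covers [split_points[i-1], split_points[i]).
    """
    return bisect_right(split_points, row_key)


def same_region(split_points, row_keys):
    """True if every row key falls into one region (toy model only)."""
    return len({region_index(split_points, k) for k in row_keys}) == 1


# Table pre-split on parent keys, so each parent and its children share a region.
splits = [b"parent2", b"parent3"]

family = [
    b"parent1",
    child_row_key(b"parent1", b"a"),
    child_row_key(b"parent1", b"b"),
]
mixed = [
    child_row_key(b"parent1", b"a"),
    child_row_key(b"parent2", b"a"),
]

print(same_region(splits, family))  # parent1 and its children
print(same_region(splits, mixed))   # children of two different parents
```

In this model a batch touching only one parent's rows stays inside a single region, while a batch spanning two parents does not, which is the case where the proposed operation would be rejected. It says nothing about region growth for a hot parent, which is the concern raised at the top of this reply.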
