UUIDs won't clash. Especially if you're using version 5, which is a truncated SHA-1 hash of a namespace UUID plus a name (so the same name always gives you the same UUID).
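For reference, here is a minimal sketch of how a version 5 UUID is derived, assuming the RFC 4122 layout. As far as I know the JDK's java.util.UUID only offers randomUUID (version 4) and nameUUIDFromBytes (version 3, MD5), so the UuidV5 object and the nameBased helper below are hypothetical names of my own, not part of the JDK or HBase.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets
import java.security.MessageDigest
import java.util.UUID

// Hypothetical helper (not a JDK or HBase API): name-based UUID, version 5.
object UuidV5 {
  // DNS namespace UUID as listed in RFC 4122, Appendix C.
  val NamespaceDns: UUID = UUID.fromString("6ba7b810-9dad-11d1-80b4-00c04fd430c8")

  def nameBased(namespace: UUID, name: String): UUID = {
    // Namespace UUID as 16 big-endian bytes.
    val nsBytes = ByteBuffer.allocate(16)
      .putLong(namespace.getMostSignificantBits)
      .putLong(namespace.getLeastSignificantBits)
      .array()

    // SHA-1 over namespace bytes followed by the name bytes.
    val sha1 = MessageDigest.getInstance("SHA-1")
    sha1.update(nsBytes)
    sha1.update(name.getBytes(StandardCharsets.UTF_8))
    val hash = sha1.digest() // 20 bytes; only the first 16 are kept

    hash(6) = ((hash(6) & 0x0f) | 0x50).toByte // set version to 5
    hash(8) = ((hash(8) & 0x3f) | 0x80).toByte // set the RFC 4122 variant

    // Truncate to 128 bits and build the UUID.
    val buf = ByteBuffer.wrap(hash, 0, 16)
    new UUID(buf.getLong(), buf.getLong())
  }
}
```

Calling UuidV5.nameBased(UuidV5.NamespaceDns, "some-row-name") twice returns the same value, so version 5 only gives you unique row keys if the names you feed it are unique. For randomly generated keys, UUID.randomUUID() (version 4) is the usual choice.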

> From: germog...@gmail.com
> Date: Thu, 29 Apr 2010 13:58:42 -0300
> Subject: Re: Unique row ID constraint
> To: hbase-user@hadoop.apache.org
>
> Hello Tatsuya,
>
> Can the keys be randomly generated or they must be incremental? Remember
> that you can achieve higher throughput if they are randomly generated since
> the insertions will possibly load all machines more evenly.
>
> Using UUIDs may ensure key uniqueness (I don't hope a UUID clash soon :-)
> and load balance over the cluster, but if you are paranoid enough you can
> also check whether a row already exists by using
> checkAndPut<http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/client/HTable.html#checkAndPut(byte[], byte[], byte[], byte[], org.apache.hadoop.hbase.client.Put)>
> (just check for an empty byte array values in a column that you can ensure
> it has always some value).
>
> On Thu, Apr 29, 2010 at 1:36 PM, Todd Lipcon <t...@cloudera.com> wrote:
>
> > Hi Tatsuya,
> >
> > Note that your solution is not correct in the case of failure, since the
> > check and put are not atomic with each other.
> >
> > If your client or server fails between the ICV and the put, no other
> > clients will be able to put the row, but there will be no data.
> >
> > -Todd
> >
> >
> > On Thu, Apr 29, 2010 at 1:33 AM, Tatsuya Kawano <tatsuy...@snowcocoa.info> wrote:
> >
> > > Hi Stack and Ryan,
> > >
> > > Thanks for your advices. I knew using row lock wasn't ideal, but I
> > > couldn't find an appropriate atomic operation to do Compare And Swap.
> > >
> > > So, thanks Stack for helping me to find it. I found
> > > incrementColumnValue() atomic operation just works for me since it
> > > automatically initializes the column value with 0 when the column
> > > doesn't exist. I can try to increment the column value by 1, and if it
> > > returns 1, I can be sure that I'm the first one who has created the
> > > column and row.
> > >
> > > So, my updated code is much simpler and now lock-free.
> > >
> > > ===============================================
> > > def insert(table: HTable, put: Put): Unit = {
> > >   val count = table.incrementColumnValue(put.getRow, family, uniqueQual, 1)
> > >
> > >   if (count == 1) {
> > >     table.put(put)
> > >
> > >   } else {
> > >     throw new DuplicateRowException("Tried to insert a duplicate row: "
> > >             + Bytes.toString(put.getRow))
> > >   }
> > > }
> > > ===============================================
> > >
> > > Thanks,
> > > Tatsuya
> > >
> > >
> > >
> > > 2010/4/29 Ryan Rawson <ryano...@gmail.com>:
> > > > I would strongly discourage people from building on top of
> > > > lockRow/unlockRow. The problem is if a row is not available, lockRow
> > > > will hold a responder thread and you can end up with a deadlock
> > > > because the lock holder won't be able to unlock. Sure the expiry
> > > > system kicks in, but 60 seconds is kind of infinity in database terms
> > > > :-)
> > > >
> > > > I would probably go with either ICV or CAS to build the tools you
> > > > want. With CAS you can accomplish a lot of things locking
> > > > accomplishes, but more efficiently.
> > > >
> > > > On Wed, Apr 28, 2010 at 9:42 AM, Stack <st...@duboce.net> wrote:
> > > >> Would the incrementValue [1] work for this?
> > > >> St.Ack
> > > >>
> > > >> 1. http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/client/HTable.html#incrementColumnValue%28byte[],%20byte[],%20byte[],%20long%29
> > > >>
> > > >> On Wed, Apr 28, 2010 at 7:40 AM, Tatsuya Kawano
> > > >> <tatsuy...@snowcocoa.info> wrote:
> > > >>> Hi,
> > > >>>
> > > >>> I'd like to implement unique row ID constraint (like the primary key
> > > >>> constraint in RDBMS) in my application framework.
> > > >>>
> > > >>> Here is a code fragment from my current implementation (HBase
> > > >>> 0.20.4rc) written in Scala. It works as expected, but is there any
> > > >>> better (shorter) way to do this like checkAndPut()? I'd like to pass
> > > >>> a single Put object to my function (method) rather than passing rowId,
> > > >>> family, qualifier and value separately. I can't do this now because I
> > > >>> have to give the rowLock object when I instantiate the Put.
> > > >>>
> > > >>> ===============================================
> > > >>> def insert(table: HTable, rowId: Array[Byte], family: Array[Byte],
> > > >>>            qualifier: Array[Byte], value: Array[Byte]): Unit = {
> > > >>>
> > > >>>   val get = new Get(rowId)
> > > >>>
> > > >>>   val lock = table.lockRow(rowId) // will expire in one minute
> > > >>>   try {
> > > >>>     if (table.exists(get)) {
> > > >>>       throw new DuplicateRowException("Tried to insert a duplicate row: "
> > > >>>               + Bytes.toString(rowId))
> > > >>>
> > > >>>     } else {
> > > >>>       val put = new Put(rowId, lock)
> > > >>>       put.add(family, qualifier, value)
> > > >>>
> > > >>>       table.put(put)
> > > >>>     }
> > > >>>
> > > >>>   } finally {
> > > >>>     table.unlockRow(lock)
> > > >>>   }
> > > >>>
> > > >>> }
> > > >>> ===============================================
> > > >>>
> > > >>> Thanks,
> > > >>>
> > > >>> --
> > > >>> 河野 達也
> > > >>> Tatsuya Kawano (Mr.)
> > > >>> Tokyo, Japan
> > > >>>
> > > >>> twitter: http://twitter.com/tatsuya6502
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
>
>
> --
> Guilherme
>
> msn: guigermog...@hotmail.com
> homepage: http://sites.google.com/site/germoglio/