RE: Unique row ID constraint

Michael Segel Thu, 29 Apr 2010 13:09:31 -0700

UUIDs won't clash in practice, especially if you're using version 5, which is a
truncated SHA-1 hash of a namespace UUID and a name.
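
To make the version 5 construction concrete, here is a minimal sketch in Scala. The object name UuidV5 and the method fromName are made up for illustration; java.util.UUID has no built-in version 5 factory (nameUUIDFromBytes produces version 3, and randomUUID produces version 4), so the hash is assembled by hand, assuming the RFC 4122 layout.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets
import java.security.MessageDigest
import java.util.UUID

// Hypothetical helper, not part of the JDK or of HBase.
object UuidV5 {
  // Name-based UUID, version 5: SHA-1 over (namespace bytes + name bytes),
  // truncated to 128 bits, with the version and variant bits overwritten.
  def fromName(namespace: UUID, name: String): UUID = {
    val sha1 = MessageDigest.getInstance("SHA-1")
    val ns = ByteBuffer.allocate(16)
      .putLong(namespace.getMostSignificantBits)
      .putLong(namespace.getLeastSignificantBits)
      .array()
    sha1.update(ns)
    sha1.update(name.getBytes(StandardCharsets.UTF_8))
    val hash = sha1.digest()                      // 20 bytes; only the first 16 are used
    hash(6) = ((hash(6) & 0x0f) | 0x50).toByte    // set version to 5
    hash(8) = ((hash(8) & 0x3f) | 0x80).toByte    // set the RFC 4122 variant
    val buf = ByteBuffer.wrap(hash, 0, 16)
    new UUID(buf.getLong(), buf.getLong())
  }
}
```

Note that version 5 is deterministic: the same namespace and name always give the same UUID, so it only helps with uniqueness if the names themselves are unique. For purely random keys, UUID.randomUUID() (version 4) is the usual choice.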



> From: germog...@gmail.com
> Date: Thu, 29 Apr 2010 13:58:42 -0300
> Subject: Re: Unique row ID constraint
> To: hbase-user@hadoop.apache.org
> 
> Hello Tatsuya,
> 
> Can the keys be randomly generated, or must they be incremental? Remember
> that you can achieve higher throughput if they are randomly generated since
> the insertions will possibly load all machines more evenly.
> 
> Using UUIDs may ensure key uniqueness (I don't expect a UUID clash anytime soon :-)
> and load balance over the cluster, but if you are paranoid enough you can
> also check whether a row already exists by using
> checkAndPut<http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/client/HTable.html#checkAndPut(byte[],
> byte[], byte[], byte[], org.apache.hadoop.hbase.client.Put)> (just check for
> an empty byte array value in a column that you can ensure always has
> some value).
> 
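
The property that matters in the checkAndPut suggestion above is that the existence check and the write happen as one atomic step. A minimal in-memory sketch of that semantics follows; InMemoryTable and insertUnique are hypothetical stand-ins built on ConcurrentHashMap, not the HBase client API, and the exact checkAndPut arguments for a given HBase version should be taken from its Javadoc.

```scala
import java.util.concurrent.ConcurrentHashMap

final class DuplicateRowException(msg: String) extends RuntimeException(msg)

// Hypothetical in-memory stand-in for a table; NOT the HBase client API.
final class InMemoryTable {
  private val rows = new ConcurrentHashMap[String, String]()

  // putIfAbsent does the "does the row exist?" check and the write atomically,
  // so two concurrent callers can never both succeed for the same rowId.
  def insertUnique(rowId: String, value: String): Unit = {
    val previous = rows.putIfAbsent(rowId, value)
    if (previous != null)
      throw new DuplicateRowException("Tried to insert a duplicate row: " + rowId)
  }

  def get(rowId: String): Option[String] = Option(rows.get(rowId))
}
```

A second insertUnique call with the same rowId throws and leaves the first value untouched.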
> On Thu, Apr 29, 2010 at 1:36 PM, Todd Lipcon <t...@cloudera.com> wrote:
> 
> > Hi Tatsuya,
> >
> > Note that your solution is not correct in the case of failure, since the
> > check and put are not atomic with each other.
> >
> > If your client or server fails between the ICV and the put, no other
> > clients
> > will be able to put the row, but there will be no data.
> >
> > -Todd
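
The failure window described above can be reproduced with a small in-memory model. This is only a sketch: IcvThenPutModel and the crashBeforePut flag are invented for illustration and do not touch HBase; the counter map plays the role of the ICV column and the data map plays the role of the row contents.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicLong

// Hypothetical in-memory model of the "increment, then put" pattern; not HBase code.
final class IcvThenPutModel {
  private val counters = new ConcurrentHashMap[String, AtomicLong]()
  private val data = new ConcurrentHashMap[String, String]()

  // Atomically increments the per-row counter, creating it at 0 first if absent.
  private def increment(rowId: String): Long = {
    counters.putIfAbsent(rowId, new AtomicLong(0L))
    counters.get(rowId).incrementAndGet()
  }

  // Returns true if the row was written. crashBeforePut simulates a client
  // that dies after the increment but before the put.
  def insert(rowId: String, value: String, crashBeforePut: Boolean = false): Boolean =
    if (increment(rowId) != 1L) false      // another client already claimed the row
    else if (crashBeforePut) false         // row is claimed, but the data never lands
    else { data.put(rowId, value); true }

  def read(rowId: String): Option[String] = Option(data.get(rowId))
}
```

After one insert with crashBeforePut = true, every later insert for that row is rejected as a duplicate, yet read returns None: the row is claimed but empty.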
> >
> >
> > On Thu, Apr 29, 2010 at 1:33 AM, Tatsuya Kawano <tatsuy...@snowcocoa.info
> > >wrote:
> >
> > > Hi Stack and Ryan,
> > >
> > > Thanks for your advice. I knew using row lock wasn't ideal, but I
> > > couldn't find an appropriate atomic operation to do Compare And Swap.
> > >
> > > So, thanks Stack for helping me to find it. I found that the
> > > incrementColumnValue() atomic operation works for me since it
> > > automatically initializes the column value with 0 when the column
> > > doesn't exist. I can try to increment the column value by 1, and if it
> > > returns 1, I can be sure that I'm the first one who has created the
> > > column and row.
> > >
> > > So, my updated code is much simpler and now lock-free.
> > >
> > > ===============================================
> > >  def insert(table: HTable, put: Put): Unit = {
> > >    val count = table.incrementColumnValue(put.getRow, family, uniqueQual, 1)
> > >
> > >    if (count == 1) {
> > >      table.put(put)
> > >
> > >    } else {
> > >       throw new DuplicateRowException("Tried to insert a duplicate row: "
> > >               + Bytes.toString(put.getRow))
> > >    }
> > >  }
> > > ===============================================
> > >
> > > Thanks,
> > > Tatsuya
> > >
> > >
> > >
> > > 2010/4/29 Ryan Rawson <ryano...@gmail.com>:
> > > > I would strongly discourage people from building on top of
> > > > lockRow/unlockRow.  The problem is if a row is not available, lockRow
> > > > will hold a responder thread and you can end up with a deadlock
> > > > because the lock holder won't be able to unlock.  Sure the expiry
> > > > system kicks in, but 60 seconds is kind of infinity in database terms
> > > > :-)
> > > >
> > > > I would probably go with either ICV or CAS to build the tools you
> > > > want.  With CAS you can accomplish a lot of things locking
> > > > accomplishes, but more efficiently.
> > > >
> > > > On Wed, Apr 28, 2010 at 9:42 AM, Stack <st...@duboce.net> wrote:
> > > > >> Would incrementColumnValue [1] work for this?
> > > >> St.Ack
> > > >>
> > > >> 1.
> > >
> > http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/client/HTable.html#incrementColumnValue%28byte[],%20byte[],%20byte[],%20long%29
> > > >>
> > > >> On Wed, Apr 28, 2010 at 7:40 AM, Tatsuya Kawano
> > > >> <tatsuy...@snowcocoa.info> wrote:
> > > >>> Hi,
> > > >>>
> > > > >>> I'd like to implement a unique row ID constraint (like the primary key
> > > > >>> constraint in an RDBMS) in my application framework.
> > > >>>
> > > >>> Here is a code fragment from my current implementation (HBase
> > > >>> 0.20.4rc) written in Scala. It works as expected, but is there any
> > > >>> better (shorter) way to do this like checkAndPut()?  I'd like to pass
> > > > >>> a single Put object to my function (method) rather than passing rowId,
> > > >>> family, qualifier and value separately. I can't do this now because I
> > > >>> have to give the rowLock object when I instantiate the Put.
> > > >>>
> > > >>> ===============================================
> > > > >>> def insert(table: HTable, rowId: Array[Byte], family: Array[Byte],
> > > > >>>            qualifier: Array[Byte], value: Array[Byte]): Unit = {
> > > >>>
> > > >>>    val get = new Get(rowId)
> > > >>>
> > > >>>    val lock = table.lockRow(rowId) // will expire in one minute
> > > >>>    try {
> > > >>>      if (table.exists(get)) {
> > > > >>>        throw new DuplicateRowException("Tried to insert a duplicate row: "
> > > > >>>                + Bytes.toString(rowId))
> > > >>>
> > > >>>      } else {
> > > >>>        val put = new Put(rowId, lock)
> > > >>>        put.add(family, qualifier, value)
> > > >>>
> > > >>>        table.put(put)
> > > >>>      }
> > > >>>
> > > >>>    } finally {
> > > >>>      table.unlockRow(lock)
> > > >>>    }
> > > >>>
> > > >>> }
> > > >>> ===============================================
> > > >>>
> > > >>> Thanks,
> > > >>>
> > > >>> --
> > > >>> 河野 達也
> > > >>> Tatsuya Kawano (Mr.)
> > > >>> Tokyo, Japan
> > > >>>
> > > >>> twitter: http://twitter.com/tatsuya6502
> > >
> >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
> >
> 
> 
> 
> -- 
> Guilherme
> 
> msn: guigermog...@hotmail.com
> homepage: http://sites.google.com/site/germoglio/
                                          