Do the sentences need to be sorted? If not, you could use a numeric prefix on the row key. Keep track of the highest prefix used, then pick a prefix at random from that range and start a scanner at it.
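Something like the sketch below, assuming an HBase 0.20-style Java client. The table name "sentences", the zero-padded "NNNNN|" key format, and the way maxPrefix is tracked are placeholders, not anything from your actual setup:

    import java.util.Random;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RandomPrefixScan {
      public static void main(String[] args) throws Exception {
        // Hypothetical setup: rows were loaded with keys like "00042|Hello World",
        // and the highest prefix written (maxPrefix) was recorded at load time.
        int maxPrefix = 99999;                 // placeholder; track this yourself
        int prefix = new Random().nextInt(maxPrefix + 1);
        byte[] startRow = Bytes.toBytes(String.format("%05d|", prefix));

        HTable table = new HTable(new HBaseConfiguration(), "sentences");
        Scan scan = new Scan(startRow);        // scan begins at the random prefix
        ResultScanner scanner = table.getScanner(scan);
        try {
          int taken = 0;
          for (Result row : scanner) {
            if (++taken > 5) break;            // e.g. perturb 5 rows per step
            System.out.println(Bytes.toString(row.getRow()));
            // ... read the probability:* cells, perturb them, and Put them back
          }
        } finally {
          scanner.close();
        }
      }
    }

The prefixes only have to be spread out enough to randomize your reads; they don't need to be dense or contiguous, so one way to handle the parallel load would be to give each Hadoop task its own disjoint range of prefixes to draw from.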
~Tim.

2009/1/10 Jim Twensky <[email protected]>:
> Hello,
>
> I have an HBase table that contains sentences as row keys and a few
> numeric values as columns. A simple abstract model of the table looks
> like the following:
>
> ------------------------------------------------------------------------------
> Sentence     | frequency:value | probability:value-0 | probability:value-2
> ------------------------------------------------------------------------------
> Hello World  | 5               | 0.000545321         | 0.002368204
> .            | .               | .                   | .
> .            | .               | .                   | .
> .            | .               | .                   | .
> ------------------------------------------------------------------------------
>
> I create the table and load it using Hadoop, and there are hundreds of
> billions of entries in it. I use this table to solve an optimization
> problem using a hill climbing/simulated annealing method. Basically, I
> need to change the likelihood values randomly. For example, I need to
> change, say, the first 5 rows starting at the 112th row, do some
> calculations, and so on...
>
> Now the problem is, I can't see an easy way to access the n'th row
> directly. If I were using a traditional RDBMS, I'd add another column
> and auto-increment it each time I added a new row, but this is not
> possible since I load the table using Hadoop and there are parallel
> insertions taking place simultaneously. A quick and dirty way to do
> this might be adding a new index column after I load and initialize the
> table, but the table is huge and it doesn't seem right to me. Another
> bad approach would be to use a scanner starting from the first row and
> calling Scanner.next() n times inside a for loop to access the n'th
> row, which also seems very slow. Any ideas on how I could do it more
> efficiently?
>
> Thanks in advance,
> Jim
