[ 
https://issues.apache.org/jira/browse/HBASE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cosmin Lehene resolved HBASE-5402.
----------------------------------
    Resolution: Later

Closing this with resolution Later.  It may make sense to have a deterministic, 
invertible hash, in case someone wants to experiment with it.
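
For anyone who does want to try it, here is a minimal sketch of one possible 
reading of "deterministic, invertible hash": an affine map over [0, totalRows). 
It is not code from PerformanceEvaluation. The class name InvertibleRowHash and 
the multiplier and offset parameters are hypothetical. The map is a bijection 
only when the multiplier is coprime with totalRows, so each row index yields a 
distinct key and the index can be recovered from the key.

```java
import java.math.BigInteger;

/** Hypothetical sketch: a deterministic, invertible map over [0, totalRows). */
public final class InvertibleRowHash {
  private final BigInteger modulus;    // totalRows
  private final BigInteger multiplier; // coprime with totalRows
  private final BigInteger inverse;    // modular inverse of multiplier
  private final BigInteger offset;

  public InvertibleRowHash(long totalRows, long multiplier, long offset) {
    if (totalRows <= 0) {
      throw new IllegalArgumentException("totalRows must be positive");
    }
    this.modulus = BigInteger.valueOf(totalRows);
    this.multiplier = BigInteger.valueOf(multiplier).mod(modulus);
    if (!this.multiplier.gcd(modulus).equals(BigInteger.ONE)) {
      throw new IllegalArgumentException("multiplier must be coprime with totalRows");
    }
    this.inverse = this.multiplier.modInverse(modulus);
    this.offset = BigInteger.valueOf(offset).mod(modulus);
  }

  /** Maps a row index in [0, totalRows) to a unique key in the same range. */
  public long hash(long index) {
    return multiplier.multiply(BigInteger.valueOf(index))
        .add(offset).mod(modulus).longValueExact();
  }

  /** Recovers the row index from a key produced by hash(). */
  public long unhash(long key) {
    return BigInteger.valueOf(key).subtract(offset)
        .multiply(inverse).mod(modulus).longValueExact();
  }
}
```

With totalRows = 10 * 1024 * 1024, any odd multiplier that is not a multiple of 
5 is coprime with totalRows, for example 2654435761. An affine map is only 
weakly scrambled, so whether the resulting write order is "random enough" for a 
benchmark is an open question.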

> PerformanceEvaluation creates the wrong number of rows in randomWrite
> ---------------------------------------------------------------------
>
>                 Key: HBASE-5402
>                 URL: https://issues.apache.org/jira/browse/HBASE-5402
>             Project: HBase
>          Issue Type: Improvement
>          Components: test
>            Reporter: Oliver Meyn
>              Labels: beginner
>
> The command line 'hbase org.apache.hadoop.hbase.PerformanceEvaluation 
> randomWrite 10' should result in a table with 10 * (1024 * 1024) rows (so 
> 10485760).  Instead what happens is that the randomWrite job reports writing 
> that many rows (exactly) but running rowcounter against the table reveals 
> only e.g. 6549899 rows.  A second attempt to build the table produced slightly 
> different results (e.g. 6627689).  I see a similar discrepancy when using 50 
> instead of 10 clients (~35% smaller than expected).
> Further experimentation reveals that the problem is key collision - by 
> removing the % totalRows in getRandomRow I saw a reduction in collisions 
> (table was ~8M rows instead of 6.6M).  Replacing the random row key with 
> UUIDs instead of Integers solved the problem and produced exactly 10485760 
> rows.  But that makes the key size 16 bytes instead of the current 10, so I'm 
> not sure that's an acceptable solution.
> Here's the UUID code I used:
>   public static byte[] format(final UUID uuid) {
>     long msb = uuid.getMostSignificantBits();
>     long lsb = uuid.getLeastSignificantBits();
>     byte[] buffer = new byte[16];
>     for (int i = 0; i < 8; i++) {
>       buffer[i] = (byte) (msb >>> 8 * (7 - i));
>     }
>     for (int i = 8; i < 16; i++) {
>       buffer[i] = (byte) (lsb >>> 8 * (15 - i));
>     }
>     return buffer;
>   }
> which is invoked within getRandomRow with 
> return format(UUID.randomUUID());
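
The row counts reported above are close to what uniform random key selection 
predicts. Assuming every write picks its key independently and uniformly from 
totalRows possible keys (an idealization of getRandomRow with % totalRows), 
the expected number of distinct keys after that many writes is 
totalRows * (1 - (1 - 1/totalRows)^totalRows), which is about 63% of totalRows, 
or roughly 6.63 million for 10485760. The sketch below computes that estimate. 
The class and method names are hypothetical.

```java
/** Hypothetical sketch: expected distinct keys under uniform random draws. */
public final class CollisionEstimate {

  /** Expected number of distinct keys after `draws` uniform draws from `keySpace` keys. */
  public static double expectedDistinct(long keySpace, long draws) {
    // Probability that one particular key is never drawn: (1 - 1/keySpace)^draws.
    double pMissed = Math.exp(draws * Math.log1p(-1.0 / keySpace));
    return keySpace * (1.0 - pMissed);
  }

  public static void main(String[] args) {
    long totalRows = 10L * 1024 * 1024; // 10 clients, 1024 * 1024 rows each
    System.out.printf("expected distinct rows: %.0f of %d%n",
        expectedDistinct(totalRows, totalRows), totalRows);
  }
}
```

This only models the % totalRows case. It does not attempt to explain the ~8M 
figure seen after the modulo was removed.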



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
