[ https://issues.apache.org/jira/browse/HBASE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cosmin Lehene resolved HBASE-5402.
----------------------------------
    Resolution: Later

Closing it with resolution Later. It may make sense to have a deterministic, invertible hash, in case someone wants to play with it.

> PerformanceEvaluation creates the wrong number of rows in randomWrite
> ---------------------------------------------------------------------
>
>                 Key: HBASE-5402
>                 URL: https://issues.apache.org/jira/browse/HBASE-5402
>             Project: HBase
>          Issue Type: Improvement
>          Components: test
>            Reporter: Oliver Meyn
>              Labels: beginner
>
> The command line 'hbase org.apache.hadoop.hbase.PerformanceEvaluation
> randomWrite 10' should result in a table with 10 * (1024 * 1024) rows (so
> 10485760). Instead, the randomWrite job reports writing exactly that many
> rows, but running rowcounter against the table reveals only e.g. 6549899
> rows. A second attempt to build the table produced slightly different
> results (e.g. 6627689). I see a similar discrepancy when using 50 instead
> of 10 clients (~35% fewer rows than expected).
>
> Further experimentation reveals that the problem is key collision: removing
> the % totalRows in getRandomRow reduced the number of collisions (the table
> had ~8M rows instead of 6.6M). Replacing the random Integer row keys with
> UUIDs solved the problem and produced exactly 10485760 rows. But that makes
> the key size 16 bytes instead of the current 10, so I'm not sure that's an
> acceptable solution.
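The observed row counts fit a simple balls-into-bins estimate. As a minimal sketch, assume each of the N writes picks its key independently and uniformly from [0, N), which is roughly what the % totalRows in getRandomRow implies. The expected number of distinct keys is then N(1 - (1 - 1/N)^N) ≈ N(1 - 1/e) ≈ 0.632N, about 6.63M for N = 10485760, in the same range as the 6549899 and 6627689 rows counted; the predicted ~37% shortfall is also close to the ~35% reported for 50 clients. The class KeyCollisionEstimate is hypothetical, and the simulation uses java.util.Random rather than PerformanceEvaluation's own generator.

```java
import java.util.BitSet;
import java.util.Random;

public class KeyCollisionEstimate {

    /** Expected number of distinct keys after n uniform draws from a key space of size slots. */
    static double expectedDistinct(long n, long slots) {
        // slots * (1 - (1 - 1/slots)^n), computed with log1p/expm1 to avoid precision loss
        return -slots * Math.expm1(n * Math.log1p(-1.0 / slots));
    }

    /** Monte Carlo check: draw n keys uniformly from [0, slots) and count the distinct ones. */
    static int simulateDistinct(int n, int slots, long seed) {
        Random random = new Random(seed);
        BitSet seen = new BitSet(slots); // one bit per possible key
        for (int i = 0; i < n; i++) {
            seen.set(random.nextInt(slots)); // a collision just re-sets an existing bit
        }
        return seen.cardinality();
    }

    public static void main(String[] args) {
        int totalRows = 10 * 1024 * 1024; // randomWrite with 10 clients, 1024 * 1024 rows each
        System.out.printf("expected distinct rows:  %.0f%n", expectedDistinct(totalRows, totalRows));
        System.out.printf("simulated distinct rows: %d%n", simulateDistinct(totalRows, totalRows, 42L));
    }
}
```

The closed-form estimate and the simulation should agree to within a fraction of a percent; neither models the ~8M figure reported after removing the modulo.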
>
> Here's the UUID code I used:
>
>   public static byte[] format(final UUID uuid) {
>     long msb = uuid.getMostSignificantBits();
>     long lsb = uuid.getLeastSignificantBits();
>     byte[] buffer = new byte[16];
>     // bytes 0-7: most significant long, big-endian (shifts 56, 48, ..., 0)
>     for (int i = 0; i < 8; i++) {
>       buffer[i] = (byte) (msb >>> 8 * (7 - i));
>     }
>     // bytes 8-15: least significant long, big-endian (shifts 56, 48, ..., 0)
>     for (int i = 8; i < 16; i++) {
>       buffer[i] = (byte) (lsb >>> 8 * (15 - i));
>     }
>     return buffer;
>   }
>
> which is invoked within getRandomRow with
>
>   return format(UUID.randomUUID());

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
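The same 16-byte big-endian layout as the format(UUID) method above can be produced with java.nio.ByteBuffer, whose default byte order is big-endian. A minimal sketch follows; the class UuidBytes and its methods toBytes and fromBytes are hypothetical names, and the inverse is included so a row key can be decoded back into the UUID it came from.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class UuidBytes {

    /** Encodes a UUID as 16 bytes: most significant long first, each long big-endian. */
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16) // a new ByteBuffer starts in BIG_ENDIAN order
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    /** Decodes the 16-byte layout produced by toBytes back into a UUID. */
    static UUID fromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        long msb = buffer.getLong();
        long lsb = buffer.getLong();
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID uuid = UUID.randomUUID();
        byte[] key = toBytes(uuid);
        System.out.println(key.length + " bytes, round trip ok: " + uuid.equals(fromBytes(key)));
    }
}
```

This removes the collisions only probabilistically (random UUIDs are extremely unlikely to repeat) and still costs 16 bytes per key.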
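The resolution comment mentions a deterministic, invertible hash. One possible reading, sketched here under assumptions, is a bijection on [0, totalRows): row index i maps to f(i) = (a * i + b) mod totalRows with gcd(a, totalRows) = 1, so every index gets a distinct key in the same integer range as today (no collisions, no larger key), and f is undone with the modular inverse of a. The class AffineRowPermutation and the constants a and b are hypothetical. An affine map scatters keys only weakly compared with a real mixing function such as a Feistel network, so it is shown here only to illustrate invertibility.

```java
import java.math.BigInteger;

public class AffineRowPermutation {
    private final long n;    // size of the key space, e.g. totalRows
    private final long a;    // multiplier; must be coprime with n for f to be a bijection
    private final long b;    // additive offset
    private final long aInv; // a^-1 mod n, used to undo the multiplication

    AffineRowPermutation(long n, long a, long b) {
        if (n <= 0 || n > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("n must be in [1, 2^31 - 1]");
        }
        BigInteger bigN = BigInteger.valueOf(n);
        if (!BigInteger.valueOf(a).gcd(bigN).equals(BigInteger.ONE)) {
            throw new IllegalArgumentException("a must be coprime with n");
        }
        this.n = n;
        this.a = Math.floorMod(a, n);
        this.b = Math.floorMod(b, n);
        this.aInv = BigInteger.valueOf(this.a).modInverse(bigN).longValueExact();
    }

    /** f(i) = (a * i + b) mod n: maps each row index in [0, n) to a distinct key in [0, n). */
    long forward(long i) {
        checkRange(i);
        // a and i are below 2^31, so the product fits in a long; multiplyExact guards anyway
        return Math.floorMod(Math.multiplyExact(a, i) + b, n);
    }

    /** f^-1(y) = a^-1 * (y - b) mod n: recovers the row index from a key. */
    long inverse(long y) {
        checkRange(y);
        return Math.floorMod(Math.multiplyExact(aInv, Math.floorMod(y - b, n)), n);
    }

    private void checkRange(long x) {
        if (x < 0 || x >= n) {
            throw new IllegalArgumentException(x + " is outside [0, " + n + ")");
        }
    }

    public static void main(String[] args) {
        long totalRows = 10L * 1024 * 1024; // randomWrite with 10 clients
        // 2654435761 is odd and not a multiple of 5, so it is coprime with 10 * 2^20
        AffineRowPermutation perm = new AffineRowPermutation(totalRows, 2654435761L, 12345L);
        for (long i = 0; i < 5; i++) {
            long key = perm.forward(i);
            System.out.println(i + " -> " + key + " -> " + perm.inverse(key));
        }
    }
}
```

A writer that iterates i over [0, totalRows) and emits forward(i) would then write every key exactly once, in a scrambled but reproducible order.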