[jira] [Created] (HBASE-17849) PE tool randomness is not totally random

ramkrishna.s.vasudevan (JIRA) Wed, 29 Mar 2017 11:12:06 -0700

ramkrishna.s.vasudevan created HBASE-17849:
----------------------------------------------


             Summary: PE tool randomness is not totally random
                 Key: HBASE-17849
                 URL: https://issues.apache.org/jira/browse/HBASE-17849
             Project: HBase
          Issue Type: Bug
          Components: test
    Affects Versions: 2.0.0
            Reporter: ramkrishna.s.vasudevan
             Fix For: 2.0.0


We recently used the PE tool for some bucket cache related performance tests. 
One thing we noticed is that randomRead does not select rows at random from 
the whole data set.

Suppose we load 200G of data using the --size param and then run randomRead 
with --rows=500000. Our assumption was that the tool would pick 500000 random 
row keys from across the whole 200G of data.
Instead, the PE tool generates random row keys only from the first 500000 
rows. 
This was quite evident when we used HBASE-15314 in our testing. Suppose we 
split a bucket cache of size 200G into 2 files of 100G each: the randomReads 
with --rows=500000 always land in the first file and never in the 2nd file. 
It would be better to make PE purely random.
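
To illustrate the behaviour described above, here is a minimal sketch. It is 
not the PE source. The class name, the method name and the assumed count of 
200,000,000 loaded rows are all hypothetical. It only shows that when the 
random key space is bounded by the --rows value, no read can land beyond the 
first 500000 rows, whereas bounding it by the number of rows actually loaded 
spreads the reads over the whole table.

```java
import java.util.Random;

// Illustrative sketch only; this is NOT the PerformanceEvaluation source.
class RandomRowSketch {

    // Draws `reads` random row indexes from [0, keySpace) and
    // returns the largest index that was sampled.
    static long maxSampledRow(long keySpace, int reads, long seed) {
        Random rand = new Random(seed);
        long max = -1;
        for (int i = 0; i < reads; i++) {
            // floorMod keeps the result non-negative for a positive keySpace
            long row = Math.floorMod(rand.nextLong(), keySpace);
            max = Math.max(max, row);
        }
        return max;
    }

    public static void main(String[] args) {
        long loadedRows = 200_000_000L; // hypothetical row count for the 200G load
        long rowsOption = 500_000L;     // the value passed as --rows
        int reads = 500_000;

        // Key space bounded by --rows: reads stay inside the first 500000 rows.
        System.out.println("bounded by --rows, max row = "
            + maxSampledRow(rowsOption, reads, 42L));

        // Key space bounded by the loaded row count: reads cover the whole table.
        System.out.println("bounded by loaded rows, max row = "
            + maxSampledRow(loadedRows, reads, 42L));
    }
}
```

With the second form, a bucket cache split into two files would see reads in 
both files, assuming the row keys are spread across them.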



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
