[ https://issues.apache.org/jira/browse/HBASE-17849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ramkrishna.s.vasudevan updated HBASE-17849:
-------------------------------------------
    Status: Patch Available  (was: Open)

> PE tool randomness is not totally random
> ----------------------------------------
>
>                 Key: HBASE-17849
>                 URL: https://issues.apache.org/jira/browse/HBASE-17849
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.0.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>
>         Attachments: HBASE-17849.patch, HBASE-17849.patch
>
>
> Recently we were using the PE tool for some bucket-cache-related
> performance tests. One thing we noted was that the way the random read
> works is not totally random.
> Suppose we load 200G of data using the --size param and then use --rows=500000
> to do the randomRead. The assumption was that it would randomly generate
> 500000 row keys from among the 200G of data to do the reads.
> But it so happens that the PE tool generates random rows only from the set of
> row keys that fall within the first 500000 rows.
> This was quite evident when we tried to use HBASE-15314 in our testing.
> Suppose we split the bucket cache of size 200G into 2 files of 100G each; the
> randomReads with --rows=500000 always land in the first file and never in the
> 2nd file. Better to make PE purely random.
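
The effect described in the issue can be illustrated with a small standalone sketch. This is not the PerformanceEvaluation source: the class RandomRowSketch, its methods, and the figure of 200,000,000 loaded rows are all hypothetical, and it assumes that a row's position in the key space roughly corresponds to which half of the cache it ends up in. It only contrasts drawing random keys from a key space as large as the --rows value with drawing them from every loaded row.

```java
import java.util.Random;

// Illustrative sketch only; not the PerformanceEvaluation source.
class RandomRowSketch {

    // Draw a row index from [0, keySpace), uniform up to negligible modulo bias.
    static long randomRow(Random rnd, long keySpace) {
        return Math.floorMod(rnd.nextLong(), keySpace);
    }

    // Fraction of `reads` random reads whose row index is >= boundary.
    static double fractionAtOrBeyond(Random rnd, long keySpace, long boundary, int reads) {
        int hits = 0;
        for (int i = 0; i < reads; i++) {
            if (randomRow(rnd, keySpace) >= boundary) {
                hits++;
            }
        }
        return (double) hits / reads;
    }

    public static void main(String[] args) {
        long loadedRows = 200_000_000L;   // assumed row count for the loaded data
        int reads = 500_000;              // --rows=500000 from the report
        long secondHalf = loadedRows / 2; // stand-in for the start of the 2nd cache file

        Random rnd = new Random(42);
        // Reported behaviour: keys are drawn from the first `reads` rows only.
        double narrow = fractionAtOrBeyond(rnd, reads, secondHalf, reads);
        // Desired behaviour: keys are drawn from every loaded row.
        double wide = fractionAtOrBeyond(rnd, loadedRows, secondHalf, reads);
        System.out.println("narrow key space: " + narrow);
        System.out.println("full key space:   " + wide);
    }
}
```

With the narrow key space no read can ever reach the second half of the data, while with the full key space roughly half of the reads should. Under the stated assumptions, this matches the report that reads with --rows=500000 never touch the second bucket cache file.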