[ https://issues.apache.org/jira/browse/HBASE-21256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16632977#comment-16632977 ]

Duo Zhang commented on HBASE-21256:
-----------------------------------

{quote}
Random uses a 48-bit seed algorithm (as noted in HBASE-13382), and therefore it CAN generate 32-bit numbers in a completely random way.
{quote}
Not really. HBASE-13161 shows that SecureRandom can fix the problem. That said, SecureRandom is not necessary, as we do not need 'secure' here.

See this page:

https://en.wikipedia.org/wiki/Linear_congruential_generator

It has a table showing the parameters of different LCGs. I think we could use the one introduced by Knuth. It uses a 64-bit seed, so we can use it to generate our 16-byte key: just use two longs. This guarantees that a 10B test will not have collisions, since the generator has to run 2^64 times before it repeats.

Thanks.

> Improve IntegrationTestBigLinkedList for testing huge data
> ----------------------------------------------------------
>
>                 Key: HBASE-21256
>                 URL: https://issues.apache.org/jira/browse/HBASE-21256
>             Project: HBase
>          Issue Type: Improvement
>          Components: integration tests
>    Affects Versions: 3.0.0
>            Reporter: Zephyr Guo
>            Assignee: Zephyr Guo
>            Priority: Major
>             Fix For: 3.0.0
>
>         Attachments: HBASE-21256-v1.patch, ITBLL-1.png, ITBLL-2.png
>
>
> Recently, I used ITBLL to test some features in our company, and I
> encountered the following problems:
>
> 1. Generator is too slow at the generating stage. The root cause is
> SecureRandom: there is a global lock in SecureRandom (see the following
> picture). Using Random instead of SecureRandom speeds up this stage
> (up 500% with 20 mappers). SecureRandom was introduced by HBASE-13382, but
> for generating random bytes it is, in my opinion, equivalent to Random.
> !ITBLL-1.png!
> 2. VerifyReducer spends 14% of its CPU time in the format method. This is
> caused by creating the keyString variable. However, keyString is never used
> if the test result is correct (which is the most common case). Simply
> delaying the creation of keyString yields a huge performance boost in the
> verifying stage.
> !ITBLL-2.png!
> 3. Argument checks are needed, because there are constraints between the
> arguments. If a constraint is broken, we cannot get a correct circular list.
>
> 4. Make the big family value size configurable.
>
> 5. Avoid starting an RS on the backup master.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
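
To make the LCG suggestion in the comment above concrete, here is a minimal Java sketch of a 64-bit generator that builds a 16-byte key from two longs. It is only an illustration, not code from the HBASE-21256 patch: the class name KnuthLcgKeyGenerator and its methods are hypothetical, and the multiplier and increment are assumed to be the MMIX constants attributed to Knuth in the table on the Wikipedia page linked above.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch (not taken from the HBASE-21256 patch).
 * 64-bit LCG: state = state * A + C (mod 2^64), using the MMIX constants
 * attributed to Knuth. The increment is odd and A - 1 is divisible by 4,
 * so the state visits all 2^64 values before repeating.
 */
final class KnuthLcgKeyGenerator {
  private static final long A = 6364136223846793005L;
  private static final long C = 1442695040888963407L;

  private long state;

  KnuthLcgKeyGenerator(long seed) {
    this.state = seed;
  }

  /** Advances the generator. Java long arithmetic wraps, which is the mod 2^64 step. */
  long nextLong() {
    state = state * A + C;
    return state;
  }

  /** Builds a 16-byte key from two consecutive states. */
  byte[] nextKey() {
    return ByteBuffer.allocate(16)
        .putLong(nextLong())
        .putLong(nextLong())
        .array();
  }
}
```

Because every state within one period is distinct, keys drawn from a single generator instance do not repeat until the 2^64 states are exhausted. Generators seeded differently in separate mappers walk the same cycle from different starting points, so a real implementation would still have to make sure their ranges do not overlap.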