[ 
https://issues.apache.org/jira/browse/HBASE-9759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794859#comment-13794859
 ] 

stack commented on HBASE-9759:
------------------------------

+1 on trying the patch.  How does it prevent collision (I did not review 
closely)?

If you do a select on row 0, does it have more versions than other rows?

What is to prevent our clashing randomly on another row?  Is it because our random 
number generation is within a fixed range per iteration?
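
To put a rough number on the "clashing randomly on another row" question, here is a minimal birthday-bound sketch. It assumes row keys are drawn independently and uniformly from the full 64-bit range, which this thread does not confirm; the function name and the key-space size are illustrative, not taken from the test code.

```python
import math


def collision_probability(num_keys, key_space=2**64):
    """Approximate chance that at least two of num_keys uniform draws collide.

    Uses the standard birthday approximation 1 - exp(-n(n-1) / (2N)).
    Assumption: keys are independent and uniform over key_space values.
    """
    exponent = num_keys * (num_keys - 1) / (2 * key_space)
    # expm1 keeps precision when the exponent is tiny.
    return -math.expm1(-exponent)


# Defaults from the issue description: 5 iterations, 50 mappers, 500K rows each.
TOTAL_KEYS = 5 * 50 * 500_000

if __name__ == "__main__":
    print(f"keys written: {TOTAL_KEYS}")
    print(f"approx. collision probability: {collision_probability(TOTAL_KEYS):.2e}")
```

Under those assumptions the estimate comes out on the order of 1e-4 per full run, so a random clash on an arbitrary row is unlikely but not impossible across many runs. If the generator really is confined to a fixed range per iteration, the effective key space is smaller and the estimate would need to be redone with that range.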



> IntegrationTestBulkLoad random number collision
> -----------------------------------------------
>
>                 Key: HBASE-9759
>                 URL: https://issues.apache.org/jira/browse/HBASE-9759
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 0.98.0, 0.96.1
>
>         Attachments: hbase-9759_v1.patch
>
>
> ITBL failed recently in our test harness. Inspecting the failure made me 
> believe that the only reason that particular failure might have happened is 
> that there is a collision in random longs generated by the test. 
> The test creates 50 mappers by default, and each mapper writes 500K random 
> rows starting with row = 0. By default there are 5 iterations.
> The check job outputs these counters: 
> {code}
> 2013-10-13 07:48:01,134 Map input records=124999751
> 2013-10-13 07:48:01,134 Map output records=124999999
> {code}
> The number of input records seems fine because
> {code}
> 124999751 = 1 + 5 * (0.5M - 1) * 50
> {code}
> where 5 = num iterations, 0.5M = num rows, 50 = num mappers, and 1 is for 
> row = 0, which every chain writes to. 
> Output records should be 125M, however we see one cell missing. Since the map 
> input records matches expected number of distinct rows, I suspect that row = 
> 0 had a collision. 
> In one of the generate jobs, we can see that the reducer output count does 
> not match the reducer input count. Given that we are using KVSortReducer, 
> this confirms that there is a collision in KeyValues received by this task.
> {code}
> 2013-10-13 06:48:12,738 Reduce input records=75000000
> 2013-10-13 06:48:12,738 Reduce output records=74999997
> {code}
> The count is off by 3 because we are writing 3 columns per row. 
> My only theory for explaining this is that we had a collision in chainIds or 
> one of the chains reused row = 0 as the next row. 
> This is similar to HBASE-8700; however, in this case the probability is much, 
> much lower. 
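
The counter arithmetic in the description can be re-derived with a short sketch. The constants are the defaults quoted above; the function names are hypothetical, and the per-generate-job formula (mappers * rows * columns) is an inference that happens to match the logged reduce input count, not something the description states outright.

```python
# Defaults quoted in the issue description.
NUM_ITERATIONS = 5
NUM_MAPPERS = 50
ROWS_PER_MAPPER = 500_000
COLUMNS_PER_ROW = 3

# Counter values copied from the logs in the description.
OBSERVED_MAP_OUTPUT = 124_999_999
OBSERVED_REDUCE_OUTPUT = 74_999_997


def expected_map_input_records(iterations, mappers, rows_per_mapper):
    """Distinct rows: row = 0 is shared by every chain, so it counts once."""
    return 1 + iterations * (rows_per_mapper - 1) * mappers


def expected_map_output_records(iterations, mappers, rows_per_mapper):
    """One record per row written by every mapper in every iteration."""
    return iterations * rows_per_mapper * mappers


def expected_reduce_records(mappers, rows_per_mapper, columns):
    """Assumed: one KeyValue per column per row in a single generate job."""
    return mappers * rows_per_mapper * columns


if __name__ == "__main__":
    map_in = expected_map_input_records(NUM_ITERATIONS, NUM_MAPPERS, ROWS_PER_MAPPER)
    map_out = expected_map_output_records(NUM_ITERATIONS, NUM_MAPPERS, ROWS_PER_MAPPER)
    reduce_in = expected_reduce_records(NUM_MAPPERS, ROWS_PER_MAPPER, COLUMNS_PER_ROW)
    print("expected map input records: ", map_in)
    print("missing map output records: ", map_out - OBSERVED_MAP_OUTPUT)
    print("missing reduce output records:", reduce_in - OBSERVED_REDUCE_OUTPUT)
```

The input count reproduces the logged 124999751, the check job is short by one record, and the generate job is short by three, which is consistent with a single row of three columns being collapsed.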



--
This message was sent by Atlassian JIRA
(v6.1#6144)
