[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668441#comment-13668441 ]

Nicolas Liochon commented on HBASE-6295:
----------------------------------------

bq. Are the above failures because of the patch?
The TestHCM failure comes from a bad fix of an old, hidden bug. I have what I 
expect to be the right fix, and I'm testing it locally right now.

bq. The new class needs a license.
Done.

bq. Does AsyncProcessCallback have to be public? Is it only used inside the 
client package? If so, shut down access.
Done.

bq. Do we need these atomics (+ AtomicInteger ct = 
taskCounterPerRegion.get(encodedRegionName);)? It is single-threaded access, 
right? Or do you need them for internals?
We need them because the client is single-threaded, but the results are 
received, and operations resubmitted, in parallel.

bq. + static class MyAsyncProcess<Res> extends AsyncProcess<Res> {
It should. I will double check.
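To illustrate the answer on the atomics: one thread submits, but completions arrive on pool threads, so a per-region counter must be thread-safe even though the client itself is single-threaded. The following is a minimal sketch, not the patch's code; only `taskCounterPerRegion` and `encodedRegionName` come from the review, while `RegionTaskCounter`, `incTaskCounter`, `decTaskCounter`, `inFlight` and the demo class are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-region counter of in-flight tasks. */
class RegionTaskCounter {
  private final ConcurrentMap<String, AtomicInteger> taskCounterPerRegion =
      new ConcurrentHashMap<>();

  /** Client (submitting) thread: called before a task is sent to the region. */
  int incTaskCounter(String encodedRegionName) {
    return taskCounterPerRegion
        .computeIfAbsent(encodedRegionName, k -> new AtomicInteger())
        .incrementAndGet();
  }

  /** Pool thread: called when a task for the region completes. */
  int decTaskCounter(String encodedRegionName) {
    AtomicInteger ct = taskCounterPerRegion.get(encodedRegionName);
    return ct == null ? 0 : ct.decrementAndGet();
  }

  int inFlight(String encodedRegionName) {
    AtomicInteger ct = taskCounterPerRegion.get(encodedRegionName);
    return ct == null ? 0 : ct.get();
  }
}

class RegionTaskCounterDemo {
  /** One submitting thread, completions on four pool threads; returns the tasks left in flight. */
  static int run(int tasks) {
    RegionTaskCounter counter = new RegionTaskCounter();
    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (int i = 0; i < tasks; i++) {
      counter.incTaskCounter("region-a");
      pool.execute(() -> counter.decTaskCounter("region-a"));
    }
    pool.shutdown();
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return counter.inFlight("region-a");
  }
}
```

With a plain `HashMap<String, Integer>` the concurrent decrements could be lost; `AtomicInteger` inside a `ConcurrentHashMap` lets the counter drain reliably to zero.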

bq. Why do we need to add this, updateCachedLocations, if it's only used 
internally? Does it mean you can use HConnection in more places instead of HCI?
Yes, I removed the cast to HCI (but I now have to push updateCachedLocations 
into the interface).
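The trade-off in that answer (add a method to the interface so callers can drop the cast to the implementation class) can be shown in isolation. This is a sketch under stated assumptions: `SimpleConnection`, `SimpleConnectionImpl` and `RetryingCaller` are hypothetical stand-ins for HConnection, HCI and a client caller, and the simplified `updateCachedLocations(String, String)` signature is invented for illustration; it is not HBase's actual signature.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for the HConnection interface. */
interface SimpleConnection {
  String locateRegion(String row);

  /** Declared on the interface, so callers no longer need to cast to the implementation. */
  void updateCachedLocations(String row, String newLocation);
}

/** Hypothetical stand-in for HConnectionImplementation (HCI). */
class SimpleConnectionImpl implements SimpleConnection {
  private final Map<String, String> locationCache = new HashMap<>();

  @Override
  public String locateRegion(String row) {
    String loc = locationCache.get(row);
    return loc != null ? loc : "unknown";
  }

  @Override
  public void updateCachedLocations(String row, String newLocation) {
    locationCache.put(row, newLocation);
  }
}

class RetryingCaller {
  /** After a region moves, refresh the cached location through the interface alone. */
  static String onRegionMoved(SimpleConnection conn, String row, String newLocation) {
    // Before: ((SimpleConnectionImpl) conn).updateCachedLocations(row, newLocation);
    conn.updateCachedLocations(row, newLocation);
    return conn.locateRegion(row);
  }
}
```

The cost is a wider public interface; the gain is that any `SimpleConnection` implementation, including a test double, can be passed to the caller.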

bq. If so, why not Result? If not and it is just generic, just R? Res is confusing.
Will do.

I'm going to test it on a real cluster. I've actually tested trunk a lot with 
1.7, and overall it was working great. But right now, still with trunk, I got 
stuck. I'm also going to test 0.95.1.
                
> Possible performance improvement in client batch operations: presplit and 
> send in background
> --------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6295
>                 URL: https://issues.apache.org/jira/browse/HBASE-6295
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, Performance
>    Affects Versions: 0.95.2
>            Reporter: Nicolas Liochon
>            Assignee: Nicolas Liochon
>              Labels: noob
>             Fix For: 0.98.0
>
>         Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch
>
>
> Today, the batch algorithm is:
> {noformat}
> for Operation o: List<Op>{
>   add o to todolist
>   if todolist > maxsize or o last in list
>     split todolist per location
>     send split lists to region servers
>     clear todolist
>     wait
> }
> {noformat}
> We could:
> - create the final object immediately instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send when there is 
> enough data for a single location
> It would be:
> {noformat}
> for Operation o: List<Op>{
>   get location
>   add o to location.todolist
>   if (location.todolist > maxLocationSize)
>     send location.todolist to region server 
>     clear location.todolist
>     // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write once error management is added: the list of retried 
> operations must be shared with the operations added to the todolist. But it's doable.
> It's mainly interesting for 'big' writes.
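The two loops in the issue description can be compared side by side in a small simulation. This is a hedged sketch, not HBase's AsyncProcess: `Op`, `BatchSketch`, `batchToday`, `batchProposed` and the thread pool standing in for region-server RPCs are all hypothetical, a list is flushed when it reaches `maxSize`/`maxLocationSize`, and `waits` counts how often the caller blocks.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical operation: a row key and the location (region server) hosting it. */
final class Op {
  final String row;
  final String location;
  Op(String row, String location) { this.row = row; this.location = location; }
}

class BatchSketch {
  private final ExecutorService pool = Executors.newFixedThreadPool(4);
  /** Batches handed to the simulated RPC. */
  final List<List<Op>> sent = Collections.synchronizedList(new ArrayList<>());
  /** Number of times the caller blocked on outstanding RPCs (caller thread only). */
  int waits = 0;

  // Simulated RPC: copies the batch, then records it on a pool thread.
  private Future<?> send(List<Op> batch) {
    List<Op> copy = new ArrayList<>(batch);
    return pool.submit(() -> { sent.add(copy); });
  }

  private void waitFor(List<Future<?>> futures) {
    waits++;
    for (Future<?> f : futures) {
      try {
        f.get();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      } catch (ExecutionException e) {
        throw new IllegalStateException(e.getCause());
      }
    }
    futures.clear();
  }

  /** Today: fill one global todolist, then split per location, send, and wait. */
  void batchToday(List<Op> ops, int maxSize) {
    List<Op> todo = new ArrayList<>();
    List<Future<?>> inFlight = new ArrayList<>();
    for (int i = 0; i < ops.size(); i++) {
      todo.add(ops.get(i));
      if (todo.size() >= maxSize || i == ops.size() - 1) {
        Map<String, List<Op>> perLocation = new HashMap<>();
        for (Op o : todo) {
          perLocation.computeIfAbsent(o.location, k -> new ArrayList<>()).add(o);
        }
        for (List<Op> batch : perLocation.values()) inFlight.add(send(batch));
        todo.clear();
        waitFor(inFlight); // block before looking at the next operation
      }
    }
  }

  /** Proposed: split per location immediately, send a location's list once it is full. */
  void batchProposed(List<Op> ops, int maxLocationSize) {
    Map<String, List<Op>> todoPerLocation = new HashMap<>();
    List<Future<?>> inFlight = new ArrayList<>();
    for (Op o : ops) {
      List<Op> todo = todoPerLocation.computeIfAbsent(o.location, k -> new ArrayList<>());
      todo.add(o);
      if (todo.size() >= maxLocationSize) {
        inFlight.add(send(todo));
        todo.clear(); // don't wait, continue the loop
      }
    }
    for (List<Op> todo : todoPerLocation.values()) {
      if (!todo.isEmpty()) inFlight.add(send(todo)); // send remaining
    }
    waitFor(inFlight); // a single wait at the end
  }

  void close() { pool.shutdown(); }
}
```

With four operations (three for `rs1`, one for `rs2`) and a limit of 2, both variants send three batches, but the current loop blocks twice while the proposed one blocks only once, at the end.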

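The error-management constraint in the description (retried operations share the todolists with newly added ones) can be sketched with per-location queues that both the client thread and the completion callbacks write to. This is an illustrative assumption, not the patch's design: `SharedTodoLists`, `add`, `retry` and `drain` are hypothetical, and rows are plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical per-location todolists shared by new and retried operations. */
class SharedTodoLists {
  private final ConcurrentMap<String, Queue<String>> todoPerLocation =
      new ConcurrentHashMap<>();

  private Queue<String> todo(String location) {
    return todoPerLocation.computeIfAbsent(location, k -> new ConcurrentLinkedQueue<>());
  }

  /** Client thread: a new operation for a location. */
  void add(String location, String row) {
    todo(location).add(row);
  }

  /** Callback thread: failed rows go back into the same todolists, possibly at a new location. */
  void retry(String newLocation, List<String> failedRows) {
    todo(newLocation).addAll(failedRows);
  }

  /** Takes up to max rows for a location, mixing new and retried operations. */
  List<String> drain(String location, int max) {
    List<String> batch = new ArrayList<>();
    Queue<String> q = todo(location);
    String row;
    while (batch.size() < max && (row = q.poll()) != null) {
      batch.add(row);
    }
    return batch;
  }
}
```

Because retries land in the same queues, the next batch for a location naturally carries both fresh and retried rows, at the cost of making every todolist concurrent.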
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
