[ 
https://issues.apache.org/jira/browse/KUDU-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-1220:
------------------------------
    Component/s: backup

> Improve bulk loads from multiple sequential writers
> ---------------------------------------------------
>
>                 Key: KUDU-1220
>                 URL: https://issues.apache.org/jira/browse/KUDU-1220
>             Project: Kudu
>          Issue Type: Improvement
>          Components: backup, perf
>    Affects Versions: Public beta
>            Reporter: Jean-Daniel Cryans
>            Assignee: Todd Lipcon
>            Priority: Major
>         Attachments: orderkeys.py, write-pattern.png
>
>
> We ran some experiments loading lineitem at scale factor 15k. The 10-node 
> cluster (1 master, 9 TS) is equipped with Intel P3700 SSDs, one per TS, 
> dedicated to the WALs. The table is hash-partitioned and set to have 10 
> tablets per TS.
> Our findings:
> - Reading the bloom filters puts a lot of contention on the block cache. This 
> isn't new, see KUDU-613, but it's now coming up when writing because the SSDs 
> are just really fast.
> - Kudu performs best when data is inserted in order, but with hash 
> partitioning we end up with multiple clients writing simultaneously to 
> different key ranges in each tablet. This is a worst-case scenario: we have 
> to compact (optimize) the row sets over and over again to put them in order. 
> Even if we were to delay this until the end of the bulk load, we'd still take 
> a hit because we'd have to check more and more bloom filters to determine 
> whether a row already exists.
> - In the case of an initial bulk load, we know we're not trying to overwrite 
> rows or update them, so all those checks are unnecessary.
> Some ideas for improvements:
> - Obviously, we need a better block cache.
> - When flushing, we could detect those disjoint sets of rows and make sure 
> they map to row sets that don't cover the gaps. For example, if the MRS has 
> a,b,c,x,y,z then flushing would give us two row sets, e.g. a,b,c and x,y,z, 
> instead of one. The danger here is generating too many row sets.
> - When reading, make the row set interval tree smart enough not to send 
> readers into the row set gaps. With the same example, say we're looking for 
> "m": normally we'd see a row set spanning a-z, so we'd have to check its 
> bloom filter, but if we could detect that it's actually a-c then x-z, we'd 
> save the check.
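The last two ideas can be sketched together. This is a hypothetical illustration, not Kudu's actual flush or interval-tree code: `split_at_gaps` breaks a sorted in-memory row set into disjoint row sets wherever the key gap is large, and `may_contain` returns only the row sets whose min-max range covers a lookup key, so a key like "m" never triggers a bloom-filter probe against the a-c or x-z row sets.

```python
def split_at_gaps(sorted_keys, min_gap):
    """Split a sorted list of single-character keys into runs separated by
    gaps of at least `min_gap` code points, so each run could flush to its
    own disjoint row set instead of one row set spanning the whole range."""
    if not sorted_keys:
        return []
    row_sets = [[sorted_keys[0]]]
    for prev, cur in zip(sorted_keys, sorted_keys[1:]):
        if ord(cur) - ord(prev) >= min_gap:
            row_sets.append([cur])   # gap detected: start a new row set
        else:
            row_sets[-1].append(cur)
    return row_sets

def may_contain(row_sets, key):
    """Return only the row sets whose [min, max] key range covers `key`;
    these are the only ones whose bloom filters would need checking."""
    return [rs for rs in row_sets if rs[0] <= key <= rs[-1]]

# With the example from the description, the MRS a,b,c,x,y,z flushes to
# two row sets, and a lookup for "m" probes no bloom filter at all.
row_sets = split_at_gaps(list("abcxyz"), min_gap=2)
```

The trade-off noted above still applies: a smaller `min_gap` saves more bloom-filter checks but risks generating too many row sets.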



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
