[jira] [Updated] (KUDU-1220) Improve bulk loads from multiple sequential writers
[ https://issues.apache.org/jira/browse/KUDU-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke updated KUDU-1220:
------------------------------
    Labels: kudu-roadmap  (was: )

Improve bulk loads from multiple sequential writers
---------------------------------------------------

                Key: KUDU-1220
                URL: https://issues.apache.org/jira/browse/KUDU-1220
            Project: Kudu
         Issue Type: Improvement
         Components: backup, perf
   Affects Versions: Public beta
           Reporter: Jean-Daniel Cryans
           Assignee: Todd Lipcon
           Priority: Major
             Labels: kudu-roadmap
        Attachments: orderkeys.py, write-pattern.png

We ran some experiments loading lineitem at scale factor 15k. The 10-node cluster (1 master, 9 tablet servers) is equipped with Intel P3700 SSDs, one per tablet server, dedicated to the WALs. The table is hash-partitioned and set to have 10 tablets per tablet server.

Our findings:
- Reading the bloom filters puts a lot of contention on the block cache. This isn't new (see KUDU-613), but it now shows up during writes because the SSDs are simply very fast.
- Kudu performs best when data is inserted in order, but with hash partitioning we end up with multiple clients writing simultaneously to different key ranges in each tablet. This becomes a worst-case scenario: we have to compact (optimize) the row sets over and over again to put them in order. Even if we delayed compaction to the end of the bulk load, we would still take a hit, because we have to consult more and more bloom filters to check whether a row already exists.
- In the case of an initial bulk load, we know we're not trying to overwrite or update rows, so all those checks are unnecessary.

Some ideas for improvements:
- Obviously, we need a better block cache.
- When flushing, we could detect those disjoint sets of rows and make sure they map to row sets that don't cover the gaps. For example, if the MemRowSet (MRS) contains a,b,c,x,y,z, then flushing would give us two row sets, a,b,c and x,y,z, instead of one. The danger here is generating too many row sets. (See the sketch below.)
- When reading, the row set interval tree could be smart enough not to send readers into the row set gaps. With the same example, say we're looking for "m": normally we'd see a row set spanning a-z, so we'd have to check its bloom filter, but if we could detect that it actually covers a-c and then x-z, we'd save a check.
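To make the last two ideas concrete, here is a minimal C++ sketch. It is not Kudu code: the names SplitOnGaps and MayContain, the first-character distance heuristic, and the gap threshold are all assumptions for the example. It splits a sorted run of MRS keys into disjoint row sets wherever a large key gap appears, then answers "might this key exist?" without a bloom filter probe for keys that fall inside a gap. A real implementation would replace the linear scan in MayContain with the interval tree and would need a smarter heuristic to avoid generating too many row sets.

    #include <iostream>
    #include <string>
    #include <vector>

    struct RowSetBounds {
      std::string min_key;
      std::string max_key;
    };

    // Flush-time splitting: any adjacent pair of keys whose distance
    // exceeds gap_threshold starts a new row set. The keys are already
    // sorted, since they come out of the MRS in order.
    std::vector<RowSetBounds> SplitOnGaps(const std::vector<std::string>& keys,
                                          int gap_threshold) {
      std::vector<RowSetBounds> rowsets;
      if (keys.empty()) return rowsets;
      rowsets.push_back({keys[0], keys[0]});
      for (size_t i = 1; i < keys.size(); i++) {
        // Toy distance metric: first-character difference. A real heuristic
        // would need a proper key-space distance (or row counts) to avoid
        // producing too many small row sets.
        int dist = keys[i][0] - rowsets.back().max_key[0];
        if (dist > gap_threshold) {
          rowsets.push_back({keys[i], keys[i]});  // start a new row set
        } else {
          rowsets.back().max_key = keys[i];       // extend the current one
        }
      }
      return rowsets;
    }

    // Read path: only a key that lands inside some [min_key, max_key]
    // interval needs a bloom filter probe; a key in a gap is rejected
    // outright. (Linear scan stands in for the interval tree here.)
    bool MayContain(const std::vector<RowSetBounds>& rowsets,
                    const std::string& key) {
      for (const auto& rs : rowsets) {
        if (key >= rs.min_key && key <= rs.max_key) return true;
      }
      return false;
    }

    int main() {
      // The example from the description: the MRS holds a,b,c,x,y,z.
      std::vector<std::string> keys = {"a", "b", "c", "x", "y", "z"};
      std::vector<RowSetBounds> rowsets = SplitOnGaps(keys, /*gap_threshold=*/3);
      for (const auto& rs : rowsets) {
        std::cout << "rowset [" << rs.min_key << ", " << rs.max_key << "]\n";
      }
      // "m" falls in the gap between a-c and x-z, so no bloom filter
      // probe would be needed at all.
      std::cout << "may contain m? " << MayContain(rowsets, "m") << "\n";
    }

With these bounds the lookup for "m" returns false without touching a single bloom filter, which is exactly the check the description hopes to save.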
Grant Henke updated KUDU-1220:
------------------------------
    Component/s: backup
Todd Lipcon updated KUDU-1220:
------------------------------
    Component/s: perf

--
This message was sent by Atlassian JIRA (v6.3.15#6346)