[
https://issues.apache.org/jira/browse/ACCUMULO-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721444#comment-13721444
]
Keith Turner commented on ACCUMULO-112:
---------------------------------------
I have a working version of this [on github|https://github.com/keith-turner/accumulo/tree/ACCUMULO-112]. It still
needs some polishing and testing, but it's mostly done. I reran the test I ran
earlier, with 32K rows each having 32 column families. I used snappy
compression for this test. The column "Scan One CF Time" is the time it took to
read one of the 32 column families. The column "Scan CF:CQ Time" is the time it
took to read one column family and one column qualifier. This scan usually
returned 0 to 2 entries.
||Num Locality Groups||Write Time||Scan One CF Time||Scan CF:CQ Time||Flush Time||
|1|2.21 secs|0.99 secs|0.79 secs|2.07 secs|
|2|2.27 secs|0.71 secs|0.51 secs|2.19 secs|
|4|2.35 secs|0.43 secs|0.20 secs|2.21 secs|
|8|2.48 secs|0.33 secs|0.12 secs|2.33 secs|
|16|2.86 secs|0.26 secs|0.07 secs|2.56 secs|
|32|3.85 secs|0.24 secs|0.05 secs|2.85 secs|
Below, the same data is normalized per column: each cell is divided by the
minimum in its column, so 1.00 marks the fastest configuration for that column.
||Num Locality Groups||Write Time||Scan One CF Time||Scan CF:CQ Time||Flush Time||
|1|1.00|4.13|15.80|1.00|
|2|1.03|2.96|10.20|1.06|
|4|1.06|1.79|4.00|1.07|
|8|1.12|1.38|2.40|1.13|
|16|1.29|1.08|1.40|1.24|
|32|1.74|1.00|1.00|1.38|
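For anyone who wants to recheck the normalized table, here is a minimal Python sketch of the calculation. It is not part of the patch; the timings are copied from the first table, and the names {{timings}} and {{normalize_columns}} are made up for illustration.

```python
# Timings in seconds, copied from the first table in this comment.
# Columns: write, scan one CF, scan CF:CQ, flush.
timings = {
    1:  (2.21, 0.99, 0.79, 2.07),
    2:  (2.27, 0.71, 0.51, 2.19),
    4:  (2.35, 0.43, 0.20, 2.21),
    8:  (2.48, 0.33, 0.12, 2.33),
    16: (2.86, 0.26, 0.07, 2.56),
    32: (3.85, 0.24, 0.05, 2.85),
}


def normalize_columns(table):
    """Divide every cell by the minimum value found in its column."""
    column_minimums = [min(column) for column in zip(*table.values())]
    return {
        groups: tuple(cell / low for cell, low in zip(row, column_minimums))
        for groups, row in table.items()
    }


normalized = normalize_columns(timings)

if __name__ == "__main__":
    # One line per locality group count, ratios shown to two decimals.
    for groups, row in normalized.items():
        print(groups, " ".join(f"{cell:.2f}" for cell in row))
```

Read this way, going from 1 to 32 locality groups costs roughly 1.7x on writes while the single-CF:CQ scan gets roughly 16x faster.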
> Investigate partitioning in memory map by locality group
> --------------------------------------------------------
>
> Key: ACCUMULO-112
> URL: https://issues.apache.org/jira/browse/ACCUMULO-112
> Project: Accumulo
> Issue Type: Task
> Components: tserver
> Reporter: Keith Turner
> Assignee: Keith Turner
> Labels: gsoc2013, mentor
>
> Currently the in-memory map is not partitioned by locality group. This could
> negatively impact scan and minor compaction performance. I would like to run
> some experiments to understand the performance implications. Partitioning by
> locality group could negatively impact insert performance: it could go from
> O(log(R)+log(C)) to O(L * (log(R)+log(C))) in the worst case, where L is the
> number of locality groups, R is the number of rows, and C is the number of
> columns. The worst case is where each mutation has a change for each
> locality group.
> Currently the in-memory map is a map of maps, like the following:
> {noformat}
> map<row, map<col, val>>
> {noformat}
> Conceptually, this could change to one of the following. The first is best
> for scans that access only some locality groups, and for minor compactions.
> The second is good for inserts where the mutation covers all locality groups,
> because the row is only looked up once.
> {noformat}
> map<localityGroup, map<row, map<col, val>>>
> {noformat}
> {noformat}
> map<row, map<localityGroup, map<col, val>>>
> {noformat}
> The Accumulo native map is implemented using C++, STL, and JNI, with thread
> locking in Java.
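To make the trade-off between the two proposed layouts concrete, here is a rough Python sketch. It is only an illustration under loud assumptions: plain dicts stand in for the sorted maps, so the log(R) and log(C) costs are not modelled, and the class names, the {{group_of}} mapping, and the {{row_lookups}} counter are all hypothetical. It only counts how many times a row has to be located per mutation, and shows which layout can read one locality group without visiting the others.

```python
from collections import defaultdict


class GroupFirstMap:
    """Layout 1: map<localityGroup, map<row, map<col, val>>>."""

    def __init__(self, group_of):
        self.group_of = group_of  # hypothetical: column family -> locality group
        self.groups = defaultdict(lambda: defaultdict(dict))
        self.row_lookups = 0

    def apply(self, row, updates):
        """Apply one mutation; updates is a list of ((cf, cq), value)."""
        by_group = defaultdict(list)
        for (cf, cq), value in updates:
            by_group[self.group_of[cf]].append(((cf, cq), value))
        for group, columns in by_group.items():
            self.row_lookups += 1  # the row is located once per group touched
            row_map = self.groups[group][row]
            for column, value in columns:
                row_map[column] = value

    def scan_group(self, group):
        """Read one locality group; other groups are never visited."""
        return {row: dict(cols) for row, cols in self.groups[group].items()}


class RowFirstMap:
    """Layout 2: map<row, map<localityGroup, map<col, val>>>."""

    def __init__(self, group_of):
        self.group_of = group_of
        self.rows = defaultdict(lambda: defaultdict(dict))
        self.row_lookups = 0

    def apply(self, row, updates):
        self.row_lookups += 1  # the row is located once per mutation
        row_map = self.rows[row]
        for (cf, cq), value in updates:
            row_map[self.group_of[cf]][(cf, cq)] = value

    def scan_group(self, group):
        """Read one locality group; every row still has to be visited."""
        return {
            row: dict(groups[group])
            for row, groups in self.rows.items()
            if group in groups
        }


# One mutation that touches both locality groups (the worst case for layout 1).
group_of = {"cf1": "lg1", "cf2": "lg2"}
mutation = [(("cf1", "q"), "a"), (("cf2", "q"), "b")]

group_first = GroupFirstMap(group_of)
row_first = RowFirstMap(group_of)
for in_memory_map in (group_first, row_first):
    in_memory_map.apply("r1", mutation)
```

With this mutation the group-first layout locates the row twice (once per locality group) and the row-first layout locates it once, while both return the same data when scanning a single group.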