[ 
https://issues.apache.org/jira/browse/ACCUMULO-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721444#comment-13721444
 ] 

Keith Turner commented on ACCUMULO-112:
---------------------------------------

I have a working version of this [on 
github|https://github.com/keith-turner/accumulo/tree/ACCUMULO-112].  It still 
needs some polishing and testing, but it's mostly done.  I ran the same test I 
ran earlier, with 32K rows each having 32 column families.  I used Snappy 
compression for this test.  The column "Scan One CF Time" is the time it took 
to read one of the 32 column families.  The column "Scan CF:CQ Time" is the 
time it took to read one column family and one column qualifier.  This scan 
usually returned 0 to 2 entries.

||Num Locality Groups||Write Time||Scan One CF Time||Scan CF:CQ Time||Flush Time||
|1|2.21 secs|0.99 secs|0.79 secs|2.07 secs|
|2|2.27 secs|0.71 secs|0.51 secs|2.19 secs|
|4|2.35 secs|0.43 secs|0.20 secs|2.21 secs|
|8|2.48 secs|0.33 secs|0.12 secs|2.33 secs|
|16|2.86 secs|0.26 secs|0.07 secs|2.56 secs|
|32|3.85 secs|0.24 secs|0.05 secs|2.85 secs|

Below, the same data is normalized per column: each cell is divided by the 
minimum value in its column.

||Num Locality Groups||Write Time||Scan One CF Time||Scan CF:CQ Time||Flush Time||
|1|1.00|4.13|15.80|1.00|
|2|1.03|2.96|10.20|1.06|
|4|1.06|1.79|4.00|1.07|
|8|1.12|1.38|2.40|1.13|
|16|1.29|1.08|1.40|1.24|
|32|1.74|1.00|1.00|1.38|
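
For anyone who wants to redo the normalization, here is a minimal Java sketch of the calculation described above (divide each cell by the smallest value in its column).  The class and method names are made up for illustration and are not part of Accumulo.  The input is the first table, and the printed rows should agree with the second table up to rounding of the last digit.

```java
import java.util.Arrays;

// Hypothetical helper, not Accumulo code: normalizes timing columns.
class NormalizeColumns {

    // Divides every cell by the minimum value found in its column.
    static double[][] normalize(double[][] rows) {
        int cols = rows[0].length;
        double[] min = new double[cols];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        for (double[] row : rows) {
            for (int c = 0; c < cols; c++) {
                min[c] = Math.min(min[c], row[c]);
            }
        }
        double[][] out = new double[rows.length][cols];
        for (int r = 0; r < rows.length; r++) {
            for (int c = 0; c < cols; c++) {
                out[r][c] = rows[r][c] / min[c];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] groups = {1, 2, 4, 8, 16, 32};
        // Columns: write, scan one CF, scan CF:CQ, flush (seconds).
        double[][] secs = {
            {2.21, 0.99, 0.79, 2.07},
            {2.27, 0.71, 0.51, 2.19},
            {2.35, 0.43, 0.20, 2.21},
            {2.48, 0.33, 0.12, 2.33},
            {2.86, 0.26, 0.07, 2.56},
            {3.85, 0.24, 0.05, 2.85},
        };
        double[][] norm = normalize(secs);
        for (int r = 0; r < norm.length; r++) {
            System.out.printf("|%d|%.2f|%.2f|%.2f|%.2f|%n",
                groups[r], norm[r][0], norm[r][1], norm[r][2], norm[r][3]);
        }
    }
}
```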


                
> Investigate partitioning in memory map by locality group
> --------------------------------------------------------
>
>                 Key: ACCUMULO-112
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-112
>             Project: Accumulo
>          Issue Type: Task
>          Components: tserver
>            Reporter: Keith Turner
>            Assignee: Keith Turner
>              Labels: gsoc2013, mentor
>
> Currently the in-memory map is not partitioned by locality group.  This could 
> negatively impact scan and minor compaction performance.  I would like to run 
> some experiments to understand the performance implications.  Partitioning by 
> locality group could also negatively impact insert performance: the cost 
> could go from O(log(R)+log(C)) to O(L * (log(R)+log(C))) in the worst case, 
> where L is the number of locality groups, R is the number of rows, and C is 
> the number of columns.  The worst case is a mutation that has a change for 
> every locality group. 
>
> Currently the in-memory map is a map of maps, like the following.
> {noformat}
>   map<row, map<col, val>>
> {noformat}
> Conceptually, this could change to one of the following.  The first is best 
> for scans that access only some locality groups, and for minor compactions.  
> The second is good for inserts where the mutation covers all locality groups, 
> because the row is only looked up once.
> {noformat}
>   map<localityGroup, map<row, map<col, val>>>
> {noformat}
> {noformat}
>   map<row, map<localityGroup, map<col, val>>>
> {noformat}
> The Accumulo native map is implemented using C++, the STL, and JNI, with 
> thread locking in Java.
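
To make the first layout in the description concrete, here is a rough Java sketch of map<localityGroup, map<row, map<col, val>>>.  It is only an illustration under stated assumptions: the class, the family-to-group configuration map, and the "default" group name are hypothetical, and the real native map is C++ behind JNI, not Java TreeMaps.  The apply method returns how many row lookups one mutation needed, which is the source of the O(L * (log(R)+log(C))) worst case when a mutation touches every locality group.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the Accumulo implementation.
class PartitionedMapSketch {
    // Assumed configuration: column family -> locality group name.
    private final Map<String, String> familyToGroup;
    // map<localityGroup, map<row, map<col, val>>>
    private final Map<String, TreeMap<String, TreeMap<String, String>>> groups =
        new HashMap<>();

    PartitionedMapSketch(Map<String, String> familyToGroup) {
        this.familyToGroup = familyToGroup;
    }

    // Applies one mutation to a row.  Each update is {family, qualifier, value}.
    // Returns the number of row lookups: one per locality group touched.
    int apply(String row, List<String[]> updates) {
        // Split the mutation's updates by locality group.
        Map<String, List<String[]>> byGroup = new LinkedHashMap<>();
        for (String[] u : updates) {
            String group = familyToGroup.getOrDefault(u[0], "default");
            byGroup.computeIfAbsent(group, g -> new ArrayList<>()).add(u);
        }
        int rowLookups = 0;
        for (Map.Entry<String, List<String[]>> e : byGroup.entrySet()) {
            TreeMap<String, TreeMap<String, String>> rows =
                groups.computeIfAbsent(e.getKey(), g -> new TreeMap<>());
            // The row is looked up once in each group's own sorted map.
            TreeMap<String, String> cols =
                rows.computeIfAbsent(row, r -> new TreeMap<>());
            rowLookups++;
            for (String[] u : e.getValue()) {
                cols.put(u[0] + ":" + u[1], u[2]);
            }
        }
        return rowLookups;
    }

    // A scan of one locality group only walks that group's map.
    int countEntries(String group) {
        TreeMap<String, TreeMap<String, String>> rows = groups.get(group);
        if (rows == null) {
            return 0;
        }
        int n = 0;
        for (TreeMap<String, String> cols : rows.values()) {
            n += cols.size();
        }
        return n;
    }
}
```

With this layout a scan or minor compaction of one group never visits entries from the other groups, while a mutation spanning L groups pays for L row lookups instead of one.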

