Thanks Josh! OK, I have added the Hosted Tablets and Entries columns to the table below for additional information. As we can see, the tablets are distributed evenly across all tablet servers, and the one with the highest load also has the highest number of entries (> 1B). A few other tablet servers have more than 700M entries, which is not that far behind. I admit the data distribution is likely not great, because the URL is used as the row ID (so many rows share the same prefix), and it is almost impossible to choose pre-split points unless we know in advance what the data values will be. Instead of specifying split point strings, I wish Accumulo had a feature that let us specify x tablets and then automatically split the y entries across those x tablets :-)
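To make the "x tablets, y entries" idea concrete, here is a minimal Python sketch. It assumes a sorted sample of row IDs is already available (for example, collected by a scan) and simply picks evenly spaced values from it as candidate split points. The function name `choose_split_points` and the example URLs are hypothetical and not part of Accumulo; this only illustrates the quantile idea and is not an existing feature.

```python
def choose_split_points(sorted_row_ids, num_tablets):
    """Pick up to num_tablets - 1 row IDs that cut a sorted sample into
    roughly equal-sized ranges. Duplicate candidates are skipped, so fewer
    split points may be returned when the sample has many repeated rows."""
    if num_tablets < 2 or not sorted_row_ids:
        return []
    n = len(sorted_row_ids)
    splits = []
    for i in range(1, num_tablets):
        # Row ID sitting at the i-th quantile boundary of the sample.
        candidate = sorted_row_ids[(i * n) // num_tablets]
        # The input is sorted, so duplicates can only be adjacent.
        if not splits or candidate != splits[-1]:
            splits.append(candidate)
    return splits


# Hypothetical sample: URL row IDs that all share a long common prefix.
sample = sorted(f"http://example.com/page{i:04d}" for i in range(1000))
for split in choose_split_points(sample, 4):
    print(split)
```

Because the candidates come from the observed row IDs rather than from guessed prefixes, a shared URL prefix does not matter here; the quality of the splits depends only on how representative the sample is.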
Follow-up questions:

1. The test queries are generated randomly, so in theory the likelihood of most requests going to one tablet server should be slim. But given that the URL is used as the row ID, it may be possible. What does the number in the Query column indicate? Is it the number of entries returned, or the number of reads?

2. Looking at the sample table below, is there a way to find out the ranges of all tablets hosted on TServer14? I am thinking of writing a small program to scan all row IDs from that tablet server and find the values that would make good split points. I could then add those splits to the table and re-run my tests to see if that resolves the issue.

Regarding your other question: yes, on a few occasions when refreshing the page I saw that the number of active scans was not 16 and yet there were waiting scans, so it was not just once or twice.

Server    | Hosted Tablets | Entries | Query | Running Scans
============================================================
TServer1  | 47             | 548.43M | 24    | 0 (0)
TServer2  | 47             | 708.70M | 37    | 0 (0)
TServer3  | 47             | 597.88M | 40    | 0 (0)
TServer4  | 47             | 382.72M | 1     | 0 (0)
TServer5  | 47             | 756.77M | 0     | 0 (0)
TServer6  | 47             | 654.38M | 57    | 0 (0)
TServer7  | 47             | 695.09M | 5     | 0 (0)
TServer8  | 47             | 637.94M | 4     | 0 (0)
TServer9  | 47             | 541.74M | 7     | 0 (0)
TServer10 | 46             | 625.12M | 0     | 0 (0)
TServer11 | 46             | 248.75M | 56    | 0 (0)
TServer12 | 46             | 368.87M | 124   | 0 (0)
TServer13 | 46             | 292.73M | 25    | 0 (0)
TServer14 | 46             | 1.05B   | 121   | 16 (435)
TServer15 | 46             | 442.23M | 36    | 0 (0)
TServer16 | 46             | 800.67M | 21    | 0 (0)
TServer17 | 46             | 689.81M | 3     | 0 (0)
TServer18 | 46             | 351.86M | 107   | 0 (0)
TServer19 | 47             | 941.17M | 21    | 0 (0)
TServer20 | 47             | 257.99M | 92    | 0 (0)

Thanks,
Z

--
View this message in context: http://apache-accumulo.1065345.n5.nabble.com/question-on-data-block-cache-tp15906p15937.html
Sent from the Developers mailing list archive at Nabble.com.