Suresh,

There are a lot of configuration points that can have an impact. For example, table.scan.max.memory [0] dictates how much data is returned in each "iteration" of a scan. Increasing it causes more work to be done in each RPC call that fetches data. Lowering it can give the illusion of improved response time, since the first results reach the client sooner. Playing with this might help your use case. If your keys/values are large, you might try increasing this value.
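As a rough sketch, this is how that property could be inspected and changed from the Accumulo shell. The table name mytable and the value 1M are placeholders chosen for illustration, not recommendations; the lines starting with # are annotations.

```shell
# Show the current value for one table ("mytable" is a hypothetical name)
config -t mytable -f table.scan.max.memory

# Set a per-table value (1M is only an example; measure before and after)
config -t mytable -s table.scan.max.memory=1M
```

Changing one value at a time and re-running the same scan makes it easier to tell whether the setting matters for your data.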
Further, scanning can be affected by the size of the data and the way it is stored. Enabling the table block cache might help [1], but I'm curious about how the data is stored. Do you have example keys? Are you returning all 1 million records from Accumulo through the scanner to perform some logic client side, or is the logic server side in an iterator? Could you do more work in an iterator? Iterating over 1 million keys likely won't take 2-3 seconds when executed at the tablet server, depending on the size of the key.

Knowing the key structure would help us suggest how to better configure your tablet server properties. Finally, is the 2-3 seconds just the time to get the data, or does that include time to inspect the keys?

[0] http://accumulo.apache.org/1.6/accumulo_user_manual#_table_scan_max_memory
[1] http://accumulo.apache.org/1.6/accumulo_user_manual#_block_cache

On Thu, Apr 27, 2017 at 7:09 AM, Suresh Prajapati <[email protected]> wrote:

> Hello Team
>
> I am developing a client in accumulo to store geo-spatial information and
> using geomesa for indexing on top of it. However i found that scanning *~1
> million* records taking *2-3 sec*. I looked at indexes and query plan of
> geomesa but not able to find cause of the problem. I am running accumulo as
> single tablet-server(including master). I want to know -
> what are the factors can affect accumulo scanning operation? how can I
> optimise this time?
>
> Thank You
> Suresh Prajapati
>
