On Thu, Oct 26, 2017 at 2:50 PM, Meier, Caleb <caleb.me...@parsons.com> wrote:
> Hey Keith,
>
> Thanks for the reply.  Regarding our benchmark, I've attached some 
> screenshots of our Accumulo UI that were taken while the benchmark was 
> running.  Basically, our ingest rate is pretty low (about 150 entries/s), but 
> our scan rate is off the charts - approaching 6 million entries/s!  Also, 
> notice the disparity between reads and returned in the Scan chart.  That 
> disparity would suggest that we're possibly doing full table scans somewhere, 
> which is strange given that all of our scans are RowColumn constrained.  
> Perhaps we are building our Scanner incorrectly.   In an effort to maximize 
> the number of TabletServers, we split the Fluo table into 5MB tablets.  Also, 
> the data is not well balanced -- the tablet servers do take turns being maxed 
> out while others are idle.  We're considering possible sharding strategies.

Yeah, you need to do something to evenly spread the load or else the
application will not scale.  I have done two things for this in the
past: adding a hash prefix to rows and evenly spreading data sets across
tservers.  Both of these approaches are available in Fluo Recipes.

http://fluo.apache.org/docs/fluo-recipes/1.1.0-incubating/row-hasher/
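
Roughly, the hash-prefix idea is to prepend a short, stable hash of the
natural row to the row key so that adjacent rows land on different
tablets.  The recipe's RowHasher does this for you (and generates the
matching split points); the little class below is only an illustration
of the concept, not the recipe's API, and the bucket count and CRC32
choice are arbitrary.

    import java.nio.charset.StandardCharsets;
    import java.util.zip.CRC32;

    // Illustrative only -- not part of Fluo Recipes.  Readers and writers must
    // both go through the prefix, and a scan over a logical range becomes one
    // scan per bucket.
    public class HashPrefixer {

      private static final int BUCKETS = 128;  // tune to cluster size

      // e.g. returns "042:mynaturalrow" for the natural row "mynaturalrow"
      public static String prefix(String naturalRow) {
        CRC32 crc = new CRC32();
        crc.update(naturalRow.getBytes(StandardCharsets.UTF_8));
        int bucket = (int) (crc.getValue() % BUCKETS);
        return String.format("%03d:%s", bucket, naturalRow);
      }

      // strips the bucket prefix off a row read back from the table
      public static String strip(String prefixedRow) {
        return prefixedRow.substring(prefixedRow.indexOf(':') + 1);
      }
    }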

The following is the documentation for evenly spreading data sets
across tservers.  Looking at it now, I realize it's not very good at
showing how to actually use the functionality.

http://fluo.apache.org/docs/fluo-recipes/1.1.0-incubating/table-optimization/
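
If I remember the recipes API correctly, actually applying the
optimizations boils down to one call after the application is
initialized, roughly like the sketch below (class and method names are
from memory, so please double-check them against the doc above):

    import org.apache.fluo.api.config.FluoConfiguration;
    import org.apache.fluo.recipes.accumulo.ops.TableOperations;
    import org.apache.fluo.recipes.core.common.TableOptimizations;

    public class OptimizeFluoTable {
      public static void main(String[] args) throws Exception {
        // Load the same configuration the application was initialized with.
        FluoConfiguration fluoConf =
            new FluoConfiguration(new java.io.File("fluo.properties"));

        // After the recipes (RowHasher, ExportQueue, ...) have been configured
        // and the Fluo application has been initialized, apply the table
        // optimizations (pre-splits, tablet-group balancer, etc.) that those
        // recipes registered.
        TableOperations.optimizeTable(fluoConf,
            TableOptimizations.getConfiguredOptimizations(fluoConf));
      }
    }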


>
> Given that our TabletServers are getting saturated so quickly for such a low 
> ingest rate, it seems like we definitely need to cut down on the number of 
> scans as a first line of attack to see what that buys us.  Then we'll look 
> into tuning Accumulo and Fluo.  Does this seem like a reasonable approach to 
> you?  Does the scan rate of our application strike you as extremely high?  
> When you look at the Rya Observers, can you pay attention to how we are 
> building our scans to make sure that we're not inadvertently doing full table 
> scans?  Also, what exactly do you mean by "are the 6 lookups in the 
> transaction done sequentially"?

Basically, I was wondering what methods you are using to do the 6 gets.
The following step of the tour covers what I had in mind when I asked
that question.

http://fluo.apache.org/tour/multi-get/
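
For example, if the Join observer currently does six separate
tx.gets(row, col) calls, those are six round trips that can be batched
into a single call so Fluo fetches them in parallel.  Rough sketch (the
metadata column and helper method are made up for illustration; the
batch gets(Collection<RowColumn>) method is what that tour step
describes):

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.List;
    import java.util.Map;

    import org.apache.fluo.api.client.TransactionBase;
    import org.apache.fluo.api.data.Column;
    import org.apache.fluo.api.data.RowColumn;

    public class MetadataLookup {

      // Hypothetical column; substitute whatever column the metadata lives in.
      private static final Column META_COL = new Column("meta", "node");

      // Batch the metadata lookups into one call instead of doing them one at
      // a time inside the observer.
      public static Map<RowColumn, String> readMetadata(TransactionBase tx,
          Collection<String> metadataRows) {
        List<RowColumn> rowCols = new ArrayList<>();
        for (String row : metadataRows) {
          rowCols.add(new RowColumn(row, META_COL));
        }
        return tx.gets(rowCols);
      }
    }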

>
> Thanks,
> Caleb
>
> Caleb A. Meier, Ph.D.
> Senior Software Engineer ♦ Analyst
> Parsons Corporation
> 1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
> Office:  (703)797-3066
> caleb.me...@parsons.com ♦ www.parsons.com
>
> -----Original Message-----
> From: Keith Turner [mailto:ke...@deenlo.com]
> Sent: Thursday, October 26, 2017 1:39 PM
> To: fluo-dev <dev@fluo.apache.org>
> Subject: Re: fluo accumulo table tablet servers not keeping up with 
> application
>
> Caleb
>
> What if any tuning have you done?  The following tune-able Accumulo 
> parameters impact performance.
>
>  * Write ahead log sync settings (this can have huge performance implications)
>  * Files per tablet
>  * Tablet server cache sizes
>  * Accumulo data block sizes
>  * Tablet server client thread pool size
>
> For Fluo the following tune-able parameters are important.
>
>  * Commit memory (this determines how many transactions are held in memory 
> while committing)
>  * Threads running transactions
>
> What does the load (CPU and memory) on the cluster look like?  I'm curious 
> how even it is.  For example, if one tserver is at 100% CPU while others are 
> idle, that could be caused by uneven data access patterns.
>
> Would it be possible for me to see or run the benchmark?  I am going to take 
> a look at the Rya observers, let me know if there is anything in particular I 
> should look at.
>
> Are the 6 lookups in the transaction done sequentially?
>
> Keith
>
> On Thu, Oct 26, 2017 at 11:34 AM, Meier, Caleb <caleb.me...@parsons.com> 
> wrote:
>> Hello Fluo Devs,
>>
>> We have implemented an incremental query evaluation service for Apache Rya 
>> that leverages Apache Fluo.  We’ve been doing some benchmarking and we’ve 
>> found that the Accumulo Tablet servers for the Fluo table are falling behind 
>> pretty quickly for our application.  We’ve tried splitting the Accumulo 
>> Table so that we have more Tablet Servers, but that doesn’t really buy us 
>> too much.  Our application is fairly scan intensive—we have a metadata 
>> framework in place that allows us to pass query results through the query 
>> tree, and each observer needs to look up metadata to determine which 
>> observer to route its data to after processing.  To give you some indication 
>> of our scan rates, our Join Observer does about 6 lookups, builds a scanner 
>> to do one RowColumn restricted scan, and then does many writes.  So an 
>> obvious way to alleviate the burden on the TabletServer is to cut down on the 
>> number of scans.
>>
>> One approach that we are considering is to import all of our metadata into 
>> memory.  Essentially, each Observer would need access to an in memory 
>> metadata cache.  We’re considering using the Observer context, but this 
>> cache needs to be mutable because a user needs to be able to register new 
>> queries.  Is it possible to update the context, or would we need to restart 
>> the application to do that?  I guess other options would be to create a 
>> static cache for each Observer that stores the metadata, or to store it in 
>> Zookeeper.  Have any of you devs ever had to create a solution to share state 
>> between Observers that doesn’t rely on the Fluo table?
>>
>> In addition to cutting down on the scan rate, are there any other approaches 
>> that you would consider?  I assume that the problem lies primarily with how 
>> we’ve implemented our application, but I’m also wondering if there is 
>> anything we can do from a configuration point of view to reduce the burden 
>> on the Tablet servers.  Would reducing the number of workers/worker threads 
>> to cut down on the number of times a single observation is processed be 
>> helpful?  It seems like this approach would cut out some redundant scans as 
>> well, but it might be more of a second order optimization. In general, any 
>> insight that you might have on this problem would be greatly appreciated.
>>
>> Sincerely,
>> Caleb Meier
>>
>> Caleb A. Meier, Ph.D.
>> Senior Software Engineer ♦ Analyst
>> Parsons Corporation
>> 1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
>> Office:  (703)797-3066
>> caleb.me...@parsons.com ♦ www.parsons.com
>>
