RE: fluo accumulo table tablet servers not keeping up with application

Meier, Caleb Thu, 26 Oct 2017 11:51:20 -0700

Hey Keith,

Thanks for the reply.  Regarding our benchmark, I've attached some screenshots 
of our Accumulo UI that were taken while the benchmark was running.  Basically, 
our ingest rate is pretty low (about 150 entries/s), but our scan rate is off 
the charts, approaching 6 million entries/s!  Also, notice the disparity 
between reads and returned in the Scan chart.  That disparity would suggest 
that we're possibly doing full table scans somewhere, which is strange given 
that all of our scans are RowColumn constrained.  Perhaps we are building our 
Scanner incorrectly.  To spread the table across as many TabletServers as 
possible, we split the Fluo table into 5MB tablets.  Also, the data is not well 
balanced -- 
the tablet servers do take turns being maxed out while others are idle.  We're 
considering possible sharding strategies.
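
As a rough illustration of one sharding strategy (not what Rya currently does), 
a row key can be prefixed with a shard id derived from a hash of the row, so 
that rows which sort next to each other land on different tablets.  The class 
name RowSharder, the "NN:" prefix format, and the shard count are all 
hypothetical choices made for this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: prefixes a row with a stable two-digit shard id. */
final class RowSharder {
    private final int numShards;

    RowSharder(int numShards) {
        // Two-digit prefix, so this sketch supports 1..100 shards.
        if (numShards < 1 || numShards > 100) {
            throw new IllegalArgumentException("numShards must be in [1, 100]");
        }
        this.numShards = numShards;
    }

    /** Returns "NN:" + row, where NN depends only on the row's bytes. */
    String shard(String row) {
        int hash = Arrays.hashCode(row.getBytes(StandardCharsets.UTF_8));
        int shardId = Math.floorMod(hash, numShards); // never negative
        return String.format("%02d:%s", shardId, row);
    }

    public static void main(String[] args) {
        RowSharder sharder = new RowSharder(16);
        // The same logical row always maps to the same sharded row.
        System.out.println(sharder.shard("query-node-42"));
    }
}
```

The trade-off is that an exact-row lookup can recompute the prefix, but a scan 
over a logical range of rows has to be issued once per shard.  Table split 
points would then be placed on the shard prefixes ("01:", "02:", ...).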


Given that our TabletServers are getting saturated so quickly for such a low 
ingest rate, it seems like we definitely need to cut down on the number of 
scans as a first line of attack to see what that buys us.  Then we'll look into 
tuning Accumulo and Fluo.  Does this seem like a reasonable approach to you?  
Does the scan rate of our application strike you as extremely high?  When you 
look at the Rya Observers, can you pay attention to how we are building our 
scans to make sure that we're not inadvertently doing full table scans?  Also, 
what exactly do you mean by "are the 6 lookups in the transaction done 
sequentially"?
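
To make the "cut down on the number of scans" idea concrete, here is a minimal 
sketch of a per-JVM metadata cache that observers in the same worker could 
share.  It is plain Java and does not use the Fluo API: the MetadataCache 
class, its loader function (which stands in for a lookup against the Fluo 
table), and the node id "JOIN_1" are hypothetical names for illustration only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical in-memory metadata cache shared by observers in one JVM. */
final class MetadataCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stands in for a table lookup

    MetadataCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, calling the loader only on a miss. */
    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }

    /** Drops an entry so the next get() reloads it, e.g. after a query changes. */
    void invalidate(K key) {
        entries.remove(key);
    }

    public static void main(String[] args) {
        AtomicInteger tableLookups = new AtomicInteger();
        MetadataCache<String, String> cache = new MetadataCache<>(nodeId -> {
            tableLookups.incrementAndGet(); // count simulated table reads
            return "metadata-for-" + nodeId;
        });
        cache.get("JOIN_1");
        cache.get("JOIN_1"); // served from memory, no second lookup
        System.out.println("table lookups: " + tableLookups.get()); // prints 1
    }
}
```

This only covers a single worker JVM.  Because new queries can be registered 
at runtime, each worker would still need some signal to invalidate or reload 
entries, which is where something like ZooKeeper might come in.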

Thanks,
Caleb

Caleb A. Meier, Ph.D.
Senior Software Engineer ♦ Analyst
Parsons Corporation
1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
Office:  (703)797-3066
[email protected] ♦ www.parsons.com

-----Original Message-----
From: Keith Turner [mailto:[email protected]] 
Sent: Thursday, October 26, 2017 1:39 PM
To: fluo-dev <[email protected]>
Subject: Re: fluo accumulo table tablet servers not keeping up with application

Caleb

What tuning, if any, have you done?  The following tunable Accumulo parameters 
impact performance.

 * Write ahead log sync settings (this can have huge performance implications)
 * Files per tablet
 * Tablet server cache sizes
 * Accumulo data block sizes
 * Tablet server client thread pool size

For Fluo, the following tunable parameters are important.

 * Commit memory (this determines how many transactions are held in memory 
while committing)
 * Threads running transactions

What does the load (CPU and memory) on the cluster look like?  I'm curious how 
even it is.  For example, is one tserver at 100% CPU while others are idle?  
That could be caused by uneven data access patterns.

Would it be possible for me to see or run the benchmark?  I am going to take a 
look at the Rya observers; let me know if there is anything in particular I 
should look at.

Are the 6 lookups in the transaction done sequentially?

Keith

On Thu, Oct 26, 2017 at 11:34 AM, Meier, Caleb <[email protected]> wrote:
> Hello Fluo Devs,
>
> We have implemented an incremental query evaluation service for Apache Rya 
> that leverages Apache Fluo.  We’ve been doing some benchmarking and we’ve 
> found that the Accumulo Tablet servers for the Fluo table are falling behind 
> pretty quickly for our application.  We’ve tried splitting the Accumulo Table 
> so that we have more Tablet Servers, but that doesn’t really buy us too much. 
>  Our application is fairly scan intensive—we have a metadata framework in 
> place that allows us to pass query results through the query tree, and each 
> observer needs to look up metadata to determine which observer to route its 
> data to after processing.  To give you some indication of our scan rates, our 
> Join Observer does about 6 lookups, builds a scanner to do one RowColumn 
> restricted scan, and then does many writes.  So an obvious way to alleviate 
> the burden on the Tablet Servers is to cut down on the number of scans.
>
> One approach that we are considering is to import all of our metadata into 
> memory.  Essentially, each Observer would need access to an in memory 
> metadata cache.  We’re considering using the Observer context, but this cache 
> needs to be mutable because a user needs to be able to register new queries.  
> Is it possible to update the context, or would we need to restart the 
> application to do that?  I guess other options would be to create a static 
> cache for each Observer that stores the metadata, or to store it in 
> Zookeeper.  Have any of you devs ever had to create a solution to share state 
> between Observers that doesn’t rely on the Fluo table?
>
> In addition to cutting down on the scan rate, are there any other approaches 
> that you would consider?  I assume that the problem lies primarily with how 
> we’ve implemented our application, but I’m also wondering if there is 
> anything we can do from a configuration point of view to reduce the burden on 
> the Tablet servers.  Would reducing the number of workers/worker threads to 
> cut down on the number of times a single observation is processed be helpful? 
>  It seems like this approach would cut out some redundant scans as well, but 
> it might be more of a second order optimization. In general, any insight that 
> you might have on this problem would be greatly appreciated.
>
> Sincerely,
> Caleb Meier
>
> Caleb A. Meier, Ph.D.
> Senior Software Engineer ♦ Analyst
> Parsons Corporation
> 1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
> Office:  (703)797-3066
> [email protected] ♦ www.parsons.com
>
