On Thu, Oct 26, 2017 at 2:50 PM, Meier, Caleb <caleb.me...@parsons.com> wrote:
> Hey Keith,
>
> Thanks for the reply. Regarding our benchmark, I've attached some
> screenshots of our Accumulo UI that were taken while the benchmark was
> running. Basically, our ingest rate is pretty low (about 150 entries/s),
> but our scan rate is off the charts -- approaching 6 million entries/s!
> Also, notice the disparity between reads and returned in the Scan chart.
> That disparity would suggest that we're possibly doing full table scans
> somewhere, which is strange given that all of our scans are RowColumn
> constrained. Perhaps we are building our Scanner incorrectly. In an
> effort to maximize the number of TabletServers, we split the Fluo table
> into 5MB tablets. Also, the data is not well balanced -- the tablet
> servers take turns being maxed out while others sit idle. We're
> considering possible sharding strategies.

Yeah, you need to do something to evenly spread the load or else the
application will not scale. I have done two things for this in the past:
a hash prefix, and evenly spreading data sets across tservers. Both of
these approaches are in Fluo Recipes.

http://fluo.apache.org/docs/fluo-recipes/1.1.0-incubating/row-hasher/
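To make the row-hasher recipe concrete, using it looks roughly like the
following. This is a from-memory sketch against the 1.1.0-incubating API,
so double check the class and method names against the docs above.

    import org.apache.fluo.api.data.Bytes;
    import org.apache.fluo.recipes.core.data.RowHasher;

    // One hasher per data set; "p" becomes the row prefix.
    RowHasher rowHasher = new RowHasher("p");

    // Prepends a hash of the row, so rows that would otherwise sort
    // next to each other land on different tablets/tservers.
    Bytes row = rowHasher.addHash("com.foo.bar");

    // The original row can be recovered when reading results back.
    Bytes original = rowHasher.removeHash(row);

The recipe can also generate matching table optimizations (pre-splits
that line up with the hash prefixes), which is what the next link covers.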
The following is the documentation for evenly spreading data sets across
tservers. Looking at it now, I realize it's not very good at showing how
to actually use the functionality.

http://fluo.apache.org/docs/fluo-recipes/1.1.0-incubating/table-optimization/

> Given that our TabletServers are getting saturated so quickly for such a
> low ingest rate, it seems like we definitely need to cut down on the
> number of scans as a first line of attack to see what that buys us. Then
> we'll look into tuning Accumulo and Fluo. Does this seem like a
> reasonable approach to you? Does the scan rate of our application strike
> you as extremely high? When you look at the Rya Observers, can you pay
> attention to how we are building our scans to make sure that we're not
> inadvertently doing full table scans? Also, what exactly do you mean by
> "are the 6 lookups in the transaction done sequentially"?

Basically, I was wondering what methods you are using to do the 6 gets.
The following step of the tour talks about what I had in mind when I
asked that question.

http://fluo.apache.org/tour/multi-get/
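In short, if the 6 gets are done one at a time, each is a separate round
trip to the tservers; batched, they are issued concurrently. Roughly like
this -- a sketch with method names from memory, where tx is the
TransactionBase passed to your observer and mdCol stands in for whatever
metadata column you read:

    import java.util.Arrays;
    import java.util.Collection;
    import java.util.Map;
    import org.apache.fluo.api.data.Column;
    import org.apache.fluo.api.data.RowColumn;

    // Instead of six sequential calls like:
    //   String v = tx.gets("md:q1", mdCol);
    // do one batched call, which performs the lookups in parallel:
    Collection<RowColumn> cells = Arrays.asList(
        new RowColumn("md:q1", mdCol),
        new RowColumn("md:q2", mdCol),
        new RowColumn("md:q3", mdCol));  // ...and so on for all 6

    Map<RowColumn, String> metadata = tx.gets(cells);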
> Thanks,
> Caleb
>
> Caleb A. Meier, Ph.D.
> Senior Software Engineer ♦ Analyst
> Parsons Corporation
> 1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
> Office: (703)797-3066
> caleb.me...@parsons.com ♦ www.parsons.com
>
> -----Original Message-----
> From: Keith Turner [mailto:ke...@deenlo.com]
> Sent: Thursday, October 26, 2017 1:39 PM
> To: fluo-dev <dev@fluo.apache.org>
> Subject: Re: fluo accumulo table tablet servers not keeping up with
> application
>
> Caleb
>
> What, if any, tuning have you done? The following tune-able Accumulo
> parameters impact performance.
>
> * Write ahead log sync settings (this can have huge performance
> implications)
> * Files per tablet
> * Tablet server cache sizes
> * Accumulo data block sizes
> * Tablet server client thread pool size
>
> For Fluo the following tune-able parameters are important.
>
> * Commit memory (this determines how many transactions are held in
> memory while committing)
> * Threads running transactions
>
> What does the load (CPU and memory) on the cluster look like? I'm
> curious how even it is. For example, if one tserver is at 100% CPU while
> others are idle, that could be caused by uneven data access patterns.
>
> Would it be possible for me to see or run the benchmark? I am going to
> take a look at the Rya observers; let me know if there is anything in
> particular I should look at.
>
> Are the 6 lookups in the transaction done sequentially?
>
> Keith
>
> On Thu, Oct 26, 2017 at 11:34 AM, Meier, Caleb <caleb.me...@parsons.com>
> wrote:
>> Hello Fluo Devs,
>>
>> We have implemented an incremental query evaluation service for Apache
>> Rya that leverages Apache Fluo. We've been doing some benchmarking and
>> we've found that the Accumulo tablet servers for the Fluo table are
>> falling behind pretty quickly for our application. We've tried
>> splitting the Accumulo table so that we have more tablet servers, but
>> that doesn't really buy us too much. Our application is fairly scan
>> intensive -- we have a metadata framework in place that allows us to
>> pass query results through the query tree, and each observer needs to
>> look up metadata to determine which observer to route its data to after
>> processing. To give you some indication of our scan rates, our Join
>> Observer does about 6 lookups, builds a scanner to do one RowColumn
>> restricted scan, and then does many writes. So an obvious way to
>> alleviate the burden on the TabletServer is to cut down on the number
>> of scans.
>>
>> One approach that we are considering is to import all of our metadata
>> into memory. Essentially, each Observer would need access to an
>> in-memory metadata cache. We're considering using the Observer context,
>> but this cache needs to be mutable because a user needs to be able to
>> register new queries. Is it possible to update the context, or would we
>> need to restart the application to do that? I guess other options would
>> be to create a static cache for each Observer that stores the metadata,
>> or to store it in Zookeeper. Have any of you devs ever had to create a
>> solution to share state between Observers that doesn't rely on the Fluo
>> table?
>>
>> In addition to cutting down on the scan rate, are there any other
>> approaches that you would consider? I assume that the problem lies
>> primarily with how we've implemented our application, but I'm also
>> wondering if there is anything we can do from a configuration point of
>> view to reduce the burden on the tablet servers. Would reducing the
>> number of workers/worker threads to cut down on the number of times a
>> single observation is processed be helpful? It seems like this approach
>> would cut out some redundant scans as well, but it might be more of a
>> second-order optimization. In general, any insight that you might have
>> on this problem would be greatly appreciated.
>>
>> Sincerely,
>> Caleb Meier
>>
>> Caleb A. Meier, Ph.D.
>> Senior Software Engineer ♦ Analyst
>> Parsons Corporation
>> 1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
>> Office: (703)797-3066
>> caleb.me...@parsons.com ♦ www.parsons.com
>>
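P.S. Regarding how you are building your scans: when I go through the Rya
observers, the main thing I will check is that every scanner restricts
both the span and the fetched columns. A sketch of what I would expect to
see (from memory of the 1.x scanner API; the row prefix and column below
are made up for illustration, and tx is again the observer's
TransactionBase):

    import org.apache.fluo.api.client.scanner.CellScanner;
    import org.apache.fluo.api.data.Column;
    import org.apache.fluo.api.data.RowColumnValue;
    import org.apache.fluo.api.data.Span;

    // over() bounds the rows read and fetch() limits the columns, so
    // the tserver is not reading and discarding unrelated cells.
    CellScanner scanner = tx.scanner()
        .over(Span.prefix("J:" + joinId))     // hypothetical row prefix
        .fetch(new Column("join", "result"))  // hypothetical column
        .build();

    for (RowColumnValue rcv : scanner) {
      // process rcv.getsRow(), rcv.getColumn(), rcv.getsValue()
    }

A scanner built without over()/fetch() restrictions reads the whole
table, which would line up with the reads-vs-returned gap in your Scan
chart.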