RE: fluo accumulo table tablet servers not keeping up with application

Meier, Caleb Tue, 31 Oct 2017 11:22:43 -0700

Hey Keith,

Just following up on your last message.  After looking at the worker ScanTask 
logs, it seems like the workers are conducting scans as frequently as the min 
sleep time permits.  That is, if the min sleep time is set to 10s, a ScanTask 
is being executed every 10s.  In addition, running the Fluo wait command 
indicates that the number of outstanding notifications steadily increases or is 
held constant (depending on the number of workers).  Based on your comments 
below, it seems like the workers should be scanning at a lower rate given that 
the notification work queue is constantly increasing in size.  Another thing 
that we tried was reducing the number of workers and increasing the min sleep 
time.  This lowered the scan burden on the tablet server, but unsurprisingly 
our processing rate plummeted.  We also tried lowering the ingest rate for a 
fixed number of workers (lowering the notification rate for each worker 
thread).  While it took longer for the TabletServer to become saturated, it 
still became overwhelmed.


In general, for the queries that we are benchmarking, our notification:data 
ratio is about 7:1 (i.e. each piece of ingested data generates about 7 
notifications on the way to being entirely processed).  I think that this is 
our primary culprit, but I think that our application-specific scans are also 
part of the problem (I'm still in the process of trying to determine what 
portion of the scans that we are seeing is specific to our observers and what 
portion is specific to notification discovery - any suggestions here would be 
appreciated).  One reason that I think notification discovery is the culprit is 
that we implemented an in-memory cache for the metadata, and that didn't seem 
to affect the scan rate too much (metadata seeks constitute about 30% of our 
seeks/scans). 
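
For reference, an in-memory metadata cache of the kind mentioned above can be sketched roughly as follows.  This is only a minimal illustration and not the cache we actually built: the class name MetadataCache, the loader function, and the invalidate method are all hypothetical, and the loader stands in for whatever lookup would otherwise hit the Fluo table.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical read-through cache; names are illustrative, not Rya's or Fluo's. */
final class MetadataCache<K, V> {
  private final Map<K, V> cache = new ConcurrentHashMap<>();
  // Assumed to perform the real metadata lookup (e.g. a table read) on a miss.
  private final Function<K, V> loader;

  MetadataCache(Function<K, V> loader) {
    this.loader = loader;
  }

  /** Returns the cached value, calling the loader only on the first request for a key. */
  V get(K key) {
    return cache.computeIfAbsent(key, loader);
  }

  /** Drops a key so the next get() reloads it, e.g. after a new query is registered. */
  void invalidate(K key) {
    cache.remove(key);
  }
}
```

A cache like this only removes the metadata lookups; it does nothing for the scans that workers perform to find notifications.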

Going forward, we're going to shard our data and look into ways to cut down on 
scans.  Any other suggestions about how to improve performance would be 
appreciated.
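
One common way to shard rows is to prefix each row with a bucket derived from a hash of the row, so that writes and reads spread across tablets.  The sketch below is only an illustration under assumptions: the class name RowSharder, the "NNN:" prefix format, and the shard count are hypothetical and not something we have settled on.

```java
import java.util.Locale;

/** Hypothetical row-sharding helper; the prefix format is an assumption. */
final class RowSharder {
  private RowSharder() {}

  /** Prefixes the row with a zero-padded shard id computed from the row's hash. */
  static String shardedRow(String row, int numShards) {
    if (numShards < 1 || numShards > 1000) {
      throw new IllegalArgumentException("numShards must be in [1, 1000]");
    }
    // floorMod keeps the shard id non-negative even for negative hash codes.
    int shard = Math.floorMod(row.hashCode(), numShards);
    return String.format(Locale.ROOT, "%03d:%s", shard, row);
  }
}
```

The trade-off is that a lookup by row has to recompute the same prefix, and a range scan over the original row order has to visit every shard.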

Thanks,
Caleb

Caleb A. Meier, Ph.D.
Senior Software Engineer ♦ Analyst
Parsons Corporation
1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
Office:  (703)797-3066
[email protected] ♦ www.parsons.com

-----Original Message-----
From: Keith Turner [mailto:[email protected]] 
Sent: Friday, October 27, 2017 12:17 PM
To: fluo-dev <[email protected]>
Subject: Re: fluo accumulo table tablet servers not keeping up with application

On Fri, Oct 27, 2017 at 11:03 AM, Meier, Caleb <[email protected]> wrote:
> Hey Keith,
>
> Our benchmark consists of a single query that is a join of two statement 
> patterns (essentially patterns that incoming data matches, where a unit of 
> data is a statement).  We are ingesting 50 pairs of statements a minute (100 
> total), where each statement in the pair matches one of the statement 
> patterns.  Because the data is being ingested at a constant rate, the 
> statement pattern Observers and Join Observers are constantly working.  One 
> thing that is worth mentioning is that we changed the property 
> fluo.implScanTask.maxSleep from 5 min to 10 seconds.  Based on the constant 
> ingest rate, your comments below, and our low maxSleep, it seems like the 
> workers would constantly be scanning for new notifications.
>
>> Once a worker scans all tablets and finds a list of notifications, it does 
>> not scan again until half of those notifications are processed.
>
> How does the maxSleep property work in conjunction with this?  If the max 
> sleep time elapses before a worker processes half of the notifications, will 
> it scan?

I don't think it will scan again until the # of queued notifications is cut in 
half.  I looked in 1.0.0 and 1.1.0, and I think the while loops linked below 
should hold off on the scan until the queue halves.

https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_fluo_blob_rel_fluo-2D1.0.0-2Dincubating_modules_core_src_main_java_org_apache_fluo_core_worker_finder_hash_ScanTask.java-23L85&d=DwIFaQ&c=Nwf-pp4xtYRe0sCRVM8_LWH54joYF7EKmrYIdfxIq10&r=vuVdzYC2kksVZR5STiFwDpzJ7CrMHCgeo_4WXTD0qo8&m=btY_WNg1O7SuwcHi1m2ksRp3ggzrI7nJlnC2B5cHgaU&s=BRyQS2DPBtEfUvHT-JKBXPWABrSyihP6yaJcfE1BJFQ&e=
https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_fluo_blob_rel_fluo-2D1.1.0-2Dincubating_modules_core_src_main_java_org_apache_fluo_core_worker_finder_hash_ScanTask.java-23L88&d=DwIFaQ&c=Nwf-pp4xtYRe0sCRVM8_LWH54joYF7EKmrYIdfxIq10&r=vuVdzYC2kksVZR5STiFwDpzJ7CrMHCgeo_4WXTD0qo8&m=btY_WNg1O7SuwcHi1m2ksRp3ggzrI7nJlnC2B5cHgaU&s=ZxURCZE5k65I008z7o4UQGsm6o0mBtJnwV_N6Y668oM&e=
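
As a rough illustration of the hold-off described above, here is a simplified model.  It is a sketch under assumptions, not the ScanTask code from the links: the class ScanGate and its methods are hypothetical names, and the real implementation may differ in detail.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of the hold-off; names are illustrative, not Fluo's. */
final class ScanGate {
  // Notifications still waiting in the worker's queue.
  private final AtomicInteger queued = new AtomicInteger();
  // Number of notifications queued by the most recent scan.
  private volatile int foundByLastScan = 0;

  /** Call when a scan loop finishes and has queued `found` notifications. */
  void scanFinished(int found) {
    foundByLastScan = found;
    queued.set(found);
  }

  /** Call each time a queued notification has been processed. */
  void notificationProcessed() {
    queued.decrementAndGet();
  }

  /** True once no more than half of the last scan's notifications remain queued. */
  boolean readyToScan() {
    return queued.get() <= foundByLastScan / 2;
  }
}
```

In this model elapsed time plays no role: a sleep timer expiring does not make readyToScan() return true; only processing notifications does.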

Were you able to find the ScanTask debug messages in the worker logs?
Below are the log messages in the code to give a sense of what to look for.

https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_fluo_blob_rel_fluo-2D1.0.0-2Dincubating_modules_core_src_main_java_org_apache_fluo_core_worker_finder_hash_ScanTask.java-23L130&d=DwIFaQ&c=Nwf-pp4xtYRe0sCRVM8_LWH54joYF7EKmrYIdfxIq10&r=vuVdzYC2kksVZR5STiFwDpzJ7CrMHCgeo_4WXTD0qo8&m=btY_WNg1O7SuwcHi1m2ksRp3ggzrI7nJlnC2B5cHgaU&s=C141kYyjygBL3kWZyUObU1-nu4ZjvMnu7xp_QbIGkCA&e=
https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_fluo_blob_rel_fluo-2D1.1.0-2Dincubating_modules_core_src_main_java_org_apache_fluo_core_worker_finder_hash_ScanTask.java-23L146&d=DwIFaQ&c=Nwf-pp4xtYRe0sCRVM8_LWH54joYF7EKmrYIdfxIq10&r=vuVdzYC2kksVZR5STiFwDpzJ7CrMHCgeo_4WXTD0qo8&m=btY_WNg1O7SuwcHi1m2ksRp3ggzrI7nJlnC2B5cHgaU&s=4Qy1-LbMEpJ7NZLqngU8ZOEOBv6nB0nXM8mjkWdpEL4&e=

IIRC, if notifications were found in a tablet during the last scan, then that 
tablet will always be scanned during the next scan loop.  When notifications 
are not found in a tablet, that tablet's next scan time doubles, up to 
fluo.implScanTask.maxSleep.
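
The per-tablet back-off described above can be modeled roughly like this.  It is a sketch under assumptions rather than Fluo's code: the class TabletBackoff is a hypothetical name, and starting the doubling from the min sleep time on the first empty scan is an assumption made for illustration.

```java
/** Hypothetical model of per-tablet back-off; names are illustrative, not Fluo's. */
final class TabletBackoff {
  private final long minSleepMs;
  private final long maxSleepMs;
  // Delay before this tablet is scanned again; 0 means "on the next scan loop".
  private long delayMs = 0;

  TabletBackoff(long minSleepMs, long maxSleepMs) {
    if (minSleepMs <= 0 || maxSleepMs < minSleepMs) {
      throw new IllegalArgumentException("need 0 < minSleepMs <= maxSleepMs");
    }
    this.minSleepMs = minSleepMs;
    this.maxSleepMs = maxSleepMs;
  }

  /** Updates the delay after a scan of this tablet and returns the new delay. */
  long afterScan(boolean foundNotifications) {
    if (foundNotifications) {
      delayMs = 0; // scan again on the next loop
    } else if (delayMs == 0) {
      delayMs = minSleepMs; // assumption: the first empty scan waits the min sleep
    } else {
      delayMs = Math.min(delayMs * 2, maxSleepMs); // double, capped at max sleep
    }
    return delayMs;
  }
}
```

With a 5 second minimum and a 5 minute maximum, consecutive empty scans of a tablet give delays of 5 s, 10 s, 20 s, and so on up to 5 min, and a single scan that finds notifications resets the delay.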

So it's possible that all notifications found are being processed quickly and 
then the workers are scanning for more.  The debug messages would show this.

There is also a minSleep time.  This property determines the minimum amount of 
time a worker will sleep between scan loops; it seems to default to 5 secs.  
You could try increasing this.

Looking at the props, it seems the prop names for min and max sleep changed 
between 1.0.0 and 1.1.0.


>
> -----Original Message-----
> From: Keith Turner [mailto:[email protected]]
> Sent: Thursday, October 26, 2017 6:20 PM
> To: fluo-dev <[email protected]>
> Subject: Re: fluo accumulo table tablet servers not keeping up with 
> application
>
> On Thu, Oct 26, 2017 at 5:47 PM, Meier, Caleb <[email protected]> wrote:
>> Hey Keith,
>>
>> We'll rerun the benchmarks tomorrow and track the outstanding notifications. 
>>  We'll also see if compacting at some point during ingest helps with the 
>> scan rate.  Have you observed such high scan rates for such a small amount 
>> of data in any of your benchmarking?  What would account for the huge 
>> disparity in results read vs. results returned?  It seems like our scans are 
>> extremely inefficient for some reason.  Our tablet servers are becoming 
>> overwhelmed even before data gets flushed to disk.
>
> Oh, I never saw your attachment; it may not be possible to attach files on 
> the mailing list.
>
> It's possible that what you are seeing is the workers scanning for 
> notifications.  If you look in the worker logs, do you see messages about 
> scanning for notifications?  If so, what do they look like?
>
> In 1.0.0 each worker scans all tablets in random order.  When it scans it has 
> an iterator that uses hash+mod to select a subset of notifications.  The 
> iterator also suppresses deleted notifications.
> So the selection and suppression by that iterator could explain the read vs 
> returned.  It does exponential back off on tablets where it does not find 
> data.  Once a worker scans all tablets and finds a list of notifications, it 
> does not scan again until half of those notifications are processed.
>
> In the beginning, would you have a lot of notifications?  If so I would 
> expect a lot of scanning and then it should slow down once the workers get a 
> list of notifications to process.
>
> In 1.1.0 the workers divide up the tablets (so workers no longer scan
> all tablets; groups of workers share groups of tablets).  If the
> table is split after the workers start, it may take them a bit to execute 
> the distributed algorithm that divvies up tablets among workers.
>
> Anyway the debug messages about scanning for notifications in the workers 
> should provide some insight into this.
>
> If it's not notification scanning, then it could be that the application is 
> scanning over a lot of data that was deleted, or something like that.
>
>>
>> -----Original Message-----
>> From: Keith Turner [mailto:[email protected]]
>> Sent: Thursday, October 26, 2017 5:36 PM
>> To: fluo-dev <[email protected]>
>> Subject: Re: fluo accumulo table tablet servers not keeping up with 
>> application
>>
>> On Thu, Oct 26, 2017 at 2:50 PM, Meier, Caleb <[email protected]> 
>> wrote:
>>> Hey Keith,
>>>
>>> Thanks for the reply.  Regarding our benchmark, I've attached some 
>>> screenshots of our Accumulo UI that were taken while the benchmark was 
>>> running.  Basically, our ingest rate is pretty low (about 150 entries/s), 
>>> but our scan rate is off the charts, approaching 6 million entries/s!  
>>> Also, notice the disparity between reads and returned in the Scan chart.  
>>> That disparity would suggest that we're possibly doing full table scans 
>>> somewhere, which is strange given that all of our scans are RowColumn 
>>> constrained.  Perhaps we are building our Scanner incorrectly.   In an 
>>> effort to maximize the number of TabletServers, we split the Fluo table 
>>> into 5MB tablets.  Also, the data is not well balanced -- the tablet 
>>> servers do take turns being maxed out while others are idle.  We're 
>>> considering possible sharding strategies.
>>>
>>> Given that our TabletServers are getting saturated so quickly for such a 
>>> low ingest rate, it seems like we definitely need to cut down on the number 
>>> of scans as a first line of attack to see what that buys us.  Then we'll 
>>> look into tuning Accumulo and Fluo.  Does this seem like a reasonable 
>>> approach to you?  Does the scan rate of our application strike you as 
>>> extremely high?  When you look at the Rya Observers, can you pay attention 
>>> to how we are building our scans to make sure that we're not inadvertently 
>>> doing full table scans?  Also, what exactly do you mean by "are the 6 
>>> lookups in the transaction done sequentially"?
>>
>> Regarding the scan rate, there are a few things I am curious about.
>>
>> Fluo workers scan for notifications in addition to the scanning done by 
>> your apps.  I made some changes in 1.1.0 to reduce the amount of scanning 
>> needed to find notifications, but this should not make much of a 
>> difference on a small number of nodes.  Details about this are in the 
>> 1.1.0 release notes.  I am not sure what the best way is to determine how 
>> much of the scanning you are seeing is app vs notification finding.  Can you 
>> run the fluo wait command to see how many outstanding notifications there 
>> are?
>>
>> Transactions leave a paper trail behind and compactions clean this up (Fluo 
>> has a garbage collection iterator).  This is why I asked what effect 
>> compacting the table had.  Compactions will also clean up deleted 
>> notifications.
>>
>>
>>>
>>> Thanks,
>>> Caleb
>>>
>>> -----Original Message-----
>>> From: Keith Turner [mailto:[email protected]]
>>> Sent: Thursday, October 26, 2017 1:39 PM
>>> To: fluo-dev <[email protected]>
>>> Subject: Re: fluo accumulo table tablet servers not keeping up with 
>>> application
>>>
>>> Caleb
>>>
>>> What tuning, if any, have you done?  The following tunable Accumulo 
>>> parameters impact performance.
>>>
>>>  * Write ahead log sync settings (this can have huge performance
>>> implications)
>>>  * Files per tablet
>>>  * Tablet server cache sizes
>>>  * Accumulo data block sizes
>>>  * Tablet server client thread pool size
>>>
>>> For Fluo, the following tunable parameters are important.
>>>
>>>  * Commit memory (this determines how many transactions are held in 
>>> memory while committing)
>>>  * Threads running transactions
>>>
>>> What does the load (CPU and memory) on the cluster look like?  I'm curious 
>>> how even it is.  For example, is one tserver at 100% CPU while others are 
>>> idle?  This could be caused by uneven data access patterns.
>>>
>>> Would it be possible for me to see or run the benchmark?  I am going to 
>>> take a look at the Rya observers, let me know if there is anything in 
>>> particular I should look at.
>>>
>>> Are the 6 lookups in the transaction done sequentially?
>>>
>>> Keith
>>>
>>> On Thu, Oct 26, 2017 at 11:34 AM, Meier, Caleb <[email protected]> 
>>> wrote:
>>>> Hello Fluo Devs,
>>>>
>>>> We have implemented an incremental query evaluation service for Apache Rya 
>>>> that leverages Apache Fluo.  We’ve been doing some benchmarking and we’ve 
>>>> found that the Accumulo Tablet servers for the Fluo table are falling 
>>>> behind pretty quickly for our application.  We’ve tried splitting the 
>>>> Accumulo Table so that we have more Tablet Servers, but that doesn’t 
>>>> really buy us too much.  Our application is fairly scan intensive—we have 
>>>> a metadata framework in place that allows us to pass query results through 
>>>> the query tree, and each observer needs to look up metadata to determine 
>>>> which observer to route its data to after processing.  To give you some 
>>>> indication of our scan rates, our Join Observer does about 6 lookups, 
>>>> builds a scanner to do one RowColumn restricted scan, and then does many 
>>>> writes.  So an obvious way to alleviate the burden on the TabletServer is 
>>>> to cut down on the number of scans.
>>>>
>>>> One approach that we are considering is to import all of our metadata into 
>>>> memory.  Essentially, each Observer would need access to an in memory 
>>>> metadata cache.  We’re considering using the Observer context, but this 
>>>> cache needs to be mutable because a user needs to be able to register new 
>>>> queries.  Is it possible to update the context, or would we need to 
>>>> restart the application to do that?  I guess other options would be to 
>>>> create a static cache for each Observer that stores the metadata, or to 
>>>> store it in ZooKeeper.  Have any of you devs ever had to create a solution to 
>>>> share state between Observers that doesn’t rely on the Fluo table?
>>>>
>>>> In addition to cutting down on the scan rate, are there any other 
>>>> approaches that you would consider?  I assume that the problem lies 
>>>> primarily with how we’ve implemented our application, but I’m also 
>>>> wondering if there is anything we can do from a configuration point of 
>>>> view to reduce the burden on the Tablet servers.  Would reducing the 
>>>> number of workers/worker threads to cut down on the number of times a 
>>>> single observation is processed be helpful?  It seems like this approach 
>>>> would cut out some redundant scans as well, but it might be more of a 
>>>> second order optimization. In general, any insight that you might have on 
>>>> this problem would be greatly appreciated.
>>>>
>>>> Sincerely,
>>>> Caleb Meier
>>>>
