Hi Patrick,

> Hi Thomas, nice to get a response from you. We already met in ~2010 in Linz at
> your office :)
> (ex. SEM GmbH, later Playmonitor GmbH)

I know. XING (Big Brother) is watching you. Nice to see that you are still 
running with Firebird. ;-)


> First, sorry for posting a mixed state of information. The config settings I
> posted are the current settings.
> But the Lock-Table-Header was from last Saturday (the day of the total system
> crash) - we have changed the Hash Slot value since then, but it didn't help. The new
> table looks like:
> 
> 
> LOCK_HEADER BLOCK
> Version: 16, Active owner:      0, Length: 134247728, Used: 55790260
> Semmask: 0x0, Flags: 0x0001
> Enqs: 1806423519, Converts: 4553851, Rejects: 5134185, Blocks: 56585419
> Deadlock scans:     82, Deadlocks:      0, Scan interval:  10
> Acquires: 2058846891, Acquire blocks: 321584126, Spin count:   0
> Mutex wait: 15.6%
> Hash slots: 20011, Hash lengths (min/avg/max):    0/   7/  18
> Remove node:      0, Insert queue:      0, Insert prior:      0
> Owners (297): forward: 385160, backward: 38086352
> Free owners (43): forward: 52978748, backward: 20505128
> Free locks (41802): forward: 180712, backward: 3620136
> Free requests (-1097572396): forward: 46948676, backward: 13681252
> Lock Ordering: Enabled
> 
> 
> The Min/Avg/Max hash lengths look better now, but as you mentioned, the Mutex
> wait is worrying us too.
> We have 2 direct questions about that.
> 
> 
> 1) What are the negative effects of increasing Hash-Slots (too high)?

The hash slots setting defines the size of the hash table used to look up 
locked objects by a key (= hash value), ideally with constant O(1) run-time 
complexity. If the table has too few slots, each slot degenerates into a longer 
and longer linked/linear list, in the worst case giving O(n) lookups. With the 
20011 setting above, the average hash length looks fine.
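
To illustrate the idea (a minimal sketch with made-up names and numbers, not 
Firebird's actual lock manager code): a lookup walks the chain of one slot, so 
with too few slots the chains grow and every lookup gets slower.

#include <cstdio>
#include <list>
#include <vector>

// Hash table with chaining, as used conceptually for looking up a locked
// object by its key. Illustrative only.
struct Lock { unsigned key; /* ... lock state ... */ };

struct LockTable {
    std::vector<std::list<Lock>> slots;

    explicit LockTable(size_t hashSlots) : slots(hashSlots) {}

    // O(1) on average while chains stay short; degenerates towards O(n)
    // when too many keys share too few slots.
    Lock* lookup(unsigned key) {
        for (Lock& lock : slots[key % slots.size()])
            if (lock.key == key)
                return &lock;
        return nullptr;
    }
};

int main() {
    // Roughly 140000 locked objects over 20011 slots gives an average
    // chain length of ~7, in line with the 0/7/18 min/avg/max above.
    printf("avg chain length ~ %d\n", 140000 / 20011);
}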

As you might know, Classic uses a dedicated process per connection, so it needs 
a global mechanism to synchronize/protect shared data structures across these 
processes via IPC. That is what the lock manager and the lock table are used 
for.
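
Conceptually it boils down to something like this (a minimal POSIX sketch of a 
process-shared mutex guarding a structure in shared memory, assuming Linux; the 
segment name and fields are made up, and it is not the actual Firebird 
implementation):

#include <fcntl.h>
#include <pthread.h>
#include <sys/mman.h>
#include <unistd.h>

// Structure living in a shared memory segment visible to every worker
// process. The mutex must be PROCESS_SHARED to synchronize across processes.
struct SharedHeader {
    pthread_mutex_t mutex;
    long counter;   // stands in for the real lock table data
};

int main() {
    int fd = shm_open("/demo_lock_table", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, sizeof(SharedHeader));
    SharedHeader* hdr = static_cast<SharedHeader*>(
        mmap(nullptr, sizeof(SharedHeader), PROT_READ | PROT_WRITE,
             MAP_SHARED, fd, 0));

    // In reality only the first process would initialize the mutex.
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&hdr->mutex, &attr);

    // Every process has to acquire the shared mutex before touching the
    // shared data; contention on it is what shows up as "Mutex wait".
    pthread_mutex_lock(&hdr->mutex);
    hdr->counter++;
    pthread_mutex_unlock(&hdr->mutex);

    munmap(hdr, sizeof(SharedHeader));
    close(fd);
    return 0;
}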

> 2) As far as we know, we can't influence Mutex wait directly (it's just
> informational). But do you think that's the reason the underlying hardware is
> not utilized?

I don't think you are disk IO bound, i.e. I'm not convinced that faster IO will 
help. This is somewhat backed by the high mutex wait. Under normal operations 
you see 100-500 IOPS, with room for further increase as shown by the 1700 IOPS 
backup use case. I don't know how random the disk IO is in these two scenarios. 
Any chance to run some sort of disk IO benchmark, or do you already know the 
upper IOPS limit of your SAN?
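
If it helps, a very rough random-read micro-benchmark could look like the 
sketch below (assuming Linux, O_DIRECT support and an already existing large 
test file; path and block size are made up, and a dedicated benchmarking tool 
will give more trustworthy numbers):

#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <fcntl.h>
#include <random>
#include <unistd.h>

int main(int argc, char** argv) {
    const char* path = argc > 1 ? argv[1] : "/san/volume/iotest.dat"; // hypothetical
    const size_t blockSize = 8192;                 // e.g. a Firebird-like page size
    int fd = open(path, O_RDONLY | O_DIRECT);      // bypass the OS file cache
    if (fd < 0) { perror("open"); return 1; }
    off_t fileSize = lseek(fd, 0, SEEK_END);

    void* buf = nullptr;
    posix_memalign(&buf, 4096, blockSize);         // O_DIRECT needs aligned buffers

    std::mt19937_64 rng(42);
    std::uniform_int_distribution<off_t> block(0, fileSize / blockSize - 1);

    const int seconds = 10;
    time_t end = time(nullptr) + seconds;
    long ops = 0;
    while (time(nullptr) < end) {
        pread(fd, buf, blockSize, block(rng) * blockSize);
        ++ops;
    }
    printf("~%ld random read IOPS\n", ops / seconds);

    free(buf);
    close(fd);
    return 0;
}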

> 
> 
> We do consider upgrading to 2.5, but had our eyes on FB 3 over the last year,
> waiting for it to get ready.
> We have been testing with 2.5.x for a long time now, but never found a real
> reason to upgrade - since it's a reasonable amount of work for us. When you say
> it improves the lock contention, this sounds pretty good. But again the
> question: do you think lock contention is limiting our system?

Dmitry, Vlad etc. will correct me (in case they are following the thread), but 
I recall that 2.5, especially SuperClassic with its multi-threaded worker 
process compared to Classic, now allows certain lock manager operations to run 
in parallel with regular request processing. In general I remember a reported 
improvement of ~25% for a TPC-C style workload with SuperClassic compared to 
Classic.

> 
> 
> First and foremost, we would really like to find the bottleneck. We just don't
> have the know-how to imagine something like "Fb 2.1 Engine is limiting us
> because of ..." and without that knowledge it's hard to take actions like
> upgrading to 2.5.
> 
> 
> We'll try to collect information about the garbage we create :) We do run
> "Sinatica Monitoring" on the server, which shows us "Awaiting Garbage
> Collection" transactions. Is that the information you're looking for?

I'm not familiar with Sinatica. Perhaps the periodic MON$ queries (how 
frequently does Sinatica execute them?) also produce some overhead, because 
each MON$ table query in the context of a new physical transaction results in a 
stable snapshot of current activity. Possibly not negligible with > 400 
connections.

The easiest ways to get insight into your record garbage are, e.g.:

* Run gstat -r
* Run a tool from IBSurgeon (can't recall the name, Alexey?)
* Run a tool from Upscene (FB TraceManager)

> 
> Maybe to avoid confusion, we don't have normal "spikes" .. the system just
> starts to slow down and this state remains until the server load is gone
> (after midnight, when the software is not used anymore).



--
With regards,
Thomas Steinmaurer
http://www.upscene.com

Professional Tools and Services for Firebird
FB TraceManager, IB LogManager, Database Health Check, Tuning etc.
