Hi,

You wrote:

>The thing is, sure these numbers look really low. But the system never uses
>it. The monitoring of the SAN shows that these load levels are never reached.

You are confusing cause and effect. Monitoring shows low numbers because spinning drives cannot provide fast enough random reads.

>From Sinatica, every 20 minutes a peak in GC for ~15,000 transactions

Sinatica uses really strange terms to describe transaction behaviour, and since it is an abandoned tool, nobody can explain how they align with the real situation.

--
Regards,
Alexey Kovyazin
IBSurgeon

Wednesday, April 13, 2016, 05:08 +03:00, from "thetr...@yahoo.com [firebird-support]" <firebird-support@yahoogroups.com>:
>Hey Thomas,
>thanks for your extensive reply.
>Unfortunately we're still bound to some old 32-bit UDF functionality which we
>can't get in 64-bit. I think you know about the use of SuperClassic with a
>32-bit server - the 2 GB RAM limit :)
>It's not impossible, but also not really a fast route we can go. But for sure
>again a reason to talk about moving the switch to 2.5.
>
>We did run some disk IO benchmarks (with AS SSD) today, and in times of SSD
>the results are kinda depressing :D
>The thing is, sure these numbers look really low. But the system never uses
>it. The monitoring of the SAN shows that these load levels are never reached.
>The single 4k read is worrying me, but I lean towards our 500 processes being
>more like the 64-thread test. But even then, we only measured 100 IOPS reading
>on the live system.
>
>Sequential read speed: ~450 MB/s
>Sequential write speed: ~500 MB/s
>4k read: 196 IOPS
>4k write: 1376 IOPS
>4k-64 thread read: 15945 IOPS
>4k-64 thread write: 7361 IOPS
>
>Garbage info still needs to be collected.
>But first signs show that this could indeed be a potential problem.
>From Sinatica, every 20 minutes a peak in GC for ~15,000 transactions. This
>gets fixed by the server in a relatively small amount of time (I think < 1
>minute), since it's really only a single peak in the graph every time.
>When the GC stops increasing and the server starts to collect it, we see an
>increase in concurrently running transactions (= transactions stay open longer
>and are processed more slowly).
>
>We don't have data from the live system yet to see if this behaviour kind of
>"snowballs" when there is really high load on the server.
>
>Best Regards,
>
>---In firebird-support@yahoogroups.com, <ts@...> wrote :
>
>Hi Patrick,
>
>>> Hi Thomas, nice to get a response from you. We already met in ~2010 in Linz
>>> at your office :)
>>> (ex. SEM GmbH, later Playmonitor GmbH)
>
>I know. XING (Big Brother) is watching you. Nice to see that you are still
>running with Firebird. ;-)
>
>>> First, sorry for posting a mixed state of information. The config settings
>>> I posted are the current settings.
>>> But the lock table header was from last Saturday (day of the total system
>>> crash) - we changed the hash slot value since then, but it didn't work. The
>>> new table looks like:
>>>
>>> LOCK_HEADER BLOCK
>>> Version: 16, Active owner: 0, Length: 134247728, Used: 55790260
>>> Semmask: 0x0, Flags: 0x0001
>>> Enqs: 1806423519, Converts: 4553851, Rejects: 5134185, Blocks: 56585419
>>> Deadlock scans: 82, Deadlocks: 0, Scan interval: 10
>>> Acquires: 2058846891, Acquire blocks: 321584126, Spin count: 0
>>> Mutex wait: 15.6%
>>> Hash slots: 20011, Hash lengths (min/avg/max): 0/ 7/ 18
>>> Remove node: 0, Insert queue: 0, Insert prior: 0
>>> Owners (297): forward: 385160, backward: 38086352
>>> Free owners (43): forward: 52978748, backward: 20505128
>>> Free locks (41802): forward: 180712, backward: 3620136
>>> Free requests (-1097572396): forward: 46948676, backward: 13681252
>>> Lock Ordering: Enabled
>>>
>>> The min/avg/max hash lengths look better now, but as you mentioned, the
>>> mutex wait is worrying us too.
>>> We have two direct questions about that.
>>>
>>> 1) What are the negative effects of increasing hash slots (too high)?
>
>It somehow defines the initial size of a hash table which is used for lock(ed)
>object lookup by a key (= hash value), ideally with constant O(1) run-time
>complexity.
>If the hash table is too small, due to a too small value for hash slots, it
>starts to degenerate into a linked/linear list per hash slot, worst case
>resulting in O(n) complexity for lookups. The above 20011 setting shows an
>AVG hash length which looks fine.
>
>As you might know, Classic, having a dedicated process per connection model,
>somehow needs a (global) mechanism to synchronize/protect shared data
>structures across these processes via IPC. This is what the lock manager and
>the lock table are used for.
>
>>> 2) As far as we know, we can't influence mutex wait directly (it's just
>>> informational). But do you think that's the reason the underlying hardware
>>> is not utilized?
>
>I don't think you are disk IO bound. That means I'm not convinced that faster
>IO will help, which is somehow backed by the high mutex wait. Under normal
>operations you see 100-500 IOPS with some room for further increase, as shown
>in the 1700 IOPS backup use case. I don't know how random the disk IO is in
>these two scenarios. Any chance to run some sort of disk IO benchmark, or do
>you already know the upper limits of your SAN, IOPS-wise?
>
>>> We do consider upgrading to 2.5, but have had our eyes on FB 3 over the
>>> last year, waiting for it to get ready.
>>> We have tested with 2.5.x for a long time now, but never found a real
>>> reason to upgrade - since it's a considerable amount of work for us. When
>>> you say it improves lock contention, this sounds pretty good. But again the
>>> question: do you think lock contention is limiting our system?
>
>Dmitry, Vlad etc. will correct me (in case they are following the thread), but
>I recall that 2.5, especially SuperClassic being multi-threaded per worker
>process compared to Classic, now also allows specific(?) lock manager
>operations in parallel to regular request processing. In general I remember a
>mentioned improvement of ~25% in a TPC-C style workload with SuperClassic
>compared to Classic.
>
>>> First and foremost, we would really like to find the bottleneck. We just
>>> don't have the know-how to imagine something like "FB 2.1 engine is
>>> limiting us because of ..." and without that knowledge it's hard to take
>>> actions like upgrading to 2.5.
>>>
>>> We'll try to collect information about the garbage we create :) We do run
>>> "Sinatica Monitoring" on the server, which shows us "Awaiting Garbage
>>> Collection" transactions. Is that the information you're looking for?
>
>I'm not familiar with Sinatica. Perhaps the periodic MON$ queries (how
>frequently are they executed by Sinatica?) also produce some sort of overhead,
>because each MON$ table query in the context of a new physical transaction
>results in a stable view of the current activity. Possibly not negligible
>with > 400 connections.
>
>The easiest way to get insights into your record garbage is, e.g.:
>
>* Run gstat -r
>* Run a tool from IBSurgeon (can't recall the name, Alexey?)
>* Run a tool from Upscene (FB TraceManager)
>
>>> Maybe to avoid confusion, we don't have normal "spikes" .. the system just
>>> starts to slow down and this state remains until the server load is gone
>>> (after midnight, when the software is not used anymore).
>
>--
>With regards,
>Thomas Steinmaurer
>http://www.upscene.com
>
>Professional Tools and Services for Firebird
>FB TraceManager, IB LogManager, Database Health Check, Tuning etc.
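
For reference, the record-garbage question discussed above can also be watched without a third-party tool, using the MON$ tables Thomas mentions. The following is only a minimal sketch, assuming Firebird 2.1 or later (where MON$DATABASE and MON$TRANSACTIONS exist) and an isql session against the live database:

-- Gap between the oldest transaction still relevant for garbage collection
-- and the newest one; a steadily growing gap means record versions pile up
-- until the next GC peak.
SELECT MON$OLDEST_TRANSACTION AS oldest_tx,
       MON$OLDEST_ACTIVE      AS oldest_active_tx,
       MON$NEXT_TRANSACTION   AS next_tx,
       MON$NEXT_TRANSACTION - MON$OLDEST_ACTIVE AS tx_gap
FROM MON$DATABASE;

-- Currently active transactions; long-running ones keep the gap open.
-- MON$STATE = 1 means active.
SELECT MON$TRANSACTION_ID, MON$ATTACHMENT_ID, MON$TIMESTAMP, MON$ISOLATION_MODE
FROM MON$TRANSACTIONS
WHERE MON$STATE = 1
ORDER BY MON$TRANSACTION_ID;

Sampling tx_gap shortly before and after one of the 20-minute GC peaks, and comparing it with the gstat -r record version counts, should show whether the garbage backlog is what slows down the concurrent transactions. Note that each such query opens its own transaction and snapshot, so it should not be run too frequently with several hundred connections attached.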