Hi,

You wrote:

>The thing is, sure these numbers look really low. But the system never uses
>it. The monitoring of the SAN shows that these load levels are never reached.

You are confusing cause and effect. Monitoring shows low numbers because spinning drives cannot provide fast enough random reads.

>From Sinatica, every 20 minutes a peak in GC for ~15,000 transactions

Sinatica uses really strange terms to describe transaction behaviour, and since it is an abandoned tool, nobody can explain how they align with the real situation.

--
Regards,
Alexey Kovyazin
IBSurgeon

Wednesday, April 13, 2016, 05:08 +03:00, from "thetr...@yahoo.com [firebird-support]" <firebird-support@yahoogroups.com>:
>Hey Thomas,
>thanks for your extensive reply.
>Unfortunately we're still bound to some old 32-bit UDF functionality which we
>can't get in 64-bit. I think you know about the use of SuperClassic with a
>32-bit server - the 2 GB RAM limit :)
>It's not impossible, but also not really a fast route we can go. But for sure
>again a reason to talk about moving the switch to 2.5.
>
>We did run some disk IO benchmarks (with AS SSD) today, and in times of SSD
>the results are kinda depressing :D
>The thing is, sure these numbers look really low. But the system never uses
>it. The monitoring of the SAN shows that these load levels are never reached.
>The single 4k read is worrying me, but I lean towards our 500 processes being
>more like the 64-thread test. But even then, we only measured 100 IOPS reading
>on the live system.
>
>Sequential read speed: ~450 MB/s
>Sequential write speed: ~500 MB/s
>4k read: 196 IOPS
>4k write: 1376 IOPS
>4k-64 thread read: 15945 IOPS
>4k-64 thread write: 7361 IOPS
>
>Garbage info still needs to be collected.
>But first signs show that this could indeed be a potential problem.
>From Sinatica, every 20 minutes a peak in GC for ~15,000 transactions. This
>gets fixed by the server in a relatively small amount of time (I think < 1
>minute), since it's really only a single peak in the graph every time.
>When the GC stops increasing and the server starts to collect it, we see an
>increase in concurrently running transactions (= transactions stay open longer
>and are processed more slowly).
>
>We don't have data from the live system yet to see if this behaviour kind of
>"snowballs" when there is really high load on the server.
>
>Best Regards,
>
>---In firebird-support@yahoogroups.com, <ts@...> wrote :
>
>Hi Patrick,
>
>>> Hi Thomas, nice to get a response from you. We already met in ~2010 in Linz
>>> at your office :)
>>> (ex. SEM GmbH, later Playmonitor GmbH)
>
>I know. XING (Big Brother) is watching you. Nice to see that you are still
>running with Firebird. ;-)
>
>>> First, sorry for posting a mixed state of information. The config settings
>>> I posted are the current settings.
>>> But the lock table header was from last Saturday (day of the total system
>>> crash) - we changed the hash slot value since then, but it didn't work. The
>>> new table looks like:
>>>
>>> LOCK_HEADER BLOCK
>>> Version: 16, Active owner: 0, Length: 134247728, Used: 55790260
>>> Semmask: 0x0, Flags: 0x0001
>>> Enqs: 1806423519, Converts: 4553851, Rejects: 5134185, Blocks: 56585419
>>> Deadlock scans: 82, Deadlocks: 0, Scan interval: 10
>>> Acquires: 2058846891, Acquire blocks: 321584126, Spin count: 0
>>> Mutex wait: 15.6%
>>> Hash slots: 20011, Hash lengths (min/avg/max): 0/ 7/ 18
>>> Remove node: 0, Insert queue: 0, Insert prior: 0
>>> Owners (297): forward: 385160, backward: 38086352
>>> Free owners (43): forward: 52978748, backward: 20505128
>>> Free locks (41802): forward: 180712, backward: 3620136
>>> Free requests (-1097572396): forward: 46948676, backward: 13681252
>>> Lock Ordering: Enabled
>>>
>>> The min/avg/max hash lengths look better now, but as you mentioned, the
>>> mutex wait is worrying us too.
>>> We have two direct questions about that.
>>>
>>> 1) What are the negative effects of increasing hash slots (too high)?
>
>It somehow defines the initial size of a hash table which is used for lock(ed)
>object lookup by a key (= hash value), ideally with constant O(1) run-time
>complexity.
>If the hash table is too small, due to a too small value for hash slots, it
>starts to degenerate into a linked/linear list per hash slot, worst case
>resulting in O(n) complexity for lookups. The above 20011 setting shows an
>AVG hash length which looks fine.
>
>As you might know, Classic, having a dedicated process per connection model,
>somehow needs a (global) mechanism to synchronize/protect shared data
>structures across these processes via IPC. This is what the lock manager and
>the lock table are used for.
>
>>> 2) As far as we know, we can't influence mutex wait directly (it's just
>>> informational). But do you think that's the reason the underlying hardware
>>> is not utilized?
>
>I don't think you are disk IO bound. That means I'm not convinced that faster
>IO will help, which is somehow backed by the high mutex wait. Under normal
>operations you see 100-500 IOPS with some room for further increase, as shown
>in the 1700 IOPS backup use case. I don't know how random the disk IO is in
>these two scenarios. Any chance to run some sort of disk IO benchmark, or do
>you already know the upper limits of your SAN, IOPS-wise?
>
>>> We do consider upgrading to 2.5, but have had our eyes on FB 3 over the
>>> last year, waiting for it to get ready.
>>> We have tested with 2.5.x for a long time now, but never found a real
>>> reason to upgrade - since it's a considerable amount of work for us. When
>>> you say it improves lock contention, this sounds pretty good. But again the
>>> question: do you think lock contention is limiting our system?
>
>Dmitry, Vlad etc. will correct me (in case they are following the thread), but
>I recall that 2.5, especially SuperClassic being multi-threaded per worker
>process compared to Classic, now also allows specific(?) lock manager
>operations in parallel to regular request processing. In general I remember a
>mentioned improvement of ~25% in a TPC-C style workload with SuperClassic
>compared to Classic.
>
>>> First and foremost, we would really like to find the bottleneck. We just
>>> don't have the know-how to imagine something like "FB 2.1 engine is
>>> limiting us because of ..." and without that knowledge it's hard to take
>>> actions like upgrading to 2.5.
>>>
>>> We'll try to collect information about the garbage we create :) We do run
>>> "Sinatica Monitoring" on the server, which shows us "Awaiting Garbage
>>> Collection" transactions. Is that the information you're looking for?
>
>I'm not familiar with Sinatica. Perhaps the periodic MON$ queries (how
>frequently are they executed by Sinatica?) also produce some sort of overhead,
>because each MON$ table query in the context of a new physical transaction
>results in a stable view of the current activity. Possibly not negligible
>with > 400 connections.
>
>The easiest way to get insights into your record garbage is, e.g.:
>
>* Run gstat -r
>* Run a tool from IBSurgeon (can't recall the name, Alexey?)
>* Run a tool from Upscene (FB TraceManager)
>
>>> Maybe to avoid confusion, we don't have normal "spikes" .. the system just
>>> starts to slow down and this state remains until the server load is gone
>>> (after midnight, when the software is not used anymore).
>
>--
>With regards,
>Thomas Steinmaurer
>http://www.upscene.com
>
>Professional Tools and Services for Firebird
>FB TraceManager, IB LogManager, Database Health Check, Tuning etc.
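
For reference, the record-garbage question discussed above can also be watched without a third-party tool, using the MON$ tables Thomas mentions. The following is only a minimal sketch, assuming Firebird 2.1 or later (where MON$DATABASE and MON$TRANSACTIONS exist) and an isql session against the live database:

-- Gap between the oldest transaction still relevant for garbage collection
-- and the newest one; a steadily growing gap means record versions pile up
-- until the next GC peak.
SELECT MON$OLDEST_TRANSACTION AS oldest_tx,
       MON$OLDEST_ACTIVE      AS oldest_active_tx,
       MON$NEXT_TRANSACTION   AS next_tx,
       MON$NEXT_TRANSACTION - MON$OLDEST_ACTIVE AS tx_gap
FROM MON$DATABASE;

-- Currently active transactions; long-running ones keep the gap open.
-- MON$STATE = 1 means active.
SELECT MON$TRANSACTION_ID, MON$ATTACHMENT_ID, MON$TIMESTAMP, MON$ISOLATION_MODE
FROM MON$TRANSACTIONS
WHERE MON$STATE = 1
ORDER BY MON$TRANSACTION_ID;

Sampling tx_gap shortly before and after one of the 20-minute GC peaks, and comparing it with the gstat -r record version counts, should show whether the garbage backlog is what slows down the concurrent transactions. Note that each such query opens its own transaction and snapshot, so it should not be run too frequently with several hundred connections attached.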