Re: [PERFORM] PostgreSQL performance problem - tuning
Hi, All!

Richard Huxton wrote:

> On Thursday 07 August 2003 17:30, Yaroslav Mazurak wrote:
>> Richard Huxton wrote:
>>> On Thursday 07 August 2003 09:24, Yaroslav Mazurak wrote:
>
> PG's memory use can be split into four areas (note - I'm not a
> developer so this could be wrong).
> 1. Shared memory - vital so that different connections can communicate
>    with each other. Shouldn't be too large, otherwise PG spends too
>    long managing its shared memory rather than working on your queries.
> 2. Sort memory - if you have to sort results during a query it will use
>    up to the amount you define in sort_mem and then use disk if it
>    needs any more. This is for each sort.
> 3. Results memory - if you're returning 8000 rows then PG will assemble
>    these and send them to the client, which also needs space to store
>    the 8000 rows.
> 4. Working memory - to actually run the queries: stack and heap space
>    to keep track of its calculations etc.

Hence, total free RAM - shared_buffers - k * sort_mem -
effective_cache_size == (results memory + working memory)?

> For the moment, I'd leave the settings roughly where they are while we
> look at the query; once that's out of the way we can fine-tune the
> settings.

OK. Additionally, with the user-defined function beginchar
(SUBSTR(..., 1, 1)), used for indexing, showcalc executes in about 16
seconds. With the built-in SUBSTR the same showcalc executes in 6
seconds.

> Fair enough - substr should be fairly efficient.

Is the cost of a user-defined SQL function call in PostgreSQL high?

> OK - bear in mind that these suggestions are made without the benefit
> of the explain analyse:
>
> 1. You could try splitting out the various tags of your mask into
> different fields - that will instantly eliminate all the substr() calls
> and might make a difference. If you want to keep the mask for display
> purposes, we could build a trigger to keep it in sync with the separate
> flags.

This will be the next step. :)

> 2. Use a calculations table and build your results step by step.
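For what it's worth, suggestion 1 might be sketched roughly like this. This is only an illustration under assumptions: the column names acc_code, acc_char and sub_show are invented, and the mask layout is inferred from the SUBSTR() calls in showcalc:

```sql
-- Hypothetical split of acc_mask into separate columns (names invented).
ALTER TABLE showcomp ADD COLUMN acc_code VARCHAR(4);  -- chars 1-4 of a plain mask
ALTER TABLE showcomp ADD COLUMN acc_char VARCHAR(1);  -- char 5: account characteristic
ALTER TABLE showcomp ADD COLUMN sub_show VARCHAR(10); -- inner code when mask is '[...]'

-- One-time backfill from the existing masks.
UPDATE showcomp SET
  acc_code = CASE WHEN SUBSTR(acc_mask, 1, 1) <> '['
                  THEN SUBSTR(acc_mask, 1, 4) END,
  acc_char = CASE WHEN SUBSTR(acc_mask, 1, 1) <> '['
                  THEN SUBSTR(acc_mask, 5, 1) END,
  sub_show = CASE WHEN SUBSTR(acc_mask, 1, 1) = '['
                  THEN SUBSTR(acc_mask, 2, LENGTH(acc_mask) - 2) END;

-- The inner SELECTs of showcalc could then compare plain columns:
--   WHERE s.kod = $1 AND acc_code = $3 AND acc_char = SUBSTR($2, 1, 1)
-- and each column can carry an ordinary btree index; acc_mask itself
-- stays around for display, kept in sync by a trigger as Richard suggests.
```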
> So - calculate all the simple accounts, then calculate the ones that
> contain the simple accounts.

I give SQL to the user, plus a few helper functions. Therefore a single
step is required for building results.

> 3. You could keep a separate account_contains table that might look
> like:
>
>    acc_id | contains
>    -------+---------
>    A001   | A001
>    A002   | A002
>    A003   | A003
>    A003   | A001
>    A004   | A004
>    A004   | A003
>    A004   | A001
>
> So here A001/A002 are simple accounts but A003 contains A001 too.
> A004 contains A003 and A001. The table can be kept up to date
> automatically using some triggers. This should make it simple to pick
> up all the accounts contained within the target account and might mean
> you can eliminate the recursion.

Thanks, that doesn't sound bad, but I suspect this method won't
essentially improve performance. I'm thinking about another secondary
table for showcomp (compshow :)) with showings compiled into account
numbers and characteristics. After inserting or updating a showing, it
would be recompiled - by an explicit function call or a trigger - into
atomic account numbers and characteristics.

> Post the EXPLAIN ANALYSE first - maybe someone smarter than me will
> have an idea.

First result: simple showing 'B00202' (without recursion). Second
result: complex showing 'B00204' with recursion (1 level deep). Showing
'B00202' contains 85 accounts, 'B00203' contains 108 accounts, and
'B00204' = 'B00202' - 'B00203'.
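Suggestion 3 might be sketched as below. This is only a sketch under assumptions - the table definition and the one-level expansion are mine, and a real version would need triggers (or repeated expansion) to keep the transitive closure current:

```sql
-- Hypothetical closure table for "which showings/accounts does X contain".
CREATE TABLE account_contains (
    acc_id   VARCHAR(10) NOT NULL,
    contains VARCHAR(10) NOT NULL,
    PRIMARY KEY (acc_id, contains)
);

-- Every showing contains itself...
INSERT INTO account_contains SELECT kod, kod FROM showing;

-- ...plus everything inside its bracketed components. Shown here as a
-- one-level expansion; re-run until no new rows appear (or maintain the
-- table with triggers) to close deeper chains.
INSERT INTO account_contains
SELECT s.kod, SUBSTR(sc.acc_mask, 2, LENGTH(sc.acc_mask) - 2)
FROM showing AS s NATURAL JOIN showcomp AS sc
WHERE SUBSTR(sc.acc_mask, 1, 1) = '[';

-- Picking up everything under a target account is then a plain lookup,
-- with no recursion at query time:
SELECT contains FROM account_contains WHERE acc_id = 'A004';
```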
Query text:

EXPLAIN ANALYZE
SELECT COALESCE(
  (SELECT sc.koef * 100
     FROM showing AS s NATURAL JOIN showcomp AS sc
     WHERE s.kod = 'B00202'
       AND NOT SUBSTR(acc_mask, 1, 1) = '['
       AND SUBSTR(acc_mask, 1, 4) = '6010'
       AND SUBSTR(acc_mask, 5, 1) = SUBSTR('20', 1, 1)),
  (SELECT SUM(sc.koef * COALESCE(showcalc(SUBSTR(acc_mask, 2,
          LENGTH(acc_mask) - 2), '20', '6010', 100), 0))
     FROM showing AS s NATURAL JOIN showcomp AS sc
     WHERE s.kod = 'B00202'
       AND SUBSTR(acc_mask, 1, 1) = '['),
  0) AS showing;

EXPLAIN ANALYZE
SELECT COALESCE(
  (SELECT sc.koef * 100
     FROM showing AS s NATURAL JOIN showcomp AS sc
     WHERE s.kod = 'B00204'
       AND NOT SUBSTR(acc_mask, 1, 1) = '['
       AND SUBSTR(acc_mask, 1, 4) = '6010'
       AND SUBSTR(acc_mask, 5, 1) = SUBSTR('20', 1, 1)),
  (SELECT SUM(sc.koef * COALESCE(showcalc(SUBSTR(acc_mask, 2,
          LENGTH(acc_mask) - 2), '20', '6010', 100), 0))
     FROM showing AS s NATURAL JOIN showcomp AS sc
     WHERE s.kod = 'B00204'
       AND SUBSTR(acc_mask, 1, 1) = '['),
  0) AS showing;

QUERY PLAN
------------------------------------------------------------------------
Result (cost=0.00..0.01 rows=1 width=0) (actual time=704.39..704.39 rows=1 loops=1)
  InitPlan
    ->  Hash Join (cost=5.22..449.63 rows=1 width=19) (actual time=167.28..352.90 rows=1 loops=1)
          Hash Cond: ("outer".id_show = "inner".id_show)
          ->  Seq Scan on showcomp sc (cost=0.00..444.40 rows=1 width=15) (actual time=23.29..350.17 rows
=1.00..10090.02 rows=29 width=11)
                Filter: ((substr((box)::text, 2, 2) = 'NL'::text) OR (substr((box)::text, 2, 2) = 'NM'::text))
          ->  Index Scan using index_dov_tvbv_bnk on dov_tvbv (cost=0.00..142.42 rows=2334 width=36)
          ->  Hash (cost=5.83..5.83 rows=16 width=10)
                ->  Index Scan using ek_pok_r_pkey on ek_pok_r epr (cost=0.00..5.83 rows=16 width=10)
          ->  Hash (cost=178.15..178.15 rows=2100 width=11)
                ->  Index Scan using kl_r020_pkey on kl_r020 (cost=0.00..178.15 rows=2100 width=11)
          ->  Hash (cost=15.26..15.26 rows=1 width=4)
                ->  Index Scan using kl_r030_pkey on kl_r030 r030 (cost=0.00..15.26 rows=1 width=4)
                      Filter: ((r030)::text = '980'::text)
          ->  Hash (cost=3.04..3.04 rows=4 width=4)
                ->  Index Scan using kl_k041_pkey on kl_k041 (cost=0.00..3.04 rows=4 width=4)
(45 rows)

Function showcalc definition is:

CREATE OR REPLACE FUNCTION showcalc(VARCHAR(10), VARCHAR(2), VARCHAR(4),
    NUMERIC(16)) RETURNS NUMERIC(16) LANGUAGE SQL AS '
-- Parameters: code, dd, r020, t071
SELECT COALESCE(
  (SELECT sc.koef * $4
     FROM showing AS s NATURAL JOIN showcomp AS sc
     WHERE s.kod LIKE $1
       AND NOT SUBSTR(acc_mask, 1, 1) LIKE ''[''
       AND SUBSTR(acc_mask, 1, 4) LIKE $3
       AND SUBSTR(acc_mask, 5, 1) LIKE SUBSTR($2, 1, 1)),
  (SELECT SUM(sc.koef * COALESCE(showcalc(SUBSTR(acc_mask, 2,
          LENGTH(acc_mask) - 2), $2, $3, $4), 0))
     FROM showing AS s NATURAL JOIN showcomp AS sc
     WHERE s.kod LIKE $1
       AND SUBSTR(acc_mask, 1, 1) LIKE ''[''),
  0) AS showing;
';

View v_file02wide is:

CREATE VIEW v_file02wide AS
SELECT id_a011 AS a011, data, obl.ko, obl.nazva AS oblast, b030,
       banx.box AS ncks, banx.nazva AS bank,
       epr.dd, r020, r030, a3, r030.nazva AS valuta, k041,
       -- Sum equivalent in national currency
       t071 * get_kurs(id_r030, data) AS t070,
       t071
FROM v_file02 AS vf02
  JOIN kod_obl AS obl USING(id_obl)
  JOIN (dov_bank NATURAL JOIN dov_tvbv) AS banx
    ON banx.id_v = vf02.id_v
   AND data BETWEEN COALESCE(banx.dataa, data)
                AND COALESCE(banx.datab, data)
  JOIN ek_pok_r AS epr USING(id_dd)
  JOIN kl_r020 USING(id_r020)
  JOIN kl_r030 AS r030 USING(id_r030)
  JOIN kl_k041 USING(id_k041);

Function inrepdate is:

CREATE OR REPLACE FUNCTION inrepdate(DATE) RETURNS BOOL LANGUAGE SQL AS '
-- Returns true if given date is in repdate
SELECT (SELECT COUNT(*)
          FROM repdate
          WHERE $1 BETWEEN COALESCE(data1, CURRENT_DATE)
                       AND COALESCE(data2, CURRENT_DATE)) > 0;
';

Table sizes (records):

  filexxr        34712
  file02        816589
  v_file02      816589
  kod_obl           43
  banx            2334
  ek_pok_r          16
  kl_r020         2100
  kl_r030          208
  kl_r041            4
  v_file02wide
  showing         2787
  showcomp       13646
  repdate            1

The tables have indexes on almost all selected fields. showcalc in this
query selects and uses 195 rows. The total query result is 8066 records
(COUNT(*) executes in about 33 seconds and uses 120MB of RAM).

With best regards,
Yaroslav Mazurak.
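A side note on the beginchar function mentioned earlier: in PostgreSQL 7.3 a functional index can only be built on func(column) with no extra constant arguments, which is presumably why a wrapper around SUBSTR was needed. A sketch, assuming the wrapper simply returns the first character (the function and index names here are invented):

```sql
-- Hypothetical wrapper: 7.3 functional indexes cannot embed constants,
-- so SUBSTR(acc_mask, 1, 1) must be wrapped in a one-argument function.
CREATE OR REPLACE FUNCTION beginchar(VARCHAR) RETURNS VARCHAR
LANGUAGE SQL IMMUTABLE AS '
  SELECT SUBSTR($1, 1, 1);
';

CREATE INDEX idx_showcomp_beginchar ON showcomp (beginchar(acc_mask));

-- The planner only considers this index when the query spells the
-- predicate the same way, e.g.:
--   WHERE beginchar(acc_mask) = '[' ...
```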
Re: [PERFORM] PostgreSQL performance problem - tuning
Hi All!

First, thanks for answers!

Richard Huxton wrote:

> On Thursday 07 August 2003 09:24, Yaroslav Mazurak wrote:
>> IIRC there is a limit on filesystem cache on freeBSD. 300MB by
>> default. If that is the case, you might have to raise it to make
>> effective_cache_size really effective..

Trying various sysctls tells me nothing. I want to use *all available
RAM* (of course, minus what the OS needs) for PostgreSQL.

> PG will be using the OS' disk caching.

I think all applications use OS disk caching. ;) Or do you mean that
PostgreSQL is tuned for an OS-specific cache implementation? Do you know
a method for examining the real size of the OS filesystem cache? If I
understood right, PostgreSQL dynamically uses all available RAM minus
shared_buffers minus k * sort_mem minus effective_cache_size? I want to
configure PostgreSQL to use the _maximum_ of available RAM.

> Looks fine - PG isn't growing too large and your swap usage seems
> steady. We can try upping the sort memory later, but given the amount
> of data you're dealing with I'd guess 64MB should be fine.
>
> I think we're going to have to break the query down a little and see
> where the issue is. What's the situation with:
>
> EXPLAIN ANALYZE SELECT some_field FROM v_file02wide WHERE a011 = 3
>   AND inrepdate(data) AND b030 IN (SELECT b030 FROM dov_bank
>   WHERE dov_bank_box_22(box) IN ('NL', 'NM')) AND r030 = 980;
>
> and:
>
> EXPLAIN ANALYZE SELECT SUM(showcalc(parameters)) FROM something simple
>
> Hopefully one of these will run in a reasonable time, and the other
> will not. Then we can examine the slow query in more detail. Nothing
> from your previous EXPLAIN (email of yesterday 13:42) looks
> unreasonable, but something must be going wild in the heart of the
> query, otherwise you wouldn't be here.

Yes, you're right. I've tested a few statements and obtained interesting
results. SELECT * FROM v_file02wide WHERE... executes in about 34
seconds. SELECT showcalc(...); executes from 0.7 seconds (without
recursion) up to 6.3 seconds if recursion is used!
:( This means that the approximate execution time for the fully
qualified SELECT with about 8K rows is... about 13 hours! :-O

Hence, the problem is in my function showcalc:

CREATE OR REPLACE FUNCTION showcalc(VARCHAR(10), VARCHAR(2), VARCHAR(4),
    NUMERIC(16)) RETURNS NUMERIC(16) LANGUAGE SQL STABLE AS '
-- Parameters: code, dd, r020, t071
SELECT COALESCE(
  (SELECT sc.koef * $4
     FROM showing AS s NATURAL JOIN showcomp AS sc
     WHERE s.kod = $1
       AND NOT SUBSTR(acc_mask, 1, 1) = ''[''
       AND SUBSTR(acc_mask, 1, 4) = $3
       AND SUBSTR(acc_mask, 5, 1) = SUBSTR($2, 1, 1)),
  (SELECT SUM(sc.koef * COALESCE(showcalc(SUBSTR(acc_mask, 2,
          LENGTH(acc_mask) - 2), $2, $3, $4), 0))
     FROM showing AS s NATURAL JOIN showcomp AS sc
     WHERE s.kod = $1
       AND SUBSTR(acc_mask, 1, 1) = ''[''),
  0) AS showing;
';

BTW, a cross join with a WHERE clause doesn't improve performance
relative to NATURAL JOIN. Additionally, with the user-defined function
beginchar (SUBSTR(..., 1, 1)), used for indexing, showcalc executes in
about 16 seconds. With the built-in SUBSTR the same showcalc executes in
6 seconds.

Table showing contains information about showings: showing id (id_show),
code (kod) and description (opys). Table showcomp contains information
about showing components (accounts): showing id (id_show), coefficient
(koef) and account mask (acc_mask). An account mask is either a 4-char
balance account mask || a 1-char account characteristic, or another
showing in square brackets. Example:

showing
=========+==========+============
 id_show | kod      | opys
=========+==========+============
 1       | 'A00101' | 'Received'
 2       | 'A00102' | 'Sent'
 3       | 'A00103' | 'Total'
=========+==========+============

showcomp
=========+======+============
 id_show | koef | acc_mask
=========+======+============
 1       |  1.0 | '60102'
 1       |  1.0 | '60112'
 2       |  1.0 | '70011'
 2       |  1.0 | '70021'
 3       |  1.0 | '[A00101]'
 3       | -1.0 | '[A00102]'
=========+======+============

This means that A00101 includes accounts 6010 and 6011 with
characteristic 2, A00102 includes accounts 7001 and 7002 with
characteristic 1, and A00103 = A00101 - A00102. In almost all cases the
recursion depth doesn't exceed 1 level, but I'm not sure.
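Walking the sample data above through showcalc may make the recursion cost concrete. For 'A00103' the first branch finds no plain-mask rows, so the second branch recurses into the bracketed components (a sketch only - parameter values are illustrative):

```sql
-- showcalc('A00103', '20', '6010', 100) evaluates roughly as:
--
--   COALESCE(NULL,                       -- no plain-mask rows for A00103
--       1.0  * showcalc('A00101', '20', '6010', 100)
--     + (-1.0) * showcalc('A00102', '20', '6010', 100),
--     0)
--
-- Each inner call scans showing NATURAL JOIN showcomp again, so every
-- recursive level multiplies the showcomp scans - and with ~8K outer
-- rows calling showcalc, that multiplication is where the time goes.
SELECT showcalc('A00103', '20', '6010', 100);
```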
:) View v_file02wide contains an account (r020) and a 2-char
characteristic (dd). Using showcalc I want to sum numbers (t071) on
accounts included in the appropriate showings. I.e. SELECT
SUM(showcalc('A00101', dd, r020, t071)) FROM ... must return the sum on
accounts 6010 and 6011 with characteristic 2, etc. Now I'm thinking
about changing the function showcalc and/or these data structures... :)
Anyway, 600MB is too little for PostgreSQL to execute my query - the
DBMS raised an error after 11.5 hours (of an estimated 13?). :(

With best regards,
Yaroslav Mazurak.

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command
Re: [PERFORM] PostgreSQL performance problem - tuning
Hi All!

Shridhar Daithankar wrote:

> On 7 Aug 2003 at 10:05, Yaroslav Mazurak wrote:
>> It needs to reflect how much cache the system is using - try the free
>> command to see figures.

I haven't found a free utility on FreeBSD 4.7. :(

> <rant>
> Grr.. I don't like freeBSD for its top output. Active/Inactive/Wired..
> Grr.. why can't it be shared/buffered/cached? Same goes for HP-UX top.
> Looking at it one gets hardly any real information.. Anyway that's just
> me..
> </rant>

Grr... I don't like PostgreSQL for its memory usage parameters. In
Sybase ASA I say, for example: use 64MB of RAM for cache. I don't worry
about what's in this cache - it may be queries, sort areas, results,
etc. I think the server knows best about its own memory requirements. I
know that Sybase *uses*, and uses *only*, this memory and doesn't fail
with a "Memory exhausted" error. I don't remember 700-minute queries
(more complex than my query) followed by a memory exhausted error on
Sybase. Advertising, eh? :(

> Top on freeBSD seems pretty unintuitive, hmm, but if you find any
> documentation on that, that would help you. (Haven't booted into
> freeBSD in ages so no data out of my head..) You can try various
> sysctls on freeBSD. Basically the idea is to find out how much memory
> is used and how much is cached. FreeBSD must be providing that in some
> form..
>
> IIRC there is a limit on filesystem cache on freeBSD. 300MB by default.
> If that is the case, you might have to raise it to make
> effective_cache_size really effective..

Trying various sysctls tells me nothing. I want to use *all available
RAM* (of course, minus what the OS needs) for PostgreSQL.
While idle, top says:

Mem: 14M Active, 1944K Inact, 28M Wired, 436K Cache, 48M Buf, 331M Free
Swap: 368M Total, 17M Used, 352M Free, 4% Inuse

After 1 minute of executing

EXPLAIN ANALYZE
SELECT SUM(showcalc('B00204', dd, r020, t071))
FROM v_file02wide
WHERE a011 = 3
  AND inrepdate(data)
  AND b030 IN (SELECT b030 FROM dov_bank
               WHERE dov_bank_box_22(box) IN ('NL', 'NM'))
  AND r030 = 980;

top says:

Mem: 64M Active, 17M Inact, 72M Wired, 436K Cache, 48M Buf, 221M Free
Swap: 368M Total, 3192K Used, 365M Free

  PID USERNAME PRI NICE   SIZE    RES STATE   TIME   WCPU    CPU COMMAND
59063 postgres  49    0 65560K 55492K RUN     1:06 94.93% 94.63% postgres

After 12 minutes of query execution:

Mem: 71M Active, 17M Inact, 72M Wired, 436K Cache, 48M Buf, 214M Free
Swap: 368M Total, 3192K Used, 365M Free

  PID USERNAME PRI NICE   SIZE    RES STATE   TIME   WCPU    CPU COMMAND
59063 postgres  56    0 73752K 62996K RUN    12:01 99.02% 99.02% postgres

I suspect that the swap-file size is too small for my query... but the
query result isn't too large, only about 8K rows. :-|

> Shridhar

With best regards,
Yaroslav Mazurak.
Re: [PERFORM] PostgreSQL performance problem - tuning
Hi All!

Richard Huxton wrote:

> On Wednesday 06 August 2003 08:34, Yaroslav Mazurak wrote:
>>> sort_mem = 131072
>> This sort_mem value is *very* large - that's 131MB for *each sort*

It's not TOO large *for PostgreSQL*. When I'm inserting a large amount
of data into tables, sort_mem helps. A value of 192MB speeds up
inserting significantly (verified :))! What does "each sort" mean? Each
query with a SORT clause, or some internal (invisible to the user) sorts
too (I can't imagine: indexed search or whatever else)? I've reduced
sort_mem to 16MB.

> It means each sort - if you look at your query plan and see three sort
> clauses, that means that query might allocate 48MB to sorting. Now,
> that's good because sorting items on disk is much slower. It's bad
> because that's 48MB less for everything else that's happening.

OK, I'm preparing to fix this value. :) IMHO this is PostgreSQL's lack
of memory management. I think that PostgreSQL could finally allocate
enough memory by itself! :-E

This is another strange behavior of PostgreSQL - it doesn't use some
created indexes (seq_scan only), even after ANALYZE.

OK, I've turned this option back on.

> Fair enough, we can work on those. With 7.3.x you can tell PG to
> examine some tables more thoroughly to get better plans.

You mean EXPLAIN ANALYZE?

>>> effective_cache_size = 65536
>> So you typically get about 256MB cache usage in top/free?

No, top shows 12-20MB. I've reduced effective_cache_size to 4K blocks
(16MB?).

> Cache size is in blocks of 8KB (usually) - it's a way of telling PG
> what the chances are of disk blocks being already cached by Linux.

PostgreSQL is running on FreeBSD; the memory block actually is 4KB, but
in most cases the documentation says 8KB... I don't know the real disk
block size exactly, but I suspect that it's 4KB. :) I think this is an
important remark. Can JOIN significantly reduce the performance of a
SELECT statement relative to a comma-separated FROM list with WHERE? OK,
I've changed the VIEW to this text:

> It can sometimes.
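To keep the units straight: sort_mem is in kilobytes, while shared_buffers and effective_cache_size are counted in disk pages (8KB by default). A hedged postgresql.conf sketch with the values being discussed in this thread, and the arithmetic spelled out:

```sql
-- postgresql.conf sketch (values under discussion, not a recommendation):
--
-- shared_buffers       = 2048    -- 2048 pages  * 8KB = 16MB shared memory
-- sort_mem             = 16384   -- 16384 KB          = 16MB per sort step
-- effective_cache_size = 16384   -- 16384 pages * 8KB = 128MB assumed OS cache
--
-- Note the original sort_mem = 131072 meant 131072 KB = 128MB *per sort*,
-- and effective_cache_size = 65536 meant 65536 * 8KB = 512MB of assumed
-- cache - far more than the 12-20MB that top actually shows.
```

effective_cache_size allocates nothing; it only tells the planner how much of the database it may assume is already in the OS cache, so it should reflect observed cache usage, not wishful totals.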
> What it means is that PG will follow whatever order you write the joins
> in. If you know joining a to b to c is the best order, that can be a
> good thing. Unfortunately, it means the planner can't make a better
> guess based on its statistics.

At this moment this doesn't help. :(

> Well the cost estimates look much more plausible. You couldn't post
> EXPLAIN ANALYSE could you? That actually runs the query.

Now (2K shared_buffers blocks, 16K effective_cache_size blocks, 16MB
sort_mem) PostgreSQL uses much less memory, about 64MB... it's not good,
I want to use all available RAM if possible - PostgreSQL is the main
task on this PC.

> Don't forget that any memory PG is using, the operating system can't.
> The OS will cache frequently accessed disk blocks for you, so it's a
> question of finding the right balance.

PostgreSQL is the primary task for me on this PC - I don't worry about
other tasks except the OS. ;) Might setting effective_cache_size to
192MB (48K blocks) be better? I don't understand exactly: does
effective_cache_size tell PostgreSQL about the OS cache size or about
available free RAM?

> It needs to reflect how much cache the system is using - try the free
> command to see figures.

I haven't found a free utility on FreeBSD 4.7. :(

> If you could post the output of EXPLAIN ANALYSE rather than EXPLAIN,
> I'll take a look at it this evening (London time). There's also plenty
> of other people on this list who can help too.

I'm afraid that this may take too long. :-((( Yesterday I re-executed my
query with all changes... after 700 (!) minutes the query failed with:
ERROR: Memory exhausted in AllocSetAlloc(104). I don't understand: the
result is actually only 8K rows long, but PostgreSQL failed! Why?!! The
function showcalc is recursive, but in my query it is used with a depth
of 1 level only (I know this exactly). Again: I think that this is
PostgreSQL's lack of quality memory management. :-(

> - Richard Huxton

With best regards,
Yaroslav Mazurak.
Re: [PERFORM] PostgreSQL performance problem - tuning
Hi All!

Tom Lane wrote:

> Yaroslav Mazurak [EMAIL PROTECTED] writes:
>>>> fsync = false
>>> I'd turn fsync back on - unless you don't mind losing your data after
>>> a crash.
>> This is a temporary performance solution - I want to get the SELECT
>> query result first, but current performance is too low.
>
> Disabling fsync will not help SELECT performance one bit. It would only
> affect transactions that modify the database.

Fixed. But at this moment the primary tasks are to *get a result* (1st)
from SELECT in *reasonable* time (2nd). :)

> regards, tom lane

With best regards,
Yaroslav Mazurak.