Hi 2018-04-16 22:42 GMT+02:00 Hackety Man <hackety...@gmail.com>:
> *A description of what you are trying to achieve and what results you > expect.:* > > My end goal was to test the execution time difference between using an > IF(SELECT COUNT(*)...) and an IF EXISTS() when no indexes were used and > when a string match was not found. My expectation was that my 2 functions > would behave fairly similarly, but they most certainly did not. Here are > the table, functions, test queries, and test query results I received, as > well as comments as I present the pieces and talk about the results from my > perspective. > > This is the table and data that I used for my tests. A table with 1 > million sequenced records. No indexing on any columns. I ran ANALYZE on > this table and a VACUUM on the entire database, just to be sure. > > CREATE TABLE zz_noidx1 AS SELECT generate_series(0, 999999) AS > int_distinct, 'Test'::text || generate_series(0, 999999)::text AS > text_distinct; > > These are the 2 functions that I ran my final tests with. My goal was to > determine which function would perform the fastest and my expectation was > that they would still be somewhat close in execution time comparison. > > --Test Function #1 > CREATE OR REPLACE FUNCTION zz_spx_ifcount_noidx(p_findme text) > RETURNS text > LANGUAGE 'plpgsql' > STABLE > AS $$ > > BEGIN > IF (SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = > LOWER(p_findme)) > 0 THEN > RETURN 'Found'; > ELSE > RETURN 'Not Found'; > END IF; > END; > $$; > > --Test Function #2 > CREATE OR REPLACE FUNCTION zz_spx_ifexists_noidx(p_findme text) > RETURNS text > LANGUAGE 'plpgsql' > STABLE > AS $$ > > BEGIN > IF EXISTS (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = > LOWER(p_findme)) THEN > RETURN 'Found'; > ELSE > RETURN 'Not Found'; > END IF; > END; > $$; > > The first thing I did was to run some baseline tests using the basic > queries inside of the IF() checks found in each of the functions to see how > the query planner handled them. I ran the following two queries. > > EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM zz_noidx1 WHERE > LOWER(text_distinct) = LOWER('Test5000001'); > EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE > LOWER(text_distinct) = LOWER('Test5000001'); > > The execution time results and query plans for these two were very > similar, as expected. In the results I can see that 2 workers were > employed for each query plan. > > --Results for the SELECT COUNT(*) query. > QUERY PLAN > > ------------------------------------------------------------ > ------------------------------------------------------------ > ---------------- > Finalize Aggregate (cost=12661.42..12661.43 rows=1 width=8) (actual > time=172.105..172.105 rows=1 loops=1) > Buffers: shared read=1912 > > > -> Gather (cost=12661.21..12661.42 rows=2 width=8) (actual > time=172.020..172.099 rows=3 loops=1) > Workers Planned: 2 > > > Workers Launched: 2 > > > Buffers: shared read=1912 > > > -> Partial Aggregate (cost=11661.21..11661.22 rows=1 width=8) (actual > time=155.123..155.123 rows=1 loops=3) > Buffers: shared read=5406 > > > -> Parallel Seq Scan on zz_noidx1 (cost=0.00..11656.00 rows=2083 > width=0) (actual time=155.103..155.103 rows=0 loops=3) > Filter: (lower(text_distinct) = 'test5000001'::text) > > Rows Removed by Filter: 333333 > > Buffers: shared read=5406 > > Planning time: 0.718 ms > > > Execution time: 187.601 ms > > --Results for the SELECT 1 query. > QUERY PLAN > > ------------------------------------------------------------ > ---------------------------------------------------------------- > Gather (cost=1000.00..13156.00 rows=5000 width=4) (actual > time=175.682..175.682 rows=0 loops=1) > Workers Planned: 2 > > > Workers Launched: 2 > > Buffers: shared read=2021 > > > -> Parallel Seq Scan on zz_noidx1 (cost=0.00..11656.00 rows=2083 > width=4) (actual time=159.769..159.769 rows=0 loops=3) > Filter: (lower(text_distinct) = 'test5000001'::text) > > Rows Removed by Filter: 333333 > > Buffers: shared read=5406 > > Planning time: 0.874 ms > > Execution time: 192.045 ms > > After running these baseline tests and viewing the fairly similar results, > right or wrong, I expected my queries that tested the functions to behave > similarly. I started with the following query... > > EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM zz_spx_ifcount_noidx(' > Test5000001'); > > and I got the following "auto_explain" results... > > 2018-04-16 14:57:22.624 EDT [17812] LOG: duration: 155.239 ms plan: > Query Text: SELECT (SELECT COUNT(*) FROM zz_noidx1 WHERE > LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > 0 > Partial Aggregate (cost=11661.21..11661.22 rows=1 width=8) (actual > time=155.230..155.230 rows=1 loops=1) > Buffers: shared read=1682 > -> Parallel Seq Scan on zz_noidx1 (cost=0.00..11656.00 rows=2083 > width=0) (actual time=155.222..155.222 rows=0 loops=1) > Filter: (lower(text_distinct) = 'test5000001'::text) > Rows Removed by Filter: 311170 > Buffers: shared read=1682 > 2018-04-16 14:57:22.624 EDT [9096] LOG: duration: 154.603 ms plan: > Query Text: SELECT (SELECT COUNT(*) FROM zz_noidx1 WHERE > LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > 0 > Partial Aggregate (cost=11661.21..11661.22 rows=1 width=8) (actual > time=154.576..154.576 rows=1 loops=1) > Buffers: shared read=1682 > -> Parallel Seq Scan on zz_noidx1 (cost=0.00..11656.00 rows=2083 > width=0) (actual time=154.570..154.570 rows=0 loops=1) > Filter: (lower(text_distinct) = 'test5000001'::text) > Rows Removed by Filter: 311061 > Buffers: shared read=1682 > 2018-04-16 14:57:22.642 EDT [15132] LOG: duration: 197.260 ms plan: > Query Text: SELECT (SELECT COUNT(*) FROM zz_noidx1 WHERE > LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > 0 > Result (cost=12661.43..12661.45 rows=1 width=1) (actual > time=179.561..179.561 rows=1 loops=1) > Buffers: shared read=2042 > InitPlan 1 (returns $1) > -> Finalize Aggregate (cost=12661.42..12661.43 rows=1 width=8) (actual > time=179.559..179.559 rows=1 loops=1) > Buffers: shared read=2042 > -> Gather (cost=12661.21..12661.42 rows=2 width=8) (actual > time=179.529..179.556 rows=3 loops=1) > Workers Planned: 2 > Workers Launched: 2 > Buffers: shared read=2042 > -> Partial Aggregate (cost=11661.21..11661.22 rows=1 width=8) > (actual time=162.831..162.831 rows=1 loops=3) > Buffers: shared read=5406 > -> Parallel Seq Scan on zz_noidx1 (cost=0.00..11656.00 rows=2083 > width=0) (actual time=162.824..162.824 rows=0 loops=3) > Filter: (lower(text_distinct) = 'test5000001'::text) > Rows Removed by Filter: 333333 > Buffers: shared read=5406 > 2018-04-16 14:57:22.642 EDT [15132] CONTEXT: SQL statement "SELECT > (SELECT COUNT(*) FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = > LOWER(p_findme)) > 0" > PL/pgSQL function zz_spx_ifcount_noidx(text) line 4 at IF > 2018-04-16 14:57:22.642 EDT [15132] LOG: duration: 199.371 ms plan: > Query Text: explain (analyze, buffers) select * from > zz_spx_ifcount_noidx('Test5000001') > Function Scan on zz_spx_ifcount_noidx (cost=0.25..0.26 rows=1 width=32) > (actual time=199.370..199.370 rows=1 loops=1) > Buffers: shared hit=218 read=5446 > > Here I could see that the 2 workers were getting employed again, which is > great. Just what I expected. And the execution time was in the same > ballpark as my first baseline test using just the query found inside of the > IF() check. 199 milliseonds. Okay. > > I moved on to test the other function with the following query... > > EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM zz_spx_ifcount_noidx(' > Test5000001'); > > and I got the following "auto_explain" results... > > 2018-04-16 14:58:34.134 EDT [12616] LOG: duration: 426.279 ms plan: > Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE > LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > Result (cost=4.08..4.09 rows=1 width=1) (actual time=426.274..426.274 > rows=1 loops=1) > Buffers: shared read=5406 > InitPlan 1 (returns $0) > -> Seq Scan on zz_noidx1 (cost=0.00..20406.00 rows=5000 width=0) > (actual time=426.273..426.273 rows=0 loops=1) > Filter: (lower(text_distinct) = 'test5000001'::text) > Rows Removed by Filter: 1000000 > Buffers: shared read=5406 > 2018-04-16 14:58:34.134 EDT [12616] CONTEXT: SQL statement "SELECT EXISTS > (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = > LOWER(p_findme))" > PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF > 2018-04-16 14:58:34.134 EDT [12616] LOG: duration: 428.077 ms plan: > Query Text: explain (analyze, buffers) select * from > zz_spx_ifexists_noidx('Test5000001') > Function Scan on zz_spx_ifexists_noidx (cost=0.25..0.26 rows=1 width=32) > (actual time=428.076..428.076 rows=1 loops=1) > Buffers: shared hit=30 read=5438 > > Definitely not the execution time, or query plan, results I was > expecting. As we can see, no workers were employed here and my guess was > that this was the reason or the large execution time difference between > these 2 tests? 199 milliseconds versus 428 milliseconds, which is a big > difference. Why are workers not being employed here like they were when I > tested the query found inside of the IF() check in a standalone manner? > But then I ran another test and the results made even less sense to me. > > When I ran the above query the first 5 times after starting my Postgres > service, I got the same results each time (around 428 milliseconds), but > when running the query 6 or more times, the execution time jumps up to > almost double that. Here are the "auto_explain" results running this query > a 6th time... > > --"auto_explain" results after running the same query 6 or more times. > 2018-04-16 15:01:51.635 EDT [12616] LOG: duration: 761.847 ms plan: > Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE > LOWER(zz_noidx1.text_distinct) = LOWER(p_findme)) > Result (cost=4.58..4.59 rows=1 width=1) (actual time=761.843..761.843 > rows=1 loops=1) > Buffers: shared hit=160 read=5246 > InitPlan 1 (returns $0) > -> Seq Scan on zz_noidx1 (cost=0.00..22906.00 rows=5000 width=0) > (actual time=761.841..761.841 rows=0 loops=1) > Filter: (lower(text_distinct) = lower($1)) > Rows Removed by Filter: 1000000 > Buffers: shared hit=160 read=5246 > 2018-04-16 15:01:51.635 EDT [12616] CONTEXT: SQL statement "SELECT EXISTS > (SELECT 1 FROM zz_noidx1 WHERE LOWER(zz_noidx1.text_distinct) = > LOWER(p_findme))" > PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF > 2018-04-16 15:01:51.635 EDT [12616] LOG: duration: 762.156 ms plan: > Query Text: explain (analyze, buffers) select * from > zz_spx_ifexists_noidx('Test5000001') > Function Scan on zz_spx_ifexists_noidx (cost=0.25..0.26 rows=1 width=32) > (actual time=762.154..762.155 rows=1 loops=1) > Buffers: shared hit=160 read=5246 > > As you can see, the execution time jumps up to about 762 milliseonds. I > can see in the sequence scan node that the LOWER() function shows up on the > right side of the equal operator, whereas in the first 5 runs of this test > query the plan did not show this. Why is this? > > I tried increasing the "work_mem" setting to 1GB to see if this made any > difference, but the results were the same. > > So those were the tests that I performed and the results I received, which > left me with many questions. If anyone is able to help me understand this > behavior, I'd greatly appreciate it. This is my first post to the email > list, so I hope I did a good enough job providing all the information > needed. > > Thanks! > Ryan > > *PostgreSQL version number you are running:* > > PostgreSQL 10.2, compiled by Visual C++ build 1800, 64-bit > > *How you installed PostgreSQL:* > > Using the Enterprise DB installer. > > I have also installed Enterprise DB's Postgres Enterprise Manager (PEM) > 7.2.0 software and Enterprise DB's SQL Profiler PG10-7.2.0 software. The > PEM Agent service that gets installed is currently turned off. > > *Changes made to the settings in the postgresql.conf file: see Server > Configuration for a quick way to list them all.* > > name |current_setting > |source > -----------------------------------|------------------------ > ---------------|--------------------- > application_name |DBeaver 5.0.3 - > Main |session > auto_explain.log_analyze |on > |configuration file > auto_explain.log_buffers |on > |configuration file > auto_explain.log_min_duration |0 > |configuration file > auto_explain.log_nested_statements |on > |configuration file > auto_explain.log_triggers |on > |configuration file > client_encoding |UTF8 > |client > DateStyle |ISO, MDY > |client > default_text_search_config |pg_catalog.english > |configuration file > dynamic_shared_memory_type |windows > |configuration file > extra_float_digits |3 > |session > lc_messages |English_United > States.1252 |configuration file > lc_monetary |English_United > States.1252 |configuration file > lc_numeric |English_United > States.1252 |configuration file > lc_time |English_United > States.1252 |configuration file > listen_addresses |* > |configuration file > log_destination |stderr > |configuration file > log_timezone |US/Eastern > |configuration file > logging_collector |on > |configuration file > max_connections |100 > |configuration file > max_stack_depth |2MB > |environment variable > port |5432 > |configuration file > shared_buffers |128MB > |configuration file > shared_preload_libraries |$libdir/sql-profiler.dll, > auto_explain |configuration file > ssl |on > |configuration file > ssl_ca_file |root.crt > |configuration file > ssl_cert_file |server.crt > |configuration file > ssl_crl_file |root.crl > |configuration file > ssl_key_file |server.key > |configuration file > TimeZone |America/New_York > |client > > *Operating system and version:* > > Windows 10 Pro 64-bit, Version 1709 (Build 16299.309) > > *Hardware:* > > Processor - Intel Core i7-7820HQ @ 2.90GHz > RAM - 16GB > RAID? - No > Hard Drive - Samsung 512 GB SSD M.2 PCIe NVMe Opal2 > > *What program you're using to connect to PostgreSQL:* > > DBeaver Community Edition v5.0.3 > > *Is there anything relevant or unusual in the PostgreSQL server logs?:* > > Not that I noticed. > > *For questions about any kind of error:* > > N/A > > *What you were doing when the error happened / how to cause the error:* > > N/A > > *The EXACT TEXT of the error message you're getting, if there is one: > (Copy and paste the message to the email, do not send a screenshot)* > > N/A > > A support of parallel query execution is not complete - it doesn't work in PostgreSQL 11 too. So although EXISTS variant can be faster (but can be - the worst case of EXISTS is same like COUNT), then due disabled parallel execution the COUNT(*) is faster now. It is unfortunate, because I believe so this issue will be fixed in few years. Regards Pavel