OK... I apologize for the long email before. Right after I sent it, I thought 
of a much simpler use case that illustrates the issue without depending on any 
special data I have access to or on complex pivoting. It's as raw as I can make it.

I create a table with 1M rows and 2 columns. Column "a" is a random string, 
while column "b" is a random integer cast to a string. Then I use a UDF that 
converts strings to floats and handles the exception thrown when the incoming 
string is not parsable as a float. Then I do a simple select of each column. 
In the "a" case, the UDF throws and catches lots of exceptions. In the "b" 
case, the conversion is clean and no exceptions are thrown.


create table sampletest (a varchar, b varchar);

insert into sampletest (a, b)
select substr(md5(random()::text), 0, 15),
       (100000000*random())::integer::varchar
from generate_series(1, 1000000);

CREATE OR REPLACE FUNCTION toFloat(str varchar, val real)
RETURNS real AS $$
BEGIN
  -- A null input falls through to the default; otherwise, attempt the cast.
  RETURN case when str is null then val else str::real end;
EXCEPTION WHEN OTHERS THEN
  -- Any unparsable string lands here and returns the default instead.
  RETURN val;
END;
$$ LANGUAGE plpgsql COST 1 IMMUTABLE;

select MAX(toFloat(a, null)) as "a" from sampletest;

select MAX(toFloat(b, null)) as "b" from sampletest;
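For reference, the plans below come from EXPLAIN (ANALYZE, BUFFERS) runs of 
these two selects, along these lines:

explain (analyze, buffers)
select MAX(toFloat(a, null)) as "a" from sampletest;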



On purpose, I am doing max(toFloat(...)) instead of toFloat(max(...)) so that 
the UDF is exercised 1M times.
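To isolate the cost of the exception block itself, one could also compare 
against a variant with no EXCEPTION clause. A sketch (the name toFloatNoEx is 
mine, and the regex only approximates what ::real accepts, e.g. it rejects 
'NaN' and 'Infinity'):

CREATE OR REPLACE FUNCTION toFloatNoEx(str varchar, val real)
RETURNS real AS $$
BEGIN
  -- Guard with a regex instead of catching the cast's exception.
  RETURN case when str ~ '^\s*[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?\s*$'
              then str::real else val end;
END;
$$ LANGUAGE plpgsql COST 1 IMMUTABLE;

select MAX(toFloatNoEx(a, null)) as "a" from sampletest;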


V13.4 "a" scenario (exceptions)
-------------------------------------------------------------
Aggregate  (cost=14778.40..14778.41 rows=1 width=4) (actual time=774098.537..774098.538 rows=1 loops=1)
  Buffers: shared hit=6373
  ->  Seq Scan on sampletest  (cost=0.00..11975.60 rows=560560 width=32) (actual time=0.011..285.458 rows=1000000 loops=1)
        Buffers: shared hit=6370
Planning Time: 0.066 ms
Execution Time: 774,098.563 ms


V13.4 "b" scenario (no exceptions)
-------------------------------------------------------------
Aggregate  (cost=14778.40..14778.41 rows=1 width=4) (actual time=1510.200..1510.201 rows=1 loops=1)
  Buffers: shared hit=6385
  ->  Seq Scan on sampletest  (cost=0.00..11975.60 rows=560560 width=32) (actual time=0.024..115.196 rows=1000000 loops=1)
        Buffers: shared hit=6370
Planning:
  Buffers: shared hit=26
Planning Time: 0.361 ms
Execution Time: 1,530.659 ms


V11.2 "a" scenario (exceptions)
-------------------------------------------------------------
Aggregate  (cost=21658.00..21658.01 rows=1 width=4) (actual time=26528.286..26528.286 rows=1 loops=1)
  Buffers: shared hit=6393
  ->  Seq Scan on sampletest  (cost=0.00..16562.00 rows=1019200 width=15) (actual time=0.037..190.633 rows=1000000 loops=1)
        Buffers: shared hit=6370
Planning Time: 1.182 ms
Execution Time: 26,530.492 ms


V11.2 "b" scenario (no exceptions)
-------------------------------------------------------------
Aggregate  (cost=21658.00..21658.01 rows=1 width=4) (actual time=1856.116..1856.116 rows=1 loops=1)
  Buffers: shared hit=6370
  ->  Seq Scan on sampletest  (cost=0.00..16562.00 rows=1019200 width=8) (actual time=0.014..88.152 rows=1000000 loops=1)
        Buffers: shared hit=6370
Planning Time: 0.098 ms
Execution Time: 1,856.152 ms





Summary:

  *   Scenario V11.2/a: 26.6s
  *   Scenario V11.2/b: 1.9s
  *   Scenario V13.4/a: 774.1s
  *   Scenario V13.4/b: 1.5s

Conclusion:

  *   The no-exception scenario "b" performs about 20% better on 13.4 than on 11.2 (nice for a straight scan!)
  *   On 11.2, exceptions add an overhead of over 14x (1.9s vs 26.6s). I did not expect exceptions to add such a large overhead. Why is that?
  *   Between 11.2 and 13.4, the exceptions scenario "a" performs roughly 29x slower (26.6s vs 774.1s).

Thank you!
Laurent Hasson.



From: l...@laurent-hasson.com <l...@laurent-hasson.com>
Sent: Saturday, August 21, 2021 03:57
To: pgsql-performa...@postgresql.org
Subject: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4

Hello all,

I think I have identified a major performance issue between V11.2 and 13.4 with 
respect to exception handling in UDFs. I have the following simplified query 
that pivots data and makes use of a UDF to convert data to a specific type, in 
this case, float:


select "iccqa_iccassmt_fk"
     , Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 
'DEPTH (CM)') ,null) as "iccqa_DEPTH_CM"
     , Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 
'LENGTH (CM)') ,null) as "iccqa_LENGTH_CM"
     , Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 
'WIDTH (CM)') ,null) as "iccqa_WIDTH_CM"
     , Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 
'DRAIN PRESENT') ,null) as "iccqa_DRAIN_PRESENT"
     , Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 
'MEASUREMENTS TAKEN') ,null) as "iccqa_MEASUREMENTS_TAKEN"
     , Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 
'SIGNS AND SYMPTOMS OF INFECTION') ,null) as 
"iccqa_SIGNS_AND_SYMPTOMS_OF_INFECTION"
from  (
-- 'A pivoted view of ICC QA assessments'
select VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_iccassmt_fk" as 
"iccqa_iccassmt_fk" -- The key identifying an ICC assessment.
     , VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" as 
"iccqar_ques_code" -- The question long code from the meta-data.
     , max(VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ans_val") as 
"iccqar_ans_val" -- The official answer, if applicable) from the meta-data.
  from VNAHGEDW_FACTS.AssessmentICCQA_Raw
where VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" in ('DEPTH (CM)', 
'LENGTH (CM)', 'WIDTH (CM)'
                                                               , 'DRAIN 
PRESENT', 'MEASUREMENTS TAKEN', 'SIGNS AND SYMPTOMS OF INFECTION'
                                                               ) group by 1, 2
) T
     group by 1
;


The UDF is simple, as follows:


CREATE OR REPLACE FUNCTION TILDA.toFloat(str varchar, val real)
RETURNS real AS $$
BEGIN
  RETURN case when str is null then val else str::real end;
EXCEPTION WHEN OTHERS THEN
  RETURN val;
END;
$$ LANGUAGE plpgsql COST 1 IMMUTABLE;
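For clarity, its behavior on a few sample values (hypothetical inputs):

select Tilda.toFloat('12.5', null);  -- 12.5: clean conversion
select Tilda.toFloat('YES', null);   -- null: the cast throws, the handler returns the default
select Tilda.toFloat(null, -1);      -- -1: null input short-circuits to the default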



It works like a coalesce, but with a conversion. I think I have identified a 
large performance difference tied to the exception handling. It so happens that 
for the last 3 columns ('DRAIN PRESENT', 'MEASUREMENTS TAKEN' and 'SIGNS AND 
SYMPTOMS OF INFECTION'), the data is VERY dirty: a mix of 0/1, YES/NO, and 
other mistyped values. This means these 3 columns throw lots of exceptions in 
the UDF. To illustrate, I break this into 2 queries below; a quick sanity check 
on the dirtiness itself comes first.
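As that sanity check, something along these lines surfaces the offending 
values (a sketch; the regex only approximates what ::real accepts, e.g. it 
ignores 'NaN'):

select "iccqar_ans_val", count(*)
  from VNAHGEDW_FACTS.AssessmentICCQA_Raw
 where "iccqar_ques_code" in ('DRAIN PRESENT', 'MEASUREMENTS TAKEN', 'SIGNS AND SYMPTOMS OF INFECTION')
   and "iccqar_ans_val" !~ '^\s*[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?\s*$'  -- keep only non-numeric answers
 group by 1
 order by 2 desc;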



select "iccqa_iccassmt_fk"
     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 
'DEPTH (CM)') ,null))::real as "iccqa_DEPTH_CM"
     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 
'LENGTH (CM)') ,null))::real as "iccqa_LENGTH_CM"
     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 
'WIDTH (CM)') ,null))::real as "iccqa_WIDTH_CM"
--     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 
'DRAIN PRESENT') ,null))::real as "iccqa_DRAIN_PRESENT"
--     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 
'MEASUREMENTS TAKEN') ,null))::real as "iccqa_MEASUREMENTS_TAKEN"
--     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 
'SIGNS AND SYMPTOMS OF INFECTION') ,null))::real as 
"iccqa_SIGNS_AND_SYMPTOMS_OF_INFECTION"
from  (
-- 'A pivoted view of ICC QA assessments'
select VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_iccassmt_fk" as 
"iccqa_iccassmt_fk" -- The key identifying an ICC assessment.
     , VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" as 
"iccqar_ques_code" -- The question long code from the meta-data.
     , max(VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ans_val") as 
"iccqar_ans_val" -- The official answer, if applicable) from the meta-data.
  from VNAHGEDW_FACTS.AssessmentICCQA_Raw
where VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" in ('DEPTH (CM)', 
'LENGTH (CM)', 'WIDTH (CM)'
                                                               , 'DRAIN 
PRESENT', 'MEASUREMENTS TAKEN', 'SIGNS AND SYMPTOMS OF INFECTION'
                                                               )
group by 1, 2
) T
     group by 1
;


The performance is as expected.


HashAggregate  (cost=448463.70..448467.20 rows=200 width=16) (actual time=6760.797..9585.397 rows=677899 loops=1)
  Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk
  Batches: 1  Memory Usage: 147489kB
  Buffers: shared hit=158815
  ->  HashAggregate  (cost=405997.87..417322.09 rows=1132422 width=56) (actual time=4576.514..5460.770 rows=2374628 loops=1)
        Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
        Batches: 1  Memory Usage: 368657kB
        Buffers: shared hit=158815
        ->  Seq Scan on assessmenticcqa_raw  (cost=0.00..388224.53 rows=2369779 width=38) (actual time=0.033..3298.544 rows=2374628 loops=1)
              Filter: ((iccqar_ques_code)::text = ANY ('{"DEPTH (CM)","LENGTH (CM)","WIDTH (CM)","DRAIN PRESENT","MEASUREMENTS TAKEN","SIGNS AND SYMPTOMS OF INFECTION"}'::text[]))
              Rows Removed by Filter: 10734488
              Buffers: shared hit=158815
Planning:
  Buffers: shared hit=3
Planning Time: 0.198 ms
Execution Time: 9678.120 ms



However, once we switch to the three "bad" columns, the results fall apart.



select "iccqa_iccassmt_fk"
--     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 
'DEPTH (CM)') ,null))::real as "iccqa_DEPTH_CM"
--     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 
'LENGTH (CM)') ,null))::real as "iccqa_LENGTH_CM"
--     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 
'WIDTH (CM)') ,null))::real as "iccqa_WIDTH_CM"
     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 
'DRAIN PRESENT') ,null))::real as "iccqa_DRAIN_PRESENT"
     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 
'MEASUREMENTS TAKEN') ,null))::real as "iccqa_MEASUREMENTS_TAKEN"
     , (Tilda.toFloat(MAX("iccqar_ans_val") filter (where "iccqar_ques_code"= 
'SIGNS AND SYMPTOMS OF INFECTION') ,null))::real as 
"iccqa_SIGNS_AND_SYMPTOMS_OF_INFECTION"
from  (
-- 'A pivoted view of ICC QA assessments'
select VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_iccassmt_fk" as 
"iccqa_iccassmt_fk" -- The key identifying an ICC assessment.
     , VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" as 
"iccqar_ques_code" -- The question long code from the meta-data.
     , max(VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ans_val") as 
"iccqar_ans_val" -- The official answer, if applicable) from the meta-data.
  from VNAHGEDW_FACTS.AssessmentICCQA_Raw
where VNAHGEDW_FACTS.AssessmentICCQA_Raw."iccqar_ques_code" in ('DEPTH (CM)', 
'LENGTH (CM)', 'WIDTH (CM)'
                                                               , 'DRAIN 
PRESENT', 'MEASUREMENTS TAKEN', 'SIGNS AND SYMPTOMS OF INFECTION'
                                                               )
group by 1, 2
) T
     group by 1
;



It is a huge performance difference, from ~10s to ~11.6 minutes, and the only 
difference I can think of is that the data is dirty, which causes the exception 
path to be taken. The explain is:


HashAggregate  (cost=448463.70..448467.20 rows=200 width=16) (actual time=6672.921..696753.080 rows=677899 loops=1)
  Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk
  Batches: 1  Memory Usage: 131105kB
  Buffers: shared hit=158815
  ->  HashAggregate  (cost=405997.87..417322.09 rows=1132422 width=56) (actual time=4574.918..5446.022 rows=2374628 loops=1)
        Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
        Batches: 1  Memory Usage: 368657kB
        Buffers: shared hit=158815
        ->  Seq Scan on assessmenticcqa_raw  (cost=0.00..388224.53 rows=2369779 width=38) (actual time=0.032..3300.616 rows=2374628 loops=1)
              Filter: ((iccqar_ques_code)::text = ANY ('{"DEPTH (CM)","LENGTH (CM)","WIDTH (CM)","DRAIN PRESENT","MEASUREMENTS TAKEN","SIGNS AND SYMPTOMS OF INFECTION"}'::text[]))
              Rows Removed by Filter: 10734488
              Buffers: shared hit=158815
Planning:
  Buffers: shared hit=3
Planning Time: 0.201 ms
Execution Time: 696868.845 ms



Now, on V11.2, the explain is:


HashAggregate  (cost=492171.36..492174.86 rows=200 width=16) (actual time=19322.522..50556.738 rows=743723 loops=1)
  Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk
  Buffers: shared hit=11 read=174155 dirtied=13
  ->  HashAggregate  (cost=445458.43..457915.21 rows=1245678 width=56) (actual time=16260.015..17575.088 rows=2601088 loops=1)
        Group Key: assessmenticcqa_raw.iccqar_iccassmt_fk, assessmenticcqa_raw.iccqar_ques_code
        Buffers: shared read=174155 dirtied=13
        ->  Seq Scan on assessmenticcqa_raw  (cost=0.00..425803.93 rows=2620600 width=38) (actual time=0.126..14425.239 rows=2601088 loops=1)
              Filter: ((iccqar_ques_code)::text = ANY ('{"DEPTH (CM)","LENGTH (CM)","WIDTH (CM)","DRAIN PRESENT","MEASUREMENTS TAKEN","SIGNS AND SYMPTOMS OF INFECTION"}'::text[]))
              Rows Removed by Filter: 11778360
              Buffers: shared read=174155 dirtied=13
Planning Time: 36.121 ms
Execution Time: 50730.255 ms



So, we are seeing two issues:

-          Exception handling seems significantly slower on V13.4 than on V11.2: almost a 14x difference, from 50s to 700s.

-          Comparing the two queries on V11.2, the difference is 13s vs 50s. So even on V11.2, exception handling adds a significant overhead (roughly 4x) which I was not expecting.
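If useful, the exception path can also be timed in isolation, independent of 
the data. A minimal sketch (run with \timing in psql; the input strings are 
arbitrary examples):

-- Exception path: every call throws and catches.
DO $$
DECLARE x real;
BEGIN
  FOR i IN 1..1000000 LOOP
    x := Tilda.toFloat('not a number', null);
  END LOOP;
END $$;

-- Clean path: no exception is ever raised.
DO $$
DECLARE x real;
BEGIN
  FOR i IN 1..1000000 LOOP
    x := Tilda.toFloat('123.45', null);
  END LOOP;
END $$;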

I'll be happy to update my test cases and share additional info if needed.

Thank you,
Laurent Hasson.
