HI, ChangAo Chen, David Geier , hackers 

>> But refreshing a materialized view doesn't only change relispopulated
>> but also columns like relfilenode, relpages, relhasindex, etc. Doesn't
>> changing these columns during REFRESH MATERIALIZED VIEW make your
>> optimization applicable in a lot less cases 
>
>I don't think so. If we can skip SetMatViewPopulatedState(), we avoid 
generating
>a dead pg_class tuple in all cases.
>
>> I'm actually wondering why it works at all, even in the example you
>> gave. Because I thought that even when nothing has changed the pg_class
>> row is updated for more columns than just relispopulated.
>
>"refresh materialized view concurrently" is done by doing DELETE + INSERT
>to the matview directly, so only relispopulated will change before the 
patch.
>After the patch, the pg_class row don't change anymore.
 


1.Regarding pg_class.relispopulated
This column was introduced in PostgreSQL 9.3 specifically for materialized 
views. 
Conceptually, the state of "IsScannable" is equivalent to and fully dependent 
on relispopulated.
This flag(relispopulated) is also used across multiple features, including 
COPY/pg_repack/pg_dump.
For the definitions of the populated and scannable states of materialized 
views, please refer to the official documentation:
    
https://www.postgresql.org/docs/devel/sql-creatematerializedview.html
    
https://www.postgresql.org/docs/devel/sql-refreshmaterializedview.html


2.Current status
As ChangAo Chen mentioned and verified via tests, and based on my own 
research and testing,
I agree with this conclusion:
    If pg_class.relispopulated does not actually change, there is *no 
need* to invoke SetMatViewPopulatedState().
Skipping this call will also avoid executing the subsequent call chain:
    CatalogTupleUpdate() -> simple_heap_update() -> 
heap_update().
    This helps prevent dead tuples from being generated, and also 
reduces overhead on underlying mechanisms
    including locks, transactions, WAL (xlog), as well as IPC 
contention and synchronization. 


3.Regarding updates to attributes in pg_class
> But refreshing a materialized view doesn't only change relispopulated
> but also columns like relfilenode, relpages, relhasindex, etc. Doesn't
> changing these columns during REFRESH MATERIALIZED VIEW make your
> optimization applicable in a lot less cases?
To clarify: SetMatViewPopulatedState() is dedicated solely to modifying 
relispopulated. 
Columns such as relfilenode, relpages and relhasindex are updated by other 
separate routines.


4. About SetMatViewPopulatedState() and its caller
There are two possible ways to implement this optimization:
    --The first approach is to refactor SetMatViewPopulatedState() to 
handle the check internally:
         If ((Form_pg_class) 
GETSTRUCT(tuple))->relispopulated != newstate, then perform the actual 
update; otherwise, do nothing.
    --The second approach is to perform the check inside the caller 
RefreshMatViewByOid(), as implemented in ChangAo Chen’s patch.
        I also noticed that intorel_startup() in createas.c 
follows the same design pattern — letting the caller handle the check before 
invoking the callee. 


5. A small suggestion
After further consideration, I suggest using the condition:
    if (RelationIsPopulated(matviewRel) != !skipData)
This is much more readable and intuitive compared to the original:
    if (RelationIsPopulated(matviewRel) == skipData)
Especially when paired with the original call:
    SetMatViewPopulatedState(matviewRel, !skipData);


Additionally, avoiding unnecessary updates and thus preventing dead tuples is 
especially valuable in BI/OLAP environments that use many materialized views 
and refresh them frequently (daily or hourly).


6. About Testing
--The new patch passes all PostgreSQL regression tests:
  xman@xman_lzz:~/patch_1/build/testrun/regress/regress$ grep -n matview 
regression.out
128:# parallel group (20 tests):  init_privs drop_operator security_label 
password lock object_address tablesample collate groupingsets replica_identity 
spgist gin matview identity gist generated_stored rowsecurity brin join_hash 
privileges
137:ok 126       + matview           
                      
47396 ms
xman@xman_lzz:~/patch_1/build/testrun/regress/regress$


--I have also checked the output files and expected files for the materialized 
view regression tests, and they are identical:
  xman@xman_lzz:~$ ls -l 
/home/xman/patch_1/postgres/src/test/regress/expected/matview.out
-rw-rw-r-- 1 xman xman 24946 May 25 22:29 
/home/xman/patch_1/postgres/src/test/regress/expected/matview.out
xman@xman_lzz:~$ ls -l 
/home/xman/patch_1/build/testrun/regress/regress/results/matview.out
-rw-rw-r-- 1 xman xman 24946 May 27 23:05 
/home/xman/patch_1/build/testrun/regress/regress/results/matview.out
xman@xman_lzz:~$ diff 
/home/xman/patch_1/postgres/src/test/regress/expected/matview.out 
/home/xman/patch_1/build/testrun/regress/regress/results/matview.out
xman@xman_lzz:~$ echo $?
0
xman@xman_lzz:~$ 




The above is what I have found so far. Please feel free to correct me if I have 
any misunderstandings.
Thanks,
--
Zizhuan Liu (X-MAN) 
[email protected]

Attachment: v2-0001-Avoid-calling-SetMatViewPopulatedState-if-possibl.patch
Description: Binary data

Reply via email to