Costing bug in hash join logic for semi joins
There is a costing bug in hash join logic seems to have been introduced by
the patch related to inner_unique enhancements(commit:
9c7f5229ad68d7e0e4dd149e3f80257893e404d4). Specifically, "hashjointuples"
which tracks the number of matches for hash clauses is computed wrong for
inner unique scenario. This leads to lot of semi-joins incorrectly turn to
inner joins with unique on inner side. Function "final_cost_hashjoin" has
special handling to cost semi/anti joins to account for early stop after
the first match. This is enhanced by the above said commit to be used for
inner_unique scenario also. However, "hashjointuples" computation is not
fixed to handle this new scenario which is incorrectly stepping into the
anti join logic and assuming unmatched rows. Fix is to handle inner_unique
case when computing "hashjointuples". Here is the outline of the code that
shows the bug.
void
final_cost_hashjoin(PlannerInfo *root, HashPath *path,
JoinCostWorkspace *workspace,
JoinPathExtraData *extra)
{
.
/* CPU costs */
*if (path->jpath.jointype == JOIN_SEMI ||
path->jpath.jointype == JOIN_ANTI ||extra->inner_unique)*
{
..
*/* Get # of
tuples that will pass the basic join */ if
(path->jpath.jointype == JOIN_SEMI) hashjointuples =
outer_matched_rows; else hashjointuples =
outer_path_rows - outer_matched_rows; *
}
else
{
.
}
}
Thanks, RK (Salesforce)
AIX support
Hello Team, We are working on AIX systems and noticed that the thread on removing AIX support in Postgres going forward. https://github.com/postgres/postgres/commit/0b16bb8776bb834eb1ef8204ca95dd7667ab948b” We would be glad to understand any outstanding issues hindering the support on AIX. It is important for us to have Postgres to be supported on AIX. As we are using Postgres extensively on AIX. Also we would like to provide any feasible support from our end for enabling the support on AIX. We would request the community to extend the support on AIX .. Thanks & regards, Sriram.
Re: AIX support
Thanks, Tom and Alvaro, for the info. We shall look into to details and get back. From: Tom Lane Date: Thursday, 21 March 2024 at 7:27 PM To: Sriram RK Cc: [email protected] Subject: Re: AIX support Sriram RK writes: > We are working on AIX systems and noticed that the thread on removing AIX > support in Postgres going forward. > https://github.com/postgres/postgres/commit/0b16bb8776bb834eb1ef8204ca95dd7667ab948b > We would be glad to understand any outstanding issues hindering the > support on AIX. Did you read the linked thread? Starting say about here: https://www.postgresql.org/message-id/flat/20240224172345.32%40rfd.leadboat.com#8b41e50c2190c82c861d91644eed9c30 > Also we would like to provide any feasible support from our end for enabling > the support on AIX. Who is "we", and how much resources are you prepared to put into this? > We would request the community to extend the support on AIX .. The community, in the sense of the existing people doing significant work on Postgres, are absolutely not going to do that. If you can bring a bunch of work to fix all the problems noted in the discussion thread, and commit to providing ongoing development manpower and hardware to keep it working, maybe something could happen. But I suspect you will find it cheaper to start thinking about migrating off AIX. regards, tom lane
Re: AIX support
Hi Team, We are setting up the build environment and trying to build the source and also trying to analyze the assert from the Aix point of view. Also, would like to know if we can access the buildfarm(power machines) to run any of the specific tests to hit this assert. Thanks & regards, Sriram. > From: Sriram RK > Date: Thursday, 21 March 2024 at 10:05 PM > To: Tom Lane [email protected]<mailto:[email protected]>, Alvaro Herrera > Cc: [email protected]<mailto:[email protected]> > Subject: Re: AIX support > Thanks, Tom and Alvaro, for the info. > We shall look into to details and get back.
Re: AIX support
> What you do need to do to reproduce the described problems is > check out the Postgres git tree and rewind to just before > commit 0b16bb877, where we deleted AIX support. Any attempt > to restore AIX support would have to start with reverting that > commit (and perhaps the followup f0827b443). > regards, tom lane Hi Team, thank you for all the info. We progressed to build the source on our nodes and the build was successful with the below configuration. Postgres - github-bcdfa5f2e2f AIX - 71c Xlc - 13.1.0 Bison- 3.0.5 Going ahead, we want to build the changes that were removed as part of “0b16bb8776bb8”, with latest Xlc and gcc version. We were building the source from the postgres ftp server(https://ftp.postgresql.org/pub/source/), would like to understand if there are any source level changes between the ftp server and the source on github? Regards, Sriram. From: Tom Lane Date: Friday, 29 March 2024 at 9:03 AM To: Thomas Munro Cc: Noah Misch , Sriram RK , Alvaro Herrera , [email protected] Subject: Re: AIX support Thomas Munro writes: > Oh, sorry, I had missed the part where newer compilers fix the issue > too. Old out-of-support versions of AIX running old compilers, what > fun. Indeed. One of the topics that needs investigation if you want to pursue this is which AIX system and compiler versions still deserve support, and which of the AIX hacks we had been carrying still need to be there based on that analysis. For context, we've been pruning support for extinct-in-the-wild OS versions pretty aggressively over the past couple of years, and I'd expect to apply the same standard to AIX. regards, tom lane
Re: AIX support
Thanks Noah and Team, We (IBM-AIX team) looked into this issue https://www.postgresql.org/message-id/[email protected] This is related to the compiler issue. The compilers xlc(13.1) and gcc(8.0) have issues. But we verified that this issue is resolved with the newer compiler versions openXL(xlc17.1) and gcc(12.0) onwards. We reported this issue to the xlc team and they have noted this issue. A fix might be possible in May for this issue in xlc v16. We would like to understand if the community can start using the latest compilers to build the source. Also as part of the support, we will help in fixing all the issues related to AIX and continue to support AIX for Postgres. If we need another CI environment we can work to make one available. But for time being can we start reverting the patch that has removed AIX support. We want to make a note that postgres is used extensively in our IBM product and is being exploited by multiple customers. Please let us know if there are any specific details you'd like to discuss further. Regards, Sriram.
Re: AIX support
> Let's start by setting up a new AIX buildfarm member. Regardless of what we > do with v17, we continue to support AIX on the stable branches, and we really > need a buildfarm member to keep the stable branches working anyway. Thanks Heikki. We had already build the source code(v17+ bcdfa5f2e2f) on our local nodes. We will try to setup the buildfarm and let you know. Is there any specific configuration we are looking for? Regards, Sriram.
Re: AIX support
For any complier/hardware related issue we should able to provide support. We are in the process of identifying the AIX systems that can be added to the CI/buildfarm environment. Regards, Sriram.
Re: AIX support
Hi Team, > I have some sympathy for this. The discussion about removing AIX > support had a very short turnaround and happened in an unrelated thread, > without any sort of public announcement or consultation. So this report > of "hey, we were still using that" is timely and fair. We would really like to thank you & the team for considering our request, and really appreciate for providing all the possible options to support AIX. > But the underlying issue that led to the removal (something to do with > direct I/O support and alignment) would still need to be addressed. As we already validated that these alignment specific issues are resolved with the latest versions of the compilers (gcc/ibm-clang). We would request you to use the latest versions for the build. > If we are making the reinstatement of AIX > support contingent on new buildfarm support, those machines need to be > available, at least initially, at least for back branches, like in a > week. Which seems tight. We are already working with the internal team in procuring the nodes for the buildfarm, which can be accessible by the community. > I can see several ways going forward: > 1. We revert the removal of AIX support and carry on with the status quo > ante. (The removal of AIX is a regression; it is timely and in scope > now to revert the change.) > 2. Like (1), but we consider that notice has been given, and we will > remove it early in PG18 (like August) unless the situation improves. We would really appreciate you for providing the possible options and we are very much inclined to these above approaches. Regards, Sriram.
Re: AIX support
> > It would definitely make sense for a new port to start by getting > > things going with gcc only, and then look at resurrecting xlc > > support. > Sriram mentioned upthread that he was looking at both of them. I'd be > ready to assume that most of the interest is in xlc, not gcc. But I > may be wrong. Just a heads-up, we got a node in the OSU lab for the buildfarm. Will let you know once we have the buildfarm setup on that node. Also, we are making progress on setting up the buildfarm on a local node as well. But currently there are some tests failing, seems some issue with plperl. aix01::build-farm-17# ./run_build.pl --keepall --nosend --nostatus --verbose=5 --force REL_16_STABLE Fri Apr 26 00:53:50 2024: buildfarm run for AIXnode01:REL_16_STABLE starting AIXnode01:REL_16_STABLE [00:53:50] checking out source ... AIXnode01:REL_16_STABLE [00:53:56] checking if build run needed ... AIXnode01:REL_16_STABLE [00:53:56] copying source to pgsql.build ... AIXnode01:REL_16_STABLE [00:54:08] running configure ... AIXnode01:REL_16_STABLE [00:55:01] running build ... AIXnode01:REL_16_STABLE [01:08:09] running basic regression tests ... AIXnode01:REL_16_STABLE [01:09:51] running make contrib ... AIXnode01:REL_16_STABLE [01:11:08] running make testmodules ... AIXnode01:REL_16_STABLE [01:11:19] running install ... AIXnode01:REL_16_STABLE [01:11:48] running make contrib install ... AIXnode01:REL_16_STABLE [01:12:01] running testmodules install ... AIXnode01:REL_16_STABLE [01:12:06] checking test-decoding gmake: gcc: A file or directory in the path name does not exist. AIXnode01:REL_16_STABLE [01:12:28] running make check miscellaneous modules ... gmake: gcc: A file or directory in the path name does not exist. AIXnode01:REL_16_STABLE [01:13:50] setting up db cluster (C)... AIXnode01:REL_16_STABLE [01:13:53] starting db (C)... AIXnode01:REL_16_STABLE [01:13:53] running installcheck (C)... AIXnode01:REL_16_STABLE [01:15:05] restarting db (C)... AIXnode01:REL_16_STABLE [01:15:07] running make isolation check ... AIXnode01:REL_16_STABLE [01:15:51] restarting db (C)... AIXnode01:REL_16_STABLE [01:15:56] running make PL installcheck (C)... Branch: REL_16_STABLE Stage PLCheck-C failed with status 2
Re: AIX support
Hi Team, There are couple of updates, firstly we got an AIX node on the OSU lab. Please feel free to reach me, so that we can provide access to the node. We have started working on setting up the buildfarm on that node. Secondly, as part of the buildfarm setup on our local nodes, we are hitting an issue with the plperl extension. In the logs we noticed that when the plperl extension is being created, it is failing to load the perl library. - CREATE EXTENSION plperlu; + server closed the connection unexpectedly + This probably means the server terminated abnormally + before or while processing the request. + connection to server was lost In the logfile we could see these 2024-05-04 05:05:17.537 CDT [3997786:17] pg_regress/plperl_setup LOG: statement: CREATE EXTENSION plperl; Util.c: loadable library and perl binaries are mismatched (got first handshake key 9b80080, needed 9a80080) We tried to resolve some of the suggestions mentioned here, but things couldn’t resolve. https://www.postgresql.org/message-id/CALDaNm15qaRFb3WiPFAdFqoB9pj1E5SCPPUGB%2BnJ4iF4gXO6Rw%40mail.gmail.com Any inputs here would be greatly appreciated. Regards, Sriram.
Re: AIX support
Hi Team, on further investigation we were able to resolve the perl issue by setting the right PERL env location. Earlier it was pointing to the 32bit perl, as a result the perl lib mismatch seems to be happening. Now we have successfully built release 15 and 16 stable branches on the OSU lab node. p9aix (OSU) OS: AIX 72Z RELEASE 16 p9aix:REL_16_STABLE [08:31:26] OK log passed to send_result === Branch: REL_16_STABLE All stages succeeded RELEASE 15 p9aix:REL_15_STABLE [08:55:37] OK log passed to send_result === Branch: REL_15_STABLE All stages succeeded Also, we had successfully built release 16 branch on our local nodes as well OS: AIX 71C pgsql-aix71C:REL_16_STABLE [02:25:32] OK log passed to send_result === Branch: REL_16_STABLE All stages succeeded OS: AIX72Z pgsql-aix72Z:REL_16_STABLE [02:35:03] OK log passed to send_result === Branch: REL_16_STABLE All stages succeeded OS: AIX73D pgsql-aix73D:REL_16_STABLE [05:32:29] OK log passed to send_result === Branch: REL_16_STABLE All stages succeeded Regards, Sriram.
Re: AIX support
Hi Team, We have the AIX node ready in OSU lab, and the branches 15 and 16 got build on the node. We had raised a request to register this node as buildfarm member. Yet to receive the approval. We would like to understand your inputs/plans on reverting the changes for AIX. Thanks, Sriram.
Re: AIX support
Hi Team, we have any updated from the XLC team, the issue specific to the alignment is fixed and XLC had released it as part of 16.1.0.18. The PTF is available at the below location, You can also find a link here: https://www.ibm.com/support/pages/fix-list-xl-cc-aix. >>/opt/IBM/xlC/16.1.0/bin/xlC align.c -o align.xl >>./align.xl al4096 4096 @ 0x20008000 (mod 0) al4096_initialized 4096 @ 0x20004000 (mod 0) al4096_const 4096 @ 0x2000b000 (mod 0) al4096_const_initialized 4096 @ 0x10008000 (mod 0) al4096_static4096 @ 0x2000e000 (mod 0) al4096_static_initialized4096 @ 0x20001000 (mod 0) al4096_static_const 4096 @ 0x20011000 (mod 0) al4096_static_const_initialized 4096 @ 0x10001000 (mod 0) Also would like to know some info related to the request raised for buildfarm access, to register the node in OSU lab. Where can I get the status of the request? Whom can I contact to get the request approved? So that we can add the node to the buildfarm. Regards, Sriram.
Re: AIX support
> > Also would like to know some info related to the request raised for > > buildfarm access, to register the > > node in OSU lab. Where can I get the status of the request? Whom can I > > contact to get the request > > approved? So that we can add the node to the buildfarm. > I assume you filled out the form at > https://buildfarm.postgresql.org/cgi-bin/register-form.pl? It can take a few > weeks, so I wouldn't worry yet. Thanks Noha, I had already submitted a form a week back, hope it might take another couple of weeks to get it approved.
Re: AIX support
Hi Team, We have an update wrt to the PG17 AIX port. We have reverted the changes specific to AIX (that were removed in 0b16bb8776bb8) to the latest PG17 (head). The Buildfarm succeeded for these changes. All the tests passed. System config OS level : AIX-73D Compiler : gcc-12 & xlc(16.1.0.18) Wed May 15 21:26:00 2024: buildfarm run for AIXnode01:HEAD starting AIXnode01:HEAD [21:26:00] running configure ... AIXnode01:HEAD [21:27:03] running build ... AIXnode01:HEAD [21:27:27] running basic regression tests ... AIXnode01:HEAD [21:34:41] running make contrib ... AIXnode01:HEAD [21:34:43] running make testmodules ... AIXnode01:HEAD [21:34:44] running install ... AIXnode01:HEAD [21:34:58] running make contrib install ... AIXnode01:HEAD [21:35:05] running testmodules install ... AIXnode01:HEAD [21:35:08] checking pg_upgrade AIXnode01:HEAD [21:35:08] checking test-decoding AIXnode01:HEAD [21:35:29] running make check miscellaneous modules ... AIXnode01:HEAD [21:36:16] setting up db cluster (C)... AIXnode01:HEAD [21:36:19] starting db (C)... AIXnode01:HEAD [21:36:19] running installcheck (C)... AIXnode01:HEAD [21:46:27] restarting db (C)... AIXnode01:HEAD [21:46:29] running make isolation check ... AIXnode01:HEAD [21:49:57] restarting db (C)... AIXnode01:HEAD [21:50:02] running make PL installcheck (C)... AIXnode01:HEAD [21:50:09] restarting db (C)... AIXnode01:HEAD [21:50:12] running make contrib installcheck (C)... AIXnode01:HEAD [21:53:53] restarting db (C)... AIXnode01:HEAD [21:53:56] running make test-modules installcheck (C)... AIXnode01:HEAD [21:54:28] stopping db (C)... AIXnode01:HEAD [21:54:29] running make ecpg check ... AIXnode01:HEAD [21:54:45] OK Branch: HEAD All stages succeeded The below changes are applied on this commit level commit 54b69f1bd730a228a666441592a12d2a0cbe2a06 (HEAD -> pgAIX, origin/master, origin/HEAD, master) On branch pgAIX Changes to be committed: (use "git restore --staged ..." to unstage) new file: src/backend/port/aix/mkldexport.sh new file: src/include/port/aix.h new file: src/makefiles/Makefile.aix new file: src/template/aix Changes not staged for commit: (use "git add ..." to update what will be committed) (use "git restore ..." to discard changes in working directory) modified: Makefile modified: config/c-compiler.m4 modified: configure modified: configure.ac modified: doc/src/sgml/dfunc.sgml modified: doc/src/sgml/installation.sgml modified: doc/src/sgml/runtime.sgml modified: meson.build modified: src/Makefile.shlib modified: src/backend/Makefile modified: src/backend/meson.build modified: src/backend/storage/buffer/bufmgr.c modified: src/backend/utils/error/elog.c modified: src/backend/utils/misc/ps_status.c modified: src/bin/pg_basebackup/t/010_pg_basebackup.pl modified: src/bin/pg_verifybackup/t/008_untar.pl modified: src/bin/pg_verifybackup/t/010_client_untar.pl modified: src/include/c.h modified: src/include/port/atomics.h modified: src/include/storage/s_lock.h modified: src/interfaces/libpq/Makefile modified: src/interfaces/libpq/meson.build modified: src/port/README modified: src/port/strerror.c modified: src/test/regress/Makefile modified: src/test/regress/expected/sanity_check.out modified: src/test/regress/expected/test_setup.out modified: src/test/regress/regress.c modified: src/test/regress/sql/sanity_check.sql modified: src/test/regress/sql/test_setup.sql modified: src/tools/gen_export.pl modified: src/tools/pginclude/headerscheck Can you please let us know, the process to post the changes for review? Regards, Sriram.
Re: AIX support
Thanks Alvaro, for the info… Hi Team, We referred to the below links to build this patch … https://wiki.postgresql.org/wiki/Submitting_a_Patch https://peter.eisentraut.org/blog/2023/05/09/how-to-submit-a-patch-by-email-2023-edition Please find the attached patch. Apart from the AIX specific changes, there is a minor change in this file wrt to XLC, below is the error for which we removed inline. Later, the build and tests passed for both XLC(16.1.0.18) and gcc(12) as well. src/backend/storage/buffer/bufmgr.c "bufmgr.c", line 811.39: 1506-780 (S) Reference to "RelationGetSmgr" with internal linkage is not allowed within inline definition of "ReadBufferExtended". "bufmgr.c", line 811.15: 1506-780 (S) Reference to "ReadBuffer_common" with internal linkage is not allowed within inline definition of "ReadBufferExtended". gmake[4]: *** [: bufmgr.o] Error 1 Please let us know your feedback. Thanks, Sriram. 0001-AIX-support-revert-the-changes-from-0b16bb8776bb8.patch Description: 0001-AIX-support-revert-the-changes-from-0b16bb8776bb8.patch
Costing bug in hash join logic for semi joins
There is a costing bug in hash join logic seems to have been introduced by
the patch related to inner_unique enhancements(commit:
9c7f5229ad68d7e0e4dd149e3f80257893e404d4). Specifically, "hashjointuples"
which tracks the number of matches for hash clauses is computed wrong for
inner unique scenario. This leads to lot of semi-joins incorrectly turn to
inner joins with unique on inner side. Function "final_cost_hashjoin" has
special handling to cost semi/anti joins to account for early stop after
the first match. This is enhanced by the above said commit to be used for
inner_unique scenario also. However, "hashjointuples" computation is not
fixed to handle this new scenario which is incorrectly stepping into the
anti join logic and assuming unmatched rows. Fix is to handle inner_unique
case when computing "hashjointuples". Here is the outline of the code that
shows the bug.
void
final_cost_hashjoin(PlannerInfo *root, HashPath *path,
JoinCostWorkspace *workspace,
JoinPathExtraData *extra)
{
.
/* CPU costs */
*if (path->jpath.jointype == JOIN_SEMI ||
path->jpath.jointype == JOIN_ANTI ||extra->inner_unique)*
{
..
*/* Get # of
tuples that will pass the basic join */ if
(path->jpath.jointype == JOIN_SEMI) hashjointuples =
outer_matched_rows; else hashjointuples =
outer_path_rows - outer_matched_rows; *
}
else
{
.
}
}
Thanks, RK (Salesforce)
