Re: SSL SNI
Hi Peter, I imagine this also (finally) opens up the possibility for the server to present a different certificate for each hostname based on SNI. This eliminates the requirement for wildcard certs where the cluster is running on a host with multiple (typically two to three) hostnames and the clients check the hostname against SAN in the cert (sslmode=verify-full). Am I right? Is that feature on anybody's roadmap? Cheers, Jesse On Mon, Feb 15, 2021 at 6:09 AM Peter Eisentraut wrote: > > A customer asked about including Server Name Indication (SNI) into the > SSL connection from the client, so they can use an SSL-aware proxy to > route connections. There was a thread a few years ago where this was > briefly discussed but no patch appeared.[0] I whipped up a quick patch > and it did seem to do the job, so I figured I'd share it here. > > The question I had was whether this should be an optional behavior, or > conversely a behavior that can be turned off, or whether it should just > be turned on all the time. > > Technically, it seems pretty harmless. It adds another field to the TLS > handshake, and if the server is not interested in it, it just gets ignored. > > The Wikipedia page[1] discusses some privacy concerns in the context of > web browsing, but it seems there is no principled solution to those. > The relevant RFC[2] "recommends" that SNI is used for all applicable TLS > connections. > > > [0]: > https://www.postgresql.org/message-id/flat/CAPPwrB_tsOw8MtVaA_DFyOFRY2ohNdvMnLoA_JRr3yB67Rggmg%40mail.gmail.com > [1]: https://en.wikipedia.org/wiki/Server_Name_Indication > [2]: https://tools.ietf.org/html/rfc6066#section-3
Re: Have we tried to treat CTE as SubQuery in planner?
Hi,

On Fri, Nov 13, 2020 at 10:04 PM Andy Fan wrote:
>
> Hi:
>
> Take the following example:
>
> insert into cte1 select i, i from generate_series(1, 100)i;
> create index on cte1(a);
>
> explain
> with cte1 as (select * from cte1)
> select * from c where a = 1;

ITYM:

EXPLAIN
WITH c AS (SELECT * FROM cte1)
SELECT * FROM c WHERE a = 1;

I'm also guessing your table DDL is:

CREATE TABLE cte1 (a int, b int);

> It needs to do seq scan on the above format, however it is pretty
> quick if we change the query to
> select * from (select * from cte1) c where a = 1;

Does it? On HEAD, I got the following plan:

(without stats):

 Bitmap Heap Scan on foo
   Recheck Cond: (a = 1)
   ->  Bitmap Index Scan on foo_a_idx
         Index Cond: (a = 1)

(with stats):

 Index Scan using foo_a_idx on foo
   Index Cond: (a = 1)

> I know how we treat cte and subqueries differently currently,
> I just don't know why we can't treat cte as a subquery, so lots of
> subquery related technology can apply to it. Do we have any
> discussion about this?

This was brought up a few times; the most recent one I can recall was a
little bit over two years ago [1]

[1] https://postgr.es/m/87sh48ffhb@news-spur.riddles.org.uk
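For what it's worth, since v12 a side-effect-free, non-recursive CTE that is referenced exactly once should already be inlined into the outer query like a subquery, and MATERIALIZED restores the old optimization-fence behavior. A quick sketch against the (guessed) DDL above — the exact plans will depend on stats, but the index should be usable in the first form:

```sql
-- Inlined (v12+ default for a single-reference CTE): can use the index on a
EXPLAIN
WITH c AS (SELECT * FROM cte1)
SELECT * FROM c WHERE a = 1;

-- Forced materialization: the filter is applied on top of a CTE Scan,
-- so the index on a cannot help
EXPLAIN
WITH c AS MATERIALIZED (SELECT * FROM cte1)
SELECT * FROM c WHERE a = 1;
```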
Re: upcoming API changes for LLVM 12
Hi Andres, On Thu, Oct 15, 2020 at 6:12 PM Andres Freund wrote: > > Hi, > > There will be a breaking API change for JIT related API in LLVM > 12. Mostly about making control over various aspects easier, and then > building on top of that providing new features (like JIT compiling in > the background and making it easier to share JIT compiled output between > processes). > > I've worked with Lang Hames to ensure that the new C API has feature > parity... > I assume you're alluding to the removal of ORC legacy (v1) API? How far back was feature parity in the new API, or we could only switch starting with LLVM 12? > The postgres changes are fairly localized, all in llvmjit.c - it's just > a few #ifdefs to support both LLVM 12 and before. > > The two questions I have are: > > 1) Which versions do we want to add LLVM 12 support? It'd be fairly >easy to backport all the way. But it's not quite a bugfix... OTOH, >it'd probably painful for packagers to have dependencies on different >versions of LLVM for different versions of postgres. > > 2) When do we want to add LLVM 12 support? PG will soon stop compiling >against LLVM 12, which will be released in about 6 months. I worked >with Lang to make most of the breaking changes in a branch (to be >merged in the next few days), but it's possible that there will be a >few smaller changes. I think this has already happened about two weeks ago when Lang's commit 6154c4115cd4b78d landed in LLVM master. Cheers, Jesse
Re: Residual cpluspluscheck issues
Hi Tom,

It's been 17 months since you sent this email, so I can't rule out that
something has happened since (off list or in the code base), but...

On Sun, Jun 2, 2019 at 9:53 AM Tom Lane wrote:
> * On FreeBSD 12, I get
>
> /home/tgl/pgsql/src/include/utils/hashutils.h:23:23: warning: 'register' storage
> class specifier is deprecated and incompatible with C++17 [-Wdeprecated-register]
> extern Datum hash_any(register const unsigned char *k, register int keylen);
>                       ^
> /home/tgl/pgsql/src/include/utils/hashutils.h:23:56: warning: 'register' storage
> class specifier is deprecated and incompatible with C++17 [-Wdeprecated-register]
> extern Datum hash_any(register const unsigned char *k, register int keylen);
>                                                        ^
> /home/tgl/pgsql/src/include/utils/hashutils.h:24:32: warning: 'register' storage
> class specifier is deprecated and incompatible with C++17 [-Wdeprecated-register]
> extern Datum hash_any_extended(register const unsigned char *k,
>                                ^
> /home/tgl/pgsql/src/include/utils/hashutils.h:25:11: warning: 'register' storage
> class specifier is deprecated and incompatible with C++17 [-Wdeprecated-register]
>            register int ...
>            ^
>
> which I'm inclined to think means we should drop those register keywords.

I think this is a warning from Clang, right? You can get the same
warning on macOS if you use upstream Clang, where the default value of
-std for clang++ has been gnu++14 since LLVM 6.0 (not AppleClang, which
carries a proprietary patch that simply reverts the bump, but they
didn't even bother to patch the manpage).

I'm running into the same (well, similar) warnings when running
cpluspluscheck with GCC 11. Upon closer inspection, this is because in
GCC 11 the default value of -std has been bumped to gnu++17. IOW, I
would've gotten the same warning had I just configured with
CXX="g++-10 -std=gnu++17".
The g++ warnings look like the following:

gcc> In file included from ./src/include/port/atomics.h:70,
gcc>                  from ./src/include/storage/lwlock.h:21,
gcc>                  from ./src/include/storage/lock.h:23,
gcc>                  from ./src/include/storage/proc.h:21,
gcc>                  from ./src/include/storage/shm_mq.h:18,
gcc>                  from ./src/test/modules/test_shm_mq/test_shm_mq.h:18,
gcc>                  from /tmp/cpluspluscheck.AxICnl/test.cpp:3:
gcc> ./src/include/port/atomics/arch-x86.h: In function 'bool pg_atomic_test_set_flag_impl(volatile pg_atomic_flag*)':
gcc> ./src/include/port/atomics/arch-x86.h:143:16: warning: ISO C++17 does not allow 'register' storage class specifier [-Wregister]
gcc>   143 |         register char _res = 1;
gcc>       |         ^~~~
gcc> In file included from ./src/include/storage/spin.h:54,
gcc>                  from ./src/test/modules/test_shm_mq/test_shm_mq.h:19,
gcc>                  from /tmp/cpluspluscheck.AxICnl/test.cpp:3:
gcc> ./src/include/storage/s_lock.h: In function 'int tas(volatile slock_t*)':
gcc> ./src/include/storage/s_lock.h:226:19: warning: ISO C++17 does not allow 'register' storage class specifier [-Wregister]
gcc>   226 |         register slock_t _res = 1;
gcc>       |                  ^~~~

I think this is a problem worth solving: C++17 removes the register
keyword, and C++ code that includes our headers has the difficult choice
of:

1) With a pre-C++17 compiler that forewarns of the deprecation, find a
   compiler-specific switch to turn off the warning.

2) With a compiler that defaults to C++17 or later (major compiler
   vendors are upgrading the default to C++17):

   2a) find a switch to explicitly downgrade to C++14 or below (and then
       possibly jump back to solving problem number 1), or

   2b) find a compiler-specific switch to stay in post-C++17 mode but
       somehow "turn off" the removal of the register keyword.

This is particularly cringe-y because the C++ programs themselves have
to be non-conformant through a header we supply.
We can either drop the register keywords here (I wonder what the impact
on code generation would be, but it'll be hard to examine, given we'd
need to inspect _every_ instance of generated code for an inline
function), or maybe consider hiding those sections behind "#ifndef
__cplusplus" (provided that we believe there's not much of a reason for
the including code to call these functions; just throwing out uneducated
guesses here).

Do we have a clear decision on what we want to do here? How can I
contribute?

Cheers,
Jesse
Re: Partition prune with stable Expr
On Sun, Sep 27, 2020 at 7:52 PM Andy Fan wrote:
>
> On Mon, Sep 28, 2020 at 9:15 AM Tom Lane wrote:
>>
>> Andy Fan writes:
>> > Well, that's very interesting. Specific to my use case,
>> > SELECT * FROM p WHERE pkey = to_date('2018-12-13', 'yyyy-mm-dd');
>> > p has 1500+ partitions and planning takes lots of time, which is the same
>> > with SELECT * FROM p WHERE pkey = '2018-12-13', however the planning
>> > time difference is so huge, that doesn't make sense in human view. Can
>> > we do something for that? to_date(text, text) should be an "immutable"
>> > function IMO. Does that have a semantic issue or other issues?
>>
>> Yeah. It depends on the lc_time setting, and possibly also the timezone
>> GUC. (Admittedly, common values of the format string would not have
>> any lc_time dependency, but the immutability property is not fine-grained
>> enough to recognize that.)
>>
>> regards, tom lane
>
> Thanks for your reply. Even if it has something on GUC or lc_time
> setting, I suppose it should be decided at planning time. Do we have
> concerns about changes between planning and execution?

The planner can be called at prepared statement creation time, like

PREPARE yolo() AS SELECT * FROM foo WHERE pk = to_date(...);

Here, there's an arbitrary gap between planning time and execution.

> The attached patch marked some common formatting functions as immutable;
> only one partition prune test case needed fixing because of this. I only
> changed to_char/to_date/to_timestamp, however the whole list is below.
> I can change all of them if needed.
>
>      proname     | count
> ----------------+-------
>  to_ascii       |     3
>  to_char        |     8
>  to_date        |     1
>  to_hex         |     2
>  to_json        |     1
>  to_jsonb       |     1
>  to_number      |     1
>  to_regclass    |     1
>  to_regcollation|     1
>  to_regnamespace|     1
>  to_regoper     |     1
>  to_regoperator |     1
>  to_regproc     |     1
>  to_regprocedure|     1
>  to_regrole     |     1
>  to_regtype     |     1
>  to_timestamp   |     2
>  to_tsquery     |     2
>  to_tsvector    |     6
> (19 rows)

This patch is ridiculous.
Immutable functions need to produce the same output for the same argument values. None of the functions changed in the patch is immutable: they are all stable because they all depend on GUC settings (e.g. to_tsvector depends on default_text_search_config).
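The declared volatility of each function is right there in the catalog, so the claim is easy to check yourself (a quick illustration; 'i' = immutable, 's' = stable, 'v' = volatile):

```sql
-- Compare a genuinely immutable function (to_hex) with the
-- GUC-dependent ones; note to_tsvector has both a stable
-- single-argument form (uses default_text_search_config) and an
-- immutable two-argument form (config passed explicitly).
SELECT proname, provolatile
FROM pg_proc
WHERE proname IN ('to_hex', 'to_date', 'to_char', 'to_tsvector')
ORDER BY proname, provolatile;
```

The two-argument to_tsvector(regconfig, text) is the usual workaround when an immutable expression is required, e.g. in an index definition.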
Re: Fix for configure error in 9.5/9.6 on macOS 11.0 Big Sur
On Thu, Sep 3, 2020 at 10:36 AM Tom Lane wrote: > > Jesse Zhang writes: > >> Peter Eisentraut writes: > >>> Where did the -Werror come from? > > > If you noticed the full invocation of clang, you'd notice that Werror is > > nowhere on the command line, even though the error message suggests > > otherwise. I think this is a behavior from the new AppleClang, > > Hmph. If you explicitly say -Wno-error, does the error drop back down > to being a warning? Yeah something like that, this is what works for me: clang -Wno-error=implicit-function-declaration c.c Then it became a warning. Interestingly, it seems that AppleClang incorrectly forces this warning on us: $ clang --verbose c.c | Apple clang version 12.0.0 (clang-1200.0.31.1) | Target: x86_64-apple-darwin20.0.0 | Thread model: posix | InstalledDir: /Library/Developer/CommandLineTools/usr/bin | "/Library/Developer/CommandLineTools/usr/bin/clang" -cc1 -triple x86_64-apple-macosx11.0.0 -Wdeprecated-objc-isa-usage -Werror=deprecated-objc-isa-usage -Werror=implicit-function-declaration -emit-obj -mrelax-all -disable-free -disable-llvm-verifier -discard-value-names -main-file-name c.c -mrelocation-model pic -pic-level 2 -mthread-model posix -mframe-pointer=all -fno-strict-return -masm-verbose -munwind-tables -target-sdk-version=11.0 -target-cpu penryn -dwarf-column-info -debugger-tuning=lldb -target-linker-version 609 -v -resource-dir /Library/Developer/CommandLineTools/usr/lib/clang/12.0.0 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/include -internal-isystem /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/local/include -internal-isystem /Library/Developer/CommandLineTools/usr/lib/clang/12.0.0/include -internal-externc-isystem /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include -internal-externc-isystem /Library/Developer/CommandLineTools/usr/include -Wno-reorder-init-list -Wno-implicit-int-float-conversion -Wno-c99-designator -Wno-final-dtor-non-final-class 
-Wno-extra-semi-stmt -Wno-misleading-indentation -Wno-quoted-include-in-framework-header -Wno-implicit-fallthrough -Wno-enum-enum-conversion -Wno-enum-float-conversion -fdebug-compilation-dir /tmp -ferror-limit 19 -fmessage-length 193 -stack-protector 1 -fstack-check -mdarwin-stkchk-strong-link -fblocks -fencode-extended-block-signature -fregister-global-dtors-with-atexit -fgnuc-version=4.2.1 -fobjc-runtime=macosx-11.0.0 -fmax-type-align=16 -fdiagnostics-show-option -fcolor-diagnostics -o /var/folders/ts/nxrsmhmd0xb5zdrlqb4jlkbrgn/T/c-e6802f.o -x c c.c | clang -cc1 version 12.0.0 (clang-1200.0.31.1) default target x86_64-apple-darwin20.0.0 Notice that -Werror=implicit-function-declaration up there? I spent a few minutes digging in Apple's published fork of LLVM, they've been forcing this error flag for quite a while, but this particular warning-turned-error is guarded by a conditional along the lines of "is this iOS-like" [1][2], so I cannot imagine such a code path is activated (other than something like "goto fail;" from 2014) > > > I've heard reports of the same under the latest Xcode 12 on macOS > > Catalina, but I don't have my hands on such an env. > > The latest thing available to the unwashed masses seems to be > Xcode 11.7 with Yes you're right. Xcode 12 is still beta. > > Smells like an Apple bug from here. Surely they're not expecting > that anyone will appreciate -Werror suddenly being the default. I think you've convinced me that this is an Apple bug indeed. I'll probably just get by with a Wno-error=implicit-function-declaration in my CFLAGS for now. [1] https://github.com/apple/llvm-project/blob/swift-5.3-DEVELOPMENT-SNAPSHOT-2020-08-04-a/clang/lib/Driver/ToolChains/Darwin.cpp#L952-L962 [2] https://opensource.apple.com/source/clang/clang-800.0.42.1/src/tools/clang/lib/Driver/ToolChains.cpp
Re: Fix for configure error in 9.5/9.6 on macOS 11.0 Big Sur
Hi Tom and Peter,

On Wed, Sep 2, 2020 at 10:40 PM Tom Lane wrote:
>
> Peter Eisentraut writes:
> > On 2020-09-02 22:43, Jesse Zhang wrote:
> >> | conftest.c:184:3: error: implicitly declaring library function
> >> 'exit' with type 'void (int) __attribute__((noreturn))'
> >> [-Werror,-Wimplicit-function-declaration]
>
> > Where did the -Werror come from?
>
> Peter wasn't entirely explicit here, but note the advice at the end of
>
> https://www.postgresql.org/docs/devel/install-procedure.html
>
> that you cannot include -Werror in any CFLAGS you tell configure
> to use. It'd be nice if autoconf was more robust about that,
> but it is not our bug to fix.
>
> regards, tom lane

If you noticed the full invocation of clang, you'd notice that -Werror
is nowhere on the command line, even though the error message suggests
otherwise. I think this is a behavior of the new AppleClang. Here's the
minimal repro:

int main() { exit(0); }

And boom!

$ clang -c c.c
c.c:1:14: error: implicitly declaring library function 'exit' with type
'void (int) __attribute__((noreturn))' [-Werror,-Wimplicit-function-declaration]
int main() { exit(0); }
             ^
c.c:1:14: note: include the header <stdlib.h> or explicitly provide a
declaration for 'exit'
1 error generated.

My environment:

$ uname -rsv
Darwin 20.0.0 Darwin Kernel Version 20.0.0: Fri Aug 14 00:25:13 PDT 2020;
root:xnu-7195.40.44.151.1~4/RELEASE_X86_64

$ clang --version
Apple clang version 12.0.0 (clang-1200.0.31.1)
Target: x86_64-apple-darwin20.0.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

I've heard reports of the same under the latest Xcode 12 on macOS
Catalina, but I don't have my hands on such an env.

Cheers,
Jesse
Re: Fix for configure error in 9.5/9.6 on macOS 11.0 Big Sur
Wow thanks Alvaro! My search of "most obvious keywords" didn't turn this up. On Wed, Sep 2, 2020 at 2:18 PM Alvaro Herrera wrote: > > On 2020-Sep-02, Jesse Zhang wrote: > > > Hi Peter, > > > > Yeah it's funny I got this immediately after upgrading to Big Sur (beta > > 5). I found commit 1c0cf52b39ca3 but couldn't quite find the mailing > > list thread on it from 4 years ago (it lists Heikki and Thomas Munro as > > reviewers). Was it prompted by a similar error you encountered? > > https://postgr.es/m/bf9de63c-b669-4b8c-d33b-4a5ed11cd...@2ndquadrant.com Cheers, Jesse
Re: Fix for configure error in 9.5/9.6 on macOS 11.0 Big Sur
Hi Peter, Yeah it's funny I got this immediately after upgrading to Big Sur (beta 5). I found commit 1c0cf52b39ca3 but couldn't quite find the mailing list thread on it from 4 years ago (it lists Heikki and Thomas Munro as reviewers). Was it prompted by a similar error you encountered? On Tue, Aug 25, 2020 at 6:57 AM Peter Eisentraut wrote: > > On 2020-08-02 23:18, Tom Lane wrote: > > Thomas Gilligan writes: > >> Under the next version of macOS (11.0 unreleased beta 3), configuring > >> Postgres 9.5 and 9.6 fails with > > > >>> checking test program... ok > >>> checking whether long int is 64 bits... no > >>> checking whether long long int is 64 bits... no > >>> configure: error: Cannot find a working 64-bit integer type. > > > > Hm, could we see the config.log output for this? I'm not 100% convinced > > that you've diagnosed the problem accurately, because it'd imply that > > Apple made some fundamentally incompatible changes in libc, which > > seems like stirring up trouble for nothing. > > It looks like the new compiler errors out on calling undeclared > functions. Might be good to see config.log to confirm this. 
Yeah, here's an excerpt from config.log, verbatim (I'm not wrapping the
lines):

| configure:13802: checking whether long int is 64 bits
| configure:13860: ccache clang -o conftest -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -g -O0 conftest.c -lz -lreadline -lm >&5
| conftest.c:169:5: warning: no previous prototype for function 'does_int64_work' [-Wmissing-prototypes]
| int does_int64_work()
|     ^
| conftest.c:169:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
| int does_int64_work()
| ^
| static
| conftest.c:183:1: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
| main() {
| ^
| conftest.c:184:3: error: implicitly declaring library function 'exit' with type 'void (int) __attribute__((noreturn))' [-Werror,-Wimplicit-function-declaration]
|   exit(! does_int64_work());
|   ^
| conftest.c:184:3: note: include the header <stdlib.h> or explicitly provide a declaration for 'exit'
| 2 warnings and 1 error generated.

Cheers,
Jesse
Re: run pgindent on a regular basis / scripted manner
On Wed, Aug 12, 2020 at 10:14 PM Tom Lane wrote: > > Noah Misch writes: > > ... Another advantage of master-only is a guarantee against > > disrupting time-critical patches. (It would be ugly to push back branches > > and > > sort out the master push later, but it doesn't obstruct the mission.) > > Hm, doesn't it? I had the idea that "git push" is atomic --- either all > the per-branch commits succeed, or they all fail. I might be wrong. As of Git 2.28, atomic pushes are not turned on by default. That means "git push my-remote foo bar" _can_ result in partial success. I'm that paranoid who does "git push --atomic my-remote foo bar fizz". Cheers, Jesse
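The difference is easy to demonstrate against a throwaway local remote (a sketch; paths and branch names are made up, and --atomic needs Git 2.4+ plus a receiving side that supports it):

```shell
# With --atomic, either every ref in the push lands on the remote or
# none do; without it, a multi-ref push can partially succeed.
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/remote.git"
git init -q "$tmp/work"
cd "$tmp/work"
git checkout -q -b trunk
git -c user.name=x -c user.email=x@example.com \
    commit -q --allow-empty -m init
git branch -q feature
git push -q --atomic "$tmp/remote.git" trunk feature
git ls-remote --heads "$tmp/remote.git"   # both refs, or neither
```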
Re: run pgindent on a regular basis / scripted manner
Hi Andres,

On Wed, Aug 12, 2020 at 3:34 PM Andres Freund wrote:
>
> Hi,
>
> When developing patches I find it fairly painful that I cannot re-indent
> patches with pgindent without also seeing a lot of indentation changes
> in unmodified parts of files. It is easy enough ([1]) to only re-indent
> files that I have modified, but there's often a lot of independent
> indentation changes in the files that I did modify.
>
> I e.g. just re-indented patch 0001 of my GetSnapshotData() series and
> most of the hunks were entirely unrelated. Despite the development
> window for 14 having only relatively recently opened. Based on my
> experience it tends to get worse over time.

How bad was it right after branching 13? I wonder if we have any
empirical measure of badness over time -- assuming there was a point in
the recent past where everything was good, and the bad just crept in.

> Is there any reason we don't just automatically run pgindent regularly?
> Like once a week? And also update typedefs.list automatically, while
> we're at it?

You know what's better than weekly? Every check-in. I for one would love
it if we can just format the entire codebase, and ensure that new
check-ins are also formatted. We _do_ need some form of continuous
integration to catch us when we have fallen short (again, once HEAD
reaches a "known good" state, it's conceivably cheap to keep it in the
good state).

Cheers,
Jesse
Re: Volatile Functions in Parallel Plans
Hi Tom and Zhenghua,

On Thu, Jul 16, 2020 at 8:18 AM Tom Lane wrote:
>
> Jesse Zhang writes:
> > You are right, no, they are not consistent. But neither plan is
> > incorrect:
>
> > 1. In the first query, it's semantically permissible to evaluate
> > timeofday() for each pair of (c1, c2), and the plan reflects that.
> > (Notice that the parallel nature of the plan is just noise here, the
> > planner could have gone with a Nested Loop of which the inner side is
> > _not_ materialized).
>
> Yeah, exactly. The short answer here is that refusing to parallelize
> the plan would not make things any more consistent.
>
> In general, using a volatile function in a WHERE clause is problematic
> because we make no guarantees about how often it will be evaluated.
> It could be anywhere between "never" and "once per row of the
> cross-product of the FROM tables". AFAIR, the only concession we've made
> to make that less unpredictable is to avoid using volatile functions in
> index quals. But even that will only make things noticeably more
> predictable for single-table queries. As soon as you get into join cases,
> you don't have much control over when the function will get evaluated.
>
> regards, tom lane

For more kicks, I don't even think this is restricted to volatile
functions only. To stir the pot, it's conceivable that the planner might
produce the following plan

 Seq Scan on pg_temp_3.foo
   Output: foo.a
   Filter: (SubPlan 1)
   SubPlan 1
     ->  WindowAgg
           Output: sum(bar.d) OVER (?)
           ->  Seq Scan on pg_temp_3.bar
                 Output: bar.d

for the following query:

SELECT a
FROM foo
WHERE b = ALL (
  SELECT sum(d) OVER (ROWS UNBOUNDED PRECEDING)
  FROM bar
);

N.B. that the WindowAgg might produce a different set of numbers each
time depending on the scan order of bar, which means that for two
different "foo" tuples of equal b value, one might be rejected by the
filter whereas another survives.
I think the crux of the discussion should be whether we can reasonably expect a subquery (subquery-like structure, for example the inner side of nest loops upthread) to be evaluated only once. IMHO, no. The SQL standard only broadly mandates that each "execution" of a subquery to be "atomic". Zhenghua and Tom, would you suggest the above plan is wrong (not suboptimal, but wrong) just because we don't materialize the WindowAgg under the subplan? Cheers, Jesse
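Tom's point about evaluation-count guarantees is easy to trip over even in a single-table query. A toy illustration (the result is deliberately not well defined):

```sql
-- How many rows survive depends on how many times random() gets
-- evaluated, and the planner makes no promises: it could be once per
-- candidate row, or the qual could be evaluated at different points if
-- the query is restructured around a join or index.
SELECT count(*)
FROM generate_series(1, 1000) AS g(i)
WHERE random() < 0.5;
```

Running this twice will almost certainly give two different counts, which is exactly the "no guarantees" behavior under discussion, just in its most visible form.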
Re: Volatile Functions in Parallel Plans
Hi Zhenghua, On Wed, Jul 15, 2020 at 5:44 AM Zhenghua Lyu wrote: > > Hi, > I test some SQL in the latest Postgres master branch code (we find these > issues when > developing Greenplum database in the PR > https://github.com/greenplum-db/gpdb/pull/10418, > and my colleague come up with the following cases in Postgres): > > > create table t3 (c1 text, c2 text); > CREATE TABLE > insert into t3 > select > 'fhufehwiohewiuewhuhwiufhwifhweuhfwu', --random data > 'fiowehufwhfyegygfewpfwwfeuhwhufwh' --random data > from generate_series(1, 1000) i; > INSERT 0 1000 > analyze t3; > ANALYZE > create table t4 (like t3); > CREATE TABLE > insert into t4 select * from t4; > INSERT 0 0 > insert into t4 select * from t3; > INSERT 0 1000 > analyze t4; > ANALYZE > set enable_hashjoin to off; > SET > explain (costs off) > select count(*) from t3, t4 > where t3.c1 like '%sss' > and timeofday() = t4.c1 and t3.c1 = t4.c1; >QUERY PLAN > > Finalize Aggregate >-> Gather > Workers Planned: 2 > -> Partial Aggregate >-> Nested Loop > Join Filter: (t3.c1 = t4.c1) > -> Parallel Seq Scan on t3 >Filter: (c1 ~~ '%sss'::text) > -> Seq Scan on t4 >Filter: (timeofday() = c1) > (10 rows) > > explain (verbose, costs off) > select count(*) > from > t3, > (select *, timeofday() as x from t4 ) t4 > where t3.c1 like '%sss' and > timeofday() = t4.c1 and t3.c1 = t4.c1; > QUERY PLAN > -- > Finalize Aggregate >Output: count(*) >-> Gather > Output: (PARTIAL count(*)) > Workers Planned: 2 > -> Partial Aggregate >Output: PARTIAL count(*) >-> Nested Loop > Join Filter: (t3.c1 = t4.c1) > -> Parallel Seq Scan on public.t3 >Output: t3.c1, t3.c2 >Filter: (t3.c1 ~~ '%sss'::text) > -> Seq Scan on public.t4 >Output: t4.c1, NULL::text, timeofday() >Filter: (timeofday() = t4.c1) > (15 rows) > > > > Focus on the last two plans, the function timeofday is > volatile but paralle-safe. And Postgres outputs two parallel > plan. 
> > > The first plan: > > Finalize Aggregate >-> Gather > Workers Planned: 2 > -> Partial Aggregate >-> Nested Loop > Join Filter: (t3.c1 = t4.c1) > -> Parallel Seq Scan on t3 >Filter: (c1 ~~ '%sss'::text) > -> Seq Scan on t4 >Filter: (timeofday() = c1) > > The join's left tree is parallel scan and the right tree is seq scan. > This algorithm is correct using the distribute distributive law of > distributed join: >A = [A1 A2 A3...An], B then A join B = gather( (A1 join B) (A2 join B) > ... (An join B) ) > > The correctness of the above law should have a pre-assumption: > The data set of B is the same in each join: (A1 join B) (A2 join B) ... > (An join B) > > But things get complicated when volatile functions come in. Timeofday is just > an example to show the idea. The core is volatile functions can return > different > results on successive calls with the same arguments. Thus the following piece, > the right tree of the join > -> Seq Scan on t4 >Filter: (timeofday() = c1) > can not be considered consistent everywhere in the scan workers. > > The second plan > > Finalize Aggregate >Output: count(*) >-> Gather > Output: (PARTIAL count(*)) > Workers Planned: 2 > -> Partial Aggregate >Output: PARTIAL count(*) >-> Nested Loop > Join Filter: (t3.c1 = t4.c1) > -> Parallel Seq Scan on public.t3 >Output: t3.c1, t3.c2 >Filter: (t3.c1 ~~ '%sss'::text) > -> Seq Scan on public.t4 >Output: t4.c1, NULL::text, timeofday() >Filter: (timeofday() = t4.c1) > > > have voltile projections in the right tree of the nestloop: > > -> Seq Scan on public.t4 >Output: t4.c1, NULL::text, timeofday() >Filter: (timeofday() = t4.c1) > > It should not be taken as consistent in different workers. You are right, no they are not consistent. But Neither plans is incorrect: 1. In the first query, it's semantically permissible to evaluate timeofday() for each pair of (c1, c2), and the plan reflects that. (Notice that the
Fix header identification
Hi hackers, PFA a patch that fixes up the identification for 4 header files. I did a little archaeology trying to find plausible reasons for why we committed the wrong identification in the first place, and here's what I came up with: jsonfuncs.h was created in ce0425b162d0a to house backend-only json function declarations previously in jsonapi.h, and the identification was a copy-pasta from jsonapi.h (then in src/include/utils). jsonapi.h was wholesale moved to common in beb4699091e9f without changing identification. partdesc.h was created in 1bb5e78218107 but I can't find a good excuse why we made a mistake then, except for (maybe) the prototype for RelationBuildPartitionDesc() was moved from partcache.h and so we may have also taken its identification line, although that sounds arbitrary. llvmjit_emit.h was created with the wrong identification, probably due to a personal template... Cheers, Jesse From 55c7280ee09ab258efc1c59b4ce6226228bcbab0 Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Thu, 9 Jul 2020 13:39:39 -0700 Subject: [PATCH] Fix header identification. jsonfuncs.h was created in ce0425b162d0a to house backend-only json function declarations previously in jsonapi.h, and the identification was a copy-pasta from jsonapi.h (then in src/include/utils). jsonapi.h was wholesale moved to common in beb4699091e9f without changing identification. partdesc.h was created in 1bb5e78218107 but I can't find a good excuse why we made a mistake then, except for (maybe) the prototype for RelationBuildPartitionDesc() was moved from partcache.h and so we may have also taken its identification line, although that sounds arbitrary. llvmjit_emit.h was created with the wrong identification, probably due to a personal template... 
---
 src/include/common/jsonapi.h        | 2 +-
 src/include/jit/llvmjit_emit.h      | 2 +-
 src/include/partitioning/partdesc.h | 2 +-
 src/include/utils/jsonfuncs.h       | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/include/common/jsonapi.h b/src/include/common/jsonapi.h
index bcfd57cc53c46..1fee0ea81e47a 100644
--- a/src/include/common/jsonapi.h
+++ b/src/include/common/jsonapi.h
@@ -6,7 +6,7 @@
  * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- * src/include/utils/jsonapi.h
+ * src/include/common/jsonapi.h
  *
  *-
  */
diff --git a/src/include/jit/llvmjit_emit.h b/src/include/jit/llvmjit_emit.h
index 74607a43770af..1a7d6db7259e0 100644
--- a/src/include/jit/llvmjit_emit.h
+++ b/src/include/jit/llvmjit_emit.h
@@ -4,7 +4,7 @@
  *
  * Copyright (c) 2018-2020, PostgreSQL Global Development Group
  *
- * src/include/lib/llvmjit_emit.h
+ * src/include/jit/llvmjit_emit.h
  */
 #ifndef LLVMJIT_EMIT_H
 #define LLVMJIT_EMIT_H
diff --git a/src/include/partitioning/partdesc.h b/src/include/partitioning/partdesc.h
index fb416e073d075..70df764981e70 100644
--- a/src/include/partitioning/partdesc.h
+++ b/src/include/partitioning/partdesc.h
@@ -4,7 +4,7 @@
  *
  * Copyright (c) 1996-2020, PostgreSQL Global Development Group
  *
- * src/include/utils/partdesc.h
+ * src/include/partitioning/partdesc.h
  *
  *-
  */
diff --git a/src/include/utils/jsonfuncs.h b/src/include/utils/jsonfuncs.h
index 1f1b4029cbf12..4796b2b78bf9a 100644
--- a/src/include/utils/jsonfuncs.h
+++ b/src/include/utils/jsonfuncs.h
@@ -6,7 +6,7 @@
  * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- * src/include/utils/jsonapi.h
+ * src/include/utils/jsonfuncs.h
  *
  *-
  */
--
2.27.0
Strange behavior with polygon and NaN
Hi hackers,

While working with Chris Hajas on merging Postgres 12 with Greenplum
Database, we stumbled upon the following strange behavior in the
geometry type polygon:

-- >8

CREATE TEMP TABLE foo (p point);
CREATE INDEX ON foo USING gist(p);

INSERT INTO foo VALUES ('0,0'), ('1,1'), ('NaN,NaN');

SELECT $q$
SELECT * FROM foo WHERE p <@ polygon '(0,0), (0, 100), (100, 100), (100, 0)'
$q$ AS qry \gset

BEGIN;
SAVEPOINT yolo;
SET LOCAL enable_seqscan TO off;
:qry;

ROLLBACK TO SAVEPOINT yolo;
SET LOCAL enable_indexscan TO off;
SET LOCAL enable_bitmapscan TO off;
:qry;

-- 8<

If you run the above repro SQL in HEAD (and 12, and likely all older
versions), you get the following output:

CREATE TABLE
CREATE INDEX
INSERT 0 3
BEGIN
SAVEPOINT
SET
   p
-------
 (0,0)
 (1,1)
(2 rows)

ROLLBACK
SET
SET
     p
-----------
 (0,0)
 (1,1)
 (NaN,NaN)
(3 rows)

At first glance, you'd think this is the gist AM's bad, but on a second
thought, something else is strange here. The following query returns
true:

SELECT point '(NaN, NaN)' <@ polygon '(0,0), (0, 100), (100, 100), (100, 0)'

The above behavior of the "contained in" operator is surprising, and
it's probably not what the GiST AM is expecting. I took a look at
point_inside() in geo_ops.c, and it doesn't seem well equipped to handle
NaN. Similarly ill-equipped is dist_ppoly_internal(), which underlies
the distance operator for polygon. It gives the following interesting
output:

SELECT *, c <-> polygon '(0,0),(0,100),(100,100),(100,0)' AS distance
FROM (
  SELECT circle(point(100 * i, 'NaN'), 50) AS c
  FROM generate_series(-2, 4) i
) t(c)
ORDER BY 2;

        c        | distance
-----------------+----------
 <(-200,NaN),50> |        0
 <(-100,NaN),50> |        0
 <(0,NaN),50>    |        0
 <(100,NaN),50>  |        0
 <(200,NaN),50>  |      NaN
 <(300,NaN),50>  |      NaN
 <(400,NaN),50>  |      NaN
(7 rows)

Should they all be NaN? Am I alone in thinking the index is right but
the operators are wrong? Or should we call the indexes wrong here?

Cheers,
Jesse and Chris
Re: Avoiding hash join batch explosions with extreme skew and weird stats
Hi Tomas,

On Tue, Jun 23, 2020 at 3:24 PM Tomas Vondra wrote:
>
> Now, a couple comments / questions about the code.
>
>
> nodeHash.c
> ----------
>
>
> 1) MultiExecPrivateHash says this
>
>    /*
>     * Not subject to skew optimization, so either insert normally
>     * or save to batch file if it belongs to another stripe
>     */
>
> I wonder what it means to "belong to another stripe". I understand what
> that means for batches, which are identified by batchno computed from
> the hash value. But I thought "stripes" are just work_mem-sized pieces
> of a batch, so I don't quite understand this. Especially when the code
> does not actually check "which stripe" the row belongs to.

I have to concur that "stripe" did inspire a RAID vibe when I heard it,
but it seemed to be a better name than what it replaces.

> 3) I'm a bit puzzled about this formula in ExecHashIncreaseNumBatches
>
>    childbatch = (1U << (my_log2(hashtable->nbatch) - 1)) |
> hashtable->curbatch;
>
> and also about this comment
>
>    /*
>     * TODO: what to do about tuples that don't go to the child
>     * batch or stay in the current batch? (this is why we are
>     * counting tuples to child and curbatch with two diff
>     * variables in case the tuples go to a batch that isn't the
>     * child)
>     */
>    if (batchno == childbatch)
>        childbatch_outgoing_tuples++;
>
> I thought each old batch is split into two new ones, and the tuples
> either stay in the current one, or are moved to the new one - which I
> presume is the childbatch, although I haven't tried to decode that
> formula. So where else could the tuple go, as the comment tried to
> suggest?

True, every old batch is split into two new ones, if you only consider
tuples coming from the batch file that _still belong in there_; i.e.
there are tuples in the old batch file that belong to a future batch.
As an example, if the current nbatch = 8, and we want to expand to
nbatch = 16, (old) batch 1 will split into (new) batch 1 and batch 9,
but it can already contain tuples that need to go into (current) batches
3, 5, and 7 (soon-to-be batches 11, 13, and 15).

Cheers,
Jesse
Re: Fix compilation failure against LLVM 11
On Thu, May 28, 2020 at 1:07 AM Michael Paquier wrote:
> Please note that I would still wait for their next GA release to plug
> in any extra holes at the same time. @Jesse: or is this change
> actually part of the upcoming 10.0.1?

No, a refactoring like this was not in the back branches (nor is it
expected).

Thanks,
Jesse
Re: Fix compilation failure against LLVM 11
Hi Andres,

On the mensiversary of the last response, what can I do to help move
this along (aside from heeding the advice "don't use LLVM HEAD")?
Michael Paquier expressed concerns over flippant churn upthread:

On Mon, Apr 27, 2020 at 10:19 PM Andres Freund wrote:
AF> On 2020-04-28 13:56:23 +0900, Michael Paquier wrote:
MP> > On Mon, Apr 27, 2020 at 07:48:54AM -0700, Jesse Zhang wrote:
JZ> > > Are you expressing a concern against "churning" this part of the
JZ> > > code in reaction to upstream LLVM changes? I'd agree with you in
JZ> > > general. But then the question we need to ask is "will we need
JZ> > > to revert this 3 weeks from now if upstream reverts their
JZ> > > changes?", or "we change X to Y now, will we need to instead
JZ> > > change X to Z 3 weeks later?".
> >
MP> > My concerns are a mix of all that, because we may finish by doing
MP> > the same verification work multiple times instead of fixing all
MP> > existing issues at once. A good thing is that we may be able to
MP> > conclude rather soon, it looks like LLVM releases a new major
MP> > version every 3 months or so.
>
AF> Given the low cost of a change like this, and the fact that we have
AF> a buildfarm animal building recent trunk versions of llvm, I think
AF> it's better to just backpatch now.

For bystanders: Andres and I seemed to agree that this is unlikely to be
flippant (in addition to other benefits mentioned below). We haven't
discussed more on this, though I'm uncertain we had a consensus.

> JZ> > > In that frame of mind, the answer is simply "no" w.r.t this
> JZ> > > patch: it's removing an #include that simply has been dead: the
> JZ> > > upstream change merely exposed it.
> >
MP> > The docs claim support for LLVM down to 3.9. Are versions older
MP> > than 8 fine with your proposed change?
>
AF> I'll check.
>

How goes the checking? I was 99% certain it'd work but that might have
been my excuse to be lazy. Do you need help on this?
> JZ> > > OTOH, is your concern more around "how many more dead #include
JZ> > > will LLVM 11 reveal before September?", I'm open to suggestions.
JZ> > > I personally have a bias to keep things working.
> >
MP> > This position can have advantages, though it seems to me that we
MP> > should still wait to see if there are more issues popping up.
>
AF> What's the benefit? The cost of checking this will be not
AF> meaningfully lower if there's other things to check as well, given
AF> their backward compat story presumably is different.

For bystanders: Andres and I argued for "fixing this sooner and
backpatch" and Michael suggested "wait longer and whack all moles". We
have waited, and there seems to be only one mole (finding all dead
unbroken "include"s was left as an exercise for the reader).

Have we come to an agreement on this?

Cheers,
Jesse
Re: do {} while (0) nitpick
Hi Tom,

On Fri, May 1, 2020 at 2:32 PM Tom Lane wrote:
>
> Grepping showed me that there were some not-do-while macros that
> also had trailing semicolons. These seem just as broken, so I
> fixed 'em all.
>

I'm curious: *How* are you able to discover those occurrences with grep?
I understand how John might have done it with his original patch: it's
quite clear the pattern he would look for looks like "while (0);". But
how did you find all these other macro definitions with a trailing
semicolon? My tiny brain can only imagine:

1. Either grep for a trailing semicolon (OMG almost every line will come
up) and squint through the context to see that the previous line has a
trailing backslash;

2. Or use some LLVM magic to spelunk through every macro definition and
look for a trailing semicolon.

Cheers,
Jesse
Re: Fix compilation failure against LLVM 11
Hi Michael,

On Mon, Apr 27, 2020 at 9:56 PM Michael Paquier wrote:
> rather soon, it looks like LLVM releases a new major version every 3
> months or so.

FYI LLVM has a six-month release cadence [1]; the next release is
expected this coming September (I can't tell whether you were joking).

Cheers,
Jesse

[1] https://llvm.org/docs/HowToReleaseLLVM.html#release-timeline
Re: Fix compilation failure against LLVM 11
Hi Michael,

On Sun, Apr 26, 2020 at 11:21 PM Michael Paquier wrote:
>
> On Sat, Apr 25, 2020 at 09:41:20PM -0700, Jesse Zhang wrote:
> > I searched my inbox and the archive, strange that nobody else is seeing
> > this.
> >
> > Turns out that LLVM has recently removed "llvm/IR/CallSite.h" in
> > (unreleased) version 11 [1][2]. To fix the build I tried conditionally
> > (on LLVM_VERSION_MAJOR < 11) including CallSite.h, but that looks yuck.
> > Then I poked at llvmjit_inline.cpp a bit and found that CallSite.h
> > doesn't seem to be really necessary. PFA a patch that simply removes
> > this #include.
> >
> > In addition, I've done the due diligence of trying to build against
> > LLVM versions 8, 9, 10.
>
> LLVM 11 has not been released yet. Do you think that this part or
> even more are subject to change before the 11 release? My take would
> be to wait more before fixing this issue and make sure that our code
> is fixed when their code is GA'd.
> --
> Michael

Are you expressing a concern against "churning" this part of the code
in reaction to upstream LLVM changes? I'd agree with you in general.
But then the question we need to ask is "will we need to revert this 3
weeks from now if upstream reverts their changes?", or "we change X to Y
now, will we need to instead change X to Z 3 weeks later?".

In that frame of mind, the answer is simply "no" w.r.t. this patch: it's
removing an #include that has simply been dead; the upstream change
merely exposed it.

OTOH, if your concern is more around "how many more dead #includes will
LLVM 11 reveal before September?", I'm open to suggestions. I personally
have a bias toward keeping things working.

Cheers,
Jesse
Fix compilation failure against LLVM 11
Hi hackers,

My local build of master started failing last night with this error:

llvmjit_inline.cpp:59:10: fatal error: 'llvm/IR/CallSite.h' file not found
#include <llvm/IR/CallSite.h>
         ^~~~

I searched my inbox and the archive, strange that nobody else is seeing
this.

Turns out that LLVM has recently removed "llvm/IR/CallSite.h" in
(unreleased) version 11 [1][2]. To fix the build I tried conditionally
(on LLVM_VERSION_MAJOR < 11) including CallSite.h, but that looks yuck.
Then I poked at llvmjit_inline.cpp a bit and found that CallSite.h
doesn't seem to be really necessary. PFA a patch that simply removes
this #include.

In addition, I've done the due diligence of trying to build against
LLVM versions 8, 9, 10.

Cheers,
Jesse

[1] LLVM Differential Revision: https://reviews.llvm.org/D78794
[2] LLVM commit https://github.com/llvm/llvm-project/commit/2c24051bacd2
    "[CallSite removal] Rename CallSite.h to AbstractCallSite.h. NFC"

From 27084a0ebd37176f0a831042457e27cad8ac70eb Mon Sep 17 00:00:00 2001
From: Jesse Zhang
Date: Sat, 25 Apr 2020 16:24:33 -0700
Subject: [PATCH] Remove unused include

LLVM has recently removed "llvm/IR/CallSite.h" in (unreleased) version
11 [1]. Instead of using preprocessor conditionals, just remove it
because it doesn't seem like AbstractCallSite (or any of the other
symbols declared in the header) is used by LLVM JIT.

[1] LLVM Differential Revision: https://reviews.llvm.org/D78794
---
 src/backend/jit/llvm/llvmjit_inline.cpp | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/backend/jit/llvm/llvmjit_inline.cpp b/src/backend/jit/llvm/llvmjit_inline.cpp
index 95d4d854f43..2617a461cad 100644
--- a/src/backend/jit/llvm/llvmjit_inline.cpp
+++ b/src/backend/jit/llvm/llvmjit_inline.cpp
@@ -56,7 +56,6 @@ extern "C"
 #include
 #endif
 #include
-#include <llvm/IR/CallSite.h>
 #include
 #include
 #include
--
2.26.2
Re: Header / Trailer Comment Typos for M4 macros
On Wed, Apr 22, 2020 at 12:29 PM Tom Lane wrote:
>
> Jesse Zhang writes:
> > either way: either changing the macro names or changing the comment. PFA
> > a patch that keeps the macro names.
>
> Pushed, thanks.
>

Thanks!

> > Also in hindsight: it seems that, as suggested in the trailer typo,
> > PGAC_PROG_CXX_VAR_OPT (a la the C version PGAC_PROG_CC_VAR_OPT) would be
> > a good addition if we ever want to add the negative warning flags (for
> > starters, -Wno-unused-command-line-argument for clang++) to CXXFLAGS, but
> > I assume it wasn't there in the final patch because we didn't use it
> > (presumably because the patch was minimized?). Thoughts?
>
> I'd be inclined not to add it till we have an actual use for it.
> Dead code tends to break silently.
>

For sure. I feel the same about dead code. I didn't make my question
clear, though: I'm curious what motivated the original addition of
-Wno-unused-command-line-argument in commit 73b416b2e412, and how that
problem didn't quite manifest itself with Clang++. The commit mentioned
pthread flags, but I tried taking out -Wno-unused-command-line-argument
from configure.in and it produced no warnings on my laptop (I know,
that's a bad excuse). For context, I'm running Clang 11 on Ubuntu
amd64.

Cheers,
Jesse
Header / Trailer Comment Typos for M4 macros
Hi hackers,

While poking at the build system I stumbled upon some trivial trailer
comment inconsistencies in config/c-compiler.m4. They can be fixed
either way: either changing the macro names or changing the comment. PFA
a patch that keeps the macro names.

In hindsight though, it seems that PGAC_PRINTF_ARCHETYPE was meant to be
PGAC_C_PRINTF_ARCHETYPE, judging by a half-second squint at surrounding,
similar macros. Thoughts?

Also in hindsight: it seems that, as suggested in the trailer typo,
PGAC_PROG_CXX_VAR_OPT (a la the C version PGAC_PROG_CC_VAR_OPT) would be
a good addition if we ever want to add the negative warning flags (for
starters, -Wno-unused-command-line-argument for clang++) to CXXFLAGS, but
I assume it wasn't there in the final patch because we didn't use it
(presumably because the patch was minimized?). Thoughts?

Cheers,
Jesse

From 73c15bfdea388347455810c3eb98dbff235d3d08 Mon Sep 17 00:00:00 2001
From: Jesse Zhang
Date: Wed, 22 Apr 2020 06:51:05 -0700
Subject: [PATCH] Fix header / trailer typos for m4 macros

These seem like legit names in earlier iterations of respective patches
(commit b779168ffe33 "Detect PG_PRINTF_ATTRIBUTE automatically." and
commit 6869b4f25847 "Add C++ support to configure.") but the macro had
been renamed out of sync with the header / trailer comment in the final
committed patch.
---
 config/c-compiler.m4 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 501b74b3a19..c48cba9cf9a 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -2,7 +2,7 @@
 # config/c-compiler.m4

-# PGAC_C_PRINTF_ARCHETYPE
+# PGAC_PRINTF_ARCHETYPE
 # ---
 # Select the format archetype to be used by gcc to check printf-type functions.
 # We prefer "gnu_printf", as that most closely matches the features supported
@@ -466,7 +466,7 @@ undefine([Ac_cachevar])dnl
 # command-line option. If it does, add the string to CXXFLAGS.
 AC_DEFUN([PGAC_PROG_CXX_CFLAGS_OPT],
 [PGAC_PROG_VARCXX_VARFLAGS_OPT(CXX, CXXFLAGS, $1)
-])# PGAC_PROG_CXX_VAR_OPT
+])# PGAC_PROG_CXX_CFLAGS_OPT
--
2.26.2
Re: Properly mark NULL returns in numeric aggregates
Hi David,

On Mon, Apr 13, 2020 at 10:46 PM David Rowley wrote:
>
> On Tue, 14 Apr 2020 at 06:14, Tom Lane wrote:
> > Yeah, they're relying exactly on the assumption that nodeAgg is not
> > going to try to copy a value declared "internal", and therefore they
> > can be loosey-goosey about whether the value pointer is null or not.
> > However, if you want to claim that that's wrong, you have to explain
> > why it's okay for some other code to be accessing a value that's
> > declared "internal". I'd say that the meaning of that is precisely
> > "keepa u hands off".
> >
> > In the case at hand, the current situation is that we only expect the
> > values returned by these combine functions to be read by the associated
> > final functions, which are on board with the null-pointer representation
> > of an empty result. Your argument is essentially that it should be
> > possible to feed the values to the aggregate's associated serialization
> > function as well. But the core code never does that, so I'm not convinced
> > that we should add it to the requirements; we'd be unable to test it.
>
> Casting my mind back to when I originally wrote that code, I attempted
> to do so in such a way so that it could one day be used for a 3-stage
> aggregation. e.g. Parallel Partial Aggregate -> Gather -> Combine
> Serial Aggregate on one node, then on some master node a Deserial
> Combine Finalize Aggregate. You're very right that we can't craft
> such a plan with today's master (we didn't even add a supporting enum
> for it in AggSplit). However, it does appear that there are
> extensions or forks out there which attempt to use the code in this
> way, so it would be good to not leave those people out in the cold
> regarding this.
Greenplum plans split-aggregation quite similarly to Postgres: while it
doesn't pass partial results through an intra-cluster "Gather" -- using
a reshuffle-by-hash type operation instead -- Greenplum _does_ split an
aggregate into final and partial halves, running them on different
nodes. In short, the relationship among the combine, serial, and
deserial functions is similar to how it is in Postgres today
(serial->deserial->combine), in the context of splitting aggregates.

The current problem arises because Greenplum spills the hash table in
hash aggregation (a diff we're working actively to upstream), a process
in which we have to touch (read: serialize and copy) the internal trans
values. However, we are definitely eyeing what you described as
something to move towards. As a fork, we'd like to carry as thin a diff
as possible, so the current situation is pretty much forcing us to
diverge in the functions mentioned up-thread.

In hindsight, "sloppy" might not have been a wise choice of words,
apologies for the possible offense, David!

> For testing, can't we just have an Assert() in
> advance_transition_function that verifies isnull matches the
> nullability of the return value for INTERNAL returning transfns? i.e.
> the attached
>
> I don't have a test case to hand that could cause this to fail, but it
> sounds like Jesse might.

One easy way to cause this is "sum(x) FILTER (WHERE false)", which will
for sure make the partial results NULL. Is that what you're looking
for? I'll be happy to send in the SQL.

Cheers,
Jesse
Re: Properly mark NULL returns in numeric aggregates
On Fri, Apr 10, 2020 at 3:59 PM Tom Lane wrote:
>
> They can't be strict because the initial iteration needs to produce
> something from a null state and non-null input. nodeAgg's default
> behavior won't work for those because nodeAgg doesn't know how to
> copy a value of type "internal".
>
> regards, tom lane

Ah, I think I get it. A copy must happen because the input is likely in
a shorter-lived memory context than the state, but nodeAgg's default
behavior of copying a by-value datum won't really copy the object
pointed to by the pointer wrapped in the datum of "internal" type, so we
defer to the combine function. Am I right? Then it follows kinda
naturally that those combine functions have been sloppy on arrival since
commit 11c8669c0cc.

Cheers,
Jesse
Re: Properly mark NULL returns in numeric aggregates
Hi Andres,

On Fri, Apr 10, 2020 at 12:14 PM Andres Freund wrote:
>
> Shouldn't these just be marked as strict?
>

Are you suggesting that because none of the corresponding trans
functions (int8_avg_accum, int2_accum, and friends) ever output NULL?
That's what we thought, but then I realized that an input to a combine
function is not necessarily an output from a trans function invocation:
for example, when there is a "FILTER (WHERE ...)" clause that filters
out every tuple in a group, the partial aggregate might just throw a
NULL state for the final aggregate to combine.

On the other hand, we examined the corresponding final functions
(numeric_stddev_pop and friends); they all seem to carefully treat a
NULL trans value the same as a "zero input" (as in, state.N == 0 &&
state.NaNcount == 0). That does suggest to me that it should be fine to
declare those combine functions as strict (barring the restriction that
they should not be STRICT -- anybody remember why?).

Cheers,
Jesse and Deep
Properly mark NULL returns in numeric aggregates
Hi hackers,

We found that several functions -- namely numeric_combine,
numeric_avg_combine, numeric_poly_combine, and int8_avg_combine -- are
returning NULL without signaling the nullity of the datum in
fcinfo.isnull. This is obscured by the fact that the only functions in
core (finalfunc for various aggregates) that those return values feed
into happen to tolerate (or rather, not quite distinguish)
zero-but-not-NULL trans values.

In Greenplum, this behavior becomes problematic because Greenplum
serializes internal trans values before spilling the hash table. The
serial functions (numeric_serialize and friends) are strict functions
that will blow up when they are given null (either in the C sense or the
SQL sense) inputs. In Postgres, if we change hash aggregation in the
future to spill the hash table (vis-à-vis the input tuples), this issue
would manifest itself in the final aggregate because we'll serialize the
combined (and likely incorrectly null) trans values.

Please find attached a small patch fixing said issue. Originally
reported by Denis Smirnov over at
https://github.com/greenplum-db/gpdb/pull/9878

Cheers,
Jesse and Deep

From edeeaa220fbc3d082d133b29a8c394632588d11a Mon Sep 17 00:00:00 2001
From: Denis Smirnov
Date: Thu, 9 Apr 2020 16:10:29 -0700
Subject: [PATCH] Properly mark NULL returns in numeric aggregates

When a function returns NULL (in the SQL sense), it is expected to
signal that in fcinfo.isnull. numeric_combine and friends break this
rule by directly returning a NULL datum without signaling the nullity.
This is obscured by the fact that the only functions in core (finalfunc
for various aggregates) that those return values feed into happen to
tolerate (or rather, not quite distinguish) zero-but-not-NULL trans
values.

In Greenplum, this behavior becomes problematic because Greenplum
serializes internal trans values before spilling the hash table. The
serial functions (numeric_serialize and friends) are strict functions
that will blow up when they are given null (either in the C sense or
the SQL sense) inputs.
---
 src/backend/utils/adt/numeric.c | 16
 1 file changed, 16 insertions(+)

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 9986132b45e..57896de6edf 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -3946,7 +3946,11 @@ numeric_combine(PG_FUNCTION_ARGS)
 	state2 = PG_ARGISNULL(1) ? NULL : (NumericAggState *) PG_GETARG_POINTER(1);

 	if (state2 == NULL)
+	{
+		if (state1 == NULL)
+			PG_RETURN_NULL();
 		PG_RETURN_POINTER(state1);
+	}

 	/* manually copy all fields from state2 to state1 */
 	if (state1 == NULL)
@@ -4034,7 +4038,11 @@ numeric_avg_combine(PG_FUNCTION_ARGS)
 	state2 = PG_ARGISNULL(1) ? NULL : (NumericAggState *) PG_GETARG_POINTER(1);

 	if (state2 == NULL)
+	{
+		if (state1 == NULL)
+			PG_RETURN_NULL();
 		PG_RETURN_POINTER(state1);
+	}

 	/* manually copy all fields from state2 to state1 */
 	if (state1 == NULL)
@@ -4543,7 +4551,11 @@ numeric_poly_combine(PG_FUNCTION_ARGS)
 	state2 = PG_ARGISNULL(1) ? NULL : (PolyNumAggState *) PG_GETARG_POINTER(1);

 	if (state2 == NULL)
+	{
+		if (state1 == NULL)
+			PG_RETURN_NULL();
 		PG_RETURN_POINTER(state1);
+	}

 	/* manually copy all fields from state2 to state1 */
 	if (state1 == NULL)
@@ -4774,7 +4786,11 @@ int8_avg_combine(PG_FUNCTION_ARGS)
 	state2 = PG_ARGISNULL(1) ? NULL : (PolyNumAggState *) PG_GETARG_POINTER(1);

 	if (state2 == NULL)
+	{
+		if (state1 == NULL)
+			PG_RETURN_NULL();
 		PG_RETURN_POINTER(state1);
+	}

 	/* manually copy all fields from state2 to state1 */
 	if (state1 == NULL)
--
2.26.0
Re: Use compiler intrinsics for bit ops in hash
Hi David,

On Sun, Mar 8, 2020 at 11:34 AM David Fetter wrote:
>
> On Mon, Mar 02, 2020 at 12:45:21PM -0800, Jesse Zhang wrote:
> > Hi David,
>
> Per discussion on IRC with Andrew (RhodiumToad) Gierth:
>
> The runtime detection means there's always an indirect call overhead
> and no way to inline. This is counter to what using compiler
> intrinsics is supposed to do.
>
> It's better to rely on the compiler, because:
> (a) The compiler often knows whether the value can or can't be 0 and
> can therefore skip a conditional jump.

Yes, the compiler would know to eliminate the branch if the inlined
function is called with a literal argument, or if it infers an invariant
from the context (like nesting inside a conditional block, or a previous
conditional "noreturn" path).

> (b) If you're targeting a recent microarchitecture, the compiler can
> just use the right instruction.

I might be more conservative than you are on (b). The thought of
building a binary that cannot run "somewhere" the compiler supports by
default still mortifies me.

> (c) Even if the conditional branch is left in, it's not a big overhead.
>

I 100% agree with (c), see benchmarking results upthread.

Cheers,
Jesse
Re: Use compiler intrinsics for bit ops in hash
Hi John,

Oops, this email has been sitting in my outbox for 3 days...

On Wed, Mar 4, 2020 at 1:46 AM John Naylor wrote:
> On Tue, Mar 3, 2020 at 4:46 AM Jesse Zhang wrote:
> > I've quickly put together a PoC patch on top of yours, which
> > re-implements ceil_log2 using LZCNT coupled with a CPUID check.
> > Thoughts?
>
> This patch seems to be making an assumption that an indirect function
> call is faster than taking a branch (in inlined code) that the CPU
> will almost always predict correctly. It would be nice to have some
> numbers to compare. (against pg_count_leading_zeros_* using the "slow"
> versions but statically inlined).
>

Ah, how could I forget that... I ran a quick benchmark on my laptop, and
indeed, even though the GCC-generated code takes a hit on zero input
(Clang generates slightly different code that gives indistinguishable
runtime for zero and non-zero inputs), the inlined code (the function
input in my benchmark is never a constant literal so the branch does get
exercised at runtime) is still more than twice as fast as the function
call.

-------------------------------------------------
Benchmark      Time       CPU        Iterations
-------------------------------------------------
BM_pfunc/0     1.57 ns    1.56 ns    447127265
BM_pfunc/1     1.56 ns    1.56 ns    449618696
BM_pfunc/8     1.57 ns    1.57 ns    443013856
BM_pfunc/64    1.57 ns    1.57 ns    448784369
BM_slow/0      0.602 ns   0.600 ns   10
BM_slow/1      0.391 ns   0.390 ns   10
BM_slow/8      0.392 ns   0.391 ns   10
BM_slow/64     0.391 ns   0.390 ns   10
BM_fast/0      1.47 ns    1.46 ns    477513921
BM_fast/1      1.47 ns    1.46 ns    473992040
BM_fast/8      1.46 ns    1.46 ns    474895755
BM_fast/64     1.47 ns    1.46 ns    477215268

For your amusement, I've attached the meat of the benchmark. To build
the code you can grab the repository at
https://github.com/d/glowing-chainsaw/tree/pfunc

> Stylistically, "8 * sizeof(num)" is a bit overly formal, since the
> hard-coded number we want is in the name of the function.

Oh yeah, overly generic code is indicative of the remnants of my C++
brain, will fix.

Cheers,
Jesse

#include "lzcnt.h"

static bool go = lzcnt_available();

__attribute__((target("lzcnt")))
inline int
lzcnt_fast(uint64_t word)
{
	return __builtin_clzll(word);
}

static int
lzcnt_choose(uint64_t word)
{
	if (lzcnt_available())
		lzcnt_pfunc = lzcnt_fast;
	else
		lzcnt_pfunc = lzcnt_slow;

	return lzcnt_pfunc(word);
}

int (*lzcnt_pfunc)(uint64_t word) = lzcnt_choose;

#ifndef LZCNT_H
#define LZCNT_H

#include <cpuid.h>
#include <stdint.h>

static inline bool
lzcnt_available()
{
	uint32_t exx[4] = {0, 0, 0, 0};

	__get_cpuid(0x80000001, exx + 0, exx + 1, exx + 2, exx + 3);

	return (exx[2] & bit_ABM) != 0;
}

extern int (*lzcnt_pfunc)(uint64_t word);

__attribute__((target("no-lzcnt")))
inline int
lzcnt_slow(uint64_t word)
{
	if (word == 0)
		return 8 * sizeof(word);

	return __builtin_clzll(word);
}

int lzcnt_fast(uint64_t word);

#endif

#include "benchmark/benchmark.h"

#include "lzcnt.h"

static void
BM_pfunc(benchmark::State &state)
{
	auto word = state.range(0);

	for (auto _ : state)
	{
		lzcnt_pfunc(word);
	}
}

static void
BM_slow(benchmark::State &state)
{
	auto word = state.range(0);

	for (auto _ : state)
	{
		benchmark::DoNotOptimize(lzcnt_slow(word));
	}
}

static void
BM_fast(benchmark::State &state)
{
	auto word = state.range(0);

	for (auto _ : state)
	{
		lzcnt_fast(word);
	}
}

BENCHMARK(BM_pfunc)->Arg(0)->Range(1, 64);
BENCHMARK(BM_slow)->Arg(0)->Range(1, 64);
BENCHMARK(BM_fast)->Arg(0)->Range(1, 64);

BENCHMARK_MAIN();
Re: Use compiler intrinsics for bit ops in hash
Hi David,

On Wed, Feb 26, 2020 at 9:56 PM David Fetter wrote:
>
> On Wed, Feb 26, 2020 at 09:12:24AM +0100, David Fetter wrote:
> > On Fri, Jan 31, 2020 at 04:59:18PM +0100, David Fetter wrote:
> > > On Wed, Jan 15, 2020 at 03:45:12PM -0800, Jesse Zhang wrote:
> > > > On Tue, Jan 14, 2020 at 2:09 PM David Fetter wrote:
> > > > > > The changes in hash AM and SIMPLEHASH do look like a net positive
> > > > > > improvement. My biggest cringe might be in pg_bitutils:
> > > > > >
> > > > > > 1. Is ceil_log2_64 dead code?
> > > > >
> > > > > Let's call it nascent code. I suspect there are places it could go, if
> > > > > I look for them. Also, it seemed silly to have one without the other.
> > > > >
> > > >
> > > > While not absolutely required, I'd like us to find at least one
> > > > place and start using it. (Clang also nags at me when we have
> > > > unused functions).
> > >
> > > Done in the expanded patches attached.

I see that you've found use of it in dynahash, thanks!

The math in the new (from v4 to v6) patch is wrong: it yields
ceil_log2(1) = 1 or next_power_of_2(1) = 2. I can see that you lifted
the restriction of "num greater than one" for ceil_log2() in this patch
set, but it's now _more_ problematic to base those functions on
pg_leftmost_one_pos().

I'm not comfortable with your changes to pg_leftmost_one_pos() to remove
the restriction on word being non-zero. Specifically,
pg_leftmost_one_pos() is made to return 0 on 0 input. While none of its
current callers (in HEAD) is harmed, this introduces muddy semantics:

1. pg_leftmost_one_pos is semantically undefined on 0 input: scanning
for a set bit in a zero word won't find it anywhere.

2. We can _try_ generalizing it to accommodate ceil_log2 by
extrapolating based on the invariant that BSR + LZCNT = 31 (or 63). In
that case, the extrapolation yields -1 for pg_leftmost_one_pos(0).

I'm not convinced that others on the list will be comfortable with the
generalization suggested in 2 above.
I've quickly put together a PoC patch on top of yours, which
re-implements ceil_log2 using LZCNT coupled with a CPUID check.
Thoughts?

Cheers,
Jesse

From 0e4392d77b6132a508b7da14871cc99066a9d114 Mon Sep 17 00:00:00 2001
From: Jesse Zhang
Date: Fri, 28 Feb 2020 16:22:04 -0800
Subject: [PATCH] Use LZCOUNT when possible

This patch reverts the changes to pg_leftmost_one and friends (which is
really bit-scan-reverse, and is legitimately undefined on zero) and
reworks ceil_log2 and friends to use a count-leading-zeros operation
that is well-defined on zero.
---
 src/include/port/pg_bitutils.h | 47 -
 src/port/pg_bitutils.c         | 77 ++
 2 files changed, 104 insertions(+), 20 deletions(-)

diff --git a/src/include/port/pg_bitutils.h b/src/include/port/pg_bitutils.h
index 88a9ea5b7fb..b4d5724ee1d 100644
--- a/src/include/port/pg_bitutils.h
+++ b/src/include/port/pg_bitutils.h
@@ -20,18 +20,19 @@ extern PGDLLIMPORT const uint8 pg_number_of_ones[256];

 /*
  * pg_leftmost_one_pos32
  *		Returns the position of the most significant set bit in "word",
- *		measured from the least significant bit.
+ *		measured from the least significant bit. word must not be 0.
  */
 static inline int
 pg_leftmost_one_pos32(uint32 word)
 {
 #ifdef HAVE__BUILTIN_CLZ
-	return word ? 31 - __builtin_clz(word) : 0;
+	Assert(word != 0);
+
+	return 31 - __builtin_clz(word);
 #else
 	int			shift = 32 - 8;

-	if (word == 0)
-		return 0;
+	Assert(word != 0);

 	while ((word >> shift) == 0)
 		shift -= 8;
@@ -48,18 +49,19 @@ static inline int
 pg_leftmost_one_pos64(uint64 word)
 {
 #ifdef HAVE__BUILTIN_CLZ
+	Assert(word != 0);
+
 #if defined(HAVE_LONG_INT_64)
-	return word ? 63 - __builtin_clzl(word) : 0;
+	return 63 - __builtin_clzl(word);
 #elif defined(HAVE_LONG_LONG_INT_64)
-	return word ? 63 - __builtin_clzll(word) : 0;
+	return 63 - __builtin_clzll(word);
 #else
 #error must have a working 64-bit integer datatype
 #endif
 #else							/* !HAVE__BUILTIN_CLZ */
 	int			shift = 64 - 8;

-	if (word == 0)
-		return 0;
+	Assert(word != 0);

 	while ((word >> shift) == 0)
 		shift -= 8;
@@ -71,18 +73,19 @@ pg_leftmost_one_pos64(uint64 word)

 /*
  * pg_rightmost_one_pos32
  *		Returns the position of the least significant set bit in "word",
- *		measured from the least significant bit.
+ *		measured from the least significant bit. word must not be 0.
  */
 static inline int
 pg_rightmost_one_pos32(uint32 word)
 {
 #ifdef HAVE__BUILTIN_CTZ
-	return word ? __builtin_ctz(word) : 32;
+	Assert(word != 0);
+
+	return __builtin_ctz(word);
#else
 	int			result = 0;

-	if (word == 0)
-		return 32;
+	Assert(word != 0);

 	while ((word & 255) == 0)
 	{
@@ -102,
Re: Resolving the python 2 -> python 3 mess
Hi Tom,

I really like the "stub .so" idea, but feel pretty uncomfortable about
the "transparent" upgrade. Response inlined.

On Mon, Feb 17, 2020 at 8:49 AM Tom Lane wrote:
>
> 2. On platforms where Python 2.x is no longer supported, transparently
> map plpythonu and plpython2u to plpython3u. "Transparent" meaning that
> dump/reload or pg_upgrade of existing plpythonu/plpython2u functions
> will work, but when you run them, what you actually get is Python 3.x.

It's fair enough that plpythonu changes its meaning; people who really
want the stability should explicitly use plpython2u.

> For existing functions that don't use any obsolete Python syntax
> (which one would hope is a pretty large percentage), this is a
> zero-effort conversion for users. If a function does use obsolete
> constructs, it will get a parse failure when executed, and the user
> will have to update it to Python 3 syntax. I propose that we make
> that case reasonably painless by providing the conversion script
> I posted in [3] (or another one if somebody's got a better one),
> bundled as a separately-installable extension.
>
> A possible gotcha in this approach is if there are any python 2/3
> incompatibilities that would not manifest as syntax errors or
> obvious runtime errors, but would allow old code to execute and
> silently do the wrong thing. One would hope that the Python crowd
> weren't dumb enough to do that, but I don't know whether it's true.
> If there are nasty cases like that, maybe what we have to do is allow
> plpythonu/plpython2u functions to be dumped and reloaded into a
> python-3-only install, but refuse to execute them until they've
> been converted.

"True division", one of the very first (2001, awww) few breaking changes
introduced in Python 3 [1], comes to mind. While it's not the worst of
the incompatibilities between Python 2 and 3, it's bad enough to give
pause to the notion that successful parsing implies successful
conversion.
[1] https://www.python.org/dev/peps/pep-0238/ Cheers, Jesse
Re: Parallel grouping sets
On Mon, Feb 3, 2020 at 12:07 AM Richard Guo wrote: > > Hi Jesse, > > Thanks for reviewing these two patches. I enjoyed it! > > On Sat, Jan 25, 2020 at 6:52 AM Jesse Zhang wrote: >> >> I glanced over both patches. Just the opposite, I have a hunch that v3 >> is always better than v5. Here's my 6-minute understanding of both. >> >> v3 ("the one with grouping set id") really turns the plan from a tree to >> a multiplexed pipe: we can execute grouping aggregate on the workers, >> but only partially. When we emit the trans values, also tag the tuple >> with a group id. After gather, finalize the aggregates with a modified >> grouping aggregate. Unlike a non-split grouping aggregate, the finalize >> grouping aggregate does not "flow" the results from one rollup to the >> next one. Instead, each group only advances on partial inputs tagged for >> the group. > > Yes, this is what v3 patch does. > > We (Pengzhou and I) had an offline discussion on this plan and we have > some other idea. Since we have tagged 'GroupingSetId' for each tuple > produced by partial aggregate, why not then perform a normal grouping > sets aggregation in the final phase, with the 'GroupingSetId' included > in the group keys? The plan looks like:
>
> # explain (costs off, verbose)
> select c1, c2, c3, avg(c3) from gstest group by grouping
> sets((c1,c2),(c1),(c2,c3));
>                          QUERY PLAN
> --
>  Finalize GroupAggregate
>    Output: c1, c2, c3, avg(c3)
>    Group Key: (gset_id), gstest.c1, gstest.c2, gstest.c3
>    ->  Sort
>          Output: c1, c2, c3, (gset_id), (PARTIAL avg(c3))
>          Sort Key: (gset_id), gstest.c1, gstest.c2, gstest.c3
>          ->  Gather
>                Output: c1, c2, c3, (gset_id), (PARTIAL avg(c3))
>                Workers Planned: 4
>                ->  Partial HashAggregate
>                      Output: c1, c2, c3, gset_id, PARTIAL avg(c3)
>                      Hash Key: gstest.c1, gstest.c2
>                      Hash Key: gstest.c1
>                      Hash Key: gstest.c2, gstest.c3
>                      ->  Parallel Seq Scan on public.gstest
>                            Output: c1, c2, c3
>
> This plan should be able to give the correct results.
> We are still thinking if it is a better plan than the 'multiplexed pipe' plan as in v3. Inputs of thoughts here would be appreciated.

Ha, I believe you meant to say a "normal aggregate", because what's performed above gather is no longer "grouping sets", right? The group key idea is clever in that it helps "discriminate" tuples by their grouping set id. I haven't completely thought this through, but my hunch is that this leaves some money on the table. For example, won't it also lead to more expensive (and unnecessary) sorting and hashing? The groupings with a few partials now share the same tuplesort with the groupings with a lot of groups, even though we only want to tell grouping 1 *apart from* grouping 10, not necessarily that grouping 1 needs to come before grouping 10. That's why I like the multiplexed pipe / "dispatched by grouping set id" idea: we only pay for sorting (or hashing) within each grouping. That said, I'm open to the criticism that keeping multiple tuplesorts and agg hash tables running is expensive in itself, memory-wise ... Cheers, Jesse
Re: Parallel grouping sets
On Thu, Jan 23, 2020 at 2:47 AM Amit Kapila wrote: > > On Sun, Jan 19, 2020 at 2:23 PM Richard Guo wrote: > > > > I realized that there are two patches in this thread that are > > implemented according to different methods, which causes confusion. > > > > Both the idea seems to be different. Is the second approach [1] > inferior for any case as compared to the first approach? Can we keep > both approaches for parallel grouping sets, if so how? If not, then > won't the code by the first approach be useless once we commit second > approach? > > > [1] - > https://www.postgresql.org/message-id/CAN_9JTwtTTnxhbr5AHuqVcriz3HxvPpx1JWE--DCSdJYuHrLtA%40mail.gmail.com

I glanced over both patches. Just the opposite, I have a hunch that v3 is always better than v5. Here's my 6-minute understanding of both.

v5 (the one with a simple partial aggregate) works by pushing a little bit of partial aggregation onto the workers, and performing grouping aggregation above the gather. This has two interesting outcomes: we can execute an unmodified partial aggregate on the workers, and execute an almost unmodified rollup aggregate once the trans values are gathered. A parallel plan for a query like

SELECT count(*) FROM foo GROUP BY GROUPING SETS (a), (b), (c), ();

can be

Finalize GroupAggregate
  Output: count(*)
  Group Key: a
  Group Key: b
  Group Key: c
  Group Key: ()
  Gather Merge
    Partial GroupAggregate
      Output: PARTIAL count(*)
      Group Key: a, b, c
      Sort
        Sort Key: a, b, c
        Parallel Seq Scan on foo

v3 ("the one with grouping set id") really turns the plan from a tree to a multiplexed pipe: we can execute grouping aggregate on the workers, but only partially. When we emit the trans values, also tag the tuple with a group id. After gather, finalize the aggregates with a modified grouping aggregate. Unlike a non-split grouping aggregate, the finalize grouping aggregate does not "flow" the results from one rollup to the next one. Instead, each group only advances on partial inputs tagged for the group.
Finalize HashAggregate
  Output: count(*)
  Dispatched by: (GroupingSetID())
  Group Key: a
  Group Key: b
  Group Key: c
  Gather
    Partial GroupAggregate
      Output: PARTIAL count(*), GroupingSetID()
      Group Key: a
      Sort Key: b
      Group Key: b
      Sort Key: c
      Group Key: c
      Sort
        Sort Key: a
        Parallel Seq Scan on foo

Note that for the first approach to be viable, the partial aggregate *has to* use a group key that's the union of all grouping sets. In cases where individual columns have a low cardinality but the joint cardinality is high (say columns a, b, c each have 16 distinct values, but they are independent, so there are 4096 distinct values on (a,b,c)), this results in fairly high traffic through the shm tuple queue. Cheers, Jesse
Re: Use compiler intrinsics for bit ops in hash
On Tue, Jan 14, 2020 at 2:09 PM David Fetter wrote: > > The changes in hash AM and SIMPLEHASH do look like a net positive > > improvement. My biggest cringe might be in pg_bitutils: > > > > 1. Is ceil_log2_64 dead code? > > Let's call it nascent code. I suspect there are places it could go, if > I look for them. Also, it seemed silly to have one without the other.

While not absolutely required, I'd like us to find at least one place and start using it. (Clang also nags at me when we have unused functions.)

> On Tue, Jan 14, 2020 at 12:21:41PM -0800, Jesse Zhang wrote: > > 4. It seems like you *really* would like an operation like LZCNT in x86 > > (first appearing in Haswell) that is well defined on zero input. ISTM > > the alternatives are: > > > >a) Special case 1. That seems straightforward, but the branching cost > >on a seemingly unlikely condition seems to be a lot of performance > >loss > > > >b) Use architecture specific intrinsic (and possibly with CPUID > >shenanigans) like __builtin_ia32_lzcnt_u64 on x86 and use the CLZ > >intrinsic elsewhere. The CLZ GCC intrinsic seems to map to > >instructions that are well defined on zero in most ISA's other than > >x86, so maybe we can get away with special-casing x86?

i. We can detect the LZCNT instruction by checking one of the "extended feature" (EAX=0x80000001) bits using CPUID. Unlike the "basic features" (EAX=1), extended feature flags have been more vendor-specific, but fortunately it seems that the feature bit for LZCNT is the same [1][2].

ii. We'll most likely still need to provide a fallback implementation for processors that don't have LZCNT (either because they are from a different vendor, or an older Intel/AMD processor). I wonder if simply checking for 1 is "good enough". Maybe a micro benchmark is in order?

> Is there some way to tilt the tools so that this happens?

We have a couple options here:

1. Use a separate object (a la our SSE 4.2 implementation of CRC).
On Clang and GCC (I don't have MSVC at hand), -mabm or -mlzcnt should cause __builtin_clz to generate the LZCNT instruction, which is well defined on zero input. The default configuration would translate __builtin_clz to code that subtracts BSR from the width of the input, but BSR leaves the destination undefined on zero input. 2. (My least favorite) use inline asm (a la our popcount implementation). > b) seems much more attractive. Is there some way to tilt the tools so > that this happens? What should I be reading up on? The enclosed references hopefully are good places to start. Let me know if you have more ideas. Cheers, Jesse References: [1] "How to detect New Instruction support in the 4th generation Intel® Core™ processor family" https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family [2] "Bit Manipulation Instruction Sets" https://en.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets
Re: Use compiler intrinsics for bit ops in hash
Hi David, On Tue, Jan 14, 2020 at 9:36 AM David Fetter wrote: > > Folks, > > The recent patch for distinct windowing aggregates contained a partial > fix of the FIXME that didn't seem entirely right, so I extracted that > part, changed it to use compiler intrinsics, and submit it here.

The changes in hash AM and SIMPLEHASH do look like a net positive improvement. My biggest cringe might be in pg_bitutils:

> diff --git a/src/include/port/pg_bitutils.h b/src/include/port/pg_bitutils.h
> index 498e532308..cc9338da25 100644
> --- a/src/include/port/pg_bitutils.h
> +++ b/src/include/port/pg_bitutils.h
> @@ -145,4 +145,32 @@ pg_rotate_right32(uint32 word, int n)
>  	return (word >> n) | (word << (sizeof(word) * BITS_PER_BYTE - n));
>  }
>
> +/* ceil(lg2(num)) */
> +static inline uint32
> +ceil_log2_32(uint32 num)
> +{
> +	return pg_leftmost_one_pos32(num-1) + 1;
> +}
> +
> +static inline uint64
> +ceil_log2_64(uint64 num)
> +{
> +	return pg_leftmost_one_pos64(num-1) + 1;
> +}
> +
> +/* calculate first power of 2 >= num
> + * per https://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2
> + * using BSR where available */
> +static inline uint32
> +next_power_of_2_32(uint32 num)
> +{
> +	return ((uint32) 1) << (pg_leftmost_one_pos32(num-1) + 1);
> +}
> +
> +static inline uint64
> +next_power_of_2_64(uint64 num)
> +{
> +	return ((uint64) 1) << (pg_leftmost_one_pos64(num-1) + 1);
> +}
> +
>  #endif							/* PG_BITUTILS_H */

1. Is ceil_log2_64 dead code?

2. The new utilities added here (ceil_log2_32 and company, next_power_of_2_32 and company) all require num > 1, but don't clearly Assert (or at the very least document) so.

3. A couple of the callers can actively pass in an argument of 1, e.g. from _hash_spareindex in hashutil.c, while some other callers are iffy at best (simplehash.h maybe?)

4. It seems like you *really* would like an operation like LZCNT in x86 (first appearing in Haswell) that is well defined on zero input. ISTM the alternatives are:

a) Special case 1.
That seems straightforward, but the branching cost on a seemingly unlikely condition seems to be a lot of performance loss.

b) Use an architecture-specific intrinsic (possibly with CPUID shenanigans) like __builtin_ia32_lzcnt_u64 on x86 and use the CLZ intrinsic elsewhere. The CLZ GCC intrinsic seems to map to instructions that are well defined on zero in most ISAs other than x86, so maybe we can get away with special-casing x86?

Cheers, Jesse
Re: C testing for Postgres
On Fri, Jun 28, 2019 at 10:37 AM Adam Berlin wrote: > > If we were to use this tool, would the community want to vendor the > framework in the Postgres repository, or keep it in a separate > repository that produces a versioned shared library? > If the library is going to actively evolve, we should bring it into the tree. For a project like this, a "versioned shared library" is a massive pain in the rear for both the consumer of such libraries and for their maintainers. Cheers, Jesse
Re: A small note on the portability of cmake
On Sat, Jan 19, 2019 at 8:50 AM Tom Lane wrote: > > Failed miserably. It turns out cmake has a hard dependency on libuv > which (a) has a hard dependency on atomic ops and (b) according to its > own docs, doesn't really care about any platforms other than > Linux/macOS/Windows and maybe FreeBSD. I beg to disagree with point b above: the libuv project continually receives (and accepts) patches fixing bugs specifically for OpenBSD. Atomic ops (compare-and-exchange) might be a harder dependency to shed for libuv. Does the fallback onto compiler intrinsics (__sync_val_compare_and_swap, or on GCC 4.7+, __atomic_compare_exchange_n) not work here? Jesse
Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs
Hey Chengyu, How did you set up your GDB to get "p" to pretty-print a Postgres list? Cheers, Jesse On Fri, Nov 30, 2018 at 1:00 PM Chengchao Yu wrote:
> Greetings,
>
> Recently, we hit a few occurrences of deadlock when IO failure (including disk full, random remote disk IO failures) happens in single user mode. We found the issue exists on both Linux and Windows in multiple postgres versions.
>
> Here are the steps to repro on Linux (as Windows repro is similar):
>
> 1. Get latest PostgreSQL code, build and install the executables.
>
> $ git clone https://git.postgresql.org/git/postgresql.git
> $ cd postgresql
> $ PGROOT=$(pwd)
> $ git checkout REL_11_STABLE
> $ mkdir build
> $ cd build
> $ ../configure --prefix=/path/to/postgres
> $ make && make install
>
> 2. Run initdb to initialize a PG database folder.
>
> $ /path/to/postgres/bin/initdb -D /path/to/data
>
> 3. Because the unable to write relation data scenario is difficult to hit naturally even reserved space is turned off, I have prepared a small patch (see attachment "emulate-error.patch") to force an error when PG tries to write data to relation files. We can just apply the patch and there is no need to put efforts flooding data to disk any more.
>
> $ cd $PGROOT
> $ git apply /path/to/emulate-error.patch
> $ cd build
> $ make && make install
>
> 4. Connect to the newly initialized database cluster with single user mode, create a table, and insert some data to the table, do a checkpoint or directly give EOF. Then we hit the deadlock issue and the process will not exit until we kill it.
> Do a checkpoint explicitly:
>
> $ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true <<EOF
> create table t1(a int);
> insert into t1 values (1), (2), (3);
> checkpoint;
> EOF
>
> PostgreSQL stand-alone backend 11.1
> backend> backend> backend> 2018-11-29 02:45:27.891 UTC [18806] FATAL: Emulate exception in mdwrite() when writing to disk
> 2018-11-29 02:55:27.891 UTC [18806] CONTEXT: writing block 8 of relation base/12368/1247
> 2018-11-29 02:55:27.891 UTC [18806] STATEMENT: checkpoint;
>
> 2018-11-29 02:55:27.900 UTC [18806] FATAL: Emulate exception in mdwrite() when writing to disk
> 2018-11-29 02:55:27.900 UTC [18806] CONTEXT: writing block 8 of relation base/12368/1247
>
> Or directly give an EOF:
>
> $ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true <<EOF
> create table t1(a int);
> insert into t1 values (1), (2), (3);
> EOF
>
> PostgreSQL stand-alone backend 11.1
> backend> backend> backend> 2018-11-29 02:55:24.438 UTC [18149] FATAL: Emulate exception in mdwrite() when writing to disk
> 2018-11-29 02:45:24.438 UTC [18149] CONTEXT: writing block 8 of relation base/12368/1247
>
> 5. Moreover, when we try to recover the database with single user mode, we hit the issue again, and the process does not bring up the database nor exit.
> $ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true
> 2018-11-29 02:59:33.257 UTC [19058] LOG: database system shutdown was interrupted; last known up at 2018-11-29 02:58:49 UTC
> 2018-11-29 02:59:33.485 UTC [19058] LOG: database system was not properly shut down; automatic recovery in progress
> 2018-11-29 02:59:33.500 UTC [19058] LOG: redo starts at 0/1672E40
> 2018-11-29 02:59:33.500 UTC [19058] LOG: invalid record length at 0/1684B90: wanted 24, got 0
> 2018-11-29 02:59:33.500 UTC [19058] LOG: redo done at 0/1684B68
> 2018-11-29 02:59:33.500 UTC [19058] LOG: last completed transaction was at log time 2018-11-29 02:58:49.856663+00
> 2018-11-29 02:59:33.547 UTC [19058] FATAL: Emulate exception in mdwrite() when writing to disk
> 2018-11-29 02:59:33.547 UTC [19058] CONTEXT: writing block 8 of relation base/12368/1247
>
> Analyses:
>
> So, what happened? Actually, there are 2 types of the deadlock due to the same root cause. Let's first take a look at the scenario in step #5. In this scenario, the deadlock happens when disk IO failure occurs inside StartupXLOG(). If we attach debugger to PG process, we will see the process is stuck acquiring the buffer's lw-lock in AbortBufferIO().
>
> void
> AbortBufferIO(void)
> {
>     BufferDesc *buf = InProgressBuf;
>
>     if (buf)
>     {
>         uint32      buf_state;
>
>         /*
>          * Since LWLockReleaseAll has already been called, we're not holding
>          * the buffer's io_in_progress_lock. We have to re-acquire it so that
>          * we can use TerminateBufferIO. Anyone who's executing WaitIO on the
>          * buffer will be in a busy spin until we succeed in doing this.
>          */
>         LWLockAcquire(BufferDescriptorGetIOLock(buf), LW_EXCLUSIVE);