Re: SSL SNI
Hi Peter, I imagine this also (finally) opens up the possibility for the server to present a different certificate for each hostname based on SNI. This eliminates the requirement for wildcard certs where the cluster is running on a host with multiple (typically two to three) hostnames and the clients check the hostname against SAN in the cert (sslmode=verify-full). Am I right? Is that feature on anybody's roadmap? Cheers, Jesse On Mon, Feb 15, 2021 at 6:09 AM Peter Eisentraut wrote: > > A customer asked about including Server Name Indication (SNI) into the > SSL connection from the client, so they can use an SSL-aware proxy to > route connections. There was a thread a few years ago where this was > briefly discussed but no patch appeared.[0] I whipped up a quick patch > and it did seem to do the job, so I figured I'd share it here. > > The question I had was whether this should be an optional behavior, or > conversely a behavior that can be turned off, or whether it should just > be turned on all the time. > > Technically, it seems pretty harmless. It adds another field to the TLS > handshake, and if the server is not interested in it, it just gets ignored. > > The Wikipedia page[1] discusses some privacy concerns in the context of > web browsing, but it seems there is no principled solution to those. > The relevant RFC[2] "recommends" that SNI is used for all applicable TLS > connections. > > > [0]: > https://www.postgresql.org/message-id/flat/CAPPwrB_tsOw8MtVaA_DFyOFRY2ohNdvMnLoA_JRr3yB67Rggmg%40mail.gmail.com > [1]: https://en.wikipedia.org/wiki/Server_Name_Indication > [2]: https://tools.ietf.org/html/rfc6066#section-3
Re: Have we tried to treat CTE as SubQuery in planner?
Hi,

On Fri, Nov 13, 2020 at 10:04 PM Andy Fan wrote:
>
> Hi:
>
> Take the following example:
>
> insert into cte1 select i, i from generate_series(1, 100)i;
> create index on cte1(a);
>
> explain
> with cte1 as (select * from cte1)
> select * from c where a = 1;

ITYM:

EXPLAIN
WITH c AS (SELECT * FROM cte1)
SELECT * FROM c WHERE a = 1;

I'm also guessing your table DDL is:

CREATE TABLE cte1 (a int, b int);

> It needs to do seq scan on the above format, however it is pretty
> quick if we change the query to
> select * from (select * from cte1) c where a = 1;

Does it? On HEAD, I got the following plan:

(without stats):

 Bitmap Heap Scan on foo
   Recheck Cond: (a = 1)
   ->  Bitmap Index Scan on foo_a_idx
         Index Cond: (a = 1)

(with stats):

 Index Scan using foo_a_idx on foo
   Index Cond: (a = 1)

> I know how we treat cte and subqueries differently currently,
> I just don't know why we can't treat cte as a subquery, so lots of
> subquery related technology can apply to it. Do we have any
> discussion about this?

This was brought up a few times; the most recent one I can recall was a
little bit over two years ago [1]

[1] https://postgr.es/m/87sh48ffhb@news-spur.riddles.org.uk
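For what it's worth, since v12 a side-effect-free, non-recursive CTE that is referenced exactly once should already be inlined into the outer query like a subquery, and MATERIALIZED restores the old optimization-fence behavior. A quick sketch against the (guessed) DDL above — the exact plans will depend on stats, but the index should be usable in the first form:

```sql
-- Inlined (v12+ default for a single-reference CTE): can use the index on a
EXPLAIN
WITH c AS (SELECT * FROM cte1)
SELECT * FROM c WHERE a = 1;

-- Forced materialization: the filter is applied on top of a CTE Scan,
-- so the index on a cannot help
EXPLAIN
WITH c AS MATERIALIZED (SELECT * FROM cte1)
SELECT * FROM c WHERE a = 1;
```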
Re: upcoming API changes for LLVM 12
Hi Andres, On Thu, Oct 15, 2020 at 6:12 PM Andres Freund wrote: > > Hi, > > There will be a breaking API change for JIT related API in LLVM > 12. Mostly about making control over various aspects easier, and then > building on top of that providing new features (like JIT compiling in > the background and making it easier to share JIT compiled output between > processes). > > I've worked with Lang Hames to ensure that the new C API has feature > parity... > I assume you're alluding to the removal of ORC legacy (v1) API? How far back was feature parity in the new API, or we could only switch starting with LLVM 12? > The postgres changes are fairly localized, all in llvmjit.c - it's just > a few #ifdefs to support both LLVM 12 and before. > > The two questions I have are: > > 1) Which versions do we want to add LLVM 12 support? It'd be fairly >easy to backport all the way. But it's not quite a bugfix... OTOH, >it'd probably painful for packagers to have dependencies on different >versions of LLVM for different versions of postgres. > > 2) When do we want to add LLVM 12 support? PG will soon stop compiling >against LLVM 12, which will be released in about 6 months. I worked >with Lang to make most of the breaking changes in a branch (to be >merged in the next few days), but it's possible that there will be a >few smaller changes. I think this has already happened about two weeks ago when Lang's commit 6154c4115cd4b78d landed in LLVM master. Cheers, Jesse
Re: Residual cpluspluscheck issues
Hi Tom,

It's been 17 months since you sent this email, so I can't rule out that
something has happened since (off list or in the code base), but...

On Sun, Jun 2, 2019 at 9:53 AM Tom Lane wrote:
> * On FreeBSD 12, I get
>
> /home/tgl/pgsql/src/include/utils/hashutils.h:23:23: warning: 'register' storage
> class specifier is deprecated and incompatible with C++17 [-Wdeprecated-register]
> extern Datum hash_any(register const unsigned char *k, register int keylen);
>                       ^
> /home/tgl/pgsql/src/include/utils/hashutils.h:23:56: warning: 'register' storage
> class specifier is deprecated and incompatible with C++17 [-Wdeprecated-register]
> extern Datum hash_any(register const unsigned char *k, register int keylen);
>                                                        ^
> /home/tgl/pgsql/src/include/utils/hashutils.h:24:32: warning: 'register' storage
> class specifier is deprecated and incompatible with C++17 [-Wdeprecated-register]
> extern Datum hash_any_extended(register const unsigned char *k,
>                                ^
> /home/tgl/pgsql/src/include/utils/hashutils.h:25:11: warning: 'register' storage
> class specifier is deprecated and incompatible with C++17 [-Wdeprecated-register]
>            register int ...
>            ^
>
> which I'm inclined to think means we should drop those register keywords.

I think this is a warning from Clang, right? You can get the same
warning on macOS if you use upstream Clang, where the default value of
-std for clang++ has been gnu++14 since LLVM 6.0 (not AppleClang, which
carries a proprietary patch that simply reverts the bump, but they
didn't even bother to patch the manpage).

I'm running into the same (well, similar) warnings when running
cpluspluscheck with GCC 11. Upon closer inspection, this is because in
GCC 11 the default value of -std has been bumped to gnu++17. IOW, I
would've gotten the same warning had I just configured with
CXX="g++-10 -std=gnu++17".
The g++ warnings look like the following:

gcc> In file included from ./src/include/port/atomics.h:70,
gcc>                  from ./src/include/storage/lwlock.h:21,
gcc>                  from ./src/include/storage/lock.h:23,
gcc>                  from ./src/include/storage/proc.h:21,
gcc>                  from ./src/include/storage/shm_mq.h:18,
gcc>                  from ./src/test/modules/test_shm_mq/test_shm_mq.h:18,
gcc>                  from /tmp/cpluspluscheck.AxICnl/test.cpp:3:
gcc> ./src/include/port/atomics/arch-x86.h: In function 'bool pg_atomic_test_set_flag_impl(volatile pg_atomic_flag*)':
gcc> ./src/include/port/atomics/arch-x86.h:143:16: warning: ISO C++17 does not allow 'register' storage class specifier [-Wregister]
gcc>   143 |         register char _res = 1;
gcc>       |         ^~~~
gcc> In file included from ./src/include/storage/spin.h:54,
gcc>                  from ./src/test/modules/test_shm_mq/test_shm_mq.h:19,
gcc>                  from /tmp/cpluspluscheck.AxICnl/test.cpp:3:
gcc> ./src/include/storage/s_lock.h: In function 'int tas(volatile slock_t*)':
gcc> ./src/include/storage/s_lock.h:226:19: warning: ISO C++17 does not allow 'register' storage class specifier [-Wregister]
gcc>   226 |         register slock_t _res = 1;
gcc>       |                  ^~~~

I think this is a problem worth solving: C++17 removes the register
keyword, and C++ code that includes our headers has the difficult choice
of:

1) With a pre-C++17 compiler that forewarns of the deprecation, find a
   compiler-specific switch to turn off the warning.

2) With a compiler that defaults to C++17 or later (major compiler
   vendors are upgrading the default to C++17):

   2a) find a switch to explicitly downgrade to C++14 or below (and then
       possibly jump back to solving problem number 1), or

   2b) find a compiler-specific switch to stay in post-C++17 mode but
       somehow "turn off" the removal of the register keyword.

This is particularly cringe-y because the C++ programs themselves have
to be non-conformant through a header we supply.
We can either drop the register keywords here (I wonder what the impact
on code generation would be, but it'll be hard to examine, given we'd
need to inspect _every_ instance of generated code for an inline
function), or maybe consider hiding those sections behind "#ifndef
__cplusplus" (provided that we believe there's not much of a reason for
the including code to call these functions; just throwing out uneducated
guesses here).

Do we have a clear decision on what we want to do here? How can I
contribute?

Cheers,
Jesse
Re: Partition prune with stable Expr
On Sun, Sep 27, 2020 at 7:52 PM Andy Fan wrote:
>
> On Mon, Sep 28, 2020 at 9:15 AM Tom Lane wrote:
>>
>> Andy Fan writes:
>> > Well, that's very interesting. Specific to my use case,
>> > SELECT * FROM p WHERE pkey = to_date('2018-12-13', 'yyyy-mm-dd');
>> > p has 1500+ partitions and planning takes lots of time, which is the same
>> > with SELECT * FROM p WHERE pkey = '2018-12-13', however the planning
>> > time difference is so huge, that doesn't make sense in human view. Can
>> > we do something for that? to_date(text, text) should be an "immutable"
>> > function IMO. Does that have a semantic issue or other issues?
>>
>> Yeah. It depends on the lc_time setting, and possibly also the timezone
>> GUC. (Admittedly, common values of the format string would not have
>> any lc_time dependency, but the immutability property is not fine-grained
>> enough to recognize that.)
>>
>> regards, tom lane
>
> Thanks for your reply. Even if it has something on GUC or lc_time
> setting, I suppose it should be decided at planning time. Do we have
> concerns about changes between planning and execution?

The planner can be called at prepared statement creation time, like

PREPARE yolo() AS SELECT * FROM foo WHERE pk = to_date(...);

Here, there's an arbitrary gap between planning time and execution.

> The attached patch marked some common formatting functions as immutable;
> only one partition prune test case needed fixing because of this. I only
> changed to_char/to_date/to_timestamp, however the whole list is below.
> I can change all of them if needed.
>
>      proname     | count
> ----------------+-------
>  to_ascii       |     3
>  to_char        |     8
>  to_date        |     1
>  to_hex         |     2
>  to_json        |     1
>  to_jsonb       |     1
>  to_number      |     1
>  to_regclass    |     1
>  to_regcollation|     1
>  to_regnamespace|     1
>  to_regoper     |     1
>  to_regoperator |     1
>  to_regproc     |     1
>  to_regprocedure|     1
>  to_regrole     |     1
>  to_regtype     |     1
>  to_timestamp   |     2
>  to_tsquery     |     2
>  to_tsvector    |     6
> (19 rows)

This patch is ridiculous.
Immutable functions need to produce the same output for the same argument values. None of the functions changed in the patch is immutable: they are all stable because they all depend on GUC settings (e.g. to_tsvector depends on default_text_search_config).
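The declared volatility of each function is right there in the catalog, so the claim is easy to check yourself (a quick illustration; 'i' = immutable, 's' = stable, 'v' = volatile):

```sql
-- Compare a genuinely immutable function (to_hex) with the
-- GUC-dependent ones; note to_tsvector has both a stable
-- single-argument form (uses default_text_search_config) and an
-- immutable two-argument form (config passed explicitly).
SELECT proname, provolatile
FROM pg_proc
WHERE proname IN ('to_hex', 'to_date', 'to_char', 'to_tsvector')
ORDER BY proname, provolatile;
```

The two-argument to_tsvector(regconfig, text) is the usual workaround when an immutable expression is required, e.g. in an index definition.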
Re: Fix for configure error in 9.5/9.6 on macOS 11.0 Big Sur
On Thu, Sep 3, 2020 at 10:36 AM Tom Lane wrote: > > Jesse Zhang writes: > >> Peter Eisentraut writes: > >>> Where did the -Werror come from? > > > If you noticed the full invocation of clang, you'd notice that Werror is > > nowhere on the command line, even though the error message suggests > > otherwise. I think this is a behavior from the new AppleClang, > > Hmph. If you explicitly say -Wno-error, does the error drop back down > to being a warning? Yeah something like that, this is what works for me: clang -Wno-error=implicit-function-declaration c.c Then it became a warning. Interestingly, it seems that AppleClang incorrectly forces this warning on us: $ clang --verbose c.c | Apple clang version 12.0.0 (clang-1200.0.31.1) | Target: x86_64-apple-darwin20.0.0 | Thread model: posix | InstalledDir: /Library/Developer/CommandLineTools/usr/bin | "/Library/Developer/CommandLineTools/usr/bin/clang" -cc1 -triple x86_64-apple-macosx11.0.0 -Wdeprecated-objc-isa-usage -Werror=deprecated-objc-isa-usage -Werror=implicit-function-declaration -emit-obj -mrelax-all -disable-free -disable-llvm-verifier -discard-value-names -main-file-name c.c -mrelocation-model pic -pic-level 2 -mthread-model posix -mframe-pointer=all -fno-strict-return -masm-verbose -munwind-tables -target-sdk-version=11.0 -target-cpu penryn -dwarf-column-info -debugger-tuning=lldb -target-linker-version 609 -v -resource-dir /Library/Developer/CommandLineTools/usr/lib/clang/12.0.0 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/include -internal-isystem /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/local/include -internal-isystem /Library/Developer/CommandLineTools/usr/lib/clang/12.0.0/include -internal-externc-isystem /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include -internal-externc-isystem /Library/Developer/CommandLineTools/usr/include -Wno-reorder-init-list -Wno-implicit-int-float-conversion -Wno-c99-designator -Wno-final-dtor-non-final-class 
-Wno-extra-semi-stmt -Wno-misleading-indentation -Wno-quoted-include-in-framework-header -Wno-implicit-fallthrough -Wno-enum-enum-conversion -Wno-enum-float-conversion -fdebug-compilation-dir /tmp -ferror-limit 19 -fmessage-length 193 -stack-protector 1 -fstack-check -mdarwin-stkchk-strong-link -fblocks -fencode-extended-block-signature -fregister-global-dtors-with-atexit -fgnuc-version=4.2.1 -fobjc-runtime=macosx-11.0.0 -fmax-type-align=16 -fdiagnostics-show-option -fcolor-diagnostics -o /var/folders/ts/nxrsmhmd0xb5zdrlqb4jlkbrgn/T/c-e6802f.o -x c c.c | clang -cc1 version 12.0.0 (clang-1200.0.31.1) default target x86_64-apple-darwin20.0.0 Notice that -Werror=implicit-function-declaration up there? I spent a few minutes digging in Apple's published fork of LLVM, they've been forcing this error flag for quite a while, but this particular warning-turned-error is guarded by a conditional along the lines of "is this iOS-like" [1][2], so I cannot imagine such a code path is activated (other than something like "goto fail;" from 2014) > > > I've heard reports of the same under the latest Xcode 12 on macOS > > Catalina, but I don't have my hands on such an env. > > The latest thing available to the unwashed masses seems to be > Xcode 11.7 with Yes you're right. Xcode 12 is still beta. > > Smells like an Apple bug from here. Surely they're not expecting > that anyone will appreciate -Werror suddenly being the default. I think you've convinced me that this is an Apple bug indeed. I'll probably just get by with a Wno-error=implicit-function-declaration in my CFLAGS for now. [1] https://github.com/apple/llvm-project/blob/swift-5.3-DEVELOPMENT-SNAPSHOT-2020-08-04-a/clang/lib/Driver/ToolChains/Darwin.cpp#L952-L962 [2] https://opensource.apple.com/source/clang/clang-800.0.42.1/src/tools/clang/lib/Driver/ToolChains.cpp
Re: Fix for configure error in 9.5/9.6 on macOS 11.0 Big Sur
Hi Tom and Peter,

On Wed, Sep 2, 2020 at 10:40 PM Tom Lane wrote:
>
> Peter Eisentraut writes:
> > On 2020-09-02 22:43, Jesse Zhang wrote:
> >> | conftest.c:184:3: error: implicitly declaring library function
> >> 'exit' with type 'void (int) __attribute__((noreturn))'
> >> [-Werror,-Wimplicit-function-declaration]
>
> > Where did the -Werror come from?
>
> Peter wasn't entirely explicit here, but note the advice at the end of
>
> https://www.postgresql.org/docs/devel/install-procedure.html
>
> that you cannot include -Werror in any CFLAGS you tell configure
> to use. It'd be nice if autoconf was more robust about that,
> but it is not our bug to fix.
>
> regards, tom lane

If you noticed the full invocation of clang, you'd notice that -Werror
is nowhere on the command line, even though the error message suggests
otherwise. I think this is a behavior of the new AppleClang. Here's the
minimal repro:

int main() { exit(0); }

And boom!

$ clang -c c.c
c.c:1:14: error: implicitly declaring library function 'exit' with type
'void (int) __attribute__((noreturn))' [-Werror,-Wimplicit-function-declaration]
int main() { exit(0); }
             ^
c.c:1:14: note: include the header <stdlib.h> or explicitly provide a
declaration for 'exit'
1 error generated.

My environment:

$ uname -rsv
Darwin 20.0.0 Darwin Kernel Version 20.0.0: Fri Aug 14 00:25:13 PDT 2020;
root:xnu-7195.40.44.151.1~4/RELEASE_X86_64

$ clang --version
Apple clang version 12.0.0 (clang-1200.0.31.1)
Target: x86_64-apple-darwin20.0.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

I've heard reports of the same under the latest Xcode 12 on macOS
Catalina, but I don't have my hands on such an env.

Cheers,
Jesse
Re: Fix for configure error in 9.5/9.6 on macOS 11.0 Big Sur
Wow thanks Alvaro! My search of "most obvious keywords" didn't turn this up. On Wed, Sep 2, 2020 at 2:18 PM Alvaro Herrera wrote: > > On 2020-Sep-02, Jesse Zhang wrote: > > > Hi Peter, > > > > Yeah it's funny I got this immediately after upgrading to Big Sur (beta > > 5). I found commit 1c0cf52b39ca3 but couldn't quite find the mailing > > list thread on it from 4 years ago (it lists Heikki and Thomas Munro as > > reviewers). Was it prompted by a similar error you encountered? > > https://postgr.es/m/bf9de63c-b669-4b8c-d33b-4a5ed11cd...@2ndquadrant.com Cheers, Jesse
Re: Fix for configure error in 9.5/9.6 on macOS 11.0 Big Sur
Hi Peter, Yeah it's funny I got this immediately after upgrading to Big Sur (beta 5). I found commit 1c0cf52b39ca3 but couldn't quite find the mailing list thread on it from 4 years ago (it lists Heikki and Thomas Munro as reviewers). Was it prompted by a similar error you encountered? On Tue, Aug 25, 2020 at 6:57 AM Peter Eisentraut wrote: > > On 2020-08-02 23:18, Tom Lane wrote: > > Thomas Gilligan writes: > >> Under the next version of macOS (11.0 unreleased beta 3), configuring > >> Postgres 9.5 and 9.6 fails with > > > >>> checking test program... ok > >>> checking whether long int is 64 bits... no > >>> checking whether long long int is 64 bits... no > >>> configure: error: Cannot find a working 64-bit integer type. > > > > Hm, could we see the config.log output for this? I'm not 100% convinced > > that you've diagnosed the problem accurately, because it'd imply that > > Apple made some fundamentally incompatible changes in libc, which > > seems like stirring up trouble for nothing. > > It looks like the new compiler errors out on calling undeclared > functions. Might be good to see config.log to confirm this. 
Yeah, here's an excerpt from config.log, verbatim (I'm not wrapping the
lines):

| configure:13802: checking whether long int is 64 bits
| configure:13860: ccache clang -o conftest -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -g -O0 conftest.c -lz -lreadline -lm >&5
| conftest.c:169:5: warning: no previous prototype for function 'does_int64_work' [-Wmissing-prototypes]
| int does_int64_work()
|     ^
| conftest.c:169:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
| int does_int64_work()
| ^
| static
| conftest.c:183:1: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
| main() {
| ^
| conftest.c:184:3: error: implicitly declaring library function 'exit' with type 'void (int) __attribute__((noreturn))' [-Werror,-Wimplicit-function-declaration]
|   exit(! does_int64_work());
|   ^
| conftest.c:184:3: note: include the header <stdlib.h> or explicitly provide a declaration for 'exit'
| 2 warnings and 1 error generated.

Cheers,
Jesse
Re: run pgindent on a regular basis / scripted manner
On Wed, Aug 12, 2020 at 10:14 PM Tom Lane wrote: > > Noah Misch writes: > > ... Another advantage of master-only is a guarantee against > > disrupting time-critical patches. (It would be ugly to push back branches > > and > > sort out the master push later, but it doesn't obstruct the mission.) > > Hm, doesn't it? I had the idea that "git push" is atomic --- either all > the per-branch commits succeed, or they all fail. I might be wrong. As of Git 2.28, atomic pushes are not turned on by default. That means "git push my-remote foo bar" _can_ result in partial success. I'm that paranoid who does "git push --atomic my-remote foo bar fizz". Cheers, Jesse
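The difference is easy to demonstrate against a throwaway local remote (a sketch; paths and branch names are made up, and --atomic needs Git 2.4+ plus a receiving side that supports it):

```shell
# With --atomic, either every ref in the push lands on the remote or
# none do; without it, a multi-ref push can partially succeed.
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/remote.git"
git init -q "$tmp/work"
cd "$tmp/work"
git checkout -q -b trunk
git -c user.name=x -c user.email=x@example.com \
    commit -q --allow-empty -m init
git branch -q feature
git push -q --atomic "$tmp/remote.git" trunk feature
git ls-remote --heads "$tmp/remote.git"   # both refs, or neither
```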
Re: run pgindent on a regular basis / scripted manner
Hi Andres,

On Wed, Aug 12, 2020 at 3:34 PM Andres Freund wrote:
>
> Hi,
>
> When developing patches I find it fairly painful that I cannot re-indent
> patches with pgindent without also seeing a lot of indentation changes
> in unmodified parts of files. It is easy enough ([1]) to only re-indent
> files that I have modified, but there's often a lot of independent
> indentation changes in the files that I did modify.
>
> I e.g. just re-indented patch 0001 of my GetSnapshotData() series and
> most of the hunks were entirely unrelated. Despite the development
> window for 14 having only relatively recently opened. Based on my
> experience it tends to get worse over time.

How bad was it right after branching 13? I wonder if we have any
empirical measure of badness over time -- assuming there was a point in
the recent past where everything was good, and the bad just crept in.

> Is there any reason we don't just automatically run pgindent regularly?
> Like once a week? And also update typedefs.list automatically, while
> we're at it?

You know what's better than weekly? Every check-in. I for one would love
it if we can just format the entire codebase, and ensure that new
check-ins are also formatted. We _do_ need some form of continuous
integration to catch us when we have fallen short (again, once HEAD
reaches a "known good" state, it's conceivably cheap to keep it in the
good state).

Cheers,
Jesse
Re: Volatile Functions in Parallel Plans
Hi Tom and Zhenghua,

On Thu, Jul 16, 2020 at 8:18 AM Tom Lane wrote:
>
> Jesse Zhang writes:
> > You are right, no, they are not consistent. But neither plan is
> > incorrect:
>
> > 1. In the first query, it's semantically permissible to evaluate
> > timeofday() for each pair of (c1, c2), and the plan reflects that.
> > (Notice that the parallel nature of the plan is just noise here, the
> > planner could have gone with a Nested Loop of which the inner side is
> > _not_ materialized).
>
> Yeah, exactly. The short answer here is that refusing to parallelize
> the plan would not make things any more consistent.
>
> In general, using a volatile function in a WHERE clause is problematic
> because we make no guarantees about how often it will be evaluated.
> It could be anywhere between "never" and "once per row of the
> cross-product of the FROM tables". AFAIR, the only concession we've made
> to make that less unpredictable is to avoid using volatile functions in
> index quals. But even that will only make things noticeably more
> predictable for single-table queries. As soon as you get into join cases,
> you don't have much control over when the function will get evaluated.
>
> regards, tom lane

For more kicks, I don't even think this is restricted to volatile
functions only. To stir the pot, it's conceivable that the planner might
produce the following plan

 Seq Scan on pg_temp_3.foo
   Output: foo.a
   Filter: (SubPlan 1)
   SubPlan 1
     ->  WindowAgg
           Output: sum(bar.d) OVER (?)
           ->  Seq Scan on pg_temp_3.bar
                 Output: bar.d

for the following query:

SELECT a
FROM foo
WHERE b = ALL (
  SELECT sum(d) OVER (ROWS UNBOUNDED PRECEDING)
  FROM bar
);

N.B. that the WindowAgg might produce a different set of numbers each
time depending on the scan order of bar, which means that for two
different "foo" tuples of equal b value, one might be rejected by the
filter whereas another survives.
I think the crux of the discussion should be whether we can reasonably expect a subquery (subquery-like structure, for example the inner side of nest loops upthread) to be evaluated only once. IMHO, no. The SQL standard only broadly mandates that each "execution" of a subquery to be "atomic". Zhenghua and Tom, would you suggest the above plan is wrong (not suboptimal, but wrong) just because we don't materialize the WindowAgg under the subplan? Cheers, Jesse
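Tom's point about evaluation-count guarantees is easy to trip over even in a single-table query. A toy illustration (the result is deliberately not well defined):

```sql
-- How many rows survive depends on how many times random() gets
-- evaluated, and the planner makes no promises: it could be once per
-- candidate row, or the qual could be evaluated at different points if
-- the query is restructured around a join or index.
SELECT count(*)
FROM generate_series(1, 1000) AS g(i)
WHERE random() < 0.5;
```

Running this twice will almost certainly give two different counts, which is exactly the "no guarantees" behavior under discussion, just in its most visible form.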
Re: Volatile Functions in Parallel Plans
Hi Zhenghua, On Wed, Jul 15, 2020 at 5:44 AM Zhenghua Lyu wrote: > > Hi, > I test some SQL in the latest Postgres master branch code (we find these > issues when > developing Greenplum database in the PR > https://github.com/greenplum-db/gpdb/pull/10418, > and my colleague come up with the following cases in Postgres): > > > create table t3 (c1 text, c2 text); > CREATE TABLE > insert into t3 > select > 'fhufehwiohewiuewhuhwiufhwifhweuhfwu', --random data > 'fiowehufwhfyegygfewpfwwfeuhwhufwh' --random data > from generate_series(1, 1000) i; > INSERT 0 1000 > analyze t3; > ANALYZE > create table t4 (like t3); > CREATE TABLE > insert into t4 select * from t4; > INSERT 0 0 > insert into t4 select * from t3; > INSERT 0 1000 > analyze t4; > ANALYZE > set enable_hashjoin to off; > SET > explain (costs off) > select count(*) from t3, t4 > where t3.c1 like '%sss' > and timeofday() = t4.c1 and t3.c1 = t4.c1; >QUERY PLAN > > Finalize Aggregate >-> Gather > Workers Planned: 2 > -> Partial Aggregate >-> Nested Loop > Join Filter: (t3.c1 = t4.c1) > -> Parallel Seq Scan on t3 >Filter: (c1 ~~ '%sss'::text) > -> Seq Scan on t4 >Filter: (timeofday() = c1) > (10 rows) > > explain (verbose, costs off) > select count(*) > from > t3, > (select *, timeofday() as x from t4 ) t4 > where t3.c1 like '%sss' and > timeofday() = t4.c1 and t3.c1 = t4.c1; > QUERY PLAN > -- > Finalize Aggregate >Output: count(*) >-> Gather > Output: (PARTIAL count(*)) > Workers Planned: 2 > -> Partial Aggregate >Output: PARTIAL count(*) >-> Nested Loop > Join Filter: (t3.c1 = t4.c1) > -> Parallel Seq Scan on public.t3 >Output: t3.c1, t3.c2 >Filter: (t3.c1 ~~ '%sss'::text) > -> Seq Scan on public.t4 >Output: t4.c1, NULL::text, timeofday() >Filter: (timeofday() = t4.c1) > (15 rows) > > > > Focus on the last two plans, the function timeofday is > volatile but paralle-safe. And Postgres outputs two parallel > plan. 
> > > The first plan: > > Finalize Aggregate >-> Gather > Workers Planned: 2 > -> Partial Aggregate >-> Nested Loop > Join Filter: (t3.c1 = t4.c1) > -> Parallel Seq Scan on t3 >Filter: (c1 ~~ '%sss'::text) > -> Seq Scan on t4 >Filter: (timeofday() = c1) > > The join's left tree is parallel scan and the right tree is seq scan. > This algorithm is correct using the distribute distributive law of > distributed join: >A = [A1 A2 A3...An], B then A join B = gather( (A1 join B) (A2 join B) > ... (An join B) ) > > The correctness of the above law should have a pre-assumption: > The data set of B is the same in each join: (A1 join B) (A2 join B) ... > (An join B) > > But things get complicated when volatile functions come in. Timeofday is just > an example to show the idea. The core is volatile functions can return > different > results on successive calls with the same arguments. Thus the following piece, > the right tree of the join > -> Seq Scan on t4 >Filter: (timeofday() = c1) > can not be considered consistent everywhere in the scan workers. > > The second plan > > Finalize Aggregate >Output: count(*) >-> Gather > Output: (PARTIAL count(*)) > Workers Planned: 2 > -> Partial Aggregate >Output: PARTIAL count(*) >-> Nested Loop > Join Filter: (t3.c1 = t4.c1) > -> Parallel Seq Scan on public.t3 >Output: t3.c1, t3.c2 >Filter: (t3.c1 ~~ '%sss'::text) > -> Seq Scan on public.t4 >Output: t4.c1, NULL::text, timeofday() >Filter: (timeofday() = t4.c1) > > > have voltile projections in the right tree of the nestloop: > > -> Seq Scan on public.t4 >Output: t4.c1, NULL::text, timeofday() >Filter: (timeofday() = t4.c1) > > It should not be taken as consistent in different workers. You are right, no they are not consistent. But Neither plans is incorrect: 1. In the first query, it's semantically permissible to evaluate timeofday() for each pair of (c1, c2), and the plan reflects that. (Notice that the
Fix header identification
Hi hackers, PFA a patch that fixes up the identification for 4 header files. I did a little archaeology trying to find plausible reasons for why we committed the wrong identification in the first place, and here's what I came up with: jsonfuncs.h was created in ce0425b162d0a to house backend-only json function declarations previously in jsonapi.h, and the identification was a copy-pasta from jsonapi.h (then in src/include/utils). jsonapi.h was wholesale moved to common in beb4699091e9f without changing identification. partdesc.h was created in 1bb5e78218107 but I can't find a good excuse why we made a mistake then, except for (maybe) the prototype for RelationBuildPartitionDesc() was moved from partcache.h and so we may have also taken its identification line, although that sounds arbitrary. llvmjit_emit.h was created with the wrong identification, probably due to a personal template... Cheers, Jesse From 55c7280ee09ab258efc1c59b4ce6226228bcbab0 Mon Sep 17 00:00:00 2001 From: Jesse Zhang Date: Thu, 9 Jul 2020 13:39:39 -0700 Subject: [PATCH] Fix header identification. jsonfuncs.h was created in ce0425b162d0a to house backend-only json function declarations previously in jsonapi.h, and the identification was a copy-pasta from jsonapi.h (then in src/include/utils). jsonapi.h was wholesale moved to common in beb4699091e9f without changing identification. partdesc.h was created in 1bb5e78218107 but I can't find a good excuse why we made a mistake then, except for (maybe) the prototype for RelationBuildPartitionDesc() was moved from partcache.h and so we may have also taken its identification line, although that sounds arbitrary. llvmjit_emit.h was created with the wrong identification, probably due to a personal template... 
---
 src/include/common/jsonapi.h        | 2 +-
 src/include/jit/llvmjit_emit.h      | 2 +-
 src/include/partitioning/partdesc.h | 2 +-
 src/include/utils/jsonfuncs.h       | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/include/common/jsonapi.h b/src/include/common/jsonapi.h
index bcfd57cc53c46..1fee0ea81e47a 100644
--- a/src/include/common/jsonapi.h
+++ b/src/include/common/jsonapi.h
@@ -6,7 +6,7 @@
  * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- * src/include/utils/jsonapi.h
+ * src/include/common/jsonapi.h
  *
  *-
  */
diff --git a/src/include/jit/llvmjit_emit.h b/src/include/jit/llvmjit_emit.h
index 74607a43770af..1a7d6db7259e0 100644
--- a/src/include/jit/llvmjit_emit.h
+++ b/src/include/jit/llvmjit_emit.h
@@ -4,7 +4,7 @@
  *
  * Copyright (c) 2018-2020, PostgreSQL Global Development Group
  *
- * src/include/lib/llvmjit_emit.h
+ * src/include/jit/llvmjit_emit.h
  */
 #ifndef LLVMJIT_EMIT_H
 #define LLVMJIT_EMIT_H
diff --git a/src/include/partitioning/partdesc.h b/src/include/partitioning/partdesc.h
index fb416e073d075..70df764981e70 100644
--- a/src/include/partitioning/partdesc.h
+++ b/src/include/partitioning/partdesc.h
@@ -4,7 +4,7 @@
  *
  * Copyright (c) 1996-2020, PostgreSQL Global Development Group
  *
- * src/include/utils/partdesc.h
+ * src/include/partitioning/partdesc.h
  *
  *-
  */
diff --git a/src/include/utils/jsonfuncs.h b/src/include/utils/jsonfuncs.h
index 1f1b4029cbf12..4796b2b78bf9a 100644
--- a/src/include/utils/jsonfuncs.h
+++ b/src/include/utils/jsonfuncs.h
@@ -6,7 +6,7 @@
  * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- * src/include/utils/jsonapi.h
+ * src/include/utils/jsonfuncs.h
  *
  *-
  */
--
2.27.0
Strange behavior with polygon and NaN
Hi hackers,

While working with Chris Hajas on merging Postgres 12 with Greenplum
Database, we stumbled upon the following strange behavior in the
geometry type polygon:

-- >8

CREATE TEMP TABLE foo (p point);
CREATE INDEX ON foo USING gist(p);

INSERT INTO foo VALUES ('0,0'), ('1,1'), ('NaN,NaN');

SELECT $q$
SELECT * FROM foo WHERE p <@ polygon '(0,0), (0, 100), (100, 100), (100, 0)'
$q$ AS qry \gset

BEGIN;
SAVEPOINT yolo;
SET LOCAL enable_seqscan TO off;
:qry;

ROLLBACK TO SAVEPOINT yolo;
SET LOCAL enable_indexscan TO off;
SET LOCAL enable_bitmapscan TO off;
:qry;

-- 8<

If you run the above repro SQL in HEAD (and 12, and likely all older
versions), you get the following output:

CREATE TABLE
CREATE INDEX
INSERT 0 3
BEGIN
SAVEPOINT
SET
   p
-------
 (0,0)
 (1,1)
(2 rows)

ROLLBACK
SET
SET
     p
-----------
 (0,0)
 (1,1)
 (NaN,NaN)
(3 rows)

At first glance, you'd think this is the gist AM's bad, but on a second
thought, something else is strange here. The following query returns
true:

SELECT point '(NaN, NaN)' <@ polygon '(0,0), (0, 100), (100, 100), (100, 0)'

The above behavior of the "contained in" operator is surprising, and
it's probably not what the GiST AM is expecting. I took a look at
point_inside() in geo_ops.c, and it doesn't seem well equipped to handle
NaN. Similarly ill-equipped is dist_ppoly_internal(), which underlies
the distance operator for polygon. It gives the following interesting
output:

SELECT *, c <-> polygon '(0,0),(0,100),(100,100),(100,0)' AS distance
FROM (
  SELECT circle(point(100 * i, 'NaN'), 50) AS c
  FROM generate_series(-2, 4) i
) t(c)
ORDER BY 2;

        c        | distance
-----------------+----------
 <(-200,NaN),50> |        0
 <(-100,NaN),50> |        0
 <(0,NaN),50>    |        0
 <(100,NaN),50>  |        0
 <(200,NaN),50>  |      NaN
 <(300,NaN),50>  |      NaN
 <(400,NaN),50>  |      NaN
(7 rows)

Should they all be NaN? Am I alone in thinking the index is right but
the operators are wrong? Or should we call the indexes wrong here?

Cheers,
Jesse and Chris
Re: Avoiding hash join batch explosions with extreme skew and weird stats
Hi Tomas,

On Tue, Jun 23, 2020 at 3:24 PM Tomas Vondra wrote:
>
> Now, a couple comments / questions about the code.
>
>
> nodeHash.c
> ----------
>
>
> 1) MultiExecPrivateHash says this
>
>    /*
>     * Not subject to skew optimization, so either insert normally
>     * or save to batch file if it belongs to another stripe
>     */
>
> I wonder what it means to "belong to another stripe". I understand what
> that means for batches, which are identified by batchno computed from
> the hash value. But I thought "stripes" are just work_mem-sized pieces
> of a batch, so I don't quite understand this. Especially when the code
> does not actually check "which stripe" the row belongs to.

I have to concur that "stripe" did inspire a RAID vibe when I heard it,
but it seemed to be a better name than what it replaces.

> 3) I'm a bit puzzled about this formula in ExecHashIncreaseNumBatches
>
>    childbatch = (1U << (my_log2(hashtable->nbatch) - 1)) |
> hashtable->curbatch;
>
> and also about this comment
>
>    /*
>     * TODO: what to do about tuples that don't go to the child
>     * batch or stay in the current batch? (this is why we are
>     * counting tuples to child and curbatch with two diff
>     * variables in case the tuples go to a batch that isn't the
>     * child)
>     */
>    if (batchno == childbatch)
>        childbatch_outgoing_tuples++;
>
> I thought each old batch is split into two new ones, and the tuples
> either stay in the current one, or are moved to the new one - which I
> presume is the childbatch, although I haven't tried to decode that
> formula. So where else could the tuple go, as the comment tried to
> suggest?

True, every old batch is split into two new ones, if you only consider
tuples coming from the batch file that _still belong in there_; i.e.
there are tuples in the old batch file that belong to a future batch.
As an example, if the current nbatch = 8, and we want to expand to
nbatch = 16, (old) batch 1 will split into (new) batch 1 and batch 9,
but it can already contain tuples that need to go into (current) batches
3, 5, and 7 (soon-to-be batches 11, 13, and 15).

Cheers,
Jesse
Re: Fix compilation failure against LLVM 11
On Thu, May 28, 2020 at 1:07 AM Michael Paquier wrote:
> Please note that I would still wait for their next GA release to plug
> in any extra holes at the same time. @Jesse: or is this change
> actually part of the upcoming 10.0.1?

No, a refactoring like this was not in the back branches (nor is it
expected).

Thanks,
Jesse
Re: Fix compilation failure against LLVM 11
Hi Andres,

On the mensiversary of the last response, what can I do to help move
this along (aside from heeding the advice "don't use LLVM HEAD")?
Michael Paquier expressed concerns over flippant churn upthread:

On Mon, Apr 27, 2020 at 10:19 PM Andres Freund wrote:
AF> On 2020-04-28 13:56:23 +0900, Michael Paquier wrote:
MP> > On Mon, Apr 27, 2020 at 07:48:54AM -0700, Jesse Zhang wrote:
JZ> > > Are you expressing a concern against "churning" this part of the
JZ> > > code in reaction to upstream LLVM changes? I'd agree with you in
JZ> > > general. But then the question we need to ask is "will we need
JZ> > > to revert this 3 weeks from now if upstream reverts their
JZ> > > changes?", or "we change X to Y now, will we need to instead
JZ> > > change X to Z 3 weeks later?".
> >
MP> > My concerns are a mix of all that, because we may finish by doing
MP> > the same verification work multiple times instead of fixing all
MP> > existing issues at once. A good thing is that we may be able to
MP> > conclude rather soon, it looks like LLVM releases a new major
MP> > version every 3 months or so.
>
AF> Given the low cost of a change like this, and the fact that we have
AF> a buildfarm animal building recent trunk versions of llvm, I think
AF> it's better to just backpatch now.

For bystanders: Andres and I seemed to agree that this is unlikely to be
flippant (in addition to other benefits mentioned below). We haven't
discussed more on this, though I'm uncertain we had a consensus.

> JZ> > > In that frame of mind, the answer is simply "no" w.r.t this
> JZ> > > patch: it's removing an #include that simply has been dead: the
> JZ> > > upstream change merely exposed it.
> >
MP> > The docs claim support for LLVM down to 3.9. Are versions older
MP> > than 8 fine with your proposed change?
>
AF> I'll check.
>

How goes the checking? I was 99% certain it'd work but that might have
been my excuse to be lazy. Do you need help on this?
> JZ> > > OTOH, is your concern more around "how many more dead #include
JZ> > > will LLVM 11 reveal before September?", I'm open to suggestions.
JZ> > > I personally have a bias to keep things working.
> >
MP> > This position can have advantages, though it seems to me that we
MP> > should still wait to see if there are more issues popping up.
>
AF> What's the benefit? The cost of checking this will be not
AF> meaningfully lower if there's other things to check as well, given
AF> their backward compat story presumably is different.

For bystanders: Andres and I argued for "fixing this sooner and
backpatch" and Michael suggested "wait longer and whack all moles". We
have waited, and there seems to be only one mole (finding all dead
unbroken "include"s was left as an exercise for the reader).

Have we come to an agreement on this?

Cheers,
Jesse
Re: do {} while (0) nitpick
Hi Tom,

On Fri, May 1, 2020 at 2:32 PM Tom Lane wrote:
>
> Grepping showed me that there were some not-do-while macros that
> also had trailing semicolons. These seem just as broken, so I
> fixed 'em all.
>

I'm curious: *How* are you able to discover those occurrences with grep?
I understand how John might have done it with his original patch: it's
quite clear the pattern he would look for looks like "while (0);". But
how did you find all these other macro definitions with a trailing
semicolon? My tiny brain can only imagine:

1. Either grep for a trailing semicolon (OMG almost every line will come
up) and squint through the context to see that the previous line has a
trailing backslash;

2. Or use some LLVM magic to spelunk through every macro definition and
look for a trailing semicolon.

Cheers,
Jesse
Re: Fix compilation failure against LLVM 11
Hi Michael,

On Mon, Apr 27, 2020 at 9:56 PM Michael Paquier wrote:
> rather soon, it looks like LLVM releases a new major version every 3
> months or so.

FYI LLVM has a six-month release cadence [1]; the next release is
expected this coming September (I can't tell whether you were joking).

Cheers,
Jesse

[1] https://llvm.org/docs/HowToReleaseLLVM.html#release-timeline
Re: Fix compilation failure against LLVM 11
Hi Michael,

On Sun, Apr 26, 2020 at 11:21 PM Michael Paquier wrote:
>
> On Sat, Apr 25, 2020 at 09:41:20PM -0700, Jesse Zhang wrote:
> > I searched my inbox and the archive, strange that nobody else is seeing
> > this.
> >
> > Turns out that LLVM has recently removed "llvm/IR/CallSite.h" in
> > (unreleased) version 11 [1][2]. To fix the build I tried conditionally
> > (on LLVM_VERSION_MAJOR < 11) including CallSite.h, but that looks yuck.
> > Then I poked at llvmjit_inline.cpp a bit and found that CallSite.h
> > doesn't seem to be really necessary. PFA a patch that simply removes
> > this #include.
> >
> > In addition, I've done the due diligence of trying to build against
> > LLVM versions 8, 9, 10.
>
> LLVM 11 has not been released yet. Do you think that this part or
> even more are subject to change before the 11 release? My take would
> be to wait more before fixing this issue and make sure that our code
> is fixed when their code is GA'd.
> --
> Michael

Are you expressing a concern against "churning" this part of the code
in reaction to upstream LLVM changes? I'd agree with you in general.
But then the question we need to ask is "will we need to revert this 3
weeks from now if upstream reverts their changes?", or "we change X to Y
now, will we need to instead change X to Z 3 weeks later?".

In that frame of mind, the answer is simply "no" w.r.t. this patch: it's
removing an #include that has simply been dead; the upstream change
merely exposed it.

OTOH, if your concern is more around "how many more dead #includes will
LLVM 11 reveal before September?", I'm open to suggestions. I personally
have a bias toward keeping things working.

Cheers,
Jesse
Fix compilation failure against LLVM 11
Hi hackers,

My local build of master started failing last night with this error:

llvmjit_inline.cpp:59:10: fatal error: 'llvm/IR/CallSite.h' file not found
#include <llvm/IR/CallSite.h>
         ^~~~

I searched my inbox and the archive, strange that nobody else is seeing
this.

Turns out that LLVM has recently removed "llvm/IR/CallSite.h" in
(unreleased) version 11 [1][2]. To fix the build I tried conditionally
(on LLVM_VERSION_MAJOR < 11) including CallSite.h, but that looks yuck.
Then I poked at llvmjit_inline.cpp a bit and found that CallSite.h
doesn't seem to be really necessary. PFA a patch that simply removes
this #include.

In addition, I've done the due diligence of trying to build against
LLVM versions 8, 9, 10.

Cheers,
Jesse

[1] LLVM Differential Revision: https://reviews.llvm.org/D78794
[2] LLVM commit https://github.com/llvm/llvm-project/commit/2c24051bacd2
    "[CallSite removal] Rename CallSite.h to AbstractCallSite.h. NFC"

From 27084a0ebd37176f0a831042457e27cad8ac70eb Mon Sep 17 00:00:00 2001
From: Jesse Zhang
Date: Sat, 25 Apr 2020 16:24:33 -0700
Subject: [PATCH] Remove unused include

LLVM has recently removed "llvm/IR/CallSite.h" in (unreleased) version
11 [1]. Instead of using preprocessor conditionals, just remove it
because it doesn't seem like AbstractCallSite (or any of the other
symbols declared in the header) is used by LLVM JIT.

[1] LLVM Differential Revision: https://reviews.llvm.org/D78794
---
 src/backend/jit/llvm/llvmjit_inline.cpp | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/backend/jit/llvm/llvmjit_inline.cpp b/src/backend/jit/llvm/llvmjit_inline.cpp
index 95d4d854f43..2617a461cad 100644
--- a/src/backend/jit/llvm/llvmjit_inline.cpp
+++ b/src/backend/jit/llvm/llvmjit_inline.cpp
@@ -56,7 +56,6 @@ extern "C"
 #include
 #endif
 #include
-#include <llvm/IR/CallSite.h>
 #include
 #include
 #include
--
2.26.2
Re: Header / Trailer Comment Typos for M4 macros
On Wed, Apr 22, 2020 at 12:29 PM Tom Lane wrote:
>
> Jesse Zhang writes:
> > either way: either changing the macro names or changing the comment. PFA
> > a patch that keeps the macro names.
>
> Pushed, thanks.
>

Thanks!

> > Also in hindsight: it seems that, as suggested in the trailer typo,
> > PGAC_PROG_CXX_VAR_OPT (a la the C version PGAC_PROG_CC_VAR_OPT) would be
> > a good addition if we ever want to add the negative warning flags (for
> > starters, -Wno-unused-command-line-argument for clang++) to CXXFLAGS, but
> > I assume it wasn't there in the final patch because we didn't use it
> > (presumably because the patch was minimized?). Thoughts?
>
> I'd be inclined not to add it till we have an actual use for it.
> Dead code tends to break silently.
>

For sure. I feel the same about dead code. I didn't make my question
clear, though: I'm curious what motivated the original addition of
-Wno-unused-command-line-argument in commit 73b416b2e412, and how that
problem didn't quite manifest itself with Clang++. The commit mentioned
pthread flags, but I tried taking out -Wno-unused-command-line-argument
from configure.in and it produced no warnings on my laptop (I know,
that's a bad excuse). For context, I'm running Clang 11 on Ubuntu
amd64.

Cheers,
Jesse
Header / Trailer Comment Typos for M4 macros
Hi hackers,

While poking at the build system I stumbled upon some trivial trailer
comment inconsistencies in config/c-compiler.m4. They can be fixed
either way: either changing the macro names or changing the comment. PFA
a patch that keeps the macro names.

In hindsight though, it seems that PGAC_PRINTF_ARCHETYPE was meant to be
PGAC_C_PRINTF_ARCHETYPE, judging by a half-second squint at surrounding,
similar macros. Thoughts?

Also in hindsight: it seems that, as suggested in the trailer typo,
PGAC_PROG_CXX_VAR_OPT (a la the C version PGAC_PROG_CC_VAR_OPT) would be
a good addition if we ever want to add the negative warning flags (for
starters, -Wno-unused-command-line-argument for clang++) to CXXFLAGS, but
I assume it wasn't there in the final patch because we didn't use it
(presumably because the patch was minimized?). Thoughts?

Cheers,
Jesse

From 73c15bfdea388347455810c3eb98dbff235d3d08 Mon Sep 17 00:00:00 2001
From: Jesse Zhang
Date: Wed, 22 Apr 2020 06:51:05 -0700
Subject: [PATCH] Fix header / trailer typos for m4 macros

These seem like legit names in earlier iterations of respective patches
(commit b779168ffe33 "Detect PG_PRINTF_ATTRIBUTE automatically." and
commit 6869b4f25847 "Add C++ support to configure.") but the macro had
been renamed out of sync with the header / trailer comment in the final
committed patch.
---
 config/c-compiler.m4 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 501b74b3a19..c48cba9cf9a 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -2,7 +2,7 @@
 # config/c-compiler.m4

-# PGAC_C_PRINTF_ARCHETYPE
+# PGAC_PRINTF_ARCHETYPE
 # ---
 # Select the format archetype to be used by gcc to check printf-type functions.
 # We prefer "gnu_printf", as that most closely matches the features supported
@@ -466,7 +466,7 @@ undefine([Ac_cachevar])dnl
 # command-line option. If it does, add the string to CXXFLAGS.
 AC_DEFUN([PGAC_PROG_CXX_CFLAGS_OPT],
 [PGAC_PROG_VARCXX_VARFLAGS_OPT(CXX, CXXFLAGS, $1)
-])# PGAC_PROG_CXX_VAR_OPT
+])# PGAC_PROG_CXX_CFLAGS_OPT
--
2.26.2
Re: Properly mark NULL returns in numeric aggregates
Hi David,

On Mon, Apr 13, 2020 at 10:46 PM David Rowley wrote:
>
> On Tue, 14 Apr 2020 at 06:14, Tom Lane wrote:
> > Yeah, they're relying exactly on the assumption that nodeAgg is not
> > going to try to copy a value declared "internal", and therefore they
> > can be loosey-goosey about whether the value pointer is null or not.
> > However, if you want to claim that that's wrong, you have to explain
> > why it's okay for some other code to be accessing a value that's
> > declared "internal". I'd say that the meaning of that is precisely
> > "keepa u hands off".
> >
> > In the case at hand, the current situation is that we only expect the
> > values returned by these combine functions to be read by the associated
> > final functions, which are on board with the null-pointer representation
> > of an empty result. Your argument is essentially that it should be
> > possible to feed the values to the aggregate's associated serialization
> > function as well. But the core code never does that, so I'm not convinced
> > that we should add it to the requirements; we'd be unable to test it.
>
> Casting my mind back to when I originally wrote that code, I attempted
> to do so in such a way so that it could one day be used for a 3-stage
> aggregation. e.g. Parallel Partial Aggregate -> Gather -> Combine
> Serial Aggregate on one node, then on some master node a Deserial
> Combine Finalize Aggregate. You're very right that we can't craft
> such a plan with today's master (we didn't even add a supporting enum
> for it in AggSplit). However, it does appear that there are
> extensions or forks out there which attempt to use the code in this
> way, so it would be good to not leave those people out in the cold
> regarding this.
Greenplum plans split-aggregation quite similarly to Postgres: while it
doesn't pass partial results through an intra-cluster "Gather" -- using
a reshuffle-by-hash type operation instead -- Greenplum _does_ split an
aggregate into final and partial halves, running them on different
nodes. In short, the relationship among the combine, serial, and
deserial functions is similar to how it is in Postgres today
(serial->deserial->combine), in the context of splitting aggregates.

The current problem arises because Greenplum spills the hash table in
hash aggregation (a diff we're working actively to upstream), a process
in which we have to touch (read: serialize and copy) the internal trans
values. However, we are definitely eyeing what you described as
something to move towards. As a fork, we'd like to carry as thin a diff
as possible, so the current situation is pretty much forcing us to
diverge in the functions mentioned up-thread.

In hindsight, "sloppy" might not have been a wise choice of words,
apologies for the possible offense, David!

> For testing, can't we just have an Assert() in
> advance_transition_function that verifies isnull matches the
> nullability of the return value for INTERNAL returning transfns? i.e.
> the attached
>
> I don't have a test case to hand that could cause this to fail, but it
> sounds like Jesse might.

One easy way to cause this is "sum(x) FILTER (WHERE false)", which will
for sure make the partial results NULL. Is that what you're looking
for? I'll be happy to send in the SQL.

Cheers,
Jesse
Re: Properly mark NULL returns in numeric aggregates
On Fri, Apr 10, 2020 at 3:59 PM Tom Lane wrote:
>
> They can't be strict because the initial iteration needs to produce
> something from a null state and non-null input. nodeAgg's default
> behavior won't work for those because nodeAgg doesn't know how to
> copy a value of type "internal".
>
> regards, tom lane

Ah, I think I get it. A copy must happen because the input is likely in
a shorter-lived memory context than the state, but nodeAgg's default
behavior of copying a by-value datum won't really copy the object
pointed to by the pointer wrapped in the datum of "internal" type, so we
defer to the combine function. Am I right? Then it follows kinda
naturally that those combine functions have been sloppy on arrival since
commit 11c8669c0cc.

Cheers,
Jesse
Re: Properly mark NULL returns in numeric aggregates
Hi Andres,

On Fri, Apr 10, 2020 at 12:14 PM Andres Freund wrote:
>
> Shouldn't these just be marked as strict?
>

Are you suggesting that because none of the corresponding trans
functions (int8_avg_accum, int2_accum, and friends) ever output NULL?
That's what we thought, but then I realized that an input to a combine
function is not necessarily an output from a trans function invocation:
for example, when there is a "FILTER (WHERE ...)" clause that filters
out every tuple in a group, the partial aggregate might just throw a
NULL state for the final aggregate to combine.

On the other hand, we examined the corresponding final functions
(numeric_stddev_pop and friends); they all seem to carefully treat a
NULL trans value the same as a "zero input" (as in, state.N == 0 &&
state.NaNcount == 0). That does suggest to me that it should be fine to
declare those combine functions as strict (barring the restriction that
they should not be STRICT -- anybody remember why?).

Cheers,
Jesse and Deep
Properly mark NULL returns in numeric aggregates
Hi hackers,

We found that several functions -- namely numeric_combine,
numeric_avg_combine, numeric_poly_combine, and int8_avg_combine -- are
returning NULL without signaling the nullity of the datum in
fcinfo.isnull. This is obscured by the fact that the only functions in
core (finalfunc for various aggregates) that those return values feed
into happen to tolerate (or rather, not quite distinguish)
zero-but-not-NULL trans values.

In Greenplum, this behavior becomes problematic because Greenplum
serializes internal trans values before spilling the hash table. The
serial functions (numeric_serialize and friends) are strict functions
that will blow up when they are given null (either in the C sense or the
SQL sense) inputs. In Postgres, if we change hash aggregation in the
future to spill the hash table (vis-à-vis the input tuples), this issue
would manifest itself in the final aggregate because we'll serialize the
combined (and likely incorrectly null) trans values.

Please find attached a small patch fixing said issue. Originally
reported by Denis Smirnov over at
https://github.com/greenplum-db/gpdb/pull/9878

Cheers,
Jesse and Deep

From edeeaa220fbc3d082d133b29a8c394632588d11a Mon Sep 17 00:00:00 2001
From: Denis Smirnov
Date: Thu, 9 Apr 2020 16:10:29 -0700
Subject: [PATCH] Properly mark NULL returns in numeric aggregates

When a function returns NULL (in the SQL sense), it is expected to
signal that in fcinfo.isnull. numeric_combine and friends break this
rule by directly returning a NULL datum without signaling the nullity.
This is obscured by the fact that the only functions in core (finalfunc
for various aggregates) that those return values feed into happen to
tolerate (or rather, not quite distinguish) zero-but-not-NULL trans
values.

In Greenplum, this behavior becomes problematic because Greenplum
serializes internal trans values before spilling the hash table. The
serial functions (numeric_serialize and friends) are strict functions
that will blow up when they are given null (either in the C sense or
the SQL sense) inputs.
---
 src/backend/utils/adt/numeric.c | 16
 1 file changed, 16 insertions(+)

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 9986132b45e..57896de6edf 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -3946,7 +3946,11 @@ numeric_combine(PG_FUNCTION_ARGS)
 	state2 = PG_ARGISNULL(1) ? NULL : (NumericAggState *) PG_GETARG_POINTER(1);

 	if (state2 == NULL)
+	{
+		if (state1 == NULL)
+			PG_RETURN_NULL();
 		PG_RETURN_POINTER(state1);
+	}

 	/* manually copy all fields from state2 to state1 */
 	if (state1 == NULL)
@@ -4034,7 +4038,11 @@ numeric_avg_combine(PG_FUNCTION_ARGS)
 	state2 = PG_ARGISNULL(1) ? NULL : (NumericAggState *) PG_GETARG_POINTER(1);

 	if (state2 == NULL)
+	{
+		if (state1 == NULL)
+			PG_RETURN_NULL();
 		PG_RETURN_POINTER(state1);
+	}

 	/* manually copy all fields from state2 to state1 */
 	if (state1 == NULL)
@@ -4543,7 +4551,11 @@ numeric_poly_combine(PG_FUNCTION_ARGS)
 	state2 = PG_ARGISNULL(1) ? NULL : (PolyNumAggState *) PG_GETARG_POINTER(1);

 	if (state2 == NULL)
+	{
+		if (state1 == NULL)
+			PG_RETURN_NULL();
 		PG_RETURN_POINTER(state1);
+	}

 	/* manually copy all fields from state2 to state1 */
 	if (state1 == NULL)
@@ -4774,7 +4786,11 @@ int8_avg_combine(PG_FUNCTION_ARGS)
 	state2 = PG_ARGISNULL(1) ? NULL : (PolyNumAggState *) PG_GETARG_POINTER(1);

 	if (state2 == NULL)
+	{
+		if (state1 == NULL)
+			PG_RETURN_NULL();
 		PG_RETURN_POINTER(state1);
+	}

 	/* manually copy all fields from state2 to state1 */
 	if (state1 == NULL)
--
2.26.0
Re: Use compiler intrinsics for bit ops in hash
Hi David,

On Sun, Mar 8, 2020 at 11:34 AM David Fetter wrote:
>
> On Mon, Mar 02, 2020 at 12:45:21PM -0800, Jesse Zhang wrote:
> > Hi David,
>
> Per discussion on IRC with Andrew (RhodiumToad) Gierth:
>
> The runtime detection means there's always an indirect call overhead
> and no way to inline. This is counter to what using compiler
> intrinsics is supposed to do.
>
> It's better to rely on the compiler, because:
> (a) The compiler often knows whether the value can or can't be 0 and
> can therefore skip a conditional jump.

Yes, the compiler would know to eliminate the branch if the inlined
function is called with a literal argument, or if it infers an invariant
from the context (like nesting inside a conditional block, or a previous
conditional "noreturn" path).

> (b) If you're targeting a recent microarchitecture, the compiler can
> just use the right instruction.

I might be more conservative than you are on (b). The thought of
building a binary that cannot run "somewhere" the compiler supports by
default still mortifies me.

> (c) Even if the conditional branch is left in, it's not a big overhead.
>

I 100% agree with (c), see benchmarking results upthread.

Cheers,
Jesse
Re: Use compiler intrinsics for bit ops in hash
Hi John,

Oops, this email has been sitting in my outbox for 3 days...

On Wed, Mar 4, 2020 at 1:46 AM John Naylor wrote:
> On Tue, Mar 3, 2020 at 4:46 AM Jesse Zhang wrote:
> > I've quickly put together a PoC patch on top of yours, which
> > re-implements ceil_log2 using LZCNT coupled with a CPUID check.
> > Thoughts?
>
> This patch seems to be making an assumption that an indirect function
> call is faster than taking a branch (in inlined code) that the CPU
> will almost always predict correctly. It would be nice to have some
> numbers to compare. (against pg_count_leading_zeros_* using the "slow"
> versions but statically inlined).
>

Ah, how could I forget that... I ran a quick benchmark on my laptop, and
indeed, even though the GCC-generated code takes a hit on zero input
(Clang generates slightly different code that gives indistinguishable
runtime for zero and non-zero inputs), the inlined code (the function
input in my benchmark is never a constant literal so the branch does get
exercised at runtime) is still more than twice as fast as the function
call.

-------------------------------------------------
Benchmark      Time       CPU        Iterations
-------------------------------------------------
BM_pfunc/0     1.57 ns    1.56 ns    447127265
BM_pfunc/1     1.56 ns    1.56 ns    449618696
BM_pfunc/8     1.57 ns    1.57 ns    443013856
BM_pfunc/64    1.57 ns    1.57 ns    448784369
BM_slow/0      0.602 ns   0.600 ns   10
BM_slow/1      0.391 ns   0.390 ns   10
BM_slow/8      0.392 ns   0.391 ns   10
BM_slow/64     0.391 ns   0.390 ns   10
BM_fast/0      1.47 ns    1.46 ns    477513921
BM_fast/1      1.47 ns    1.46 ns    473992040
BM_fast/8      1.46 ns    1.46 ns    474895755
BM_fast/64     1.47 ns    1.46 ns    477215268

For your amusement, I've attached the meat of the benchmark. To build
the code you can grab the repository at
https://github.com/d/glowing-chainsaw/tree/pfunc

> Stylistically, "8 * sizeof(num)" is a bit overly formal, since the
> hard-coded number we want is in the name of the function.

Oh yeah, overly generic code is indicative of the remnants of my C++
brain, will fix.

Cheers,
Jesse

#include "lzcnt.h"

static bool go = lzcnt_available();

__attribute__((target("lzcnt")))
inline int
lzcnt_fast(uint64_t word)
{
	return __builtin_clzll(word);
}

static int
lzcnt_choose(uint64_t word)
{
	if (lzcnt_available())
		lzcnt_pfunc = lzcnt_fast;
	else
		lzcnt_pfunc = lzcnt_slow;

	return lzcnt_pfunc(word);
}

int (*lzcnt_pfunc)(uint64_t word) = lzcnt_choose;

#ifndef LZCNT_H
#define LZCNT_H

#include <cpuid.h>
#include <stdint.h>

static inline bool
lzcnt_available()
{
	uint32_t exx[4] = {0, 0, 0, 0};

	__get_cpuid(0x80000001, exx + 0, exx + 1, exx + 2, exx + 3);

	return (exx[2] & bit_ABM) != 0;
}

extern int (*lzcnt_pfunc)(uint64_t word);

__attribute__((target("no-lzcnt")))
inline int
lzcnt_slow(uint64_t word)
{
	if (word == 0)
		return 8 * sizeof(word);

	return __builtin_clzll(word);
}

int lzcnt_fast(uint64_t word);

#endif

#include "benchmark/benchmark.h"

#include "lzcnt.h"

static void
BM_pfunc(benchmark::State &state)
{
	auto word = state.range(0);

	for (auto _ : state)
	{
		lzcnt_pfunc(word);
	}
}

static void
BM_slow(benchmark::State &state)
{
	auto word = state.range(0);

	for (auto _ : state)
	{
		benchmark::DoNotOptimize(lzcnt_slow(word));
	}
}

static void
BM_fast(benchmark::State &state)
{
	auto word = state.range(0);

	for (auto _ : state)
	{
		lzcnt_fast(word);
	}
}

BENCHMARK(BM_pfunc)->Arg(0)->Range(1, 64);
BENCHMARK(BM_slow)->Arg(0)->Range(1, 64);
BENCHMARK(BM_fast)->Arg(0)->Range(1, 64);

BENCHMARK_MAIN();
Re: Use compiler intrinsics for bit ops in hash
Hi David,

On Wed, Feb 26, 2020 at 9:56 PM David Fetter wrote:
>
> On Wed, Feb 26, 2020 at 09:12:24AM +0100, David Fetter wrote:
> > On Fri, Jan 31, 2020 at 04:59:18PM +0100, David Fetter wrote:
> > > On Wed, Jan 15, 2020 at 03:45:12PM -0800, Jesse Zhang wrote:
> > > > On Tue, Jan 14, 2020 at 2:09 PM David Fetter wrote:
> > > > > > The changes in hash AM and SIMPLEHASH do look like a net positive
> > > > > > improvement. My biggest cringe might be in pg_bitutils:
> > > > > >
> > > > > > 1. Is ceil_log2_64 dead code?
> > > > >
> > > > > Let's call it nascent code. I suspect there are places it could go, if
> > > > > I look for them. Also, it seemed silly to have one without the other.
> > > > >
> > > >
> > > > While not absolutely required, I'd like us to find at least one
> > > > place and start using it. (Clang also nags at me when we have
> > > > unused functions).
> > >
> > > Done in the expanded patches attached.

I see that you've found use of it in dynahash, thanks!

The math in the new (from v4 to v6) patch is wrong: it yields
ceil_log2(1) = 1 or next_power_of_2(1) = 2. I can see that you lifted
the restriction of "num greater than one" for ceil_log2() in this patch
set, but it's now _more_ problematic to base those functions on
pg_leftmost_one_pos().

I'm not comfortable with your changes to pg_leftmost_one_pos() to remove
the restriction on word being non-zero. Specifically,
pg_leftmost_one_pos() is made to return 0 on 0 input. While none of its
current callers (in HEAD) is harmed, this introduces muddy semantics:

1. pg_leftmost_one_pos is semantically undefined on 0 input: scanning
for a set bit in a zero word won't find it anywhere.

2. We can _try_ generalizing it to accommodate ceil_log2 by
extrapolating based on the invariant that BSR + LZCNT = 31 (or 63). In
that case, the extrapolation yields -1 for pg_leftmost_one_pos(0).

I'm not convinced that others on the list will be comfortable with the
generalization suggested in 2 above.
I've quickly put together a PoC patch on top of yours, which
re-implements ceil_log2 using LZCNT coupled with a CPUID check.
Thoughts?

Cheers,
Jesse

From 0e4392d77b6132a508b7da14871cc99066a9d114 Mon Sep 17 00:00:00 2001
From: Jesse Zhang
Date: Fri, 28 Feb 2020 16:22:04 -0800
Subject: [PATCH] Use LZCOUNT when possible

This patch reverts the changes to pg_leftmost_one and friends (which is
really bit-scan-reverse, and is legitimately undefined on zero) and
reworks ceil_log2 and friends to use a count-leading-zeros operation
that is well-defined on zero.
---
 src/include/port/pg_bitutils.h | 47 -
 src/port/pg_bitutils.c         | 77 ++
 2 files changed, 104 insertions(+), 20 deletions(-)

diff --git a/src/include/port/pg_bitutils.h b/src/include/port/pg_bitutils.h
index 88a9ea5b7fb..b4d5724ee1d 100644
--- a/src/include/port/pg_bitutils.h
+++ b/src/include/port/pg_bitutils.h
@@ -20,18 +20,19 @@ extern PGDLLIMPORT const uint8 pg_number_of_ones[256];

 /*
  * pg_leftmost_one_pos32
  *		Returns the position of the most significant set bit in "word",
- *		measured from the least significant bit.
+ *		measured from the least significant bit. word must not be 0.
  */
 static inline int
 pg_leftmost_one_pos32(uint32 word)
 {
 #ifdef HAVE__BUILTIN_CLZ
-	return word ? 31 - __builtin_clz(word) : 0;
+	Assert(word != 0);
+
+	return 31 - __builtin_clz(word);
 #else
 	int			shift = 32 - 8;

-	if (word == 0)
-		return 0;
+	Assert(word != 0);

 	while ((word >> shift) == 0)
 		shift -= 8;
@@ -48,18 +49,19 @@ static inline int
 pg_leftmost_one_pos64(uint64 word)
 {
 #ifdef HAVE__BUILTIN_CLZ
+	Assert(word != 0);
+
 #if defined(HAVE_LONG_INT_64)
-	return word ? 63 - __builtin_clzl(word) : 0;
+	return 63 - __builtin_clzl(word);
 #elif defined(HAVE_LONG_LONG_INT_64)
-	return word ? 63 - __builtin_clzll(word) : 0;
+	return 63 - __builtin_clzll(word);
 #else
 #error must have a working 64-bit integer datatype
 #endif
 #else							/* !HAVE__BUILTIN_CLZ */
 	int			shift = 64 - 8;

-	if (word == 0)
-		return 0;
+	Assert(word != 0);

 	while ((word >> shift) == 0)
 		shift -= 8;
@@ -71,18 +73,19 @@ pg_leftmost_one_pos64(uint64 word)

 /*
  * pg_rightmost_one_pos32
  *		Returns the position of the least significant set bit in "word",
- *		measured from the least significant bit.
+ *		measured from the least significant bit. word must not be 0.
  */
 static inline int
 pg_rightmost_one_pos32(uint32 word)
 {
 #ifdef HAVE__BUILTIN_CTZ
-	return word ? __builtin_ctz(word) : 32;
+	Assert(word != 0);
+
+	return __builtin_ctz(word);
#else
 	int			result = 0;

-	if (word == 0)
-		return 32;
+	Assert(word != 0);

 	while ((word & 255) == 0)
 	{
@@ -102,
Re: Resolving the python 2 -> python 3 mess
Hi Tom,

I really like the "stub .so" idea, but feel pretty uncomfortable about
the "transparent" upgrade. Response inlined.

On Mon, Feb 17, 2020 at 8:49 AM Tom Lane wrote:
>
> 2. On platforms where Python 2.x is no longer supported, transparently
> map plpythonu and plpython2u to plpython3u. "Transparent" meaning that
> dump/reload or pg_upgrade of existing plpythonu/plpython2u functions
> will work, but when you run them, what you actually get is Python 3.x.

It's fair enough that plpythonu changes its meaning; people who really
want the stability should explicitly use plpython2u.

> For existing functions that don't use any obsolete Python syntax
> (which one would hope is a pretty large percentage), this is a
> zero-effort conversion for users. If a function does use obsolete
> constructs, it will get a parse failure when executed, and the user
> will have to update it to Python 3 syntax. I propose that we make
> that case reasonably painless by providing the conversion script
> I posted in [3] (or another one if somebody's got a better one),
> bundled as a separately-installable extension.
>
> A possible gotcha in this approach is if there are any python 2/3
> incompatibilities that would not manifest as syntax errors or
> obvious runtime errors, but would allow old code to execute and
> silently do the wrong thing. One would hope that the Python crowd
> weren't dumb enough to do that, but I don't know whether it's true.
> If there are nasty cases like that, maybe what we have to do is allow
> plpythonu/plpython2u functions to be dumped and reloaded into a
> python-3-only install, but refuse to execute them until they've
> been converted.

"True division", one of the very first (2001, awww) few breaking changes
introduced in Python 3 [1], comes to mind. While it's not the worst of
the incompatibilities between Python 2 and 3, it's bad enough to give
pause to the notion that successful parsing implies successful
conversion.
[1] https://www.python.org/dev/peps/pep-0238/ Cheers, Jesse
Re: Parallel grouping sets
On Mon, Feb 3, 2020 at 12:07 AM Richard Guo wrote: > > Hi Jesse, > > Thanks for reviewing these two patches. I enjoyed it! > > On Sat, Jan 25, 2020 at 6:52 AM Jesse Zhang wrote: >> >> I glanced over both patches. Just the opposite, I have a hunch that v3 >> is always better than v5. Here's my 6-minute understanding of both. >> >> v3 ("the one with grouping set id") really turns the plan from a tree to >> a multiplexed pipe: we can execute grouping aggregate on the workers, >> but only partially. When we emit the trans values, also tag the tuple >> with a group id. After gather, finalize the aggregates with a modified >> grouping aggregate. Unlike a non-split grouping aggregate, the finalize >> grouping aggregate does not "flow" the results from one rollup to the >> next one. Instead, each group only advances on partial inputs tagged for >> the group. > > Yes, this is what v3 patch does. > > We (Pengzhou and I) had an offline discussion on this plan and we have > some other idea. Since we have tagged 'GroupingSetId' for each tuple > produced by partial aggregate, why not then perform a normal grouping > sets aggregation in the final phase, with the 'GroupingSetId' included > in the group keys? The plan looks like:
>
> # explain (costs off, verbose)
> select c1, c2, c3, avg(c3) from gstest group by grouping
> sets((c1,c2),(c1),(c2,c3));
>                          QUERY PLAN
> --
>  Finalize GroupAggregate
>    Output: c1, c2, c3, avg(c3)
>    Group Key: (gset_id), gstest.c1, gstest.c2, gstest.c3
>    ->  Sort
>          Output: c1, c2, c3, (gset_id), (PARTIAL avg(c3))
>          Sort Key: (gset_id), gstest.c1, gstest.c2, gstest.c3
>          ->  Gather
>                Output: c1, c2, c3, (gset_id), (PARTIAL avg(c3))
>                Workers Planned: 4
>                ->  Partial HashAggregate
>                      Output: c1, c2, c3, gset_id, PARTIAL avg(c3)
>                      Hash Key: gstest.c1, gstest.c2
>                      Hash Key: gstest.c1
>                      Hash Key: gstest.c2, gstest.c3
>                      ->  Parallel Seq Scan on public.gstest
>                            Output: c1, c2, c3
>
> This plan should be able to give the correct results.
> We are still thinking if it is a better plan than the 'multiplexed pipe' plan as in v3. Inputs of thoughts here would be appreciated.

Ha, I believe you meant to say a "normal aggregate", because what's performed above gather is no longer "grouping sets", right? The group key idea is clever in that it helps "discriminate" tuples by their grouping set id. I haven't completely thought this through, but my hunch is that this leaves some money on the table. For example, won't it also lead to more expensive (and unnecessary) sorting and hashing? The groupings with a few partials now share the same tuplesort with the groupings with a lot of groups, even though we only want to tell grouping 1 *apart from* grouping 10, not necessarily that grouping 1 needs to come before grouping 10. That's why I like the multiplexed pipe / "dispatched by grouping set id" idea: we only pay for sorting (or hashing) within each grouping. That said, I'm open to the criticism that keeping multiple tuplesorts and agg hash tables running is expensive in itself, memory-wise ... Cheers, Jesse
Re: Parallel grouping sets
On Thu, Jan 23, 2020 at 2:47 AM Amit Kapila wrote: > > On Sun, Jan 19, 2020 at 2:23 PM Richard Guo wrote: > > > > I realized that there are two patches in this thread that are > > implemented according to different methods, which causes confusion. > > > > Both the idea seems to be different. Is the second approach [1] > inferior for any case as compared to the first approach? Can we keep > both approaches for parallel grouping sets, if so how? If not, then > won't the code by the first approach be useless once we commit second > approach? > > > [1] - > https://www.postgresql.org/message-id/CAN_9JTwtTTnxhbr5AHuqVcriz3HxvPpx1JWE--DCSdJYuHrLtA%40mail.gmail.com

I glanced over both patches. Just the opposite, I have a hunch that v3 is always better than v5. Here's my 6-minute understanding of both.

v5 (the one with a simple partial aggregate) works by pushing a little bit of partial aggregation onto the workers, and performing grouping aggregation above the gather. This has two interesting outcomes: we can execute an unmodified partial aggregate on the workers, and execute an almost unmodified rollup aggregate once the trans values are gathered. A parallel plan for a query like

SELECT count(*) FROM foo GROUP BY GROUPING SETS (a), (b), (c), ();

can be

Finalize GroupAggregate
  Output: count(*)
  Group Key: a
  Group Key: b
  Group Key: c
  Group Key: ()
  Gather Merge
    Partial GroupAggregate
      Output: PARTIAL count(*)
      Group Key: a, b, c
      Sort
        Sort Key: a, b, c
        Parallel Seq Scan on foo

v3 ("the one with grouping set id") really turns the plan from a tree to a multiplexed pipe: we can execute grouping aggregate on the workers, but only partially. When we emit the trans values, also tag the tuple with a group id. After gather, finalize the aggregates with a modified grouping aggregate. Unlike a non-split grouping aggregate, the finalize grouping aggregate does not "flow" the results from one rollup to the next one. Instead, each group only advances on partial inputs tagged for the group.
Finalize HashAggregate
  Output: count(*)
  Dispatched by: (GroupingSetID())
  Group Key: a
  Group Key: b
  Group Key: c
  Gather
    Partial GroupAggregate
      Output: PARTIAL count(*), GroupingSetID()
      Group Key: a
      Sort Key: b
      Group Key: b
      Sort Key: c
      Group Key: c
      Sort
        Sort Key: a
        Parallel Seq Scan on foo

Note that for the first approach to be viable, the partial aggregate *has to* use a group key that's the union of all grouping sets. In cases where individual columns have a low cardinality but the joint cardinality is high (say columns a, b, c each have 16 distinct values, but they are independent, so there are 4096 distinct values on (a,b,c)), this results in fairly high traffic through the shm tuple queue. Cheers, Jesse
Re: Use compiler intrinsics for bit ops in hash
On Tue, Jan 14, 2020 at 2:09 PM David Fetter wrote: > > The changes in hash AM and SIMPLEHASH do look like a net positive > > improvement. My biggest cringe might be in pg_bitutils: > > > > 1. Is ceil_log2_64 dead code? > > Let's call it nascent code. I suspect there are places it could go, if > I look for them. Also, it seemed silly to have one without the other.

While not absolutely required, I'd like us to find at least one place and start using it. (Clang also nags at me when we have unused functions.)

> On Tue, Jan 14, 2020 at 12:21:41PM -0800, Jesse Zhang wrote: > > 4. It seems like you *really* would like an operation like LZCNT in x86 > > (first appearing in Haswell) that is well defined on zero input. ISTM > > the alternatives are: > > > >a) Special case 1. That seems straightforward, but the branching cost > >on a seemingly unlikely condition seems to be a lot of performance > >loss > > > >b) Use architecture specific intrinsic (and possibly with CPUID > >shenanigans) like __builtin_ia32_lzcnt_u64 on x86 and use the CLZ > >intrinsic elsewhere. The CLZ GCC intrinsic seems to map to > >instructions that are well defined on zero in most ISA's other than > >x86, so maybe we can get away with special-casing x86?

i. We can detect the LZCNT instruction by checking one of the "extended feature" (EAX=0x80000001) bits using CPUID. Unlike the "basic features" (EAX=1), extended feature flags have been more vendor-specific, but fortunately it seems that the feature bit for LZCNT is the same [1][2].

ii. We'll most likely still need to provide a fallback implementation for processors that don't have LZCNT (either because they are from a different vendor, or an older Intel/AMD processor). I wonder if simply checking for 1 is "good enough". Maybe a micro benchmark is in order?

> Is there some way to tilt the tools so that this happens?

We have a couple options here:

1. Use a separate object (a la our SSE 4.2 implementation of CRC).
On Clang and GCC (I don't have MSVC at hand), -mabm or -mlzcnt should cause __builtin_clz to generate the LZCNT instruction, which is well defined on zero input. The default configuration would translate __builtin_clz to code that subtracts BSR from the width of the input, but BSR leaves the destination undefined on zero input. 2. (My least favorite) use inline asm (a la our popcount implementation). > b) seems much more attractive. Is there some way to tilt the tools so > that this happens? What should I be reading up on? The enclosed references hopefully are good places to start. Let me know if you have more ideas. Cheers, Jesse References: [1] "How to detect New Instruction support in the 4th generation Intel® Core™ processor family" https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family [2] "Bit Manipulation Instruction Sets" https://en.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets
Re: Use compiler intrinsics for bit ops in hash
Hi David, On Tue, Jan 14, 2020 at 9:36 AM David Fetter wrote: > > Folks, > > The recent patch for distinct windowing aggregates contained a partial > fix of the FIXME that didn't seem entirely right, so I extracted that > part, changed it to use compiler intrinsics, and submit it here.

The changes in hash AM and SIMPLEHASH do look like a net positive improvement. My biggest cringe might be in pg_bitutils:

> diff --git a/src/include/port/pg_bitutils.h b/src/include/port/pg_bitutils.h
> index 498e532308..cc9338da25 100644
> --- a/src/include/port/pg_bitutils.h
> +++ b/src/include/port/pg_bitutils.h
> @@ -145,4 +145,32 @@ pg_rotate_right32(uint32 word, int n)
>  	return (word >> n) | (word << (sizeof(word) * BITS_PER_BYTE - n));
>  }
>
> +/* ceil(lg2(num)) */
> +static inline uint32
> +ceil_log2_32(uint32 num)
> +{
> +	return pg_leftmost_one_pos32(num-1) + 1;
> +}
> +
> +static inline uint64
> +ceil_log2_64(uint64 num)
> +{
> +	return pg_leftmost_one_pos64(num-1) + 1;
> +}
> +
> +/* calculate first power of 2 >= num
> + * per https://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2
> + * using BSR where available */
> +static inline uint32
> +next_power_of_2_32(uint32 num)
> +{
> +	return ((uint32) 1) << (pg_leftmost_one_pos32(num-1) + 1);
> +}
> +
> +static inline uint64
> +next_power_of_2_64(uint64 num)
> +{
> +	return ((uint64) 1) << (pg_leftmost_one_pos64(num-1) + 1);
> +}
> +
>  #endif							/* PG_BITUTILS_H */

1. Is ceil_log2_64 dead code?

2. The new utilities added here (ceil_log2_32 and company, next_power_of_2_32 and company) all require num > 1, but don't clearly Assert (or at the very least document) so.

3. A couple of the callers can actively pass in an argument of 1, e.g. from _hash_spareindex in hashutil.c, while some other callers are iffy at best (simplehash.h maybe?)

4. It seems like you *really* would like an operation like LZCNT in x86 (first appearing in Haswell) that is well defined on zero input. ISTM the alternatives are:

a) Special case 1.
That seems straightforward, but the branching cost on a seemingly unlikely condition seems to be a lot of performance loss.

b) Use an architecture-specific intrinsic (possibly with CPUID shenanigans) like __builtin_ia32_lzcnt_u64 on x86 and use the CLZ intrinsic elsewhere. The CLZ GCC intrinsic seems to map to instructions that are well defined on zero in most ISAs other than x86, so maybe we can get away with special-casing x86?

Cheers, Jesse
Re: C testing for Postgres
On Fri, Jun 28, 2019 at 10:37 AM Adam Berlin wrote: > > If we were to use this tool, would the community want to vendor the > framework in the Postgres repository, or keep it in a separate > repository that produces a versioned shared library? > If the library is going to actively evolve, we should bring it into the tree. For a project like this, a "versioned shared library" is a massive pain in the rear for both the consumer of such libraries and for their maintainers. Cheers, Jesse
Re: A small note on the portability of cmake
On Sat, Jan 19, 2019 at 8:50 AM Tom Lane wrote: > > Failed miserably. It turns out cmake has a hard dependency on libuv > which (a) has a hard dependency on atomic ops and (b) according to its > own docs, doesn't really care about any platforms other than > Linux/macOS/Windows and maybe FreeBSD. I beg to disagree with point b above: the libuv project continually receives (and accepts) patches fixing bugs specifically for OpenBSD. Atomic ops (compare-and-exchange) might be a harder dependency to shed for libuv. Does the fallback onto compiler intrinsics (__sync_val_compare_and_swap, or on GCC 4.7+, __atomic_compare_exchange_n) not work here? Jesse
Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs
Hey Chengyu, How did you set up your GDB to get "p" to pretty-print a Postgres list? Cheers, Jesse On Fri, Nov 30, 2018 at 1:00 PM Chengchao Yu wrote:
> Greetings,
>
> Recently, we hit a few occurrences of deadlock when IO failure (including disk full, random remote disk IO failures) happens in single user mode. We found the issue exists on both Linux and Windows in multiple postgres versions.
>
> Here are the steps to repro on Linux (as Windows repro is similar):
>
> 1. Get latest PostgreSQL code, build and install the executables.
>
> $ git clone https://git.postgresql.org/git/postgresql.git
> $ cd postgresql
> $ PGROOT=$(pwd)
> $ git checkout REL_11_STABLE
> $ mkdir build
> $ cd build
> $ ../configure --prefix=/path/to/postgres
> $ make && make install
>
> 2. Run initdb to initialize a PG database folder.
>
> $ /path/to/postgres/bin/initdb -D /path/to/data
>
> 3. Because the unable to write relation data scenario is difficult to hit naturally even reserved space is turned off, I have prepared a small patch (see attachment "emulate-error.patch") to force an error when PG tries to write data to relation files. We can just apply the patch and there is no need to put efforts flooding data to disk any more.
>
> $ cd $PGROOT
> $ git apply /path/to/emulate-error.patch
> $ cd build
> $ make && make install
>
> 4. Connect to the newly initialized database cluster with single user mode, create a table, and insert some data to the table, do a checkpoint or directly give EOF. Then we hit the deadlock issue and the process will not exit until we kill it.
> Do a checkpoint explicitly:
>
> $ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true <<EOF
> create table t1(a int);
> insert into t1 values (1), (2), (3);
> checkpoint;
> EOF
>
> PostgreSQL stand-alone backend 11.1
> backend> backend> backend> 2018-11-29 02:45:27.891 UTC [18806] FATAL: Emulate exception in mdwrite() when writing to disk
> 2018-11-29 02:55:27.891 UTC [18806] CONTEXT: writing block 8 of relation base/12368/1247
> 2018-11-29 02:55:27.891 UTC [18806] STATEMENT: checkpoint;
>
> 2018-11-29 02:55:27.900 UTC [18806] FATAL: Emulate exception in mdwrite() when writing to disk
> 2018-11-29 02:55:27.900 UTC [18806] CONTEXT: writing block 8 of relation base/12368/1247
>
> Or directly give an EOF:
>
> $ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true <<EOF
> create table t1(a int);
> insert into t1 values (1), (2), (3);
> EOF
>
> PostgreSQL stand-alone backend 11.1
> backend> backend> backend> 2018-11-29 02:55:24.438 UTC [18149] FATAL: Emulate exception in mdwrite() when writing to disk
> 2018-11-29 02:45:24.438 UTC [18149] CONTEXT: writing block 8 of relation base/12368/1247
>
> 5. Moreover, when we try to recover the database with single user mode, we hit the issue again, and the process does not bring up the database nor exit.
> $ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true
> 2018-11-29 02:59:33.257 UTC [19058] LOG: database system shutdown was interrupted; last known up at 2018-11-29 02:58:49 UTC
> 2018-11-29 02:59:33.485 UTC [19058] LOG: database system was not properly shut down; automatic recovery in progress
> 2018-11-29 02:59:33.500 UTC [19058] LOG: redo starts at 0/1672E40
> 2018-11-29 02:59:33.500 UTC [19058] LOG: invalid record length at 0/1684B90: wanted 24, got 0
> 2018-11-29 02:59:33.500 UTC [19058] LOG: redo done at 0/1684B68
> 2018-11-29 02:59:33.500 UTC [19058] LOG: last completed transaction was at log time 2018-11-29 02:58:49.856663+00
> 2018-11-29 02:59:33.547 UTC [19058] FATAL: Emulate exception in mdwrite() when writing to disk
> 2018-11-29 02:59:33.547 UTC [19058] CONTEXT: writing block 8 of relation base/12368/1247
>
> Analyses:
>
> So, what happened? Actually, there are 2 types of the deadlock due to the same root cause. Let's first take a look at the scenario in step #5. In this scenario, the deadlock happens when disk IO failure occurs inside StartupXLOG(). If we attach debugger to PG process, we will see the process is stuck acquiring the buffer's lw-lock in AbortBufferIO().
>
> void
> AbortBufferIO(void)
> {
>     BufferDesc *buf = InProgressBuf;
>
>     if (buf)
>     {
>         uint32      buf_state;
>
>         /*
>          * Since LWLockReleaseAll has already been called, we're not holding
>          * the buffer's io_in_progress_lock. We have to re-acquire it so that
>          * we can use TerminateBufferIO. Anyone who's executing WaitIO on the
>          * buffer will be in a busy spin until we succeed in doing this.
>          */
>         LWLockAcquire(BufferDescriptorGetIOLock(buf), LW_EXCLUSIVE);