Re: Problem while setting the fpw with SIGHUP

2018-09-26 Thread Amit Kapila
On Thu, Sep 27, 2018 at 10:34 AM Michael Paquier  wrote:
>
> On Thu, Sep 27, 2018 at 10:03:59AM +0530, Amit Kapila wrote:
> > I think, in this case, it might be advisable to just fix the problem
> > (a) which is what has been reported originally in the thread and
> > AFAICS, the fix for that is clear as compared to the problem (b).  If
> > you agree, then we can discuss what is the best fix for the first
> > problem (a).
>
> Okay, thanks for the input.  The fix for (a) would be in my opinion to
> just move the call to RecoveryInProgress() out of the critical section,
> then save the result into a variable, and use the variable within the
> critical section to avoid the potential palloc() problems.  What do you
> think?
>

Your proposed solution makes sense to me.  IIUC, this is quite similar
to what Dilip has also proposed [1].

[1] - 
https://www.postgresql.org/message-id/CAFiTN-u4BA8KXcQUWDPNgaKAjDXC%3DC2whnzBM8TAcv%3DstckYUw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: BUG #15307: Low numerical precision of (Co-) Variance

2018-09-26 Thread Madeleine Thompson
By the way, I did not see the discussion thread or notice that an author of the 
paper I mentioned and the reporter of the bug were the same person until after 
I wrote the review. Oops.

Re: BUG #15307: Low numerical precision of (Co-) Variance

2018-09-26 Thread Madeleine Thompson
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

This is my first PostgreSQL review. Hopefully I'm getting it right. I 
independently ran into the bug in question and found that the author had 
already written a patch. I'm happy the author wrote this.

(1) I am not experienced with PostgreSQL internals, so I can't comment on the 
coding style or usage of internal functions.

 
(2) The patch applies cleanly to ce4887bd025b95c7b455fefd817a418844c6aad3. 
"make check", "make check-world", and "make install" pass.

(3) The patch addresses the numeric inaccuracy reported in the bug. (Yay!) I 
believe the tests are sufficient to confirm that it works as intended. I don't 
think the test coverage *before* this patch was sufficient, and I appreciate 
the improved test coverage of the combine functions. I verified the new tests 
manually.

(4) The comments cite Youngs & Cramer (1971). This is a dated citation. It 
justifies its algorithms based on pre-IEEE754 single-precision arithmetic.  
Most notably, it assumes that floating-point division is not exact. That said, 
the algorithm is still a good choice; it is justified now because compared to 
the standard Welford variance, it is less prone to causing pipeline stalls on a 
modern CPU. Maybe link to Schubert & Gentz 2018 
(https://dl.acm.org/citation.cfm?id=3223036) instead. The new float8_combine 
code is (almost) S eqn. 22; the float8_accum code is S eqn. 28.  
float8_regr_accum and float8_regr_combine are S eqn. 22.
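For reference, the combine step cited in (4) can be sketched outside PostgreSQL. This is an illustrative Python model of the exact pairwise merge of two partial states, not the patch's C code; the names (`accum`, `combine`) are invented here:

```python
# Illustrative sketch (not PostgreSQL code) of the exact pairwise combine of
# two partial aggregate states (n, mean, M2), where M2 is the sum of squared
# deviations from the partial mean -- Schubert & Gentz 2018, eqn. 22.
def combine(na, ma, m2a, nb, mb, m2b):
    n = na + nb
    if n == 0:
        return 0, 0.0, 0.0
    delta = mb - ma
    mean = ma + delta * nb / n                 # weighted mean update
    m2 = m2a + m2b + delta * delta * na * nb / n
    return n, mean, m2

def accum(xs):
    # single-pass accumulation (Welford-style), producing (n, mean, M2)
    n, mean, m2 = 0, 0.0, 0.0
    for x in xs:
        n += 1
        d = x - mean
        mean += d / n
        m2 += d * (x - mean)
    return n, mean, m2
```

Combining the states of two halves reproduces the single-pass result exactly, which is what makes combine functions safe for parallel aggregation.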

(5) I think the comment /* Watch out for roundoff error producing a negative 
variance */ and associated check are obsolete now.

The new status of this patch is: Waiting on Author


Re: Let's stop with the retail rebuilds of src/port/ files already

2018-09-26 Thread Michael Paquier
On Wed, Sep 26, 2018 at 09:24:48PM -0700, Andres Freund wrote:
> On September 26, 2018 9:03:05 PM PDT, Tom Lane  wrote:
>> What I think would make sense is to push this and see what the
>> buildfarm thinks of it.  If there are unfixable problems then
>> we won't have wasted time fleshing out the concept.  Otherwise,
>> I'll do the remaining pieces.
> 
> Sounds reasonable to me.

 # libpgport is needed by some contrib
+# currently we don't install libpgport_shlib.a, maybe we should?

Likely you should, as this could be used directly by out-of-core things.
--
Michael




Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

2018-09-26 Thread Andrew Gierth
> "Krasiyan" == Krasiyan Andreev  writes:

 Krasiyan> I am using last version from more than two months ago in
 Krasiyan> production environment with real data and I didn't find any
 Krasiyan> bugs, so I'm marking this patch as ready for committer in the
 Krasiyan> commitfest app.

Oliver (or anyone else), do you plan to continue working on this in the
immediate future, in line with the comments from myself and Tom in this
thread? If so I'll bump it to the next CF, otherwise I'll mark it
"returned with feedback".

I'm happy to help out with further work on this patch if needed, time
permitting.

-- 
Andrew (irc:RhodiumToad)



Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

2018-09-26 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> So I've tried to rough out a decision tree for the various options
 >> on how this might be implemented (discarding the "use precedence
 >> hacks" option). Opinions? Additions?

 Tom> I think it'd be worth at least drafting an implementation for the
 Tom> lexical-lookahead fix. I think it's likely that we'll need to
 Tom> extend base_yylex to do more lookahead in the future even if we
 Tom> don't do it for this, given the SQL committee's evident love for
 Tom> COBOL-ish syntax and lack of regard for what you can do in
 Tom> LALR(1).

That's not _quite_ a fair criticism of the SQL committee; they're not
ignoring the capabilities of parsers, they're just not making their
syntax robust in the presence of the kinds of extensions and
generalizations that we want to do.

Without exception that I know of, every time we've run into a problem
that needed ugly precedence rules or extra lookahead it's been caused by
us not reserving something that is reserved in the spec, or by us
allowing constructs that are not in the spec at all (postfix operators,
especially), or by us deliberately generalizing what the spec allows
(e.g. allowing full expressions where the spec only allows column
references, or allowing extra parens around subselects) or where we've
repurposed syntax from the spec in an incompatible way (as in
ANY(array)).

Anyway, for the time being I will mark this patch as "returned with
feedback".

 >> If the clauses are legal on all window functions, what to do about
 >> existing window functions for which the clauses do not make sense?

 Tom> Option 1: do nothing, document that nothing happens if w.f.
 Tom> doesn't implement it.

That was 1.1.1 on my list.

 Tom> Option 2: record whether the inquiry functions got called. At end
 Tom> of query, error out if they weren't and the options were used.

Erroring at the _end_ of the query seems a bit of a potential surprise.

 Tom> It's also worth wondering if we couldn't just implement the flags
 Tom> in some generic fashion and not need to involve the window
 Tom> functions at all.

That was what I meant by option 1.1.2 on my list.

 Tom> FROM LAST, for example, could and perhaps should be implemented by
 Tom> inverting the sort order.

Actually that can't work for reasons brought up in the recent discussion
of optimization of window function sorts: if you change the sort order
you potentially disturb the ordering of peer rows, and the spec requires
that an (nth_value(x,n) from last over w) and (otherfunc(x) over w) for
order-equivalent windows "w" must see the peer rows in the same order.

So FROM LAST really does have to keep the original sort order, and count
backwards from the end of the window.
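A minimal model of that point (illustrative Python only; this has nothing to do with the executor's actual data structures):

```python
# Illustrative model of FROM FIRST / FROM LAST over a window frame.
# The frame keeps its original (ORDER BY) row order; FROM LAST does not
# invert the sort, it just counts backwards from the end of the frame,
# so peer rows are seen in the same order by every window function.
def nth_value_from_first(frame, n):
    return frame[n - 1] if 1 <= n <= len(frame) else None

def nth_value_from_last(frame, n):
    return frame[-n] if 1 <= n <= len(frame) else None

frame = ["r1", "r2", "r3", "r4"]   # rows in window order
```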

 Tom> Possibly IGNORE NULLS could be implemented inside the
 Tom> WinGetFuncArgXXX functions? These behaviors might or might not
 Tom> make much sense with other window functions, but that doesn't seem
 Tom> like it's our problem.

That's about what I was thinking for option 1.1.2, yes.

-- 
Andrew (irc:RhodiumToad)



Re: Problem while setting the fpw with SIGHUP

2018-09-26 Thread Amit Kapila
On Wed, Sep 19, 2018 at 10:50 AM Michael Paquier  wrote:
> > Agreed. "If we need to do that in the start process," we need to
> > change the shared flag and issue FPW_CHANGE whenever the
> > database state differs from the configuration file, regardless
> > of what StartupXLOG() did up to that point.
>
> Definitely my mistake here.  Attached is a patch to show what I have in
> mind by making sure that the startup process generates a record even
> after switching full_page_writes when started normally.  This removes
> the condition based on InRecovery, and uses a new argument for
> UpdateFullPageWrites() instead.  Your test case, as well as what I do
> manually for testing, pass without triggering the assertion.
>

This will fix the previous problem reported by me but will lead to
some other behavior change.  If somebody toggles the full_page_writes
flag before crash recovery, then it will log the XLOG_FPW_CHANGE
record, but that was not the case without your patch.

> When I see your patch, I actually see the same kind of logic as what I
> propose which is summarized in two points, and that's a good thing:
> a) Avoid the allocation in the critical section.
> b) Avoid two processes to call UpdateFullPageWrites at the same time.

I think, in this case, it might be advisable to just fix the problem
(a) which is what has been reported originally in the thread and
AFAICS, the fix for that is clear as compared to the problem (b).  If
you agree, then we can discuss what is the best fix for the first
problem (a).

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Let's stop with the retail rebuilds of src/port/ files already

2018-09-26 Thread Andres Freund



Hi,

On September 26, 2018 9:03:05 PM PDT, Tom Lane  wrote:
>Andres Freund  writes:
>> On 2018-09-26 19:10:40 -0400, Tom Lane wrote:
>>> I'm getting tired of having to make fixes like ce4887bd0.  I think
>>> we should rearrange things so that src/port/ and src/common/ compile
>>> all their files a third time using shared-library-friendly switches,
>>> put them into new .a files, and have libpq and the ecpg libraries
>>> just include those libraries instead of what they're doing now.
>
>> +1
>
>Here's a partial patch for that: it adds the third build variant
>to src/port/ and teaches libpq to use it.  We'd want to likewise
>modify src/common/ and fix up other callers such as ecpg, but this
>seems to be enough to test whether the idea works or not.
>
>I've tried this on Linux, macOS and HPUX and it seems to work in
>those cases, but I'm not foolish enough to imagine that that's
>exhaustive.
>
>What I think would make sense is to push this and see what the
>buildfarm thinks of it.  If there are unfixable problems then
>we won't have wasted time fleshing out the concept.  Otherwise,
>I'll do the remaining pieces.

Sounds reasonable to me.

Medium-long term I think we should consider trying to reduce the duplication 
tho.  Once we provide an elog and error handling wrapper, we really should be 
able to reduce duplication (code and build) a fair bit.  But that should be 
tackled separately.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Andres Freund



On September 26, 2018 9:04:08 PM PDT, Thomas Munro 
 wrote:
>On Thu, Sep 27, 2018 at 3:55 PM Andrew Gierth wrote:
>> > "Andres" == Andres Freund  writes:
>>  Andres> Hm, stb's results just for floating point isn't bad. The above
>>  Andres> numbers were for %f %f. But as the minimal usage would be about
>>  Andres> the internal usage of dopr(), here's comparing %.*f:
>>
>>  Andres> snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
>>  Andres> pg time = 1434.57 ms total, 0.000286915 ms per iteration
>>  Andres> stbsp time = 552.14 ms total, 0.000110428 ms per iteration
>>
>> Hmm. We had a case recently on IRC where the performance of float8out
>> turned out to be the major bottleneck: a table of about 2.7 million rows
>> and ~70 float columns showed an overhead of ~66 seconds for doing COPY
>> as opposed to COPY BINARY (the actual problem report was that doing
>> "select * from table" from R was taking a minute+ longer than expected,
>> we got comparative timings for COPY just to narrow down causes).
>>
>> That translates to approx. 0.00035 ms overhead (i.e. time(float8out) -
>> time(float8send)) per conversion (Linux server, hardware unknown).
>>
>> That 66 seconds was the difference between 18s and 1m24s, so it wasn't a
>> small factor but totally dominated the query time.
>
>For perfect and cheap round trip to ASCII, not for human consumption,
>I wonder about the hexadecimal binary float literal format from C99
>(and showing up in other places too).

I'm not quite sure how we realistically would migrate to that though. Clients 
and their users won't understand it, and the more knowledgeable ones will 
already use the binary protocol.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Andres Freund



On September 26, 2018 8:53:27 PM PDT, Andrew Gierth 
 wrote:
>> "Andres" == Andres Freund  writes:
>
> Andres> Hm, stb's results just for floating point isn't bad. The above
> Andres> numbers were for %f %f. But as the minimal usage would be about
> Andres> the internal usage of dopr(), here's comparing %.*f:
>
> Andres> snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
> Andres> pg time = 1434.57 ms total, 0.000286915 ms per iteration
> Andres> stbsp time = 552.14 ms total, 0.000110428 ms per iteration
>
>Hmm. We had a case recently on IRC where the performance of float8out
>turned out to be the major bottleneck: a table of about 2.7 million rows
>and ~70 float columns showed an overhead of ~66 seconds for doing COPY
>as opposed to COPY BINARY (the actual problem report was that doing
>"select * from table" from R was taking a minute+ longer than expected,
>we got comparative timings for COPY just to narrow down causes).
>
>That translates to approx. 0.00035 ms overhead (i.e. time(float8out) -
>time(float8send)) per conversion (Linux server, hardware unknown).

Sounds like it could be pretty precisely be the cost measured above. My 
laptop's a bit faster than most server CPUs and the test has perfect branch 
prediction...


>That 66 seconds was the difference between 18s and 1m24s, so it wasn't a
>small factor but totally dominated the query time.


Ugh.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Let's stop with the retail rebuilds of src/port/ files already

2018-09-26 Thread Tom Lane
Andres Freund  writes:
> On 2018-09-26 19:10:40 -0400, Tom Lane wrote:
>> I'm getting tired of having to make fixes like ce4887bd0.  I think
>> we should rearrange things so that src/port/ and src/common/ compile
>> all their files a third time using shared-library-friendly switches,
>> put them into new .a files, and have libpq and the ecpg libraries
>> just include those libraries instead of what they're doing now.

> +1

Here's a partial patch for that: it adds the third build variant
to src/port/ and teaches libpq to use it.  We'd want to likewise
modify src/common/ and fix up other callers such as ecpg, but this
seems to be enough to test whether the idea works or not.

I've tried this on Linux, macOS and HPUX and it seems to work in
those cases, but I'm not foolish enough to imagine that that's
exhaustive.

What I think would make sense is to push this and see what the
buildfarm thinks of it.  If there are unfixable problems then
we won't have wasted time fleshing out the concept.  Otherwise,
I'll do the remaining pieces.

regards, tom lane

diff --git a/src/interfaces/libpq/.gitignore b/src/interfaces/libpq/.gitignore
index ce1576e..8885e91 100644
*** a/src/interfaces/libpq/.gitignore
--- b/src/interfaces/libpq/.gitignore
***
*** 1,27 
  /exports.list
  /libpq.rc
  # .c files that are symlinked in from elsewhere
- /chklocale.c
- /crypt.c
- /erand48.c
- /getaddrinfo.c
- /getpeereid.c
- /inet_aton.c
- /inet_net_ntop.c
- /noblock.c
- /open.c
- /system.c
- /pgsleep.c
- /pg_strong_random.c
- /pgstrcasecmp.c
- /pqsignal.c
- /snprintf.c
- /strerror.c
- /strlcpy.c
- /strnlen.c
- /thread.c
- /win32error.c
- /win32setlocale.c
  /ip.c
  /md5.c
  /base64.c
--- 1,6 
diff --git a/src/interfaces/libpq/Makefile b/src/interfaces/libpq/Makefile
index ef8abaf..769c58b 100644
*** a/src/interfaces/libpq/Makefile
--- b/src/interfaces/libpq/Makefile
*** ifneq ($(PORTNAME), win32)
*** 24,50 
  override CFLAGS += $(PTHREAD_CFLAGS)
  endif
  
- # Need to recompile any external C files because we need
- # all object files to use the same compile flags as libpq; some
- # platforms require special flags.
- LIBS := $(LIBS:-lpgport=)
- 
  # We can't use Makefile variables here because the MSVC build system scrapes
  # OBJS from this file.
  OBJS=	fe-auth.o fe-auth-scram.o fe-connect.o fe-exec.o fe-misc.o fe-print.o fe-lobj.o \
  	fe-protocol2.o fe-protocol3.o pqexpbuffer.o fe-secure.o \
  	libpq-events.o
- # libpgport C files we always use
- OBJS += chklocale.o inet_net_ntop.o noblock.o pgstrcasecmp.o pqsignal.o \
- 	snprintf.o strerror.o thread.o
- # libpgport C files that are needed if identified by configure
- OBJS += $(filter crypt.o getaddrinfo.o getpeereid.o inet_aton.o open.o system.o strlcpy.o strnlen.o win32error.o win32setlocale.o, $(LIBOBJS))
- 
- ifeq ($(enable_strong_random), yes)
- OBJS += pg_strong_random.o
- else
- OBJS += erand48.o
- endif
  
  # src/backend/utils/mb
  OBJS += encnames.o wchar.o
--- 24,34 
*** override shlib = cyg$(NAME)$(DLSUFFIX)
*** 62,69 
  endif
  
  ifeq ($(PORTNAME), win32)
! # pgsleep.o is from libpgport
! OBJS += pgsleep.o win32.o libpqrc.o
  
  libpqrc.o: libpq.rc
  	$(WINDRES) -i $< -o $@
--- 46,52 
  endif
  
  ifeq ($(PORTNAME), win32)
! OBJS += win32.o libpqrc.o
  
  libpqrc.o: libpq.rc
  	$(WINDRES) -i $< -o $@
*** endif
*** 76,86 
  
  # Add libraries that libpq depends (or might depend) on into the
  # shared library link.  (The order in which you list them here doesn't
! # matter.)
  ifneq ($(PORTNAME), win32)
! SHLIB_LINK += $(filter -lcrypt -ldes -lcom_err -lcrypto -lk5crypto -lkrb5 -lgssapi_krb5 -lgss -lgssapi -lssl -lsocket -lnsl -lresolv -lintl -lm, $(LIBS)) $(LDAP_LIBS_FE) $(PTHREAD_LIBS)
  else
! SHLIB_LINK += $(filter -lcrypt -ldes -lcom_err -lcrypto -lk5crypto -lkrb5 -lgssapi32 -lssl -lsocket -lnsl -lresolv -lintl -lm $(PTHREAD_LIBS), $(LIBS)) $(LDAP_LIBS_FE)
  endif
  ifeq ($(PORTNAME), win32)
  SHLIB_LINK += -lshell32 -lws2_32 -lsecur32 $(filter -leay32 -lssleay32 -lcomerr32 -lkrb5_32, $(LIBS))
--- 59,70 
  
  # Add libraries that libpq depends (or might depend) on into the
  # shared library link.  (The order in which you list them here doesn't
! # matter.)  Note that we filter out -lpgport from LIBS and instead
! # insert -lpgport_shlib, to get port files that are built correctly.
  ifneq ($(PORTNAME), win32)
! SHLIB_LINK += -lpgport_shlib $(filter -lcrypt -ldes -lcom_err -lcrypto -lk5crypto -lkrb5 -lgssapi_krb5 -lgss -lgssapi -lssl -lsocket -lnsl -lresolv -lintl -lm, $(LIBS)) $(LDAP_LIBS_FE) $(PTHREAD_LIBS)
  else
! SHLIB_LINK += -lpgport_shlib $(filter -lcrypt -ldes -lcom_err -lcrypto -lk5crypto -lkrb5 -lgssapi32 -lssl -lsocket -lnsl -lresolv -lintl -lm $(PTHREAD_LIBS), $(LIBS)) $(LDAP_LIBS_FE)
  endif
  ifeq ($(PORTNAME), win32)
  SHLIB_LINK += -lshell32 -lws2_32 -lsecur32 $(filter -leay32 -lssleay32 -lcomerr32 -lkrb5_32, $(LIBS))
*** SHLIB_EXPORTS 

Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Thomas Munro
On Thu, Sep 27, 2018 at 3:55 PM Andrew Gierth
 wrote:
> > "Andres" == Andres Freund  writes:
>  Andres> Hm, stb's results just for floating point isn't bad. The above
>  Andres> numbers were for %f %f. But as the minimal usage would be about
>  Andres> the internal usage of dopr(), here's comparing %.*f:
>
>  Andres> snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
>  Andres> pg time = 1434.57 ms total, 0.000286915 ms per iteration
>  Andres> stbsp time = 552.14 ms total, 0.000110428 ms per iteration
>
> Hmm. We had a case recently on IRC where the performance of float8out
> turned out to be the major bottleneck: a table of about 2.7 million rows
> and ~70 float columns showed an overhead of ~66 seconds for doing COPY
> as opposed to COPY BINARY (the actual problem report was that doing
> "select * from table" from R was taking a minute+ longer than expected,
> we got comparative timings for COPY just to narrow down causes).
>
> That translates to approx. 0.00035 ms overhead (i.e. time(float8out) -
> time(float8send)) per conversion (Linux server, hardware unknown).
>
> That 66 seconds was the difference between 18s and 1m24s, so it wasn't a
> small factor but totally dominated the query time.

For perfect and cheap round trip to ASCII, not for human consumption,
I wonder about the hexadecimal binary float literal format from C99
(and showing up in other places too).
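To make that concrete (a Python illustration rather than C99 itself): the "%a" format prints the mantissa in hexadecimal, so no binary-to-decimal conversion happens and the round trip is exact by construction. Python exposes the same representation via `float.hex()`/`float.fromhex()`:

```python
# float.hex()/float.fromhex() are Python's analogue of C99's "%a"/strtod:
# an exact, cheap round trip with no decimal rounding involved.
x = 0.1
s = x.hex()                      # '0x1.999999999999ap-4'
assert float.fromhex(s) == x     # exact round trip, no precision knob needed
```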

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Andrew Gierth
> "Andres" == Andres Freund  writes:

 Andres> Hm, stb's results just for floating point isn't bad. The above
 Andres> numbers were for %f %f. But as the minimal usage would be about
 Andres> the internal usage of dopr(), here's comparing %.*f:

 Andres> snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
 Andres> pg time = 1434.57 ms total, 0.000286915 ms per iteration
 Andres> stbsp time = 552.14 ms total, 0.000110428 ms per iteration

Hmm. We had a case recently on IRC where the performance of float8out
turned out to be the major bottleneck: a table of about 2.7 million rows
and ~70 float columns showed an overhead of ~66 seconds for doing COPY
as opposed to COPY BINARY (the actual problem report was that doing
"select * from table" from R was taking a minute+ longer than expected,
we got comparative timings for COPY just to narrow down causes).

That translates to approx. 0.00035 ms overhead (i.e. time(float8out) -
time(float8send)) per conversion (Linux server, hardware unknown).

That 66 seconds was the difference between 18s and 1m24s, so it wasn't a
small factor but totally dominated the query time.
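The per-conversion figure is consistent with the totals; a quick sanity check of the round numbers quoted above:

```python
# Sanity check of the figures above: ~66 s of extra time spread over
# 2.7 million rows x ~70 float columns comes out at ~0.00035 ms per value.
rows, cols = 2_700_000, 70
per_value_ms = 66 * 1000 / (rows * cols)
assert abs(per_value_ms - 0.00035) < 1e-5
```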

-- 
Andrew (irc:RhodiumToad)



Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Tom Lane
Andres Freund  writes:
> On 2018-09-26 21:44:41 -0400, Tom Lane wrote:
>> BTW, were you thinking of plugging in strfromd() inside snprintf.c,
>> or just invoking it directly from float[48]out?  The latter would
>> presumably be cheaper, and it'd solve the most pressing performance
>> problem, if not every problem.

> I wasn't actually seriously suggesting we should use strfromd, but I
> guess one way to deal with this would be to add a wrapper routine that
> could directly be called from float[48]out *and* from fmtfloat().

Yeah, something along that line occurred to me a bit later.

> Wonder
> if it'd be worthwhile to *not* pass that wrapper a format string, but
> instead pass the precision as an explicit argument.

Right, getting rid of the round trip to text for the precision seems
like a win.  I'm surprised that strfromd is defined the way it is and
not with something like (double val, char fmtcode, int precision, ...)

regards, tom lane



Performance of the partitioning in the large scale

2018-09-26 Thread Kato, Sho
Hi,

Thanks to your efforts, the performance of partitioned tables has improved 
greatly.
I evaluated the performance with the performance-improvement patches proposed 
for PG12 applied in combination, so I'd like to share the results.

The purpose of this evaluation is to catalog the performance issues of tables 
partitioned into a large number of partitions (over a thousand), with the goal 
of bringing partitioned-table performance close to non-partitioned-table 
performance.

I used pgbench, on the premise that the tables contain no data and 
plan_cache_mode is auto.

- source:

master(38763d6778 Date:   Thu Sep 20 15:52:39 2018 +1200) + v9 patch[1] + v5 
patch[2] + v3 patch[3]

[1] Reduce partition tuple routing overheads
https://commitfest.postgresql.org/19/1690/

[2] Revise executor's handling of range table relations
https://commitfest.postgresql.org/19/1758/

[3] Speed up planning with partitions
https://commitfest.postgresql.org/19/1778/


- table definition:

When 6400 rows are inserted into the parent table, each leaf partition holds 
the same number of rows.
For example, the following DDL is for the case where the number of leaf 
partitions is one hundred.

create table test.accounts_history(id serial, aid int, delta int, mtime 
timestamp without time zone) partition by range(aid);

create table test.account_part_1 partition of test.accounts for values from (1) 
to (65);
create table test.account_part_2 partition of test.accounts for values from 
(65) to (129);
...
create table test.account_part_100 partition of test.accounts for values from 
(6337) to (6400);

create table test.ah_part_1 partition of test.accounts_history for values from 
(1) to (65);
create table test.ah_part_2 partition of test.accounts_history for values from 
(65) to (129);
...
create table test.ah_part_100 partition of test.accounts_history for values 
from (6337) to (6400);


- benchmark script:

I wrote SQL statements in which only one leaf partition is targeted, for each 
of SELECT/UPDATE/DELETE/INSERT.

\set aid random(1, 6400)
\set delta random(-5000, 5000)

SELECT abalance FROM test.accounts WHERE aid = :aid;
  or
INSERT INTO test.accounts_history (aid, delta, mtime) VALUES (:aid, :delta, 
CURRENT_TIMESTAMP);
  or
UPDATE test.accounts SET abalance = abalance + :delta WHERE aid = :aid;
  or
DELETE FROM test.accounts where aid = :aid;


- results:

1. simple mode results:

part_num is the number of partitions, and 0 means the non-partitioned case.
tps_ex is the tps excluding connection establishing.

Also, after pgbench, I evaluated planning time and execution time using 
EXPLAIN ANALYZE. plan_time_avg and execute_time_avg are the averages (in ms) 
over ten EXPLAIN ANALYZE executions.

pgbench -n -T 60 -r -f select.sql

part_num |   tps_ex| plan_time_avg | execute_time_avg
--+-+---+--
0 |  8285.83582 |0.0528 |   0.0222
  100 |   5948.1711 |0.1342 |   0.0306
  200 | 5436.438478 |  0.15 |   0.0298
  400 | 4523.867744 | 0.148 | 0.03
  800 | 3460.625739 |0.1447 |   0.0305
 1600 | 2428.795542 |0.1528 |   0.0303
 3200 | 1539.672214 |0.1552 |   0.0316
 6400 |  880.965232 |0.1704 |   0.0288
(8 rows)

pgbench -n -T 60 -r -f update.sql

part_num |   tps_ex| plan_time_avg | execute_time_avg
--+-+---+--
0 |  7360.58261 |0.0596 |   0.0417
  100 | 4633.880563 |0.1564 |0.105
  200 | 3972.737702 | 0.152 |   0.1007
  400 |  3000.23471 |0.1594 |   0.1039
  800 | 2139.676379 |0.1664 |   0.1055
 1600 | 1348.553673 | 0.165 |   0.1056
 3200 |   787.48559 |0.1774 |   0.1124
 6400 |  411.575671 |0.1823 |   0.1089
(8 rows)

pgbench -n -T 60 -r -f delete.sql

part_num |   tps_ex| plan_time_avg | execute_time_avg
--+-+---+--
0 |  8133.84019 | 0.057 |   0.0403
  100 | 5150.452458 |0.1398 |   0.0936
  200 |  4352.69018 |0.1414 |   0.0964
  400 |  3298.86364 |0.1459 |0.099
  800 | 2245.946178 |0.1559 |   0.1029
 1600 |  1386.92366 |0.1591 |   0.1048
 3200 |  802.024765 |0.1617 |   0.1042
 6400 |  407.214158 | 0.168 |   0.1087
(8 rows)

pgbench -n -T 60 -r -f insert.sql

part_num |   tps_ex| plan_time_avg | execute_time_avg
--+-+---+--
0 | 5246.142416 |0.0307 |   0.0601
  100 | 2190.331571 |0.0311 |   0.3587
  200 | 1452.601752 |0.0301 |   0.5065
  400 |  863.771879 | 0.031 |   0.7864
  800 |  482.528223 |0.0308 |

Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Andres Freund
On 2018-09-26 21:44:41 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Reading around the interwebz lead me to look at ryu
> 
> > https://dl.acm.org/citation.cfm?id=3192369
> > https://github.com/ulfjack/ryu/tree/46f4c5572121a6f1428749fe3e24132c3626c946
> 
> > That's an algorithm that always generates the minimally sized
> > roundtrip-safe string output for a floating point number. That makes it
> > unsuitable for the innards of printf, but it very well could be
> > interesting for e.g. float8out, especially when we currently specify a
> > "too high" precision to guarantee round-trip safety.
> 
> Yeah, the whole business of round-trip safety is a bit worrisome.

Seems like using a better algorithm also has the potential to make the
output a bit smaller / more readable than what we currently produce.


> If we change printf, and it produces different low-order digits
> than before, will floats still round-trip correctly?  I think we
> have to ensure that they do.

Yea, I think that's an absolutely hard requirement.  It'd possibly be a
good idea to add an assert that enforces that, although I'm not sure
it's worth the portability issues around crappy system libcs that do
randomly different things.


> BTW, were you thinking of plugging in strfromd() inside snprintf.c,
> or just invoking it directly from float[48]out?  The latter would
> presumably be cheaper, and it'd solve the most pressing performance
> problem, if not every problem.

I wasn't actually seriously suggesting we should use strfromd, but I
guess one way to deal with this would be to add a wrapper routine that
could directly be called from float[48]out *and* from fmtfloat(). Wonder
if it'd be worthwhile to *not* pass that wrapper a format string, but
instead pass the precision as an explicit argument.  Would make the use
in snprintf.c a bit more annoying (due to fFeEgG support), but probably
considerably simpler and faster if we ever reimplement that ourselves.

Greetings,

Andres Freund



Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Thomas Munro
On Thu, Sep 27, 2018 at 1:18 PM Andres Freund  wrote:
> On 2018-09-26 17:57:05 -0700, Andres Freund wrote:
> > snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
> > pg time = 1434.57 ms total, 0.000286915 ms per iteration
> > stbsp time = 552.14 ms total, 0.000110428 ms per iteration
>
> Reading around the interwebz lead me to look at ryu
>
> https://dl.acm.org/citation.cfm?id=3192369
> https://github.com/ulfjack/ryu/tree/46f4c5572121a6f1428749fe3e24132c3626c946
>
> That's an algorithm that always generates the minimally sized
> roundtrip-safe string output for a floating point number. That makes it
> unsuitable for the innards of printf, but it very well could be
> interesting for e.g. float8out, especially when we currently specify a
> "too high" precision to guarantee round-trip safety.

Wow.  While all the algorithms have that round trip goal, they keep
doing it faster.  I was once interested in their speed for a work
problem, and looked into the 30 year old dragon4 and 8 year old grisu3
algorithms.  It's amazing to me that we have a new algorithm in 2018
for this ancient problem, and it claims to be 3 times faster than the
competition.  (Hah, I see that "ryū" is Japanese for dragon.  "Grisù"
is a dragon from an Italian TV series.)
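To make the "minimally sized round-trip-safe" contract concrete (a Python illustration; CPython's repr() already produces shortest round-trip strings, which is the same output contract ryu provides, only ryu computes it faster):

```python
# Shortest round-trip output vs. a fixed "print enough digits" strategy.
# Both strings parse back to exactly the same double; the shortest form
# is what ryu-style algorithms produce directly.
x = 0.1
fixed = "%.17g" % x        # '0.10000000000000001' -- round-trip safe but noisy
short = repr(x)            # '0.1'                 -- shortest round-trip form
assert float(fixed) == x and float(short) == x
```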

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Andres Freund
Hi,

On 2018-09-26 21:30:25 -0400, Tom Lane wrote:
> Here's a rebased version of <15785.1536776...@sss.pgh.pa.us>.
> 
> I think we should try to get this reviewed and committed before
> we worry more about the float business.  It would be silly to
> not be benchmarking any bigger changes against the low-hanging
> fruit here.

Yea, no arguments there.

I'll try to have a look tomorrow.

Greetings,

Andres Freund



Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Tom Lane
Andres Freund  writes:
> Reading around the interwebz lead me to look at ryu

> https://dl.acm.org/citation.cfm?id=3192369
> https://github.com/ulfjack/ryu/tree/46f4c5572121a6f1428749fe3e24132c3626c946

> That's an algorithm that always generates the minimally sized
> roundtrip-safe string output for a floating point number. That makes it
> unsuitable for the innards of printf, but it very well could be
> interesting for e.g. float8out, especially when we currently specify a
> "too high" precision to guarantee round-trip safety.

Yeah, the whole business of round-trip safety is a bit worrisome.
If we change printf, and it produces different low-order digits
than before, will floats still round-trip correctly?  I think we
have to ensure that they do.  If we just use strfromd(), then it's
libc's problem if the results change ... but if we stick in some
code we got from elsewhere, it's our problem.

BTW, were you thinking of plugging in strfromd() inside snprintf.c,
or just invoking it directly from float[48]out?  The latter would
presumably be cheaper, and it'd solve the most pressing performance
problem, if not every problem.

regards, tom lane



Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Tom Lane
Here's a rebased version of <15785.1536776...@sss.pgh.pa.us>.

I think we should try to get this reviewed and committed before
we worry more about the float business.  It would be silly to
not be benchmarking any bigger changes against the low-hanging
fruit here.

regards, tom lane

diff --git a/configure b/configure
index 6414ec1..0448c6b 100755
*** a/configure
--- b/configure
*** fi
*** 15100,15106 
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! for ac_func in cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l
  do :
as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
--- 15100,15106 
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! for ac_func in cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul symlink sync_file_range utime utimes wcstombs_l
  do :
as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index 158d5a1..23b5bb8 100644
*** a/configure.in
--- b/configure.in
*** PGAC_FUNC_WCSTOMBS_L
*** 1571,1577 
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! AC_CHECK_FUNCS([cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open symlink sync_file_range utime utimes wcstombs_l])
  
  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
--- 1571,1577 
  LIBS_including_readline="$LIBS"
  LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
  
! AC_CHECK_FUNCS([cbrt clock_gettime fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul symlink sync_file_range utime utimes wcstombs_l])
  
  AC_REPLACE_FUNCS(fseeko)
  case $host_os in
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 90dda8e..7894caa 100644
*** a/src/include/pg_config.h.in
--- b/src/include/pg_config.h.in
***
*** 523,528 
--- 523,531 
  /* Define to 1 if you have the  header file. */
  #undef HAVE_STDLIB_H
  
+ /* Define to 1 if you have the `strchrnul' function. */
+ #undef HAVE_STRCHRNUL
+ 
  /* Define to 1 if you have the `strerror_r' function. */
  #undef HAVE_STRERROR_R
  
diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32
index 93bb773..f7a051d 100644
*** a/src/include/pg_config.h.win32
--- b/src/include/pg_config.h.win32
***
*** 394,399 
--- 394,402 
  /* Define to 1 if you have the  header file. */
  #define HAVE_STDLIB_H 1
  
+ /* Define to 1 if you have the `strchrnul' function. */
+ /* #undef HAVE_STRCHRNUL */
+ 
  /* Define to 1 if you have the `strerror_r' function. */
  /* #undef HAVE_STRERROR_R */
  
diff --git a/src/port/snprintf.c b/src/port/snprintf.c
index 2c77eec..1469878 100644
*** a/src/port/snprintf.c
--- b/src/port/snprintf.c
*** flushbuffer(PrintfTarget *target)
*** 310,316 
  }
  
  
! static void fmtstr(char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
--- 310,318 
  }
  
  
! static bool find_arguments(const char *format, va_list args,
! 			   PrintfArgValue *argvalues);
! static void fmtstr(const char *value, int leftjust, int minlen, int maxwidth,
  	   int pointflag, PrintfTarget *target);
  static void fmtptr(void *value, PrintfTarget *target);
  static void fmtint(int64 value, char type, int forcesign,
*** static void fmtfloat(double value, char 
*** 322,332 
  		 PrintfTarget *target);
  static void dostr(const char *str, int slen, PrintfTarget *target);
  static void dopr_outch(int c, PrintfTarget *target);
  static int	adjust_sign(int is_negative, int forcesign, int *signvalue);
! static void adjust_padlen(int minlen, int vallen, int leftjust, int *padlen);
! static void leading_pad(int zpad, int *signvalue, int *padlen,
  			PrintfTarget *target);
! static void trailing_pad(int *padlen, PrintfTarget *target);
  
  
  /*
--- 324,335 
  		 PrintfTarget *target);
  static void dostr(const char *str, int slen, PrintfTarget *target);
  static void dopr_outch(int c, PrintfTarget *target);
+ static void dopr_outchmulti(int c, int slen, 

Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Andres Freund
Hi,

On 2018-09-26 17:57:05 -0700, Andres Freund wrote:
> snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
> pg time = 1434.57 ms total, 0.000286915 ms per iteration
> stbsp time = 552.14 ms total, 0.000110428 ms per iteration

Reading around the interwebz led me to look at ryu

https://dl.acm.org/citation.cfm?id=3192369
https://github.com/ulfjack/ryu/tree/46f4c5572121a6f1428749fe3e24132c3626c946

That's an algorithm that always generates the minimally sized
roundtrip-safe string output for a floating point number. That makes it
unsuitable for the innards of printf, but it very well could be
interesting for e.g. float8out, especially when we currently specify a
"too high" precision to guarantee round-trip safety.

Greetings,

Andres Freund



Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Andres Freund
On 2018-09-26 17:40:22 -0700, Andres Freund wrote:
> On 2018-09-26 20:25:44 -0400, Tom Lane wrote:
> > Andres Freund  writes:
> > > On 2018-09-26 19:45:07 -0400, Tom Lane wrote:
> > >> No, there are no additional layers that weren't there before ---
> > >> snprintf.c's snprintf() slots in directly where the platform's did 
> > >> before.
> > 
> > > Hm? What I mean is that we can't realistically be faster with the
> > > current architecture, because for floating point we end up doing glibc
> > > sprintf() in either case.
> > 
> > Oh, you mean specifically for the float conversion case.  I still say
> > that I will *not* accept judging this code solely on the float case.
> 
> Oh, it should definitely not be judged solely based on floating point,
> we agree.
> 
> 
> > The string and integer cases are at least as important if not more so.
> 
> I think the integer stuff has become a *little* bit less important,
> because we converted the hot cases over to pg_lto etc.
> 
> 
> > >> Interesting.  strfromd() is a glibc-ism, and a fairly recent one at
> > >> that (my RHEL6 box doesn't seem to have it).
> > 
> > > It's C99 afaict.
> > 
> > It's not in POSIX 2008, and I don't see it in my admittedly-draft
> > copy of C99 either.  But that's not real relevant -- I don't see
> > much reason not to use it if we want a quick and dirty answer for
> > the platforms that have it.
> 
> Right, I really just wanted some more baseline numbers.
> 
> 
> > If we had more ambition, we might consider stealing the float
> > conversion logic out of the "stb" library that Alexander pointed
> > to upthread.  It says it's public domain, so there's no license
> > impediment to borrowing some code ...
> 
> Yea, I started to play around with doing so with musl, but based on
> my early benchmarks it's not fast enough to bother.  I've not integrated
> it into our code, but instead printed two floating point numbers with
> your test:
> 
> musl 500 iterations:
> snprintf time = 3144.46 ms total, 0.000628892 ms per iteration
> pg_snprintf time = 4215.1 ms total, 0.00084302 ms per iteration
> ratio = 1.340
> 
> glibc 500 iterations:
> snprintf time = 1680.82 ms total, 0.000336165 ms per iteration
> pg_snprintf time = 2629.46 ms total, 0.000525892 ms per iteration
> ratio = 1.564
> 
> So there's pretty clearly no point in even considering starting from
> musl.

Hm, stb's results just for floating point isn't bad. The above numbers
were for %f %f. But as the minimal usage would be about the internal
usage of dopr(), here's comparing %.*f:

snprintf time = 1324.87 ms total, 0.000264975 ms per iteration
pg time = 1434.57 ms total, 0.000286915 ms per iteration
stbsp time = 552.14 ms total, 0.000110428 ms per iteration

Greetings,

Andres Freund



Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Andres Freund
On 2018-09-26 20:25:44 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-09-26 19:45:07 -0400, Tom Lane wrote:
> >> No, there are no additional layers that weren't there before ---
> >> snprintf.c's snprintf() slots in directly where the platform's did before.
> 
> > Hm? What I mean is that we can't realistically be faster with the
> > current architecture, because for floating point we end up doing glibc
> > sprintf() in either case.
> 
> Oh, you mean specifically for the float conversion case.  I still say
> that I will *not* accept judging this code solely on the float case.

Oh, it should definitely not be judged solely based on floating point,
we agree.


> The string and integer cases are at least as important if not more so.

I think the integer stuff has become a *little* bit less important,
because we converted the hot cases over to pg_lto etc.


> >> Interesting.  strfromd() is a glibc-ism, and a fairly recent one at
> >> that (my RHEL6 box doesn't seem to have it).
> 
> > It's C99 afaict.
> 
> It's not in POSIX 2008, and I don't see it in my admittedly-draft
> copy of C99 either.  But that's not real relevant -- I don't see
> much reason not to use it if we want a quick and dirty answer for
> the platforms that have it.

Right, I really just wanted some more baseline numbers.


> If we had more ambition, we might consider stealing the float
> conversion logic out of the "stb" library that Alexander pointed
> to upthread.  It says it's public domain, so there's no license
> impediment to borrowing some code ...

Yea, I started to play around with doing so with musl, but based on
my early benchmarks it's not fast enough to bother.  I've not integrated
it into our code, but instead printed two floating point numbers with
your test:

musl 500 iterations:
snprintf time = 3144.46 ms total, 0.000628892 ms per iteration
pg_snprintf time = 4215.1 ms total, 0.00084302 ms per iteration
ratio = 1.340

glibc 500 iterations:
snprintf time = 1680.82 ms total, 0.000336165 ms per iteration
pg_snprintf time = 2629.46 ms total, 0.000525892 ms per iteration
ratio = 1.564

So there's pretty clearly no point in even considering starting from
musl.

Greetings,

Andres Freund



Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Tom Lane
Andres Freund  writes:
> On 2018-09-26 19:45:07 -0400, Tom Lane wrote:
>> No, there are no additional layers that weren't there before ---
>> snprintf.c's snprintf() slots in directly where the platform's did before.

> Hm? What I mean is that we can't realistically be faster with the
> current architecture, because for floating point we end up doing glibc
> sprintf() in either case.

Oh, you mean specifically for the float conversion case.  I still say
that I will *not* accept judging this code solely on the float case.
The string and integer cases are at least as important if not more so.

>> Interesting.  strfromd() is a glibc-ism, and a fairly recent one at
>> that (my RHEL6 box doesn't seem to have it).

> It's C99 afaict.

It's not in POSIX 2008, and I don't see it in my admittedly-draft
copy of C99 either.  But that's not real relevant -- I don't see
much reason not to use it if we want a quick and dirty answer for
the platforms that have it.

If we had more ambition, we might consider stealing the float
conversion logic out of the "stb" library that Alexander pointed
to upthread.  It says it's public domain, so there's no license
impediment to borrowing some code ...

regards, tom lane



Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Andres Freund
On 2018-09-26 19:45:07 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-09-26 15:04:20 -0700, Andres Freund wrote:
> >> I assume this partially is just the additional layers of function calls
> >> (psprintf, pvsnprintf, pg_vsnprintf, dopr) that are now done, in
> >> addition to pretty much the same work as before (i.e. sprintf("%.*f")).
> 
> No, there are no additional layers that weren't there before ---
> snprintf.c's snprintf() slots in directly where the platform's did before.

Hm? What I mean is that we can't realistically be faster with the
current architecture, because for floating point we end up doing glibc
sprintf() in either case.  And after the unconditional replacement,
we're doing a bunch of *additional* work (at the very least we're
parsing the format string twice).

> Well, ok, dopr() wasn't there before, but I trust you're not claiming
> that glibc's implementation of snprintf() is totally flat either.

I don't even think it's all that fast...


> > I'm *NOT* proposing that as the actual solution, but as a datapoint, it
> > might be interesting that hardcoding the precision and thus allowing use
> of strfromd() instead of sprintf yields a *better* runtime than
> > master.
> 
> Interesting.  strfromd() is a glibc-ism, and a fairly recent one at
> that (my RHEL6 box doesn't seem to have it).  But we could use it where
> available.  And it doesn't seem unreasonable to have a fast path for
> the specific precision value(s) that float4/8out will actually use.

It's C99 afaict.  What I did for my quick hack is to just hack the
precision as characters into the format that dopr() uses...

Greetings,

Andres Freund



Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Tom Lane
Andres Freund  writes:
> I kinda wonder if we shouldn't replace the non pg_* functions in
> snprintf.c with a more modern copy of a compatibly licensed libc. Looks
> to me like our implementation has forked off some BSD a fair while ago.

Maybe, but the benchmarking I was doing last month didn't convince me
that the *BSD versions were remarkably fast.  There are a lot of cases
where our version is faster.

regards, tom lane



Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Tom Lane
Andres Freund  writes:
> On 2018-09-26 15:04:20 -0700, Andres Freund wrote:
>> I assume this partially is just the additional layers of function calls
>> (psprintf, pvsnprintf, pg_vsnprintf, dopr) that are now done, in
>> addition to pretty much the same work as before (i.e. sprintf("%.*f")).

No, there are no additional layers that weren't there before ---
snprintf.c's snprintf() slots in directly where the platform's did before.

Well, ok, dopr() wasn't there before, but I trust you're not claiming
that glibc's implementation of snprintf() is totally flat either.

I think it's just that snprintf.c is a bit slower in this case.  If you
look at glibc's implementation, they've expended a heck of a lot of code
and sweat on it.  The only reason we could hope to beat it is that we're
prepared to throw out some functionality, like LC_NUMERIC handling.

> I'm *NOT* proposing that as the actual solution, but as a datapoint, it
> might be interesting that hardcoding the precision and thus allowing use
> of strfromd() instead of sprintf yields a *better* runtime than
> master.

Interesting.  strfromd() is a glibc-ism, and a fairly recent one at
that (my RHEL6 box doesn't seem to have it).  But we could use it where
available.  And it doesn't seem unreasonable to have a fast path for
the specific precision value(s) that float4/8out will actually use.

regards, tom lane



Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Andres Freund
On 2018-08-17 14:32:59 -0400, Tom Lane wrote:
> I've been looking into the possible performance consequences of that,
> in particular comparing snprintf.c to the library versions on macOS,
> FreeBSD, OpenBSD, and NetBSD.  While it held up well in simpler cases,
> I noted that it was significantly slower on long format strings, which
> I traced to two separate problems:

> Perhaps there's a way to improve that
> without writing our own floating-point conversion code, but I'm not
> seeing an easy way offhand.  I don't think that's a showstopper though.
> This code is now faster than the native code for very many other cases,
> so on average it should cause no real performance problem.

I kinda wonder if we shouldn't replace the non pg_* functions in
snprintf.c with a more modern copy of a compatibly licensed libc. Looks
to me like our implementation has forked off some BSD a fair while ago.

There seems to be a few choices. Among others:
- freebsd libc:
  https://github.com/freebsd/freebsd/blob/master/lib/libc/stdio/vfprintf.c
  (floating point stuff is elsewhere)
- musl libc:
  https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf.c
- stb (as Alexander referenced earlier)
  https://github.com/nothings/stb/blob/master/stb_sprintf.h

I've not benchmarked any of these. Just by looking at the code, the musl
one looks by far the most compact - looks like all the relevant code is
in the one file referenced.

Greetings,

Andres Freund



Let's stop with the retail rebuilds of src/port/ files already

2018-09-26 Thread Tom Lane
I'm getting tired of having to make fixes like ce4887bd0.  I think
we should rearrange things so that src/port/ and src/common/ compile
all their files a third time using shared-library-friendly switches,
put them into new .a files, and have libpq and the ecpg libraries
just include those libraries instead of what they're doing now.

This would result in compiling some of the port+common files uselessly,
since they'd never actually get pulled in by any shared library.
But I think we're approaching the point where we might have a net savings
of build time anyway, due to not having to compile the same files multiple
times in different subdirectories.  And it'd sure be a savings of
developer brain-cells and sanity.

Maybe use the extension "_shlib" (vs "_srv") for these .o and .a files.

Thoughts?

regards, tom lane



Re: [PATCH] Improve geometric types

2018-09-26 Thread Tom Lane
Tomas Vondra  writes:
> On 09/26/2018 06:45 PM, Tom Lane wrote:
>> gaur's not happy, but rather surprisingly, it looks like we're
>> mostly OK elsewhere.  Do you need me to trace down exactly what's
>> going wrong on gaur?

> Or you could just try doing
> select '(0,0)'::point * '(-3,4)'::point;
> If this is what's going on, I'd say the best solution is to make it
> produce (0,0) everywhere, so that we don't expect -0.0 anywhere.

Actually, it seems simpler than that: gaur produces plus zero already
from the multiplication:

regression=# select '-3'::float8 * '0'::float8;
 ?column? 
----------
        0
(1 row)

whereas I get -0 elsewhere.  I'm surprised that this doesn't create
more widely-visible regression failures, but there you have it.

> We could do that either by adding the == 0.0 check to yet another place,
> or to point_construct() directly. Adding it to point_construct() means
> we'll pay the price always, but I guess there are few paths where we
> know we don't need it. And if we add it to many places it's likely about
> as expensive as adding it to point_construct.

If gaur is the only machine showing this failure, which seems more
likely by the hour, I'm not sure that we should give up performance
across-the-board to make it happy.  Perhaps a variant expected-file
is a better answer; or we could remove these specific test cases.

Anyway, I'd counsel doing nothing for a day or so, till the buildfarm
breakage from the strerror/snprintf changes clears up.  Then we'll
have a better idea of whether any other machines are affected.

regards, tom lane



Re: Proposal for Signal Detection Refactoring

2018-09-26 Thread Michael Paquier
On Wed, Sep 26, 2018 at 09:50:00AM +0200, Chris Travers wrote:
> On Wed, Sep 26, 2018 at 12:41 AM Michael Paquier 
> wrote:
>> Agreed.  Are there any objections with the proposal of changing the
>> interruption pending flags in miscadmin.h to use sig_atomic_t?
>> ClientConnectionLost would gain PGDLLIMPORT at the same time.
> 
> I am strongly in favor of doing this.

Okay, done.
--
Michael




Re: Allowing printf("%m") only where it actually works

2018-09-26 Thread Andres Freund
On 2018-09-26 18:31:07 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-09-26 17:41:36 -0400, Tom Lane wrote:
> >> Well, if you're unhappy about snprintf.c's performance, you could review
> >> https://commitfest.postgresql.org/19/1763/
> >> so I can push that.  In my tests, that got us down to circa 10% penalty
> >> for float conversions.
> 
> > Uh, I can do that, but the fact remains that your commit slowed down
> > COPY and other conversion intensive workloads by a *significant* amount.
> 
> [ shrug... ]  There are other cases that got faster (particularly after
> the above-mentioned patch).  I do not wish to consider floating-point
> conversion speed as the sole figure of merit for this change.  If we
> are to consider only the worst-case, we should be reverting JIT.

Oh, come on. One can be disabled with a GUC, has (although not good
enough) intelligence when it switches on, the other has ... none of
that.  Obviously performance is always a balancing act, but you'd be
pretty pissed at anybody else regressing performance in a non-fringe
case, and then refused responsibility.  And as I said, I'm willing to
help.

Greetings,

Andres Freund



Re: Allowing printf("%m") only where it actually works

2018-09-26 Thread Tom Lane
Andres Freund  writes:
> On 2018-09-26 17:41:36 -0400, Tom Lane wrote:
>> Well, if you're unhappy about snprintf.c's performance, you could review
>> https://commitfest.postgresql.org/19/1763/
>> so I can push that.  In my tests, that got us down to circa 10% penalty
>> for float conversions.

> Uh, I can do that, but the fact remains that your commit slowed down
> COPY and other conversion intensive workloads by a *significant* amount.

[ shrug... ]  There are other cases that got faster (particularly after
the above-mentioned patch).  I do not wish to consider floating-point
conversion speed as the sole figure of merit for this change.  If we
are to consider only the worst-case, we should be reverting JIT.

regards, tom lane



Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Andres Freund
On 2018-09-26 15:04:20 -0700, Andres Freund wrote:
> On 2018-09-12 14:14:15 -0400, Tom Lane wrote:
> > Alexander Kuzmenkov  writes:
> > > I benchmarked this, using your testbed and comparing to libc sprintf
> > > (Ubuntu GLIBC 2.27-0ubuntu3) and another implementation I know [1], all
> > > compiled with gcc 5.
> >
> > Thanks for reviewing!
> >
> > The cfbot noticed that the recent dlopen patch conflicted with this in
> > configure.in, so here's a rebased version.  The code itself didn't change.
> 
> Conflicts again, but not too hard to resolve.
> 
> The mini benchmark from 
> http://archives.postgresql.org/message-id/20180926174645.nsyj77lx2mvtz4kx%40alap3.anarazel.de
> is significantly improved by this patch.
> 
> > 96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63:
> >
> > COPY somefloats TO '/dev/null';
> > COPY 1000
> > Time: 24575.770 ms (00:24.576)
> >
> > 96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63^:
> >
> > COPY somefloats TO '/dev/null';
> > COPY 1000
> > Time: 12877.037 ms (00:12.877)
> 
> This patch:
> 
> postgres[32704][1]=# ;SELECT pg_prewarm('somefloats');COPY somefloats TO 
> '/dev/null';
> Time: 0.269 ms
> ┌────────────┐
> │ pg_prewarm │
> ├────────────┤
> │      73530 │
> └────────────┘
> (1 row)
> 
> Time: 34.983 ms
> COPY 1000
> Time: 15511.478 ms (00:15.511)
> 
> 
> The profile from 96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63^ is:
> +   38.15%  postgres  libc-2.27.so  [.] __GI___printf_fp_l
> +   13.98%  postgres  libc-2.27.so  [.] hack_digit
> +7.54%  postgres  libc-2.27.so  [.] __mpn_mul_1
> +7.32%  postgres  postgres  [.] CopyOneRowTo
> +6.12%  postgres  libc-2.27.so  [.] vfprintf
> +3.14%  postgres  libc-2.27.so  [.] __strlen_avx2
> +1.97%  postgres  postgres  [.] heap_deform_tuple
> +1.77%  postgres  postgres  [.] AllocSetAlloc
> +1.43%  postgres  postgres  [.] psprintf
> +1.25%  postgres  libc-2.27.so  [.] _IO_str_init_static_internal
> +1.09%  postgres  libc-2.27.so  [.] _IO_vsnprintf
> +1.09%  postgres  postgres  [.] appendBinaryStringInfo
> 
> The profile of master with this patch is:
> 
> +   32.38%  postgres  libc-2.27.so  [.] __GI___printf_fp_l
> +   11.08%  postgres  libc-2.27.so  [.] hack_digit
> +9.55%  postgres  postgres  [.] CopyOneRowTo
> +6.24%  postgres  libc-2.27.so  [.] __mpn_mul_1
> +5.01%  postgres  libc-2.27.so  [.] vfprintf
> +4.91%  postgres  postgres  [.] dopr.constprop.4
> +3.53%  postgres  libc-2.27.so  [.] __strlen_avx2
> +1.55%  postgres  libc-2.27.so  [.] __strchrnul_avx2
> +1.49%  postgres  libc-2.27.so  [.] __memmove_avx_unaligned_erms
> +1.35%  postgres  postgres  [.] AllocSetAlloc
> +1.32%  postgres  libc-2.27.so  [.] _IO_str_init_static_internal
> +1.30%  postgres  postgres  [.] FunctionCall1Coll
> +1.27%  postgres  postgres  [.] psprintf
> +1.16%  postgres  postgres  [.] appendBinaryStringInfo
> +1.16%  postgres  libc-2.27.so  [.] _IO_old_init
> +1.06%  postgres  postgres  [.] heap_deform_tuple
> +1.02%  postgres  libc-2.27.so  [.] sprintf
> +1.02%  postgres  libc-2.27.so  [.] _IO_vsprintf
> 
> (all functions above 1%)
> 
> 
> I assume this partially is just the additional layers of function calls
> (psprintf, pvsnprintf, pg_vsnprintf, dopr) that are now done, in
> addition to pretty much the same work as before (i.e. sprintf("%.*f")).

I'm *NOT* proposing that as the actual solution, but as a datapoint, it
might be interesting that hardcoding the precision and thus allowing use
of strfromd() instead of sprintf yields a *better* runtime than
master.

Time: 10255.134 ms (00:10.255)

Greetings,

Andres Freund



Re: Performance improvements for src/port/snprintf.c

2018-09-26 Thread Andres Freund
On 2018-09-12 14:14:15 -0400, Tom Lane wrote:
> Alexander Kuzmenkov  writes:
> > I benchmarked this, using your testbed and comparing to libc sprintf
> > (Ubuntu GLIBC 2.27-0ubuntu3) and another implementation I know [1], all
> > compiled with gcc 5.
>
> Thanks for reviewing!
>
> The cfbot noticed that the recent dlopen patch conflicted with this in
> configure.in, so here's a rebased version.  The code itself didn't change.

Conflicts again, but not too hard to resolve.

The mini benchmark from 
http://archives.postgresql.org/message-id/20180926174645.nsyj77lx2mvtz4kx%40alap3.anarazel.de
is significantly improved by this patch.

> 96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63:
>
> COPY somefloats TO '/dev/null';
> COPY 1000
> Time: 24575.770 ms (00:24.576)
>
> 96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63^:
>
> COPY somefloats TO '/dev/null';
> COPY 1000
> Time: 12877.037 ms (00:12.877)

This patch:

postgres[32704][1]=# ;SELECT pg_prewarm('somefloats');COPY somefloats TO 
'/dev/null';
Time: 0.269 ms
┌────────────┐
│ pg_prewarm │
├────────────┤
│      73530 │
└────────────┘
(1 row)

Time: 34.983 ms
COPY 1000
Time: 15511.478 ms (00:15.511)


The profile from 96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63^ is:
+   38.15%  postgres  libc-2.27.so  [.] __GI___printf_fp_l
+   13.98%  postgres  libc-2.27.so  [.] hack_digit
+7.54%  postgres  libc-2.27.so  [.] __mpn_mul_1
+7.32%  postgres  postgres  [.] CopyOneRowTo
+6.12%  postgres  libc-2.27.so  [.] vfprintf
+3.14%  postgres  libc-2.27.so  [.] __strlen_avx2
+1.97%  postgres  postgres  [.] heap_deform_tuple
+1.77%  postgres  postgres  [.] AllocSetAlloc
+1.43%  postgres  postgres  [.] psprintf
+1.25%  postgres  libc-2.27.so  [.] _IO_str_init_static_internal
+1.09%  postgres  libc-2.27.so  [.] _IO_vsnprintf
+1.09%  postgres  postgres  [.] appendBinaryStringInfo

The profile of master with this patch is:

+   32.38%  postgres  libc-2.27.so  [.] __GI___printf_fp_l
+   11.08%  postgres  libc-2.27.so  [.] hack_digit
+9.55%  postgres  postgres  [.] CopyOneRowTo
+6.24%  postgres  libc-2.27.so  [.] __mpn_mul_1
+5.01%  postgres  libc-2.27.so  [.] vfprintf
+4.91%  postgres  postgres  [.] dopr.constprop.4
+3.53%  postgres  libc-2.27.so  [.] __strlen_avx2
+1.55%  postgres  libc-2.27.so  [.] __strchrnul_avx2
+1.49%  postgres  libc-2.27.so  [.] __memmove_avx_unaligned_erms
+1.35%  postgres  postgres  [.] AllocSetAlloc
+1.32%  postgres  libc-2.27.so  [.] _IO_str_init_static_internal
+1.30%  postgres  postgres  [.] FunctionCall1Coll
+1.27%  postgres  postgres  [.] psprintf
+1.16%  postgres  postgres  [.] appendBinaryStringInfo
+1.16%  postgres  libc-2.27.so  [.] _IO_old_init
+1.06%  postgres  postgres  [.] heap_deform_tuple
+1.02%  postgres  libc-2.27.so  [.] sprintf
+1.02%  postgres  libc-2.27.so  [.] _IO_vsprintf

(all functions above 1%)


I assume this partially is just the additional layers of function calls
(psprintf, pvsnprintf, pg_vsnprintf, dopr) that are now done, in
addition to pretty much the same work as before (i.e. sprintf("%.*f")).

- Andres



Re: Allowing printf("%m") only where it actually works

2018-09-26 Thread Andres Freund
Hi,

On 2018-09-26 17:41:36 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I'm not saying we shouldn't default to our printf - in fact I think we
> > are probably past due to use a faster float->string conversion than we
> > portably get from the OS - but I don't think we can default to our
> > sprintf without doing something about the float conversion performance.
> 
> Well, if you're unhappy about snprintf.c's performance, you could review
> https://commitfest.postgresql.org/19/1763/
> so I can push that.  In my tests, that got us down to circa 10% penalty
> for float conversions.

Uh, I can do that, but the fact remains that your commit slowed down
COPY and other conversion intensive workloads by a *significant* amount.
I'm ok helping with improving/winning-back performance, but I do think
the obligation to do so remains with the committer/authors that caused a
performance regression.

Greetings,

Andres Freund



Re: transction_timestamp() inside of procedures

2018-09-26 Thread Peter Eisentraut
On 26/09/2018 17:54, Alvaro Herrera wrote:
> What could be the use for the transaction timestamp?  I think one of the
> most important uses (at least in pg_stat_activity) is to verify that
> transactions are not taking excessively long time to complete; that's
> known to cause all sorts of trouble in Postgres, and probably other
> DBMSs too.  If we don't accurately measure what it really is, and
> instead keep the compatibility behavior, we risk panicking people
> because they think some transaction has been running for a long time
> when in reality it's just a very long procedure which commits frequently
> enough not to be a problem.

That's certainly a good argument.  Note that if we implemented that the
transaction timestamp is advanced inside procedures, that would also
mean that the transaction timestamp as observed in pg_stat_activity
would move during VACUUM, for example.  That might or might not be
desirable.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Allowing printf("%m") only where it actually works

2018-09-26 Thread Tom Lane
Andres Freund  writes:
> I'm not saying we shouldn't default to our printf - in fact I think we
> are probably past due to use a faster float->string conversion than we
> portably get from the OS - but I don't think we can default to our
> sprintf without doing something about the float conversion performance.

Well, if you're unhappy about snprintf.c's performance, you could review
https://commitfest.postgresql.org/19/1763/
so I can push that.  In my tests, that got us down to circa 10% penalty
for float conversions.

More generally, I'm not averse to having our own float conversion code
if someone wants to put in the effort.  Performance aside, it'd be nice
to eliminate cross-platform differences in float output so we could get
rid of some of the Windows-specific regression result files.

regards, tom lane



Re: [PATCH] Improve geometric types

2018-09-26 Thread Tom Lane
Tomas Vondra  writes:
> Hmmm, interesting. It seems both failures happen in the chunk that
> multiplies paths with points, i.e. essentially point_mul_point. So it
> seems most platforms end up with

> (0,0) * (-3,4) = (-0, 0)

> while gaur apparently thinks it's (0,0). And indeed, that's what the
> attached trivial program does - I'd bet if you run it on gaur, it'll
> print 0.00, not -0.00.

Nope, no cigar:

$ gcc -Wall -O2 test.c
$ ./a.out
-0.00

(I tried a couple other -O levels to see if that affected anything,
but it didn't.)

I'll try to isolate the problem more closely, but it will take awhile.
That machine is slow :-(

regards, tom lane



Re: Regression tests fail once XID counter exceeds 2 billion

2018-09-26 Thread James Coleman




Added to TODO:

Add function to allow easier transaction id comparisons

http://archives.postgresql.org/pgsql-hackers/2011-11/msg00786.php


Did this ever happen? Or did it fall through the cracks?



Re: pgbench's expression parsing & negative numbers

2018-09-26 Thread Fabien COELHO



Hello Andres,


+"9223372036854775808" {
+   /* yuk: special handling for minint */
+   return MAXINT_PLUS_ONE_CONST;
+   }


Yikes, do we really need this?  If we dealt with - in the lexer, we
shouldn't need it, no?


I'm not sure how to handle unary minus constant in the lexer, given that 
the same '-' character is also used as minus binary operator.


The proposed solution is functional and has a reduced overall impact (one 
lexer rule and one parser rule added), so it looks good to me.


Probably other approaches are possible.


+   /* this can't happen here, function called on int-looking strings. */


This comment doesn't strike me as a good idea, it's almost guaranteed to
be outdated at some point.


It is valid now, and it can be removed anyway.


[...] Duplicating these routines is pretty ugly.


Sure.

I really wish we had more infrastructure to avoid this (in particular 
a portable way to report an error and jump out).  But probably out of 
scope for this patch?


Yes.

Devising an appropriate client-side error handling/reporting 
infrastructure is a non-trivial task, and would cost more than a few 
duplicate functions. "fprintf(stderr, ...) + return/exit" currently does the job 
with minimal hassle. I do not think that this patch is the right one to 
change how errors are handled in Postgres client applications.



+   if (unlikely(end == str || *end != '\0'))



Not sure I see much point in the unlikelys here, contrasting to the ones
in the backend (and the copy for the frontend) it's extremely unlikely
anything like this will ever be close to a bottleneck.  In general, I'd
strongly suggest not placing unlikelys in non-critical codepaths -
they're way too often wrongly estimated.


I put "unlikely" where I really thought it made sense. I do not know when 
they would have an actual performance impact, but I appreciate that they 
convey information to the reader of the code, indicating that the path is 
not expected to be taken.


It can be removed anyway if you do not like it.

--
Fabien.



Re: pg_ls_tmpdir()

2018-09-26 Thread Laurenz Albe
Bossart, Nathan wrote:
> Attached is a patch to add a pg_ls_tmpdir() function that lists the
> contents of a specified tablespace's pgsql_tmp directory.  This is
> very similar to the existing pg_ls_logdir() and pg_ls_waldir()
> functions.
> 
> By providing more visibility into the temporary file directories,
> users can more easily track queries that are consuming a lot of disk
> space for temporary files.  This function already provides enough
> information to calculate the total temporary file usage per PID, but
> it might be worthwhile to create a system view that does this, too.
> 
> I am submitting this patch for consideration in commitfest 2018-11.

The patch applies, builds without warning and passes "make check-world".

Since pgsql_tmp is only created when the first temp file is written,
calling the function on a newly initdb'ed data directory fails with

ERROR:  could not open directory "base/pgsql_tmp": No such file or directory

This should be fixed; it shouldn't be an error.

Calling the function with a non-existing tablespace OID produces:

ERROR:  could not open directory "pg_tblspc/1665/PG_12_201809121/pgsql_tmp": No 
such file or directory

Wouldn't it be better to have a check if the tablespace exists?

About the code:
The catversion bump shouldn't be part of the patch, it will
rot too quickly.  The committer is supposed to add it.


I think the idea to have such a function is fine, but I have two doubts:

1. Do we really need two functions, one without input argument
   to list the default tablespace?
   I think that anybody who uses such a function would
   be able to feed it the OID of "pg_default".

2. There already are functions "pg_ls_logdir" and "pg_ls_waldir",
   and I can imagine new requests, e.g. pg_ls_datafiles() to list the
   data files in a database directory.

   It may make sense to have a generic function like

  pg_ls_dir(dirname text, tablespace oid)

   instead.  But maybe that's taking it too far...

I'll set the patch to "waiting for author".

Yours,
Laurenz Albe




Re: Allowing printf("%m") only where it actually works

2018-09-26 Thread Tom Lane
Andres Freund  writes:
> The strerror push, I assume it's that at least, broke something on icc:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fulmar&dt=2018-09-26%2018%3A00%3A16

Yeah.  It looks like the problem is that configure's test for strerror_r's
return type does not work on icc:

configure:10784: checking whether strerror_r returns int
configure:10805: icc -std=gnu99 -c  -mp1 -fno-strict-aliasing -g -O2 -pthread 
-D_REENTRANT -D_THREAD_SAFE -D_POSIX_PTHREAD_SEMANTICS  -D_GNU_SOURCE 
-I/usr/include/libxml2  conftest.c >&5
conftest.c(45): warning #159: declaration is incompatible with previous 
"strerror_r" (declared at line 438 of "/usr/include/string.h")
  int strerror_r(int, char *, size_t);
  ^

configure:10805: $? = 0
configure:10812: result: yes

Configure is expecting this to throw a hard error (if the platform
declares strerror_r to return char*) but icc only makes it a warning.

What I am not quite figuring out here is how the code seemed to work
before.  Before this patch, it only mattered in libpq because that was the
only place using pqStrerror.  Maybe the regression tests don't ever
produce a strerror output from libpq?

regards, tom lane



Re: [PATCH] Improve geometric types

2018-09-26 Thread Tomas Vondra
On 09/26/2018 06:45 PM, Tom Lane wrote:
> Tomas Vondra  writes:
>> Pushed. Now let's wait for the buildfarm to complain ...
> 
> gaur's not happy, but rather surprisingly, it looks like we're
> mostly OK elsewhere.  Do you need me to trace down exactly what's
> going wrong on gaur?
> 

Hmmm, interesting. It seems both failures happen in the chunk that
multiplies paths with points, i.e. essentially point_mul_point. So it
seems most platforms end up with

(0,0) * (-3,4) = (-0, 0)

while gaur apparently thinks it's (0,0). And indeed, that's what the
attached trivial program does - I'd bet if you run it on gaur, it'll
print 0.00, not -0.00.

Or you could just try doing

select '(0,0)'::point * '(-3,4)'::point;

If this is what's going on, I'd say the best solution is to make it
produce (0,0) everywhere, so that we don't expect -0.0 anywhere.

We could do that either by adding the == 0.0 check to yet another place,
or to point_construct() directly. Adding it to point_construct() means
we'll pay the price always, but I guess there are few paths where we
know we don't need it. And if we add it to many places it's likely about
as expensive as adding it to point_construct.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
#include <stdio.h>

int main()
{
	double 	x1 = 0.0,
			x2 = -3,
			y1 = 0.0,
			y2 = 4;

	printf("%+f\n", (x1 * x2) - (y1 * y2));

	return 0;
}


Re: pgbench's expression parsing & negative numbers

2018-09-26 Thread Andres Freund
Hi,

On 2018-08-10 10:24:29 +0200, Fabien COELHO wrote:
> diff --git a/src/bin/pgbench/exprscan.l b/src/bin/pgbench/exprscan.l
> index 5c1bd88128..e8faea3857 100644
> --- a/src/bin/pgbench/exprscan.l
> +++ b/src/bin/pgbench/exprscan.l
> @@ -191,16 +191,26 @@ notnull [Nn][Oo][Tt][Nn][Uu][Ll][Ll]
>   yylval->bval = false;
>   return BOOLEAN_CONST;
>   }
> +"9223372036854775808" {
> + /* yuk: special handling for minint */
> + return MAXINT_PLUS_ONE_CONST;
> + }

Yikes, do we really need this?  If we dealt with - in the lexer, we
shouldn't need it, no?


>  /*
>   * strtoint64 -- convert a string to 64-bit integer
>   *
> - * This function is a modified version of scanint8() from
> + * This function is a slightly modified version of scanint8() from
>   * src/backend/utils/adt/int8.c.
> + *
> + * The function returns whether the conversion worked, and if so
> + * "*result" is set to the result.
> + *
> + * If not errorOK, an error message is also printed out on errors.
>   */
> -int64
> -strtoint64(const char *str)
> +bool
> +strtoint64(const char *str, bool errorOK, int64 *result)
>  {
>   const char *ptr = str;
> - int64   result = 0;
> - int sign = 1;
> + int64   tmp = 0;
> + boolneg = false;
>  
>   /*
>* Do our own scan, rather than relying on sscanf which might be broken
>* for long long.
> +  *
> +  * As INT64_MIN can't be stored as a positive 64 bit integer, accumulate
> +  * value as a negative number.
>*/
>  
>   /* skip leading spaces */
> @@ -650,46 +660,80 @@ strtoint64(const char *str)
>   if (*ptr == '-')
>   {
>   ptr++;
> -
> - /*
> -  * Do an explicit check for INT64_MIN.  Ugly though this is, 
> it's
> -  * cleaner than trying to get the loop below to handle it 
> portably.
> -  */
> - if (strncmp(ptr, "9223372036854775808", 19) == 0)
> - {
> - result = PG_INT64_MIN;
> - ptr += 19;
> - goto gotdigits;
> - }
> - sign = -1;
> + neg = true;
>   }
>   else if (*ptr == '+')
>   ptr++;
>  
>   /* require at least one digit */
> - if (!isdigit((unsigned char) *ptr))
> - fprintf(stderr, "invalid input syntax for integer: \"%s\"\n", 
> str);
> + if (unlikely(!isdigit((unsigned char) *ptr)))
> + goto invalid_syntax;
>  
>   /* process digits */
>   while (*ptr && isdigit((unsigned char) *ptr))
>   {
> - int64   tmp = result * 10 + (*ptr++ - '0');
> + int8digit = (*ptr++ - '0');
>  
> - if ((tmp / 10) != result)   /* overflow? */
> - fprintf(stderr, "value \"%s\" is out of range for type 
> bigint\n", str);
> - result = tmp;
> + if (unlikely(pg_mul_s64_overflow(tmp, 10, &tmp)) ||
> + unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
> + goto out_of_range;
>   }
>  
> -gotdigits:
> -
>   /* allow trailing whitespace, but not other trailing chars */
>   while (*ptr != '\0' && isspace((unsigned char) *ptr))
>   ptr++;
>  
> - if (*ptr != '\0')
> - fprintf(stderr, "invalid input syntax for integer: \"%s\"\n", 
> str);
> + if (unlikely(*ptr != '\0'))
> + goto invalid_syntax;
>  
> - return ((sign < 0) ? -result : result);
> + if (!neg)
> + {
> + if (unlikely(tmp == PG_INT64_MIN))
> + goto out_of_range;
> + tmp = -tmp;
> + }
> +
> + *result = tmp;
> + return true;
> +
> +out_of_range:
> + if (!errorOK)
> + fprintf(stderr,
> + "value \"%s\" is out of range for type 
> bigint\n", str);
> + return false;
> +
> +invalid_syntax:
> + /* this can't happen here, function called on int-looking strings. */

This comment doesn't strike me as a good idea, it's almost guaranteed to
be outdated at some point.

> + if (!errorOK)
> + fprintf(stderr,
> + "invalid input syntax for type bigint: 
> \"%s\"\n",str);
> + return false;
> +}


Duplicating these routines is pretty ugly.  I really wish we had more
infrastructure to avoid this (in particular a portable way to report
an error and jump out).  But probably out of scope for this patch?

> +/* convert string to double, detecting overflows/underflows */
> +bool
> +strtodouble(const char *str, bool errorOK, double *dv)
> +{
> + char *end;
> +
> + *dv = strtod(str, &end);
> +
> + if (unlikely(errno != 0))
> + {
> + if 

Re: pgsql: Remove absolete function TupleDescGetSlot().

2018-09-26 Thread Andres Freund
Hi,

On 2018-09-26 12:41:38 +0900, Michael Paquier wrote:
> On Tue, Sep 25, 2018 at 06:42:51PM -0700, Andres Freund wrote:
> > My point is that FuncCallContext->slot isn't actually all that related
> > to TupleDescGetSlot() and could be used entirely independently.
> 
> That's fair.  Why not just replacing the existing comment with something
> like "slot can be used in your own context to store tuple values,
> useful in the context of user-defined SRFs"?  Just my 2c.

I've tried to search for users of FuncCallContext->slot and couldn't
find anything. Therefore I'm more inclined to drop it too - just about
all cases where it's useful seem to require a more extensive
data structure around anyway.  Unless somebody protests I'm going to do that
in a few - if somebody later in the cycle signals a need for it, we can
still revert that part.

Greetings,

Andres Freund



Re: transction_timestamp() inside of procedures

2018-09-26 Thread Daniel Verite
Tom Lane wrote:

> I agree that it would be surprising for transaction timestamp to be newer
> than statement timestamp. 

To me it's more surprising to start a new transaction and having
transaction_timestamp() still pointing at the start of a previous 
transaction.
This feels like a side-effect of being spawned by a procedure,
and an exception to what transaction_timestamp() normally means
or meant until now.

OTOH transaction_timestamp() being possibly newer than
statement_timestamp() seems like a normal consequence of
transactions being created intra-statement.

+1 for transaction_timestamp() and pg_stat_activity being updated
to follow intra-procedure transactions.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Allowing printf("%m") only where it actually works

2018-09-26 Thread Andres Freund
On 2018-09-26 12:09:34 -0700, Andres Freund wrote:
> and then fails with:
> xlc_r -qmaxmem=33554432 -D_LARGE_FILES=1 -DRANDOMIZE_ALLOCATED_MEMORY 
> -qnoansialias -g -O2 -qmaxmem=16384 -qsrcmsg -D_REENTRANT -D_THREAD_SAFE 
> -D_POSIX_PTHREAD_SEMANTICS  -o libpq.so.5 libpq.a -Wl,-bE:libpq.exp 
> -L../../../src/port -L../../../src/common   
> -L/home/nm/sw/nopath/libxml2-32/lib  -L/home/nm/sw/nopath/uuid-32/lib 
> -L/home/nm/sw/nopath/openldap-32/lib -L/home/nm/sw/nopath/icu58.2-32/lib 
> -L/home/nm/sw/nopath/libxml2-32/lib 
> -Wl,-blibpath:'/home/nm/farm/xlc32/HEAD/inst/lib:/home/nm/sw/nopath/libxml2-32/lib:/home/nm/sw/nopath/uuid-32/lib:/home/nm/sw/nopath/openldap-32/lib:/home/nm/sw/nopath/icu58.2-32/lib:/home/nm/sw/nopath/libxml2-32/lib:/usr/lib:/lib'
>   -Wl,-bnoentry -Wl,-H512 -Wl,-bM:SRE -lintl -lssl -lcrypto -lldap_r -llber 
> -lpthreads
> ld: 0711-317 ERROR: Undefined symbol: ._isnan
> ld: 0711-345 Use the -bloadmap or -bnoquiet option to obtain more information.
> ../../../src/Makefile.shlib:339: recipe for target 'libpq.so.5' failed
> 
> but I think the latter is more likely to be caused by
> 2e2a392de3 Wed Sep 26 08:25:24 2018 UTC  Fix problems in handling the line 
> data type 
> rather than this thread.

Err, no, that commit was backend code, this fails in frontend code...

Greetings,

Andres Freund



Re: Allowing printf("%m") only where it actually works

2018-09-26 Thread Andres Freund
On 2018-09-26 11:57:34 -0700, Andres Freund wrote:
> On 2018-09-26 11:09:59 -0400, Tom Lane wrote:
> > Michael Paquier  writes:
> > > On Tue, Sep 25, 2018 at 12:05:42PM -0400, Tom Lane wrote:
> > >> Alvaro Herrera  writes:
> > >>> Actually I think it *is* useful to do it like this, because then the
> > >>> user knows to fix the netmsg.dll problem so that they can continue to
> > >>> investigate the winsock problem.  If we don't report the secondary error
> > >>> message, how are users going to figure out how to fix the problem?
> > 
> > >> OK, I'm fine with doing it like that if people want it.
> > 
> > > +1.
> > 
> > OK, pushed 0001 with that adjustment.
> > 
> > While looking over the thread, I remembered I wanted to convert
> > strerror_r into a wrapper as well.  Think I'll go do that next,
> > because really it'd be better for snprintf.c to be calling strerror_r
> > not strerror.
> 
> The strerror push, I assume it's that at least, broke something on icc:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fulmar&dt=2018-09-26%2018%3A00%3A16
> 
> == pgsql.build/src/test/regress/regression.diffs 
> ===
> *** 
> /var/buildfarm/fulmar/build/HEAD/pgsql.build/src/test/regress/expected/create_function_1.out
>   Wed Sep 26 20:10:35 2018
> --- 
> /var/buildfarm/fulmar/build/HEAD/pgsql.build/src/test/regress/results/create_function_1.out
>Wed Sep 26 20:10:43 2018
> ***
> *** 86,92 
>   ERROR:  only one AS item needed for language "sql"
>   CREATE FUNCTION test1 (int) RETURNS int LANGUAGE C
>   AS 'nosuchfile';
> ! ERROR:  could not access file "nosuchfile": No such file or directory
>   CREATE FUNCTION test1 (int) RETURNS int LANGUAGE C
>   AS 
> '/var/buildfarm/fulmar/build/HEAD/pgsql.build/src/test/regress/regress.so', 
> 'nosuchsymbol';
>   ERROR:  could not find function "nosuchsymbol" in file 
> "/var/buildfarm/fulmar/build/HEAD/pgsql.build/src/test/regress/regress.so"
> --- 86,92 
>   ERROR:  only one AS item needed for language "sql"
>   CREATE FUNCTION test1 (int) RETURNS int LANGUAGE C
>   AS 'nosuchfile';
> ! ERROR:  could not access file "nosuchfile": ENOENT
>   CREATE FUNCTION test1 (int) RETURNS int LANGUAGE C
>   AS 
> '/var/buildfarm/fulmar/build/HEAD/pgsql.build/src/test/regress/regress.so', 
> 'nosuchsymbol';
>   ERROR:  could not find function "nosuchsymbol" in file 
> "/var/buildfarm/fulmar/build/HEAD/pgsql.build/src/test/regress/regress.so"
> 
> ==

Mandrill as well:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2018-09-26%2018%3A30%3A26

*Lots* of new warnings like:
xlc_r -qmaxmem=33554432 -D_LARGE_FILES=1 -DRANDOMIZE_ALLOCATED_MEMORY 
-qnoansialias -g -O2 -qmaxmem=16384 -qsrcmsg -DFRONTEND -I../../src/include 
-I/home/nm/sw/nopath/icu58.2-32/include 
-I/home/nm/sw/nopath/libxml2-32/include/libxml2  
-I/home/nm/sw/nopath/uuid-32/include -I/home/nm/sw/nopath/openldap-32/include 
-I/home/nm/sw/nopath/icu58.2-32/include -I/home/nm/sw/nopath/libxml2-32/include 
-DVAL_CONFIGURE="\"'--enable-cassert' '--enable-debug' '--enable-nls' 
'--enable-tap-tests' '--with-icu' '--with-ldap' '--with-libxml' 
'--with-libxslt' '--with-openssl' '--with-ossp-uuid' '--with-perl' 
'--with-python' '--with-includes=/home/nm/sw/nopath/uuid-32/include 
/home/nm/sw/nopath/openldap-32/include /home/nm/sw/nopath/icu58.2-32/include 
/home/nm/sw/nopath/libxml2-32/include' 
'--with-libraries=/home/nm/sw/nopath/uuid-32/lib 
/home/nm/sw/nopath/openldap-32/lib /home/nm/sw/nopath/icu58.2-32/lib 
/home/nm/sw/nopath/libxml2-32/lib' '--with-tcl' 
'--prefix=/home/nm/farm/xlc32/HEAD/inst' '--with-pgport=7678' 'CC=xlc_r 
-qmaxmem=33554432 -D_LARGE_FILES=1 -DRANDOMIZE_ALLOCATED_MEMORY' 
'PKG_CONFIG_PATH=/home/nm/sw/nopath/icu58.2-32/lib/pkgconfig'\"" 
-DVAL_CC="\"xlc_r -qmaxmem=33554432 -D_LARGE_FILES=1 
-DRANDOMIZE_ALLOCATED_MEMORY\"" 
-DVAL_CPPFLAGS="\"-I/home/nm/sw/nopath/icu58.2-32/include 
-I/home/nm/sw/nopath/libxml2-32/include/libxml2 
-I/home/nm/sw/nopath/uuid-32/include -I/home/nm/sw/nopath/openldap-32/include 
-I/home/nm/sw/nopath/icu58.2-32/include 
-I/home/nm/sw/nopath/libxml2-32/include\"" -DVAL_CFLAGS="\"-qnoansialias -g -O2 
-qmaxmem=16384 -qsrcmsg\"" -DVAL_CFLAGS_SL="\"\"" 
-DVAL_LDFLAGS="\"-L/home/nm/sw/nopath/libxml2-32/lib 
-L/home/nm/sw/nopath/uuid-32/lib -L/home/nm/sw/nopath/openldap-32/lib 
-L/home/nm/sw/nopath/icu58.2-32/lib -L/home/nm/sw/nopath/libxml2-32/lib 
-Wl,-blibpath:'/home/nm/farm/xlc32/HEAD/inst/lib:/home/nm/sw/nopath/libxml2-32/lib:/home/nm/sw/nopath/uuid-32/lib:/home/nm/sw/nopath/openldap-32/lib:/home/nm/sw/nopath/icu58.2-32/lib:/home/nm/sw/nopath/libxml2-32/lib:/usr/lib:/lib'\""
 -DVAL_LDFLAGS_EX="\"\"" -DVAL_LDFLAGS_SL="\" -Wl,-bnoentry -Wl,-H512 
-Wl,-bM:SRE\"" -DVAL_LIBS="\"-lpgcommon -lpgport -lintl -lxslt -lxml2 -lssl 
-lcrypto -lz -lreadline -lm \""  -c -o base64.o base64.c
"../../src/include/common/fe_memutils.h", line 

Re: Allowing printf("%m") only where it actually works

2018-09-26 Thread Andres Freund
On 2018-09-26 11:09:59 -0400, Tom Lane wrote:
> Michael Paquier  writes:
> > On Tue, Sep 25, 2018 at 12:05:42PM -0400, Tom Lane wrote:
> >> Alvaro Herrera  writes:
> >>> Actually I think it *is* useful to do it like this, because then the
> >>> user knows to fix the netmsg.dll problem so that they can continue to
> >>> investigate the winsock problem.  If we don't report the secondary error
> >>> message, how are users going to figure out how to fix the problem?
> 
> >> OK, I'm fine with doing it like that if people want it.
> 
> > +1.
> 
> OK, pushed 0001 with that adjustment.
> 
> While looking over the thread, I remembered I wanted to convert
> strerror_r into a wrapper as well.  Think I'll go do that next,
> because really it'd be better for snprintf.c to be calling strerror_r
> not strerror.

The strerror push, I assume it's that at least, broke something on icc:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fulmar&dt=2018-09-26%2018%3A00%3A16

== pgsql.build/src/test/regress/regression.diffs 
===
*** 
/var/buildfarm/fulmar/build/HEAD/pgsql.build/src/test/regress/expected/create_function_1.out
Wed Sep 26 20:10:35 2018
--- 
/var/buildfarm/fulmar/build/HEAD/pgsql.build/src/test/regress/results/create_function_1.out
 Wed Sep 26 20:10:43 2018
***
*** 86,92 
  ERROR:  only one AS item needed for language "sql"
  CREATE FUNCTION test1 (int) RETURNS int LANGUAGE C
  AS 'nosuchfile';
! ERROR:  could not access file "nosuchfile": No such file or directory
  CREATE FUNCTION test1 (int) RETURNS int LANGUAGE C
  AS 
'/var/buildfarm/fulmar/build/HEAD/pgsql.build/src/test/regress/regress.so', 
'nosuchsymbol';
  ERROR:  could not find function "nosuchsymbol" in file 
"/var/buildfarm/fulmar/build/HEAD/pgsql.build/src/test/regress/regress.so"
--- 86,92 
  ERROR:  only one AS item needed for language "sql"
  CREATE FUNCTION test1 (int) RETURNS int LANGUAGE C
  AS 'nosuchfile';
! ERROR:  could not access file "nosuchfile": ENOENT
  CREATE FUNCTION test1 (int) RETURNS int LANGUAGE C
  AS 
'/var/buildfarm/fulmar/build/HEAD/pgsql.build/src/test/regress/regress.so', 
'nosuchsymbol';
  ERROR:  could not find function "nosuchsymbol" in file 
"/var/buildfarm/fulmar/build/HEAD/pgsql.build/src/test/regress/regress.so"

==


Greetings,

Andres Freund



Re: Implementing SQL ASSERTION

2018-09-26 Thread David Fetter
On Tue, Sep 25, 2018 at 12:04:12AM +0100, Joe Wildish wrote:
> Hi Peter,
> 
> > My feeling is that if we want to move forward on this topic, we need to
> > solve the concurrency question first.  All these optimizations for when
> > we don't need to check the assertion are cool, but they are just
> > optimizations that we can apply later on, once we have solved the
> > critical problems.
> 
> Having said all that: there are obviously going to be some expressions
> that cannot be proven to have no potential for invalidating the assertion
> truth. I guess this is the prime concern from a concurrency PoV? Example:
> 
> CREATE TABLE t (
>   b BOOLEAN NOT NULL,
>   n INTEGER NOT NULL,
>   PRIMARY KEY (b, n)
> );
> 
> CREATE ASSERTION sum_per_b_less_than_10 CHECK
>   (NOT EXISTS
> (SELECT FROM (SELECT b, SUM(n)
> FROM t
>GROUP BY b) AS v(b, sum_n)
>   WHERE sum_n > 10));

> 
> Invalidating operations are "INSERT(t) and UPDATE(t.b, t.n)".

So would DELETE(t), assuming n can be negative.

Is there some interesting and fairly easily documented subset of
ASSERTIONs that wouldn't have the "can't prove" property?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Query is over 2x slower with jit=on

2018-09-26 Thread Andres Freund
On 2018-09-26 23:45:35 +0530, Amit Khandekar wrote:
> On Tue, 25 Sep 2018 at 14:17, Andres Freund  wrote:
> >
> > On 2018-09-18 10:03:02 +0530, Amit Khandekar wrote:
> > > Attached v3 patch that does the above change.
> >
> > Attached is a revised version of that patch.  I've changed quite a few
> > things:
> > - I've reverted the split of "base" and "provider specific" contexts - I
> >   don't think it really buys us anything here.
> 
> The idea was to have a single estate field that accumulates all the
> JIT counters of leader as well as workers. I see that you want to
> delay the merging of workers and backend counters until end of query
> execution. More points on this in the bottom section.

No, I never want to merge things into the leader's stats.

> > - I've reverted the context creation changes - instead of creating a
> >   context in the leader just to store instrumentation in the worker,
> >   there's now a new EState->es_jit_combined_instr.
> 
> The context created in the leader was a light context, involving only
> the resource owner stuff, and not the provider initialization.

I know, but it added a fair bit of infrastructure and complications just
for that - and I see very very little reason that that'd ever be a
necessary separation.

> >
> > - That also means worker instrumentation doesn't get folded into the
> >   leader's instrumentation.
> 
> You mean the worker instrumentation doesn't get folded into the leader
> until the query execution end, right ? In the committed code, I see
> that now we merge the leader instrumentation into the combined worker
> instrumentation in standard_ExecutorEnd().

No, I mean it *never* gets folded into the leader's
instrumentation. There's a *separate* instrumentation field, where they
do get combined. But that still allows code to print out the leader's
stats alone.


> > This seems good for the future and for
> >   extensions - it's not actually "linear" time that's spent doing
> >   JIT in workers (& leader), as all of that work happens in
> >   parallel. Being able to disentangle that seems important.
> 
> Ok. Your point is: we should have the backend and workers info stored
> in two separate fields, and combine them only when we need it; so that
> we will be in a position to show combined workers-only info separately
> in the future. From the code, it looks like the es_jit_combined_instr
> stores combined workers info not just from a single Gather node, but
> all the Gather nodes in the plan. If we want to have separate workers
> info, I am not sure if it makes sense in combining workers from two
> separate Gather nodes; because these two sets of workers are
> unrelated, aren't they ?

Well, we now have all the individual stats around in an unmodified
manner. We could print both the aggregate and individualized stats.

Greetings,

Andres Freund



Re: Query is over 2x slower with jit=on

2018-09-26 Thread Amit Khandekar
On Tue, 25 Sep 2018 at 14:17, Andres Freund  wrote:
>
> On 2018-09-18 10:03:02 +0530, Amit Khandekar wrote:
> > Attached v3 patch that does the above change.
>
> Attached is a revised version of that patch.  I've changed quite a few
> things:
> - I've reverted the split of "base" and "provider specific" contexts - I
>   don't think it really buys us anything here.

The idea was to have a single estate field that accumulates all the
JIT counters of leader as well as workers. I see that you want to
delay the merging of workers and backend counters until end of query
execution. More points on this in the bottom section.

>
> - I've reverted the context creation changes - instead of creating a
>   context in the leader just to store instrumentation in the worker,
>   there's now a new EState->es_jit_combined_instr.

The context created in the leader was a light context, involving only
the resource owner stuff, and not the provider initialization.

>
> - That also means worker instrumentation doesn't get folded into the
>   leader's instrumentation.

You mean the worker instrumentation doesn't get folded into the leader
until the query execution end, right ? In the committed code, I see
that now we merge the leader instrumentation into the combined worker
instrumentation in standard_ExecutorEnd().

> This seems good for the future and for
>   extensions - it's not actually "linear" time that's spent doing
>   JIT in workers (& leader), as all of that work happens in
>   parallel. Being able to disentangle that seems important.

Ok. Your point is: we should have the backend and workers info stored
in two separate fields, and combine them only when we need it; so that
we will be in a position to show combined workers-only info separately
in the future. From the code, it looks like the es_jit_combined_instr
stores combined workers info not just from a single Gather node, but
all the Gather nodes in the plan. If we want to have separate workers
info, I am not sure if it makes sense in combining workers from two
separate Gather nodes; because these two sets of workers are
unrelated, aren't they ?

>
> This needs a bit more polish tomorrow, but I'm starting to like where
> this is going.  Comments?

Yeah, I think the plan output looks reasonable compact now. Thanks.


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



Re: Allowing printf("%m") only where it actually works

2018-09-26 Thread Andres Freund
Hi,

On 2018-09-24 13:18:47 -0400, Tom Lane wrote:
> 0002 changes things so that we always use our snprintf, removing all the
> configure logic associated with that.

In the commit message you wrote:

> Preliminary performance testing suggests that as it stands, snprintf.c is
> faster than the native printf functions for some tasks on some platforms,
> and slower for other cases.  A pending patch will improve that, though
> cases with floating-point conversions will doubtless remain slower unless
> we want to put a *lot* of effort into that.  Still, we've not observed
> that *printf is really a performance bottleneck for most workloads, so
> I doubt this matters much.

I severely doubt the last sentence. I've *many* times seen *printf be a
significant bottleneck. In particular just about any pg_dump of a
database that has large tables with even just a single float column is
commonly bottlenecked on float -> string conversion.

A trivial bad benchmark:

CREATE TABLE somefloats(id serial, data1 float8, data2 float8, data3 float8);
INSERT INTO somefloats(data1, data2, data3) SELECT random(), random(), random() 
FROM generate_series(1, 10000000);
VACUUM FREEZE somefloats;


postgres[12850][1]=# \dt+ somefloats
   List of relations
┌────────┬────────────┬───────┬────────┬────────┬─────────────┐
│ Schema │    Name    │ Type  │ Owner  │  Size  │ Description │
├────────┼────────────┼───────┼────────┼────────┼─────────────┤
│ public │ somefloats │ table │ andres │ 575 MB │             │
└────────┴────────────┴───────┴────────┴────────┴─────────────┘

96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63:

COPY somefloats TO '/dev/null';
COPY 10000000
Time: 24575.770 ms (00:24.576)

96bf88d52711ad3a0a4cc2d1d9cb0e2acab85e63^:

COPY somefloats TO '/dev/null';
COPY 10000000
Time: 12877.037 ms (00:12.877)

IOW, we regress copy performance by about 2x. And one int and three
floats isn't a particularly insane table layout.


I'm not saying we shouldn't default to our printf - in fact I think we
are probably past due to use a faster float->string conversion than we
portably get from the OS - but I don't think we can default to our
sprintf without doing something about the float conversion performance.
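The conversion cost is easy to see even outside the server. A rough, hypothetical sketch in Python (nothing here is PostgreSQL code) comparing round-trippable float formatting against integer formatting:

```python
import random
import time

def bench(fmt, values, reps=3):
    """Best-of-reps wall time to format every value in the list."""
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        for v in values:
            fmt(v)
        best = min(best, time.perf_counter() - t0)
    return best

random.seed(42)
floats = [random.random() for _ in range(200_000)]
ints = list(range(200_000))

# %.17g is the shortest printf precision guaranteed to round-trip a
# double, which is roughly what float8 output has to produce.
t_float = bench(lambda v: "%.17g" % v, floats)
t_int = bench(lambda v: "%d" % v, ints)
print("float %%.17g: %.3fs, int %%d: %.3fs" % (t_float, t_int))

# the formatted value converts back to the identical double
assert all(float("%.17g" % v) == v for v in floats[:1000])
```

On typical machines the float case is noticeably slower than the integer case, which mirrors the kind of conversion-bound COPY behavior shown above.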


Greetings,

Andres Freund



Re: Participate in GCI as a Mentor

2018-09-26 Thread Sarah Conway Schnurr
Tahir,

Thank you for spending time thinking of task ideas!

We are ideally attempting to think of tasks that could benefit PostgreSQL
as a whole either through code, outreach/advocacy, quality assurance, or
design-based contributions, as that seems to be the trend for other tasks
submitted by open source organizations. Additionally, we'd like to avoid
comparison style tasks to prevent negative connotations or press around
PostgreSQL or competing databases.

As an alternative, could you perhaps write up some tasks around "Performing
and Documenting backing up a PostgreSQL instance"? There could be specific
unique tasks for each of the backup tools, such as pgBackRest, pg_dump,
pg_basebackup, pgBarman, wal-e, and wal-g. I would write up the
requirements as being "Write up a 1000-word (minimum) essay" and
"include a paragraph explaining in which situation the tool you're
working with would be best suited".

It would most likely fall under the "Research" or "Documentation"
categories, for researching and documenting their steps taken.

Following that, you could add in additional unique tasks for similar ideas,
such as:

   - "Write up a new user guide for pgBackRest based on a s3 backup
   architecture."
   - "Compose a cheat sheet for backing up and restoring a PostgreSQL
   instance using X backup tool."
   - "Create well-designed online graphical resources / cheat sheets for
   common PostgreSQL operations for the Online Resources
    page on
   PostgreSQL.org."
   - "Create a video tutorial explaining how to backup and restore a
   PostgreSQL instance using X backup tool."
   - "Create a video tutorial explaining how to upgrade a PostgreSQL
   instance from 9.6 to 10."

Thank you!

On Wed, Sep 26, 2018 at 12:31 AM Tahir Ramzan 
wrote:

> Honorable Concern,
>
> I am planning to mentor these ideas in Google Code-in, as these are
> educational, interesting, informative, and doable for young students:
>
> Compare PostgreSQL with another RDBMS:
> • DB2
> • MySQL
> • Oracle
> • SQLite
> • MS SQL Server
> • Sybase
>
> Compare PostgreSQL with a NoSQL Column Database:
> • Accumulo
> • Cassandra
> • Druid
> • HBase
> • Vertica
>
> Compare PostgreSQL with a NoSQL Document Database:
> • Apache CouchDB
> • ArangoDB
> • BaseX
> • Clusterpoint
> • Couchbase
> • Cosmos DB
> • IBM Domino
> • MarkLogic
> • MongoDB
> • OrientDB
> • Qizx
> • RethinkDB
>
> Compare PostgreSQL with NoSQL Key-value Database:
> • Aerospike
> • Apache Ignite
> • ArangoDB
> • Berkeley DB
> • Couchbase
> • Dynamo
> • FairCom c-treeACE
> • FoundationDB
> • InfinityDB
> • MemcacheDB
> • MUMPS
> • Oracle NoSQL Database
> • OrientDB
> • Redis
> • Riak
> • SciDB
> • SDBM/Flat File dbm
> • ZooKeeper
>
> Compare PostgreSQL with a NoSQL Graph Database
> • AllegroGraph
> • ArangoDB
> • InfiniteGraph
> • Apache Giraph
> • MarkLogic
> • Neo4J
> • OrientDB
> • Virtuoso
>
> Kindly share your precious thoughts and feedback, thanks in anticipation.
>
> --
> Regards
> Tahir Ramzan
> MSCS Research Scholar
> Google Summer of Code 2015 (CiviCRM)
> Google Summer of Code 2016 (ModSecurity)
> Outside Collaborator of SpiderLabs (Powered by TrustWave)
> Google Android Students Club Facilitator and Organizer 2015
>
> Contact:
>
> +92-312-5518018
>
> tahirram...@alumni.vu.edu.pk
>
>
> More details about me and my work:
>
> GitHub Profile: https://github.com/tahirramzan
>
> LinkedIn Profile: https://pk.linkedin.com/in/tahirramzan
>


-- 
Sarah Conway Schnurr


Re: transction_timestamp() inside of procedures

2018-09-26 Thread Andres Freund
On 2018-09-26 12:54:43 -0300, Alvaro Herrera wrote:
> On 2018-Sep-26, Tom Lane wrote:
> 
> > Alvaro Herrera  writes:
> > > On 2018-Sep-26, Tom Lane wrote:
> > >> I agree that it would be surprising for transaction timestamp to be newer
> > >> than statement timestamp.  So for now at least, I'd be satisfied with
> > >> documenting the behavior.
> > 
> > > Really?  I thought it was practically obvious that for transaction-
> > > controlling procedures, the transaction timestamp would not necessarily
> > > be aligned with the statement timestamp.  The surprise would come
> > > together with the usage of the new feature, so existing users would not
> > > be surprised in any way.
> > 
> > Nope.  That's the same poor reasoning we've fallen into in some other
> > cases, of assuming that "the user" is a point source of knowledge.
> > But DBMSes tend to interact with lots of different code.  If some part
> > of application A starts using intraprocedure transactions, and then
> > application B breaks because it wasn't expecting to see xact_start
> > later than query_start in pg_stat_activity, you've still got a problem.
> 
> While that's true, I think it's also highly hypothetical.
> 
> What could be the use for the transaction timestamp?  I think one of the
> most important uses (at least in pg_stat_activity) is to verify that
> transactions are not taking excessively long time to complete; that's
> known to cause all sorts of trouble in Postgres, and probably other
> DBMSs too.  If we don't accurately measure what it really is, and
> instead keep the compatibility behavior, we risk panicking people
> because they think some transaction has been running for a long time
> when in reality it's just a very long procedure which commits frequently
> enough not to be a problem.

+1


> > I'm also a bit hesitant to invent new semantics here based on the
> > assumption that we've got only one, nonoverlapping, top-level transaction
> > at a time.  It's not terribly hard to imagine suspend-and-resume-
> > transaction features coming down the pike at some point.  What will
> > we do then?  We'll already have a definitional issue for xact_start,
> > but it'll get worse the more different kinds of xact_start we have.
> 
> This is even more hypothetical.
> 
> If we can have a list or stack of running transactions, clearly a single
> timestamp value is not sufficient.  We could report a single value for
> "the oldest transaction", or perhaps "the transaction that's currently
> active".  But if we wanted to be really thorough about it, we'd need to
> report the list of timestamps for each running transaction in the
> current session.  However, I don't think those future developments would
> change what the transaction timestamp is, namely, the start of the
> current transaction, not the start of the statement that (after possibly
> many iterations) gave rise to the current transaction.

+1

Greetings,

Andres Freund



Re: [PATCH] Improve geometric types

2018-09-26 Thread Tom Lane
Tomas Vondra  writes:
> Pushed. Now let's wait for the buildfarm to complain ...

gaur's not happy, but rather surprisingly, it looks like we're
mostly OK elsewhere.  Do you need me to trace down exactly what's
going wrong on gaur?

regards, tom lane



Re: Allowing printf("%m") only where it actually works

2018-09-26 Thread Tom Lane
I wrote:
> While looking over the thread, I remembered I wanted to convert
> strerror_r into a wrapper as well.  Think I'll go do that next,
> because really it'd be better for snprintf.c to be calling strerror_r
> not strerror.

So while chasing that, I realized that libpq contains its own version
of the backend's win32_socket_strerror code, in libpq/win32.c.
This probably explains why we've not heard complaints about bogus
socket error reports from libpq; it's that code that's handling it.

What I think ought to happen is to merge win32.c's version of that
code into strerror.c, which'd allow removing win32.c and win32.h
altogether.  However, not having a Windows environment, I can't
test such changes and probably shouldn't be the one to take point
on making the change.  Anybody?

regards, tom lane



Re: transction_timestamp() inside of procedures

2018-09-26 Thread Alvaro Herrera
On 2018-Sep-26, Tom Lane wrote:

> Alvaro Herrera  writes:
> > On 2018-Sep-26, Tom Lane wrote:
> >> I agree that it would be surprising for transaction timestamp to be newer
> >> than statement timestamp.  So for now at least, I'd be satisfied with
> >> documenting the behavior.
> 
> > Really?  I thought it was practically obvious that for transaction-
> > controlling procedures, the transaction timestamp would not necessarily
> > be aligned with the statement timestamp.  The surprise would come
> > together with the usage of the new feature, so existing users would not
> > be surprised in any way.
> 
> Nope.  That's the same poor reasoning we've fallen into in some other
> cases, of assuming that "the user" is a point source of knowledge.
> But DBMSes tend to interact with lots of different code.  If some part
> of application A starts using intraprocedure transactions, and then
> application B breaks because it wasn't expecting to see xact_start
> later than query_start in pg_stat_activity, you've still got a problem.

While that's true, I think it's also highly hypothetical.

What could be the use for the transaction timestamp?  I think one of the
most important uses (at least in pg_stat_activity) is to verify that
transactions are not taking excessively long time to complete; that's
known to cause all sorts of trouble in Postgres, and probably other
DBMSs too.  If we don't accurately measure what it really is, and
instead keep the compatibility behavior, we risk panicking people
because they think some transaction has been running for a long time
when in reality it's just a very long procedure which commits frequently
enough not to be a problem.

> I'm also a bit hesitant to invent new semantics here based on the
> assumption that we've got only one, nonoverlapping, top-level transaction
> at a time.  It's not terribly hard to imagine suspend-and-resume-
> transaction features coming down the pike at some point.  What will
> we do then?  We'll already have a definitional issue for xact_start,
> but it'll get worse the more different kinds of xact_start we have.

This is even more hypothetical.

If we can have a list or stack of running transactions, clearly a single
timestamp value is not sufficient.  We could report a single value for
"the oldest transaction", or perhaps "the transaction that's currently
active".  But if we wanted to be really thorough about it, we'd need to
report the list of timestamps for each running transaction in the
current session.  However, I don't think those future developments would
change what the transaction timestamp is, namely, the start of the
current transaction, not the start of the statement that (after possibly
many iterations) gave rise to the current transaction.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Online verification of checksums

2018-09-26 Thread Stephen Frost
Greetings,

* Fabien COELHO (coe...@cri.ensmp.fr) wrote:
> >Note that a short read isn't an error and falls under the 'new' blocks
> >discussion above.
> 
> I'm really unsure that a short read should really be coldly skipped:
> 
> If the check is offline, then one file is in a very bad state, this is
> really a panic situation.

Why?  Are we sure that's really something which can't ever happen, even
if the database was shut down with 'immediate'?  I don't think it can but
that's something to consider.  In any case, my comments were
specifically thinking about it from an 'online' perspective.

> If the check is online, given that both postgres and the verify command
> interact with the same OS (?) and at the pg page level, I'm not sure in
> which situation there could be a partial block, because pg would only send
> full pages to the OS.

The OS doesn't operate at the same level that PG does- a single write in
PG could get blocked and scheduled off after having only copied half of
the 8k that PG sends.  This isn't really debatable - we've seen it happen
and everything is operating perfectly correctly, it just happens that
you were able to get a read() at the same time a write() was happening
and that only part of the page had been updated at that point.
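To make the torn-page handling concrete, here is a toy sketch (not the pg_verify_checksums code; the checksum is a CRC stand-in and all names are made up) of how an online verifier can treat short reads as skipped blocks and re-read once on a checksum mismatch before reporting it:

```python
import os
import tempfile
import zlib

BLCKSZ = 8192

def checksum_ok(buf):
    # Stand-in for PostgreSQL's page checksum: here the last 4 bytes
    # hold a CRC of the rest of the page.
    return zlib.crc32(buf[:-4]) == int.from_bytes(buf[-4:], "little")

def make_block(payload):
    body = payload.ljust(BLCKSZ - 4, b"\0")
    return body + zlib.crc32(body).to_bytes(4, "little")

def verify_file(path):
    """Skip short reads (concurrently extended blocks); retry a checksum
    mismatch once, since we may have raced with a concurrent write."""
    bad = skipped = blockno = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(BLCKSZ)
            if not buf:
                break
            if len(buf) < BLCKSZ:
                skipped += 1          # torn/new block: not an error online
                break
            if not checksum_ok(buf):
                f.seek(blockno * BLCKSZ)
                buf = f.read(BLCKSZ)  # re-read once: may have been half-written
                if len(buf) == BLCKSZ and not checksum_ok(buf):
                    bad += 1
            blockno += 1
    return bad, skipped

# quick demo: one good block, one corrupt block, one short (new) block
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(make_block(b"good"))
    f.write(b"\0" * BLCKSZ)           # bad checksum
    f.write(b"partial")               # short read at EOF
print(verify_file(f.name))            # -> (1, 1)
os.remove(f.name)
```

A real verifier would of course also distinguish an actual read() failure (errno set) from a short read, as discussed above.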

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Online verification of checksums

2018-09-26 Thread Fabien COELHO



Hello Stephen,


> I certainly don't see a lot of point in doing much more than what was
> discussed previously for 'new' blocks (counting them as skipped and
> moving on).


Sure.


> An actual read() error (that is, a failure on a read() call such as
> getting back EIO), on the other hand, is something which I'd probably
> report back to the user immediately and then move on, and perhaps
> report again at the end.


Yep.


> Note that a short read isn't an error and falls under the 'new' blocks
> discussion above.


I'm really unsure that a short read should really be coldly skipped:

If the check is offline, then one file is in a very bad state, this is 
really a panic situation.


If the check is online, given that both postgres and the verify command 
interact with the same OS (?) and at the pg page level, I'm not sure in 
which situation there could be a partial block, because pg would only 
send full pages to the OS.


--
Fabien.



Re: Online verification of checksums

2018-09-26 Thread Michael Banck
Hi,

Am Mittwoch, den 26.09.2018, 10:54 -0400 schrieb Stephen Frost:
> * Michael Banck (michael.ba...@credativ.de) wrote:
> > Am Mittwoch, den 26.09.2018, 13:23 +0200 schrieb Fabien COELHO:
> > > There are debatable changes of behavior:
> > > 
> > > if (errno == ENOENT) return / continue...
> > > 
> > > For instance, a file disappearing is ok online, but not so if offline. On 
> > > the other hand, the probability that a file suddenly disappears while the 
> > > server offline looks remote, so reporting such issues does not seem 
> > > useful.
> > > 
> > > However I'm more wary with other continues/skips added. ISTM that 
> > > skipping 
> > > a block because of a read error, or because it is new, or some other 
> > > reasons, is not the same thing, so should be counted & reported 
> > > differently?
> > 
> > I think that would complicate things further without a lot of benefit.
> > 
> > After all, we are interested in checksum failures, not necessarily read
> > failures etc. so exiting on them (and skip checking possibly large parts
> > of PGDATA) looks undesirable to me.
> > 
> > So I have done no changes in this part so far, what do others think
> > about this?
> 
> I certainly don't see a lot of point in doing much more than what was
> discussed previously for 'new' blocks (counting them as skipped and
> moving on).
> 
> An actual read() error (that is, a failure on a read() call such as
> getting back EIO), on the other hand, is something which I'd probably
> report back to the user immediately and then move on, and perhaps
> report again at the end.
> 
> Note that a short read isn't an error and falls under the 'new' blocks
> discussion above.

So I've added ENOENT checks when opening or statting files, i.e. EIO
would still be reported.

The current code in master exits on reads which do not return BLCKSZ,
which I've changed to a skip. That meant we no longer checked for read
failures (return code < 0), so I have now added a check for that which
emits an error message and returns.

New version 5 attached.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to the
following provisions: https://www.credativ.de/datenschutz

diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml
index 905b8f1222..4ad6edcde6 100644
--- a/doc/src/sgml/ref/pg_verify_checksums.sgml
+++ b/doc/src/sgml/ref/pg_verify_checksums.sgml
@@ -37,9 +37,8 @@ PostgreSQL documentation
   Description
   
pg_verify_checksums verifies data checksums in a
-   PostgreSQL cluster.  The server must be shut
-   down cleanly before running pg_verify_checksums.
-   The exit status is zero if there are no checksum errors, otherwise nonzero.
+   PostgreSQL cluster.  The exit status is zero if
+   there are no checksum errors, otherwise nonzero.  
   
  
 
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index 1bc020ab6c..12cd41b9ea 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -1,7 +1,7 @@
 /*
  * pg_verify_checksums
  *
- * Verifies page level checksums in an offline cluster
+ * Verifies page level checksums in a cluster
  *
  *	Copyright (c) 2010-2018, PostgreSQL Global Development Group
  *
@@ -25,7 +25,9 @@
 static int64 files = 0;
 static int64 blocks = 0;
 static int64 badblocks = 0;
+static int64 skippedblocks = 0;
 static ControlFileData *ControlFile;
+static XLogRecPtr checkpointLSN;
 
 static char *only_relfilenode = NULL;
 static bool verbose = false;
@@ -54,6 +56,7 @@ static const char *const skip[] = {
 	"pg_filenode.map",
 	"pg_internal.init",
 	"PG_VERSION",
+	"pgsql_tmp",
 	NULL,
 };
 
@@ -67,8 +70,14 @@ skipfile(const char *fn)
 		return true;
 
 	for (f = skip; *f; f++)
+	{
 		if (strcmp(*f, fn) == 0)
 			return true;
+		if (strcmp(*f, "pg_internal.init") == 0)
+			if (strncmp(*f, fn, strlen(*f)) == 0)
+				return true;
+	}
+
 	return false;
 }
 
@@ -79,10 +88,17 @@ scan_file(const char *fn, BlockNumber segmentno)
 	PageHeader	header = (PageHeader) buf.data;
 	int			f;
 	BlockNumber blockno;
+	bool		block_retry = false;
 
 	f = open(fn, O_RDONLY | PG_BINARY, 0);
 	if (f < 0)
 	{
+		if (errno == ENOENT)
+		{
+			/* File was removed in the meantime */
+			return;
+		}
+
 		fprintf(stderr, _("%s: could not open file \"%s\": %s\n"),
 progname, fn, strerror(errno));
 		exit(1);
@@ -97,26 +113,79 @@ scan_file(const char *fn, BlockNumber segmentno)
 
 		if (r == 0)
 			break;
+		if (r < 0)
+		{
+			fprintf(stderr, _("%s: could not read block %u in file \"%s\": %s\n"),
+					progname, blockno, fn, strerror(errno));
+			return;
+		}
 		if (r != 

Re: Online verification of checksums

2018-09-26 Thread Fabien COELHO




>> The patch is missing a documentation update.


> I've added that now. I think the only change needed was removing the
> "server needs to be offline" part?


Yes, and also checking that the described behavior correspond to the new 
version.



>> There are debatable changes of behavior:
>>
>> if (errno == ENOENT) return / continue...
>>
>> For instance, a file disappearing is ok online, but not so if offline. On
>> the other hand, the probability that a file suddenly disappears while the
>> server offline looks remote, so reporting such issues does not seem
>> useful.
>>
>> However I'm more wary with other continues/skips added. ISTM that skipping
>> a block because of a read error, or because it is new, or some other
>> reasons, is not the same thing, so should be counted & reported
>> differently?


> I think that would complicate things further without a lot of benefit.
>
> After all, we are interested in checksum failures, not necessarily read
> failures etc. so exiting on them (and skip checking possibly large parts
> of PGDATA) looks undesirable to me.


Hmmm.

I'm really saying that it is debatable, so here is some fuel to the 
debate:


If I run the check command and it cannot do its job, there is a problem
which is as bad as a failing checksum. The only safe assumption on a
cannot-read block is that the checksum is bad... So ISTM that on
some of the "skipped" errors there should be an appropriate report (exit
code, final output) that something is amiss.


--
Fabien.



Re: Allowing printf("%m") only where it actually works

2018-09-26 Thread Tom Lane
Michael Paquier  writes:
> On Tue, Sep 25, 2018 at 12:05:42PM -0400, Tom Lane wrote:
>> Alvaro Herrera  writes:
>>> Actually I think it *is* useful to do it like this, because then the
>>> user knows to fix the netmsg.dll problem so that they can continue to
>>> investigate the winsock problem.  If we don't report the secondary error
>>> message, how are users going to figure out how to fix the problem?

>> OK, I'm fine with doing it like that if people want it.

> +1.

OK, pushed 0001 with that adjustment.

While looking over the thread, I remembered I wanted to convert
strerror_r into a wrapper as well.  Think I'll go do that next,
because really it'd be better for snprintf.c to be calling strerror_r
not strerror.

regards, tom lane



Re: Global snapshots

2018-09-26 Thread Arseny Sher
Hello,

Andrey Borodin  writes:

> I like the idea that with this patch set universally all postgres
> instances are bound into single distributed DB, even if they never
> heard about each other before :) This is just amazing. Or do I get
> something wrong?

Yeah, in a sense of xact visibility we can view it like this.

> I've got few questions:
> 1. If we coordinate HA-clusters with replicas, can replicas
> participate if their part of transaction is read-only?

Ok, there are several things to consider. Clock-SI as described in the
paper technically boils down to three things. First two assume CSN-based
implementation of MVCC where local time acts as CSN/snapshot source, and
they impose the following additional rules:

1) When xact expands on some node and imports its snapshot, it must be
  blocked until clocks on this node will show time >= snapshot being
  imported: node never processes xacts with snap 'from the future'
2) When we choose CSN to commit xact with, we must read clocks on
  all the nodes who participated in it and set CSN to max among read
  values.

These rules ensure *integrity* of the snapshot in the face of clock
skews regardless of which node we access: that is, snapshots are stable
(no non-repeatable reads) and no xact is considered half committed: they
prevent situation when the snapshot sees some xact on one node as
committed and on another node as still running.
(Actually, this is only true under the assumption that any distributed
xact is committed at all nodes instantly at the same time; this is
obviously not true, see 3rd point below.)
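The two rules can be sketched in a few lines; this toy model (names and numbers are illustrative only, not from the patchset) shows rule 1 as a wait and rule 2 as a max over the participants' clocks:

```python
# Toy model of Clock-SI rules (1) and (2) with deliberately skewed
# per-node clocks.
class Node:
    def __init__(self, name, skew):
        self.name = name
        self.skew = skew          # offset of this node's clock vs real time

    def now(self, real_time):
        return real_time + self.skew

    def wait_until(self, snapshot, real_time):
        # Rule 1: never serve a snapshot 'from the future'; return the
        # real time at which this node's clock reaches the snapshot.
        return max(real_time, snapshot - self.skew)

def commit_csn(nodes, real_time):
    # Rule 2: the commit CSN is the max of the clocks read on all
    # participating nodes.
    return max(n.now(real_time) for n in nodes)

a, b = Node("A", skew=0), Node("B", skew=-3)   # B's clock lags by 3
snap = a.now(10)               # snapshot taken on A at real time 10
print(b.wait_until(snap, 10))  # -> 13: B must wait 3 ticks before serving it
print(commit_csn([a, b], 10))  # -> 10: max(10, 7)
```

The wait in rule 1 is exactly what makes a lagging node safe to read, at the price of added latency proportional to the clock skew.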

If we are speaking about e.g. traditional usage of hot standby, when
client in one xact accesses either primary, or some standby, but not
several nodes at once, we just don't need this stuff because usual MVCC
in Postgres already provides you consistent snapshot. Same is true for
multimaster-like setups, when each node accepts writes, but client still
accesses single node; if there is a write conflict (e.g. same row
updated on different nodes), one of xacts must be aborted; the snapshot
is still good.

However, if you really intend to read in one xact data from multiple
nodes (e.g. read primary and then replica), then yes, these problems
arise and Clock-SI helps with them. However, honestly it is hard for me
to make up a reason why would you want to do that: reading local data is
always more efficient than visiting several nodes. It would make sense
if we could read primary and replica in parallel, but that currently is
impossible in core Postgres. More straightforward application of the
patchset is sharding, when data is split and you might need to go to
several nodes in one xact to collect needed data.

Also, this patchset adds core algorithm and makes use of it only in
postgres_fdw; you would need to adapt replication (import/export global
snapshot API) to make it work there.

3) The third rule of Clock-SI deals with the following problem.
  Distributed (writing to several nodes) xact doesn't commit (i.e.
  becomes visible) instantly at all nodes. That means that there is a
  time hole in which we can see xact as committed on some node and still
  running on another. To mitigate this, Clock-SI adds kind of two-phase
  commit on visibility: additional state InDoubt which blocks all
  attempts to read this xact changes until xact's fate (commit/abort) is
  determined.

Unlike the previous problem, this issue exists in all replicated
setups. Even if we just have primary streaming data to one hot standby,
xacts are not committed on them instantly and we might observe xact as
committed on primary, then quickly switch to standby and find the data
we have just seen disappeared. remote_apply mode partially alleviates
this problem (apparently to the degree comfortable for most application
developers) by switching positions: with it xact always commits on
replicas earlier than on master. At least this guarantees that the guy
who wrote the xact will definitely see it on replica unless it drops the
connection to master before commit ack. Still the problem is not fully
solved: only addition of InDoubt state can fix this.

While Clock-SI (and this patchset) certainly addresses the issue as it
becomes even more serious in sharded setups (it makes possible to see
/parts/ of transactions), there is nothing CSN or clock specific
here. In theory, we could implement the same two-phase commit on
visibility without switching to timestamp-based CSN MVCC.

Aside from the paper, you can have a look at Clock-SI explanation in
these slides [1] from PGCon.

> 2. How does InDoubt transaction behave when we add or subtract leap seconds?

Good question! In Clock-SI, time can be arbitrary desynchronized and
might go forward with arbitrary speed (e.g. clocks can be stopped), but
it must never go backwards. So if leap second correction is implemented
by doubling the duration of certain second (as it usually seems to be),
we are fine.

> Also, I could not understand some 

Re: Online verification of checksums

2018-09-26 Thread Stephen Frost
Greetings,

* Michael Banck (michael.ba...@credativ.de) wrote:
> Am Mittwoch, den 26.09.2018, 13:23 +0200 schrieb Fabien COELHO:
> > There are debatable changes of behavior:
> > 
> > if (errno == ENOENT) return / continue...
> > 
> > For instance, a file disappearing is ok online, but not so if offline. On 
> > the other hand, the probability that a file suddenly disappears while the 
> > server offline looks remote, so reporting such issues does not seem 
> > useful.
> > 
> > However I'm more wary with other continues/skips added. ISTM that skipping 
> > a block because of a read error, or because it is new, or some other 
> > reasons, is not the same thing, so should be counted & reported 
> > differently?
> 
> I think that would complicate things further without a lot of benefit.
> 
> After all, we are interested in checksum failures, not necessarily read
> failures etc. so exiting on them (and skip checking possibly large parts
> of PGDATA) looks undesirable to me.
> 
> So I have done no changes in this part so far, what do others think
> about this?

I certainly don't see a lot of point in doing much more than what was
discussed previously for 'new' blocks (counting them as skipped and
moving on).

An actual read() error (that is, a failure on a read() call such as
getting back EIO), on the other hand, is something which I'd probably
report back to the user immediately and then move on, and perhaps
report again at the end.

Note that a short read isn't an error and falls under the 'new' blocks
discussion above.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] Bug in to_timestamp().

2018-09-26 Thread Liudmila Mantrova


On 09/22/2018 10:00 PM, Alexander Korotkov wrote:

> On Thu, Sep 20, 2018 at 3:52 PM Alexander Korotkov
>  wrote:
>
>> On Thu, Sep 20, 2018 at 6:09 AM amul sul  wrote:
>>
>>> Agreed, thanks for working on this.
>>
>> Pushed, thanks.
>
> Please, find attached patch improving documentation about
> letters/digits in to_date()/to_timestamp() template string.  I think
> it needs review from a native English speaker.
>
> --
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company


Hi Alexander,

I'm not a native speaker, but let me try to help. A new doc version is 
attached.



--

Liudmila Mantrova
Technical writer at Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9a7f683..1532bcc 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6286,13 +6286,46 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
exceeds the number of separators in the template.
   
   
-   If FX is specified, separator in template string
-   matches to exactly one character in input string.  Notice we don't insist
-   input string character to be the same as template string separator.
+   If FX is specified, a separator in the template
+   string matches exactly one character in the input string.  The
+   input string character does not need to be the same as the template
+   string separator.
For example, to_timestamp('2000/JUN', 'FXYYYY MON')
works, but to_timestamp('2000/JUN', 'FXYYYY  MON')
-   returns an error because a space second template string space consumed
-   letter J from the input string.
+   returns an error because the second space in the template string
+   consumes letter J of the input string.
+  
+ 
+
+ 
+  
+Template strings of to_timestamp and
+to_date functions can also contain arbitrary
+letters/digits between patterns.  Such letters/digits can match
+any characters in the input string.  For example,
+to_timestamp('2000yJUN', 'YYYYxMON') works.
+  
+  
+Letters/digits consume an input string character only if the number
+of extra characters at the beginning of the input string or between
+the identified date/time values is less than or equal to the number
+of the corresponding characters in the template string. This ensures
+that the template string does not consume any characters of date/time
+values when used without the FX option, even if
+a letter/digit separator in the input string appears after a space.
+For example, to_timestamp('2000y JUN', 'YYYYxMON')
+works, but to_timestamp('2000 yJUN', 'YYYYxMON')
+returns an error.
+  
+  
+Note that if the template string contains an arbitrary letter,
+the pattern that precedes this letter becomes greedy and tries
+to match as many characters as possible. For example,
+to_timestamp('2000906901', 'YYYYxMMxDD')
+fails because the YYYY pattern matches
+the whole input string instead of the first four characters.
+Patterns separated by digits are non-greedy, so
+to_timestamp('2000906901', 'YYYY0MM0DD') works fine.
   
  
 


Re: Online verification of checksums

2018-09-26 Thread Michael Banck
Hi,

Am Mittwoch, den 26.09.2018, 13:23 +0200 schrieb Fabien COELHO:
> Patch v3 applies cleanly, code compiles and make check is ok, but the 
> command is probably not tested anywhere, as already mentioned on other 
> threads.

Right.

> The patch is missing a documentation update.

I've added that now. I think the only change needed was removing the
"server needs to be offline" part?

> There are debatable changes of behavior:
> 
> if (errno == ENOENT) return / continue...
> 
> For instance, a file disappearing is ok online, but not so if offline. On 
> the other hand, the probability that a file suddenly disappears while the 
> server offline looks remote, so reporting such issues does not seem 
> useful.
> 
> However I'm more wary with other continues/skips added. ISTM that skipping 
> a block because of a read error, or because it is new, or some other 
> reasons, is not the same thing, so should be counted & reported 
> differently?

I think that would complicate things further without a lot of benefit.

After all, we are interested in checksum failures, not necessarily read
failures etc. so exiting on them (and skip checking possibly large parts
of PGDATA) looks undesirable to me.

So I have made no changes in this part so far; what do others think
about this?

>+ if (block_retry == false)
> 
> Why not trust boolean operations?
> 
>if (!block_retry)

I've changed that as well.

Version 4 is attached.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml
index 905b8f1222..4ad6edcde6 100644
--- a/doc/src/sgml/ref/pg_verify_checksums.sgml
+++ b/doc/src/sgml/ref/pg_verify_checksums.sgml
@@ -37,9 +37,8 @@ PostgreSQL documentation
   Description
   
pg_verify_checksums verifies data checksums in a
-   PostgreSQL cluster.  The server must be shut
-   down cleanly before running pg_verify_checksums.
-   The exit status is zero if there are no checksum errors, otherwise nonzero.
+   PostgreSQL cluster.  The exit status is zero if
+   there are no checksum errors, otherwise nonzero.  
   
  
 
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index 1bc020ab6c..30c26e7a84 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -1,7 +1,7 @@
 /*
  * pg_verify_checksums
  *
- * Verifies page level checksums in an offline cluster
+ * Verifies page level checksums in a cluster
  *
  *	Copyright (c) 2010-2018, PostgreSQL Global Development Group
  *
@@ -25,7 +25,9 @@
 static int64 files = 0;
 static int64 blocks = 0;
 static int64 badblocks = 0;
+static int64 skippedblocks = 0;
 static ControlFileData *ControlFile;
+static XLogRecPtr checkpointLSN;
 
 static char *only_relfilenode = NULL;
 static bool verbose = false;
@@ -54,6 +56,7 @@ static const char *const skip[] = {
 	"pg_filenode.map",
 	"pg_internal.init",
 	"PG_VERSION",
+	"pgsql_tmp",
 	NULL,
 };
 
@@ -67,8 +70,14 @@ skipfile(const char *fn)
 		return true;
 
 	for (f = skip; *f; f++)
+	{
 		if (strcmp(*f, fn) == 0)
 			return true;
+		if (strcmp(*f, "pg_internal.init") == 0)
+			if (strncmp(*f, fn, strlen(*f)) == 0)
+				return true;
+	}
+
 	return false;
 }
 
@@ -79,10 +88,17 @@ scan_file(const char *fn, BlockNumber segmentno)
 	PageHeader	header = (PageHeader) buf.data;
 	int			f;
 	BlockNumber blockno;
+	bool		block_retry = false;
 
 	f = open(fn, O_RDONLY | PG_BINARY, 0);
 	if (f < 0)
 	{
+		if (errno == ENOENT)
+		{
+			/* File was removed in the meantime */
+			return;
+		}
+
 		fprintf(stderr, _("%s: could not open file \"%s\": %s\n"),
				progname, fn, strerror(errno));
 		exit(1);
@@ -99,24 +115,71 @@ scan_file(const char *fn, BlockNumber segmentno)
 			break;
 		if (r != BLCKSZ)
 		{
-			fprintf(stderr, _("%s: could not read block %u in file \"%s\": read %d of %d\n"),
-	progname, blockno, fn, r, BLCKSZ);
-			exit(1);
+			/* Skip partially read blocks */
+			skippedblocks++;
+			continue;
 		}
-		blocks++;
 
 		/* New pages have no checksum yet */
 		if (PageIsNew(header))
+		{
+			skippedblocks++;
 			continue;
+		}
+
+		blocks++;
 
 		csum = pg_checksum_page(buf.data, blockno + segmentno * RELSEG_SIZE);
 		if (csum != header->pd_checksum)
 		{
+			/*
+			 * Retry the block on the first failure.  It's
+			 * possible that we read the first 4K page of
+			 * the block just before postgres updated the
+			 * entire block so it ends up looking torn to
+			 * us.  We only need to retry once because the
+			 * LSN should be updated to 

Re: transction_timestamp() inside of procedures

2018-09-26 Thread Tom Lane
Alvaro Herrera  writes:
> On 2018-Sep-26, Tom Lane wrote:
>> I agree that it would be surprising for transaction timestamp to be newer
>> than statement timestamp.  So for now at least, I'd be satisfied with
>> documenting the behavior.

> Really?  I thought it was practically obvious that for transaction-
> controlling procedures, the transaction timestamp would not necessarily
> be aligned with the statement timestamp.  The surprise would come
> together with the usage of the new feature, so existing users would not
> be surprised in any way.

Nope.  That's the same poor reasoning we've fallen into in some other
cases, of assuming that "the user" is a point source of knowledge.
But DBMSes tend to interact with lots of different code.  If some part
of application A starts using intraprocedure transactions, and then
application B breaks because it wasn't expecting to see xact_start
later than query_start in pg_stat_activity, you've still got a problem.

I'm also a bit hesitant to invent new semantics here based on the
assumption that we've got only one, nonoverlapping, top-level transaction
at a time.  It's not terribly hard to imagine suspend-and-resume-
transaction features coming down the pike at some point.  What will
we do then?  We'll already have a definitional issue for xact_start,
but it'll get worse the more different kinds of xact_start we have.

regards, tom lane



Re: transction_timestamp() inside of procedures

2018-09-26 Thread Alvaro Herrera
On 2018-Sep-26, Tom Lane wrote:

> Bruce Momjian  writes:
> > On Wed, Sep 26, 2018 at 02:38:25PM +0200, Peter Eisentraut wrote:
> >> We could certainly address this by adding three or four or five new
> >> timestamps that cover all these varieties.  But perhaps it's worth
> >> asking what these timestamps are useful for and which ones we really need.
> 
> > Frankly, we might be fine with just documenting it and see if anyone
> > complains.
> 
> I'm not for adding a bunch of new action-start timestamps without very
> clear use-cases for them, because each one we add means more gettimeofday()
> overhead that might or might not ever be useful.
> 
> I agree that it would be surprising for transaction timestamp to be newer
> than statement timestamp.  So for now at least, I'd be satisfied with
> documenting the behavior.

Really?  I thought it was practically obvious that for transaction-
controlling procedures, the transaction timestamp would not necessarily
be aligned with the statement timestamp.  The surprise would come
together with the usage of the new feature, so existing users would not
be surprised in any way.

I do wonder how do other systems behave in this area, though.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: transction_timestamp() inside of procedures

2018-09-26 Thread Alvaro Herrera
On 2018-Sep-25, Bruce Momjian wrote:

> Well, it is an entire paragraph, so it might be too much.  If you
> download the zip file here:
> 
>   http://www.wiscorp.com/sql200n.zip
> 
> and open 5CD2-02-Foundation-2006-01.pdf, at the top of page 289 under
> General Rules, paragraph label 3 has the description.  It talks about
> procedure statements and trigger functions, which seems promising.

I have the 2011 draft, not the 2006 one; you seem to be referring to
 (which is 6.32 in the 2011 draft I have).
General rule 3 is entirely unreadable, and is followed by this note:

  WG3:LCY-025 took no action on the preceding instance of general containment.
  It was felt that this rule is too complicated, to the point of being virtually
  unintelligible. In addition, the rule does not recognize that s can be evaluated implicitly as s. It is believed
  that this rule does not reflect actual practice and should be rewritten to
  align it with implementations. Note that Subclause 15.1, “Effect of opening a
  cursor”, also has a General Rule on this subject. See
  Possible Problem FND-992 .

In SQL2016, this rule was removed completely.

I don't think this offers any practical guidance.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: transction_timestamp() inside of procedures

2018-09-26 Thread Tom Lane
Bruce Momjian  writes:
> On Wed, Sep 26, 2018 at 02:38:25PM +0200, Peter Eisentraut wrote:
>> We could certainly address this by adding three or four or five new
>> timestamps that cover all these varieties.  But perhaps it's worth
>> asking what these timestamps are useful for and which ones we really need.

> Frankly, we might be fine with just documenting it and see if anyone
> complains.

I'm not for adding a bunch of new action-start timestamps without very
clear use-cases for them, because each one we add means more gettimeofday()
overhead that might or might not ever be useful.

I agree that it would be surprising for transaction timestamp to be newer
than statement timestamp.  So for now at least, I'd be satisfied with
documenting the behavior.

regards, tom lane



Re: txid_status returns NULL for recently commited transactions

2018-09-26 Thread Michail Nikolaev
Hello.

Got some new information.

There are 6 replicas and a master in the cluster. I rebooted two replicas... And
they started to work as expected!

So, on master and 4 untouched replicas:
txid_current() -> 4459388265
txid_status(BIGINT '4459388265') -> NULL

On two rebooted replicas:
txid_status(BIGINT '4459388265') -> 'committed'

All replicas are in sync with master.

Root cause is ShmemVariableCache.oldestClogXid value, as described in
previous message.

On master and 4 replicas:
(gdb) p ShmemVariableCache.oldestClogXid
$13 = 2207340131
(gdb) p ShmemVariableCache.oldestXid
$11 = 3764954191

On two rebooted replicas:
(gdb) p ShmemVariableCache.oldestClogXid
$14 = 3764954191
(gdb) p ShmemVariableCache.oldestXid
$12 = 3764954191

SELECT datname, age(datfrozenxid), datfrozenxid FROM pg_database;
template0252790897 4207340131
postgres  302786659 4157344369
template1302786564 4157344464
project 695176837 3764954191

As far as I remember master and replicas were not rebooted after upgrading
from 9.6 to 10. So, maybe the issue is upgrade-related.

On Tue, 25 Sep 2018 at 22:22, Michail Nikolaev :

> Hi, thanks for the reply!
>
>
> > What are you using it for?
>
> I want to use it to validate status of related entities in other database
> (queue) in short interval after PG transaction commit/rollback.
>
>
> > I can't reproduce that...
>
> Yes, it happens only with one cluster. All others work as expected.
>
>
> > Your mailer appears to do very annoying things by converting numbers to
> phone numbers.
>
> Sorry.
>
>
> > It's from the last epoch. Plain xids are 32bit wide, the epochs deal
> > with values that are bigger. And 2207340131 is less than 2^31 in the
> > past.
>
> Yes, and probably it is cause of the issue.
>
> ShmemVariableCache->oldestClogXid = 2207340131
>
> xid_epoch = 1
> xid = 150227913
> TransactionIdPrecedes(xid, ShmemVariableCache->oldestClogXid) returns TRUE,
> then TransactionIdInRecentPast returns FALSE and txid_status returns NULL.
>
> But xid (150227913) and xid_epoch (1) are correct values from my active
> (or recently committed) transaction.
>
> >> SELECT txid_status(BIGINT '4294967295') -> 'committed'.
> >> SELECT txid_status(BIGINT '4294967296') -> NULL
> > Why do you think that is the wrong result?
>
> Let's leave it for now (maybe my misunderstanding). I think it is better
> to deal with "txid_status(txid_current()) -> NULL" issue first.
>
> Thanks,
> Michail.
>


Re: transction_timestamp() inside of procedures

2018-09-26 Thread Bruce Momjian
On Wed, Sep 26, 2018 at 02:38:25PM +0200, Peter Eisentraut wrote:
> On 22/09/2018 00:35, Bruce Momjian wrote:
> > I have always thought of clock/statement/transaction as decreasing levels
> > of time precision, and it might be odd to change that.  I don't think we
> > want to change the behavior of statement_timestamp() in procedures, so
> > that kind of requires us not to change transaction_timestamp() inside of
> > procedures.
> > 
> > However, no change in behavior causes the problem that if you have a
> > transaction block using transaction_timestamp() or CURRENT_TIMESTAMP,
> > and you move it into a procedure, the behavior of those functions will
> > change, but this was always true of moving statement_timestamp() into a
> > function, and I have never heard a complaint about that.
> 
> Right.  Statement timestamp actually means the timestamp of the
> top-level statement, because it's set where the protocol interaction
> happens.  The transaction timestamp is implemented as the statement
> timestamp when the transaction starts, but for intra-procedural
> transactions, the statement timestamp does not advance, so the
> transaction timestamp doesn't change.
> 
> Note that this also affects the xact_start column in pg_stat_activity.
> 
> We could certainly address this by adding three or four or five new
> timestamps that cover all these varieties.  But perhaps it's worth
> asking what these timestamps are useful for and which ones we really need.

Frankly, we might be fine with just documenting it and see if anyone
complains.  I think the top-level statement part is obvious, but I am
not sure the top-level transaction part is.  This is because it is clear
the top-level statement causes all the lower statements underneath it,
but the top-level transaction doesn't control all the transactions under
it in the same way.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: transction_timestamp() inside of procedures

2018-09-26 Thread Peter Eisentraut
On 22/09/2018 00:35, Bruce Momjian wrote:
> I have always thought of clock/statement/transaction as decreasing levels
> of time precision, and it might be odd to change that.  I don't think we
> want to change the behavior of statement_timestamp() in procedures, so
> that kind of requires us not to change transaction_timestamp() inside of
> procedures.
> 
> However, no change in behavior causes the problem that if you have a
> transaction block using transaction_timestamp() or CURRENT_TIMESTAMP,
> and you move it into a procedure, the behavior of those functions will
> change, but this was always true of moving statement_timestamp() into a
> function, and I have never heard a complaint about that.

Right.  Statement timestamp actually means the timestamp of the
top-level statement, because it's set where the protocol interaction
happens.  The transaction timestamp is implemented as the statement
timestamp when the transaction starts, but for intra-procedural
transactions, the statement timestamp does not advance, so the
transaction timestamp doesn't change.

Note that this also affects the xact_start column in pg_stat_activity.

We could certainly address this by adding three or four or five new
timestamps that cover all these varieties.  But perhaps it's worth
asking what these timestamps are useful for and which ones we really need.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [patch]overallocate memory for curly braces in array_out

2018-09-26 Thread Keiichi Hirobe
Hi,

Thanks for the reply.

I was not sure whether to fix the other bug as well, but I did, and I
attached a new patch; please check it.

Please note that I inserted a line for updating "overall_length" at line
1185.

I checked below case.

select ARRAY[ARRAY[1,2,3],ARRAY[4,5,6]]; (regular case)
select '[3:4][2:4]={{1,2,3},{4,5,6}}'::_int4; (needdims case)
select  ARRAY['a"bc','def']; (needquote case)


Cheers,
Keiichi Hirobe

On Mon, 24 Sep 2018 at 23:46, Tom Lane :

> Keiichi Hirobe  writes:
> > Attached is a patch that fixes a bug
> > for miscounting total number of curly braces in output string in
> array_out.
>
> Wow, good catch!
>
> Testing this, I found there's a second way in which the space calculation
> is off: it always allocated one more byte than required, as a result of
> counting one more comma than is really required.  That's not nearly as
> significant as the curly-brace miscount, but it still got in the way of
> doing this:
>
> *** 1234,1239 
> --- 1243,1251 
>   #undef APPENDSTR
>   #undef APPENDCHAR
>
> +   /* Assert that we calculated the string length accurately */
> +   Assert(overall_length == (p - retval + 1));
> +
> pfree(values);
> pfree(needquotes);
>
>
> which seemed to me like a good idea now that we know this code isn't
> so perfect as all that.
>
> Will push shortly.
>
> regards, tom lane
>


array_out_bugfix2.patch
Description: Binary data


Re: Implementing SQL ASSERTION

2018-09-26 Thread Peter Eisentraut
On 25/09/2018 01:04, Joe Wildish wrote:
> Having said all that: there are obviously going to be some expressions
> that cannot be proven to have no potential for invalidating the assertion
> truth. I guess this is the prime concern from a concurrency PoV?

Before we spend more time on this, I think we need to have at least a
plan for that.  Perhaps we should disallow cases that we can't
handle otherwise.  But even that would need some analysis of which
practical cases we can and cannot handle, how we could extend support in
the future, etc.

In the meantime, I have committed parts of your gram.y changes that seem
to come up every time someone dusts off an assertions patch.  Keep that
in mind when you rebase.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Online verification of checksums

2018-09-26 Thread Fabien COELHO



Hello Michael,

Patch v3 applies cleanly, code compiles and make check is ok, but the 
command is probably not tested anywhere, as already mentioned on other 
threads.


The patch is missing a documentation update.

There are debatable changes of behavior:

   if (errno == ENOENT) return / continue...

For instance, a file disappearing is ok online, but not so if offline. On 
the other hand, the probability that a file suddenly disappears while the 
server offline looks remote, so reporting such issues does not seem 
useful.


However I'm more wary with other continues/skips added. ISTM that skipping 
a block because of a read error, or because it is new, or some other 
reasons, is not the same thing, so should be counted & reported 
differently?


  + if (block_retry == false)

Why not trust boolean operations?

  if (!block_retry)

--
Fabien.



Re: pgbench - add pseudo-random permutation function

2018-09-26 Thread Fabien COELHO



Hello,


That is necessary because most people consume PostgreSQL through
packages from distributions that have to work on an Athlon II or
whatever, so we can't just use -msse4.2 for every translation unit.
So it becomes our job to isolate small bits of code that use newer
instructions, if it's really worth the effort to do that, and supply
our own runtime checks and provide a fallback.


Ok. That was my understanding so as to improve the portability/performance 
compromise. I do not think that pgbench is worth the effort on this 
particular point.



[...] None of that seems worth it for something like this.


Indeed.

So, am I right in deducing that you are satisfied with the current status 
of the patch, with the nbits implementation either based on popcount (v4) 
or clz (v5) compiler intrinsics? I think that the clz option is better.


--
Fabien.



Re: pgbench - add pseudo-random permutation function

2018-09-26 Thread Thomas Munro
On Wed, Sep 26, 2018 at 8:26 PM Fabien COELHO  wrote:
> > modular_multiply() is an interesting device.  I will leave this to
> > committers with a stronger mathematical background than me, but I have
> > a small comment in passing:
>
> For testing this function, I have manually commented out the various
> shortcuts so that the manual multiplication code is always used, and the
> tests passed. I just did it again.
>
> > +#ifdef HAVE__BUILTIN_POPCOUNTLL
> > + return __builtin_popcountll(n);
> > +#else /* use clever no branching bitwise operator version */
> >
> > I think it is not enough for the compiler to have
> > __builtin_popcountll().  The CPU that this is eventually executed on
> > must actually have the POPCNT instruction[1] (or equivalent on ARM,
> > POWER etc), or the program will die with SIGILL.
>
> Hmmm, I'd be pretty surprised: The point of the builtin is to delegate to
> the compiler the hassle of selecting the best option available, depending
> on target hardware class: whether it issues a cpu/sse4 instruction is
> not/should not be our problem.

True, it selects based on what you tell it:

$ cat x.c
#include 
#include 

int
main(int argc, char *argv[])
{
printf("%d\n", __builtin_popcount(atoi(argv[1])));
}
$ gdb -q a.out
...
(gdb) disass main
...
   0x004005a4 <+39>: callq  0x4005c0 <__popcountdi2>
...
$ gcc -mpopcnt x.c
$ gdb -q a.out
...
(gdb) disass main
...
   0x0040059f <+34>: popcnt %eax,%eax
...

I'd forgotten one detail... we figure out if need to tell it that we
have SSE4.2 instructions explicitly in the configure script:

checking which CRC-32C implementation to use... SSE 4.2 with runtime check

We enable -msse4.2 just on the one file that needs it, in src/port/Makefile:

pg_crc32c_sse42.o: CFLAGS+=$(CFLAGS_SSE42)

On this CentOS machine I see:

gcc ... -msse4.2 ... -c pg_crc32c_sse42.c -o pg_crc32c_sse42_srv.o

That is necessary because most people consume PostgreSQL through
packages from distributions that have to work on an Athlon II or
whatever, so we can't just use -msse4.2 for every translation unit.
So it becomes our job to isolate small bits of code that use newer
instructions, if it's really worth the effort to do that, and supply
our own runtime checks and provide a fallback.

> > There are CPUs in circulation produced in this decade that don't have
> > it.
>
> Then the compiler, when generating code that is expected to run for a
> large class of hardware which include such old ones, should not use a
> possibly unavailable instruction, or the runtime should take
> responsibility for dynamically selecting a viable option.

Right.  Our problem is that if we didn't do our own runtime testing,
users of (say) Debian and RHEL packages (= most users of PostgreSQL)
would effectively never be able to use things like CRC32 instructions.
None of that seems worth it for something like this.

> An interesting side effect of your mail is that while researching a bit on
> the subject I noticed __builtin_clzll() which helps improve the nbits code
> compared to using popcount. Patch attached uses CLZ insted of POPCOUNT.

It's annoying that fls() didn't make it into POSIX along with ffs().
On my system it compiles to a BSR instruction, just like
__builtin_clz().

-- 
Thomas Munro
http://www.enterprisedb.com



64-bit hash function for hstore and citext data type

2018-09-26 Thread amul sul
Hi all,

Commit [1] added 64-bit hash functions for the core data types, and in the same
discussion thread [2] Robert Haas suggested having similar extended hash
functions for the hstore and citext data types. The attached patches propose that.

Regards,
Amul

1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=81c5e46c490e2426db243eada186995da5bb0ba7
2] 
http://postgr.es/m/ca+tgmoafx2yojuhcqqol5cocei-w_ug4s2xt0etgijnpgch...@mail.gmail.com


hstore-add-extended-hash-function-v1.patch
Description: Binary data


citext-add-extended-hash-function-v1.patch
Description: Binary data


Re: [PATCH] Improve geometric types

2018-09-26 Thread Tomas Vondra
On 09/23/2018 11:55 PM, Tomas Vondra wrote:
> On 09/03/2018 04:08 AM, Tom Lane wrote:
>> Tomas Vondra  writes:
>>> On 08/17/2018 11:23 PM, Tom Lane wrote:
 Yeah, we've definitely hit such problems before.  The geometric logic
 seems particularly prone to it because it's doing more and subtler
 float arithmetic than the rest of the system ... but it's not the sole
 source of minus-zero issues.
>>
>>> We can go through the patch and fix places with obvious -0 risks, but
>>> then what? I don't have access to any powepc machines, so I can't check
>>> if there are any other failures. So the only thing I could do is commit
>>> and fix based on buildfarm failures ...
>>
>> That's the usual solution.  I don't see anything particularly wrong
>> with it.  If the buildfarm results suggest a whole mess of problems,
>> it might be appropriate to revert and rethink; but you shouldn't be
>> afraid to use the buildfarm as a testbed.
>>
> 
> FWIW I plan to get this committed sometime this week, when I'm available
> to fix the expected buildfarm breakage.
> 

Pushed. Now let's wait for the buildfarm to complain ...


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Progress reporting for pg_verify_checksums

2018-09-26 Thread Fabien COELHO



Hello Michael,

About patch v3: applies cleanly and compiles.

The xml documentation should be updated! (It is kind of hard to notice 
what is not there:-)


See "doc/src/sgml/ref/pg_verify_checksums.sgml".


The time() granularity to the second makes the result awkward on small
tests:

 8/1554552 kB (0%, 8 kB/s)
 1040856/1554552 kB (66%, 1040856 kB/s)
 1554552/1554552 kB (100%, 1554552 kB/s)

I'd suggest to reuse the "portability/instr_time.h" stuff which allows a
much better precision.


I still think it would make sense to use that instead of low-precision 
time().



The "kB" kilobyte symbol stands for 1000 bytes, but you are displaying
1024-byte kilos, whose symbol is either "KiB" or "KB". Given that we are
talking about storage, which is usually measured in powers of 1,000, I'd
suggest using thousands.

Another reserve I have is that on any hardware it is likely that there will
be a lot of kilos, so maybe megas (MB) should be used instead.


The display is exactly the same as in pg_basebackup (except the
bandwidth is shown as well), so right now I think it is more useful to be
consistent here.


Hmmm... units are units, and the display is wrong. The fact that other 
commands are wrong as well does not look like a good argument to persist 
in an error.



So if we change that, pg_basebackup should be changed as well I think.


I'm okay with fixing pg_basebackup as well! I'm unsure about the best 
place to put such a function, though. Probably under "src/common/" where 
there is already some front-end specific code ("fe_memutils.c").



Maybe the code could be factored out to some common file in the future.


I would be okay with a progress printing function shared between some 
commands. It just needs some place. pg_rewind also has a --progress option.



Maybe it would be nice to show elapsed time and expected completion time
based on the current speed.


Maybe; this could be added to the factored out common code I mentioned
above.


Yep.


I would be in favor or having this turned on by default, and silenced with
some option. I'm not sure other people would agree, though, so it is an open
question as well.


If this runs in a script, it would be pretty annoying, so everybody
would have to add --no-progress so I am not convinced. Also, both
pg_basebackup and pgbench don't show progress by default.



Ok.


I do not see much advantage in using intermediate string buffers for values:

 + snprintf(totalstr, sizeof(totalstr), INT64_FORMAT,
 +  total_size / 1024);
 + snprintf(currentstr, sizeof(currentstr), INT64_FORMAT,
 +  current_size / 1024);
 + snprintf(currspeedstr, sizeof(currspeedstr), INT64_FORMAT,
 +  (current_size / 1024) / (((time(NULL) - scan_started) == 0) ? 1 : 
(time(NULL) - scan_started)));
 + fprintf(stderr, "%s/%s kB (%d%%, %s kB/s)",
 +  currentstr, totalstr, total_percent, currspeedstr);

ISTM that just one simple fprintf would be fine:

  fprintf(stderr, INT64_FORMAT "/" INT64_FORMAT " ...",
  formulas for each slot...);


That code is adopted from pg_basebackup.c and the comment there says:

| * Separate step to keep platform-dependent format code out of
| * translatable strings.  And we only test for INT64_FORMAT
| * availability in snprintf, not fprintf.

So that sounds legit to me.


Hmmm. Translation. Not sure I like much the idea of translating units,
but why not.


| * If we are reporting to a terminal, send a carriage return so that
| * we stay on the same line.  If not, send a newline.


And it runs a little amok with "-v".


Do you suggest we should make those mutually exclusive?  I agree that in
general, -P is not very useful if -v is on, but if you have a really big
table, it should still be, no?


Yep. I was just mentioning that they interfere on a terminal, but I agree
that there is no obvious fix.


 + memset(&act, '\0', sizeof(act));

pg uses 0 instead of '\0' everywhere else.


Ok.


Not '0', plain 0 (the integer of value zero).

--
Fabien.



Re: pgbench - add pseudo-random permutation function

2018-09-26 Thread Fabien COELHO


Hello Thomas,


modular_multiply() is an interesting device.  I will leave this to
committers with a stronger mathematical background than me, but I have
a small comment in passing:


For testing this function, I have manually commented out the various 
shortcuts so that the manual multiplication code is always used, and the 
tests passed. I just did it again.



+#ifdef HAVE__BUILTIN_POPCOUNTLL
+ return __builtin_popcountll(n);
+#else /* use clever no branching bitwise operator version */

I think it is not enough for the compiler to have 
__builtin_popcountll().  The CPU that this is eventually executed on 
must actually have the POPCNT instruction[1] (or equivalent on ARM, 
POWER etc), or the program will die with SIGILL.


Hmmm, I'd be pretty surprised: The point of the builtin is to delegate to 
the compiler the hassle of selecting the best option available, depending 
on target hardware class: whether it issues a cpu/sse4 instruction is 
not/should not be our problem.


There are CPUs in circulation produced in this decade that don't have 
it.


Then the compiler, when generating code that is expected to run for a 
large class of hardware which include such old ones, should not use a 
possibly unavailable instruction, or the runtime should take 
responsibility for dynamically selecting a viable option.


My understanding is that it should always be safe to call __builtin_XYZ 
functions when available. Then if you compile saying that you want code 
specific to cpu X and then run it on cpu Y and it fails, basically you 
just shot yourself in the foot.


I have previously considered something like this[2], but realised you 
would therefore need a runtime check.  There are a couple of ways to do 
that: see commit a7a73875 for one example, also 
__builtin_cpu_supports(), and direct CPU ID bit checks (some platforms). 
There is also the GCC "multiversion" system that takes care of that for 
you but that worked only for C++ last time I checked.


ISTM that the purpose of a dynamic check would be to have the better 
hardware benefit even when compiling for a generic class of hardware which 
may or may not have the better option.


This approach is fine for reaching a better performance/portability 
compromise, but I do not think that it is that useful for pgbench to go to 
this level of optimization, unless someone else takes care, hence the 
compiler builtin.


An interesting side effect of your mail is that while researching a bit on 
the subject I noticed __builtin_clzll() which helps improve the nbits code 
compared to using popcount. Patch attached uses CLZ insted of POPCOUNT.


--
Fabien.

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index eedaf12d69..bca110ed0c 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -319,6 +319,26 @@ fi])# PGAC_C_BUILTIN_BSWAP64
 
 
 
+# PGAC_C_BUILTIN_CLZLL
+# --------------------
+# Check if the C compiler understands __builtin_clzll(),
+# and define HAVE__BUILTIN_CLZLL if so.
+# Both GCC & CLANG seem to have one.
+AC_DEFUN([PGAC_C_BUILTIN_CLZLL],
+[AC_CACHE_CHECK(for __builtin_clzll, pgac_cv__builtin_clzll,
+[AC_COMPILE_IFELSE([AC_LANG_SOURCE(
+[static unsigned long int x = __builtin_clzll(0xaabbccddeeff0011);]
+)],
+[pgac_cv__builtin_clzll=yes],
+[pgac_cv__builtin_clzll=no])])
+if test x"$pgac_cv__builtin_clzll" = xyes ; then
+AC_DEFINE(HAVE__BUILTIN_CLZLL, 1,
+  [Define to 1 if your compiler understands __builtin_clzll.])
+fi])# PGAC_C_BUILTIN_CLZLL
+
+
+
+
 # PGAC_C_BUILTIN_CONSTANT_P
 # --------------------
 # Check if the C compiler understands __builtin_constant_p(),
diff --git a/configure b/configure
index 23ebfa8f3d..13ba5bc093 100755
--- a/configure
+++ b/configure
@@ -13923,6 +13923,30 @@ if test x"$pgac_cv__builtin_bswap64" = xyes ; then
 
 $as_echo "#define HAVE__BUILTIN_BSWAP64 1" >>confdefs.h
 
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for __builtin_clzll" >&5
+$as_echo_n "checking for __builtin_clzll... " >&6; }
+if ${pgac_cv__builtin_clzll+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+static unsigned long int x = __builtin_clzll(0xaabbccddeeff0011);
+
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  pgac_cv__builtin_clzll=yes
+else
+  pgac_cv__builtin_clzll=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv__builtin_clzll" >&5
+$as_echo "$pgac_cv__builtin_clzll" >&6; }
+if test x"$pgac_cv__builtin_clzll" = xyes ; then
+
+$as_echo "#define HAVE__BUILTIN_CLZLL 1" >>confdefs.h
+
 fi
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for __builtin_constant_p" >&5
 $as_echo_n "checking for __builtin_constant_p... " >&6; }
diff --git a/configure.in b/configure.in
index 530f275993..b9de9346a1 100644
--- a/configure.in
+++ b/configure.in
@@ -1458,6 +1458,7 @@ PGAC_C_TYPES_COMPATIBLE
 PGAC_C_BUILTIN_BSWAP16
 PGAC_C_BUILTIN_BSWAP32
 

Re: Proposal for Signal Detection Refactoring

2018-09-26 Thread Chris Travers
On Tue, Sep 25, 2018 at 3:23 PM Tom Lane  wrote:

> Chris Travers  writes:
> > However,  what I think one could do is use a struct of volatile
> > sig_atomic_t members and macros for checking/setting.  Simply writing a
> > value is safe in C89 and higher.
>
> Yeah, we could group those flags in a struct, but what's the point?
>

This was one of two things I noticed in my previous patch on interrupts and
loops where I wasn't sure what the best practice in our code is.

If we don't want to make this change, then would there be any objection to
me writing up a README describing the flags and best practices for
checking them in our code, based on the places we currently use them?  If
the current approach is unlikely to change in the future, then at least we
can document that the way I went about things is consistent with current
best practices, so the next person doesn't have to wonder.


>
> regards, tom lane
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Proposal for Signal Detection Refactoring

2018-09-26 Thread Chris Travers
On Wed, Sep 26, 2018 at 12:41 AM Michael Paquier 
wrote:

> On Mon, Sep 24, 2018 at 09:27:35PM -0700, Andres Freund wrote:
> > On 2018-09-25 11:50:47 +0900, Michael Paquier wrote:
> >> PGDLLIMPORT changes don't get back-patched as well...
> >
> > We've been more aggressive with that lately, and I think that's good. It
> > just is a unnecessarily portability barrier for extension to be
> > judicious in applying that.
>
> Agreed.  Are there any objections with the proposal of changing the
> interruption pending flags in miscadmin.h to use sig_atomic_t?
> ClientConnectionLost would gain PGDLLIMPORT at the same time.
>

I am strongly in favor of doing this.


> --
> Michael
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Participate in GCI as a Mentor

2018-09-26 Thread Tahir Ramzan
Honorable Concern,

I am planning to mentor these ideas in Google Code-in, as they are
interesting, informative, and doable learning tasks for young students:

Compare PostgreSQL with another RDBMS:
• DB2
• MySQL
• Oracle
• SQLite
• MS SQL Server
• Sybase

Compare PostgreSQL with a NoSQL Column Database:
• Accumulo
• Cassandra
• Druid
• HBase
• Vertica

Compare PostgreSQL with a NoSQL Document Database:
• Apache CouchDB
• ArangoDB
• BaseX
• Clusterpoint
• Couchbase
• Cosmos DB
• IBM Domino
• MarkLogic
• MongoDB
• OrientDB
• Qizx
• RethinkDB

Compare PostgreSQL with NoSQL Key-value Database:
• Aerospike
• Apache Ignite
• ArangoDB
• Berkeley DB
• Couchbase
• Dynamo
• FairCom c-treeACE
• FoundationDB
• InfinityDB
• MemcacheDB
• MUMPS
• Oracle NoSQL Database
• OrientDB
• Redis
• Riak
• SciDB
• SDBM/Flat File dbm
• ZooKeeper

Compare PostgreSQL with a NoSQL Graph Database
• AllegroGraph
• ArangoDB
• InfiniteGraph
• Apache Giraph
• MarkLogic
• Neo4J
• OrientDB
• Virtuoso

Kindly share your precious thoughts and feedback, thanks in anticipation.

-- 
Regards
Tahir Ramzan
MSCS Research Scholar
Google Summer of Code 2015 (CiviCRM)
Google Summer of Code 2016 (ModSecurity)
Outside Collaborator of SpiderLabs (Powered by TrustWave)
Google Android Students Club Facilitator and Organizer 2015

Contact:

+92-312-5518018

tahirram...@alumni.vu.edu.pk


More details about me and my work:

GitHub Profile: https://github.com/tahirramzan

LinkedIn Profile: https://pk.linkedin.com/in/tahirramzan


Re: Implement predicate propagation for non-equivalence clauses

2018-09-26 Thread Richard Guo
Hi,

Thanks, everyone, for reviewing. We updated the patch and added stricter
checks for the non-equivalence clauses before deducing new quals, including:

1) The non-equivalence clause used for deduction can only be of type
OpExpr or ScalarArrayOpExpr, with two arguments.

2) The operator of the non-equivalence clause must be in the eclass's
operator family, if we intend to deduce new quals based on that eclass.

3) The expression in the non-equivalence clause must be equal to the
EquivalenceMember, if we want to replace that expression to get a new qual.

Thanks
Richard

On Thu, Sep 6, 2018 at 12:01 PM Richard Guo  wrote:

> Hi,
>
> On Wed, Sep 5, 2018 at 2:56 PM, Tom Lane  wrote:
>
>> Richard Guo  writes:
>> > In this patch, we are trying to do the similar deduction, from
>> > non-equivalence
>> > clauses, that is, A=B AND f(A) implies A=B AND f(A) and f(B), under some
>> > restrictions on f.
>>
>> Uh, *what* restrictions on f()?  In general the above equivalence
>> does not hold, at least not for any data type more complicated than
>> integers; and we do not have any semantic model for deciding
>> which functions it would be correct for.
>>
>
> Exactly! The operator in f() should at least be in the same opfamily as
> the equivalence class containing A and B.
> Besides, as far as I can tell, the clause in f() should not contain
> volatile functions or subplans. Not sure if these restrictions are
> enough to make it safe.
>
>
>> One simple example to show what I'm talking about is that float8 zero
>> and minus zero are equal according to float8eq (assuming IEEE float
>> arithmetic); but they aren't equivalent for any function f() that is
>> sensitive to the sign or the text representation of the value.
>> The numeric data type likewise has values that are "equal" without
>> being identical for all purposes, eg 0.0 vs 0.000.  Or consider
>> citext.
>>
>
> Thanks for the example. Heikki materialized this example as:
>
> create table a (f float8);
> create table b (f float8);
>
> insert into a values ('0'), ('-0');
> insert into b values ('0'), ('-0');
>
> select * from a, b where a.f = b.f and a.f::text <> '-0';
>
> And run that query, this patch would give wrong result. Will address this
> in v2.
>
>
>> The existing planner deduction rules for equivalence classes are
>> carefully designed to ensure that we only generate derived clauses
>> using operators from the same operator class or family, so that
>> it's on the opclass author to ensure that the operators have consistent
>> semantics.  I don't think we can extrapolate from that to any random
>> function that accepts the datatype.
>
>> regards, tom lane
>>
>
>


v2-0001-Implement-predicate-propagation-for-non-equivalence-clauses.patch
Description: Binary data


Re: file cloning in pg_upgrade and CREATE DATABASE

2018-09-26 Thread Michael Paquier
Hi Peter,

On Sat, Sep 01, 2018 at 07:28:37AM +0200, Peter Eisentraut wrote:
> rebased patch, no functionality changes

Could you rebase once again?  I am going through the patch and wanted to
test pg_upgrade on Linux with XFS, but it does not apply anymore.

Thanks,
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Add regress test for pg_read_all_stats role

2018-09-26 Thread Michael Paquier
On Fri, Aug 24, 2018 at 03:37:17PM +0100, Alexandra Ryzhevich wrote:
> CONNECT ON DATABASE privilege is granted to public by default (so by
> default all users can select pg_database_size()), and a
> REVOKE CONNECT ... FROM  command doesn't impact the
> user's privileges. The only way to revoke the connect privilege from a
> user is to revoke it from public. That's why it may be less harmful
> just to remove the pg_database_size tests entirely. But would it be
> better to remove the pg_tablespace_size() tests also? If the possible
> costs are more harmful than not testing this code path, then I'll
> remove those tests as well.

Sorry for the delay, Alexandra, and thanks for the new version.  I have
removed the tests related to pg_tablespace_size as they are costly and
those for vacuum_cost_delay as we didn't gain more with those tests.
The rest has been committed, which makes a good first cut.
--
Michael


signature.asc
Description: PGP signature