[HACKERS] Re: Patch to address concerns about ICU collcollate stability in v10 (Was: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?)

2017-09-26 Thread Peter Geoghegan
On Sun, Sep 24, 2017 at 9:24 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> * Documents the aforementioned keyword collation attribute restriction
> on ICU versions before ICU 54. This was needed anyway. We only claim
> for Postgres collations what the ICU docs claim for ICU collators,
> even though there is reason to believe that some ICU versions before
> ICU 54 actually can do better.

Following some digging, it looks like the restriction actually only
needs to apply on versions prior to ICU 4.6. The internal function
_canonicalize() contains the functionality needed for Postgres to
avoid converting to BCP 47 itself on the fly (this was previously
believed to be the case only as of ICU 54):

https://ssl.icu-project.org/trac/browser/tags/release-50-1-2/icu4c/source/common/uloc.cpp#L1614

(This is uloc.c on earlier ICU versions.)

(The call stack here: ucol_open() calls ucol_open_internal(), which
calls uloc_getBaseName(), which calls _canonicalize().)

ICU gained that _hasBCP47Extension() branch in ICU 4.6. The
_canonicalize() function has hardly changed since, despite the rewrite
from C to C++ (I checked up until ICU 55, a reasonably current
version, and the one I've done most testing on). This is totally
consistent with
what both Peter and I have observed, even though the ucol_open() docs
suggest that that stuff only became available within ICU 54. I think
that ucol_open() was updated to mention BCP 47 at the same time as it
mentioned the lifting of the restrictions on extended keywords (when
you no longer had to use ucol_setAttribute()), confusing two basically
unrelated issues.
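
To make this concrete, a minimal standalone test of the kind I mean
(my own sketch, not from the patch; the tag is just an arbitrary
example) would be the following. On any ICU >= 4.6, it should open a
German phonebook-order collator from the BCP 47 spelling alone:

/* compile with: gcc test.c -licuuc -licui18n */
#include <stdio.h>
#include <unicode/ucol.h>
#include <unicode/uloc.h>

int
main(void)
{
    UErrorCode  status = U_ZERO_ERROR;
    /* BCP 47 spelling; the legacy equivalent is "de@collation=phonebook" */
    UCollator  *coll = ucol_open("de-u-co-phonebk", &status);

    if (U_FAILURE(status))
    {
        printf("ucol_open failed: %s\n", u_errorName(status));
        return 1;
    }
    printf("resolved locale: %s\n",
           ucol_getLocaleByType(coll, ULOC_VALID_LOCALE, &status));
    ucol_close(coll);
    return 0;
}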

I suggest that as a compromise, my proposal can be changed in either
of the following ways:

* Do the same thing as I propose, except only introduce the mapping
from/to language tags when an ICU version < 4.6 is in use. This would
mean that we'd be doing that far less in practice.

Or:

* Never do *any* mapping/language tag conversion when opening a
collator, but still consistently canonicalize as BCP 47 on all ICU
versions. This can be done by only supporting versions that have the
required _hasBCP47Extension() branch (ICU >= 4.6). We'd desupport the
old ICU 4.2 and ICU 4.4 versions.

Support for ICU 4.2 and ICU 4.4 came fairly late, in commit eccead9 on
August 5th. Both Tom and I expressed reservations about that at the
time. It doesn't seem too late to desupport those 2 versions, leaving
us with a fix that is even less invasive. We actually wouldn't
technically be changing anything about canonicalization at all this
way. We'd be changing our understanding of when the restriction
applies (the cutoff becomes < 46, not < 54), at which point the
"collcollate =
U_ICU_VERSION_MAJOR_NUM >= 54 ? langtag : name" thing that I take
particular issue with becomes "collcollate = U_ICU_VERSION_MAJOR_NUM
>= 46 ? langtag : name".

This is equivalent to "collcollate = langtag" (which is what it would
actually look like), since we're desupporting earlier versions.

-- 
Peter Geoghegan




Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-25 Thread Peter Geoghegan
On Mon, Sep 25, 2017 at 12:52 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> That must have been the real reason why you canonicalized
> pg_collation.collname (I doubt it had anything to do with how keyword
> variants used to be created during initdb, as you suggested). As Tom
> pointed out recently, we've actually always canonicalized collation
> name for libc.

On further examination, none of this really matters, because you
simply cannot store ICU locale names like "en_US" within pg_collation;
it's impossible to do that without breaking many things that have
worked for a long time. initdb already canonicalizes the available
libc collations to produce collations whose names have exactly the
same "en_US" format. There will typically be both "en_US" and
"en_US.utf8" entries within pg_collation with Glibc on Linux, for example
(the former is created as a convenient alias for the latter when the
database encoding is UTF-8).

-- 
Peter Geoghegan




Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-25 Thread Peter Geoghegan
On Mon, Sep 25, 2017 at 12:14 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> But then our users categorically have to know about both formats,
> without any practical benefit to make up for it. You will also get
> people that don't realize that only one format is supported on some
> versions if you go this way.

Oh, and if you go this way there is also a much bigger disadvantage:
you also break pg_restore when you cross the ICU 54 version boundary
in *either* direction (pg_collation.collname will be different, so
actually equivalent collations will not be recognized as such for
dump/restore purposes). This scenario seems quite likely, even,
because ICU 54 isn't that old (we support versions as old as ICU 4.2),
and because merely having used an initdb ICU collation is enough for
it to happen (no CREATE COLLATION is necessary). Sure, you *may* be
able to first do a
manual CREATE COLLATION when that happens, and that will be picked up
during the restore. But that's very user hostile.

That must have been the real reason why you canonicalized
pg_collation.collname (I doubt it had anything to do with how keyword
variants used to be created during initdb, as you suggested). As Tom
pointed out recently, we've actually always canonicalized collation
name for libc.

-- 
Peter Geoghegan




Re: [HACKERS] Patch to address concerns about ICU collcollate stability in v10 (Was: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?)

2017-09-25 Thread Peter Geoghegan
On Mon, Sep 25, 2017 at 11:42 AM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> On 9/25/17 00:24, Peter Geoghegan wrote:
>> * Creates root collation as root-x-icu (collcollate "root"), not
>> und-x-icu. "und" means undefined language.
>
> I'm curious about this point.  "und" is defined in BCP 47.  I don't see
> "root" defined anywhere.  ICU converts the root collation to "und",
> AFAIK, so it seems to agree with the current naming.

In my patch, "root" is a string that is passed to get a language tag.
That's technically in the old format.

I think that this is another ICU vs. UCA/CLDR thing (this causes much
confusion). Note that "root" is mentioned in the ICU locale explorer,
for example: https://ssl.icu-project.org/icu-bin/locexp

Note also that ucol_open() comments/docs say this:

 * @param loc The locale containing the required collation rules.
 *            Special values for locales can be passed in -
 *            if NULL is passed for the locale, the default locale
 *            collation rules will be used. If empty string ("") or
 *            "root" are passed, UCA rules will be used.

I went with "root" because that produces a meaningful/useful display
name for pg_collation, and seems to be widely used elsewhere.

-- 
Peter Geoghegan




Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-25 Thread Peter Geoghegan
On Mon, Sep 25, 2017 at 11:40 AM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> On 9/22/17 16:46, Peter Geoghegan wrote:
>> But you are *already* canonicalizing ICU collation names as BCP 47. My
>> point here is: Why not finish the job off, and *also* canonicalize
>> collcollate in the same way? This won't break ucol_open() if we take
>> appropriate precautions when we go to use the Postgres collation/ICU
>> locale.
>
> Reading over this code again, it is admittedly not quite clear why this
> "canonicalization" is in there right now.  I think it had something to
> do with how we built the keyword variants at one point.  It might not
> make sense.  I'd be glad to take that out and use the result straight
> from uloc_getAvailable() for collcollate.  That is, after all, the
> "canonical" version that ICU chooses to report to us.

But then our users categorically have to know about both formats,
without any practical benefit to make up for it. You will also get
people who don't realize that only one format is supported on some
versions if you go this way.

>> One concern that makes me suggest this is: What happens when
>> the user *downgrades* ICU version, from a version where colcollate is
>> BCP 47 to one where it would not have been at initdb time? That will
>> break the downgrade in an unpleasant way, including in installations
>> that never do a CREATE COLLATION themselves. We want to be able to
>> restore a basebackup on a somewhat different OS, and have that still
>> work following REINDEX. At least, that seems like it should be an
>> important goal for us.
>
> This is an interesting point, and my proposal above would fix that.

I've already written a patch to standardize collcollate. If we
instead do it the way you describe above, then what happens when ICU
finally removes the already-deprecated legacy format?

> However, I think that taking a PostgreSQL data directory and moving or
> copying it to an *older* OS installation is always going to have a
> potential for problems.  So I wouldn't spend a huge amount of effort
> just to fix this specific case.

The downgrade thing is just the simplest, most immediate example of
where failing to standardize collcollate now could cause problems.

-- 
Peter Geoghegan




Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-25 Thread Peter Geoghegan
On Mon, Sep 25, 2017 at 9:06 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> The big concern I have here is that this feels a lot like something that
>> we'll regret at leisure, if it's not right in the first release.  I'd
>> much rather be restrictive in v10 and then loosen the rules later, than
>> be lax in v10 and then have to argue about whether to break backwards
>> compatibility in order to gain saner behavior.
>
> I think it's inevitable that a certain number of users are going to
> have to cope with ICU version changes breaking stuff.

Wasn't the main point of adopting ICU that that doesn't happen when it
isn't essential? There will be some risk with pg_dump across ICU
versions, due to rare political changes. But, on pg_upgrade, the old
collations will continue to work, even if they would not have appeared
at initdb time on that ICU version, because ICU has all kinds of
fallbacks. It will work with a language it has heard of, but a country
it has never heard of, for example. I just want to have a consistent
collcollate format, to consistently take advantage of that.
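
That fallback behavior is even observable programmatically, as I
understand the API: ICU reports it through the status code. A quick
throwaway sketch (mine, not from the patch):

#include <stdio.h>
#include <unicode/ucol.h>

int
main(void)
{
    UErrorCode  status = U_ZERO_ERROR;
    /* Japanese as spoken in Madagascar -- no such tailoring exists */
    UCollator  *coll = ucol_open("ja_MG", &status);

    if (U_FAILURE(status))
        return 1;
    if (status == U_USING_FALLBACK_WARNING)
        printf("opened, falling back to a related locale\n");
    else if (status == U_USING_DEFAULT_WARNING)
        printf("opened, falling back to the default/root rules\n");
    ucol_close(coll);
    return 0;
}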

My patch has no behavioral changes, apart from the sanitization, on
ICU >= 54, BTW. It's really not very much code, especially when you
exclude objective bug fixes.

> If ICU decides
> a collation is stupid or unused and drops it, or is mis-defined and
> redefines it to some behavior that breaks things for somebody, they
> are going to have to deal with it.  I don't think you can make that
> problem go away by any amount of strictness introduced into v10, but
> if you make the checks zealous enough, you can probably make them rule
> out input that users would have preferred to have accepted.

The sanitization thing isn't my main concern. The stability issue is
the major issue, and it isn't all that connected to sanitization.

> I also think that if there's a compelling reason to bet on BCP 47 to
> be a stable canonical form, I haven't heard it presented here.  At the
> risk of repeating myself, it's not even supported in some ICU versions
> we support, so how's that going to work?  And if it's been changed in
> the recent past, why not again?  Peter Geoghegan said that he doesn't
> know of any plans to eliminate BCP 47 support, but that doesn't seem
> like it's much proof of anything.

It's described by an IETF standard, and the extensions are described
by Unicode Technical Standard #35, a formal standard [1]. The BCP 47
format is not an ICU thing at all -- it's a Unicode consortium thing.
An express goal of BCP 47 is forward and backward compatibility [2].
And, ICU themselves have said that that's what they're standardizing
on, since they are upstream consumers of the CLDR (Unicode consortium)
locale data. They've done that because they want to move away from
their own format towards a universal format, usable in a wide variety
of contexts.

While I don't pretend to be able to predict the future, I think that
this is something that is as safe as is practically achievable to
standardize on. (It also has all kinds of provision for further
extension, should that prove necessary.)

> If, on the other hand, you introduce an error check that is overly
> stringent and precludes people from defining collations that are legal
> and useful (in their judgement, not yours), I intend to whine about
> that extensively.  And that applies to 10, 11, and any future versions
> for which I may be around.

That's not going to happen.

> In short, I judge that allowing users access to *all* of the things
> that ICU has now, has had in the past in versions we support, or may
> have in the future is an important goal, but that preventing them from
> relying on options that may go away is not a goal at all, since
> barring the ability to predict the future, it's impossible anyway.

+1

> If it's possible to write a precisely-targeted check that
> rules out only actually-meaningless collations and nothing else, I'm
> fine with that.

That's what I've done with my patch, IMV. I admit that it's still a
tiny bit subjective that we should reject an empty string, and not
allow the fallback there, for example (we could instead fall back on
that as "undefined"). I think that that's not going to be a problem
for anyone.

[1] http://unicode.org/reports/tr35/tr35-collation.html
[2] https://tools.ietf.org/html/bcp47#section-3.4
-- 
Peter Geoghegan




[HACKERS] Patch to address concerns about ICU collcollate stability in v10 (Was: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?)

2017-09-24 Thread Peter Geoghegan
On Fri, Sep 22, 2017 at 11:34 AM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> After reviewing this thread and testing around a bit, I think the code
> is mostly fine as it is, but the documentation is lacking.  Hence,
> attached is a patch to expand the documentation quite a bit, especially
> to document in more detail what ICU locale strings are accepted.

Attached is my counterproposal. This patch has us always make sure
that collcollate is in BCP 47 format, even on ICU versions prior to
ICU 54.

Other improvements/bug fixes:

* Sanitizes locale names within CREATE COLLATION. This has been tested
on ICU 4.2 (earliest supported version) and ICU 55.

* Enforces a NAMEDATALEN restriction for collcollate during CREATE
COLLATION, forbidding collcollate truncation. This is useful because
truncating can allow subtly wrong answers later on.

* Adds DEBUG1 message with ICU display name, so we can satisfy
ourselves that we're getting the expected behavior, to some degree.

I used this to confirm that we get consistent behavior between ICU
4.2 and 55 for CREATE COLLATION. On ICU 4.2, keyword collation attributes
(e.g., the emoji keyword, numeric ordering for natural sort order)
still don't work, just as before, but the locale string is still
considered valid. (This is because ucol_setAttribute() is supposed to
be used there).
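
For reference, the ucol_setAttribute() route on those older versions
looks roughly like this (an illustrative sketch only; the helper name
is made up):

#include <unicode/ucol.h>

static UCollator *
open_numeric_collator(const char *locale)
{
    UErrorCode  status = U_ZERO_ERROR;
    UCollator  *coll = ucol_open(locale, &status);

    if (U_FAILURE(status))
        return NULL;
    /* equivalent of the "-kn-true" / colnumeric=yes locale keyword */
    ucol_setAttribute(coll, UCOL_NUMERIC_COLLATION, UCOL_ON, &status);
    if (U_FAILURE(status))
    {
        ucol_close(coll);
        return NULL;
    }
    return coll;
}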

* Documents the aforementioned keyword collation attribute restriction
on ICU versions before ICU 54. This was needed anyway. We only claim
for Postgres collations what the ICU docs claim for ICU collators,
even though there is reason to believe that some ICU versions before
ICU 54 actually can do better.

* When using ICU 4.2, the examples in the docs (variant collations
like German Phonebook order) now actually work.

The examples are completely broken right now IMV, because the user has
to know that they are on ICU 4.2, which only accepts the old style
locale strings as things stand. And, they'd have no obvious indication
that things were broken without this patch, because there would have
been no sanitization or other feedback.

* Creates root collation as root-x-icu (collcollate "root"), not
und-x-icu. "und" means undefined language.

* Moves all knowledge of ICU version issues to within a few
pg_locale.c routines, leaving the code reasonably well encapsulated.

* Does an encoding conversion when getting a display name for the
initdb collation comment. This needs to be ASCII-safe due to the
initdb/template0 DB encoding restriction, but I suspect that the way
we do it now is subtly wrong. This does imply that SQL_ASCII databases
will never get ICU pg_collation entries that come with comments added
by initdb, but such databases were already unable to use the
collations anyway, so this is no loss at all. (SQL_ASCII is mentioned
here as an example of a database collation that ICU doesn't support,
all of which are affected in this way.)

I decided to implement CREATE COLLATION sanitization based on whether
or not a display string can be generated (if not, or if it's empty, we
reject). This seems like about the right level of strictness to me,
because we're still very forgiving. I admit that that's a little bit
arbitrary, but it does seem to be a good match for Postgres; it is
forgiving of things that could plausibly make sense on another ICU
version to some user at some time, but rejects most things that are
inherently wrong, IMHO. You can still ask for Japanese as spoken in
Madagascar, or even specify a language that ICU has never heard of,
and there is no error. It catches syntax errors only. See the slightly
expanded tests for details. I'm very open to negotiating the exact
details of how we sanitize, but any level of sanitization will be at
least a little bit arbitrary (including what we have right now, which
is no sanitization).
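
In ICU API terms, the check amounts to something like the following
(a simplified sketch of the rule just described, not the patch's
actual code; the function name is invented):

#include <stdbool.h>
#include <unicode/uloc.h>

static bool
icu_locale_looks_sane(const char *locale)
{
    UChar       displayname[128];
    UErrorCode  status = U_ZERO_ERROR;
    int32_t     len;

    len = uloc_getDisplayName(locale, "en", displayname,
                              sizeof(displayname) / sizeof(UChar),
                              &status);
    /* syntax errors, overflow, and empty display names all reject */
    return U_SUCCESS(status) && len > 0;
}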

Aside from the specific problems for Postgres that I've noted, which
the patch prevents or fixes, there is another reason to do this. The
old-style locale name format is officially deprecated by ICU, which
makes it seem like we should never expose it to users in the first
place.
Per ICU docs:

"Starting with ICU 54, the following naming scheme and its API
functions are deprecated. Use ucol_open() with language tag collation
keywords instead (see Collation API Details)" [1]

[1] 
http://userguide.icu-project.org/collation/concepts#TOC-Collator-naming-scheme
-- 
Peter Geoghegan
From 989bc2f877aa01af4ac140a43a5cebcffe8b3ec9 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <p...@bowt.ie>
Date: Thu, 21 Sep 2017 17:34:13 -0700
Subject: [PATCH] Consistently canonicalize ICU collations' collcollate as BCP
 47.

Previously, on versions of ICU prior to ICU 54 collcollate was stored in
the legacy locale format at initdb time.  We now always use the BCP 47
format there.  To make that work, collcollate is now converted from BCP
47 format to the old locale tag format on-the-fly as an ICU collator is
initially opened.

Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

2017-09-24 Thread Peter Geoghegan
On Sun, Sep 24, 2017 at 5:04 AM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> On 9/22/17 12:25, Peter Geoghegan wrote:
>> On Fri, Sep 22, 2017 at 7:25 AM, Peter Eisentraut
>> <peter.eisentr...@2ndquadrant.com> wrote:
>>> I agree.  The attached patch should do it.
>>
>> I see one small issue here: You'll now need to set ssup->comparator
>> back to NULL before you return early in the Windows' libc case. That
>> way, a shim comparator (that goes through bttextcmp(), in the case of
>> text) will be installed within FinishSortSupportFunction(). Other than
>> that, looks good to me.
>
> committed accordingly

Thanks.

I am currently working on a patch for the issues with ICU collcollate
stability as I see them. I should be able to post something later
today or tomorrow. I would appreciate it if you held off on committing
anything there until you've considered what I'll propose.

-- 
Peter Geoghegan




Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-22 Thread Peter Geoghegan
On Fri, Sep 22, 2017 at 8:56 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> The big concern I have here is that this feels a lot like something that
> we'll regret at leisure, if it's not right in the first release.  I'd
> much rather be restrictive in v10 and then loosen the rules later, than
> be lax in v10 and then have to argue about whether to break backwards
> compatibility in order to gain saner behavior.

To the best of my knowledge, the only restriction implied by limiting
ourselves to the BCP 47 format (as part of standardizing what is
stored in pg_collation) is that users might know about the traditional
locale strings from some other place, and be surprised when their
knowledge doesn't transfer to Postgres. Personally, I don't think that
that's a big deal. If it actually is important, then I'm surprised
that it took this long for a doc change mentioning it to be proposed
(though the docs *do* say "Collations provided by ICU are created with
names in BCP 47 language tag format").

>> We have never canonicalized collations before and therefore it is not
>> essential that we do that now.
>
> Actually, we try; see initdb.c's check_locale_name().  It's not our
> fault that setlocale(3) fails to play along on many platforms.

But it will be our fault if we ship a v10 that does the kind of
unsettled canonicalization you see within
pg_import_system_collations() (the "collcollate =
U_ICU_VERSION_MAJOR_NUM >= 54 ? langtag : name" thing). That looks
very much like the tail wagging the dog to me.

-- 
Peter Geoghegan




Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-22 Thread Peter Geoghegan
On Fri, Sep 22, 2017 at 5:58 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Sep 22, 2017 at 4:46 PM, Peter Geoghegan <p...@bowt.ie> wrote:
>> But you are *already* canonicalizing ICU collation names as BCP 47. My
>> point here is: Why not finish the job off, and *also* canonicalize
>> collcollate in the same way?
>
> Peter, with respect, it's time to let this argument go.  We're
> scheduled to wrap a GA release in just over 72 hours.  It is far too
> late to change behavior like this.

I didn't say that it wasn't. That's above my paygrade.

> On the substantive issue, I am inclined (admittedly without deep
> study) to agree with Peter Eisentraut.  We have never canonicalized
> collations before and therefore it is not essential that we do that
> now.

As I've said, we're *already* canonicalizing them for ICU. Just not
consistently (across ICU versions, and arguably even within ICU
versions). That's the problem -- we're half way between both
positions.

The problem is most emphatically *not* that we've failed to
canonicalize them in the way that I happen to favor.

> That would be a new feature, and I don't think I'd be prepared
> to endorse adding it three days after feature freeze let alone three
> days before the GA wrap.  I do agree that the lack of canonicalization
> is utterly terrible.  The APIs that Unix-like operating systems
> provide for collations are poorly suited to our purposes and
> hopelessly squishy about semantics, and it's not clear how much better
> ICU will be.

In one important sense, this is a regression against libc, because you
never had something like en_US.UTF-8 break when downgrading the
glibc version (say, when restoring a basebackup on a different OS
with the same architecture). Sure, you probably had to REINDEX text
indexes, to be
on the safe side, but once you did that there was no question about
the older glibc having never heard of "en_US.UTF-8" as a
LC_COLLATE/collcollate.

I regret that I didn't catch it sooner. It now seems very obvious, and
totally preventable given enough time.

> I simply do not buy the theory that this cannot be changed later.

It can be changed later, of course -- at greater, though indeterminate cost.

> It's been the case for as long as we've had pg_collate that a new
> system could have different collations than the old one, resulting in
> a dump/restore failure.  I expect somebody's had that problem at some
> point, but I don't think it's become a major pain point because most
> people don't use exotic collations, and if they do they probably
> understand that they need those exotic collations to be on the new
> system too.

Like I said, you don't need exotic collations to have the downgrade
problem, unless *any* initdb ICU collation counts as exotic. No CREATE
COLLATION is needed.

> I also believe that Peter Eisentraut is entirely correct to be
> concerned about whether BCP 47 (or anything else) can really be
> regarded as a stable canonical form for ICU purposes.  His email
> indicates that the acceptable and canonical forms have changed
> multiple times in the course of releases new enough for us to care
> about them.  Assuming that statement is correct, it would be extremely
> short-sighted of us to bank on them not changing any more.

That statement isn't correct, and neither is the suggestion that Peter
Eisentraut ever thought it. ICU uses BCP 47 for the collation name *across
all versions*. Just not as the collcollate value (that's only the case
on versions of ICU >= 54).

> But even if all of the above argumentation is utterly and completely
> wrong, dredged up from the universe's deepest and most profound
> reserves of stupidity and destined for future entry into Webster's as
> the canonical example of cluelessness, we still shouldn't change it
> the weekend before the GA wraps.

That seems like a value judgement. I'm not going to tell you that
you're wrong. What I will say is that I think we've done poorly here.

-- 
Peter Geoghegan




Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-22 Thread Peter Geoghegan
I'd want to check a variety of display names at CREATE COLLATION time
(I can provide a
rough patch that shows display name [1]), to make sure that it matches
expectations on all ICU versions. Your testing patch does not satisfy
me that versions prior to ICU 54 have a ucol_open() that accepts BCP
47 without *any* problem (it could be a subtle issue).

Besides, if it isn't going to work on ICU 4.2 anyway, I think that the
leniency of ICU 52 isn't actually interesting. We still have an
awkward special case to deal with.

> One conclusion here is that there are multiple dimensions to what sort
> of thing is accepted as a locale in different versions of ICU and what
> the canonical format is.  If we insist that everything has to be written
> in the form that is preferred today, then we'll have awkward problems if
> a future version of ICU establishes even more variants that will then be
> preferred.

I'd feel a lot better about that if there was a NOTICE showing the
ICU display name for the locale when the user does a CREATE
COLLATION. I think that the risks from not doing that outweigh the
risks of doing it, even at this late date. I could live without any
sanitization if we did this much. Without this, the user is flying
blind when using CREATE COLLATION. How else is the DBA supposed to
verify that the sorting they get is culturally appropriate? That's
just not practical.

> I think there is room for incremental refinement here.  Features like
> optionally checking or printing the canonical or preferred locale format
> or making the locale description available via a function or system view
> might all be valuable additions to a future PostgreSQL release.

I don't think that there is room for incremental refinement when it
comes to what ends up in pg_collation at initdb time. I don't like to
create commotion at this late date, but there are some things that
we'll only really get one chance at here. Admittedly, that's not true
of everything I raise here; it's hard to strictly limit discussion to
issues that have that quality.

[1] 
postgr.es/m/cah2-wznopmj+3xh6bvea_yuyd4zdgiwg9yce31q09ou3xxw...@mail.gmail.com

-- 
Peter Geoghegan




Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

2017-09-22 Thread Peter Geoghegan
On Fri, Sep 22, 2017 at 7:25 AM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> I agree.  The attached patch should do it.

I see one small issue here: You'll now need to set ssup->comparator
back to NULL before you return early in the Windows' libc case. That
way, a shim comparator (that goes through bttextcmp(), in the case of
text) will be installed within FinishSortSupportFunction(). Other than
that, looks good to me.

Thanks
-- 
Peter Geoghegan




Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-22 Thread Peter Geoghegan
On Fri, Sep 22, 2017 at 1:50 AM, Andreas Karlsson <andr...@proxel.se> wrote:
> On 09/21/2017 10:55 PM, Peter Geoghegan wrote:
>>
>> I agree, but the bigger issue is that we're *half way* between
>> supporting only one format, and supporting two formats. AFAICT, there
>> is no reason that we can't simply support one format on all ICU
>> versions, and keep what ends up within pg_collation at initdb time
>> essentially the same across ICU versions (except for those that are
>> due to cultural/political developments).
>
>
> I think we are in agreement then, but I do not have the time to get this
> done before the release of 10 so would be happy if you implemented it. Peter
> E, what do you say to this?

I can write a patch for this, but will not without some kind of buy-in
from Peter E.

-- 
Peter Geoghegan




Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-21 Thread Peter Geoghegan
On Tue, Sep 19, 2017 at 7:01 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> I really think we need to add some kind of debug mode that makes ICU
> optionally spit out a locale display name at key points. We need this
> to gain confidence that the behavior that ICU provides actually
> matches what is expected across different ICU versions for different
> locale formats. We talked about this as a user-facing feature before,
> which can wait till v11; I just want this to debug problems like this
> one.

I patched CREATE COLLATION to show the ICU display name, which produces
output like this:

postgres=# create collation basque (provider=icu,
locale='eu-u-kf-upper-kr-latn-digit-em-emoji-kn-true-co-eor');
NOTICE:  00000: ICU collation description is "Basque
(colcasefirst=upper, Sort Order=European Ordering Rules,
colnumeric=yes, colreorder=latn-digit, em=emoji)"
CREATE COLLATION

I used an ISO 639-1 language code (a two-letter language code) above,
which, as we can see, is recognized as Basque. ICU is also fine with
the three-letter ISO 639-2 code "eus-", recognizing that as Basque, too. If I
use an ISO 639-2 code for Basque that ICU/CLDR doesn't like, "baq-*",
I can see that my expectations have only partially been met, since the
notice doesn't say anything about the language Basque:

postgres=# create collation actually_not_basque (provider=icu,
locale='baq-u-kf-upper-kr-latn-digit-em-emoji-kn-true-co-eor');
NOTICE:  00000: ICU collation description is "baq (colcasefirst=upper,
Sort Order=European Ordering Rules, colnumeric=yes,
colreorder=latn-digit, em=emoji)"
CREATE COLLATION

Functionality like this is starting to look essential to me, rather
than just a nice-to-have. Having this NOTICE would have made me
realize our problems with ICU versions < 54 much earlier, if nothing
else. If the purpose of NOTICE messages is to "Provide[s] information
that might be helpful to users", then I'd say that this definitely
qualifies. And, the extra code is trivial (we already get display name
in the context of initdb). I strongly recommend that we slip this into
v10, as part of fixing the problem with language tags that earlier ICU
versions have.

-- 
Peter Geoghegan




Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-21 Thread Peter Geoghegan
On Thu, Sep 21, 2017 at 2:49 AM, Andreas Karlsson <andr...@proxel.se> wrote:
> If we are fine with supporting only ICU 4.2 and later (which I think we are
> given that ICU 4.2 was released in 2009) then using uloc_forLanguageTag()[1]
> to validate and canonize seems like the right solution. I had missed that
> this function even existed when I last read the documentation. Does it
> return a BCP 47 tag in modern versions of ICU?

The decision to support ICU >= 4.2 was already made (see commit
eccead9). I have no reason to think that that's a problem for us.

As I've said, we currently use uloc_toLanguageTag() on all supported
ICU versions, to at least create a collation name at initdb time, but
also to get our new collation's collcollate when ICU >= 54. If we now
put a uloc_forLanguageTag() in place before each ucol_open() call,
then we can safely store a BCP 47 format tag as collcollate on *every*
ICU version. We can reconstruct a traditional format locale string
*on-the-fly* within pg_newlocale_from_collation() and
get_collation_actual_version(), which would be what we'd pass to
ucol_open() (we'd never pass "raw collcollate").

To keep things simple, we could always convert collcollate to the
traditional format using uloc_forLanguageTag(), just in case we're on
a version of ICU where ucol_open() doesn't like locales that are
spelled using the BCP 47 format. It doesn't seem worth it to take
advantage of the leniency on ICU >= 54, since that would be a special
case (though we could if we wanted to).
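
In rough outline, that conversion step would look like this (a sketch
based on my reading of the uloc API; the helper here is hypothetical,
and the real change would live in pg_locale.c):

#include <unicode/ucol.h>
#include <unicode/uloc.h>

static UCollator *
open_collator_from_langtag(const char *langtag)
{
    char        icu_locale[ULOC_FULLNAME_CAPACITY];
    int32_t     parsed_len;
    UErrorCode  status = U_ZERO_ERROR;
    UCollator  *coll;

    /* e.g. "de-u-co-phonebk" becomes "de@collation=phonebook" */
    uloc_forLanguageTag(langtag, icu_locale, sizeof(icu_locale),
                        &parsed_len, &status);
    if (U_FAILURE(status))
        return NULL;

    coll = ucol_open(icu_locale, &status);
    if (U_FAILURE(status))
        return NULL;
    return coll;
}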

> I strongly prefer if there, as much as possible, is only one format for
> inputting ICU locales.

I agree, but the bigger issue is that we're *half way* between
supporting only one format, and supporting two formats. AFAICT, there
is no reason that we can't simply support one format on all ICU
versions, and keep what ends up within pg_collation at initdb time
essentially the same across ICU versions (except for those that are
due to cultural/political developments).

-- 
Peter Geoghegan




Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

2017-09-21 Thread Peter Geoghegan
On Wed, Sep 20, 2017 at 11:05 PM, Noah Misch <n...@leadboat.com> wrote:
> This is currently a v10 open item, but I think it doesn't qualify for that
> treatment.  It's merely an opportunity for optimization, albeit an
> attractively-simple one.

I have withdrawn this as an open item. I'm still hopeful that it can
be worked through for v10 at Peter's discretion.


-- 
Peter Geoghegan




Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

2017-09-21 Thread Peter Geoghegan
On Thu, Sep 21, 2017 at 12:12 PM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
>> Attached patch shows what I'm getting at. This is untested, since I
>> don't use Windows. Proceed with caution.
>
> Your analysis makes sense, but the patch doesn't work, because "locale"
> is never set before reading it.

It was just for illustrative purposes. Seems fine to actually move the
WIN32 block down to just before the existing TRUST_STRXFRM test, since
the locale is going to be cached and then reused right where we
return early (Windows libc) anyway. This would also be a win for
clarity, IMV.

-- 
Peter Geoghegan




Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

2017-09-21 Thread Peter Geoghegan
On Wed, Sep 20, 2017 at 11:05 PM, Noah Misch <n...@leadboat.com> wrote:
> This is currently a v10 open item, but I think it doesn't qualify for that
> treatment.  It's merely an opportunity for optimization, albeit an
> attractively-simple one.

Fair enough.

This is clearly an omission that was made in 41839b7ab, the commit
that added/fixed Windows support for ICU. Hopefully the omission can
be corrected for v10. I see this as a surprising behavior for users,
since ICU locales on all other platforms *will* have much faster
sorting than libc locales (often more than 3x faster). This is mostly
because ICU brings back abbreviated keys. 3x - 5x faster is more of a
qualitative difference than a quantitative one, IMHO.

That having been said, I'm certainly not going to insist on a v10 fix
for this issue.

-- 
Peter Geoghegan




Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-20 Thread Peter Geoghegan
On Wed, Sep 20, 2017 at 4:08 PM, Peter Geoghegan <p...@bowt.ie> wrote:
>> pg_import_system_collations() takes care to use the non-BCP-47 style for
>> such versions, so I think this is working correctly.
>
> But CREATE COLLATION doesn't use pg_import_system_collations().

And perhaps more to the point: it is highly confusing that we use one
or the other of those two things ("langtag"/BCP 47 tag or
"name"/legacy locale name) as "collcollate", depending on ICU version,
thereby
*behaving* as if ICU < 54 really didn't know anything about BCP 47
tags. Because, obviously earlier ICU versions know plenty about BCP
47, since 9 lines further down we use "langtag"/BCP 47 tag as collname
for CollationCreate() -- regardless of ICU version.

How can you say "ICU <54 doesn't even support the BCP 47 style", given
all that? Those versions will still have locales named "*-x-icu" when
users do "\dOS". Users will be highly confused when they quite
reasonably try to generalize from the example in the docs and what
"\dOS" shows, and get results that are wrong, often only in a very
subtle way.

-- 
Peter Geoghegan




Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-20 Thread Peter Geoghegan
On Wed, Sep 20, 2017 at 4:04 PM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> ICU <54 doesn't even support the BCP 47 style, so we need to keep
> supporting the old/other style anyway.

Yes it does -- just not as a native locale name. ucol_open()
apparently became more liberal in what it would accept in ICU 54.

>> And, I think that I
>> see a bigger problem: we pass a string that is almost certainly a BCP
>> 47 string to ucol_open() from within pg_newlocale_from_collation(). We
>> do so despite the fact that ucol_open() apparently doesn't accept BCP
>> 47 syntax locale strings until ICU 54 [1].
>
> pg_import_system_collations() takes care to use the non-BCP-47 style for
> such versions, so I think this is working correctly.

But CREATE COLLATION doesn't use pg_import_system_collations().

-- 
Peter Geoghegan




Re: [HACKERS] Bloom filters for hash joins (was: A design for amcheck heapam verification)

2017-09-20 Thread Peter Geoghegan
On Tue, Sep 19, 2017 at 1:25 PM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
> On 09/19/2017 06:03 PM, Peter Geoghegan wrote:
>> I believe that parallelism makes the use of Bloom filter a lot more
>> compelling, too. Obviously this is something that wasn't taken into
>> consideration in 2015.
>>
>
> I haven't thought about it from that point of view. Can you elaborate
> why that would be the case? Sorry if this was explained earlier in this
> thread (I don't see it in the history, though).

Well, IPC and locking of shared state to protect the state's
structure are potentially a big bottleneck for parallel hash join. I
think that Bloom filters were first used in distributed databases in
the 1980s, where they could save a network round trip; this is a
little like that. That's why my guess is that Bloom filtering will be
more valuable when parallelism is used.

I think that right-deep hash joins make this really compelling, if and
when they allow you to build multiple Bloom filters across multiple
right-deep hash table builds, combining them afterwards. I think you can
do fancy things like reduce the amount of I/O against a star schema
fact table considerably. You can use one Bloom filter (built against
some dimension table) to drive a bitmap index scan on a fact table
index, and then another Bloom filter (built against some other
dimension table) to drive another bitmap index scan. The executor then
does a bitmap AND to combine the two for a bitmap heap scan on the
fact table.

(Maybe this technique doesn't necessarily use a Bloom filter; it could
be some other type of bitmap.)
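
For anyone who hasn't looked at the data structure itself, the core of
a Bloom filter is tiny. A throwaway sketch, with no connection to any
actual executor code (the two hash values would come from the join
key's hash function; probe positions are derived per Kirsch &
Mitzenmacher):

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct BloomFilter
{
    uint64_t   *bits;
    size_t      nbits;
    int         nhashes;
} BloomFilter;

static BloomFilter *
bloom_create(size_t nbits, int nhashes)
{
    BloomFilter *bf = malloc(sizeof(BloomFilter));

    bf->bits = calloc((nbits + 63) / 64, sizeof(uint64_t));
    bf->nbits = nbits;
    bf->nhashes = nhashes;
    return bf;
}

static void
bloom_add(BloomFilter *bf, uint64_t h1, uint64_t h2)
{
    for (int i = 0; i < bf->nhashes; i++)
    {
        size_t      pos = (h1 + (uint64_t) i * h2) % bf->nbits;

        bf->bits[pos / 64] |= UINT64_C(1) << (pos % 64);
    }
}

/* false means "definitely not present"; true means "probably present" */
static bool
bloom_maybe_contains(const BloomFilter *bf, uint64_t h1, uint64_t h2)
{
    for (int i = 0; i < bf->nhashes; i++)
    {
        size_t      pos = (h1 + (uint64_t) i * h2) % bf->nbits;

        if ((bf->bits[pos / 64] & (UINT64_C(1) << (pos % 64))) == 0)
            return false;
    }
    return true;
}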

-- 
Peter Geoghegan




Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-20 Thread Peter Geoghegan
On Tue, Sep 19, 2017 at 7:01 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> I didn't post the patch that generates the errors in my opening e-mail
> because I'm not confident it's correct just yet. And, I think that I
> see a bigger problem: we pass a string that is almost certainly a BCP
> 47 string to ucol_open() from within pg_newlocale_from_collation(). We
> do so despite the fact that ucol_open() apparently doesn't accept BCP
> 47 syntax locale strings until ICU 54 [1]. Seems entirely possible
> that this accounts for the problems you saw on ICU 4.2, back when we
> were still creating keyword variants (I guess that the keyword
> variants seem very "BCP 47-ish" to me).

ISTM that the proper fix here is to use uloc_forLanguageTag() [1] (not
to be confused with uloc_toLanguageTag()) to get a valid locale
identifier on versions of ICU where BCP 47 format tags are not
directly accepted as locale identifiers (versions prior to ICU 54).
This would happen as an extra step within
pg_newlocale_from_collation(), since BCP 47 format would be what is
stored in pg_collation.

Since uloc_forLanguageTag() became stable in ICU 4.2, the earliest
version that we support, I believe that that would leave us in good
shape.

[1] 
https://ssl.icu-project.org/apiref/icu4c/uloc_8h.html#aa45d6457f72867880f079e27a63c6fcb
-- 
Peter Geoghegan




Re: [HACKERS] Parallel Hash take II

2017-09-20 Thread Peter Geoghegan
On Mon, Sep 18, 2017 at 1:06 PM, Robert Haas <robertmh...@gmail.com> wrote:
> If we don't adopt some approach along these lines, then I think we've
> got to articulate some alternative deadlock-avoidance rule and make
> sure every parallel query facility follows it.  I welcome ideas on
> that front, but I don't think the rule mentioned above is a bad one,
> and I'd obviously like to minimize the amount of rework that we need
> to do.  Assuming we do settle on the above rule, it clearly needs to
> be documented someplace -- not sure of the place.  I think that it
> doesn't belong in README.parallel because it's an executor-specific
> rule, not necessarily a general rule to which other users of
> parallelism must adhere; they can choose their own strategies.

+1

Graefe's "Query Evaluation Techniques for Large Databases" has several
pages on deadlock avoidance strategies. It was written almost 25 years
ago, but still has some good insights IMV (you'll recall that Graefe
is the author of the Volcano paper; this reference paper seems like
his follow-up). Apparently, deadlock avoidance strategy becomes
important for parallel sort with partitioning. You may be able to get
some ideas from there. And even if you don't, his handling of the
topic is very deliberate and high level, which suggests that ours
should be, too.

-- 
Peter Geoghegan




Re: [HACKERS] Varying results when using merge joins over postgres_fdw vs hash joins

2017-09-20 Thread Peter Geoghegan
On Wed, Sep 20, 2017 at 9:35 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> TBH, the more I learn about ICU, the less faith I have in the proposition
> that it's going to fix anything at all for us in this area.  It seems to
> be just about as squishy as glibc in terms of locale identification,
> if not worse.

That may be our fault, to a significant degree. Did you read my mail
from yesterday, over on the "CREATE COLLATION does not sanitize ICU's
BCP 47 language tags" thread? I think that we've been incorrectly
specifying the locale name that is passed to ucol_open() this whole
time. At least, for ICU versions prior to ICU 54. This will need to be
addressed very soon.

-- 
Peter Geoghegan




Re: [HACKERS] Varying results when using merge joins over postgres_fdw vs hash joins

2017-09-20 Thread Peter Geoghegan
On Wed, Sep 20, 2017 at 9:06 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> It is that.  I'm tempted to propose that we invent some kind of "unknown"
> collation, which the planner would have to be taught to not equate to any
> other column collation (not even other instances of "unknown"), and that
> postgres_fdw's IMPORT ought to label remote columns with that collation
> unless specifically told to do otherwise.  Then it's on the user's head
> if he tells us to do the wrong thing; but we won't produce incorrect
> plans by default.
>
> This is, of course, not at all a back-patchable fix.

I would like postgres_fdw to be taught about collation versioning, so
that postgres_fdw's IMPORT could automatically do the right thing when
ICU is in use. Maybe it's too early to discuss that, because we don't
even support alternative collation provider collations as the
database/cluster default collation just yet. FWIW, postgres_fdw +
collations was one of the issues that made me believe in the long term
strategic importance of ICU.

Anyway, I'm pretty sure that we also need to do something about this
in the short term. Maybe a prominent warning about server collation in
the postgres_fdw docs, or a user-visible NOTICE when incompatible
server collations are observed by postgres_fdw's IMPORT?

-- 
Peter Geoghegan




Re: [HACKERS] Varying results when using merge joins over postgres_fdw vs hash joins

2017-09-20 Thread Peter Geoghegan
On Wed, Sep 20, 2017 at 2:06 AM, Ashutosh Bapat
<ashutosh.ba...@enterprisedb.com> wrote:
> In this case, both tables use same collation while comparing the rows,
> so result is different from the merge join result. Hash join executed
> on local server and the same executed on foreign server (by importing
> local table to the foreign server) would also differ.

Not really, because collatable types like text have the same equality
behavior, regardless of collation. (I would prefer it if they didn't
in at least some cases, but we don't have case insensitive collations
yet.)

I think that Corey describes a user hostile behavior. I feel that we
should try to do better here.

-- 
Peter Geoghegan




Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-19 Thread Peter Geoghegan
On Tue, Sep 19, 2017 at 5:52 PM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> On 9/18/17 18:46, Peter Geoghegan wrote:
>> As I pointed out a couple of times already [1], we don't currently
>> sanitize ICU's BCP 47 language tags within CREATE COLLATION.
>
> There is no requirement that the locale strings for ICU need to be BCP
> 47.  ICU locale names like 'de@collation=phonebook' are also acceptable.

Right. But, we only document that BCP 47 is supported by Postgres.
Maybe we can use get_icu_language_tag()/uloc_toLanguageTag() to ensure
that we end up with BCP 47, even when the user happens to specify the
legacy syntax. Should we be "canonicalizing" legacy ICU locale strings
as BCP 47, too?

> The reason they are not validated is that, as you know, ICU accepts any
> locale string as valid.  You appear to have found a way to do some
> validation, but I would like to see that code.

As I mentioned, I'm simply calling
get_icu_language_tag()/uloc_toLanguageTag() to do that sanitization.
The code to get the extra sanitization is completely trivial.
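
To illustrate both points at once, a trivial standalone demo (my
sketch; the expected outputs reflect my understanding of the uloc
API):

#include <stdio.h>
#include <unicode/uloc.h>

int
main(void)
{
    char        langtag[ULOC_FULLNAME_CAPACITY];
    UErrorCode  status = U_ZERO_ERROR;

    /* legacy ICU spelling in, BCP 47 out: should print "de-u-co-phonebk" */
    uloc_toLanguageTag("de@collation=phonebook", langtag,
                       sizeof(langtag), TRUE /* strict */, &status);
    if (U_SUCCESS(status))
        printf("%s\n", langtag);

    /* with strict=TRUE, bogus input fails instead of being "repaired" */
    status = U_ZERO_ERROR;
    uloc_toLanguageTag("foo bar baz", langtag, sizeof(langtag),
                       TRUE, &status);
    printf("bogus input: %s\n", u_errorName(status));
    return 0;
}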

I didn't post the patch that generates the errors in my opening e-mail
because I'm not confident it's correct just yet. And, I think that I
see a bigger problem: we pass a string that is almost certainly a BCP
47 string to ucol_open() from within pg_newlocale_from_collation(). We
do so despite the fact that ucol_open() apparently doesn't accept BCP
47 syntax locale strings until ICU 54 [1]. Seems entirely possible
that this accounts for the problems you saw on ICU 4.2, back when we
were still creating keyword variants (I guess that the keyword
variants seem very "BCP 47-ish" to me).

I really think we need to add some kind of debug mode that makes ICU
optionally spit out a locale display name at key points. We need this
to gain confidence that the behavior that ICU provides actually
matches what is expected across different ICU versions for different
locale formats. We talked about this as a user-facing feature before,
which can wait till v11; I just want this to debug problems like this
one.

[1] 
https://ssl.icu-project.org/apiref/icu4c/ucol_8h.html#a4721e4c0a519bb0139a874e191223590
-- 
Peter Geoghegan




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-19 Thread Peter Geoghegan
On Tue, Sep 19, 2017 at 4:47 PM, Claudio Freire <klaussfre...@gmail.com> wrote:
> Maybe this is looking at the problem from the wrong direction.
>
> Why can't the page be added to the FSM immediately and the check be
> done at runtime when looking for a reusable page?
>
> Index FSMs currently store only 0 or 255, couldn't they store 128 for
> half-recyclable pages and make the caller re-check reusability before
> using it?

No, because it's impossible for them to know whether or not the page
that their index scan just landed on was recycled just a second ago,
or was like this since before their xact began/snapshot was acquired.

For your reference, this RecentGlobalXmin interlock stuff is what
Lanin & Shasha call "The Drain Technique" within "2.5 Freeing Empty
Nodes". Seems pretty hard to do it any other way.

-- 
Peter Geoghegan




Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-09-19 Thread Peter Geoghegan
While I haven't done exhaustive performance testing on this, to make
sure, it seems likely that this patch has exactly the same
performance characteristics as v10.

Thanks

[1] 
https://wiki.postgresql.org/wiki/Parallel_External_Sort#Partitioning_for_parallelism_.28parallel_external_sort_beyond_CREATE_INDEX.29
-- 
Peter Geoghegan




Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-19 Thread Peter Geoghegan
On Tue, Sep 19, 2017 at 3:23 PM, Andreas Karlsson <andr...@proxel.se> wrote:
> If people think it is possible to get this done in time for PostgreSQL 10
> and it does not break anything on older version of ICU (or the migration
> from older versions) I am all for it.

The only behavioral difference would occur when CREATE COLLATION is
run (no changes to the initdb behavior, where the real risk exists).

What this boils down to is that we have every reason to think that the
right thing is to reject something that ICU's uloc_toLanguageTag()
itself rejects as invalid (through the get_icu_language_tag()
function). It looked like we were equivocating on following this at
one point, prior to 2bfd1b1, in order to suit ICU 4.2 (again, see
commit eccead9). I tend to think that the way we used to concatenate
variant keywords against locale names during initdb was simply wrong
in ICU 4.2, for some unknown reason. I think that the behavior that I
propose might prevent things from silently failing on ICU 4.2 that
were previously *believed* to work there, but in fact these things
(variant keywords) never really worked.

There might be an argument to be made for passing strict = FALSE to
uloc_toLanguageTag() instead, so that we replace the language tag with
one that is known to have valid syntax, and store that in pg_collation
instead (while possibly raising a NOTICE). I guess that that would
actually be the minimal fix here. I still favor a hard reject of
invalid BCP 47 syntax, though. PostgreSQL's CREATE COLLATION is not
one of the places where this kind of leniency makes sense.

-- 
Peter Geoghegan




Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-19 Thread Peter Geoghegan
On Tue, Sep 19, 2017 at 2:22 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Andreas Karlsson <andr...@proxel.se> writes:
>> Hm, I like the idea but I see some issues.
>
>> Enforcing the BCP47 seems like a good thing to me. I do not see any
>> reason to allow input with syntax errors. The issue though is that we do
>> not want to break people's databases when they upgrade to PostgreSQL 11.
>> What if they have specified the locale in the old non-ICU format or they
>> have a bogus value and we then error out on pg_upgrade or pg_restore?
>
> Well, if PG10 shipped with that restriction in place then it wouldn't
> be an issue ;-)

I was proposing that this be treated as an open item for v10; sorry if
I was unclear on that. Much like the "ICU locales vs. ICU collations
within pg_collation" issue, this seems like the kind of thing that we
ought to go out of our way to get right in the *first* version.

-- 
Peter Geoghegan




Re: [HACKERS] Bloom filters for hash joins (was: A design for amcheck heapam verification)

2017-09-19 Thread Peter Geoghegan
On Tue, Sep 19, 2017 at 6:28 AM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
> The patch is fairly simple, and did not try to push the bloom filters to
> scan nodes or anything like that. It might be a meaningful first step,
> though, particularly for selective joins (where only a small number of
> rows from the outer relation have a match in the hash table).

I believe that parallelism makes the use of a Bloom filter a lot more
compelling, too. Obviously this is something that wasn't taken into
consideration in 2015.


-- 
Peter Geoghegan




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-18 Thread Peter Geoghegan
On Wed, Apr 5, 2017 at 3:50 PM, Andres Freund <and...@anarazel.de> wrote:
> Hi,
>
> On 2017-04-01 03:05:07 +0900, Masahiko Sawada wrote:
>> On Fri, Mar 31, 2017 at 11:44 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> [ lots of valuable discussion ]
>
> I think this patch clearly still is in the design stage, and has
> received plenty feedback this CF.  I'll therefore move this to the next
> commitfest.

Does anyone have ideas on a way forward here? I don't, but then I
haven't thought about it in detail in several months.

-- 
Peter Geoghegan




[HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-18 Thread Peter Geoghegan
As I pointed out a couple of times already [1], we don't currently
sanitize ICU's BCP 47 language tags within CREATE COLLATION. CREATE
COLLATION will accept literally any string as a language tag as things
stand, even when the string is unambiguously bogus. While I accept
that there are limits on how far you can take sanitizing the BCP 47
tag format, due to its extensibility and "best effort" emphasis on
forward and backward compatibility, we can and should do more here,
IMHO. We should at least do the bare minimum, which has no possible
downside, and some notable upsides.

If I hack the CREATE COLLATION code to put any language tag string
provided by the user through the same sanitization process that initdb
already puts ICU language tags through, then we do much better. CREATE
COLLATION rejects syntax errors, which seems desirable:

postgres=# CREATE COLLATION test1 (provider = icu, locale = 'en-x-icu');
CREATE COLLATION
postgres=# CREATE COLLATION test2 (provider = icu, locale = 'foo bar baz');
ERROR:  XX000: could not convert locale name "foo bar baz" to language
tag: U_ILLEGAL_ARGUMENT_ERROR
LOCATION:  get_icu_language_tag, collationcmds.c:454
postgres=# CREATE COLLATION test3 (provider = icu, locale = 'en-gb-icu');
ERROR:  XX000: could not convert locale name "en-gb-icu" to language
tag: U_ILLEGAL_ARGUMENT_ERROR
LOCATION:  get_icu_language_tag, collationcmds.c:454
postgres=# CREATE COLLATION test4 (provider = icu, locale = 'not-a-country');
CREATE COLLATION

(To be more specific, I'm calling
get_icu_language_tag()/uloc_toLanguageTag() [2] as an extra step for
CREATE COLLATION here.)

It's not like the current behavior is a disaster, or that the
alternative behavior that I propose is perfect. The collation behavior
you end up with today, having specified a language tag with a syntax
error, is the totally generic base DUCET collation behavior. Using 'foo
bar baz' is effectively the same as using the preinstalled 'und-x-icu'
collation, which I'm pretty sure is the same as using any current
English locale anyway. That said, falling back on the most useful
collation behavior based on inferences about the language tag is
supposed to be about rolling with the complexities of
internationalization, like political changes that are not yet
accounted for by the CLDR/ICU version, and so on. It's no
justification for not letting the user know when they've fat-fingered
a language tag, which they could easily miss. These are *syntax*
errors.

At one point a couple of months back, it was understood that
get_icu_language_tag() might not always work with (assumed) valid
locale names -- that is at least the impression that the commit
message of eccead9 left me with. But, that was only with ICU 4.2, and
in any case we've since stopped creating keyword variants at initdb
time for other reasons (see 2bfd1b1 for details of those other
reasons). I tend to think that we should not install any language tag
that uloc_toLanguageTag() does not accept as valid on general
principle (so not just at initdb time, when it's actually least
needed).

Thoughts? I can write a patch for this, if that helps. It should be
straightforward.

[1] 
postgr.es/m/cah2-wzm22vtxvd-e1oz90de8z_m61_8amhsdozf1pwrkfrm...@mail.gmail.com
[2] 
https://ssl.icu-project.org/apiref/icu4c/uloc_8h.html#a1d50c91925ca3853fce6f28cf7390c3c
-- 
Peter Geoghegan




Re: [HACKERS] Bloom filters for hash joins (was: A design for amcheck heapam verification)

2017-09-18 Thread Peter Geoghegan
On Mon, Sep 18, 2017 at 2:07 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, Sep 18, 2017 at 1:29 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Uh, why does the planner need to be involved at all?
>
> Because it loses if the Bloom filter fails to filter anything.  That's
> not at all far-fetched; consider SELECT * FROM a, b WHERE a.x =
> b.x given a foreign key on a.x referencing b(x).

Wouldn't a merge join be a lot more likely in this case anyway? Low
selectivity hash joins with multiple batches are inherently slow; the
wasted overhead of using a bloom filter may not matter.

Obviously this is all pretty speculative. I suspect that this could be
true, and it seems worth investigating that framing of the problem
first.

-- 
Peter Geoghegan




Re: [HACKERS] Bloom filters for hash joins (was: A design for amcheck heapam verification)

2017-09-18 Thread Peter Geoghegan
On Mon, Sep 18, 2017 at 10:29 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Tue, Aug 29, 2017 at 10:22 PM, Thomas Munro
>> <thomas.mu...@enterprisedb.com> wrote:
>>> (2) We could push a Bloom filter down to scans
>>> (many other databases do this, and at least one person has tried this
>>> with PostgreSQL and found it to pay off[1]).
>
>> I think the hard part is going to be figuring out a query planner
>> framework for this, because pushing down the Bloom filter down to the
>> scan changes the cost and the row-count of the scan.
>
> Uh, why does the planner need to be involved at all?  This seems like
> strictly an execution-time optimization.  Even if you wanted to try
> to account for it in costing, I think the reliability of the estimate
> would be nil, never mind any questions about whether the planner's
> structure makes it easy to apply such an adjustment.

That is how I think about it too, though I'm not at all confident of
being on the right track. (I'm pretty confident that the general idea
of using a Bloom filter for parallel hash join is a good idea,
though.)

I should point out that Bloom filters are quite insensitive to
misestimations in the total number of elements, an estimate of which
must be provided up-front (I suppose that a cardinality estimate might
make more sense for hash join Bloom filters than an estimate of the
total number of elements in a batch). "Bloom Filters in Probabilistic
Verification" gives this as a key advantage for Bloom filters over
other approaches to formal model verification [1]. If you can afford
to have significant misestimations and still not lose much benefit,
and if the additional overhead of having a Bloom filter is fixed and
reasonably low cost, then maybe it makes sense to apply Bloom filters
as an opportunistic or optimistic optimization. Perhaps they can be
applied when there is little to lose but much to gain.

[1] http://www.ccs.neu.edu/home/pete/pub/bloom-filters-verification.pdf
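
To put rough numbers on that insensitivity, here is a toy calculation
using the standard false positive formula p = (1 - e^(-kn/m))^k. This
is illustrative only -- it is not code from any patch, and the m and k
values are arbitrary:

#include <math.h>
#include <stdio.h>

/* Expected false positive rate: m bits, k hash functions, n elements */
static double
bloom_fpr(double m, double k, double n)
{
    return pow(1.0 - exp(-k * n / m), k);
}

int
main(void)
{
    double      m = 8.0 * 1024 * 1024 * 8;      /* an 8MB bitset */
    double      k = 6.0;                        /* hash functions */
    double      n;

    /* suppose n was estimated at 1M elements; try up to a 4x overrun */
    for (n = 1000000.0; n <= 4000000.0; n *= 2.0)
        printf("n = %.0f: fpr = %.2e\n", n, bloom_fpr(m, k, n));
    return 0;
}

Even the 4x misestimation only degrades the false positive rate from
about 4e-07 to about 7e-04 here, which is still well under 0.1%.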
-- 
Peter Geoghegan




[HACKERS] ICU locales and text/char(n) SortSupport on Windows

2017-09-16 Thread Peter Geoghegan
varstr_sortsupport() never allows Windows to use SortSupport with a
non-C-locale when the server encoding is UTF-8 (which I
assume is the common case). This is because we (quite reasonably)
don't want to have to duplicate the ugly UTF-8 to UTF-16 conversion
hack from varstr_cmp() for the SortSupport authoritative comparator
(varstrfastcmp_locale() shouldn't get its own copy of this kludge,
because it's supposed to be "fast"). This broad restriction made sense
when Windows + UTF-8 + non-C-locale necessarily required the
aforementioned UTF-16 conversion kludge. However, iff an ICU locale is
in use on Windows (or any other platform), then we can always use
SortSupport, regardless of anything else (we should not have the core
code install a fmgr comparison shim that just calls varstr_cmp(),
though we still do). We don't actually need the UTF-16 kludge at all,
so we can use SortSupport without any special care.

The current state of affairs doesn't make any sense, AFAICT, and so
the restriction should be removed on general principle: we *already*
expect ICU to have no restrictions that are peculiar to Windows, as we
see in varstr_cmp() and str_tolower(). It's just arbitrary to hold on
to this restriction. This restriction also seems worth fixing because
Windows users are generally more likely to want to use ICU locales;
most of them would otherwise end up actually paying the overhead for
the UTF-16 kludge. (Presumably the UTF-16 conversion makes text
sorting *even slower* than it would be if we merely didn't do
SortSupport, which is to say: very slow indeed.)

In summary, we're currently attaching the use of SortSupport to the
wrong thing. We're treating this UTF-16 business as something that
implies a broad OS/platform restriction, when in fact it should be
treated as implying a restriction for one particular collation
provider only (a collation provider that happens to be built into
Windows, but isn't really special to us).

Attached patch shows what I'm getting at. This is untested, since I
don't use Windows. Proceed with caution.

On a related note, am I the only one that finds it questionable that
str_tolower() has an "#ifdef USE_WIDE_UPPER_LOWER" block that itself
contains an "#ifdef USE_ICU" block? It seems like those two things
might get conflated on some platforms. We don't want lower() to ever
not use the ICU infrastructure when an ICU collation is used, and yet
it's not obvious that that's impossible. I understand that the code in
regc_pg_locale.c kind of insists on using USE_WIDE_UPPER_LOWER
facilities, and that that was always accepted as legacy that ICU had
to live with. Maybe a static assertion is all that we need here (ICU
builds must also be USE_WIDE_UPPER_LOWER builds).

-- 
Peter Geoghegan
From fca07aca51fb7979f19180712707233ff0e6f4b4 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <p...@bowt.ie>
Date: Sat, 16 Sep 2017 13:36:25 -0700
Subject: [PATCH] Allow ICU to use SortSupport on Windows with UTF-8.

There is no reason to ever prevent the use of SortSupport on Windows
when ICU locales are used.  We previously avoided SortSupport on Windows
with UTF-8 server encoding and a non C-locale due to restrictions in
Windows' libc functionality.

This is now considered to be a restriction in one platform's libc
collation provider, and not a more general platform restriction.
---
 src/backend/utils/adt/varlena.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index ebfb823..ad8ebed 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -1828,7 +1828,8 @@ varstr_sortsupport(SortSupport ssup, Oid collid, bool bpchar)
 	 * UTF-8 and we are not using the C collation, complex hacks are required.
 	 * We don't currently have a comparator that handles that case, so we fall
 	 * back on the slow method of having the sort code invoke bttextcmp() (in
-	 * the case of text) via the fmgr trampoline.
+	 * the case of text) via the fmgr trampoline.  (ICU locales work just the
+	 * same on Windows, however.)
 	 */
 	if (lc_collate_is_c(collid))
 	{
@@ -1840,7 +1841,8 @@ varstr_sortsupport(SortSupport ssup, Oid collid, bool bpchar)
 		collate_c = true;
 	}
 #ifdef WIN32
-	else if (GetDatabaseEncoding() == PG_UTF8)
+	else if (GetDatabaseEncoding() == PG_UTF8 &&
+			 !(locale && locale->provider == COLLPROVIDER_ICU))
 		return;
 #endif
 	else
-- 
2.7.4




Re: [HACKERS] valgrind vs. shared typmod registry

2017-09-16 Thread Peter Geoghegan
On Sat, Sep 16, 2017 at 5:30 AM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
> I've been running tests under valgrind not too long ago and I don't
> recall such failures, so perhaps something broke it in the past few days.

That's what we have the buildfarm animal Skink for. It has indeed been
failing within select_parallel only following the commit that you
mentioned:

https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=skink&br=HEAD

-- 
Peter Geoghegan




Re: [HACKERS] A design for amcheck heapam verification

2017-09-16 Thread Peter Geoghegan
On Wed, Sep 6, 2017 at 7:26 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> On Wed, Aug 30, 2017 at 9:29 AM, Peter Geoghegan <p...@bowt.ie> wrote:
>> On Wed, Aug 30, 2017 at 5:02 AM, Alvaro Herrera
>> <alvhe...@2ndquadrant.com> wrote:
>>> Eh, if you want to optimize it for the case where debug output is not
>>> enabled, make sure to use ereport() not elog().  ereport()
>>> short-circuits evaluation of arguments, whereas elog() does not.
>>
>> I should do that, but it's still not really noticeable.
>
> Since this patch has now bit-rotted, I attach a new revision, V2.

I should point out that I am relying on deterministic TOAST
compression within index_form_tuple() at present. This could, in
theory, become a problem later down the road, when
toast_compress_datum() compression becomes configurable via a storage
parameter or something (e.g., we use PGLZ_strategy_always, rather than
the hard coded PGLZ_strategy_default strategy).

While I should definitely have a comment above the new amcheck
index_form_tuple() call that points this out, it's not clear if that's
all that is required. Normalizing the representation of hashed index
tuples to make verification robust against unforeseen variation in
TOAST compression strategy seems like needless complexity to me, but
I'd like to hear a second opinion on that.
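
For reference, the idea amounts to something like the following sketch.
It assumes the bloom_add_element() API from the Bloom filter patch
posted upthread; the helper function itself is made up, and is not the
final amcheck code:

#include "postgres.h"

#include "access/itup.h"
#include "lib/bloomfilter.h"
#include "utils/rel.h"

/*
 * Fingerprint a heap tuple in exactly the form that the index would
 * store it -- TOAST compression included, since index_form_tuple() may
 * compress -- so that the index tuples encountered during verification
 * can be probed against the same normalized representation.
 */
static void
fingerprint_heap_tuple(bloom_filter *filter, Relation indexrel,
                       Datum *values, bool *isnull)
{
    IndexTuple  itup = index_form_tuple(RelationGetDescr(indexrel),
                                        values, isnull);

    bloom_add_element(filter, (unsigned char *) itup, IndexTupleSize(itup));
    pfree(itup);
}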

-- 
Peter Geoghegan




Re: [HACKERS] The case for removing replacement selection sort

2017-09-15 Thread Peter Geoghegan
Tomas' e-mail from earlier today, which I've already replied to
directly, seems to have been lost to the mailing list. This must be
due to its having a 1MB attachment (results spreadsheet), which seems a
bit aggressive as a reason to withhold it IMV.

Here is a link to his results, converted to a Google docs spreadsheet:

https://docs.google.com/spreadsheets/d/1SL_IIkPdiJUZ9BHUgROcYkEKWZUch4wRax2SyOmMuT8/edit?usp=sharing

Here is his e-mail, in full:

On Fri, Sep 15, 2017 at 6:34 AM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
> Attached is a spreadsheet with updated data (using the more accurate
> timing, and comparing master with the default replacement_sort_tuples
> value (150k) and with it increased per Peter's instructions (to 1B)).
>
> There are 6 sheets - one for each combination of a dataset size (100k
> and 1M) and machine where I ran the tests (with different CPU models).
>
> There are 5 columns - first three are medians for each of the tested
> PostgreSQL configurations:
>
> - master (default)
> - master (1billion)
> - no-replacement-sort (with patch applied)
>
> The numbers are medians from 5 runs (perhaps minimums would be better in
> this case).
>
> The last two columns are comparisons / speed-ups
>
> - master(1B) / master(default)
> - no-replacement-sort / master(default)
>
> Green numbers (below 1.0) mean speed-up, red (above 1.0) slow-down.
>
> Firstly, master with the r_s_t=1B setting performs either the same or
> worse compared to the default values, in almost every test. On the 100k
> data set the results are a bit noisy (particularly on the oldest CPU),
> but on the 1M data set the difference is quite clear. So I've only
> compared results for master(default) and patched.
>
> Quick summary, for each CPU model (which clearly affects the behavior).
>
>
> e5-2620-v4
> --
> - probably the CPU we should be looking at, as it's the current model
> - in some cases this gives us 3-5x speedup with low work_mem settings
> - consider for example the very last line, which shows that
>
>   SELECT DISTINCT a FROM text_test_padding ORDER BY a
>
> completed in ~5531ms on work_mem=8MB and 1067ms on 32MB, but with the
> patch it completes in 1784ms (1MB), 1211ms (4MB) and 1104 (8MB).
>
> - Of course, this is for already-sorted data, and for other data sets
> the improvement is more modest. It's difficult to summarize this into a
> single number, but there are plenty of improvements in 20-30% range.
>
> - Some cases are a bit slower, of course, but overall I'd say the chart
> is much more green than red. Also the slow-downs are much smaller,
> compared to speed-ups (generally within 1-5%).
>
> i5-2500k
> 
> - same story, but this CPU gives more stable results (less noise)
>
> e5-5450
> ---
> - rather noisy CPU, particularly on the small (100k) dataset
> - the larger data set mostly matches the other CPUs, although the
> regressions are somewhat more significant
> - I wouldn't really worry about this too much, it's clearly an obsolete
> CPU and not something a performance-conscious person would use nowadays
> (the other CPUs are often 2-3x faster).
>
> If needed, full data is available here (each machine is pushing data to
> a separate git repository):
>
> * https://bitbucket.org/tvondra/sort-bench-i5-2500k
> * https://bitbucket.org/tvondra/sort-bench-e5-2620-v4
> * https://bitbucket.org/tvondra/sort-bench-e5-5450
>
> At this point the 10M row tests are running, but I don't expect anything
> particularly surprising from those results. That is, it's not something
> that should block getting this patch committed, if the agreement is to
> commit otherwise.
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Peter Geoghegan




Re: [HACKERS] The case for removing replacement selection sort

2017-09-15 Thread Peter Geoghegan
On Fri, Sep 15, 2017 at 6:34 AM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
> e5-2620-v4
> --
> - probably the CPU we should be looking at, as it's the current model
> - in some cases this gives us 3-5x speedup with low work_mem settings
> - consider for example the very last line, which shows that
>
>   SELECT DISTINCT a FROM text_test_padding ORDER BY a
>
> completed in ~5531ms on work_mem=8MB and 1067ms on 32MB, but with the
> patch it completes in 1784ms (1MB), 1211ms (4MB) and 1104 (8MB).

I think that this is because of the importance of getting a final
on-the-fly merge. Overlapping I/O with computation matters a lot here.

> - Of course, this is for already-sorted data, and for other data sets
> the improvement is more modest. It's difficult to summarize this into a
> single number, but there are plenty of improvements in 20-30% range.
>
> - Some cases are a bit slower, of course, but overall I'd say the chart
> is much more green than red. Also the slow-downs are much smaller,
> compared to speed-ups (generally within 1-5%).

The larger regressions mostly involve numeric. We have two
palloc()/pfree() cycles in the standard numeric comparator (one for
each datum that is compared), a cost which could be removed by
enhancing numeric SortSupport directly. That's why the numeric
abbreviated key stuff was the most effective (more effective than
text) when added to 9.5. We currently have very expensive full
comparisons of datums within L1 cache ("merge comparisons") relative
to abbreviated key comparisons -- the difference is huge, but could be
reduced.

Anyway, it seems ironic that in those few cases where replacement
selection still does better, the advantage appears to be due to reduced
CPU costs, not reduced I/O costs. This is the opposite benefit to the
one you'd expect from reading Knuth.

Thanks for benchmarking! I hope that this removes the doubt that
replacement selection previously benefited from; it now clearly
deserves to be removed from tuplesort.c.

-- 
Peter Geoghegan




Re: [HACKERS] Trouble with amcheck

2017-09-14 Thread Peter Geoghegan
On Thu, Sep 14, 2017 at 5:03 PM, Douglas Doole <dougdo...@gmail.com> wrote:
> I just cloned PostgreSQL to a new machine today (Ubuntu 17.04). "make
> install" and "make check-world" run fine but "make installcheck-world" is
> having trouble with amcheck:
>
> In contrib/amcheck/results:
>
> CREATE EXTENSION amcheck;
> ERROR:  could not open extension control file
> "/home/doole/pgCommunity/install/share/postgresql/extension/amcheck.control":
> No such file or directory
>
> I expect I'm missing something in the machine set up, but I'm stumped as to
> what.

I think you need to build and install contrib, so that it is available
to the server that you're running an installcheck against. amcheck is
alphabetically first among contrib modules that have tests, IIRC.

-- 
Peter Geoghegan




Re: [HACKERS] Automatic testing of patches in commit fest

2017-09-12 Thread Peter Geoghegan
On Tue, Sep 12, 2017 at 8:30 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> +1 ... seems like there's enough noise here that changing patch status
> based on the results might be premature.  Still, I applaud the effort.

There are always going to be cases where the tool fails, unless there
are more formal guidelines on patch submission. For example, if
someone posts multiple patches and did not use git format-patch, then
the heuristic for deciding which order to apply the patches in will
fail at times. I don't think it's necessary to constrain the patch format that people
use too much, but this does need to be thought out.

Another thing that I think needs to happen is that the CF app needs to
add a web API. Friends of mine who actually know about this stuff tell
me that JSON API is a good default choice these days. Currently,
Thomas fetches information from the CF app using "screen scraping"
techniques, which are obviously fairly brittle. Similarly, Thomas'
patch testing web application should itself have a web API,
potentially usable by the CF app.

-- 
Peter Geoghegan




Re: [HACKERS] The case for removing replacement selection sort

2017-09-11 Thread Peter Geoghegan
On Wed, Sep 6, 2017 at 2:55 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> On Wed, Sep 6, 2017 at 2:47 PM, Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
>>> I attach a patch to remove replacement selection, which I'll submit to CF 1.
>>
>> This breaks the documentation build, because
>> doc/src/sgml/release-9.6.sgml still contains <xref
>> linkend="guc-replacement-sort-tuples"> but you removed that id.
>
> Thanks for looking into it.
>
> I suppose that the solution is to change the 9.6 release notes to no
> longer have a replacement_sort_tuples link. Anyone else have an
> opinion on that?

Attached is a revision of the patch that no longer breaks the
documentation build, by using a <literal> tag to refer to
replacement_sort_tuples within doc/src/sgml/release-9.6.sgml. The
patch is otherwise unchanged, and so reviewers shouldn't bother with
it (I just want to unbreak Thomas' continuous integration build, and
to save a committer the hassle of fixing the doc build themselves). I
verified that "make check-world" passes this time.

I also eyeballed the html generated by a "make STYLE=website html", to
ensure that it looked consistent with its surroundings. The resulting
9.6 release notes looked good to me.

-- 
Peter Geoghegan
From a22a77d09a5fb37713d8486db84e9619acc449d3 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <p...@bowt.ie>
Date: Wed, 30 Aug 2017 16:14:36 -0700
Subject: [PATCH] Remove replacement selection sort.

This simplifies tuplesort.c, since heap maintenance routines need no
longer concern themselves with cases where two runs are represented
within a heap.

Replacement selection sort's traditional advantages no longer allow it
to outperform a simple sort-merge strategy, even in sympathetic cases.
This is due to long term trends in hardware performance characteristics,
and enhancements to the tuplesort.c merge code in the past couple of
years.

This will need to be called out as an incompatibility in the v11 release
notes, since it's possible that users set replacement_sort_tuples
themselves.

Discussion: https://postgr.es/m/cah2-wzmmnjg_k0r9nqywmq3zjyjjk+hcbizynghay-zyjs6...@mail.gmail.com
---
 doc/src/sgml/config.sgml  |  39 ---
 doc/src/sgml/release-9.6.sgml |   2 +-
 src/backend/utils/init/globals.c  |   1 -
 src/backend/utils/misc/guc.c  |  10 -
 src/backend/utils/misc/postgresql.conf.sample |   1 -
 src/backend/utils/sort/tuplesort.c| 415 +++---
 src/include/miscadmin.h   |   1 -
 src/test/regress/expected/cluster.out |  17 +-
 src/test/regress/sql/cluster.sql  |  14 +-
 9 files changed, 52 insertions(+), 448 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2b6255e..7c625e2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1513,45 +1513,6 @@ include_dir 'conf.d'
   
  
 
-     <varlistentry id="guc-replacement-sort-tuples" xreflabel="replacement_sort_tuples">
-      <term><varname>replacement_sort_tuples</varname> (<type>integer</type>)
-      <indexterm>
-       <primary><varname>replacement_sort_tuples</varname> configuration parameter</primary>
-      </indexterm>
-      </term>
-      <listitem>
-       <para>
-When the number of tuples to be sorted is smaller than this number,
-a sort will produce its first output run using replacement selection
-rather than quicksort.  This may be useful in memory-constrained
-environments where tuples that are input into larger sort operations
-have a strong physical-to-logical correlation.  Note that this does
-not include input tuples with an inverse
-correlation.  It is possible for the replacement selection algorithm
-to generate one long run that requires no merging, where use of the
-default strategy would result in many runs that must be merged
-to produce a final sorted output.  This may allow sort
-operations to complete sooner.
-       </para>
-       <para>
-The default is 150,000 tuples.  Note that higher values are typically
-not much more effective, and may be counter-productive, since the
-priority queue is sensitive to the size of available CPU cache, whereas
-the default strategy sorts runs using a cache
-oblivious algorithm.  This property allows the default sort
-strategy to automatically and transparently make effective use
-of available CPU cache.
-       </para>
-       <para>
-Setting maintenance_work_mem to its default
-value usually prevents utility command external sorts (e.g.,
-sorts used by CREATE INDEX to build B-Tree
-indexes) from ever using replacement selection sort, unless the
-input tuples are quite wide.
-       </para>
-      </listitem>
-     </varlistentry>
-
  
   autovacuum_work_mem (integer)
   
diff --git a/doc/src/sgml/release-9.6.sgml b/doc/src/sgml/release-9.6.sgml
index fa5355f..603300a 100644
--- a/doc/src/sgml/release-9.6.sgml
+++ b/doc/src/sgml/release-9.6.sgml
@@ 

Re: [HACKERS] CLUSTER command progress monitor

2017-09-11 Thread Peter Geoghegan
On Mon, Sep 11, 2017 at 7:38 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada
> <yamada.tats...@lab.ntt.co.jp> wrote:
>> Thanks for the comment.
>>
>> As you know, the CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan method
>> based on cost estimation. In the case of SEQ SCAN, these two phases do not
>> overlap. However, with INDEX SCAN, they overlap. Therefore I created the phase
>> "scan heap and write new heap" for when INDEX SCAN is selected.
>>
>> I agree that progress reporting for a sort is difficult. So the current
>> design of the progress monitor for CLUSTER only reports the phase
>> ("sorting tuples").
>> It doesn't report a counter for the sort.
>
> Doesn't that make it almost useless?  I would guess that scanning the
> heap and writing the new heap would ordinarily account for most of the
> runtime, or at least enough that you're going to want something more
> than just knowing that's the phase you're in.

It's definitely my experience that CLUSTER is incredibly I/O bound.
You're shoveling the tuples through tuplesort.c, but the actual
sorting component isn't where the real costs are. Profiling shows that
writing out the new heap (including moderately complicated
bookkeeping) is the bottleneck, IIRC. That's why parallel CLUSTER
didn't look attractive, even though it would be a fairly
straightforward matter to add that on top of the parallel CREATE INDEX
structure from the patch that I wrote to do that.

-- 
Peter Geoghegan




Re: [HACKERS] The case for removing replacement selection sort

2017-09-11 Thread Peter Geoghegan
On Mon, Sep 11, 2017 at 8:50 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, Sep 11, 2017 at 11:47 AM, Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
>> The question is what is the optimal replacement_sort_tuples value? I
>> assume it's the number of tuples that effectively uses CPU caches, at
>> least that's what our docs say. So I think you're right that setting it
>> to 1B rows may break this assumption, and make it perform worse.
>>
>> But perhaps the fact that we're testing with multiple work_mem values,
>> and with smaller data sets (100k or 1M rows) makes this a non-issue?
>
> I am not sure that's the case -- I think that before Peter's changes
> it was pretty easy to find cases where lowering work_mem made sorting
> ordered data go faster.

Before enhancements to merging by both Heikki and myself went into
Postgres 10, there were cases where replacement selection of the first
run did relatively well, and cases where it did badly despite managing
to avoid a merge. The former were cases where work_mem was low, and
the latter were cases where it was high. This was true because the
heap used is cache inefficient. When it was small/cache efficient, and
when there was ordering to exploit, replacement selection could win.
Your recollection there sounds accurate to me.

These days, even a small, cache-efficient heap is generally not good
enough to beat quicksorting, since merging attained the ability to
exploit presortedness itself within commit 24598337c, and memory
utilization for merging got better, too. Very roughly speaking,
merging attained the same advantage that replacement selection had all
along, and replacement selection lost all ability to compete.

-- 
Peter Geoghegan




Re: [HACKERS] The case for removing replacement selection sort

2017-09-11 Thread Peter Geoghegan
On Mon, Sep 11, 2017 at 8:47 AM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
> The question is what is the optimal replacement_sort_tuples value?

See my remarks to Robert just now.

I think that it's incredibly hard to set replacement_sort_tuples
sensibly in 9.6. As of Postgres 10, it is much more likely to hurt
than to help, because the enhancements to merging that went into
Postgres 10 reduced the downside of not using replacement selection.
And so, for Postgres 11 replacement_sort_tuples deserves to be
removed.

-- 
Peter Geoghegan




Re: [HACKERS] The case for removing replacement selection sort

2017-09-11 Thread Peter Geoghegan
On Mon, Sep 11, 2017 at 8:32 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Sun, Sep 10, 2017 at 9:39 PM, Peter Geoghegan <p...@bowt.ie> wrote:
>> To be clear, you'll still need to set replacement_sort_tuples high
>> when testing RS, to make sure that we really use it for at least the
>> first run when we're expected to. (There is no easy way to have
>> testing mechanically verify that we really do only have one run in the
>> end with RS, but I assume that such paranoia is unneeded.)
>
> I seem to recall that raising replacement_sort_tuples makes
> replacement selection look worse in some cases -- the optimization is
> more likely to apply, sure, but the heap is also bigger, which hurts.

But that was only because work_mem was set relatively high. If you're
going to test work_mem values that are slightly on the high side for
replacement selection (like, 8MB or 32MB), then increasing
replacement_sort_tuples ensures that replacement selection is actually
used for at least the first run, and you don't waste time comparing a
behavior (quicksorting runs) to itself.

All that matters is whether or not replacement_sort_tuples exceeds the
number of tuples that the first run would have if quicksorted
immediately. replacement_sort_tuples is only ever used to answer a
single yes/no question.
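
For reference, the decision in tuplesort.c amounts to this (paraphrased
from the v10-era code, not a verbatim copy):

/*
 * useselection - the GUC is consulted exactly once, when the in-memory
 * array first fills, to decide whether the first run should be produced
 * by replacement selection at all.  Note that replacement_sort_tuples =
 * 0 therefore disables replacement selection outright.
 */
static bool
useselection(Tuplesortstate *state)
{
    return state->memtupsize <= replacement_sort_tuples;
}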

-- 
Peter Geoghegan




Re: [HACKERS] The case for removing replacement selection sort

2017-09-11 Thread Peter Geoghegan
On Mon, Sep 11, 2017 at 8:17 AM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
> Overall I think the results show quite significant positive impact of
> the patch. There are a few cases of regression, but ISTM those may
> easily be noise as it's usually 0.03 vs 0.04 second, or something. I'll
> switch to the \timing (instead of /usr/bin/time) to get more accurate
> results, and rerun those tests.

I'm glad to hear it. While I'm not surprised, I still don't consider
the patch to be a performance enhancement. It is intended to lower the
complexity of tuplesort.c, and the overall performance improvement is
just a bonus IMV.

-- 
Peter Geoghegan




Re: [HACKERS] The case for removing replacement selection sort

2017-09-10 Thread Peter Geoghegan
On Sun, Sep 10, 2017 at 5:59 PM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
> OK, so 1MB, 4MB, 8MB, 32MB?

Right.

> Ah, so you suggest doing all the tests on current master, by only
> tweaking the replacement_sort_tuples value? I've been testing master vs.
> your patch, but I guess setting replacement_sort_tuples=0 should have
> the same effect.

I think that there may be a very slight advantage to RS-free
performance with my patch actually applied, because it reduces the
number of instructions that the heap maintenance (merging) routines
consist of. This is due to my removing HEAPCOMPARE()/tupindex
handling. However, it's probably a low single-digit percentage
difference -- not enough to be worth taking into account. I
assume that just setting replacement_sort_tuples to zero is more
convenient for you (that's what I did).

To be clear, you'll still need to set replacement_sort_tuples high
when testing RS, to make sure that we really use it for at least the
first run when we're expected to. (There is no easy way to have
testing mechanically verify that we really do only have one run in the
end with RS, but I assume that such paranoia is unneeded.)

> I probably won't eliminate the random/DESC data sets, though. At least
> not from the two smaller data sets - I want to do a bit of benchmarking
> on Heikki's polyphase merge removal patch, and for that patch those data
> sets are still relevant. Also, it's useful to have a subset of results
> where we know we don't expect any change.

Okay. The DESC dataset is going to make my patch look good, because it
won't change anything for merging (same number of runs in the end),
but sorting will be slower for the first run with RS.

> Meh, more data is probably better. And with the reduced work_mem values
> and skipping of random/DESC data sets it should complete much faster.

Note that my own test case had a much higher number of tuples than
even 10 million -- it had 100 million tuples. I think that if any of
your test cases bring about a new insight, it will not be due to the
number of distinct tuples. But, if the extra time it takes doesn't
matter to you, then it doesn't matter to me either.

-- 
Peter Geoghegan




Re: [HACKERS] The case for removing replacement selection sort

2017-09-10 Thread Peter Geoghegan
On Sun, Sep 10, 2017 at 5:07 PM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
> I'm currently re-running the benchmarks we did in 2016 for 9.6, but
> those are all sorts with a single column (see the attached script). But
> it'd be good to add a few queries testing sorts with multiple keys. We
> can either tweak some of the existing data sets + queries, or come up
> with entirely new tests.

I see that work_mem is set like this in the script:

"for wm in '1MB' '8MB' '32MB' '128MB' '512MB' '1GB'; do"

I suggest that we forget about values over 32MB, since the question of
how well quicksort does there was settled by your tests in 2016. I
would also add '4MB' to the list of wm values that you'll actually
test.

Any case with input that is initially in random order or DESC sort
order is not interesting, either. I suggest you remove those, too.

I think we're only interested in benchmarks where replacement
selection really does get its putative best case (no merge needed in
the end). Any (almost) sorted cases (the only cases that are
interesting to test now) will always manage that, once you set
replacement_sort_tuples high enough, and provided there isn't even a
single tuple that is completely out of order. The "before" cases here
should have a replacement_sort_tuples of 1 billion (so that we're sure
to not have the limit prevent the use of replacement selection in the
first place), versus the "after" cases, which should have a
replacement_sort_tuples of 0 to represent my proposal (to represent
performance in a world where replacement selection is totally
removed).

> For the existing queries, I should have some initial results tomorrow,
> at least for the data sets with 100k and 1M rows. The tests with 10M
> rows will take much more time (it takes 1-2hours for a single work_mem
> value, and we're testing 6 of them).

I myself don't see that much value in a 10M row test.

Thanks for volunteering to test!
-- 
Peter Geoghegan




Re: [HACKERS] The case for removing replacement selection sort

2017-09-10 Thread Peter Geoghegan
On Sat, Sep 9, 2017 at 8:28 AM, Greg Stark <st...@mit.edu> wrote:
> This may be a bit "how long is a piece of string" but how do those two
> compare with string sorting in an interesting encoding/locale -- say
> /usr/share/dict/polish in pl_PL for example. It's certainly true that
> people do sort text as well as numbers.

I haven't looked at text, because I presume that it's very rare for
tuples within a table to be more or less ordered by a text attribute.
While the traditional big advantage of replacement selection has
always been its ability to double initial run length on average (and
text is interesting there, because localized clustering still
happens), that doesn't really seem relevant here.
use of replacement selection is expressly only about the "single run,
no merge" best case, and in particular about avoiding regressions when
upgrading from versions prior to 9.6 (9.6 is the version where we
began to generally prefer quicksort).

> Also, people often sort on
> keys of more than one column....

Very true. I should test this.

-- 
Peter Geoghegan




Re: [HACKERS] The case for removing replacement selection sort

2017-09-08 Thread Peter Geoghegan
On Thu, Sep 7, 2017 at 2:49 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Thu, Sep 7, 2017 at 5:38 PM, Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
>> Do we need/want to repeat some of that benchmarking on these patches? I
>> don't recall how much this code changed since those benchmarks were done
>> in the 9.6 cycle.
>
> +1 for some new benchmarking.  I'm all for removing this code if we
> don't need it any more, but it'd be a lot better to have more numbers
> before deciding that.

It isn't always a win. For example, if I alter the pgbench_accounts
table so that the column "aid" is of type numeric, the picture changes
for my test case -- replacement selection is still somewhat faster for
the "select count(distinct aid) from pgbench_accounts" query with
work_mem=2MB. It takes about 5.4 seconds with replacement selection,
versus 7.4 seconds for quicksorting. But, I still think we should
proceed with my patch, because:

* It's still faster with int4/int8 comparisons on modern hardware, and
I think that most ordered inputs will be of those types in practice.
The trade-off between those two properties (faster for int4/int8 when
ordered, slower for numeric) recommends killing replacement selection
entirely. It's not a slam dunk, but it makes sense on performance
grounds, IMV.

* The comparator numeric_fast_cmp() is not well optimized -- it
doesn't avoid allocating memory with each call, for example, unlike
varstrfastcmp_locale(). This could and should be improved, and might
even change the picture for this second test case.

* With the default work_mem of 4MB, we never use replacement selection
anyway. Whatever its merits, it's pretty rare to use replacement
selection, simply because the default replacement_sort_tuples just
isn't that many tuples (150,000).

* The upside of my patch is not inconsiderable: We can remove a lot of
code, which will enable further improvements in the future, which are
far more compelling (cleaner memory management, the use of memory
batches during run generation).

-- 
Peter Geoghegan




Re: [HACKERS] A design for amcheck heapam verification

2017-09-06 Thread Peter Geoghegan
On Wed, Aug 30, 2017 at 9:29 AM, Peter Geoghegan <p...@bowt.ie> wrote:
> On Wed, Aug 30, 2017 at 5:02 AM, Alvaro Herrera
> <alvhe...@2ndquadrant.com> wrote:
>> Eh, if you want to optimize it for the case where debug output is not
>> enabled, make sure to use ereport() not elog().  ereport()
>> short-circuits evaluation of arguments, whereas elog() does not.
>
> I should do that, but it's still not really noticeable.

Since this patch has now bit-rotted, I attach a new revision, V2.
Apart from fixing some Makefile bitrot, this revision also makes some
small tweaks as suggested by Thomas and Alvaro. The documentation is
also revised and expanded, and now discusses practical aspects of the
set membership being tested using a Bloom filter, how that relates to
maintenance_work_mem, and so on.

Note that this revision does not let the Bloom filter caller use their
own dynamic shared memory, which is something that Thomas asked about.
While that could easily be added, I think it should happen later. I
really just wanted to make sure that my Bloom filter was not in some
way fundamentally incompatible with Thomas' planned enhancements to
(parallel) hash join.

-- 
Peter Geoghegan
From d4dda95dd41204315dc12936fac83d2df8676992 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <p...@bowt.ie>
Date: Thu, 24 Aug 2017 20:58:21 -0700
Subject: [PATCH 1/2] Add Bloom filter data structure implementation.

A Bloom filter is a space-efficient, probabilistic data structure that
can be used to test set membership.  Callers will sometimes incur false
positives, but never false negatives.  The rate of false positives is a
function of the total number of elements and the amount of memory
available for the Bloom filter.

Two classic applications of Bloom filters are cache filtering, and data
synchronization testing.  Any user of Bloom filters must accept the
possibility of false positives as a cost worth paying for the benefit in
space efficiency.
---
 src/backend/lib/Makefile  |   4 +-
 src/backend/lib/README|   2 +
 src/backend/lib/bloomfilter.c | 297 ++
 src/include/lib/bloomfilter.h |  26 
 4 files changed, 327 insertions(+), 2 deletions(-)
 create mode 100644 src/backend/lib/bloomfilter.c
 create mode 100644 src/include/lib/bloomfilter.h

diff --git a/src/backend/lib/Makefile b/src/backend/lib/Makefile
index d1fefe4..191ea9b 100644
--- a/src/backend/lib/Makefile
+++ b/src/backend/lib/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/lib
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = binaryheap.o bipartite_match.o dshash.o hyperloglog.o ilist.o \
-	   knapsack.o pairingheap.o rbtree.o stringinfo.o
+OBJS = binaryheap.o bipartite_match.o bloomfilter.o dshash.o hyperloglog.o \
+   ilist.o knapsack.o pairingheap.o rbtree.o stringinfo.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/lib/README b/src/backend/lib/README
index 5e5ba5e..376ae27 100644
--- a/src/backend/lib/README
+++ b/src/backend/lib/README
@@ -3,6 +3,8 @@ in the backend:
 
 binaryheap.c - a binary heap
 
+bloomfilter.c - probabilistic, space-efficient set membership testing
+
 hyperloglog.c - a streaming cardinality estimator
 
 pairingheap.c - a pairing heap
diff --git a/src/backend/lib/bloomfilter.c b/src/backend/lib/bloomfilter.c
new file mode 100644
index 000..e93f9b0
--- /dev/null
+++ b/src/backend/lib/bloomfilter.c
@@ -0,0 +1,297 @@
+/*-
+ *
+ * bloomfilter.c
+ *		Minimal Bloom filter
+ *
+ * A Bloom filter is a probabilistic data structure that is used to test an
+ * element's membership of a set.  False positives are possible, but false
+ * negatives are not; a test of membership of the set returns either "possibly
+ * in set" or "definitely not in set".  This can be very space efficient when
+ * individual elements are larger than a few bytes, because elements are hashed
+ * in order to set bits in the Bloom filter bitset.
+ *
+ * Elements can be added to the set, but not removed.  The more elements that
+ * are added, the larger the probability of false positives.  Caller must hint
+ * an estimated total size of the set when its Bloom filter is initialized.
+ * This is used to balance the use of memory against the final false positive
+ * rate.
+ *
+ * Copyright (c) 2017, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/lib/bloomfilter.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include <math.h>
+
+#include "access/hash.h"
+#include "lib/bloomfilter.h"
+#include "utils/memutils.h"
+
+#define MAX_HASH_FUNCS	10
+#define NBITS(filt)		((1 << (filt)->bloom_power))
+
+typedef struct bloom_filter
+{
+	/* 2 ^ bloom_power is the size of the bitset (in bits) */
+	intbloom_

Re: [HACKERS] The case for removing replacement selection sort

2017-09-06 Thread Peter Geoghegan
On Wed, Sep 6, 2017 at 2:47 PM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
>> I attach a patch to remove replacement selection, which I'll submit to CF 1.
>
> This breaks the documentation build, because
> doc/src/sgml/release-9.6.sgml still contains <xref
> linkend="guc-replacement-sort-tuples"> but you removed that id.

Thanks for looking into it.

I suppose that the solution is to change the 9.6 release notes to no
longer have a replacement_sort_tuples link. Anyone else have an
opinion on that?

-- 
Peter Geoghegan




Re: [HACKERS] INSERT .. ON CONFLICT DO SELECT [FOR ..]

2017-09-04 Thread Peter Geoghegan
On Mon, Sep 4, 2017 at 10:05 AM, Marko Tiikkaja <ma...@joh.to> wrote:
> I had to look at the patch to see what I'd done, and the tests suggest that
> we already complain about this if a FOR [lock level] clause is present:
>
>+begin transaction isolation level read committed;
>+insert into selfconflict values (10,1), (10,2) on conflict(f1) do select
> for update returning *;
>+ERROR:  ON CONFLICT command cannot affect row a second time
>+HINT:  Ensure that no rows proposed for insertion within the same
> command have duplicate constrained values.
>+commit;
>
> (in expected/insert_conflict.out.)

Right. I saw that you do it for ON CONFLICT DO SELECT FOR UPDATE, and so on.

>> On to the subject of my more general reservation: Why did you include
>> ON CONFLICT DO SELECT at all? Why not just ON CONFLICT DO SELECT FOR
>> UPDATE (and FOR SHARE, ...) ?
>
>
> If you don't intend to actually do anything with the row in the same
> database transaction, locking seems unnecessary.  For example, you might
> want to provide an idempotent method in your API which inserts the data and
> returns the ID to the caller.  The transaction is committed by the time the
> client sees the data, so locking is just extra overhead.

That makes sense, but I personally feel that the plain ON CONFLICT DO
SELECT variant isn't worth the trouble. I welcome other people's
opinions on that.

>> I think I know what you're going to say about it: ON CONFLICT DO
>> NOTHING doesn't lock the conflicting row, so why should I insist on it
>> here (why not have an ON CONFLICT DO SELECT variant, too)?
>
>
> I wasn't going to say that, no.

Well, it was a foundation for the ON CONFLICT DO SELECT variant that I
actually agree with, in any case.

>> In other words, while ON CONFLICT DO NOTHING may have set a precedent
>> here, it at least has semantics that limit the consequences of not
>> locking the row; and it *feels* a little bit dirty to use it
>> indifferently, even where that makes sense. ON CONFLICT DO SELECT is
>> probably going to be used within wCTEs some of the time. I'm not sure
>> that a plain ON CONFLICT DO SELECT variant won't allow unpredictable,
>> complicated problems when composed within a more complicated query.
>
>
> Yeah, in most cases you'd probably do a SELECT FOR KEY SHARE.  And I
> wouldn't mind that being the default, but it would mean inventing special
> syntax with no previous precedent (as far as I know) for not locking the row
> in cases where the user doesn't want that.  And that doesn't seem too
> attractive, either.

I'm not in favor of spellings that are inconsistent with SELECT FOR ... .

> Maybe it would be better to make the default sensible for people who are not
> super familiar with postgres.  I don't know.  And the more I think about the
> use case above, the less I care about it.

I was going to ask you to offer a real-world example of where the
plain ON CONFLICT DO SELECT variant is compelling. If you care about
it (if you're not going to concede the point), then I still suggest
doing so.

> But I'm generally against
> interfaces which put arbitrary restrictions on what power users can do on
> the basis that less experienced people might misuse the interface.

I agree with that as a broad principle, but disagree that it applies
here. Or if it does, then you have yet to convince me of it. In all
sincerity, if you think that I just don't understand your perspective,
then please try to make me understand it. Would a power user actually
miss ON CONFLICT DO SELECT? And if that's the case, would it really be
something they'd remember 5 minutes later?

I actually think that ON CONFLICT DO NOTHING does have semantics that
are, shall we say, questionable. That's the cost of having it not lock
conflicting rows during big ETL operations. That's a huge practical
benefit for ETL use-cases. Whereas here, with ON CONFLICT DO SELECT,
I see a somewhat greater risk, and a much, much smaller benefit. A
benefit that might actually be indistinguishable from zero.

-- 
Peter Geoghegan




Re: [HACKERS] INSERT .. ON CONFLICT DO SELECT [FOR ..]

2017-09-03 Thread Peter Geoghegan
On Tue, Aug 15, 2017 at 12:17 AM, Marko Tiikkaja <ma...@joh.to> wrote:
> On Tue, Aug 15, 2017 at 7:43 AM, Peter Geoghegan <p...@bowt.ie> wrote:
>>
>> On Mon, Aug 14, 2017 at 6:23 PM, Marko Tiikkaja <ma...@joh.to> wrote:
>> > Attached is a patch for $SUBJECT.  It might still be a bit rough around
>> > the
>> > edges and probably light on docs and testing, but I thought I'd post it
>> > anyway so I won't forget.
>>
>> Is it possible for ON CONFLICT DO SELECT to raise a cardinality
>> violation error? Why or why not?
>
>
> No.  I don't see when that would need to happen.  But I'm guessing you have
> a case in mind?

Actually, no, I didn't. But I wondered if you did. I think that it
makes some sense not to, now that I think about it. ON CONFLICT DO
NOTHING doesn't have cardinality violations, because it cannot affect
a row twice if there are duplicates proposed for insertion (at least,
not through any ON CONFLICT related codepath). But, this opinion only
applies to ON CONFLICT DO SELECT, not ON CONFLICT DO SELECT FOR
UPDATE. And I have other reservations, which I'll go in to
momentarily, about ON CONFLICT DO SELECT in general. So, the upshot is
that I think we need cardinality violations in all cases for this
feature. Why would a user ever not want to know that the row was
locked twice?

On to the subject of my more general reservation: Why did you include
ON CONFLICT DO SELECT at all? Why not just ON CONFLICT DO SELECT FOR
UPDATE (and FOR SHARE, ...) ?

I think I know what you're going to say about it: ON CONFLICT DO
NOTHING doesn't lock the conflicting row, so why should I insist on it
here (why not have an ON CONFLICT DO SELECT variant, too)? That's not
a bad argument. However, I think that there is still a difference.
Namely, ON CONFLICT DO NOTHING doesn't let you project the rows with
RETURNING (that may not even be visible to the command's MVCC snapshot
-- the rows that we also don't lock), because those are simply not the
semantics it has. DO NOTHING is more or less about ETL use-cases, some
of which involve data from very unclean sources. It seems questionable
to cite that as precedent for not locking a row (that is, for having a
plain ON CONFLICT DO SELECT variant at all).

In other words, while ON CONFLICT DO NOTHING may have set a precedent
here, it at least has semantics that limit the consequences of not
locking the row; and it *feels* a little bit dirty to use it
indifferently, even where that makes sense. ON CONFLICT DO SELECT is
probably going to be used within wCTEs some of the time. I'm not sure
that a plain ON CONFLICT DO SELECT variant won't allow unpredictable,
complicated problems when composed within a more complicated query.
(This brings me back!)

In other other words, plain SELECT FOR UPDATE has to do all the EPQ
stuff, in order to confront many of the same issues that ON CONFLICT
must confront in its own way [1], but a plain SELECT never does any
EPQ stuff at all. It seems to follow that a plain ON CONFLICT DO
SELECT is an oxymoron. If I've missed the point of ON CONFLICT DO
SELECT, please let me know how.

[1] 
https://wiki.postgresql.org/wiki/UPSERT#Visibility_issues_and_the_proposed_syntax_.28WHERE_clause.2Fpredicate_stuff.29
-- 
Peter Geoghegan




Re: [HACKERS] The case for removing replacement selection sort

2017-08-31 Thread Peter Geoghegan
On Wed, Aug 30, 2017 at 4:59 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> I may submit the simple patch to remove replacement selection, if
> other contributors are receptive. Apart from everything else, the
> "incrementalism" of replacement selection works against cleverer batch
> memory management of the type I just outlined, which seems like a
> useful direction to take tuplesort in.

I attach a patch to remove replacement selection, which I'll submit to CF 1.


-- 
Peter Geoghegan
From 6ccad74113d3a13264a19653e13ef897be5c902d Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <p...@bowt.ie>
Date: Wed, 30 Aug 2017 16:14:36 -0700
Subject: [PATCH] Remove replacement selection sort.

This simplifies tuplesort.c, since heap maintenance routines need no
longer concern themselves with cases where two runs are represented
within a heap.

Replacement selection sort's traditional advantages no longer allow it
to outperform a simple sort-merge strategy, even in sympathetic cases.
This is due to long term trends in hardware performance characteristics,
and enhancements to the tuplesort.c merge code in the past couple of
years.
---
 doc/src/sgml/config.sgml  |  39 ---
 src/backend/utils/init/globals.c  |   1 -
 src/backend/utils/misc/guc.c  |  10 -
 src/backend/utils/misc/postgresql.conf.sample |   1 -
 src/backend/utils/sort/tuplesort.c| 415 +++---
 src/include/miscadmin.h   |   1 -
 src/test/regress/expected/cluster.out |  17 +-
 src/test/regress/sql/cluster.sql  |  14 +-
 8 files changed, 51 insertions(+), 447 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2b6255e..7c625e2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1513,45 +1513,6 @@ include_dir 'conf.d'
   
  
 
- 
-  replacement_sort_tuples (integer)
-  
-   replacement_sort_tuples configuration parameter
-  
-  
-  
-   
-When the number of tuples to be sorted is smaller than this number,
-a sort will produce its first output run using replacement selection
-rather than quicksort.  This may be useful in memory-constrained
-environments where tuples that are input into larger sort operations
-have a strong physical-to-logical correlation.  Note that this does
-not include input tuples with an inverse
-correlation.  It is possible for the replacement selection algorithm
-to generate one long run that requires no merging, where use of the
-default strategy would result in many runs that must be merged
-to produce a final sorted output.  This may allow sort
-operations to complete sooner.
-   
-   
-The default is 150,000 tuples.  Note that higher values are typically
-not much more effective, and may be counter-productive, since the
-priority queue is sensitive to the size of available CPU cache, whereas
-the default strategy sorts runs using a cache
-oblivious algorithm.  This property allows the default sort
-strategy to automatically and transparently make effective use
-of available CPU cache.
-   
-   
-Setting maintenance_work_mem to its default
-value usually prevents utility command external sorts (e.g.,
-sorts used by CREATE INDEX to build B-Tree
-indexes) from ever using replacement selection sort, unless the
-input tuples are quite wide.
-   
-  
- 
-
  
   autovacuum_work_mem (integer)
   
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 7c09498..9680a4b 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -112,7 +112,6 @@ bool		enableFsync = true;
 bool		allowSystemTableMods = false;
 int			work_mem = 1024;
 int			maintenance_work_mem = 16384;
-int			replacement_sort_tuples = 150000;
 
 /*
  * Primary determinants of sizes of shared-memory structures.
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 246fea8..a459672 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1932,16 +1932,6 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"replacement_sort_tuples", PGC_USERSET, RESOURCES_MEM,
-			gettext_noop("Sets the maximum number of tuples to be sorted using replacement selection."),
-			gettext_noop("When more tuples than this are present, quicksort will be used.")
-		},
-		&replacement_sort_tuples,
-		150000, 0, INT_MAX,
-		NULL, NULL, NULL
-	},
-
 	/*
 	 * We use the hopefully-safely-small value of 100kB as the compiled-in
 	 * default for max_stack_depth.  InitializeGUCOptions will increase it if
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresq

Re: [HACKERS] The case for removing replacement selection sort

2017-08-30 Thread Peter Geoghegan
On Wed, Aug 30, 2017 at 5:38 PM, Robert Haas <robertmh...@gmail.com> wrote:
> Wow.  Just to be clear, I am looking for the BEST case for replacement
> selection, not the worst case.  But I would have expected that case to
> be a win for replacement selection, and it clearly isn't.  I can
> reproduce your results here.

But I *was* trying to get a best case. That's why it isn't even worse.
That's what the docs say the best case is, after all.

This is the kind of thing that replacement selection actually did do
better with on 9.6. I clearly remember Tomas Vondra doing lots of
benchmarking, showing some benefit with RS with a work_mem of 8MB or
less. As I said in my introduction on this thread, we weren't wrong to
add replacement_sort_tuples to 9.6, given where things were with
merging at the time. But, it does very much appear to create less than
zero benefit these days. The picture changed.

-- 
Peter Geoghegan




Re: [HACKERS] The case for removing replacement selection sort

2017-08-30 Thread Peter Geoghegan
On Wed, Aug 30, 2017 at 3:14 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> This is significantly faster, in a way that's clearly reproducible and
> consistent, despite the fact that we need about 10 merge passes
> without replacement selection, and only have enough memory for 7
> tapes. I think that I could find a case that makes replacement
> selection look much worse, if I tried.

Just to see where we end up, I quickly wrote a patch to remove
replacement selection + replacement_sort_tuples. The LOC impact worked
out like this:

$ git diff master --stat
 doc/src/sgml/config.sgml  |  39 
 src/backend/utils/init/globals.c  |   1 -
 src/backend/utils/misc/guc.c  |  10 -
 src/backend/utils/misc/postgresql.conf.sample |   1 -
 src/backend/utils/sort/tuplesort.c| 403 +--
 src/include/miscadmin.h   |   1 -
 src/test/regress/expected/cluster.out |  17 +-
 src/test/regress/sql/cluster.sql  |  14 +-
 8 files changed, 52 insertions(+), 434 deletions(-)

It was nice to be able to remove comments that make certain
distinctions that replacement selection cares about. These were
tedious to read. I managed to remove several paragraphs within
tuplesort.c's header comments.

Another advantage of the changes made that I noticed in passing is
that SortTuple.tupindex is now only used while merging. It would be
quite possible to remove SortTuple.tupindex entirely, as an additional
piece of work, by providing some other indirection to get that
information when merging. From there, we could replace SortTuple.tuple
with a bitfield, that has one bit for NULLness, and treats other bits
as a 63-bit or 31-bit integer. This integer would be used an offset
into a buffer that we repalloc(), thus removing all retail palloc()s,
even during run generation. Using one big buffer for tuples would make
tuplesort.c quite a bit more memory efficient (SortTuple will only be
16 bytes, we won't waste memory on palloc() headers, and sequential
access is made more cache efficient in presorted pass-by-reference
cases).
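
To illustrate the layout I have in mind (purely a sketch -- the field
names are made up here, and this is not tuplesort.c's current
definition):

#include <stdint.h>

typedef uintptr_t Datum;        /* stand-in for postgres.h's Datum */

typedef struct SortTuple
{
    Datum    datum1;            /* value of first key column (possibly
                                 * abbreviated) */
    uint64_t isnull1:1;         /* is datum1 NULL? */
    uint64_t tupoffset:63;      /* offset of tuple within the single big
                                 * repalloc()'d buffer */
} SortTuple;                    /* 16 bytes on 64-bit platforms; a 31-bit
                                 * offset would be used on 32-bit ones */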

I'm not planning to work on this myself. It is similar to something
that Heikki described last year, but goes a bit further by eliminating
all palloc() header overhead, while also not playing tricks with
reclaiming bits from pointers (I recall that Tom didn't like that
aspect of Heikki's proposal at all). There would be new infrastructure
required to make this work -- we'd have to be able to ask routines
like ExecCopySlotMinimalTuple() and heap_copytuple() to copy into our
own large tuple buffer, rather than doing their own palloc() on
tuplesort's behalf. But that seems like a good idea anyway.

I may submit the simple patch to remove replacement selection, if
other contributors are receptive. Apart from everything else, the
"incrementalism" of replacement selection works against cleverer batch
memory management of the type I just outlined, which seems like a
useful direction to take tuplesort in.

-- 
Peter Geoghegan




Re: [HACKERS] The case for removing replacement selection sort

2017-08-30 Thread Peter Geoghegan
On Wed, Aug 30, 2017 at 3:01 PM, Robert Haas <robertmh...@gmail.com> wrote:
> That may all be true, but my point is that if it wins in some cases,
> we should keep it -- and proving it no longer wins in those cases will
> require running tests.

That's not hard. On my laptop:

$ pgbench -i -s 100
...

postgres=# set work_mem = '2MB';
SET
postgres=# show replacement_sort_tuples ;
 replacement_sort_tuples
─────────────────────────
 150000
(1 row)
(30784) /postgres=# select count(distinct aid) from pgbench_accounts ;
   count
────────────
 10,000,000
(1 row)

Time: 4157.267 ms (00:04.157)
(30784) /postgres=# set replacement_sort_tuples = 0;
SET
(30784) /postgres=# select count(distinct aid) from pgbench_accounts ;
   count
────────────
 10,000,000
(1 row)

Time: 3343.789 ms (00:03.344)

This is significantly faster, in a way that's clearly reproducible and
consistent, despite the fact that we need about 10 merge passes
without replacement selection, and only have enough memory for 7
tapes. I think that I could find a case that makes replacement
selection look much worse, if I tried.

-- 
Peter Geoghegan




Re: [HACKERS] Polyphase merge is obsolete

2017-08-30 Thread Peter Geoghegan
On Wed, Aug 30, 2017 at 12:48 PM, Robert Haas <robertmh...@gmail.com> wrote:
> These are separate topics.  They should each be discussed on their own
> thread.  Please don't hijack this thread to talk about something else.

I don't think that that is a fair summary.

Heikki has done a number of things in this area, of which this is only
the latest. I'm saying "hey, have you thought about RS too?". Whether
or not I'm "hijacking" this thread is, at best, ambiguous.

-- 
Peter Geoghegan




Re: [HACKERS] The case for removing replacement selection sort

2017-08-30 Thread Peter Geoghegan
On Wed, Aug 30, 2017 at 12:51 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Jul 14, 2017 at 6:20 PM, Peter Geoghegan <p...@bowt.ie> wrote:
>> With the additional enhancements made to Postgres 10, I doubt that
>> there are any remaining cases where it wins.
>
> The thing to do about that would be to come up with some cases where
> someone might plausibly think it would win and benchmark them to find
> out what happens.  I find it really hard to believe that sorting a
> long presorted stream of tuples (or, say, 2-1-4-3-6-5-8-7-10-9 etc.)
> is ever going to be as fast with any other algorithm as it is with
> replacement selection.

Replacement selection as implemented in Postgres is supposed to be
about the "single run, no merge" best case. This must use
TSS_SORTEDONTAPE processing, which is optimized for random access,
which is usually the wrong thing.
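
To make the "single run" property concrete, here is a stand-alone toy
simulation of classic replacement selection (emphatically a toy, not
tuplesort.c's actual code) that just counts runs. With presorted input
every incoming tuple can join the current run, so exactly one run
comes out; reverse the input and you get roughly nitems/memtuples runs
instead:

#include <stdio.h>

#define MEMTUPLES 8             /* tiny "workMem": heap holds 8 tuples */

typedef struct { int run; int val; } HeapItem;

static HeapItem heap[MEMTUPLES];
static int heapsize = 0;

static int
less(HeapItem a, HeapItem b)
{
    /* order by (run, value), so run N fully drains before run N + 1 */
    return a.run < b.run || (a.run == b.run && a.val < b.val);
}

static void
swap(int i, int j)
{
    HeapItem tmp = heap[i];

    heap[i] = heap[j];
    heap[j] = tmp;
}

static void
push(HeapItem it)
{
    int i = heapsize++;

    heap[i] = it;
    for (; i > 0 && less(heap[i], heap[(i - 1) / 2]); i = (i - 1) / 2)
        swap(i, (i - 1) / 2);
}

static HeapItem
pop(void)
{
    HeapItem top = heap[0];
    int i = 0;

    heap[0] = heap[--heapsize];
    for (;;)
    {
        int l = 2 * i + 1, r = 2 * i + 2, s = i;

        if (l < heapsize && less(heap[l], heap[s]))
            s = l;
        if (r < heapsize && less(heap[r], heap[s]))
            s = r;
        if (s == i)
            break;
        swap(i, s);
        i = s;
    }
    return top;
}

int
main(void)
{
    int input[64], n = 64, next = 0, runs = 1;

    for (int i = 0; i < n; i++)
        input[i] = i;           /* presorted input: RS's best case */

    while (next < n && heapsize < MEMTUPLES)
        push((HeapItem){ 1, input[next++] });

    while (heapsize > 0)
    {
        HeapItem out = pop();   /* "emit" to run out.run */

        if (out.run > runs)
            runs = out.run;
        if (next < n)
        {
            /* values below the one just emitted must wait for the next run */
            int run = (input[next] >= out.val) ? out.run : out.run + 1;

            push((HeapItem){ run, input[next++] });
        }
    }
    printf("replacement selection produced %d run(s)\n", runs);
    return 0;
}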

In general, sorting is only one cost that is involved here, and is not
the predominant cost with presorted input.

-- 
Peter Geoghegan




Re: [HACKERS] Polyphase merge is obsolete

2017-08-30 Thread Peter Geoghegan
On Mon, Feb 27, 2017 at 2:45 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> Since we have an awful lot of stuff in the last CF, and this patch
> doesn't seem particularly strategic, I've marked it "Returned with
> Feedback".

I noticed that this is in the upcoming CF 1 for v11. I'm signed up to review.

I'd like to point out that replacement selection is also obsolete,
which is something I brought up recently [1]. I don't actually have
any feature-driven reason to want to kill replacement selection - it's
just an annoyance at this point. I do think that RS is more deserving
of being killed than Polyphase merge, because it actually costs users
something to continue to support it. The replacement_sort_tuples GUC
particularly deserves to be removed.

It would be nice if killing RS was put in scope here. I'd appreciate
it, at least, since it would simplify the heap routines noticeably.
The original analysis that led to adding replacement_sort_tuples was
based on certain performance characteristics of merging that have
since changed by quite a bit, due to our work for v10.

[1] 
postgr.es/m/cah2-wzmmnjg_k0r9nqywmq3zjyjjk+hcbizynghay-zyjs6...@mail.gmail.com
-- 
Peter Geoghegan




Re: [HACKERS] A design for amcheck heapam verification

2017-08-30 Thread Peter Geoghegan
On Wed, Aug 30, 2017 at 5:02 AM, Alvaro Herrera
<alvhe...@2ndquadrant.com> wrote:
> Eh, if you want to optimize it for the case where debug output is not
> enabled, make sure to use ereport() not elog().  ereport()
> short-circuits evaluation of arguments, whereas elog() does not.

I should do that, but it's still not really noticeable.

-- 
Peter Geoghegan




Re: [HACKERS] A design for amcheck heapam verification

2017-08-29 Thread Peter Geoghegan
On Tue, Aug 29, 2017 at 7:22 PM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
> Indeed.  Thank you for working on this!  To summarise a couple of
> ideas that Peter and I discussed off-list a while back:  (1) While
> building the hash table for a hash join we could build a Bloom filter
> per future batch and keep it in memory, and then while reading from
> the outer plan we could skip writing tuples out to future batches if
> there is no chance they'll find a match when read back in later (works
> only for inner joins and only pays off in inverse proportion to the
> join's selectivity);

Right. Hash joins do tend to be very selective, though, so I think
that this could help rather a lot. With just 8 or 10 bits per element,
you can eliminate almost all the batch write-outs on the outer side.
No per-worker synchronization for BufFiles is needed when this
happens, either. It seems like that could be very important.
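
To sketch the mechanism (a stand-alone toy filter -- not the lib
patch's actual interface, and with stand-in hash functions):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define FILTER_BITS (1 << 23)   /* 8M bits: ~8 bits/element for 1M tuples */
#define NUM_HASHES  3

static unsigned char filter[FILTER_BITS / 8];

/* k probe positions derived from two hashes, per Kirsch & Mitzenmacher */
static uint32_t
fnv1a(const unsigned char *key, size_t len, uint32_t seed)
{
    uint32_t h = 2166136261u ^ seed;

    for (size_t i = 0; i < len; i++)
    {
        h ^= key[i];
        h *= 16777619u;
    }
    return h;
}

static void
filter_add(const unsigned char *key, size_t len)
{
    uint32_t h1 = fnv1a(key, len, 0);
    uint32_t h2 = fnv1a(key, len, 0x9e3779b9u) | 1;

    for (int i = 0; i < NUM_HASHES; i++)
    {
        uint32_t bit = (h1 + i * h2) % FILTER_BITS;

        filter[bit / 8] |= 1 << (bit % 8);
    }
}

/* returns 0 only when the key is definitely absent: the outer tuple
 * cannot find a match, so its batch write-out can be skipped */
static int
filter_might_contain(const unsigned char *key, size_t len)
{
    uint32_t h1 = fnv1a(key, len, 0);
    uint32_t h2 = fnv1a(key, len, 0x9e3779b9u) | 1;

    for (int i = 0; i < NUM_HASHES; i++)
    {
        uint32_t bit = (h1 + i * h2) % FILTER_BITS;

        if (!(filter[bit / 8] & (1 << (bit % 8))))
            return 0;
    }
    return 1;
}

int
main(void)
{
    filter_add((const unsigned char *) "key-42", strlen("key-42"));
    printf("%d %d\n",
           filter_might_contain((const unsigned char *) "key-42", strlen("key-42")),
           filter_might_contain((const unsigned char *) "key-43", strlen("key-43")));
    return 0;
}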

> To use this for anything involving parallelism where a Bloom filter
> must be shared we'd probably finish up having to create a shared
> version of bloom_init() that either uses caller-provided memory and
> avoids the internal pointer, or allocates DSA memory.  I suppose you
> could consider splitting your bloom_init() function up into
> bloom_estimate() and bloom_init(user_supplied_space, ...) now, and
> getting rid of the separate pointer to bitset (ie stick char
> bitset[FLEXIBLE_ARRAY_MEMBER] at the end of the struct)?

Makes sense. Not hard to add that.

> I was just observing that there is an opportunity for code reuse here.
> This function's definition would ideally be something like:
>
> double
> bloom_prop_bits_set(bloom_filter *filter)
> {
> return popcount(filter->bitset, NBYTES(filter)) / (double) NBITS(filter);
> }
>
> This isn't an objection to the way you have it, since we don't have a
> popcount function yet!  We have several routines in the tree for
> counting bits, though not yet the complete set from Hacker's Delight.

Right. I'm also reminded of the lookup tables for the visibility/freeze map.

> Your patch brings us one step closer to that goal.  (The book says
> that this approach is good for sparse bitsets, but your comment says
> that we expect something near 50%.  That's irrelevant anyway since a
> future centralised popcount() implementation would do this in
> word-sized chunks with a hardware instruction or branch-free-per-word
> lookups in a table and not care at all about sparseness.)

I own a copy of Hacker's Delight (well, uh, Daniel Farina lent me his
copy about 2 years ago!). pop()/popcount() does seem like a clever
algorithm that we should probably think about adopting in some cases,
but I should point out that the current caller of my
bloom_prop_bits_set() function is an elog() DEBUG1 call. This is not
at all performance critical.
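
For what it's worth, the word-at-a-time version would look something
like this (a sketch; __builtin_popcountll() is GCC/Clang-specific, so
a real centralized popcount() would need a portable fallback):

#include <stdint.h>
#include <stddef.h>

static double
prop_bits_set(const uint64_t *words, size_t nwords)
{
    uint64_t bits_set = 0;

    for (size_t i = 0; i < nwords; i++)
        bits_set += (uint64_t) __builtin_popcountll(words[i]);

    return (double) bits_set / (double) (nwords * 64);
}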

> + * Test if bloom filter definitely lacks element.
>
> I think where "Bloom filter" appears in prose it should have a capital
> letter (person's name).

Agreed.

-- 
Peter Geoghegan




Re: [HACKERS] A design for amcheck heapam verification

2017-08-29 Thread Peter Geoghegan
On Tue, Aug 29, 2017 at 4:34 PM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
> Some drive-by comments on the lib patch:

I was hoping that you'd look at this, since you'll probably want to
use a bloom filter for parallel hash join at some point. I've tried to
keep this one as simple as possible. I think that there is a good
chance that it will be usable for parallel hash join with multiple
batches. You'll need to update the interface a little bit to make that
work (e.g., bring-your-own-hash interface), but hopefully the changes
you'll need will be limited to a few small details.

> +bloom_add_element(bloom_filter *filter, unsigned char *elem, Size len)
>
> I think the plan is to use size_t for new stuff[1].

I'd forgotten.

> This is another my_log2(), right?
>
> It'd be nice to replace both with fls() or flsl(), though it's
> annoying to have to think about long vs int64 etc.  We already use
> fls() in two places and supply an implementation in src/port/fls.c for
> platforms that lack it (Windows?), but not the long version.

win64 longs are only 32-bits, so my_log2() would do the wrong thing
for me on that platform. pow2_truncate() is provided with a number of
bits as its argument, not a number of bytes (otherwise this would
work).

Ideally, we'd never use long integers, because the type's width is
platform dependent, and yet it is only ever used as an alternative to
int because it is wider than int. One example of where this causes
trouble: logtape.c uses long ints, so external sorts can only use half
the temp space on win64.
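
The LLP64 quirk is easy to demonstrate -- on 64-bit Windows this
prints 4/8, while on 64-bit Linux it prints 8/8:

#include <stdio.h>

int
main(void)
{
    /* Win64 is LLP64: long stays 4 bytes even though pointers are 8 */
    printf("sizeof(long) = %zu, sizeof(void *) = %zu\n",
           sizeof(long), sizeof(void *));
    return 0;
}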

> +/*
> + * Hash function is taken from sdbm, a public-domain reimplementation of the
> + * ndbm database library.
> + */
> +static uint32
> +sdbmhash(unsigned char *elem, Size len)
> +{

> I see that this is used in gawk, BerkeleyDB and all over the place[2].
> Nice.  I understand that this point of this is to be a hash function
> that is different from our usual one, for use by k_hashes().

Right. Its only job is to be a credible hash function that isn't
derivative of hash_any().
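
(The function body is cut off in the quote above; for reference, the
widely circulated public-domain form of sdbm is just this recurrence
-- a sketch, not necessarily byte-for-byte what my patch has:)

#include <stddef.h>
#include <stdint.h>

static uint32_t
sdbm_hash(const unsigned char *elem, size_t len)
{
    uint32_t hash = 0;

    /* hash(i) = hash(i - 1) * 65599 + elem[i], with the multiply
     * strength-reduced to shifts and a subtraction */
    while (len-- > 0)
        hash = *elem++ + (hash << 6) + (hash << 16) - hash;

    return hash;
}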

> Do you
> think it belongs somewhere more common than this?  It seems a bit like
> our hash-related code is scattered all over the place but should be
> consolidated, but I suppose that's a separate project.

Unsure. In its defense, there is also a private murmurhash one-liner
within tidbitmap.c.  I don't mind changing this, but it's slightly odd
to expose a hash function whose only job is to be completely unrelated
to hash_any().

> Unnecessary braces here and elsewhere for single line body of for loops.
>
> +bloom_prop_bits_set(bloom_filter *filter)
> +{
> +	int			bitset_bytes = NBITS(filter) / BITS_PER_BYTE;
> +	int64		bits_set = 0;
> +	int			i;
> +
> +	for (i = 0; i < bitset_bytes; i++)
> +	{
> +		unsigned char byte = filter->bitset[i];
> +
> +		while (byte)
> +		{
> +			bits_set++;
> +			byte &= (byte - 1);
> +		}
> +	}

I don't follow what you mean here.

-- 
Peter Geoghegan




Re: [HACKERS] A design for amcheck heapam verification

2017-08-29 Thread Peter Geoghegan
On Thu, May 11, 2017 at 4:30 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> I spent only a few hours writing a rough prototype, and came up with
> something that does an IndexBuildHeapScan() scan following the
> existing index verification steps. Its amcheck callback does an
> index_form_tuple() call, hashes the resulting IndexTuple (heap TID and
> all), and tests it for membership of a bloom filter generated as part
> of the main B-Tree verification phase. The IndexTuple memory is freed
> immediately following hashing.

I attach a cleaned-up version of this. It has extensive documentation.
My bloom filter implementation is broken out as a separate patch,
added as core infrastructure under "lib".

I do have some outstanding concerns about V1 of the patch series:

* I'm still uncertain about the question of using IndexBuildHeapScan()
during Hot Standby. It seems safe, since we're only using the
CONCURRENTLY/AccessShareLock path when this happens, but someone might
find that objectionable on general principle. For now, in this first
version, it remains possible to call IndexBuildHeapScan() during Hot
Standby, to allow the new verification to work there.

* The bloom filter has been experimentally verified, and is based on
source material which is directly cited. It would nevertheless be
useful to have the hashing stuff scrutinized, because it's possible
that I've overlooked some subtlety.

This is only the beginning for heapam verification. Comprehensive
coverage can be added later, within routines that specifically target
some table, not some index.

While this patch series only adds index-to-heap verification for
B-Tree indexes, I can imagine someone adopting the same technique to
verifying that other access methods are consistent with their heap
relation. For example, it would be easy to do this with hash indexes.
Any other index access method where the same high-level principle that
I rely on applies can do index-to-heap verification with just a few
tweaks. I'm referring to the high-level principle that comments
specifically point out in the patch: that REINDEX leaves you with an
index structure that has exactly the same entries as the old index
structure had, though possibly with fewer dead index tuples. I like my
idea of reusing IndexBuildHeapScan() for verification. Very few new
LOCs are actually added to amcheck by this patch, and
IndexBuildHeapScan() is itself tested.

-- 
Peter Geoghegan
From 48499cfb58b7bf705e93fb12cc5359ec12cd9c51 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <p...@bowt.ie>
Date: Tue, 2 May 2017 00:19:24 -0700
Subject: [PATCH 2/2] Add amcheck verification of indexes against heap.

Add a new, optional capability to bt_index_check() and
bt_index_parent_check():  callers can check that each heap tuple that
ought to have an index entry does in fact have one.  This happens at the
end of the existing verification checks.

This is implemented by using a bloom filter data structure.  The
implementation performs set membership tests within a callback (the same
type of callback that each index AM registers for CREATE INDEX).  The
bloom filter is populated during the initial index verification scan.
---
 contrib/amcheck/Makefile |   2 +-
 contrib/amcheck/amcheck--1.0--1.1.sql|  28 
 contrib/amcheck/amcheck.control  |   2 +-
 contrib/amcheck/expected/check_btree.out |  14 +-
 contrib/amcheck/sql/check_btree.sql  |   9 +-
 contrib/amcheck/verify_nbtree.c  | 236 ---
 doc/src/sgml/amcheck.sgml| 103 +++---
 7 files changed, 345 insertions(+), 49 deletions(-)
 create mode 100644 contrib/amcheck/amcheck--1.0--1.1.sql

diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index 43bed91..c5764b5 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -4,7 +4,7 @@ MODULE_big	= amcheck
 OBJS		= verify_nbtree.o $(WIN32RES)
 
 EXTENSION = amcheck
-DATA = amcheck--1.0.sql
+DATA = amcheck--1.0--1.1.sql amcheck--1.0.sql
 PGFILEDESC = "amcheck - function for verifying relation integrity"
 
 REGRESS = check check_btree
diff --git a/contrib/amcheck/amcheck--1.0--1.1.sql b/contrib/amcheck/amcheck--1.0--1.1.sql
new file mode 100644
index 0000000..e6cca0a
--- /dev/null
+++ b/contrib/amcheck/amcheck--1.0--1.1.sql
@@ -0,0 +1,28 @@
+/* contrib/amcheck/amcheck--1.0--1.1.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION amcheck UPDATE TO '1.1'" to load this file. \quit
+
+--
+-- bt_index_check()
+--
+DROP FUNCTION bt_index_check(regclass);
+CREATE FUNCTION bt_index_check(index regclass,
+heapallindexed boolean DEFAULT false)
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'bt_index_check'
+LANGUAGE C STRICT PARALLEL RESTRICTED;
+
+--
+-- bt_index_parent_check()
+--
+DROP FUNCTION bt_index_parent_check(regclass);
+CREATE FUNCTION bt_index_parent_check(index regclass,
+heapallin

Re: [HACKERS] pgbench: faster version of tpcb-like transaction

2017-08-26 Thread Peter Geoghegan
On Sat, Aug 26, 2017 at 4:59 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> I still get a 2 fold improvement, from 13668 to 27036, when both
> transactions are tested with -M prepared.
>
> I am surprised, I usually haven't seen that much difference for the default
> queries between prepared or not, to the point that I got out of the habit of
> testing with it.  But back when I was testing with and without
> systematically, I did notice that it changed a lot depending on hardware and
> concurrency.  And of course from version to version different bottlenecks
> come and go.

I must admit that I had a similar unpleasant surprise at one point --
"-M prepared" seems to matter *a lot* these days. That's the default
that I'd change, if any.

-- 
Peter Geoghegan




Re: [HACKERS] pgbench: faster version of tpcb-like transaction

2017-08-26 Thread Peter Geoghegan
On Sat, Aug 26, 2017 at 3:53 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> I get nearly a 3 fold speed up using the new transaction, from 9184 to 26383
> TPS, on 8 CPU machine using scale 50 and:
>
> PGOPTIONS="-c synchronous_commit=off" pgbench -c32 -j32 -T60 -b tpcb-like

What about with "-M prepared"? I think that most of us use that
setting already, especially with CPU-bound workloads.


-- 
Peter Geoghegan




Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-25 Thread Peter Geoghegan
On Tue, Aug 22, 2017 at 4:58 AM, Daniel Verite <dan...@manitou-mail.org> wrote:
> For the record, attached are the collname that initdb now creates
> in pg_collation, when compiled successively with all current
> versions of ICU (49 to 59), versus what 10beta2 did.
>
> There are still a few names that get dropped along the ICU
> upgrade path, but now they look like isolated cases.

Even though ICU initdb collations are now as stable as possible, which
is great, I still think that Tom had it right about pg_upgrade: Long
term, it would be preferable if pg_upgrade also did a CREATE COLLATION
when an initdb-created collation's base ICU locale has gone away. We
should do such a CREATE COLLATION if and only if that makes the
upgrade succeed where it would otherwise fail. This wouldn't be a
substitute for initdb collation name stability. It would work
alongside it.

This makes sense with ICU. The equivalent of a user-defined CREATE
COLLATION with an old country code may continue to work acceptably
because ICU/CLDR supports aliasing, and/or doesn't actually care that
a deleted country tag (e.g. the one for Serbia and Montenegro [1]) was
used. It'll still interpret Serbian as Serbian (sr-*), regardless of
what country code may also appear, even if the country code is not
just obsolete, but entirely bogus.
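
You can see what ICU actually resolves such a tag to directly (a
sketch against ICU's C API; exactly what this prints will vary with
the ICU/CLDR version in use):

#include <stdio.h>
#include <unicode/ucol.h>
#include <unicode/uloc.h>

int
main(void)
{
    UErrorCode  status = U_ZERO_ERROR;
    /* "CS" (Serbia and Montenegro) was deleted from ISO 3166 in 2006 */
    UCollator  *coll = ucol_open("sr_CS", &status);

    if (U_SUCCESS(status))
        printf("resolved to: %s\n",
               ucol_getLocaleByType(coll, ULOC_ACTUAL_LOCALE, &status));

    ucol_close(coll);
    return 0;
}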

Events like the dissolution of countries are rare enough that that
extra assurance is just a nice-to-have, though.

[1] https://en.wikipedia.org/wiki/ISO_3166-2:CS
-- 
Peter Geoghegan




Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-21 Thread Peter Geoghegan
On Mon, Aug 21, 2017 at 4:48 PM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> On 8/21/17 12:33, Peter Geoghegan wrote:
>> On Mon, Aug 21, 2017 at 8:23 AM, Peter Eisentraut
>> <peter.eisentr...@2ndquadrant.com> wrote:
>>> Here are my patches to address this.
>>
>> These look good.
>
> Committed.  That closes this open item.

Thanks again.

-- 
Peter Geoghegan




Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-21 Thread Peter Geoghegan
On Mon, Aug 21, 2017 at 9:33 AM, Peter Geoghegan <p...@bowt.ie> wrote:
> On Mon, Aug 21, 2017 at 8:23 AM, Peter Eisentraut
> <peter.eisentr...@2ndquadrant.com> wrote:
>> Here are my patches to address this.
>
> These look good.

Also, I don't know why en-u-kr-others-digit wasn't accepted by CREATE
COLLATION, as you said on the other thread just now. That's directly
lifted from TR #35. Is it an ICU version issue? I guess it doesn't
matter that much, though.


-- 
Peter Geoghegan




Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-21 Thread Peter Geoghegan
On Mon, Aug 21, 2017 at 8:23 AM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> Here are my patches to address this.

These look good.

One small piece of feedback: I suggest naming the custom "numeric"
collation something else instead: "natural". Apparently, the behavior
it implements is sometimes called natural sorting. See
https://en.wikipedia.org/wiki/Natural_sort_order.

Thanks
-- 
Peter Geoghegan




Re: [HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-19 Thread Peter Geoghegan

Noah Misch <n...@leadboat.com> wrote:

I think you're contending that, as formulated, this is not a valid v10 open
item.  Are you?


As the person that came up with this formulation, I'd like to give a
quick summary of my current understanding of the item's status:

* We're in agreement that we ought to have initdb create initial
 collations based on ICU locales, not based on distinct ICU
 collations [1].

* We're in agreement that variant keywords should not be
 created for each base locale/collation [2].

Once these two changes are made, I think that everything will be in good
shape as far as pg_collation name stability goes. It shouldn't take
Peter E. long to write the patch. I'm happy to write the patch on his
behalf if that saves time.

We're also going to work on the documentation, to make keyword variants
like -emoji and -traditional at least somewhat discoverable, and to
explain the capabilities of custom ICU collations more generally.

[1] https://postgr.es/m/f67f36d7-ceb6-cfbd-28d4-413c6d22f...@2ndquadrant.com
[2] https://postgr.es/m/3862d484-f0a5-9eef-c54e-3f6808338...@2ndquadrant.com

--
Peter Geoghegan




Re: [HACKERS] Atomics for heap_parallelscan_nextpage()

2017-08-16 Thread Peter Geoghegan
On Wed, Aug 16, 2017 at 9:55 AM, Andres Freund <and...@anarazel.de> wrote:
> On 2017-08-16 11:16:58 -0400, Tom Lane wrote:
>> Heikki Linnakangas <hlinn...@iki.fi> writes:
>> > A couple of 32-bit x86 buildfarm members don't seem to be happy with
>> > this. I'll investigate, but if anyone has a clue, I'm all ears...
>>
>> dromedary's issue seems to be alignment:
>>
>> TRAP: UnalignedPointer("(((uintptr_t) ((uintptr_t)(ptr)) + ((8) - 1)) & 
>> ~((uintptr_t) ((8) - 1))) != (uintptr_t)(ptr)", File: 
>> "../../../../src/include/port/atomics.h", Line: 452)
>> 2017-08-16 11:11:38.558 EDT [75693:3] LOG:  server process (PID 76277) was 
>> terminated by signal 6: Abort trap
>> 2017-08-16 11:11:38.558 EDT [75693:4] DETAIL:  Failed process was running: 
>> select count(*) from a_star;
>>
>> Not sure if this is your bug or if it's exposing a pre-existing
>> deficiency in the atomics code, viz, failure to ensure that
>> pg_atomic_uint64 is actually a 64-bit-aligned type.  Andres?
>
> I suspect it's the former.  Suspect that the shared memory that holds
> the "parallel desc" isn't properly aligned:

Clang has an -fsanitize=alignment option that may be useful here.

-- 
Peter Geoghegan




Re: [HACKERS] What users can do with custom ICU collations in Postgres 10

2017-08-15 Thread Peter Geoghegan
On Tue, Aug 15, 2017 at 11:33 AM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> On 8/9/17 18:49, Peter Geoghegan wrote:
>> I'd like to give a demo on what is already possible, but not currently
>> documented. I didn't see anyone else comment on this, including Peter
>> E (maybe I missed that?). We should improve the documentation in this
>> area, to get this into the hands of users.
>
> Here is a small piece of documentation.  Thoughts?

This looks pretty good, but I do have some feedback:

* "23.2.2.3. Copying Collations" suggests that the only use of CREATE
COLLATION is copying collations, which is far from true with ICU. We
should change that at the same time as this change is made. I think
that just changing the title would improve the overall flow of the
page.

* Maybe add an example of numeric ordering -- the "alphanumeric
invoice" case, where you want text containing numbers to have those
numbers sort as numbers whenever a comparison comes down to comparing
two embedded numbers. I think that that's really useful, and worth
specifically calling out. I definitely would have used that had it
been available ten years ago.

* Let's use "en-u-kr-others-digit" instead of "en-u-kr-latn-digit" in
the example. It makes no real difference to us English speakers, but
means that the example works the same for those that use a different
alphabet. It's more culturally neutral.

* If we end up having initdb put all locales rather than all
collations in pg_collation, which I think is very likely, then we can
put in a link to ICU's locale explorer web resource:

https://ssl.icu-project.org/icu-bin/locexp?d_=en&_=en_HK

This lets the user see exactly what they'll get from a base locale
using an intuitive interface (assuming it matches their CLDR version).

-- 
Peter Geoghegan




Re: [HACKERS] What users can do with custom ICU collations in Postgres 10

2017-08-15 Thread Peter Geoghegan
On Tue, Aug 15, 2017 at 11:19 AM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> On 8/14/17 12:15, Peter Eisentraut wrote:
>> Given that we cannot reasonably preload all these new variants that you
>> demonstrated, I think it would make sense to drop all the keyword
>> variants from the preloaded set.
>
> After playing with this a bit, I'm having some doubts.  While the "k"
> keys from TR 35 are algorithmic parameters that apply to all locales and
> can be looked up in the respective documents, I don't find any way a
> user can discover what collation types ("co") are available for a
> locale.  Any ideas?  If there isn't one, I think we need to provide one.

I wanted to do that too, but Tom didn't seem sold on it yesterday. He
called it v11 material over on the ICU bug thread.

All of the unicode "u" extensions are documented per-CLDR version as
an XML file. For example:

http://www.unicode.org/repos/cldr/tags/release-31/common/bcp47/collation.xml

This isn't ideal, because only some of the "co" variants change things
for all possible base collations. But, there isn't that many "co"
options to choose from, and I think that for the most part it's
reasonably obvious which one is desirable. For example, Chinese people
are probably well aware of what Pinyin is, and what stroke is. Things
like EOR and search are much more esoteric, but also much less useful.
So, I wouldn't hate it if this was the only way that users could
discover the variants in v10.

-- 
Peter Geoghegan




Re: [HACKERS] INSERT .. ON CONFLICT DO SELECT [FOR ..]

2017-08-14 Thread Peter Geoghegan
On Mon, Aug 14, 2017 at 6:23 PM, Marko Tiikkaja <ma...@joh.to> wrote:
> Attached is a patch for $SUBJECT.  It might still be a bit rough around the
> edges and probably light on docs and testing, but I thought I'd post it
> anyway so I won't forget.

Is it possible for ON CONFLICT DO SELECT to raise a cardinality
violation error? Why or why not?



-- 
Peter Geoghegan




Re: [HACKERS] What users can do with custom ICU collations in Postgres 10

2017-08-14 Thread Peter Geoghegan
On Mon, Aug 14, 2017 at 9:15 AM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> I'm having trouble finding some concrete documentation for this.  The TR
> 35 link you showed documents the key words and values, BCP 47 documents
> the syntax, but nothing puts it all together in a form consumable by
> users.  The ICU documentation still mainly focuses on the "old"
> @keyword=value syntax.  I guess we'll have to write our own for now.

There is an unusual style to the standards that apply here. They're
incredibly detailed, and the options are very powerful, but they're in an
unfamiliar language. ICU just considers itself a consumer of the CLDR
locale stuff, which is a broad standard.

We don't have to write comprehensive documentation of these
kn/kb/ka/kh options that I pointed out exist. I think it would be nice
to cover a few interesting cases, and link to the BCP 47 Unicode
extension (TR 35) stuff.

Here is a list of scripts that are all reorderable with this TR 35
stuff (varies somewhat based on CLDR/ICU version):

http://unicode.org/iso15924/iso15924-codes.html

Here is a CLDR specific XML specification of the variant keywords (can
be mapped to specific ICU version easily):

http://www.unicode.org/repos/cldr/tags/release-31/common/bcp47/collation.xml

> Given that we cannot reasonably preload all these new variants that you
> demonstrated, I think it would make sense to drop all the keyword
> variants from the preloaded set.

Cool. While I am of course in favor of this, I actually understand
very well why you had initdb add them. I think that removing them
creates a discoverability problem that cannot easily be fixed through
documentation. ISTM that we ought to also add an SQL-callable function
that lists the most common keyword variants. Some of those are
specific to one or two locales, such as traditional Spanish, or the
alternative sort orders for Han characters.

What do you think of that idea?

I guess an alternative idea is to just link to that XML document
(collation.xml), which exactly specifies the variants. Users can get
the "co" variants there. It should be for the most part obvious which one
is interesting to which locale, since there are not that many "co"
variants to choose from, and users will probably know what to look for
if they look at all.

-- 
Peter Geoghegan




Re: [HACKERS] [BUGS] Replication to Postgres 10 on Windows is broken

2017-08-13 Thread Peter Geoghegan
On Sun, Aug 13, 2017 at 2:22 PM, Andres Freund <and...@anarazel.de> wrote:
> On 2017-08-13 16:55:33 -0400, Tom Lane wrote:
>> Peter Geoghegan <p...@bowt.ie> writes:
>> > I think that it's useful for these things to be handled in an
>> > adversarial manner, in the same way that litigation is adversarial in
>> > a common law court. I doubt that Noah actually set out to demoralize
>> > anyone. He is just doing the job he was assigned.
>>
>> FWIW, I agree that Noah is just carrying out the RMT's task as
>> assigned.
>
> Well, then that's a sign that the tasks/process need to be rescoped.

Why? I might agree if the RMT had an outsized influence on final
outcome. If that's the case, then it's something that I missed.

I also don't think that it's fair to expect Noah to spend a lot of
time poring over whether sending out one of those pro forma status
update policy violation mails is warranted. I expect someone in his
position to aim to err on the side of sending out too many of those.
It's easy for the patch author to explain the situation, but hard for
the RMT to understand the situation fully at all times.

-- 
Peter Geoghegan




Re: [HACKERS] [BUGS] Replication to Postgres 10 on Windows is broken

2017-08-13 Thread Peter Geoghegan
On Sun, Aug 13, 2017 at 12:57 PM, Andres Freund <and...@anarazel.de> wrote:
> FWIW, I'm personally quite demotivated by this style of handling
> issues. You're essentially saying that any code change, even if it just
> increases exposure of a preexisting bug, needs to be handled by the
> committer of the exposing change. And even if that bug is on a platform
> the committer doesn't have.  And all that despite the issue getting
> attention.

I don't think you can generalize from what Noah said like that,
because it's always a matter of degree (the degree to which the
preexisting bug was a problem). Abbreviated keys for collated text
were disabled, though not due to bug in strxfrm(). Technically, it was
due to a bug in strcoll(), which glibc always had. strxfrm() therefore
only failed to be bug compatible with glibc's strcoll(). Does that
mean that we were wrong to disable the use of strxfrm() for
abbreviated keys?

I think that it's useful for these things to be handled in an
adversarial manner, in the same way that litigation is adversarial in
a common law court. I doubt that Noah actually set out to demoralize
anyone. He is just doing the job he was assigned.

-- 
Peter Geoghegan




Re: [HACKERS] Thoughts on unit testing?

2017-08-13 Thread Peter Geoghegan
On Thu, Aug 10, 2017 at 2:53 PM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
> The current regression tests, isolation tests and TAP tests are very
> good (though I admit my experience with TAP is limited), but IMHO we
> are lacking support for C-level unit testing.  Complicated, fiddly
> things with many states, interactions, edge cases etc can be hard to
> get full test coverage on from the outside.  Consider
> src/backend/utils/mmgr/freepage.c as a case in point.

It is my understanding that much of the benefit of unit testing comes
from maintainability. It's something that goes well with design by
contract. Writing unit tests is a forcing function. It requires
testable units, making the units more composable. The programmer must
be very deliberate about state, and must delineate code as testable
units. Unit tests are often supposed to be minimal, to serve as a kind
of design document.

In order to unit test something like an index access method, or
VACUUM, or the transaction manager, it would be necessary to mock the
buffer manager, and have a large amount of scaffolding code. This
would be an awful lot of work to do as an afterthought. I wouldn't
know where to draw the line between one unit and another, because the
code just wasn't written with that in mind to begin with. Is snapshot
acquisition a unit that includes heapam.c? If so, does that mean that
heapam.c has to have its own unit for other functionality, or does
that get lumped in with the snapshot acquisition unit?

As I've said a couple of times already, one thought behind amcheck was
that it would allow us to formalize the design of access methods in a
way that somewhat resembles unit testing. The code that we already
have for nbtree is extremely well commented, and is intended to
document how nbtree really works in terms of invariants. Although the
tests are very comprehensive, there really isn't all that much code,
because it's supposed to minimally describe correct states for a
B-Tree. It manages to do this well, perhaps because the Lehman & Yao
design formalizes everything as a small number of per-page atomic
operations, and is very clear about legal and illegal states.

It would be nice if amcheck eventually *guided* the development of the
heap access method. If comprehensive testing of invariants for heapam
becomes very complicated, it could be instructive to consider how to
refactor heapam to make the required verification code simpler but no
less comprehensive. Mostly, this is desirable because it will lead to
an overall simpler, more robust design, and because it lowers the
barrier to entry to assess the correctness of the code. Of course,
this requires buy-in from patch authors to work.

To put it another way, a patch author whose patch touches storage
implicitly asserts that their patch is correct. It would be useful if
they provided a precise falsifiable statement about its correctness up
front, in the form of verification code.

-- 
Peter Geoghegan




Re: [HACKERS] Getting server crash on Windows when using ICU collation

2017-08-10 Thread Peter Geoghegan
On Thu, Aug 10, 2017 at 3:57 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> On Thu, Aug 10, 2017 at 11:02 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> On Fri, Jun 23, 2017 at 1:14 AM, Ashutosh Sharma <ashu.coe...@gmail.com> 
>> wrote:
>>> Okay, attached is the patch which first detects the platform type and
>>> runs the uconv commands accordingly  from the corresponding icu bin
>>> path. Please have a look and let me know for any issues. Thanks.
>>
>> Should this be on the open items list?
>
> Maybe. Peter E has a patch to fix the other issue (ICU 52 has a buggy
> ucol_strcollUTF8()) by never using it on ICU versions prior to 53. He
> posted this just yesterday.

He committed it just now, so that's the end of this Windows issue, I suppose.

-- 
Peter Geoghegan




Re: [HACKERS] Getting server crash on Windows when using ICU collation

2017-08-10 Thread Peter Geoghegan
On Thu, Aug 10, 2017 at 11:02 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Jun 23, 2017 at 1:14 AM, Ashutosh Sharma <ashu.coe...@gmail.com> 
> wrote:
>> Okay, attached is the patch which first detects the platform type and
>> runs the uconv commands accordingly  from the corresponding icu bin
>> path. Please have a look and let me know for any issues. Thanks.
>
> Should this be on the open items list?

Maybe. Peter E has a patch to fix the other issue (ICU 52 has a buggy
ucol_strcollUTF8()) by never using it on ICU versions prior to 53. He
posted this just yesterday.

This removes the existing configure test, replacing it with something
that will work just the same on Windows, presuming Peter also reverts
the commit that had ICU never use ucol_strcollUTF8() on Windows.

-- 
Peter Geoghegan




Re: [HACKERS] How can I find a specific collation in pg_collation when using ICU?

2017-08-09 Thread Peter Geoghegan
On Wed, Aug 9, 2017 at 6:19 PM, Tsunakawa, Takayuki
<tsunakawa.ta...@jp.fujitsu.com> wrote:
> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Peter Eisentraut
>> There are no case-insensitive collations in PostgreSQL (yet).
>
> That's sad news, as I expected ICU to bring its various collation features to 
> PostgreSQL.  I hope it will be easy to add them.

The reason it is not easy is that text equality is based on strict
binary equality. We would have to teach hash operator classes about
collations to fix this, and make text_eq() hash strxfrm() or
something. That requires special work. You can ask ICU for case
insensitivity with Postgres 10, but the strcmp() tie breaker within
varstr_cmp() will prevent it from being truly case insensitive.
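
To see the distinction at the ICU level (a sketch against ICU's C API;
ucol_strcollUTF8() requires ICU >= 50; build with something like
"cc demo.c -licui18n -licuuc"):

#include <stdio.h>
#include <unicode/ucol.h>

int
main(void)
{
    UErrorCode  status = U_ZERO_ERROR;
    UCollator  *coll = ucol_open("en", &status);

    if (U_FAILURE(status))
        return 1;

    /* secondary strength is what the BCP 47 "ks-level2" option asks
     * for: differences that are only case differences are ignored */
    ucol_setStrength(coll, UCOL_SECONDARY);

    if (ucol_strcollUTF8(coll, "ABC", -1, "abc", -1, &status) == UCOL_EQUAL)
        printf("ICU says equal -- but varstr_cmp() would then break the "
               "tie with a binary strcmp()\n");

    ucol_close(coll);
    return 0;
}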

>> The best explanation of the difference that I can understand is here, under
>> "Why do CJK strings sort incorrectly in Unicode?":
>>
>> https://dev.mysql.com/doc/refman/5.5/en/faqs-cjk.html
>
> Thanks a lot.  MySQL seems to have many collations, doesn't it?

Well, it depends on how you define collation, which actually gets
quite complicated with ICU. You can combine certain options together
with great flexibility, it seems (see my e-mail from earlier today --
"What users can do with custom ICU collations in Postgres 10"). Let's
assume that there are 10 distinct options for each locale that each
independently affect sort order for the base collation of the locale.
I believe that there are at least 10 such options that change things,
and maybe a lot more. Theoretically, this means that there are an
absolutely enormous number of possible collations with ICU.

Now, I wouldn't *actually* say that we have many thousands of
collations with ICU, because that doesn't seem like a sensible way to
explain ICU Postgres collations. The way that we think about this from
using libc doesn't work well for ICU. Instead, I would say that we
have hundreds of locales (not collations), each of which supports some
variant options (e.g., traditional Spanish sort order, alternative
Japanese sort order, pictographic emoji sorting), with some further
generic options for varying how case is handled, how numbers are
handled, and other things like that.

-- 
Peter Geoghegan




[HACKERS] What users can do with custom ICU collations in Postgres 10

2017-08-09 Thread Peter Geoghegan
There are actually very many customizations to collations that are
possible beyond what the "stock" ICU collations provide (whatever
"stock" means). Some of these are really cool, and I can imagine use
cases where they are very compelling that have nothing to do with
internationalization (such customizations are how we should eventually
implement case-insensitive collations, once the infrastructure for
doing that without breaking hashing is in place).

I'd like to give a demo on what is already possible, but not currently
documented. I didn't see anyone else comment on this, including Peter
E (maybe I missed that?). We should improve the documentation in this
area, to get this into the hands of users.

Say we're unhappy that numbers come first, which we see here:

postgres=# select * from (select '1a' i union select '1b' union select
'1c' union select 'a1' union select 'b2' union select 'c3') j order by
i collate "en-x-icu";
 i
────
 1a
 1b
 1c
 a1
 b2
 c3
(6 rows)

We may do this to get our desired sort order:

postgres=# create collation digitlast (provider=icu,
locale='en-u-kr-latn-digit');
CREATE COLLATION
postgres=# select * from (select '1a' i union select '1b' union select
'1c' union select 'a1' union select 'b2' union select 'c3') j order by
i collate "digitlast";
 i
────
 a1
 b2
 c3
 1a
 1b
 1c
(6 rows)

Note that 'kr' is a specific BCP47 Key [1]. Many different options can
be set in this manner.

Let's say we are unhappy with the fact that capital letters sort
higher than lowercase:

postgres=# select * from (select 'B' i union select 'b' union select
'A' union select 'a') j order by i collate "en-x-icu";
 i
───
 a
 A
 b
 B
(4 rows)

ICU provides a solution here, too:

postgres=# create collation capitalfirst (provider=icu, locale='en-u-kf-upper');
CREATE COLLATION
postgres=#
select * from (select 'B' i union select 'b' union select 'A' union
select 'a') j order by i collate "capitalfirst";
 i
───
 A
 a
 B
 b
(4 rows)

And, yes -- you can even *combine* these two options by creating a
third custom collation. That can be spelled
'en-u-kf-upper-kr-latn-digit', in case you were wondering.

Users have certainly complained about not liking this or that aspect
of how glibc sorts text many times over the years, particularly around
things like how whitespace and punctuation are handled, which they can
now do something about [2]. Users can also have numbers sort like
numbers should when compared against other numbers, by using the
numericOrdering option (not shown). numericOrdering would be great for
things like alphanumeric invoice numbers, or the alphanumeric car
registration plate numbers that are used in certain countries [3],
with fixed number/letter fields. These options are very powerful.
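
For example, at the ICU level (a sketch against ICU's C API --
UCOL_NUMERIC_COLLATION is the attribute that the BCP 47
"kn-true"/numericOrdering option maps to; ucol_strcollUTF8() requires
ICU >= 50):

#include <stdio.h>
#include <unicode/ucol.h>

int
main(void)
{
    UErrorCode  status = U_ZERO_ERROR;
    UCollator  *coll = ucol_open("en", &status);

    if (U_FAILURE(status))
        return 1;

    /* have embedded digit sequences compare as numbers */
    ucol_setAttribute(coll, UCOL_NUMERIC_COLLATION, UCOL_ON, &status);

    /* plain code point order would put "INV-10" before "INV-9" */
    if (ucol_strcollUTF8(coll, "INV-9", -1, "INV-10", -1, &status) == UCOL_LESS)
        printf("INV-9 sorts before INV-10\n");

    ucol_close(coll);
    return 0;
}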

[1] http://unicode.org/reports/tr35/tr35-collation.html#Setting_Options
[2] http://unicode.org/reports/tr35/tr35-collation.html#Common_Settings
[3] 
https://en.wikipedia.org/wiki/Vehicle_registration_plates_of_the_Republic_of_Ireland
-- 
Peter Geoghegan




Re: [HACKERS] How can I find a specific collation in pg_collation when using ICU?

2017-08-09 Thread Peter Geoghegan
On Wed, Aug 9, 2017 at 5:38 AM, MauMau <maumau...@gmail.com> wrote:
> I tried to find a particular collation name in pg_collation, but I
> cannot understand the naming convention after reading the following
> article.  Specifically, I want to find out whether there is some
> collation equivalent to Japanese_CI_AS in SQL Server, which means
> Japanese, case-insensitive, and accent sensitive.  Could you tell me
> how to do this?  Is there any room to improve the PG manual?

This is not an answer to the question you asked, but it may interest
you to know that ICU uses JIS X 4061 for Japanese, unlike glibc. This
will give more useful results when sorting Japanese.

The best explanation of the difference that I can understand is here,
under "Why do CJK strings sort incorrectly in Unicode?":

https://dev.mysql.com/doc/refman/5.5/en/faqs-cjk.html
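If you want to see the difference for yourself, something like this is
an easy starting point (a sketch, assuming the stock "ja-x-icu"
collation that initdb creates):

select s from (values ('はな'), ('バナナ'), ('ぱん')) t(s)
  order by s collate "ja-x-icu";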

-- 
Peter Geoghegan




Re: [HACKERS] Possible issue with expanded object infrastructure on Postgres 9.6.1

2017-08-08 Thread Peter Geoghegan
On Thu, Jan 19, 2017 at 5:45 PM, Peter Geoghegan <p...@heroku.com> wrote:
> A customer is on 9.6.1, and complains of a segfault observed at least
> 3 times.

> I can use GDB to get details of the instruction pointer that appeared
> in the kernel trap error, which shows a function from the expanded
> value representation infrastructure:
>
> (gdb) info symbol 0x55fcf08b0491
> EOH_get_flat_size + 1 in section .text of /usr/lib/postgresql/9.6/bin/postgres
> (gdb) info symbol 0x55fcf08b0490
> EOH_get_flat_size in section .text of /usr/lib/postgresql/9.6/bin/postgres
> (gdb) disassemble 0x55fcf08b0490
> Dump of assembler code for function EOH_get_flat_size:
>0x55fcf08b0490 <+0>: push   %rbp
>0x55fcf08b0491 <+1>: mov0x8(%rdi),%rax
>0x55fcf08b0495 <+5>: mov%rsp,%rbp
>0x55fcf08b0498 <+8>: pop%rbp
>0x55fcf08b0499 <+9>: mov(%rax),%rax
>0x55fcf08b049c <+12>: jmpq   *%rax
> End of assembler dump.

For the sake of the archives: this now looks very much like the issue
that Tom just fixed with commit
9bf4068cc321a4d44ac54089ab651a49d89bb567.

-- 
Peter Geoghegan




Re: [HACKERS] ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-07 Thread Peter Geoghegan
On Mon, Aug 7, 2017 at 3:23 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> The thing that I'm particularly thinking about is that if someone wants
> an ICU variant collation that we didn't make initdb provide, they'll do
> a CREATE COLLATION and go use it.  At update time, pg_dump or pg_upgrade
> will export/import that via CREATE COLLATION, and the only way it fails
> is if ICU rejects the collation name as garbage.  (Which, as we already
> established upthread, it's quite unlikely to do.)

Actually, it's *impossible* for CREATE COLLATION to reject any string
as an invalid ICU locale, because CollationCreate() simply doesn't
sanitize ICU locale names. It doesn't do anything like call
get_icu_language_tag(), unlike initdb (within
pg_import_system_collations()).

If I add such a test to CollationCreate(), it does a reasonable job of
sanitizing, while preserving the spirit of the BCP 47 language tag
format by not assuming that the user didn't specify a brand new locale
that it hasn't heard of. All of these are accepted on unmodified
master; with the sanitization test added, we instead get:

postgres=# CREATE COLLATION test1 (provider = icu, locale = 'en-x-icu');
CREATE COLLATION
postgres=# CREATE COLLATION test2 (provider = icu, locale = 'foo bar baz');
ERROR:  XX000: could not convert locale name "foo bar baz" to language
tag: U_ILLEGAL_ARGUMENT_ERROR
LOCATION:  get_icu_language_tag, collationcmds.c:454
postgres=# CREATE COLLATION test3 (provider = icu, locale = 'en-gb-icu');
ERROR:  XX000: could not convert locale name "en-gb-icu" to language
tag: U_ILLEGAL_ARGUMENT_ERROR
LOCATION:  get_icu_language_tag, collationcmds.c:454
postgres=# CREATE COLLATION test4 (provider = icu, locale = 'not-a-country');
CREATE COLLATION

We already insist that get_icu_language_tag() not throw an error
during the initdb import, where the strings it's passed are generated
mechanically. Why shouldn't CREATE COLLATION apply the same test?
While the choice to preserve BCP 47's tolerance of missing collations
is debatable, not doing at least this much up-front is a bug IMV.
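
The test itself amounts to a strict uloc_toLanguageTag() call, which
is easy to play with outside of Postgres. A standalone sketch (not the
actual patch; build with -licuuc):

#include <stdio.h>
#include <unicode/uloc.h>

int
main(void)
{
    const char *locales[] = {"en-x-icu", "foo bar baz", "en-gb-icu",
                             "not-a-country"};
    char        langtag[ULOC_FULLNAME_CAPACITY];

    for (int i = 0; i < 4; i++)
    {
        UErrorCode  status = U_ZERO_ERROR;

        /* strict = 1: reject anything that cannot be a BCP 47 tag */
        uloc_toLanguageTag(locales[i], langtag, sizeof(langtag), 1,
                           &status);
        printf("%-15s -> %s\n", locales[i],
               U_FAILURE(status) ? u_errorName(status) : langtag);
    }
    return 0;
}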

-- 
Peter Geoghegan




Re: [HACKERS] ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-07 Thread Peter Geoghegan
On Mon, Aug 7, 2017 at 2:50 PM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> On 8/6/17 20:07, Peter Geoghegan wrote:
>> I've looked into this. I'll give an example of what keyword variants
>> there are for Greek, and then discuss what I think each is.
>
> I'm not sure why we want to get into editorializing this.  We query ICU
> for the names of distinct collations and use that.

We ask ucol_getKeywordValuesForLocale() to get only "commonly used
[variant] values with the given locale" within
pg_import_system_collations(). So the editorializing has already
begun.

> It's more than most
> people need, sure, but it doesn't cost us anything.

It's also *less* than what other users need. I disagree on the cost of
redundancy among entries after initdb. It's just confusing to users,
and seems avoidable without adding special case logic. What's the
difference between el-u-co-standard-x-icu and el-x-icu?

> The alternatives
> are hand-maintaining a list of collations, or installing no collations
> by default.

A better alternative would be to actively take an interest in what
collations are created, by further refining the rules by which they
are created. We have a stable API, described by various standards,
that we can work with for this. This doesn't have to be a
maintainability burden. We can provide general guidance about how to
add stuff back within documentation.

I do think that we should actually list all the collations that are
available by default on some representative ICU version, once that
list is tightened up, just as other database systems list them. That
necessitates a little weasel wording that notes that later ICU
versions might add more, but that's not a problem IMV. I don't think
that CLDR will ever omit anything previously available, at least
within a reasonable timeframe [1].

[1] http://cldr.unicode.org/index/process/cldr-data-retention-policy
-- 
Peter Geoghegan




Re: [HACKERS] max_files_per_processes vs others uses of file descriptors

2017-08-07 Thread Peter Geoghegan
On Mon, Aug 7, 2017 at 1:40 PM, Andres Freund <and...@anarazel.de> wrote:
> Given how close max_files_per_process is to the default linux limit of
> 1024 fds, I wonder if we shouldn't increase NUM_RESERVED_FDS by quite a
> bit?

Personally, any time I've seen a problem with this it was because an
extension leaked FDs, which is always going to fail in the end. The
extension leaked FDs because it didn't fully buy into using Postgres
resource managers, perhaps only in a subtle way. I find it hard to
imagine an extension author explicitly relying on any particular
amount of slop for FDs.
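
To spell out what "buying in" means: file access in an extension
should go through fd.c, so that the FD is tracked and reclaimed at
end of transaction. A sketch (OpenTransientFile()'s exact signature
has varied across releases; this is the 9.6/v10 style):

#include "postgres.h"

#include <fcntl.h>
#include <unistd.h>

#include "storage/fd.h"

static void
slurp_file(const char *path)
{
    int     fd;
    char    buf[8192];

    /*
     * Tracked as a transient file: if we error out below, end-of-xact
     * cleanup closes it, rather than the FD leaking for the life of
     * the backend (as it would with a raw open()).
     */
    fd = OpenTransientFile((char *) path, O_RDONLY | PG_BINARY, 0);
    if (fd < 0)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not open file \"%s\": %m", path)));

    while (read(fd, buf, sizeof(buf)) > 0)
        ;                       /* process the data */

    CloseTransientFile(fd);
}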

Is this specifically about postgres_fdw, or is there some other
specific problem you have in mind, that this would help solve?

-- 
Peter Geoghegan




[HACKERS] ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-06 Thread Peter Geoghegan
On Sun, Aug 6, 2017 at 1:06 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> On Sat, Aug 5, 2017 at 8:26 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> I'm quite disturbed though that the set of installed collations on these
>> two test cases seem to be entirely different both from each other and from
>> what you reported.  The base collations look generally similar, but the
>> "keyword variant" versions are not comparable at all.  Considering that
>> the entire reason we are interested in ICU in the first place is its
>> alleged cross-version collation behavior stability, this gives me the
>> exact opposite of a warm fuzzy feeling.  We need to understand why it's
>> like that and what we can do to reduce the variation, or else we're just
>> buying our users enormous future pain.  At least with the libc collations,
>> you can expect that if you have en_US.utf8 available today you will
>> probably still have en_US.utf8 available tomorrow.  I am not seeing any
>> reason to believe that the same holds for ICU collations.
>
> +1. That seems like something that is important to get right up-front.

I've looked into this. I'll give an example of what keyword variants
there are for Greek, and then discuss what I think each is. These are
the keyword variant collations on my machine with master + ICU support
(ICU 55):

postgres=# \dOS+ el-*
                                          List of collations
   Schema   │          Name          │     Collate      │      Ctype       │ Provider │ Description
────────────┼────────────────────────┼──────────────────┼──────────────────┼──────────┼─────────────
 pg_catalog │ el-u-co-emoji-x-icu    │ el-u-co-emoji    │ el-u-co-emoji    │ icu      │ Greek
 pg_catalog │ el-u-co-eor-x-icu      │ el-u-co-eor      │ el-u-co-eor      │ icu      │ Greek
 pg_catalog │ el-u-co-search-x-icu   │ el-u-co-search   │ el-u-co-search   │ icu      │ Greek
 pg_catalog │ el-u-co-standard-x-icu │ el-u-co-standard │ el-u-co-standard │ icu      │ Greek
 pg_catalog │ el-x-icu               │ el               │ el               │ icu      │ Greek
(5 rows)

Greek has only one region, standard Greek. A few other
language-regions have variations like multiple regions (e.g. Austrian
German), or a phonebook variant, which you don't see here. Almost all
have -emoji, -search, and -standard, which you do see here.

We pass "commonlyUsed = true" to ucol_getKeywordValuesForLocale()
within pg_import_system_collations(), and so it "will return only
commonly used values with the given locale in preferred order". But
should we go even further? If the charter of
pg_import_system_collations() is to import every possible valid
collation for pg_collation, then it's already failing at that by
limiting itself to "common variants". I agree with the decision to do
that, though, and I think we probably need to go a bit further.
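
For reference, the call in question is easy to experiment with outside
of Postgres. A standalone sketch of what we effectively ask ICU for
Greek (build with -licui18n -licuuc):

#include <stdio.h>
#include <unicode/ucol.h>
#include <unicode/uenum.h>

int
main(void)
{
    UErrorCode    status = U_ZERO_ERROR;
    UEnumeration *vals;
    const char   *val;

    /* commonlyUsed = 1, as in pg_import_system_collations() */
    vals = ucol_getKeywordValuesForLocale("collation", "el", 1, &status);
    if (U_FAILURE(status))
        return 1;

    while ((val = uenum_next(vals, NULL, &status)) != NULL)
        printf("el-u-co-%s\n", val);    /* standard, search, emoji, ... */

    uenum_close(vals);
    return 0;
}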

Possible issues with current ICU pg_collation entries after initdb:

* I don't think we should have user-visible "search" collations at all.

Apparently "search" collations are useful because "primary- and
secondary-level distinctions for searching may not be the same as
those for sorting; in ICU, many languages provide a special "search"
collator with the appropriate level settings for search" [1]. I don't
think that we should expose "search" keyword variants at all, because
clearly they're an implementation detail that Postgres may one day
have special knowledge of [2], to correctly mix searching and sorting
semantics. For the time being, those should simply not be added within
pg_import_system_collations(). Someone could still create the entries
themselves, which seems harmless. Let's avoid establishing the
expectation that they'll be in pg_collation.
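For example (a sketch; the collation name is arbitrary):

create collation "el-search" (provider = icu, locale = 'el-u-co-search');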

* Redundant ICU spellings for the same collation seem to appear.

I find it questionable that there is both a "el-x-icu" and a
"el-u-co-standard-x-icu". That looks like an artifact of how
pg_import_system_collations() was written, as opposed to a bona fide
behavioral difference. I cannot find an example of a
"$COUNTRY_CODE-x-icu" collation without a corresponding
"$COUNTRY_CODE-u-co-standard-x-icu" (the situation is similar for
regional variants, like Austrian German). What, if anything, is the
difference between each such pair of collations? Can we find a way to
provide only one canonical entry if those are simply different ICU
spellings?

* Many emoji variant collations.

I have to wonder if there is much value in creating so many
pg_collation entries that are mere variants to do pictographic emoji
sorting. Call me a killjoy, but I think that users that want that
behavior can create the collations themselves. We could still document
it. I wouldn't mind it if there weren't so many emoji collations.

* Many EOR variant collations.

EOR as a collation variant i

Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)

2017-08-05 Thread Peter Geoghegan
On Thu, May 4, 2017 at 7:20 PM, David Rowley
<david.row...@2ndquadrant.com> wrote:
> I ended up writing the attached (which I'd not intended to post until
> some time closer to when the doors open for PG11). At the moment it's
> basically just a test patch to see how it affects things when we give
> workers a bit more to do before they come back to look for more work.
> In this case, I've just given them 10 pages to work on, instead of the
> 1 that's allocated in 9.6 and v10.

I think that this could benefit parallel sort, beyond the obvious fact
that it too must have the contention you describe.

We generally are faster at sorting presorted input for all kinds of
reasons (e.g., insertion sort fallback for quicksort, merging based on
replacement of the heap's root item). It follows that it's to our
advantage to have parallel tuplesort read multiple pages in a range
into a worker at once within the parallel heap scan that feeds
workers. The leader, which merges worker runs, may ultimately have to
perform fewer comparisons as a result of this, which is where most of
the benefit would be.
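
To be explicit about the mechanism, here is a toy sketch of the idea
(hypothetical names; not David's actual patch): workers claim ranges
of blocks instead of single blocks, so the shared counter is touched
once per CHUNK_SIZE pages, and each worker sees contiguous runs:

#include <stdatomic.h>
#include <stdbool.h>

#define CHUNK_SIZE 10           /* as in the test patch */

typedef struct
{
    _Atomic unsigned next_block;    /* shared scan position */
    unsigned    nblocks;            /* relation size in blocks */
} toy_parallel_scan;

/* Claim the next range of blocks; returns false once the scan is done */
static bool
claim_chunk(toy_parallel_scan *scan, unsigned *first, unsigned *last)
{
    unsigned    start = atomic_fetch_add(&scan->next_block, CHUNK_SIZE);

    if (start >= scan->nblocks)
        return false;
    *first = start;
    *last = start + CHUNK_SIZE - 1;
    if (*last >= scan->nblocks)
        *last = scan->nblocks - 1;
    return true;
}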

-- 
Peter Geoghegan




Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page (Was: [HACKERS] [WIP] Zipfian distribution in pgbench)

2017-08-05 Thread Peter Geoghegan
On Fri, Aug 4, 2017 at 3:30 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> Yura Sokolov of Postgres Pro performed this benchmark at my request.
> He took the 9.5 commit immediately preceding 2ed5b87f9 as a baseline.

I attach a simple patch that comments out the release of the buffer
pin for logged tables where an MVCC snapshot is used for a scan that
is not an index-only scan. This is the simplest way of evaluating the
effects of disabling 2ed5b87f9. Yura or others may find this helpful.

To be clear, I am certainly not suggesting that we should revert
2ed5b87f9. I do think that we need to give serious thought to fixing
the regression that 2ed5b87f9 introduced for LP_DEAD setting by index
scans, though.

-- 
Peter Geoghegan
diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index 642c894..696beda 100644
*** a/src/backend/access/nbtree/nbtsearch.c
--- b/src/backend/access/nbtree/nbtsearch.c
*** _bt_drop_lock_and_maybe_pin(IndexScanDes
*** 58,63 ****
--- 58,64 ----
  {
  	LockBuffer(sp->buf, BUFFER_LOCK_UNLOCK);
  
+ #if 0
  	if (IsMVCCSnapshot(scan->xs_snapshot) &&
  		RelationNeedsWAL(scan->indexRelation) &&
  		!scan->xs_want_itup)
*** _bt_drop_lock_and_maybe_pin(IndexScanDes
*** 65,70 ****
--- 66,72 ----
  		ReleaseBuffer(sp->buf);
  		sp->buf = InvalidBuffer;
  	}
+ #endif
  }
  
  



Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-08-04 Thread Peter Geoghegan
On Fri, Jul 21, 2017 at 4:51 AM, Alik Khilazhev
<a.khilaz...@postgrespro.ru> wrote:
> (Latest version of pgbench Zipfian patch)

While I'm +1 on this idea, I think that it would also be nice if there
was an option to make functions like random_zipfian() actually return
a value that has undergone perfect hashing. When this option is used,
any given value that the function returns would actually be taken from
a random mapping to some other value in the same range. So, you can
potentially get a Zipfian distribution without the locality.
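
To sketch what I mean (not a worked-out proposal): a small keyed
Feistel permutation, with cycle-walking to stay inside [0, n), is a
bijection, so it preserves the Zipfian frequencies exactly while
scattering the popular values across the key space. Assuming 32-bit
values:

#include <stdint.h>

static uint32_t
feistel_round(uint32_t half, uint32_t key)
{
    uint32_t    x = half * 0x9E3779B1u + key;

    return (x ^ (x >> 13)) & 0xFFFF;
}

/* Keyed bijection on [0, n); apply to each value random_zipfian() emits */
static uint32_t
permute(uint32_t v, uint32_t n, uint32_t seed)
{
    do
    {
        uint32_t    l = v >> 16;
        uint32_t    r = v & 0xFFFF;

        for (int i = 0; i < 4; i++)     /* 4 Feistel rounds */
        {
            uint32_t    t = l ^ feistel_round(r, seed + i);

            l = r;
            r = t;
        }
        v = (l << 16) | r;
    } while (v >= n);                   /* cycle-walk back into range */
    return v;
}

Frequencies are unchanged because the mapping is one-to-one; only the
identity of the hot keys changes.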

While I think that Zipfian distributions are common, it's probably
less common for the popular items to be clustered together within the
primary key, unless the PK has multiple attributes, and the first is
the "popular attribute". For example, there would definitely be a lot
of interest among contiguous items within an index on "(artist,
album)" where "artist" is a popular artist, which is certainly quite
possible. But, if "album" is the primary key, and it's a SERIAL PK,
then there will only be weak temporal locality within the PK, and only
because today's fads are more popular than the fads from previous
years.

Just another idea, that doesn't have to hold this patch up.

-- 
Peter Geoghegan




Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page (Was: [HACKERS] [WIP] Zipfian distribution in pgbench)

2017-08-04 Thread Peter Geoghegan
On Mon, Jul 31, 2017 at 10:54 AM, Peter Geoghegan <p...@bowt.ie> wrote:
> Let's wait to see what difference it makes if Alik's zipfian
> distribution pgbench test case uses unlogged tables. That may give us a
> good sense of the problem for cases with contention/concurrency.

Yura Sokolov of Postgres Pro performed this benchmark at my request.
He took the 9.5 commit immediately preceding 2ed5b87f9 as a baseline.
In all cases, logged tables were used. Note that this is not the most
effective benchmark for showing the regression, because he didn't
replace the PK with a non-unique index, though that is planned as
follow-up; I wanted to stick with Alik's Zipfian test case for the
time being.

(Using a unique index is not the most effective thing for showing the
regression because unique indexes have most LP_DEAD setting done in
_bt_check_unique(), so only SELECTs will do less LP_DEAD cleanup
there; SELECTs are 50% of all queries.)

His results with 10 minute pgbench runs:

Logged
clients | 8217fb14 | 2ed5b87f | master | hashsnap | hashsnap_lwlock
--------+----------+----------+--------+----------+----------------
     10 |   201569 |   204154 | 201095 |   201793 |          206111
     20 |   328180 |       58 | 334858 |   336721 |          370769
     40 |   196352 |   194692 | 232413 |   231968 |          393947
     70 |   121083 |   116881 | 148066 |   148443 |          224162
    110 |    77989 |    73414 |  99305 |   111404 |          161900
    160 |    48958 |    45830 |  65830 |    82340 |          115678
    230 |    27699 |    25510 |  38186 |    57617 |           80575
    310 |    16369 |    15137 |  21571 |    39745 |           56819
    400 |    10327 |     9486 |  13157 |    27318 |           40859
    500 |     6920 |     6496 |   8638 |    18677 |           29429
    650 |     4315 |     3971 |   5196 |    11762 |           17883
    800 |     2850 |     2772 |   3523 |     7733 |           10977

Note that you also see numbers from various patches from Yura, and the
master branch mixed in here, but 8217fb14 (before) and 2ed5b87f
(after) are the interesting numbers as far as this regression goes.

There is an appreciable reduction in TPS here, though this workload is
not as bad by that measure as first thought. There is a roughly 5%
regression here past 40 clients or so. The regression in the
*consistency* of transaction *throughput* is far more interesting,
though. I've been doing analysis of this by drilling down to
individual test cases with vimdiff, as follows:

$ vimdiff test_8217fb14_logged_1_pgbench_40.out
test_2ed5b87f_logged_1_pgbench_40.out

(I attach these two files as a single example. I can provide the full
results to those that ask for them privately; it's too much data to
attach to an e-mail to the list.)

You can see in this example that for most 5 second slices of the 10
minute benchmark, commit 2ed5b87f actually increases TPS somewhat,
which is good. But there are also occasional *big* drops in TPS,
sometimes by as much as 50% over a single 5 second period (when
ANALYZE runs, adding random I/O while holding an exclusive buffer
lock [1]?). When this slowdown happens, latency can be over 3 times
higher, too.

We see much more consistent performance without the B-Tree buffer pin
VACUUM optimization, where there is no discernible pattern of
performance dropping. The headline regression of 4% or 5% is not the
main problem here, it seems.

In summary, commit 2ed5b87f makes the workload have increased
throughput most of the time, but occasionally sharply reduces
throughput, which averages out to TPS being 4% or 5% lower overall. I
think we'll find that there are bigger problems TPS-wise with
non-unique indexes when that other test is performed by Yura; let's
wait for those results to come back.

Finally, I find it interesting that when Yura did the same benchmark,
but with 5% SELECTs + 95% UPDATEs, rather than 50% SELECTs + 50%
UPDATEs as above, the overall impact was surprisingly similar. His
results:

clients | 8217fb14 | 2ed5b87f | master | hashsnap | hashsnap_lwlock
--------+----------+----------+--------+----------+----------------
     20 |   187697 |   187335 | 217558 |   215059 |          266894
     50 |    81272 |    78784 |  97948 |    97659 |          157710
     85 |    49446 |    47683 |  64597 |    70814 |          107748
    130 |    32358 |    30393 |  42216 |    50531 |           75001
    190 |    19403 |    17569 |  25704 |    35506 |           51292
    270 |    10803 |     9878 |  14166 |    23851 |           35257
    370 |     6108 |     5645 |   7684 |    15390 |           23659
    500 |     3649 |     3297 |   4225 |     9172 |           14643
    650 |     2239 |     2049 |   2635 |     5887 |            8588
    800 |     1475 |     1424 |   1804 |     4035 |            5611

If nothing else, this shows how generally reliant these kinds of
workloads can be on LP_DEAD setting. And, there is one difference: The
regression is seen here 

Re: [HACKERS] Small code improvement for btree

2017-08-04 Thread Peter Geoghegan
On Fri, Aug 4, 2017 at 11:12 AM, Alvaro Herrera
<alvhe...@2ndquadrant.com> wrote:
> Interesting.  We learned elsewhere that it's better to integrate the
> "!= 0" test as part of the macro definition; so a
> better formulation of this patch would be to change the
> P_INCOMPLETE_SPLIT macro and omit the comparison in the Assert.  (See
> commit 594e61a1de03 for an example).
>
>
>> -   LockBuffer(hbuffer, BUFFER_LOCK_SHARE);
>> +   LockBuffer(hbuffer, BT_READ);

+1.

One Linus Torvalds rant that I actually agreed with was a rant against
the use of bool as a type in C code. It's fine, as long as you never
forget that it's actually just another integer.
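
To make the suggestion concrete, that would look something like this
(a sketch against nbtree.h; the flag and macro already exist there):

/* fold the "!= 0" into the macro, so it is safely boolean everywhere */
#define P_INCOMPLETE_SPLIT(opaque) \
    (((opaque)->btpo_flags & BTP_INCOMPLETE_SPLIT) != 0)

/* ...which lets the Assert drop its own comparison: */
Assert(P_INCOMPLETE_SPLIT(opaque));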

> I think BT_READ and BT_WRITE are useless, and I'd rather get rid of
> them ...

Fair enough, but we should either use them consistently or not at all.
I'm not especially concerned about which, as long as it's one of those
two.

-- 
Peter Geoghegan




Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page (Was: [HACKERS] [WIP] Zipfian distribution in pgbench)

2017-07-31 Thread Peter Geoghegan

Robert Haas <robertmh...@gmail.com> wrote:

On Mon, Jul 31, 2017 at 1:54 PM, Peter Geoghegan <p...@bowt.ie> wrote:

That is hard to justify. I don't think that failing to set LP_DEAD hints
is the cost that must be paid to realize a benefit elsewhere, though. I
don't see much problem with having both benefits consistently. It's
actually very unlikely that VACUUM will run, and a TID will be recycled
at exactly the wrong time. We could probably come up with a more
discriminating way of detecting that that may have happened, at least
for Postgres 11. We'd continue to use LSN; the slow path would be taken
when the LSN changed, but we do not give up on setting LP_DEAD bits. I
think we can justify going to the heap again in this slow path, if
that's what it takes.


Yeah, that might possibly be a good approach.


I also wonder if it's worth teaching lazy_scan_heap() to keep around a
list of TIDs that can at least have their LP_DEAD bit set within their
index page, for use during subsequent index vacuuming. Doing at least
this much for TIDs from heap pages that are skipped due to some other
session concurrently holding a pin on the heap page ("pinskipped_pages"
pages) could help some cases, and seems doable. VACUUM does not need an
extra interlock against another VACUUM (such as a buffer pin) here, of
course.

I wouldn't expect this to help very much on many workloads, including
Alik's Zipfian workload, but it might be useful to have a real guarantee
about how long it can be, in VACUUM cycles, before a dead index tuple at
least has its LP_DEAD bit set. LP_DEAD will only be set when an index
scan goes to the heap, and it's not hard to imagine workloads where no
index scan ever does that with dead tuples whose heap TIDs were killed
only very recently.

Unlike with heap pruning, setting the LP_DEAD bit of all dead index
tuples on a leaf page is pretty much as good as a full VACUUM of the
page. The only thing that makes it not quite as good is that you cannot
assume that it's safe to kill the heap TIDs afterwards.

--
Peter Geoghegan




Re: LP_DEAD hinting and not holding on to a buffer pin on leaf page (Was: [HACKERS] [WIP] Zipfian distribution in pgbench)

2017-07-31 Thread Peter Geoghegan

Robert Haas <robertmh...@gmail.com> wrote:

On Thu, Jul 27, 2017 at 10:05 PM, Peter Geoghegan <p...@bowt.ie> wrote:

I really don't know if that would be worthwhile. It would certainly fix
the regression shown in my test case, but that might not go far enough.
I strongly suspect that there are more complicated workloads where
LP_DEAD cleanup from SELECT statements matters, which is prevented by
the LSN check thing, just because there are always other sessions that
modify the page concurrently. This might be true of Alik's Zipfian test
case, for example.


I haven't studied the test case, but I think as a general principle it
makes sense to be happy when someone comes up with an algorithm that
holds a lock for a shorter period of time (and buffer pins are a type
of lock).  There are a number of places (fast-path locking, for
example, or vacuum skipping pinned heap pages) where we have
fast-paths that get taken most of the time and slow paths that get
used when concurrent activity happens; empirically, such things often
work out to a win.  I think it's disturbing that this code seems to be
taking the slow-path (which, in context, means skipping LP_DEAD
cleanup) even when there is no concurrent activity.  That's hard to
justify.


That is hard to justify. I don't think that failing to set LP_DEAD hints
is the cost that must be paid to realize a benefit elsewhere, though. I
don't see much problem with having both benefits consistently. It's
actually very unlikely that VACUUM will run, and a TID will be recycled
at exactly the wrong time. We could probably come up with a more
discriminating way of detecting that that may have happened, at least
for Postgres 11. We'd continue to use LSN; the slow path would be taken
when the LSN changed, but we do not give up on setting LP_DEAD bits. I
think we can justify going to the heap again in this slow path, if
that's what it takes.
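
In pseudocode, the shape of the change might be (hypothetical helper
names; the LSN bookkeeping shown is what _bt_killitems() already has):

if (BufferGetLSNAtomic(so->currPos.buf) == so->currPos.lsn)
{
    /* Fast path: page unchanged since we read it, so our memorized
     * TIDs are still accurate -- set LP_DEAD bits as we do today */
    _bt_set_lp_dead_bits(so);
}
else
{
    /* Slow path (proposed): the page changed while we held no pin,
     * so a TID may have been recycled.  Rather than give up, revisit
     * the heap and set LP_DEAD only for items that still point to
     * the same dead tuples we matched earlier. */
    _bt_set_lp_dead_bits_recheck_heap(so);
}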


But the fact that it is taking the slow-path when there *is*
concurrent activity is harder to complain about.  That might win or it
might lose; the non-concurrent case only loses.


Let's wait to see what difference it makes if Alik's zipfian
distribution pgbench test case uses unlogged tables. That may give us a
good sense of the problem for cases with contention/concurrency.

--
Peter Geoghegan



