Re: [HACKERS] Fix for PL/Python slow input arrays traversal issue

2016-09-09 Thread Pavel Stehule
This entry should be closed, because this patch is part of another patch.

The new status of this patch is: Waiting on Author

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2016-09-09 Thread Pavel Stehule
2016-09-06 15:00 GMT+02:00 Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp>:

> Hello, this is the new version of this patch. Rebased on the
> current master.
>
> At Tue, 06 Sep 2016 13:06:51 +0900 (Tokyo Standard Time), Kyotaro
> HORIGUCHI  wrote in <
> 20160906.130651.171572544.horiguchi.kyot...@lab.ntt.co.jp>
> > Thank you. I'll rebase the following patch and repost as soon as
> > possible.
> >
> > https://www.postgresql.org/message-id/20160407.211917.
> 147996130.horiguchi.kyot...@lab.ntt.co.jp
>
> This patch consists of the following files. Since these files are
> split up by odd criteria and ordering for historical reasons,
> I'll reorganize this and post them later.
>

ok, can I start testing and reviewing some of these files?

Regards

Pavel


>
> - 0001-Suggest-IF-NOT-EXISTS-for-tab-completion-of-psql.patch
>
>   Add suggestion of IF (NOT) EXISTS on master. This should be the
>   last patch in this patchset.
>
> - 0002-Make-added-keywords-for-completion-queries-follow-to.patch
>
>   Current suggestion mechanism doesn't distinguish object names
>   and keywords, which should be differently handled in
>   determining letter cases. This patch fixes that.
>
> - 0003-Make-COMPLETE_WITH_ATTR-to-accept-additional-keyword.patch
>
>   This patch applies the 0002 fix to COMPLETE_WITH_ATTR.
>
> - 0004-Refactoring-tab-complete-to-make-psql_completion-cod.patch
>
>   Per Tom's suggestion, in order to modify previous_words in a
>   saner way, this transforms the else-if sequence in psql_completion
>   into a simple if sequence.
>
> regards,
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>


Re: [HACKERS][REVIEW] Tab Completion for CREATE DATABASE ... TEMPLATE ...

2016-09-09 Thread Vitaly Burovoy
Hello Sehrope Sarkuni,

I have reviewed the patch.
It is very simple (only an SQL query in the "psql" application is changed).

It applies on top of master.
It implements completion of database names for commands like
"CREATE DATABASE ... TEMPLATE ".
According to the documentation, from 9.2 through devel a database can
be used as a template if it has the "datistemplate" mark, or by
superusers, or by its owner.
The previous implementation checked only the "datistemplate" mark.

I tested it manually on 9.2 and 10devel; I hope it can be
back-patched to all supported versions.
No documentation changes are needed.

I am marking it as "Ready for Committer".


P.S.: While reviewing, I simplified the SQL query: the improved
version performs only 2 seqscans instead of the original's 3 seqscans
with an inner loop.
Please find a file "tab-complete-create-database-improved.patch" attached.

-- 
Best regards,
Vitaly Burovoy


tab-complete-create-database-improved.patch
Description: Binary data



Re: [HACKERS] identity columns

2016-09-09 Thread Vitaly Burovoy
Hello Peter,

I have reviewed the patch.
Currently it does not apply on top of master; the last commit it
applies to without conflicts is 975768f.

It compiles and passes "make check" tests, but fails with "make check-world" at:
test foreign_data ... FAILED

It tries to implement SQL:2011 feature T174 ("Identity columns"):
* column definition;
* column altering;
* inserting clauses "OVERRIDING {SYSTEM|USER} VALUE".

It has documentation changes.


===
The implementation has several distinctions from the standard:

1. The standard requires "... ALTER COLUMN ... SET GENERATED { ALWAYS
| BY DEFAULT }" (9075-2:2011 subcl 11.20), but the patch implements it
as "... ALTER COLUMN ... ADD GENERATED { ALWAYS | BY DEFAULT } AS
IDENTITY"

2. The standard allows not more than one identity column per table; the
patch does not enforce that requirement, and this deviation is not
mentioned in the doc.

3. Changes to the "information_schema.columns" view are incomplete. Some
required columns are still empty for identity columns:
* identity_start
* identity_increment
* identity_maximum
* identity_minimum
* identity_cycle

4. "" is not fully implemented
because "" is implemented
whereas "" is not.

5. According to 9075-2:2011 subcl 14.11 Syntax Rule 11)c), for a column
with an indication that values are generated by default, the only
possible "" is "OVERRIDING USER VALUE".
The implementation also allows "OVERRIDING SYSTEM VALUE" (as a no-op),
but this should be mentioned in the "Compatibility" part of the doc.

postgres=# CREATE TABLE itest10 (a int generated BY DEFAULT as
identity PRIMARY KEY, b text);
CREATE TABLE
postgres=# INSERT INTO itest10 overriding SYSTEM value SELECT 10, 'a';
INSERT 0 1
postgres=# SELECT * FROM itest10;
  a | b
----+---
 10 | a
(1 row)

===

6. "CREATE TABLE ... (LIKE ... INCLUDING ALL)" fails an assertion at
src/backend/commands/tablecmds.c:631
postgres=# CREATE TABLE itest1 (a int generated by default as identity, b text);
CREATE TABLE
postgres=# CREATE TABLE x(LIKE itest1 INCLUDING ALL);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.


===

Also the implementation has several flaws in corner cases:


7. Changing the default is allowed, but the column is still "identity":

postgres=# CREATE TABLE itest4 (a int GENERATED ALWAYS AS IDENTITY, b text);
CREATE TABLE
postgres=# ALTER TABLE itest4 ALTER COLUMN a set DEFAULT 1;
ALTER TABLE
postgres=# \d itest4
                     Table "public.itest4"
 Column |  Type   |                    Modifiers
--------+---------+--------------------------------------------------
 a      | integer | not null default 1  generated always as identity
 b      | text    |


---
8. Changing a column to be "identity" raises a "duplicate key" exception:

postgres=# CREATE TABLE itest4 (a serial, b text);
CREATE TABLE
postgres=# ALTER TABLE itest4 ALTER COLUMN a ADD GENERATED ALWAYS AS IDENTITY;
ERROR:  duplicate key value violates unique constraint
"pg_attrdef_adrelid_adnum_index"


---
9. Changing the type of a column deletes the linked sequence but leaves
the "default" and "identity" marks:

postgres=# CREATE TABLE itest4 (a int GENERATED ALWAYS AS IDENTITY, b int);
CREATE TABLE
postgres=# ALTER TABLE itest4 ALTER COLUMN a TYPE text;
ALTER TABLE
postgres=# \d itest4;
                            Table "public.itest4"
 Column |  Type   |                         Modifiers
--------+---------+-------------------------------------------------------------------
 a      | text    | default nextval('16445'::regclass)  generated always as identity
 b      | integer |

postgres=# insert into itest4(b) values(1);
ERROR:  could not open relation with OID 16445
postgres=# select * from itest4;
 a | b
---+---
(0 rows)


---
10. The "identity" modifier is lost when the table inherits from another one:

postgres=# CREATE TABLE itest_err_1 (a int);
CREATE TABLE
postgres=# CREATE TABLE x (a int GENERATED ALWAYS AS
IDENTITY)inherits(itest_err_1);
NOTICE:  merging column "a" with inherited definition
CREATE TABLE
postgres=# \d itest_err_1; \d x;
 Table "public.itest_err_1"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer |
Number of child tables: 1 (Use \d+ to list them.)

                      Table "public.x"
 Column |  Type   |                   Modifiers
--------+---------+-----------------------------------------------
 a      | integer | not null default nextval('x_a_seq'::regclass)
Inherits: itest_err_1


---
11. The documentation says "OVERRIDING ... VALUE" can be placed even
before "DEFAULT VALUES", but that contradicts both the SQL spec and
the implementation:

postgres=# CREATE TABLE itest10 (a int GENERATED BY DEFAULT AS
IDENTITY, b text);
CREATE TABLE
postgres=# INSERT INTO itest10 DEFAULT VALUES;
INSERT 0 1
postgres=# INSERT INTO itest10 OVERRIDING USER VALUE VALUES(1,2);
INSERT 0 1
postgres=# INSERT INTO itest10 OVERRIDING USER VALUE 

[HACKERS] cost_sort() may need to be updated

2016-09-09 Thread Peter Geoghegan
For external sorts, cost_sort() assumes the following:

 * The disk traffic is assumed to be 3/4ths sequential and 1/4th random
 * accesses (XXX can't we refine that guess?)

...
/* Assume 3/4ths of accesses are sequential, 1/4th are not */
startup_cost += npageaccesses *
(seq_page_cost * 0.75 + random_page_cost * 0.25);
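For reference, the quoted disk-traffic term can be reproduced as a standalone sketch (with the default seq_page_cost = 1.0 and random_page_cost = 4.0; the real cost_sort() derives npageaccesses from, roughly, the number of input pages and merge passes):

```c
/*
 * Standalone sketch of the disk-traffic term quoted above.  The blend of
 * sequential and random page costs is the guess being questioned in this
 * message: with the default seq_page_cost = 1.0 and random_page_cost = 4.0
 * it charges 1.75 per page access, a 75% premium over sequential I/O.
 * This is an illustration, not the actual planner code.
 */
static double
external_sort_io_cost(double npageaccesses,
                      double seq_page_cost, double random_page_cost)
{
    /* Assume 3/4ths of accesses are sequential, 1/4th are not */
    return npageaccesses * (seq_page_cost * 0.75 + random_page_cost * 0.25);
}
```

With the defaults, 1000 page accesses cost 1750.0 units versus 1000.0 for purely sequential access, which is the gap the message suggests refining.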

I think that we *can* refine this guess, and should, because random
I/O is really quite unlikely to be a large cost these days (I/O in
general often isn't a large cost, actually). More fundamentally, I
think it's a problem that cost_sort() thinks that external sorts are
far more expensive than internal sorts in general. There is good
reason to think that this does not reflect reality. I think we can
expect external sorts to be *faster* than internal sorts with
increasing regularity in Postgres 10.

In one sense, things got worse here in 9.6 when replacement selection
was all but killed, since runs became on average half their size,
which cost_sort() was taught about at the time. But, cost_sort()
didn't care about cases where only one pass is predicted (the great
majority of external sorts), so that didn't really matter. cost_sort()
doesn't seem to care about the size of runs to be merged at all, which
seems limiting. It also seems limiting that cost_sort() doesn't
differentiate between internal and external sort *startup* costs at
all. External sorts can often start returning tuples sooner, due to
the final on-the-fly merge mechanism.

It's pretty common to see cases where merging 4 runs takes only
slightly less time than an internal sort, and yet is thought to cost
more than twice as much. I realize that costs are always a bit fudged,
but I am fairly suspicious of how big the differences between external
and internal sorts regularly are. I suggest that this be revisited
soon.

-- 
Peter Geoghegan




Re: [HACKERS] kqueue

2016-09-09 Thread Thomas Munro
On Wed, Sep 7, 2016 at 12:32 AM, Marko Tiikkaja  wrote:
> I've tested and reviewed this, and it looks good to me, other than this
> part:
>
> +   /*
> +* kevent guarantees that the change list has been processed in the
> EINTR
> +* case.  Here we are only applying a change list so EINTR counts as
> +* success.
> +*/
>
> this doesn't seem to be guaranteed on old versions of FreeBSD or any other
> BSD flavors, so I don't think it's a good idea to bake the assumption into
> this code.  Or what do you think?

Thanks for the testing and review!

Hmm.  Well spotted.  I wrote that because the man page from FreeBSD 10.3 says:

  When kevent() call fails with EINTR error, all changes in the changelist
  have been applied.

This sentence is indeed missing from the OpenBSD, NetBSD and OSX man
pages.  It was introduced by FreeBSD commit r280818[1] which made
kevent a Pthread cancellation point.  I investigated whether it is
also true in older FreeBSD and the rest of the BSD family.  I believe
the answer is yes.

1.  That commit doesn't do anything that would change the situation:
it just adds thread cancellation wrapper code to libc and libthr which
exits under certain conditions but otherwise lets EINTR through to the
caller.  So I think the new sentence is documentation of the existing
behaviour of the syscall.

2.  I looked at the code in FreeBSD 4.1[2] (the original kqueue
implementation from which all others derive) and the four modern
OSes[3][4][5][6].  They vary a bit but in all cases, the first place
that can produce EINTR appears to be in kqueue_scan when the
(variously named) kernel sleep routine is invoked, which can return
EINTR or ERESTART (later translated to EINTR because kevent doesn't
support restarting).  That comes after all changes have been applied.
In fact it's unreachable if nevents is 0: OSX doesn't call kqueue_scan
in that case, and the others return early from kqueue_scan in that
case.

3.  An old email[7] from Jonathan Lemon (creator of kqueue) seems to
support that at least in respect of ancient FreeBSD.  He wrote:
"Technically, an EINTR is returned when a signal interrupts the
process after it goes to sleep (that is, after it calls tsleep).  So
if (as an example) you call kevent() with a zero valued timespec,
you'll never get EINTR, since there's no possibility of it sleeping."

So if I've understood correctly, what I wrote in the v4 patch is
universally true, but it's also moot in this case: kevent cannot fail
with errno == EINTR because nevents == 0.  On that basis, here is a
new version with the comment and special case for EINTR removed.
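For illustration, the conclusion can be sketched portably with a stand-in for kevent() (the stub and its behaviour are assumptions modelling the analysis above, not FreeBSD code): when nevents == 0 the call only applies the change list and never sleeps, so EINTR is unreachable and no special case is needed at the call site.

```c
#include <errno.h>

/*
 * Stub modelling the kevent() behaviour established in this thread: the
 * change list is applied unconditionally, before any point that can
 * sleep, so EINTR can only arise on the waiting path (nevents > 0).
 * This is an assumption standing in for the real syscall.
 */
static int
fake_kevent(int nchanges, int nevents)
{
    (void) nchanges;            /* changes applied before any sleep */
    if (nevents == 0)
        return 0;               /* nothing to wait for: cannot block */
    errno = EINTR;              /* waiting path may be interrupted */
    return -1;
}

/*
 * Apply a change list only.  Per the analysis above, a plain error
 * check suffices here: with nevents == 0, EINTR cannot be reported.
 */
static int
apply_changes(int nchanges)
{
    return fake_kevent(nchanges, 0);
}
```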

[1] https://svnweb.freebsd.org/base?view=revision&revision=280818
[2] https://github.com/freebsd/freebsd/blob/release/4.1.0/sys/kern/kern_event.c
[3] https://github.com/freebsd/freebsd/blob/master/sys/kern/kern_event.c
[4] https://github.com/IIJ-NetBSD/netbsd-src/blob/master/sys/kern/kern_event.c
[5] https://github.com/openbsd/src/blob/master/sys/kern/kern_event.c
[6] https://github.com/opensource-apple/xnu/blob/master/bsd/kern/kern_event.c
[7] http://marc.info/?l=freebsd-arch&m=98147346707952&w=2

-- 
Thomas Munro
http://www.enterprisedb.com


kqueue-v5.patch
Description: Binary data



Re: [HACKERS] Tuplesort merge pre-reading

2016-09-09 Thread Claudio Freire
On Fri, Sep 9, 2016 at 9:51 PM, Claudio Freire  wrote:
> On Fri, Sep 9, 2016 at 8:13 AM, Heikki Linnakangas  wrote:
>>
>> Claudio, if you could also repeat the tests you ran on Peter's patch set on
>> the other thread, with these patches, that'd be nice. These patches are
>> effectively a replacement for
>> 0002-Use-tuplesort-batch-memory-for-randomAccess-sorts.patch. And review
>> would be much appreciated too, of course.
>>
>> Attached are new versions. Compared to last set, they contain a few comment
>> fixes, and a change to the 2nd patch to not allocate tape buffers for tapes
>> that were completely unused.
>
>
> Will do so

It seems both 1 and 1+2 break make check.

Did I misunderstand something? I'm applying the first patch of Peter's
series (cap number of tapes), then your first one (remove prefetch)
and second one (use larger read buffers).

Peter's patch needs some rebasing on top of those but nothing major.




Re: [HACKERS] Tuplesort merge pre-reading

2016-09-09 Thread Claudio Freire
On Fri, Sep 9, 2016 at 8:13 AM, Heikki Linnakangas  wrote:
>
> Claudio, if you could also repeat the tests you ran on Peter's patch set on
> the other thread, with these patches, that'd be nice. These patches are
> effectively a replacement for
> 0002-Use-tuplesort-batch-memory-for-randomAccess-sorts.patch. And review
> would be much appreciated too, of course.
>
> Attached are new versions. Compared to last set, they contain a few comment
> fixes, and a change to the 2nd patch to not allocate tape buffers for tapes
> that were completely unused.


Will do so




Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-09-09 Thread Claudio Freire
...
On Fri, Sep 9, 2016 at 9:22 PM, Claudio Freire  wrote:
> Since it is true that doing so would make it impossible to keep the
> asserts about tupindex in tuplesort_heap_root_displace, I guess it
> depends on how useful those asserts are (ie: how likely it is that
> those conditions could be violated, and how damaging it could be if
> they were). If it is decided the refactor is desirable, I'd suggest
> making the common siftup procedure static inline, to allow
> tuplesort_heap_root_displace to inline and specialize it, since it
> will be called with checkIndex=false and that simplifies the resulting
> code considerably.
>
> Peter also mentioned that there were some other changes going on in
> the surrounding code that could impact this patch, so I'm marking the
> patch Waiting on Author.
>
> Overall, however, I believe the patch is in good shape. Only minor
> form issues need to be changed, the functionality seems both desirable
> and ready.


Sorry, forgot to specify, that was all about patch 3, the one about
tuplesort_heap_root_displace.
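The inline-specialization idea quoted above (one common sift routine marked static inline, specialized by a compile-time-constant flag at each call site) can be sketched on a toy min-heap of ints; all names here are invented for the example and are not the tuplesort code:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Common sift-down for a min-heap of ints.  Because it is static inline
 * and callers pass a constant for check_invariants, the compiler can
 * specialize away the checking branch entirely, which is the point of
 * the suggested refactoring.
 */
static inline void
heap_sift_down(int *heap, int n, int newval, bool check_invariants)
{
    int i = 0;

    for (;;)
    {
        int child = 2 * i + 1;

        if (child >= n)
            break;
        if (child + 1 < n && heap[child + 1] < heap[child])
            child++;                    /* pick the smaller child */
        if (newval <= heap[child])
            break;
        if (check_invariants)
            assert(heap[child] >= heap[i]);     /* heap property holds */
        heap[i] = heap[child];          /* pull the child up */
        i = child;
    }
    heap[i] = newval;
}

/*
 * Replace the root with a new value in a single sift-down pass: the
 * constant 'false' lets the compiler drop the checks when inlining.
 */
static void
heap_root_displace(int *heap, int n, int newval)
{
    heap_sift_down(heap, n, newval, false);
}
```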




Re: [HACKERS] Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

2016-09-09 Thread Tom Lane
I wrote:
> ... BTW, a quick grep for other calls of PageIndexTupleDelete suggests
> that SPGiST could make very good use of PageIndexTupleOverwrite, and
> there may be use for it in GIN too.

I've pushed this patch with some editorializing and some followup
work.  The above remains as possible material for a future patch,
but I'm going to mark this commitfest entry as closed.

regards, tom lane




Re: [HACKERS] zero knowledge users

2016-09-09 Thread Langley, Scott E
Hello Andrew (from back in 2004),

Have things improved at all regarding Postgres and the configurability of 
reduced public privileges for databases?

We have a goal to hide unneeded database objects from our database-challenged 
users - as they might see in a simple database viewer application - by removing 
their privileges on such objects.

Regards,

Scott Langley
Systems Analyst/Programmer
Statistical Center for HIV/AIDS Research and Prevention (SCHARP)
Fred Hutchinson Cancer Research Center
Seattle, Washington


From:   Andrew Dunstan 
To: 
Cc: Postgresql Hackers 
Subject:Re: zero knowledge users
Date:   2004-04-06 15:50:46

I have been doing some experimentation for a series of articles I am 
writing, and want to create a user with as little privilege as possible 
who can still do the things I explicitly want him/her to be able to do.

In particular, I wanted to be able to deny any useful access to the 
metadata contained in catalogs and the information schema.

It can be argued that this is security by obscurity - and thus a Bad 
Thing (tm) - and if this were the only security measure being considered 
I would agree that it is less than 100% effective. Nevertheless, I think 
it is still a possibly useful measure against some unwanted intruders, 
for example, even if ways around it can be discovered by some. In any 
case, the policy question is not the purpose of my writing here ;-)

The problem I encountered in implementing this is that many of the 
catalogs have public access, and it seems impossible to designate a 
level of access lower than those for public. Also, to my surprise, 
removing public usage on the pg_catalog schema didn't stop access to its 
contents.  In my testing, I have 3 users - accountsdba, apiowner and 
webuser, and want to make webuser a zero knowledge user. The approach I 
took was to create a new group called pspublic, containing the first 2 
users but not webuser, and then to replace all the public privileges on 
critical objects with equivalent privileges for pspublic. The commands I 
gave to do this are shown below - the tables and views chosen were those 
with "name" type objects, on the assumption that if the user can't see 
any names anything else should be fairly useless. I might well have 
included some I shouldn't have, though, or missed some I should have 
included.

So far the results have been reasonably promising - i.e. all users can 
do what I want/expect them to be able to do, and I have not discovered a 
way for the zero knowledge user to gain knowledge I want to deny access to.

The downsides to this approach are 1) it won't survive across a 
dump/reload, and 2) it will break on catalog structure changes. Some 
steps could be taken to alleviate these disadvantages, but it would 
still be somewhat fragile. This set me thinking that maybe it would be 
worthwhile to provide a flag on create database which significantly 
reduces public privileges, so one would not have to go through such 
contortions.

Comments welcome

cheers

andrew


create group pspublic with user accountsdba, apiowner;
revoke all on schema pg_catalog, public, information_schema from public;
grant usage on schema pg_catalog, information_schema to group pspublic;
grant all on schema public to group pspublic;
revoke select on table
 pg_am, pg_attribute, pg_class, pg_constraint, pg_conversion, pg_database,
 pg_group, pg_indexes, pg_language, pg_listener, pg_namespace, pg_opclass,
 pg_operator, pg_proc, pg_rewrite, pg_rules, pg_stat_activity,
 pg_stat_all_indexes, pg_stat_all_tables, pg_stat_database,
 pg_stat_sys_indexes, pg_stat_sys_tables, pg_stat_user_indexes,
 pg_stat_user_tables, pg_statio_all_indexes, pg_statio_all_sequences,
 pg_statio_all_tables, pg_statio_sys_indexes, pg_statio_sys_sequences,
 pg_statio_sys_tables, pg_statio_user_indexes, pg_statio_user_sequences,
 pg_statio_user_tables, pg_stats, pg_tables, pg_trigger, pg_type, pg_user,
 pg_views
from public;
grant select on table
 pg_am, pg_attribute, pg_class, pg_constraint, pg_conversion, pg_database,
 pg_group, pg_indexes, pg_language, pg_listener, pg_namespace, pg_opclass,
 pg_operator, pg_proc, pg_rewrite, pg_rules, pg_stat_activity,
 pg_stat_all_indexes, pg_stat_all_tables, pg_stat_database,
 pg_stat_sys_indexes, pg_stat_sys_tables, pg_stat_user_indexes,
 pg_stat_user_tables, pg_statio_all_indexes, pg_statio_all_sequences,
 pg_statio_all_tables, pg_statio_sys_indexes, pg_statio_sys_sequences,
 pg_statio_sys_tables, pg_statio_user_indexes, pg_statio_user_sequences,
 pg_statio_user_tables, pg_stats, pg_tables, pg_trigger, pg_type, pg_user,
 pg_views
to group pspublic;
revoke select, update on table pg_settings from public;
grant select, update on table pg_settings to group pspublic;






Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-09-09 Thread Peter Geoghegan
On Wed, Sep 7, 2016 at 2:36 PM, Heikki Linnakangas  wrote:
> 3. If we do that, we'll still have to reserve the tape buffers for all the
> tapes that we use during merge. So after we've built the initial runs, we'll
> need to reserve memory for those buffers. That might require shrinking
> memtuples. But that's OK: after building the initial runs, memtuples is
> empty, so we can shrink it.

Do you really think all this is worth the effort? Given how things are
going to improve for merging anyway, I tend to doubt it. I'd rather
just apply the cap (not necessarily 501 tapes, but something), and be
done with it. As you know, Knuth never advocated more than 7 tapes at
once, which I don't think had anything to do with the economics of
tape drives in the 1970s (or problems with tape operators getting
repetitive strain injuries). There is a chart in volume 3 about this.
Senior hackers talked about a cap like this from day one, back in
2006, when Simon and Tom initially worked on scaling the number of
tapes. Alternatively, we could make MERGE_BUFFER_SIZE much larger,
which I think would be a good idea independent of whatever waste
logically allocation of never-used tapes presents us with. It's
currently 1/4 of 1MiB, which is hardly anything these days, and
doesn't seem to have much to do with OS read ahead trigger sizes.
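As a back-of-the-envelope check of what a cap costs in practice (a sketch ignoring polyphase scheduling details, so intuition only, not what tuplesort actually computes): the number of merge passes grows only logarithmically in the number of runs, so capping the merge order rarely adds passes.

```c
/*
 * Count merge passes for a k-way merge: each pass combines up to
 * merge_order runs into one, so the run count shrinks by that factor
 * per pass.  This ignores polyphase scheduling, so treat it as a rough
 * model of the trade-off, not as tuplesort's actual behaviour.
 */
static int
merge_passes(long nruns, long merge_order)
{
    int passes = 0;

    while (nruns > 1)
    {
        /* one pass: every group of merge_order runs becomes one run */
        nruns = (nruns + merge_order - 1) / merge_order;
        passes++;
    }
    return passes;
}
```

For example, with a cap of around 500 tapes, even 100,000 initial runs still merge in two passes, which suggests why a cap was considered acceptable.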

If we were going to do something like you describe here, I'd prefer it
to be driven by an observable benefit in performance, rather than a
theoretical benefit. Not doing everything in one pass isn't
necessarily worse than having a less cache efficient heap -- it might
be quite a bit better, in fact. You've seen how hard it can be to get
a sort that is I/O bound. (Sorting will tend to not be completely I/O
bound, unless perhaps parallelism is used).

Anyway, this patch (patch 0001-*) is by far the least important of the
3 that you and Claudio are signed up to review. I don't think it's
worth bending over backwards to do better. If you're not comfortable
with a simple cap like this, than I'd suggest that we leave it at
that, since our time is better spent elsewhere. We can just shelve it
for now -- "returned with feedback". I wouldn't make any noise about
it (although, I actually don't think that the cap idea is at all
controversial).

-- 
Peter Geoghegan




Re: [HACKERS] delta relations in AFTER triggers

2016-09-09 Thread Kevin Grittner
v6 fixes recently-introduced bit-rot.

Kevin Grittner


On Wed, Aug 31, 2016 at 3:24 PM, Kevin Grittner  wrote:
> I have merged in the changes since v4 (a year and a half ago) and
> cured all bit-rot I found, to get the attached v5 which runs `make
> check world` without problem -- including the tests added for this
> feature.
>
> I did remove the contrib code that David Fetter wrote to
> demonstrate the correctness and performance of the tuplestores as
> created during the transaction, and how to use them directly from C
> code, before any API code was written.  If we want that to be
> committed, it should be considered separately after the main
> feature is in.
>
> Thanks to Thomas Munro who took a look at v4 and pointed out a bug
> (which is fixed in v5) and suggested a way forward for using the
> parameters.  Initial attempts to get that working were not
> successful, but I think it is fundamentally the right course,
> should we reach a consensus to go that way.
>
> On Thu, Jul 7, 2016 at 5:07 PM, Robert Haas  wrote:
>> On Fri, May 13, 2016 at 2:37 PM, Kevin Grittner  wrote:
>>> On Fri, May 13, 2016 at 1:02 PM, David Fetter  wrote:
 On Thu, Jan 22, 2015 at 02:41:42PM +, Kevin Grittner wrote:
>>>
> [ideas on how to pass around references to ephemeral relations]

 [almost 17 months later]

 It seems like now is getting close to the time to get this into
 master.  The patch might have suffered from some bit rot, but the
 design so far seems sound.

 What say?
>>>
>>> I had a talk with Tom in Brussels about this.  As I understood it,
>>> he wasn't too keen on the suggestion by Heikki (vaguely
>>> sorta-endorsed by Robert) of passing ephemeral named relations such
>>> as these tuplestores around in the structures currently used for
>>> parameter values.  He intuitively foresaw the types of problems I
>>> had run into trying to use the same structure to pass a relation
>>> (with structure and rows and columns of data) as is used to pass,
>>> say, an integer.
>>
>> See, the thing that disappoints me about this is that I think we were
>> pretty closed to having the ParamListInfo-based approach working.
>
> Maybe, but the thing I would like to do before proceeding down that
> road is to confirm that we have a consensus that such a course is
> better than what is on the branch which is currently working.  If
> that's the consensus here, I'll work on that for the next CF.  If
> not, there may not be a lot left to do before commit.  (Notably, we
> may want to provide a way to free a tuplestore pointer when done
> with it, but that's not too much work.)  Let me describe the API I
> have working.
>
> There are 11 function prototypes modified under src/include, in all
> cases to add a Tsrcache parameter:
> 1 createas.h
> 3 explain.h
> 1 prepare.h
> 1 analyze.h
> 2 tcopprot.h
> 3 utility.h
>
> There are three new function prototypes in SPI.  NOTE: This does
> *not* mean that SPI is required to use named tuplestores in
> queries, just that there are convenience functions for any queries
> being run through SPI, so that any code using SPI (including any
> PLs that do) will find assigning a name to a tuplestore and
> referencing that within a query about as easy as falling off a log.
> A tuplestore is registered to the current SPI context and not
> visible outside that context.  This results in a Tsrcache being
> passed to the functions mentioned above when that context is
> active, just as any non-SPI code could do.
>
>> The thing I liked about that approach is that we already know that any
>> place where you can provide parameters for a query, there will also be
>> an opportunity to provide a ParamListInfo.  And I think that
>> parameterizing a query by an ephemeral table is conceptually similar
>> to parameterizing it by a scalar value.  If we invent a new mechanism,
>> it's got to reach all of those same places in the code.
>
> Do you see someplace that the patch missed?
>
>> One other comment that I would make here is that I think that it's
>> important, however we pass the data, that the scope of the tuplestores
>> is firmly lexical and not dynamic.  We need to make sure that we don't
>> set some sort of context via global variables that might get used for
>> some query other than the one to which it's intended to apply.
>
> Is this based on anything actually in the patch?
>
>
> For this CF, the main patch attached is a working version of the
> feature that people can test, review documentation, etc.  Any API
> changes are not expected to change these visible behaviors, so any
> feedback on usability or documentation can be directly useful
> regardless of the API discussion.
>
> I have also attached a smaller patch which applies on top of the
> main one which rips out the Tsrcache API to get to a "no API"
> version that compiles cleanly and runs fine as long as you don't
> try to use the 

Re: [HACKERS] Floating point comparison inconsistencies of the geometric types

2016-09-09 Thread Kevin Grittner
On Sun, Sep 4, 2016 at 12:42 PM, Emre Hasegeli  wrote:
>> The first patch fails to apply due to bit-rot.  That's easy enough
>> to correct, but then it runs into warnings on make:
>
> Rebased and fixed.

These patches apply and build on top of 5c609a74 with no problems,
but `make check` finds differences per the attached.  Please
investigate why the regression tests are failing and what the
appropriate response is.

>> Something to consider before posting new version -- should we
>> change some of those macros to static inline (in the .h files) to
>> avoid double-evaluation hazards?  They might perform as well or
>> even better that way, and remove a subtle programmer foot-gun.
>
> One reason to keep them as macros is that they are currently
> supporting both float4 and float8.  Currently they look like this:
>
> FLOAT_EQ(val1, val2)
> FLOAT_PL(result, val1, val2)
>
> I guess if we would use inline functions, they would look like:
>
> FLOAT8_EQ(val1, val2)
> result = FLOAT8_PL(val1, val2)
>
> Result would need to be defined again in the function to be checked
> for overflow.  I think this way would be more developer-friendly.  I
> have gone some trouble to allocate the result to be able to use the
> macros.  Do you know if the performance of both would be the same?

In one case where I was concerned that a static inline definition
would not perform as well as a macro, I went so far as to compare
the compiled code for both at a number of call sites in an
optimized build and found that a reasonably current gcc generated
*identical* code for both.  Of course, this will not always be the
case -- in particular for macros which require multiple evaluations
of one or more parameters and they are not simple literals.  In
such cases, the static inline often benchmarks faster because the
results from the potentially expensive expression can be stored on
the stack or in a register, and just referenced again.

> I am not much experienced in C.  If you think that inline functions
> are better suited, I can rework the patches.

I suspect that they will be as fast or faster, and they eliminate
the hazard of multiple evaluation, where a programmer might not be
aware of the multiple evaluation or of some side-effect of an
argument.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


regression.diffs
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More ideas for speeding up sorting

2016-09-09 Thread Tom Lane
Peter Geoghegan  writes:
> I know that in practice, even virtual address space is significantly
> less than 64-bits on x86_64, so I think that there should be several
> bits there for the taking, even if the addresses are not aligned
> (although, they are). However, I have no idea if there is any
> precedent for this general idea at all, and would feel better about it
> if there was. I guess we don't have to make it any more complicated
> than your argument about alignment.

There's a very long history of code stealing bits out of pointers, and
all of it is bad and unportable in the long run :-(.  Let's not go there.
I thought the idea of getting rid of isnull by physically segregating the
null items sounded good though.

regards, tom lane




Re: [HACKERS] Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

2016-09-09 Thread Tom Lane
... BTW, a quick grep for other calls of PageIndexTupleDelete suggests
that SPGiST could make very good use of PageIndexTupleOverwrite, and
there may be use for it in GIN too.

I see one place in btree where there's a PageIndexTupleDelete and then
an insert, but it's unlikely to be worth touching because the page is
expected to be empty after the PageIndexTupleDelete; so there's no
data movement we could avoid in that case.

I don't plan to investigate these cases myself, but somebody should.

regards, tom lane




Re: [HACKERS] More ideas for speeding up sorting

2016-09-09 Thread Peter Geoghegan
On Fri, Sep 9, 2016 at 1:54 PM, Peter Geoghegan  wrote:
>> 2. We could easily categorize incoming tuples as they come in, into those
>> that have a leading NULL, and others. If we kept them in separate arrays, or
>> perhaps grow memtuples from bottom-up for non-NULLs and from top-down for
>> NULLs, we wouldn't need to store, or compare, the 'isnull1' flag at all, in
>> the quicksort.
>
> We also wouldn't need to use datum1 to store nothing, at least in the
> "state.tuples" (non-datum-pass-by-value) case. Where the leading
> attribute is NULL, we can "discard" it, and sort those tuples
> separately, and return them to caller first or last as required.

Another thing: ApplySortComparator() (but not
ApplySortAbbrevFullComparator()) is now under no obligation to care
about NULL values, since that happens out-of-band for the leading
attribute. (Actually, you'd need to create a new variation for the
leading attribute only, since we reuse ApplySortComparator() for
non-leading attributes, but that might be yet another win that this
leads to.)

-- 
Peter Geoghegan




Re: [HACKERS] More ideas for speeding up sorting

2016-09-09 Thread Peter Geoghegan
On Fri, Sep 9, 2016 at 6:14 AM, Heikki Linnakangas  wrote:
> A few things caught my eye while hacking on the latest round of sorting
> patches.

This is how it begins.  :-)

> Starting a new thread because these are orthogonal to the things
> discussed on the other threads:
>
> 1. SortTuple.tupindex is not used when the sort fits in memory. If we also
> got rid of the isnull1 flag, we could shrink SortTuple from 24 bytes to 16
> bytes (on 64-bit systems). That would allow you to pack more tuples into
> memory, which generally speeds things up. We could for example steal the
> least-significant bit from the 'tuple' pointer for that, because the
> pointers are always at least 4-byte aligned. (But see next point)

I've wanted to do that for a long time. I'd rather make SortTuples 16
bytes and be done with it, rather than introduce variations, mostly
because I now think it's possible without any real downside. Maybe
your preread patch gets us to a place where the tupindex field is
useless, because there is no chaining of SortTuples during merging, no
free list of the "slot" SortTuples, and so on. Maybe we could kill
replacement selection fully, so it no longer needs tupindex, leaving
us with no need for it entirely (not just when everything still fits
in memory). But even if we don't kill replacement selection entirely,
we still only need one bit these days, since in practice there are
only two distinct run numbers ever represented in tupindex these days
(RUN_FIRST, and HEAP_RUN_NEXT). We can spare a second bit from the
"tuple" pointer, I suppose.

I know that in practice, even virtual address space is significantly
less than 64-bits on x86_64, so I think that there should be several
bits there for the taking, even if the addresses are not aligned
(although, they are). However, I have no idea if there is any
precedent for this general idea at all, and would feel better about it
if there was. I guess we don't have to make it any more complicated
than your argument about alignment.

> 2. We could easily categorize incoming tuples as they come in, into those
> that have a leading NULL, and others. If we kept them in separate arrays, or
> perhaps grow memtuples from bottom-up for non-NULLs and from top-down for
> NULLs, we wouldn't need to store, or compare, the 'isnull1' flag at all, in
> the quicksort.

We also wouldn't need to use datum1 to store nothing, at least in the
"state.tuples" (non-datum-pass-by-value) case. Where the leading
attribute is NULL, we can "discard" it, and sort those tuples
separately, and return them to caller first or last as required.

> 3. In the single-Datum case, i.e. tuplesort_begin_datum(), we could just
> keep a counter of how many NULLs we saw, and not store them at all. That's
> quite a special case, but it's simple enough that it might be worth it.

A lot hinges on the performance of this special case. For example,
CREATE INDEX CONCURRENTLY is quite sensitive to it in practice, since
the TID sort is (even in 9.6) the major reason for it being slower
than plain CREATE INDEX (I think that the second pass over the heap is
less important most of the time -- you've seen how sorting isn't I/O
bound much of the time for yourself, so you can imagine). Even still,
I'd be inclined to try to get SortTuples down to 16 bytes, at which
point this special case hardly needs to be considered (we just skip
the quicksort iff it's a single attribute sort, which includes
MinimalTuple cases where there happens to only be one attribute that
we sort on).

-- 
Peter Geoghegan




Re: [HACKERS] Logical Replication WIP

2016-09-09 Thread Peter Eisentraut
On 9/6/16 8:56 PM, Peter Eisentraut wrote:
> Some thoughts on pg_dump and such:

Another issue to add to this list:

With the current patch set, pg_dump will fail for unprivileged users,
because it can't read pg_subscription.  The include_subscription flag
ought to be checked in getSubscriptions() already, not (only) in
dumpSubscription().  The test suite for pg_dump fails because of this.

We might make further changes in this area, per ongoing discussion, but
it would be good to put in a quick fix for this in the next patch set so
that the global test suite doesn't fail.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] SELECT FOR UPDATE regression in 9.5

2016-09-09 Thread Alvaro Herrera
Magnus wrote:
> I ran the web application mentioned in Marti's original mail on a patched
> Postgres server. It looks like it is working correctly now, no more test
> failures.

Thanks for the confirmation.  I pushed the fix, along with the presented
test case.

I chose to backpatch all the way back to 9.3.  While I couldn't find a
way to reproduce the misbehavior in 9.3/9.4, even with my alternative
proposed fix for bug #8470, it seems safer to get the fix everywhere
just in case there is a chance that this can be reproduced with multiple
sessions somehow.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Assertion failure in REL9_5_STABLE

2016-09-09 Thread Alvaro Herrera
Marko Tiikkaja wrote:
> Hi,
> 
> Running one specific test from our application against REL9_5_STABLE
> (current as of today) gives me this:

Just pushed the fix.  Thanks for the report!

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Relax requirement for INTO with SELECT in pl/pgsql

2016-09-09 Thread Peter Eisentraut
On 6/6/16 9:59 AM, Merlin Moncure wrote:
> Thanks for the review.  All your comments look pretty reasonable. I'll
> touch it up and resubmit.

This is the last email in this thread that the commit fest app shows.  I
think we are waiting on an updated patch, with docs and tests.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] COPY command with RLS bug

2016-09-09 Thread Adam Brightwell
> Thanks for the report and the fix.

Yup. I have added it to the 2016-11 commitfest:
https://commitfest.postgresql.org/11/794/

> This seems a rather basic error to occur a year after release.
>
> Is this a problem with the testing of RLS? What other RLS related
> failures exist in other commands?

I think it was simply due to a small gap in the test suite.

> Perhaps we should extend rowsecurity test with a more comprehensive
> set of tests rather than just fix the COPY one?

I think more tests that provide value are always a *good* thing;
however, I would argue that other tests 'unrelated' to this fix are more
of a TODO item than something to include with this fix. Though, I am
certainly willing to attempt to find/add more test cases around this
specific functionality if that is desired.

-Adam




Re: [HACKERS] GiST penalty functions [PoC]

2016-09-09 Thread Greg Stark
On Thu, Sep 8, 2016 at 9:29 AM, Andrew Borodin  wrote:
>>autoconf check for IEEE 754 floats
> Autoconf man says folowing:
>>it is safe to assume IEEE-754 in most portable code these days
> https://www.gnu.org/software/autoconf/manual/autoconf.html#Floating-Point-Portability


Personally I wouldn't be very happy about an IEEE754 assumption. I did
go to the trouble of testing Postgres on a VAX and we fixed the few
instances where it had such dependencies for a net gain. If we
intentionally put in a dependency in one place then we'll never be
able to determine anywhere else where there are unintentional
dependencies.

I haven't followed the thread closely but I'm puzzled why you would
need to use bit twiddling to set a floating point number. Isn't there
a perfectly good way to calculate the value you want using ldexp() and
other standard C library functions?

-- 
greg




Re: [HACKERS] Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

2016-09-09 Thread Tom Lane
Here's a draft patch that replaces BRIN's use of PageIndexDeleteNoCompact
and an immediately following PageAddItemExtended with
PageIndexTupleOverwrite.  It turns out that there are two use-cases in
BRIN for PageIndexDeleteNoCompact, and the other one can't be replaced
because while there is a PageAddItem call beside it, that one is for a
different page.  But this is still enough to let us get rid of
PAI_ALLOW_FAR_OFFSET, which seems like a good thing.

Actually, with this patch, PageAddItemExtended isn't directly referenced
anywhere, so we could revert that API and fold it back into PageAddItem,
saving a nanosecond or two per call.  I'm inclined to leave that alone,
though, since I agree with the underlying thought that any future
extensions to PageAddItem's functionality would be better handled by more
bits in a flags bitmask.  Maybe a better idea is to get rid of the call
overhead by turning PageAddItem into a macro.  In any case, doing
something about that could be a separate patch.

It's also kind of tempting to rewrite PageIndexDeleteNoCompact into a
function that only deletes one tuple, and presumably is simpler and faster
than what's there.  But maybe there's something coming down the pike that
needs the extra generality?  That should also be a separate patch, anyway.

This passes make check-world but I'm not sure how heavily that stresses
BRIN ... are there any other tests you'd want to see run before
committing?

regards, tom lane

diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 6ebfedd..24ae38b 100644
*** a/src/backend/access/brin/brin_pageops.c
--- b/src/backend/access/brin/brin_pageops.c
*** brin_doupdate(Relation idxrel, BlockNumb
*** 178,187 
  		}
  
  		START_CRIT_SECTION();
! 		PageIndexDeleteNoCompact(oldpage, &oldoff, 1);
! 		if (PageAddItemExtended(oldpage, (Item) newtup, newsz, oldoff,
! PAI_OVERWRITE | PAI_ALLOW_FAR_OFFSET) == InvalidOffsetNumber)
! 			elog(ERROR, "failed to add BRIN tuple");
  		MarkBufferDirty(oldbuf);
  
  		/* XLOG stuff */
--- 178,185 
  		}
  
  		START_CRIT_SECTION();
! 		if (!PageIndexTupleOverwrite(oldpage, oldoff, (Item) newtup, newsz))
! 			elog(ERROR, "failed to replace BRIN tuple");
  		MarkBufferDirty(oldbuf);
  
  		/* XLOG stuff */
diff --git a/src/backend/access/brin/brin_xlog.c b/src/backend/access/brin/brin_xlog.c
index 27ba0a9..ed14729 100644
*** a/src/backend/access/brin/brin_xlog.c
--- b/src/backend/access/brin/brin_xlog.c
*** brin_xlog_samepage_update(XLogReaderStat
*** 189,202 
  		page = (Page) BufferGetPage(buffer);
  
  		offnum = xlrec->offnum;
- 		if (PageGetMaxOffsetNumber(page) + 1 < offnum)
- 			elog(PANIC, "brin_xlog_samepage_update: invalid max offset number");
  
! 		PageIndexDeleteNoCompact(page, &offnum, 1);
! 		offnum = PageAddItemExtended(page, (Item) brintuple, tuplen, offnum,
! 	 PAI_OVERWRITE | PAI_ALLOW_FAR_OFFSET);
! 		if (offnum == InvalidOffsetNumber)
! 			elog(PANIC, "brin_xlog_samepage_update: failed to add tuple");
  
  		PageSetLSN(page, lsn);
  		MarkBufferDirty(buffer);
--- 189,197 
  		page = (Page) BufferGetPage(buffer);
  
  		offnum = xlrec->offnum;
  
! 		if (!PageIndexTupleOverwrite(page, offnum, (Item) brintuple, tuplen))
! 			elog(PANIC, "brin_xlog_samepage_update: failed to replace tuple");
  
  		PageSetLSN(page, lsn);
  		MarkBufferDirty(buffer);
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index bce0d53..08e222e 100644
*** a/src/backend/storage/page/bufpage.c
--- b/src/backend/storage/page/bufpage.c
*** PageIsVerified(Page page, BlockNumber bl
*** 166,186 
   *	inserted, or InvalidOffsetNumber if the item is not inserted for any
   *	reason.  A WARNING is issued indicating the reason for the refusal.
   *
!  *	If flag PAI_OVERWRITE is set, we just store the item at the specified
!  *	offsetNumber (which must be either a currently-unused item pointer,
!  *	or one past the last existing item).  Otherwise,
!  *	if offsetNumber is valid and <= current max offset in the page,
!  *	insert item into the array at that position by shuffling ItemId's
!  *	down to make room.
!  *	If offsetNumber is not valid, then assign one by finding the first
   *	one that is both unused and deallocated.
   *
   *	If flag PAI_IS_HEAP is set, we enforce that there can't be more than
   *	MaxHeapTuplesPerPage line pointers on the page.
   *
-  *	If flag PAI_ALLOW_FAR_OFFSET is not set, we disallow placing items
-  *	beyond one past the last existing item.
-  *
   *	!!! EREPORT(ERROR) IS DISALLOWED HERE !!!
   */
  OffsetNumber
--- 166,189 
   *	inserted, or InvalidOffsetNumber if the item is not inserted for any
   *	reason.  A WARNING is issued indicating the reason for the refusal.
   *
!  *	offsetNumber must be either InvalidOffsetNumber to specify finding a
!  *	free item pointer, or a value between FirstOffsetNumber and one past
!  *	the last existing 

Re: [HACKERS] Re: [COMMITTERS] pgsql: Use LEFT JOINs in some system views in case referenced row doesn

2016-09-09 Thread Tom Lane
Fujii Masao  writes:
> On Sat, Aug 20, 2016 at 6:13 AM, Tom Lane  wrote:
>> Use LEFT JOINs in some system views in case referenced row doesn't exist.

> This change causes pg_stat_activity to report the "bogus" information about
> walsenders as follows.

Hmm ... but if we want to exclude walsenders from that view, surely we
should do so explicitly, not have it happen as a hidden side-effect
of not being able to join them to pg_database.

regards, tom lane




[HACKERS] Re: [COMMITTERS] pgsql: Use LEFT JOINs in some system views in case referenced row doesn

2016-09-09 Thread Fujii Masao
On Sat, Aug 20, 2016 at 6:13 AM, Tom Lane  wrote:
> Use LEFT JOINs in some system views in case referenced row doesn't exist.
>
> In particular, left join to pg_authid so that rows in pg_stat_activity
> don't disappear if the session's owning user has been dropped.
> Also convert a few joins to pg_database to left joins, in the same spirit,
> though that case might be harder to hit.  We were doing this in other
> views already, so it was a bit inconsistent that these views didn't.

This change causes pg_stat_activity to report the "bogus" information about
walsenders as follows.

$ ps aux | grep 75076
postgres   75076   0.0  0.0  2608252   1364   ??  Ss    2:41AM
0:00.07 postgres: wal sender process postgres [local] streaming
0/3000140

=# SELECT * FROM pg_stat_activity;
-[ RECORD 1 ]+
datid| 0
datname  | (null)
pid  | 75076
usesysid | 10
usename  | postgres
application_name | sby1
client_addr  | (null)
client_hostname  | (null)
client_port  | -1
backend_start| 2016-09-10 02:41:24.568313+09
xact_start   | (null)
query_start  | (null)
state_change | 2016-09-10 02:41:24.570135+09
wait_event_type  | (null)
wait_event   | (null)
state| idle
backend_xid  | (null)
backend_xmin | (null)
query|

I think that this information is confusing and should not exist in
pg_stat_activity.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Tuplesort merge pre-reading

2016-09-09 Thread Peter Geoghegan
On Fri, Sep 9, 2016 at 5:25 AM, Greg Stark  wrote:
> Wow, this is really cool. We should do something like this for query
> execution too.

We should certainly do this for tuplestore.c, too. I've been meaning
to adapt it to use batch memory. I did look at it briefly, and recall
that it was awkward because a surprisingly large number of callers
want to have memory that they can manage independently of the lifetime
of their tuplestore.


-- 
Peter Geoghegan




Re: [HACKERS] Tuplesort merge pre-reading

2016-09-09 Thread Peter Geoghegan
On Fri, Sep 9, 2016 at 4:55 AM, Heikki Linnakangas  wrote:
> I'm happy with the amount of testing I've done now, and the results. Does
> anyone want to throw out any more test cases where there might be a
> regression? If not, let's get these reviewed and committed.

I'll try to look at this properly tomorrow. Currently still working
away at creating a new revision of my sorting patchset. Obviously this
is interesting, but it raises certain questions for the parallel
CREATE INDEX patch in particular that I'd like to get straight, aside
from everything else.

I've been using an AWS d2.4xlarge instance for testing all my recent
sort patches, with 16 vCPUs, 122 GiB RAM, 12 x 2 TB disks. It worked
well to emphasize I/O throughput and parallelism over latency. I'd
like to investigate how this pre-reading stuff does there. I recall
that for one very large case, it took a full minute to do just the
first round of preloading during the leader's final merge (this was
with something like 50GB of maintenance_work_mem). So, it will be
interesting.

BTW, noticed a typo here:

> + * track memory usage of indivitual tuples.

-- 
Peter Geoghegan




Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c

2016-09-09 Thread Dilip Kumar
On Fri, Sep 9, 2016 at 6:51 PM, Tom Lane  wrote:
> Pushed with cosmetic adjustments --- mostly, I thought we needed some
> comments about the topic.

Okay, Thanks.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Install extensions using update scripts (was Re: [HACKERS] Remove superuser() checks from pgstattuple)

2016-09-09 Thread Tom Lane
Andres Freund  writes:
> On 2016-09-07 09:46:32 -0400, Tom Lane wrote:
>> + * If reject_indirect is true, ignore paths that go through installable
>> + * versions (presumably, caller will consider starting from such versions).

> Reading this I was initially confused, only after reading
> find_install_path() this made sense. It's to cut the search short,
> right?  Rephrasing this a bit might make sense.

Hm, I think I had a reason why that was important and not just an
optimization, but right now I don't remember why.  Will take a look
and clarify the comment.

>> +/*
>> + * We don't expect it to be installable, but maybe 
>> somebody added
>> + * a suitable script right after our stat() test.
>> + */
>> +if (evi_target->installable)
>> +{
>> +/* Easy, no extra scripts */
>> +updateVersions = NIL;
>> +}

> Heh, that's an odd thing to handle.

Yeah, it's an unlikely race condition, but if it did happen we'd hit the
"Assert(!evi_target->installable)" in find_install_path, and then the
"Assert(evi_start != evi_target)" in find_update_path.  Maybe that's not
worth worrying about, since it looks like there'd be no ill effects in
non-assert builds: AFAICS we'd correctly deduce that we should use the
now-installable script with no update steps.

>> +ereport(ERROR,
>> +
>> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> + errmsg("extension 
>> \"%s\" has no installation script for version \"%s\"",
>> +
>> pcontrol->name, versionName)));

> Maybe, and I mean maybe, we should rephrase this to hint at indirect
> installability?

Not sure, what would better wording be?

>> + several versions, for which the author will need to write update 
>> scripts.
>> + For example, if you have released a foo extension in
>> + versions 1.0, 1.1, and 1.2, there
>> + should be update scripts foo--1.0--1.1.sql
>> + and foo--1.1--1.2.sql.
>> + Before PostgreSQL 10, it was necessary to create new
>> + script files foo--1.1.sql and foo--1.2.sql
>> + containing the same changes, or else the newer versions could not be

> Maybe replace "same changes" "directly creating the extension's
> contents" or such?

Well, the main point is that you'd have to duplicate the effects of the
update script.  Not sure how to improve it without drawing attention
away from that.

Thanks for reviewing!

regards, tom lane




Re: [HACKERS] Add support for restrictive RLS policies

2016-09-09 Thread Tom Lane
Stephen Frost  writes:
> I saw the various list-based cases and commented in my reply (the one you
> quote part of above) why those didn't seem to make sense for this case
> (it's not a list and I don't see it ever growing into one).

I think Alvaro was simply questioning whether there would ever be a need
for more than two alternatives.

regards, tom lane




Re: [HACKERS] An extra error for client disconnection on Windows

2016-09-09 Thread Tom Lane
Haribabu Kommi  writes:
> On Thu, Jun 2, 2016 at 6:51 PM, Kyotaro HORIGUCHI <
> horiguchi.kyot...@lab.ntt.co.jp> wrote:
>> After a process termination without PQfinish() of a client,
>> server emits the following log message not seen on Linux boxes.
>> LOG:  could not receive data from client: An existing connection was
>> forcibly closed by the remote host.
>> 
>> This patch translates WSAECONNRESET of WSARecv to an EOF so that
>> pgwin32_recv behaves the same way with Linux.

> Marked the patch as "ready for committer".

Windows is not my platform, but ... is this actually an improvement?
I'm fairly concerned that this change would mask real errors that ought
to get logged.  I don't know that that's an okay price to pay for
suppressing a log message when clients violate the protocol.

According to
https://msdn.microsoft.com/en-us/library/windows/desktop/ms740668(v=vs.85).aspx
WSAECONNRESET means:

An existing connection was forcibly closed by the remote host. This
normally results if the peer application on the remote host is
suddenly stopped, the host is rebooted, the host or remote network
interface is disabled, or the remote host uses a hard close (see
setsockopt for more information on the SO_LINGER option on the remote
socket). This error may also result if a connection was broken due to
keep-alive activity detecting a failure while one or more operations
are in progress. Operations that were in progress fail with
WSAENETRESET. Subsequent operations fail with WSAECONNRESET.

(The description of WSAENETRESET, on the same page, indicates that the
last two sentences apply only to the keep-alive failure case.)

So this change would deal nicely with the "peer application on the remote
host is suddenly stopped" case, at the price of being not nice about any
of the other cases.  Not convinced it's a good tradeoff.

regards, tom lane




Re: [HACKERS] Make better use of existing enums in plpgsql

2016-09-09 Thread Peter Eisentraut
On 9/4/16 12:26 PM, Pavel Stehule wrote:
> I am sending review of this trivial patch.
> 
> 1. No problems with patching, compiling
> 2. all regress tests passed
> 3. There are not any visible change, so tests, docs are not necessary
> 4. Using enum instead of int is generally a good idea
> 
> I will mark this patch as ready for committers.

Committed, thanks.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Add support for restrictive RLS policies

2016-09-09 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> >> Can't you keep those words as Sconst or something (DefElems?) until the
> >> execution phase, so that they don't need to be keywords at all?
> 
> > Seems like we could do that, though I'm not convinced that it really
> > gains us all that much.  These are only unreserved keywords, of course,
> > so they don't impact users the way reserved keywords (of any kind) can.
> > While there may be some places where we use a string to represent a set
> > of defined options, I don't believe that's typical
> 
> -1 for having to write them as string literals; but I think what Alvaro
> really means is to arrange for the words to just be identifiers in the
> grammar, which you strcmp against at execution.  See for example
> reloption_list.  (Whether you use DefElem as the internal representation
> is a minor detail, though it might help for making the parsetree
> copyObject-friendly.)

I saw the various list-based cases and commented in my reply (the one you
quote part of above) why those didn't seem to make sense for this case
(it's not a list and I don't see it ever growing into one).

> vacuum_option_elem shows another way to avoid making a word into a
> keyword, although to me that one is more of an antipattern; it'd be better
> to leave the strcmp to execution, since there's so much other code that
> does things that way.

Both of those cases are for lists, which this is not.  I certainly
understand that it makes sense to allow a list of options to be passed
in any order and that means we need to build just the list with the
grammar and then check what's in the list at execution time, and further
check that the user didn't provide a set of invalid or duplicative
options, but this isn't a list.  If the thinking is that it *should* be
a list, then it'd be really helpful to see an example or two of what the
list-based syntax would be.  I provided one in my reply and commented on
why it didn't seem like a good approach.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] PoC: Make it possible to disallow WHERE-less UPDATE and DELETE

2016-09-09 Thread Peter Eisentraut
Review of the patch in the commit fest:

- Various naming/spelling inconsistencies: In the source, the module
  is require_where, the documentation titles it require-where, and the
  GUC parameters are requires_where.*, which the documentation gets wrong.

- Unusual indentation in the Makefile

- Needs tests

- Not sure about errcode(ERRCODE_CARDINALITY_VIOLATION), which is
  documented in the code as "this means something returned the wrong
  number of rows".  I think ERRCODE_SYNTAX_ERROR or something from
  nearby there would be better.

- errhint() string should end with a period.

- The 7th argument of DefineCustomBoolVariable() is of type int, not
  bool, so passing false is somewhat wrong, even if it works.

- There ought to be a _PG_fini() function that undoes what _PG_init()
  does.

- The documentation should be expanded and clarified.  Given that this
  is a "training wheels" module, we can be extra clear here.  I would
  like to see some examples at least.

- The documentation is a bit incorrect about the ways to load this
  module.  shared_preload_libraries is not necessary.  session_ and
  local_ (with prep) should also work.

- The claim in the documentation that only superusers can do things
  with this module is not generally correct.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST penalty functions [PoC]

2016-09-09 Thread Михаил Бахтерев
In case you are still interested, here are three versions of pack_float.
The union version of pack_float should run faster: the code is simpler
and the dependency chains are shorter.

But it may be less accurate, or even wrong, because for signed integers
(x>>2) and (x/4) are not the same. Consider x = -1.

You may try pack_float_good, which gives the same asm as v3, but without
warnings.

- Mikhail, respectfully

On Thu, Sep 08, 2016 at 01:29:36PM +0500, Andrew Borodin wrote:
> >autoconf check for IEEE 754 floats
> The Autoconf manual says the following:
> >it is safe to assume IEEE-754 in most portable code these days
> https://www.gnu.org/software/autoconf/manual/autoconf.html#Floating-Point-Portability
> 
> > A union might be more readable
> Here is the union version of the patch. It's 10% slower than the original cube
> and the dereference version. I have no idea why.
> Select performance is improved as in v3.
> 
#include <stdint.h>

typedef union { float fp; int i; } U;

float pack_float(const float v, const int r)
{
  const U a = { .fp = v };
  const U b = { .i = (a.i >> 2) + r * (INT32_MAX / 4) };

  return b.fp;
}

float pack_float_av(float v, int r)
{
  U buf;

  buf.fp = v;
  buf.i = (buf.i >> 2) + (INT32_MAX / 4) * r;

  return buf.fp;
}

float
pack_float_v3(float actualValue, int realm)
{
  /* two bits for realm, others for value */
  /* we have 4 realms   */
  int realmAjustment = *((int*)&actualValue)/4;
  int realCode = realm * (INT32_MAX/4) + realmAjustment;
  return *((float*)&realCode);
}

float pack_float_good(const float v, const int r)
{
  const U a = { .fp = v };
  const U b = { .i = a.i/4 + r * (INT32_MAX / 4) };

  return b.fp;
}

	.file	"pack-float.c"
	.text
	.p2align 4,,15
	.globl	pack_float
	.type	pack_float, @function
pack_float:
.LFB0:
	.cfi_startproc
	movd	%xmm0, %eax
	movl	%edi, %edx
	sall	$29, %edx
	sarl	$2, %eax
	subl	%edi, %edx
	addl	%edx, %eax
	movl	%eax, -4(%rsp)
	movss	-4(%rsp), %xmm0
	ret
	.cfi_endproc
.LFE0:
	.size	pack_float, .-pack_float
	.p2align 4,,15
	.globl	pack_float_av
	.type	pack_float_av, @function
pack_float_av:
.LFB1:
	.cfi_startproc
	movd	%xmm0, %eax
	movl	%edi, %edx
	sall	$29, %edx
	sarl	$2, %eax
	subl	%edi, %edx
	addl	%edx, %eax
	movl	%eax, -4(%rsp)
	movss	-4(%rsp), %xmm0
	ret
	.cfi_endproc
.LFE1:
	.size	pack_float_av, .-pack_float_av
	.p2align 4,,15
	.globl	pack_float_v3
	.type	pack_float_v3, @function
pack_float_v3:
.LFB2:
	.cfi_startproc
	movd	%xmm0, %edx
	leal	3(%rdx), %eax
	testl	%edx, %edx
	cmovns	%edx, %eax
	sarl	$2, %eax
	movl	%eax, %edx
	movl	%edi, %eax
	sall	$29, %eax
	subl	%edi, %eax
	addl	%edx, %eax
	movl	%eax, -4(%rsp)
	movss	-4(%rsp), %xmm0
	ret
	.cfi_endproc
.LFE2:
	.size	pack_float_v3, .-pack_float_v3
	.p2align 4,,15
	.globl	pack_float_good
	.type	pack_float_good, @function
pack_float_good:
.LFB3:
	.cfi_startproc
	movd	%xmm0, %edx
	leal	3(%rdx), %eax
	testl	%edx, %edx
	cmovns	%edx, %eax
	sarl	$2, %eax
	movl	%eax, %edx
	movl	%edi, %eax
	sall	$29, %eax
	subl	%edi, %eax
	addl	%edx, %eax
	movl	%eax, -4(%rsp)
	movss	-4(%rsp), %xmm0
	ret
	.cfi_endproc
.LFE3:
	.size	pack_float_good, .-pack_float_good
	.ident	"GCC: (GNU) 6.1.1 20160802"
	.section	.note.GNU-stack,"",@progbits




Re: [HACKERS] GiST penalty functions [PoC]

2016-09-09 Thread Михаил Бахтерев
Yes, you are right: ANSI C allows only load-time initializers. The
attached ANSI-compatible version leads to the same assembly.

And let me suggest a bit-twiddling version as well. It gives 12
instructions instead of 13. Twelve is better, as a modern x86 CPU will
fetch them in at most 3 cycles, one less than for 13 instructions. This
bit-twiddling is also more parallel at the instruction level.

And for ARM, which is unsurpassed at bit-twiddling, this code is much
better.

Of course speed is influenced by a lot of factors as always, so it needs
to be tested on some datasets.

- Mikhail, respectfully

On Fri, Sep 09, 2016 at 08:50:53AM +0500, Andrey Borodin wrote:
> Thank you for your attention to details, Mikhail.
> 
> pack_float_good() looks good. But I'm not sure inline struct init is allowed
> under ANSI C. Converting to the regular ancient form b.fp = v; won't change
> the compile result, would it?
> 
> Regards, Andrey Borodin.
#include <stdint.h>

typedef union { float fp; int i; } U;

float pack_float(const float v, const int r)
{
  const U a = { .fp = v };
  const U b = { .i = (a.i >> 2) + r * (INT32_MAX / 4) };

  return b.fp;
}

float pack_float_av(float v, int r)
{
  U buf;

  buf.fp = v;
  buf.i = (buf.i >> 2) + (INT32_MAX / 4) * r;

  return buf.fp;
}

float
pack_float_v3(float actualValue, int realm)
{
  /* two bits for realm, others for value */
  /* we have 4 realms   */
  int realmAjustment = *((int*)&actualValue)/4;
  int realCode = realm * (INT32_MAX/4) + realmAjustment;
  return *((float*)&realCode);
}

float pack_float_good(const float v, const int r)
{
  const U a = { .fp = v };
  const U b = { .i = a.i/4 + r * (INT32_MAX / 4) };

  return b.fp;
}

float pack_float_ansi(const float v, const int r)
{
  union { float f; int i; } a;

  a.f = v;
  a.i = a.i / 4 + r * (INT32_MAX / 4);

  return a.f;
}

float pack_float_bits(const float v, const int r)
{
  union {
float f;
struct { unsigned value:31, sign:1; } vbits;
struct { unsigned value:29, realm:2, sign:1; } rbits;
  } a;

  a.f = v;
  a.rbits.value = a.vbits.value >> 2;
  a.rbits.realm = r;

  return a.f;
}
	.file	"pack-float.c"
	.text
	.p2align 4,,15
	.globl	pack_float
	.type	pack_float, @function
pack_float:
.LFB0:
	.cfi_startproc
	movd	%xmm0, %eax
	movl	%edi, %edx
	sall	$29, %edx
	sarl	$2, %eax
	subl	%edi, %edx
	addl	%edx, %eax
	movl	%eax, -4(%rsp)
	movss	-4(%rsp), %xmm0
	ret
	.cfi_endproc
.LFE0:
	.size	pack_float, .-pack_float
	.p2align 4,,15
	.globl	pack_float_av
	.type	pack_float_av, @function
pack_float_av:
.LFB1:
	.cfi_startproc
	movd	%xmm0, %eax
	movl	%edi, %edx
	sall	$29, %edx
	sarl	$2, %eax
	subl	%edi, %edx
	addl	%edx, %eax
	movl	%eax, -4(%rsp)
	movss	-4(%rsp), %xmm0
	ret
	.cfi_endproc
.LFE1:
	.size	pack_float_av, .-pack_float_av
	.p2align 4,,15
	.globl	pack_float_v3
	.type	pack_float_v3, @function
pack_float_v3:
.LFB2:
	.cfi_startproc
	movd	%xmm0, %edx
	leal	3(%rdx), %eax
	testl	%edx, %edx
	cmovns	%edx, %eax
	sarl	$2, %eax
	movl	%eax, %edx
	movl	%edi, %eax
	sall	$29, %eax
	subl	%edi, %eax
	addl	%edx, %eax
	movl	%eax, -4(%rsp)
	movss	-4(%rsp), %xmm0
	ret
	.cfi_endproc
.LFE2:
	.size	pack_float_v3, .-pack_float_v3
	.p2align 4,,15
	.globl	pack_float_good
	.type	pack_float_good, @function
pack_float_good:
.LFB3:
	.cfi_startproc
	movd	%xmm0, %edx
	leal	3(%rdx), %eax
	testl	%edx, %edx
	cmovns	%edx, %eax
	sarl	$2, %eax
	movl	%eax, %edx
	movl	%edi, %eax
	sall	$29, %eax
	subl	%edi, %eax
	addl	%edx, %eax
	movl	%eax, -4(%rsp)
	movss	-4(%rsp), %xmm0
	ret
	.cfi_endproc
.LFE3:
	.size	pack_float_good, .-pack_float_good
	.p2align 4,,15
	.globl	pack_float_ansi
	.type	pack_float_ansi, @function
pack_float_ansi:
.LFB4:
	.cfi_startproc
	movd	%xmm0, %edx
	leal	3(%rdx), %eax
	testl	%edx, %edx
	cmovns	%edx, %eax
	sarl	$2, %eax
	movl	%eax, %edx
	movl	%edi, %eax
	sall	$29, %eax
	subl	%edi, %eax
	addl	%edx, %eax
	movl	%eax, -4(%rsp)
	movss	-4(%rsp), %xmm0
	ret
	.cfi_endproc
.LFE4:
	.size	pack_float_ansi, .-pack_float_ansi
	.p2align 4,,15
	.globl	pack_float_bits
	.type	pack_float_bits, @function
pack_float_bits:
.LFB5:
	.cfi_startproc
	movd	%xmm0, %edx
	movd	%xmm0, %eax
	andl	$3, %edi
	sall	$29, %edi
	andl	$2147483647, %edx
	andl	$-2147483648, %eax
	shrl	$2, %edx
	orl	%edx, %eax
	orl	%edi, %eax
	movl	%eax, -4(%rsp)
	movss	-4(%rsp), %xmm0
	ret
	.cfi_endproc
.LFE5:
	.size	pack_float_bits, .-pack_float_bits
	.ident	"GCC: (GNU) 6.1.1 20160802"
	.section	.note.GNU-stack,"",@progbits
	.arch armv7-a
	.eabi_attribute 28, 1
	.eabi_attribute 20, 1
	.eabi_attribute 21, 1
	.eabi_attribute 23, 3
	.eabi_attribute 24, 1
	.eabi_attribute 25, 1
	.eabi_attribute 26, 2
	.eabi_attribute 30, 2
	.eabi_attribute 34, 1
	.eabi_attribute 18, 4
	.file	"pack-float.c"
	.text
	.align	2
	.global	pack_float
	.syntax unified
	.arm
	.fpu vfpv3-d16
	.type	pack_float, %function
pack_float:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vmov	r3, s0	@ int
	rsb	r0, r0, r0, lsl #29
	add	r0, r0, r3, asr #2
	vmov	s0, r0
	bx	lr
	.size	pack_float, 

Re: [HACKERS] patch: function xmltable

2016-09-09 Thread Pavel Stehule
2016-09-09 10:35 GMT+02:00 Pavel Stehule <pavel.steh...@gmail.com>:

> Hi,
>
> I am sending new version of this patch
>
> 1. The generic TableExpr is now better separated from the real content
> generation.
> 2. I removed the cached typmod - using the row type cache everywhere - which is
> consistent with a few other places in Pg where dynamic types are used - the
> result tupdesc is generated a few times more, but it is not on the critical path.
> 3. More comments, a few more lines in the docs.
> 4. Reformatted by pgindent.
>

new update

more regress tests

Regards

Pavel


>
> Regards
>
> Pavel
>
>
>


xmltable-20160909-2.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] ICU integration

2016-09-09 Thread Dave Page
On Fri, Sep 9, 2016 at 2:27 PM, Tom Lane  wrote:
> Dave Page  writes:
>> We needed a specific version that was newer than that shipped with
>> RHEL 6 (or in EPEL) iirc.
>
> Sure hope that's not true of the currently-proposed patch :-(

Looking back at my old emails, apparently ICU 5.0 and later include
ucol_strcollUTF8(), which avoids the need to convert UTF-8 strings to
16-bit UTF-16 before sorting. RHEL 6 has the older 4.2 version of ICU.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Project Policies

2016-09-09 Thread Vik Fearing
I often see mention of project policies for various things (the one
triggering this email is in commit 967a7b0).

Where can I find documentation for these policies?
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] ICU integration

2016-09-09 Thread Tom Lane
Dave Page  writes:
> We needed a specific version that was newer than that shipped with
> RHEL 6 (or in EPEL) iirc.

Sure hope that's not true of the currently-proposed patch :-(

regards, tom lane




Re: [HACKERS] SELECT FOR UPDATE regression in 9.5

2016-09-09 Thread Magnus
I ran the web application mentioned in Marti's original mail on a 
patched Postgres server. It looks like it is working correctly now, no 
more test failures.


Thanks
-Magnus

On 07.09.2016 21:49, Marko Tiikkaja wrote:

On 07/09/16 7:29 PM, Alvaro Herrera wrote:

Marko, does this fix your reported problem too?  Both the assertion and
the overall test case that causes it to fire?


The test case never realized anything was wrong, but the assertion is 
gone.  So yup, problem solved on this end, at least.



.m







Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c

2016-09-09 Thread Tom Lane
Dilip Kumar  writes:
> Seems like a better option, done it this way..
> attaching the modified patch..

Pushed with cosmetic adjustments --- mostly, I thought we needed some
comments about the topic.

regards, tom lane




Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup

2016-09-09 Thread David Steele

On 9/6/16 10:25 PM, Michael Paquier wrote:
>

On Wed, Sep 7, 2016 at 12:16 AM, David Steele  wrote:


Attached is a new patch that adds sgml documentation.  I can expand on each
directory individually if you think that's necessary, but thought it was
better to lump them into a few categories.


+be ommitted from the backup as they will be initialized on postmaster
+startup. If the  is set and is
+under the database cluster directory then the contents of the directory
+specified by  can also
be ommitted.

s/ommitted/omitted/


Fixed.


+#define EXCLUDE_DIR_MAX 8
+#define EXCLUDE_DIR_STAT_TMP 0
+
+const char *excludeDirContents[EXCLUDE_DIR_MAX] =
+{
+/*
+ * Skip temporary statistics files. The first array position will be
+ * filled with the value of pgstat_stat_directory relative to PGDATA.
+ * PG_STAT_TMP_DIR must be skipped even when stats_temp_directory is set
+ * because PGSS_TEXT_FILE is always created there.
+ */
+NULL,
I find that ugly. I'd rather use an array with undefined size for the
fixed elements finishing by NULL, remove EXCLUDE_DIR_MAX and
EXCLUDE_DIR_STAT_TMP and use a small routine to do the work done on
_tarWriteHeader...


Done.  Also writing out pg_xlog with the new routine.

--
-David
da...@pgmasters.net
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 0f09d82..a8daa07 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1090,6 +1090,22 @@ SELECT pg_stop_backup();

 

+The contents of the pg_dynshmem/, pg_stat_tmp/,
+pg_notify/, pg_serial/,
+pg_snapshots/, and pg_subtrans/ directories can
+be omitted from the backup as they will be initialized on postmaster
+startup. If the  is set and is
+under the database cluster directory then the contents of the directory
+specified by  can also be omitted.
+   
+
+   
+Any file or directory beginning with pgsql_tmp can be
+omitted from the backup.  These files are removed on postmaster start and
+the directories will be recreated as needed.
+   
+
+   
 The backup label
 file includes the label string you gave to pg_start_backup,
 as well as the time at which pg_start_backup was run, and
diff --git a/src/backend/replication/basebackup.c 
b/src/backend/replication/basebackup.c
index da9b7a6..a441ae2 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -30,6 +30,7 @@
 #include "replication/basebackup.h"
 #include "replication/walsender.h"
 #include "replication/walsender_private.h"
+#include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
 #include "utils/builtins.h"
@@ -57,6 +58,8 @@ static bool sendFile(char *readfilename, char *tarfilename,
 static void sendFileWithContent(const char *filename, const char *content);
 static void _tarWriteHeader(const char *filename, const char *linktarget,
struct stat * statbuf);
+static int64 _tarWriteDir(char *pathbuf, int basepathlen, struct stat *statbuf,
+   bool sizeonly);
 static void send_int8_string(StringInfoData *buf, int64 intval);
 static void SendBackupHeader(List *tablespaces);
 static void base_backup_cleanup(int code, Datum arg);
@@ -95,6 +98,67 @@ static int64 elapsed_min_unit;
 static int64 throttled_last;
 
 /*
+ * The contents of these directories are removed or recreated during server
+ * start so they will not be included in the backup.  The directory entry
+ * will be included to preserve permissions.
+ */
+const char *excludeDirContents[] =
+{
+   /*
+* Skip temporary statistics files. PG_STAT_TMP_DIR must be skipped even
+* when stats_temp_directory is set because PGSS_TEXT_FILE is always 
created
+* there.
+*/
+   PG_STAT_TMP_DIR,
+
+   /* Recreated on startup, see StartupReplicationSlots(). */
+   "pg_replslot",
+
+   /* Removed on startup, see dsm_cleanup_for_mmap(). */
+   PG_DYNSHMEM_DIR,
+
+   /* Recreated/zeroed on startup, see AsyncShmemInit(). */
+   "pg_notify",
+
+   /* Recreated on startup, see OldSerXidInit(). */
+   "pg_serial",
+
+   /* Removed on startup, see DeleteAllExportedSnapshotFiles(). */
+   "pg_snapshots",
+
+   /* Recreated/zeroed on startup, see StartupSUBTRANS(). */
+   "pg_subtrans",
+
+   /* Terminate list. */
+   NULL
+};
+
+/*
+ * Files that should not be included in the backup.
+ */
+const char *excludeFile[] =
+{
+   /* Skip auto conf temporary file. */
+   PG_AUTOCONF_FILENAME ".tmp",
+
+   /*
+* If there's a backup_label or tablespace_map file, it belongs to a 
backup
+* started by the user with pg_start_backup(). It is *not* correct for 
this
+* backup, our backup_label/tablespace_map is injected into the tar
+* separately.
+*/
+   BACKUP_LABEL_FILE,
+   TABLESPACE_MAP,
+
+   /* Skip 

[HACKERS] More ideas for speeding up sorting

2016-09-09 Thread Heikki Linnakangas
A few things caught my eye while hacking on the latest round of sorting 
patches. Starting a new thread because these are orthogonal to the 
things discussed on the other threads:


1. SortTuple.tupindex is not used when the sort fits in memory. If we 
also got rid of the isnull1 flag, we could shrink SortTuple from 24 
bytes to 16 bytes (on 64-bit systems). That would allow you to pack more 
tuples into memory, which generally speeds things up. We could for 
example steal the least-significant bit from the 'tuple' pointer for 
that, because the pointers are always at least 4-byte aligned. (But see 
next point)


2. We could easily categorize incoming tuples as the come in, into those 
that have a leading NULL, and others. If we kept them in separate 
arrays, or perhaps grow memtuples from bottom-up for non-NULLs and from 
top-down for NULLs, we wouldn't need to store, or compare, the 'isnull1' 
flag at all, in the quicksort.


3. In the single-Datum case, i.e. tuplesort_begin_datum(), we could just 
keep a counter of how many NULLs we saw, and not store them at all. 
That's quite a special case, but it's simple enough that it might be 
worth it.


I'm not going to pursue these right now, but might be good projects for 
someone.


- Heikki





Re: [HACKERS] ICU integration

2016-09-09 Thread Dave Page
On Fri, Sep 9, 2016 at 9:02 AM, Devrim Gündüz  wrote:
>
> Hi,
>
> On Fri, 2016-09-09 at 09:48 +0800, Craig Ringer wrote:
>> Personally I would be pretty reluctant to package libicu when it's
>> already in RH/Fedora. If it were bundled in Pg's source tree and a
>> private copy was built/installed by the build system so it was part of
>> the main postgresql-server package that'd be different.
>
> Agreed. I did not read the whole thread (yet), but if this is something like
> tzdata, I would personally want to use the libicu supplied by the OS, like we do
> for tzdata.
>
> (That said, just checked EDB's ICU support. We currently ship our own libicu
> there, as a part of EPAS, but I don't know the reasoning/history behind 
> there.)

We needed a specific version that was newer than that shipped with
RHEL 6 (or in EPEL) iirc.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] ICU integration

2016-09-09 Thread Peter Eisentraut
On 9/8/16 8:07 PM, Peter Geoghegan wrote:
> Not exactly. Peter E. didn't seem to be aware that there is an ICU
> collator versioning concept (perhaps I misunderstood, though).

You appear to have missed the part about the collversion column that my
patch adds.  That's exactly the collator version of ICU.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Tuplesort merge pre-reading

2016-09-09 Thread Heikki Linnakangas

On 09/09/2016 03:25 PM, Greg Stark wrote:

On Fri, Sep 9, 2016 at 1:01 PM, Heikki Linnakangas  wrote:

I'm happy with what it looks like. We are in fact getting a more sequential
access pattern with these patches, because we're not expanding the pre-read
tuples into SortTuples. Keeping densely-packed blocks in memory, instead of
SortTuples, allows caching more data overall.



Wow, this is really cool. We should do something like this for query
execution too.

I still didn't follow exactly why removing the prefetching allows more
sequential i/o. I thought the whole point of prefetching was to reduce
the random i/o from switching tapes.


The first patch removed prefetching, but the second patch re-introduced 
it, in a different form. The prefetching is now done in logtape.c, by 
reading multiple pages at a time. The on-tape representation of tuples 
is more compact than having them in memory as SortTuples, so you can fit 
more data in memory overall, which makes the access pattern more sequential.


There's one difference between these approaches that I didn't point out 
earlier: We used to prefetch tuples from each *run*, and stopped 
pre-reading when we reached the end of the run. Now that we're doing the 
prefetching as raw tape blocks, we don't stop at run boundaries. I don't 
think that makes any big difference one way or another, but I thought 
I'd mention it.


- Heikki





Re: [HACKERS] ICU integration

2016-09-09 Thread Peter Eisentraut
On 9/8/16 11:08 PM, Peter Geoghegan wrote:
> In principle, and assuming I haven't gotten something wrong, it ought
> to be possible to unambiguously identify a collation based on a
> matching UCA version (i.e. DUCET table), plus the collation tailorings
> matching exactly, even across ICU versions that happen to be based on
> the same UCA version (they only seem to put out a new UCA version
> about once a year [4]).  It *might* be fine, practically speaking, to
> assume that a collation with a matching iso-code and UCA version is
> compatible forever and always across any ICU version. If not, it might
> instead be feasible to write a custom fingerprinter for collation
> tailorings that ran at initdb time.

The documentation [0] states that the collation version covers a broad
range of things.  So I don't think these additional mechanisms you
describe are necessary.

[0]: http://userguide.icu-project.org/collation/architecture#TOC-Versioning

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Push down more full joins in postgres_fdw

2016-09-09 Thread Etsuro Fujita

On 2016/09/08 19:51, Etsuro Fujita wrote:

On 2016/09/06 22:07, Ashutosh Bapat wrote:

This patch tries to do two things at a time: 1. support join pushdown for
FULL join when the joining relations have remote conditions; 2. better
support for fetching placeholder vars, whole-row references and some
system columns. To make reviews easy, I think we should split the patch
into two: 1. supporting subqueries to be deparsed, with support for one of
the above (I would suggest FULL join support); 2. support for the other.



OK, will try.


I extracted #1 from the patch.  Attached is a patch for that.  If that 
patch is reasonable, I'll create a patch for #2 on top of it.


Comments welcome!

Best regards,
Etsuro Fujita


postgres-fdw-more-full-join-pushdown-v3.patch
Description: binary/octet-stream



Re: [HACKERS] Tuplesort merge pre-reading

2016-09-09 Thread Greg Stark
On Fri, Sep 9, 2016 at 1:01 PM, Heikki Linnakangas  wrote:
> I'm happy with what it looks like. We are in fact getting a more sequential
> access pattern with these patches, because we're not expanding the pre-read
> tuples into SortTuples. Keeping densely-packed blocks in memory, instead of
> SortTuples, allows caching more data overall.


Wow, this is really cool. We should do something like this for query
execution too.

I still didn't follow exactly why removing the prefetching allows more
sequential i/o. I thought the whole point of prefetching was to reduce
the random i/o from switching tapes.

-- 
greg




Re: [HACKERS] Bug in two-phase transaction recovery

2016-09-09 Thread Simon Riggs
On 8 September 2016 at 11:18, Simon Riggs  wrote:
> On 8 September 2016 at 07:43, Michael Paquier  
> wrote:
>> On Wed, Sep 7, 2016 at 10:48 PM, Stas Kelvich  
>> wrote:
>>> Some time ago the two-phase state file format was changed to have a
>>> variable-size GID, but several places that read those files were not
>>> updated to use the new offsets. The problem exists in master and 9.6 and
>>> can be reproduced on prepared transactions with savepoints.
>>
>> Oops and meh. This merits an open item, and had better be fixed by
>> 9.6.0. I am glad you noticed it, honestly. And we had better take
>> care of this issue as soon as possible.
>
> Looking now.

Committed, thanks.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Fix corruption of 2PC recovery with subxacts

2016-09-09 Thread Michael Paquier
On Fri, Sep 9, 2016 at 9:12 PM, Simon Riggs  wrote:
> Fix corruption of 2PC recovery with subxacts
>
> Reading 2PC state files during recovery was borked, causing corruptions during
> recovery. Effect limited to servers with 2PC, subtransactions and
> recovery/replication.

Thanks!
-- 
Michael




Re: [HACKERS] Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

2016-09-09 Thread Tom Lane
Andrey Borodin  writes:
> That storage assertion fired during a plain UPDATE table SET x = random()
> without conditions. Also, make check fails without it (for BRIN usage;
> GiST is OK with it).

I'm confused by that assertion, because the patch-as-submitted didn't
touch BRIN.  Based on Alvaro's comments, yes it would fail if we'd
modified BRIN to use this function ... but we hadn't yet.

regards, tom lane




Re: [HACKERS] Tuplesort merge pre-reading

2016-09-09 Thread Heikki Linnakangas

On 09/08/2016 09:59 PM, Heikki Linnakangas wrote:

On 09/06/2016 10:26 PM, Peter Geoghegan wrote:

On Tue, Sep 6, 2016 at 12:08 PM, Peter Geoghegan  wrote:

Offhand, I would think that taken together this is very important. I'd
certainly want to see cases in the hundreds of megabytes or gigabytes
of work_mem alongside your 4MB case, even just to be able to talk
informally about this. As you know, the default work_mem value is very
conservative.


I spent some more time polishing this up, and also added some code to
logtape.c, to use larger read buffers, to compensate for the fact that
we don't do pre-reading from tuplesort.c anymore. That should trigger
the OS read-ahead, and make the I/O more sequential, like was the
purpose of the old pre-reading code. But simpler. I haven't tested that
part much yet, but I plan to run some tests on larger data sets that
don't fit in RAM, to make the I/O effects visible.


Ok, I ran a few tests with 20 GB tables. I thought this would show any 
differences in I/O behaviour, but in fact it was still completely CPU 
bound, like the tests on smaller tables I posted yesterday. I guess I 
need to point temp_tablespaces to a USB drive or something. But here we go.


It looks like there was a regression when sorting random text, with 256 
MB work_mem. I suspect that was a fluke - I only ran these tests once 
because they took so long. But I don't know for sure.


Claudio, if you could also repeat the tests you ran on Peter's patch set 
on the other thread, with these patches, that'd be nice. These patches 
are effectively a replacement for 
0002-Use-tuplesort-batch-memory-for-randomAccess-sorts.patch. And review 
would be much appreciated too, of course.


Attached are new versions. Compared to last set, they contain a few 
comment fixes, and a change to the 2nd patch to not allocate tape 
buffers for tapes that were completely unused.


- Heikki

~/git-sandbox/postgresql/src/test/sort (sort-speedups)$ PGDATABASE=sorttest 
LD_LIBRARY_PATH=~/pgsql.master/lib ./speed
# Tests on large-sized tables (20 GB), 16MB work_mem
-
random_ints: 1106152 ms
random_text: 2941289 ms

# Tests on large-sized tables (20 GB), 256MB work_mem
-
random_ints: 624327 ms
random_text: 1551049 ms

# Tests on large-sized tables (20 GB), 512MB work_mem
-
random_ints: 537217 ms
random_text: 1552546 ms

# Tests on large-sized tables (20 GB), 2GB work_mem
-
random_ints: 525460 ms
random_text: 1619040 ms
~/git-sandbox/postgresql/src/test/sort (sort-speedups)$ PGDATABASE=sorttest 
LD_LIBRARY_PATH=~/pgsql.master/lib ./speed
# Tests on large-sized tables (20 GB), 16MB work_mem
-
random_ints: 755071 ms
random_text: 3054989 ms

# Tests on large-sized tables (20 GB), 256MB work_mem
-
random_ints: 500955 ms
random_text: 1797228 ms

# Tests on large-sized tables (20 GB), 512MB work_mem
-
random_ints: 413773 ms
random_text: 1520891 ms

# Tests on large-sized tables (20 GB), 2GB work_mem
-
random_ints: 400744 ms
random_text: 1667530 ms

From 90137ebfac0d5f2e80e2fb24cd12bfb664367f5d Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Fri, 9 Sep 2016 14:10:05 +0300
Subject: [PATCH 1/2] Don't bother to pre-read tuples into SortTuple slots
 during merge.

That only seems to add overhead. We're doing the same number of READTUP()
calls either way, but we're spreading the memory usage over a larger area
if we try to pre-read, so it doesn't seem worth it.

The pre-reading can be helpful, to trigger the OS readahead of the
underlying tape, because it will make the read pattern appear more
sequential. But we'll fix that in the next patch, by teaching logtape.c to
read in larger chunks.
---
 src/backend/utils/sort/tuplesort.c | 903 ++---
 1 file changed, 226 insertions(+), 677 deletions(-)

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index c8fbcf8..a6d167a 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -162,7 +162,7 @@ bool		optimize_bounded_sort = true;
  * The objects we actually sort are SortTuple structs.  These contain
  * a pointer to the tuple proper (might be a MinimalTuple or IndexTuple),
  * which is a separate palloc chunk --- we assume it is just one chunk and
- * can be freed by a simple pfree() (except during final on-the-fly merge,
+ * can be freed by a simple pfree() (except during merge,
  * when memory is used in batch).  SortTuples also contain the tuple's
  * first key column in Datum/nullflag format, and an index integer.
  *
@@ -191,9 +191,8 @@ bool		optimize_bounded_sort = true;
  * it now only distinguishes RUN_FIRST and HEAP_RUN_NEXT, since replacement
  * selection is always abandoned after the first run; no other run number
  * should be represented here.  During merge passes, we re-use it to hold the
- * input tape number that each tuple in the heap was read from, or to hold the
- * index of the next tuple 

Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2016-09-09 Thread Amit Kapila
On Fri, Sep 9, 2016 at 3:17 PM, Ashutosh Bapat
 wrote:
>
> 4. Remove bms_to_char(): I have added this function to print Relids in the
> debugger. I have found it very useful to quickly examine Relids in debugger,
> which otherwise wasn't so easy. If others find it useful too, I can create a
> separate patch to be considered for a separate commit.
>

+1 to have such a function.  I often need something like that whenever
I debug the optimizer code.
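A standalone sketch of what such a debugger helper can look like — the fixed-size set representation and the function names here are simplified assumptions, not PostgreSQL's actual Bitmapset implementation:

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

#define WORDS 4
#define BITS_PER_WORD 32

typedef struct Bitmapset
{
    uint32_t words[WORDS];      /* bit i of word w represents member w*32+i */
} Bitmapset;

static void
bms_add_member(Bitmapset *bms, int x)
{
    bms->words[x / BITS_PER_WORD] |= ((uint32_t) 1) << (x % BITS_PER_WORD);
}

/* Format the set as "(b 1 5 33)" — a human-readable list of the members. */
static void
bms_to_char(const Bitmapset *bms, char *buf, size_t buflen)
{
    size_t  pos = 0;
    int     x;

    pos += snprintf(buf + pos, buflen - pos, "(b");
    for (x = 0; x < WORDS * BITS_PER_WORD; x++)
    {
        if (bms->words[x / BITS_PER_WORD] & (((uint32_t) 1) << (x % BITS_PER_WORD)))
            pos += snprintf(buf + pos, buflen - pos, " %d", x);
    }
    snprintf(buf + pos, buflen - pos, ")");
}
```

In the backend, the real helper would instead iterate the Bitmapset with bms_next_member() and build the string in a StringInfo.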

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CF app and patch series

2016-09-09 Thread Magnus Hagander
On Thu, Sep 8, 2016 at 5:48 PM, Alvaro Herrera 
wrote:

> Craig Ringer wrote:
> > Hi all
> >
> > Now that it's becoming more common to post patch series, not just
> > standalone patches, it might be worth looking at how the CF app can
> > help manage them.
> >
> > Any ideas?
>
> I agree that we don't consider this case at all currently and that it's
> causing some pain.
>
> MIME parsing is mind-boggling.  Trying to figure out *one* patch file
> from an email is already pretty difficult.  Trying to figure out more
> than one might be nightmarish.  Maybe Magnus will contradict me and say
> it's trivial to do -- that'd be great.
>

Nothing is ever trivial :)

Actually, the archives collect all attachments. And they're available in
the API there. But the commitfest app only keeps the first one.

So would it be trivial to collect more? Not really. Would it be
nightmarish? No, not that either.

So if we can figure out how to actually work it into the UI of the CF app,
it should certainly be doable to track multiple attachments.



> I don't have any great ideas for how to support this; I'd say it would
> be something like the CF entry has sub-entries one for each patch in the
> series, that can be closed independently.  This probably involves
> surgery to the CF database and app which I'm not volunteering to write,
> however.  Django-enabled contributors speak up now ...
>

Yeah, that would require a bit more surgery. Not sure how to represent it
though, if those patches are all typically sent in the same mailthread.
Which I guess they are. Because then that thread would have to be attached
to all of those sub-entries, which would kind of defeat the purpose?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] ICU integration

2016-09-09 Thread Craig Ringer
On 9 September 2016 at 16:21, Magnus Hagander  wrote:
> On Thu, Sep 8, 2016 at 6:19 PM, Peter Eisentraut
>  wrote:
>>
>> On 9/8/16 11:16 AM, Tom Lane wrote:
>> > This is a problem, if ICU won't guarantee cross-version compatibility,
>> > because it destroys the argument that moving to ICU would offer us
>> > collation behavior stability.
>>
>> It would offer a significant upgrade over the current situation.
>>
>> First, it offers stability inside the same version.  Whereas glibc might
>> change a collation in a minor upgrade, ICU won't do that.  And the
>> postgres binary is bound to a major version of ICU by the soname (which
>> changes with every major release).  So this would avoid the situation
>> that a simple OS update could break collations.
>>
>> Second, it offers a way to detect that something has changed.  With
>> glibc, you don't know anything unless you read the source diffs.  With
>> ICU, you can compare the collation version before and after and at least
>> tell the user that they need to refresh indexes or whatever.
>>
>
> +1 on the importance of this last part.
>
> We may not be able to handle it directly, but just being able to point out
> to the user that "this index is incorrect, you have to reindex" and then
> refuse to use the index until that has been done would be a *huge*
> improvement.  And it would definitely help solve an existing real-world
> problem, which is what can happen when you restore a physical backup onto a
> different version of an operating system at least.
>
> Sure, it would be even better if we could automatically *deal* with it. But
> failing in a loud and obvious way is a *lot* better than silently returning
> incorrect data...

Yep, I strongly agree. That's part of why I think this is well worth
doing even though it doesn't look like it'll give us a full solution
for stable collations.

It's likely also a step or three toward case-insensitive
locales/collations, which I'm sure many people would like. Though far
from the whole picture.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Quorum commit for multiple synchronous replication.

2016-09-09 Thread Simon Riggs
On 8 September 2016 at 10:26, Masahiko Sawada  wrote:

> "k (n1, n2, n3)" == "first k (n1, n2, n3)" doesn't break backward
> compatibility but most users would think "k(n1, n2, n3)" as quorum
> after introduced quorum.
> I wish we can change the s_s_names syntax of 9.6 to "first k(n1, n2,
> n3)" style before 9.6 releasing if we got consensus.

Let's see the proposed patch, so we can evaluate the proposal.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-09-09 Thread Victor Wagner
On Fri, 9 Sep 2016 11:20:59 +0530
Mithun Cy  wrote:

> If concern is only about excluding address which was tried
> previously. Then I think we can try this way, randomly permute given
> address list (for example fisher-yates shuffle) before trying any of
> those address in it. After that you can treat the new list exactly
> like we do for sequential connections. I think this makes code less
> complex.

Random permutation is much more computationally complex than random
picking. 

Alexander suggests using a different random pick algorithm than the one I use.

I use an algorithm which works with lists of unknown length and chooses
an element of such a list with equal probability. This algorithm requires
one random number for each element of the list, but we are using the C
library rand() function, which uses quite cheap linear congruential
pseudo-random numbers, so random number generation is vastly cheaper than
a network connection attempt.

As this algorithm doesn't need advance knowledge of the list length, it
is easy to wrap it in a loop which modifies the list, moving
already-tried elements out of the scope of the next pick
(it is really quite like the Fisher-Yates algorithm).
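The algorithm described above is essentially reservoir sampling with a sample size of one: the i-th node replaces the current choice with probability 1/i, which works out to a uniform 1/n over the whole list without knowing n in advance. A minimal standalone sketch, with a stand-in node type rather than libpq's actual address list:

```c
#include <stdlib.h>
#include <stddef.h>

typedef struct addrinfo_node
{
    int                   value;   /* stands in for a host address */
    struct addrinfo_node *next;
} addrinfo_node;

/*
 * Pick one element of a singly linked list of unknown length with equal
 * probability, using one random number per element.
 */
static addrinfo_node *
pick_random_node(addrinfo_node *head)
{
    addrinfo_node *chosen = NULL;
    addrinfo_node *p;
    int            i = 1;

    for (p = head; p != NULL; p = p->next, i++)
    {
        if (rand() % i == 0)    /* true with probability 1/i */
            chosen = p;
    }
    return chosen;
}
```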

Alexander suggests storing the number of elements of the list somewhere,
then generating one random number and picking the element with that number.

Unfortunately, that doesn't save much effort with a linked list.
On average we'll still need to scan half of the list to find the desired
element, so it is O(N) anyway.

Now, the main point is that we have two different timeframes of
operation.

1. Attempt to connect. It is a short period, and we can assume that the
state of the servers doesn't change. So if a server is down, we should
ignore it until the end of this connection attempt.

2. Attempt to failover. If no good candidate was found, we might need to
retry the connection until one of the standbys is promoted to master
(with targetServerType=master) or some host comes up. See the
documentation of the failover_timeout configuration parameter.

It means that we cannot just throw away the hosts from the list which
didn't respond during this connection attempt.

So, using random permutation instead of random pick wouldn't help.
And keeping the number of elements in the list somewhere wouldn't help
either, unless we change the linked list to a completely different data
structure which allows random access.
-- 




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] to_date_valid()

2016-09-09 Thread Andreas 'ads' Scherbaum

On 08.09.2016 17:31, Peter Eisentraut wrote:

On 8/15/16 7:33 AM, Andreas 'ads' Scherbaum wrote:

postgres=# SELECT to_date('2011 12  18', ' MM   DD');
   to_date
------------
 2011-12-08
(1 row)


That is from the regression tests, and obviously handles the date
transformation wrong. My attempt catches this, because I compare the
date with the input date, and do not rely on a valid date only.


It's debatable what is correct here.

Using to_number, the behavior appears to be that a space in the pattern
ignores one character.  For example:

test=# select to_number('123 456', '999 999');
 to_number
---
123456

test=# select to_number('123 456', '999  999');
 to_number
---
 12356

Considering that, the above to_date result is not incorrect.

So just squashing the spaces and converting the value back is not a
correct approach to detecting overflow.

I think using ValidateDate() was the right idea.  That is what we use
for checking date validity everywhere else.


ValidateDate() will tell you if it's a valid date. But not if the 
transformation was correct:


postgres=# SELECT to_date('2011 12  18', ' MM   DD');
  to_date
------------
 2011-12-08
(1 row)

(with the patch from Artur)


Any idea how to solve this problem?

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Quorum commit for multiple synchronous replication.

2016-09-09 Thread Petr Jelinek



On 09/09/16 08:23, Vik Fearing wrote:

On 09/09/2016 03:28 AM, Michael Paquier wrote:

On Thu, Sep 8, 2016 at 6:26 PM, Masahiko Sawada  wrote:

"k (n1, n2, n3)" == "first k (n1, n2, n3)" doesn't break backward
compatibility but most users would think "k(n1, n2, n3)" as quorum
after introduced quorum.
I wish we can change the s_s_names syntax of 9.6 to "first k(n1, n2,
n3)" style before 9.6 releasing if we got consensus.


Considering breaking backward-compatibility in the next release does
not sound like a good idea to me for a new feature that is going to be
GA soon.


Indeed.  I'll vote for pulling a fast one on 9.6 for this.



+1

--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: function xmltable

2016-09-09 Thread Pavel Stehule
Hi,

I am sending new version of this patch

1. now generic TableExpr is better separated from a real content generation
2. I removed cached typmod - using row type cache everywhere - it is
consistent with other few places in Pg where dynamic types are used - the
result tupdesc is generated few times more - but it is not on critical path.
3. More comments, few more lines in doc.
4. Reformated by pgindent

Regards

Pavel
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5148095..ca861d2 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10099,6 +10099,97 @@ SELECT xmlroot(xmlparse(document 'abc'),
 

 
+   
+xmltable
+
+   
+xmltable
+   
+
+
+xmltable(xmlnamespaces(namespace uri AS namespace name|DEFAULT namespace uri , ...) rowexpr PASSING BY REF xml BY REF COLUMNS name type PATH columnexpr DEFAULT expr NOT NULL|NULL , ...)
+
+
+
+  The xmltable function produces a table based on the passed XML value.
+
+
+
+
+
+
+   
+ The optional xmlnamespaces clause allows specifying a list
+ of namespaces, each specified by a namespace URI and a
+ namespace name (alias). The default namespace is
+ specified by a namespace URI after the keyword
+ DEFAULT.
+
+
+   
+
+
+ The BY REF clauses have no effect in
+ PostgreSQL, but are allowed for SQL conformance and compatibility
+ with other implementations.  Per SQL standard, the
+ first BY REF is required, the second is
+ optional.  Also note that the SQL standard specifies
+ the xmlexists construct to take an XQuery
+ expression as first argument, but PostgreSQL currently only
+ supports XPath, which is a subset of XQuery.
+
+
+   
+ The optional COLUMNS clause allows specifying a list
+ of columns of the generated table. A column with the special mark
+ FOR ORDINALITY holds row numbers.
+ When PATH is not defined, the name of the column is
+ used as an implicit path. Only one column should be marked FOR ORDINALITY.
+ When the path expression is empty, the DEFAULT value,
+ if any, is used.
+
+   
+
+   
+

 xmlagg
 
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 743e7d6..304afcb 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -43,6 +43,7 @@
 #include "catalog/pg_type.h"
 #include "executor/execdebug.h"
 #include "executor/nodeSubplan.h"
+#include "executor/tableexpr.h"
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -189,6 +190,9 @@ static Datum ExecEvalCurrentOfExpr(ExprState *exprstate, ExprContext *econtext,
 static Datum ExecEvalGroupingFuncExpr(GroupingFuncExprState *gstate,
 		 ExprContext *econtext,
 		 bool *isNull, ExprDoneCond *isDone);
+static Datum ExecEvalTableExpr(TableExprState * tstate,
+  ExprContext *econtext,
+  bool *isnull, ExprDoneCond *isDone);
 
 
 /* 
@@ -4500,6 +4504,213 @@ ExecEvalCurrentOfExpr(ExprState *exprstate, ExprContext *econtext,
 	return 0;	/* keep compiler quiet */
 }
 
+/* 
+ *		ExecEvalTableExpr
+ *
+ * 
+ */
+static Datum
+ExecEvalTableExprProtected(TableExprState * tstate,
+		   ExprContext *econtext,
+		   bool *isNull, ExprDoneCond *isDone)
+{
+	TupleDesc	tupdesc;
+	Datum		result;
+	int			i;
+	Datum		value;
+	bool		isnull;
+	const TableExprBuilder *builder;
+	void	   *builderCxt;
+
+	tupdesc = tstate->tupdesc;
+	builder = tstate->builder;
+	builderCxt = tstate->builderCxt;
+
+	if (builderCxt == NULL)
+	{
+		ListCell   *ns;
+
+		builderCxt = builder->CreateContext(tupdesc,
+			tstate->in_functions,
+			tstate->typioparams,
+			tstate->per_rowset_memory);
+		tstate->builderCxt = builderCxt;
+
+		/* Evaluate document expression first */
+		value = ExecEvalExpr(tstate->expr, econtext, , NULL);
+		if (isnull)
+		{
+			*isDone = ExprSingleResult;
+			*isNull = true;
+			return (Datum) 0;
+		}
+
+		/*
+		 * The content can be bigger document and transformation to cstring
+		 * can be expensive. The table builder is better place for this task -
+		 * pass value as Datum.
+		 */
+		builder->SetContent(builderCxt, value);
+
+		/* Evaluate namespace specifications */
+		foreach(ns, tstate->namespaces)
+		{
+			Node	   *n = (Node *) lfirst(ns);
+			ExprState  *expr;
+			char	   *ns_name;
+			char	   *ns_uri;
+
+			if (IsA(n, NamedArgExpr))
+			{
+NamedArgExpr *na = (NamedArgExpr *) n;
+
+expr = (ExprState *) na->arg;
+ns_name = na->name;
+			}
+			else
+			{
+expr = (ExprState *) n;
+ns_name = NULL;
+			}
+
+			value = ExecEvalExpr((ExprState *) expr, econtext, , NULL);
+			if (isnull)
+ereport(ERROR,
+		(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+		 errmsg("namespace uri must not be null")));
+			ns_uri = 

Re: [HACKERS] ICU integration

2016-09-09 Thread Magnus Hagander
On Thu, Sep 8, 2016 at 6:19 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 9/8/16 11:16 AM, Tom Lane wrote:
> > This is a problem, if ICU won't guarantee cross-version compatibility,
> > because it destroys the argument that moving to ICU would offer us
> > collation behavior stability.
>
> It would offer a significant upgrade over the current situation.
>
> First, it offers stability inside the same version.  Whereas glibc might
> change a collation in a minor upgrade, ICU won't do that.  And the
> postgres binary is bound to a major version of ICU by the soname (which
> changes with every major release).  So this would avoid the situation
> that a simple OS update could break collations.
>
> Second, it offers a way to detect that something has changed.  With
> glibc, you don't know anything unless you read the source diffs.  With
> ICU, you can compare the collation version before and after and at least
> tell the user that they need to refresh indexes or whatever.
>
>
+1 on the importance of this last part.

We may not be able to handle it directly, but just being able to point out
to the user that "this index is incorrect, you have to reindex" and then
refuse to use the index until that has been done would be a *huge*
improvement.  And it would definitely help solve an existing real-world
problem, which is what can happen when you restore a physical backup onto a
different version of an operating system at least.

Sure, it would be even better if we could automatically *deal* with it. But
failing in a loud and obvious way is a *lot* better than silently returning
incorrect data...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] [PATCH v2] Add overflow checks to money type input function

2016-09-09 Thread Fabien COELHO



I have updated the patch with additional tests and comments per your
review.  Final(?) version attached.


Applied on head, make check ok. No more comments on the code which does 
what it says. I'm fine with this patch.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ICU integration

2016-09-09 Thread Devrim Gündüz

Hi,

On Fri, 2016-09-09 at 09:48 +0800, Craig Ringer wrote:
> Personally I would be pretty reluctant to package libicu when it's
> already in RH/Fedora. If it were bundled in Pg's source tree and a
> private copy was built/installed by the build system so it was part of
> the main postgresql-server package that'd be different.

Agreed. I did not read the whole thread (yet), but if this is something like
tzdata, I would personally want to use the libicu supplied by the OS, like we
do for tzdata.

(That said, I just checked EDB's ICU support. We currently ship our own libicu
there, as a part of EPAS, but I don't know the reasoning/history behind that.)

Regards,
-- 
Devrim GÜNDÜZ
EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR




Re: [HACKERS] COPY command with RLS bug

2016-09-09 Thread Simon Riggs
On 8 September 2016 at 20:13, Adam Brightwell
 wrote:

> I have discovered a bug with the COPY command, specifically related to RLS.
...
> Connecting as a non-privileged user provides the following results:
...
> COPY foo (a, b, c) TO stdout; -- fail
> ERROR:  missing FROM-clause entry for table "b"
> LINE 1: copy foo (a, b, c) to stdout;
...
> The issue seems to be that the target list for the resulting SELECT
> statement is not being built correctly. I have attached a proposed
> patch to fix this issue.  As well, I have added a few regression tests
> for this case.

Thanks for the report and the fix.

This seems a rather basic error to occur a year after release.

Is this a problem with the testing of RLS? What other RLS related
failures exist in other commands?

Perhaps we should extend rowsecurity test with a more comprehensive
set of tests rather than just fix the COPY one?

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-09-09 Thread Masahiko Sawada
On Fri, Sep 9, 2016 at 12:33 PM, Pavan Deolasee
 wrote:
>
>
> On Thu, Sep 8, 2016 at 11:40 PM, Masahiko Sawada 
> wrote:
>>
>>
>>
>> Making the vacuum possible to choose between two data representations
>> sounds good.
>> I implemented the patch that changes dead tuple representation to bitmap
>> before.
>> I will measure the performance of bitmap representation again and post
>> them.
>
>
> Sounds great! I haven't seen your patch, but what I would suggest is to
> compute page density (D) = relpages/(dead+live tuples) and experiment with
> bitmap of sizes of D to 2D bits per page. May I also suggest that instead of
> putting in efforts in implementing the overflow area,  just count how many
> dead TIDs would fall under overflow area for a given choice of bitmap size.
>

Isn't that formula "page density (D) = (dead+live tuples)/relpages"?

> It might be a good idea to experiment with different vacuum scale factor,
> varying between 2% to 20% (may be 2, 5, 10, 20). You can probably run a
> longish pgbench test on a large table and then save the data directory for
> repeated experiments, although I'm not sure if pgbench will be a good choice
> because HOT will prevent accumulation of dead pointers, in which case you
> may try adding another index on abalance column.

Thank you, I will experiment with this.

>
> It'll be worth measuring memory consumption of both representations as well
> as performance implications on index vacuum. I don't expect to see any major
> difference in either heap scans.
>

Yeah, it would be effective for index vacuum speed and for the number
of index vacuum executions.

The attached PoC patch changes the representation of dead tuple locations
to a hashmap of per-block tuple bitmaps.
Each hashmap entry consists of a block number and the TID bitmap of the
corresponding block; the block number is the hash key.
The current implementation of this patch is not smart yet, because each
hashmap entry allocates the tuple bitmap with a fixed size
(LAZY_ALLOC_TUPLES), so each hash entry can store up to
LAZY_ALLOC_TUPLES (291 if the block size is 8kB) tuples.
In the case where one block can store only a few tens of tuples, most of
the bits would be wasted.

After improving this patch as you suggested, I will measure the
performance benefit.
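The chunked per-block bitmap the PoC patch builds can be sketched standalone like this; the constants mirror the patch's macros, while the accessor functions are simplified stand-ins for its bit arithmetic (offset numbers are 1-based, as in PostgreSQL):

```c
#include <stdint.h>
#include <stdbool.h>

#define BITS_PER_BITMAP_CHUNK 32
#define LAZY_ALLOC_TUPLES 291            /* max heap tuples per 8kB block */
#define BITMAP_CHUNKS_PER_PAGE \
    ((int) ((LAZY_ALLOC_TUPLES / BITS_PER_BITMAP_CHUNK) + 1))

typedef struct PageEntry
{
    uint32_t blockno;                     /* hash key of the hashmap entry */
    uint32_t bm[BITMAP_CHUNKS_PER_PAGE];  /* one bit per line pointer */
} PageEntry;

/* Mark the tuple at 1-based offset number 'offnum' as dead. */
static void
entry_set_dead(PageEntry *entry, int offnum)
{
    int bit = offnum - 1;

    entry->bm[bit / BITS_PER_BITMAP_CHUNK] |=
        ((uint32_t) 1) << (bit % BITS_PER_BITMAP_CHUNK);
}

/* Test whether the tuple at 1-based offset number 'offnum' is dead. */
static bool
entry_is_dead(const PageEntry *entry, int offnum)
{
    int bit = offnum - 1;

    return (entry->bm[bit / BITS_PER_BITMAP_CHUNK] >>
            (bit % BITS_PER_BITMAP_CHUNK)) & 1;
}
```

Each entry costs a fixed ~44 bytes per block regardless of how many tuples on that block are dead, versus 6 bytes per TID in the current array, which is the memory trade-off discussed above.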

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
From d1968d625ca1bb07711681a2d824737c53d27c8c Mon Sep 17 00:00:00 2001
From: Masahiko Sawada 
Date: Thu, 8 Sep 2016 13:59:23 -0400
Subject: [PATCH] Collect dead tuple location as a bitmap.

---
 src/backend/commands/vacuumlazy.c | 232 +-
 1 file changed, 153 insertions(+), 79 deletions(-)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index b5fb325..ce7bd7e 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -98,6 +98,28 @@
  */
 #define SKIP_PAGES_THRESHOLD   ((BlockNumber) 32)
 
+#define BITS_PER_BITMAP_CHUNK 32
+#define BITMAP_CHUNKS_PER_PAGE \
+   ((int) ((LAZY_ALLOC_TUPLES / BITS_PER_BITMAP_CHUNK) + 1))
+#define BitoffsetToOffsetNumber(chunk, offset) \
+   (((chunk) * BITS_PER_BITMAP_CHUNK) + (offset) + 1)
+#define OffsetNumberToChunk(offset) \
+   ((offset) / BITS_PER_BITMAP_CHUNK)
+#define OffsetNumberToBitoffset(offset) \
+   ((offset) % BITS_PER_BITMAP_CHUNK)
+
+typedef struct PageEntry
+{
+   BlockNumber blockno;
+   uint32  bm[BITMAP_CHUNKS_PER_PAGE]; /* tuple bitmap of blockno */
+} PageEntry;
+
+typedef struct DeadTupleMap
+{
+   HTAB*pagetable;
+   int npages; /* # of blocks hashmap has */
+} DeadTupleMap;
+
 typedef struct LVRelStats
 {
/* hasindex = true means two-pass strategy; false means one-pass */
@@ -120,6 +142,9 @@ typedef struct LVRelStats
 	int			num_dead_tuples;	/* current # of entries */
 	int			max_dead_tuples;	/* # slots allocated in array */
ItemPointer dead_tuples;/* array of ItemPointerData */
+
+   DeadTupleMap *dead_tuple_map; /* hash map if dead ItemPointerData */
+
int num_index_scans;
TransactionId latestRemovedXid;
boollock_waiter_detected;
@@ -148,20 +173,19 @@ static void lazy_vacuum_index(Relation indrel,
 static void lazy_cleanup_index(Relation indrel,
   IndexBulkDeleteResult *stats,
   LVRelStats *vacrelstats);
-static int lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
-int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
+static void lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
+PageEntry *entry, LVRelStats *vacrelstats, Buffer *vmbuffer);
 static bool 

Re: [HACKERS] WAL consistency check facility

2016-09-09 Thread Kuntal Ghosh
Hello Michael,

Thanks for your detailed review.

>> - If WAL consistency check is enabled for a rmgrID, we always include
>> the backup image in the WAL record.
>
> What happens if wal_consistency has different settings on a standby
> and its master? If for example it is set to 'all' on the standby, and
> 'none' on the master, or vice-versa, how do things react? An update of
> this parameter should be WAL-logged, no?
>
It is possible to set wal_consistency to 'all' on the master and to any
other value on the standby. But the scenario you mentioned will cause an
error on the standby, since it may not get the required backup image for
the WAL consistency check. I think the user should be responsible for
setting this value correctly. We can improve the error message to make
the user aware of the situation.

>> - I've extended the RmgrTable with a new function pointer
>> rm_checkConsistency, which is called after rm_redo. (only when WAL
>> consistency check is enabled for this rmgrID)
>> - In each rm_checkConsistency, both backup pages and buffer pages are
>> masked accordingly before any comparison.
>
> This leads to heavy code duplication...
>
> +   void(*rm_checkConsistency) (XLogReaderState *record);
> All your _checkConsistency functions share the same pattern, in short
> they all use a for loop for each block, call each time
> XLogReadBufferExtended, etc. And this leads to a *lot* of duplication.
> You would get a reduction by a couple of hundreds of lines by having a
> smarter refactoring. And to be honest, if I look at your patch what I
> think is the correct way of doing things is to add to the rmgr not
> this check consistency function, but just a pointer to the masking
> function.
A pointer to the masking function will certainly reduce a lot of
redundant code. I'll modify it accordingly.
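The refactoring suggested above — storing only a masking callback per rmgr and keeping a single generic comparison loop — could look roughly like this. The types and the "hint bit" location are simplified stand-ins, not the backend's actual RmgrData or page layout:

```c
#include <string.h>
#include <stdbool.h>

#define BLCKSZ 8192

typedef void (*rm_mask_fn) (char *page);

/* One entry per resource manager; only the fields relevant here are shown. */
typedef struct RmgrData
{
    const char *rm_name;
    rm_mask_fn  rm_mask;        /* NULL if the rmgr has no masking to do */
} RmgrData;

/* Example masker: zero out a byte that holds non-WAL-logged hint bits. */
static void
heap_mask(char *page)
{
    page[100] = 0;              /* pretend byte 100 holds hint bits */
}

static RmgrData RmgrTable[] = {
    {"Heap", heap_mask},
    {"Generic", NULL},
};

/*
 * Generic consistency check: mask copies of both images with the rmgr's
 * callback, then compare.  This replaces N nearly identical per-rmgr loops.
 */
static bool
check_page_consistency(int rmid, const char *primary_img, const char *replayed_img)
{
    char a[BLCKSZ], b[BLCKSZ];

    memcpy(a, primary_img, BLCKSZ);
    memcpy(b, replayed_img, BLCKSZ);
    if (RmgrTable[rmid].rm_mask != NULL)
    {
        RmgrTable[rmid].rm_mask(a);
        RmgrTable[rmid].rm_mask(b);
    }
    return memcmp(a, b, BLCKSZ) == 0;
}
```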

>> - In recovery tests (src/test/recovery/t), I've added wal_consistency
>> parameter in the existing scripts. This feature doesn't change the
>> expected output. If there is any inconsistency, it can be verified in
>> corresponding log file.
>
> I am afraid that just generating a WARNING message is going to be
> useless for the buildfarm. If we want to detect errors, we could for
> example have an additional GUC to trigger an ERROR or a FATAL, taking
> down the cluster, and allowing things to show in red on a platform.
>
Yes, we can include an additional GUC to trigger an ERROR for any inconsistency.

>> Results
>> 
>>
>> I've tested with installcheck and installcheck-world in master-standby
>> set-up. Followings are the configuration parameters.
>
> So you tested as well the recovery tests, right?
>
Yes, I've done the recovery tests after enabling tap-test.

>
> +   /*
> +* Followings are the rmids which can have backup blocks.
> +* We'll enable this feature only for these rmids.
> +*/
> +   newwalconsistency[RM_HEAP2_ID] = true;
> +   newwalconsistency[RM_HEAP_ID] = true;
> +   newwalconsistency[RM_BTREE_ID] = true;
> +   newwalconsistency[RM_HASH_ID] = true;
> +   newwalconsistency[RM_GIN_ID] = true;
> +   newwalconsistency[RM_GIST_ID] = true;
> +   newwalconsistency[RM_SEQ_ID] = true;
> +   newwalconsistency[RM_SPGIST_ID] = true;
> +   newwalconsistency[RM_BRIN_ID] = true;
> +   newwalconsistency[RM_GENERIC_ID] = true;
> +   newwalconsistency[RM_XLOG_ID] = true;
> Here you can just use MemSet with RM_MAX_ID and simplify this code 
> maintenance.
>
Not all rmids can have backup blocks. So, for wal_consistency = 'all',
I've enabled only those rmids which can have backup blocks.

>
> +   if (pg_strcasecmp(tok, "Heap2") == 0)
> +   {
> +   newwalconsistency[RM_HEAP2_ID] = true;
> +   }
> Thinking more about it, I guess that we had better change the
> definition list of rmgrs in rmgr.h and get something closer to
> RmgrDescData that pg_xlogdump has to avoid all this stanza by
> completing it with the name of the rmgr. The only special cases that
> this code path would need to take care of would be then 'none' and
> 'all'. You could do this refactoring on top of the main patch to
> simplify it as it is rather big (1.7k lines).
>
I'm not sure about this point. wal_consistency doesn't support all
the rmids. We should have some way to check this.

I'll update rest of the things as mentioned by you accordingly.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Quorum commit for multiple synchronous replication.

2016-09-09 Thread Vik Fearing
On 09/09/2016 03:28 AM, Michael Paquier wrote:
> On Thu, Sep 8, 2016 at 6:26 PM, Masahiko Sawada  wrote:
>> "k (n1, n2, n3)" == "first k (n1, n2, n3)" doesn't break backward
>> compatibility but most users would think "k(n1, n2, n3)" as quorum
>> after introduced quorum.
>> I wish we can change the s_s_names syntax of 9.6 to "first k(n1, n2,
>> n3)" style before 9.6 releasing if we got consensus.
> 
> Considering breaking backward-compatibility in the next release does
> not sound like a good idea to me for a new feature that is going to be
> GA soon.

Indeed.  I'll vote for pulling a fast one on 9.6 for this.
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers